Standard and idiosyncratic schemata for text, annotation and user data, with a library of validation, (de-)serialization, a database interface and other utilities.
This module defines:
- shared standard schema for communicating and storing data (with a particular focus on sanskrit texts) of various types.
- various idiosyncratic notations used by various modules which deviate from the proposed standards.
- python classes (corresponding to the schema) and shared libraries for validating, (de-)serializing and storing sanskrit data of various types.
- a common database interface for accessing various databases (so that a downstream app can switch to a different database with a single line change).
While this package was originally motivated by Sanskrit text annotation needs, it is more generally useful.
Similar libraries in various other programming languages are being built:
- Scala (likely compatible with Java): db-interface .
- Various sanskrit modules need to communicate data amongst each other
(for example through a REST API or database stores or even function
calls). Examples of the data being communicated could be:
- Gramatical details of a given word
- Sentences in a given book chapter
- Annotations on a given phrase
- When it comes to serialization formats - two distinct approaches
present themselves to us:
- One possible route is to have each project defining and using its
own idiosyncratic notation. But this entails an additional
- Each communicating module having to convert the data from one idiosyncratic notation to another.
- Good schema design or notation is non trivial. Even if no external module is using the data, it is a waste to have to reinvent the wheel.
- A superior route is to have a common, standard format for encoding various data-types for storage/ communication.
- One possible route is to have each project defining and using its own idiosyncratic notation. But this entails an additional burdens:
- To the extant possible, we should take latter approach to data storage and communication.
- Where idiosyncratic notations are adapted for various reasons, it is still desirable to collect such definitions in a single module - to facilitate conversion to the standard format.
- Install this library (Replace pip2 with pip3 as needed)
- Latest release: sudo pip3 install sanskrit_data -U
- Development copy: sudo pip3 install git+https://github.com/vedavaapi/sanskrit_data@master -U
- Install libraries for the particular database you want to access through the sanskrit_data.db interface (as needed): pymongo, cloudant (for couchdb).
- Please see the generated python sphynx docs in one of the following places:
- Design considerations for data containers corresponding to the various submodules (such as books and annotations) are given below - or in the corresponding source files.
Have a problem or question? Please head to github.
~/.pypirc should have your pypi login credentials.
python3 setup.py bdist_wheel twine upload dist/* --skip-existing
- Sphynx html docs can be generated with cd docs; make html
- Ignore warnings like: missing attribute mentioned in :members: or __all__: module (explanation).
- http://sanskrit-data.readthedocs.io/en/latest/sanskrit_data.html should automatically have good updated documentation - unless there are build errors.
- To update UML diagrams, copy the outputs of the below to docs:
- pyreverse -ASmy -k -o png sanskrit_data.schema -p sanskrit_data_schema
- pyreverse -ASmy -k -o png sanskrit_data.db -p sanskrit_data_db