Standard and idiosyncratic schemata for Sanskrit data, with a library of validation, (de-)serialization, a database interface and other utilities.
Project description
Alert: PIP documentation could be badly formatted due to this pypan bug , please see https://github.com/sanskrit-coders/sanskrit_data/blob/master/README.md in the meantime.
Introduction
This module defines:
schema
shared standard schema for communicating and storing Sanskrit data of various types.
various idiosyncratic notations used by various modules which deviate from the proposed standards.
python classes (corresponding to the schema) and shared libraries for validating, (de-)serializing and storing sanskrit data of various types.
a common database interface for accessing various databases (so that a downstream app can switch to a different database with a single line change).
Similar libraries in various other languages are being built:
Scala (likely compatible with Java): db-interface .
Motivation
Various sanskrit modules need to communicate data amongst each other (for example through a REST API or database stores or even function calls). Examples of the data being communicated could be:
Gramatical details of a given word
Sentences in a given book chapter
Annotations on a given phrase
When it comes to serialization formats - two distinct approaches present themselves to us:
One possible route is to have each project defining and using its own idiosyncratic notation. But this entails an additional burdens:
Each communicating module having to convert the data from one idiosyncratic notation to another.
Good schema design or notation is non trivial. Even if no external module is using the data, it is a waste to have to reinvent the wheel.
A superior route is to have a common, standard format for encoding various data-types for storage/ communication.
To the extant possible, we should take latter approach to data storage and communication.
Where idiosyncratic notations are adapted for various reasons, it is still desirable to collect such definitions in a single module - to facilitate conversion to the standard format.
For users
Installation
Install this library
Latest release: sudo pip2 install sanskrit_data -U
Development copy: sudo pip2 install git+https://github.com/sanskrit-coders/sanskrit_data@master -U
Web.
Install libraries for the particular database you want to access through the sanskrit_data.db interface (as needed): pymongo, cloudant (for couchdb).
Usage
Please see the generated python sphynx docs in one of the following places:
under docs/_build/html/index.html
Design considerations for data containers corresponding to the various submodules (such as books and annotations) are given below - or in the corresponding source files.
For contributors
Contact
Have a problem or question? Please head to github.
Packaging
~/.pypirc should have your pypi login credentials.
python setup.py bdist_wheel twine upload dist/* --skip-existing
Document generation
Sphynx html docs can be generated with cd docs; make html
http://sanskrit-data.readthedocs.io/en/latest/sanskrit_data.html should automatically have good updated documentation - unless there are build errors.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Hashes for sanskrit_data-0.3.2-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0429d84a2f0ce80baa36eca5285020494112b457bda45fb3d7cdf92fc51323ba |
|
MD5 | 8775b14fc820007d5f9fd725ebc4aea8 |
|
BLAKE2b-256 | 905601b9a5adf53411280556340bf8ee49521979a7564f0d7aabe0cf25b8d837 |