A universal hub-and-spoke morphological representation converter for Sanskrit.
Sanskrit Morph Converter
A Python engine for unifying, standardizing, and converting Sanskrit morphological tags across multiple computational paradigms.
In Sanskrit computational linguistics, tools such as the Sanskrit Heritage engine and Samsaadhanii, neural models such as ByT5, and baseline grammars such as Svarupa output morphological analyses in very different formats and vocabularies. sanskrit-morph-converter provides a centralized, pivot-based architecture that translates these tagsets into a unified Canonical representation.
Installation
Install the package directly from PyPI:
pip install sanskrit-morph-converter
Python API Usage
You can import the converter directly into your Python scripts to process strings or JSON outputs from various platforms. The core .convert() method takes a source platform, a target platform, and the raw input.
from sanskrit_morph_converter.converter import RepresentationConverter
# Initialize the converter (automatically loads the compiled mapping TSVs)
converter = RepresentationConverter()
Example 1: Converting ByT5 Output to Canonical
ByT5 outputs are single strings: the leading fields are separated by underscores, and the morphological features are separated by pipes. The converter parses these into standard Canonical properties.
byt5_raw = "devam_deva_Case=Acc|Gender=Masc|Number=Sing"
# Convert ByT5 to Canonical
canonical_tags = converter.convert('ByT5', 'Canonical', byt5_raw)
print(canonical_tags)
# Output: [{'input': 'देवम्', 'stem': 'देव', 'root': '', 'morph': 'Case=Accusative|Gender=Masculine|Number=Singular'}]
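To make the ByT5 string layout concrete, here is a minimal standalone sketch that splits the same example string by hand. It does not use or reproduce the library's internals: the function name parse_byt5_tag and the EXPANSIONS table are hypothetical, and the three expansions are simply read off the example output above.

```python
# Hypothetical helper, not part of sanskrit-morph-converter.
# Expansions copied from the example output above; the real mapping
# lives in the package's TSV files and may differ.
EXPANSIONS = {"Acc": "Accusative", "Masc": "Masculine", "Sing": "Singular"}


def parse_byt5_tag(raw: str) -> dict:
    """Split a 'form_stem_Feat=Val|Feat=Val' string into its parts."""
    # Split on the first two underscores only, so the feature string stays intact.
    form, stem, feats = raw.split("_", 2)
    features = dict(pair.split("=", 1) for pair in feats.split("|") if pair)
    return {"form": form, "stem": stem, "features": features}


parsed = parse_byt5_tag("devam_deva_Case=Acc|Gender=Masc|Number=Sing")

# Rebuild the feature string with expanded values, keeping the original order.
expanded = "|".join(
    f"{name}={EXPANSIONS.get(value, value)}"
    for name, value in parsed["features"].items()
)
print(parsed["form"], parsed["stem"], expanded)
# devam deva Case=Accusative|Gender=Masculine|Number=Singular
```

The sketch leaves out script conversion (the library's example output is in Devanagari) and any lexical exceptions.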
Example 2: Converting Sanskrit Heritage (SH) to DCS
The Sanskrit Heritage engine returns nested JSON dictionaries. You can pass the JSON string directly to convert it to another format, such as DCS.
sh_raw = """{
"input": "गच्छति",
"status": "Success",
"morph": [{"word": "गच्छति", "root": "गम्", "inflectional_morphs": ["pr. [1] ac. sg. 3"]}]
}"""
# Convert SH to DCS
dcs_tags = converter.convert('SH', 'DCS', sh_raw, output_format='string')
print(dcs_tags)
# Output (Example): ['gacchati\tgam\tMood=Ind|Number=Sing|Person=3|Tense=Pres']
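Before handing a Sanskrit Heritage payload to the converter, it can help to check its shape with the standard json module. The sketch below uses only the field names visible in the example above (input, status, morph, word, root, inflectional_morphs); it assumes nothing else about the SH response format.

```python
import json

sh_raw = """{
"input": "गच्छति",
"status": "Success",
"morph": [{"word": "गच्छति", "root": "गम्", "inflectional_morphs": ["pr. [1] ac. sg. 3"]}]
}"""

payload = json.loads(sh_raw)

# Flatten the nested structure into (word, root, tag) triples,
# one per inflectional analysis.
analyses = [
    (entry["word"], entry["root"], tag)
    for entry in payload.get("morph", [])
    for tag in entry.get("inflectional_morphs", [])
]

if payload.get("status") == "Success":
    for word, root, tag in analyses:
        print(word, root, tag)
```

A word with several candidate analyses would yield several triples, since inflectional_morphs is a list.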
Command Line Interface (CLI)
The package includes a built-in CLI for batch processing files or testing quick strings directly from your terminal.
Convert a single string:
smc convert ByT5 Canonical -i "devam_deva_Case=Acc|Gender=Masc|Number=Sing"
Process an entire file and save the output:
smc convert SH Canonical -f data/sh_analysis.tsv -o data/canonical_results.tsv
Change the output script (e.g., to WX or IAST):
smc convert ByT5 SH -i "devam_deva_Case=Acc|Gender=Masc|Number=Sing" --script WX
Architecture
This library operates on a three-stage pipeline: Adapters (which read the source format), a Mapper (which routes tags through a shared Pivot representation), and a Converter (which formats the output for the target platform).
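The hub-and-spoke idea can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not the package's implementation: the pivot here is a plain feature dictionary, the "ud" and "compact" format names are invented, and NORMALIZE stands in for the mapping tables.

```python
from typing import Callable, Dict

Pivot = Dict[str, str]  # hypothetical pivot: feature name -> feature value

# Stand-in for the mapping tables; values chosen for illustration only.
NORMALIZE = {"Acc": "Accusative", "Sing": "Singular"}


def read_ud_style(raw: str) -> Pivot:
    """Adapter: 'Case=Acc|Number=Sing' -> {'Case': 'Acc', 'Number': 'Sing'}."""
    return dict(pair.split("=", 1) for pair in raw.split("|") if pair)


def normalize(pivot: Pivot) -> Pivot:
    """Mapper: rewrite source values into the shared pivot vocabulary."""
    return {name: NORMALIZE.get(value, value) for name, value in pivot.items()}


def write_ud_style(pivot: Pivot) -> str:
    """Formatter: pivot -> 'Feat=Val|Feat=Val', sorted by feature name."""
    return "|".join(f"{name}={pivot[name]}" for name in sorted(pivot))


def write_compact(pivot: Pivot) -> str:
    """Formatter for an invented dotted format, e.g. 'accusative.singular'."""
    return ".".join(pivot[name].lower() for name in sorted(pivot))


ADAPTERS: Dict[str, Callable[[str], Pivot]] = {"ud": read_ud_style}
FORMATTERS: Dict[str, Callable[[Pivot], str]] = {
    "ud": write_ud_style,
    "compact": write_compact,
}


def convert(source: str, target: str, raw: str) -> str:
    pivot = normalize(ADAPTERS[source](raw))  # stages 1 and 2: read, then map
    return FORMATTERS[target](pivot)          # stage 3: format for the target


result = convert("ud", "compact", "Case=Acc|Number=Sing")
print(result)  # accusative.singular
```

The benefit of routing through a pivot is that supporting a new platform means writing one adapter and one formatter, rather than a separate converter for every pair of platforms.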
The Google Sheets Integration
To ensure this tool remains accessible to linguists and researchers who may not write code, the mapping vocabulary is not hardcoded. Instead, tag standardizations and lexical exceptions (like pronouns and causatives) are maintained collaboratively in a Master Google Sheet.
When linguistic rules are updated in the sheet, you can use the built-in compiler to fetch the latest data and rebuild the internal .tsv files (pivot_mapping.tsv, normalization.tsv, etc.) without altering the Python engine.
To fetch the latest mappings from the Google Sheet:
sanskrit-morph update
(Note: The pre-compiled .tsv files are already bundled with the PyPI package, so standard users do not need to run the compiler to use the tool.)
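For readers who want to inspect a compiled mapping file, a tab-separated table can be loaded with the standard csv module. The column names source_tag and canonical_tag below are assumptions made for this sketch; the actual headers in pivot_mapping.tsv or normalization.tsv may differ, so check the bundled files first.

```python
import csv
import io

# In-memory stand-in for a mapping file; the column names are hypothetical.
sample_tsv = (
    "source_tag\tcanonical_tag\n"
    "Acc\tAccusative\n"
    "Masc\tMasculine\n"
    "Sing\tSingular\n"
)


def load_mapping(handle) -> dict:
    """Read a two-column TSV into a source -> canonical lookup table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["source_tag"]: row["canonical_tag"] for row in reader}


mapping = load_mapping(io.StringIO(sample_tsv))
print(mapping["Acc"])  # Accusative
```

To read a real file, pass an open file handle (opened with encoding="utf-8" and newline="") in place of the StringIO object.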
📜 License
This project is licensed under the GNU General Public License v3; see the LICENSE file for details.
File details
Details for the file sanskrit_morph_converter-0.2.0.tar.gz.
File metadata
- Download URL: sanskrit_morph_converter-0.2.0.tar.gz
- Upload date:
- Size: 55.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2ef65d2ad904a606d0b0ccb5e60c77f892f933253e4241b9d4b0ebb5ff31989 |
| MD5 | 2cde47392dcf1992ef452f377f193070 |
| BLAKE2b-256 | b3f3490e2d2e6aa60509e1f54dfd699ce7567f5a4e3ddb4494296bd860b84188 |
Provenance
The following attestation bundles were made for sanskrit_morph_converter-0.2.0.tar.gz:
Publisher: publish.yml on SriramKrishnan8/sanskrit-morph-converter
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sanskrit_morph_converter-0.2.0.tar.gz
- Subject digest: c2ef65d2ad904a606d0b0ccb5e60c77f892f933253e4241b9d4b0ebb5ff31989
- Sigstore transparency entry: 1415408633
- Permalink: SriramKrishnan8/sanskrit-morph-converter@4b0274752745d0b682214c58587b48b1fa7caf8a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/SriramKrishnan8
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4b0274752745d0b682214c58587b48b1fa7caf8a
- Trigger Event: release
File details
Details for the file sanskrit_morph_converter-0.2.0-py3-none-any.whl.
File metadata
- Download URL: sanskrit_morph_converter-0.2.0-py3-none-any.whl
- Upload date:
- Size: 54.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d83b6c203ac5dc64a540aed7588101a6b8ddf831cb71338ef1014a08ab63d670 |
| MD5 | 29f1db6537509b146bb25a6d5de22186 |
| BLAKE2b-256 | 3cad725045ffaabb95d36372bcfce6157b130a245700929ec9770190f2007608 |
Provenance
The following attestation bundles were made for sanskrit_morph_converter-0.2.0-py3-none-any.whl:
Publisher: publish.yml on SriramKrishnan8/sanskrit-morph-converter
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sanskrit_morph_converter-0.2.0-py3-none-any.whl
- Subject digest: d83b6c203ac5dc64a540aed7588101a6b8ddf831cb71338ef1014a08ab63d670
- Sigstore transparency entry: 1415408720
- Permalink: SriramKrishnan8/sanskrit-morph-converter@4b0274752745d0b682214c58587b48b1fa7caf8a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/SriramKrishnan8
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4b0274752745d0b682214c58587b48b1fa7caf8a
- Trigger Event: release