A universal hub-and-spoke morphological representation converter for Sanskrit.

Sanskrit Morph Converter

PyPI version License: GPL v3 Python 3.8+

A Python engine for unifying, standardizing, and converting Sanskrit morphological tags across multiple computational paradigms.

In Sanskrit computational linguistics, tools such as the Sanskrit Heritage engine, Samsaadhanii, neural models like ByT5, and baseline grammars like Svarupa output morphological analyses in very different formats and vocabularies. sanskrit-morph-converter provides a centralized, pivot-based architecture that translates these tagsets into a unified Canonical Representation.

Installation

Install the package directly from PyPI:

pip install sanskrit-morph-converter

Python API Usage

You can import the converter directly into your Python scripts to process strings or JSON outputs from various platforms. The core .convert() method takes a source platform, a target platform, and the raw input.

from sanskrit_morph_converter.converter import RepresentationConverter

# Initialize the converter (automatically loads the compiled mapping TSVs)
converter = RepresentationConverter()

Example 1: Converting ByT5 Output to Canonical

ByT5 outputs are underscore- and pipe-separated strings. The converter parses these into standard Canonical properties.

byt5_raw = "devam_deva_Case=Acc|Gender=Masc|Number=Sing"

# Convert ByT5 to Canonical
canonical_tags = converter.convert('ByT5', 'Canonical', byt5_raw)
print(canonical_tags)
# Output: [{'input': 'देवम्', 'stem': 'देव', 'root': '', 'morph': 'Case=Accusative|Gender=Masculine|Number=Singular'}]
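
To make the input format concrete, here is a minimal standalone sketch of how a string shaped like the one above can be split into form, stem, and features. It does not use the package and is not its internal parser; the function name `parse_byt5_analysis` and the returned keys are assumptions made for illustration, and the sketch assumes the form and stem contain no underscores.

```python
def parse_byt5_analysis(raw: str) -> dict:
    """Split a 'form_stem_Feat=Val|Feat=Val' string into its parts.

    Hypothetical helper for illustration only, not part of the package.
    """
    # Split on the first two underscores only, so the feature string stays intact.
    form, stem, feature_string = raw.split("_", 2)

    features = {}
    for pair in feature_string.split("|"):
        # Each pair looks like 'Case=Acc'; partition keeps this robust
        # if a value is missing.
        name, _, value = pair.partition("=")
        features[name] = value

    return {"form": form, "stem": stem, "features": features}


parsed = parse_byt5_analysis("devam_deva_Case=Acc|Gender=Masc|Number=Sing")
print(parsed)
# {'form': 'devam', 'stem': 'deva', 'features': {'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'}}
```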

Example 2: Converting Sanskrit Heritage (SH) to DCS

The Sanskrit Heritage engine returns nested JSON dictionaries. You can pass the JSON string directly to convert it to another format, such as DCS.

sh_raw = """{
    "input": "गच्छति", 
    "status": "Success", 
    "morph": [{"word": "गच्छति", "root": "गम्", "inflectional_morphs": ["pr. [1] ac. sg. 3"]}]
}"""

# Convert SH to DCS
dcs_tags = converter.convert('SH', 'DCS', sh_raw, output_format='string')
print(dcs_tags)
# Output (Example): ['gacchati\tgam\tMood=Ind|Number=Sing|Person=3|Tense=Pres']
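
If you want to inspect the nested structure yourself before handing it to the converter, the standard `json` module is enough. The sketch below only reads the fields visible in the sample above (`status`, `morph`, `word`, `root`, `inflectional_morphs`); the helper name `iter_sh_analyses` is hypothetical, and real Sanskrit Heritage output may carry fields not shown here.

```python
import json

sh_raw = """{
    "input": "गच्छति",
    "status": "Success",
    "morph": [{"word": "गच्छति", "root": "गम्", "inflectional_morphs": ["pr. [1] ac. sg. 3"]}]
}"""


def iter_sh_analyses(raw_json: str):
    """Yield (word, root, tag) triples from one SH-style JSON record.

    Hypothetical helper for illustration only, not part of the package.
    """
    record = json.loads(raw_json)
    # Skip records that the analyser did not mark as successful.
    if record.get("status") != "Success":
        return
    for analysis in record.get("morph", []):
        # One word can carry several inflectional readings.
        for tag in analysis.get("inflectional_morphs", []):
            yield analysis["word"], analysis.get("root", ""), tag


analyses = list(iter_sh_analyses(sh_raw))
print(analyses)
# [('गच्छति', 'गम्', 'pr. [1] ac. sg. 3')]
```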

Command Line Interface (CLI)

The package includes a built-in CLI for batch processing files or testing quick strings directly from your terminal.

Convert a single string:

smc convert ByT5 Canonical -i "devam_deva_Case=Acc|Gender=Masc|Number=Sing"

Process an entire file and save the output:

smc convert SH Canonical -f data/sh_analysis.tsv -o data/canonical_results.tsv

Change the output script (e.g., to WX or IAST):

smc convert ByT5 SH -i "devam_deva_Case=Acc|Gender=Masc|Number=Sing" --script WX

Architecture

This library operates on a flexible, three-stage pipeline: Adapters (to read the source format), a Mapper (to route to a mathematical Pivot), and a Converter (to format the output for the target platform).
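
The hub-and-spoke idea behind this pipeline can be sketched in a few lines. The toy below is not the package's implementation: the tables `TO_PIVOT` and `FROM_PIVOT`, the pivot tuples, and all function names are assumptions invented for illustration, and the real mappings live in the bundled .tsv files.

```python
# Hypothetical toy tables: source tag -> pivot feature, pivot feature -> target tag.
TO_PIVOT = {
    "ByT5": {
        "Case=Acc": ("case", "acc"),
        "Gender=Masc": ("gender", "masc"),
        "Number=Sing": ("number", "sg"),
    },
}
FROM_PIVOT = {
    "Canonical": {
        ("case", "acc"): "Case=Accusative",
        ("gender", "masc"): "Gender=Masculine",
        ("number", "sg"): "Number=Singular",
    },
}


def adapt(raw_tags: str) -> list:
    """Stage 1 (adapter): split the source tag string into individual tags."""
    return raw_tags.split("|")


def to_pivot(source: str, tags: list) -> list:
    """Stage 2 (mapper): look up each source tag in the shared pivot vocabulary."""
    return [TO_PIVOT[source][tag] for tag in tags]


def render(target: str, pivot_features: list) -> str:
    """Stage 3 (formatter): express the pivot features in the target tagset."""
    return "|".join(FROM_PIVOT[target][feature] for feature in pivot_features)


def convert_tags(source: str, target: str, raw_tags: str) -> str:
    """Chain the three stages: source format -> pivot -> target format."""
    return render(target, to_pivot(source, adapt(raw_tags)))


result = convert_tags("ByT5", "Canonical", "Case=Acc|Gender=Masc|Number=Sing")
print(result)
# Case=Accusative|Gender=Masculine|Number=Singular
```

The benefit of routing through a pivot is that each platform needs only one table into the pivot and one out of it, rather than a separate table for every pair of platforms.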

The Google Sheets Integration

To ensure this tool remains accessible to linguists and researchers who may not write code, the mapping vocabulary is not hardcoded. Instead, tag standardizations and lexical exceptions (like pronouns and causatives) are maintained collaboratively in a Master Google Sheet.

When linguistic rules are updated in the sheet, you can use the built-in compiler to fetch the latest data and rebuild the internal .tsv files (pivot_mapping.tsv, normalization.tsv, etc.) without altering the Python engine.

To fetch the latest mappings from the Google Sheet:

sanskrit-morph update

(Note: The pre-compiled .tsv files are already bundled with the PyPI package, so standard users do not need to run the compiler to use the tool.)
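
For readers curious what consuming such a compiled mapping might look like, here is a minimal sketch that loads a two-column TSV into a dictionary with the standard `csv` module. The column names `source_tag` and `pivot_tag`, the sample rows, and the helper `load_mapping` are all assumptions for illustration; the actual layout of pivot_mapping.tsv and normalization.tsv is not described here.

```python
import csv
import io

# Hypothetical sample content; the real .tsv columns may differ.
sample_tsv = "source_tag\tpivot_tag\nAcc\taccusative\nMasc\tmasculine\n"


def load_mapping(handle) -> dict:
    """Read a tab-separated mapping with a header row into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["source_tag"]: row["pivot_tag"] for row in reader}


# io.StringIO stands in for an opened file, e.g. open(path, encoding="utf-8", newline="").
mapping = load_mapping(io.StringIO(sample_tsv))
print(mapping)
# {'Acc': 'accusative', 'Masc': 'masculine'}
```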

License

This project is licensed under the GNU General Public License v3. See the LICENSE file for details.

Provenance

The following attestation bundles were made for sanskrit_morph_converter-0.2.0.tar.gz:

Publisher: publish.yml on SriramKrishnan8/sanskrit-morph-converter
