A universal hub-and-spoke converter for Sanskrit morphological representations.

Project description

Sanskrit Morph Converter

PyPI version License: GPL v3 Python 3.8+

A Python engine for unifying, standardizing, and converting Sanskrit morphological tags across multiple computational paradigms.

Tools in Sanskrit computational linguistics, such as the Sanskrit Heritage engine, Samsaadhanii, neural models like ByT5, and baseline grammars like Svarupa, output morphological analyses in very different formats and vocabularies. sanskrit-morph-converter uses a centralized, pivot-based architecture to translate these tagsets into a unified Canonical Representation.

Installation

Install the package directly from PyPI:

pip install sanskrit-morph-converter

Python API Usage

You can import the converter directly into your Python scripts to process strings or JSON outputs from various platforms. The core .convert() method takes a source platform, a target platform, and the raw input.

from sanskrit_morph_converter.converter import RepresentationConverter

# Initialize the converter (automatically loads the compiled mapping TSVs)
converter = RepresentationConverter()

Example 1: Converting ByT5 Output to Canonical

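ByT5 output is an underscore- and pipe-separated string. In the example below, underscores separate the word form, the stem, and the feature list, and pipes separate the individual features. The converter parses this string into standard Canonical properties.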

byt5_raw = "devam_deva_Case=Acc|Gender=Masc|Number=Sing"

# Convert ByT5 to Canonical
canonical_tags = converter.convert('ByT5', 'Canonical', byt5_raw)
print(canonical_tags)
# Output: [{'input': 'देवम्', 'stem': 'देव', 'root': '', 'morph': 'Case=Accusative|Gender=Masculine|Number=Singular'}]
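
To make the string layout concrete, here is a minimal sketch that splits a ByT5-style analysis by hand. It is not the library's parser: split_byt5_analysis is a hypothetical helper, and it assumes the layout form_stem_features seen in the example above, with no underscores inside the form or the stem. It does not transliterate or expand tag values the way the converter does.

```python
from typing import Dict, Tuple


def split_byt5_analysis(raw: str) -> Tuple[str, str, Dict[str, str]]:
    """Split 'form_stem_Feat=Val|Feat=Val' into form, stem, and a feature dict.

    Hypothetical helper for illustration only; not part of the package.
    """
    # Split on the first two underscores only, so the feature string stays whole.
    form, stem, features = raw.split("_", 2)
    # Each pipe-separated item is a 'Name=Value' pair.
    feature_map = dict(pair.split("=", 1) for pair in features.split("|") if pair)
    return form, stem, feature_map


form, stem, feats = split_byt5_analysis("devam_deva_Case=Acc|Gender=Masc|Number=Sing")
print(form, stem, feats)
# devam deva {'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'}
```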

Example 2: Converting Sanskrit Heritage (SH) to DCS

The Sanskrit Heritage engine returns nested JSON dictionaries. You can pass the JSON string directly to convert it to another format, such as DCS.

sh_raw = """{
    "input": "गच्छति", 
    "status": "Success", 
    "morph": [{"word": "गच्छति", "root": "गम्", "inflectional_morphs": ["pr. [1] ac. sg. 3"]}]
}"""

# Convert SH to DCS
dcs_tags = converter.convert('SH', 'DCS', sh_raw, output_format='string')
print(dcs_tags)
# Output (Example): ['gacchati\tgam\tMood=Ind|Number=Sing|Person=3|Tense=Pres']
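
If you want to inspect the SH JSON before converting it, the standard json module is enough. The sketch below assumes only the fields visible in the example above (status, morph, word, root, inflectional_morphs); iter_sh_analyses is a hypothetical helper, not part of the package, and real SH responses may carry additional keys.

```python
import json

sh_raw = """{
    "input": "गच्छति",
    "status": "Success",
    "morph": [{"word": "गच्छति", "root": "गम्", "inflectional_morphs": ["pr. [1] ac. sg. 3"]}]
}"""


def iter_sh_analyses(raw_json):
    """Yield (word, root, tag) for every inflectional tag in an SH-style response.

    Hypothetical helper; the field names are taken from the example above.
    """
    data = json.loads(raw_json)
    if data.get("status") != "Success":
        return  # nothing to yield for failed analyses
    for entry in data.get("morph", []):
        for tag in entry.get("inflectional_morphs", []):
            yield entry.get("word", ""), entry.get("root", ""), tag


analyses = list(iter_sh_analyses(sh_raw))
print(analyses)
# [('गच्छति', 'गम्', 'pr. [1] ac. sg. 3')]
```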

Command Line Interface (CLI)

The package includes a built-in CLI for batch-processing files or quickly testing single strings from your terminal.

Convert a single string:

smc convert ByT5 Canonical -i "devam_deva_Case=Acc|Gender=Masc|Number=Sing"

Process an entire file and save the output:

smc convert SH Canonical -f data/sh_analysis.tsv -o data/canonical_results.tsv

Change the output script (e.g., to WX or IAST):

smc convert ByT5 SH -i "devam_deva_Case=Acc|Gender=Masc|Number=Sing" --script WX

Architecture

This library operates on a flexible, three-stage pipeline: Adapters (to read the source format), a Mapper (to route to a mathematical Pivot), and a Converter (to format the output for the target platform).
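
The three stages can be pictured with a toy model. Everything in this sketch is hypothetical: the two small vocabularies, the made-up "Short" target tagset, and all function names are for illustration only and do not reflect the package's internal classes or mapping files. It shows why a pivot helps: each platform needs one reader and one writer, rather than a dedicated converter for every pair of platforms.

```python
from typing import Callable, Dict

# Hypothetical toy vocabularies; the real mappings live in the bundled TSV files.
TO_PIVOT = {
    "ByT5": {"Acc": "Accusative", "Masc": "Masculine", "Sing": "Singular"},
}
FROM_PIVOT = {
    "Short": {"Accusative": "acc", "Masculine": "m", "Singular": "sg"},
}


def read_byt5_features(raw: str) -> Dict[str, str]:
    """Adapter stage: parse 'Feat=Val|Feat=Val' into a feature dictionary."""
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def to_pivot(source: str, features: Dict[str, str]) -> Dict[str, str]:
    """Mapper stage: rewrite source values into the shared pivot vocabulary."""
    table = TO_PIVOT[source]
    return {name: table.get(value, value) for name, value in features.items()}


def write_short(pivot: Dict[str, str]) -> str:
    """Output stage: render pivot features in the made-up 'Short' tagset."""
    table = FROM_PIVOT["Short"]
    return ".".join(table.get(value, value) for value in pivot.values())


ADAPTERS: Dict[str, Callable[[str], Dict[str, str]]] = {"ByT5": read_byt5_features}
FORMATTERS: Dict[str, Callable[[Dict[str, str]], str]] = {"Short": write_short}


def convert(source: str, target: str, raw: str) -> str:
    """Run the three stages in order: read, map to pivot, format."""
    pivot = to_pivot(source, ADAPTERS[source](raw))
    return FORMATTERS[target](pivot)


result = convert("ByT5", "Short", "Case=Acc|Gender=Masc|Number=Sing")
print(result)  # acc.m.sg
```

Adding a new platform in this toy model means registering one entry in ADAPTERS and one in FORMATTERS; existing entries stay untouched.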

The Google Sheets Integration

To ensure this tool remains accessible to linguists and researchers who may not write code, the mapping vocabulary is not hardcoded. Instead, tag standardizations and lexical exceptions (like pronouns and causatives) are maintained collaboratively in a Master Google Sheet.

When linguistic rules are updated in the sheet, you can use the built-in compiler to fetch the latest data and rebuild the internal .tsv files (pivot_mapping.tsv, normalization.tsv, etc.) without altering the Python engine.

To fetch the latest mappings from the Google Sheet:

sanskrit-morph update

(Note: The pre-compiled .tsv files are already bundled with the PyPI package, so standard users do not need to run the compiler to use the tool.)
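
Because the mappings are plain tab-separated files, they can be read with Python's csv module. The sketch below is an illustration under stated assumptions: the column names (platform, source_tag, pivot_tag) and the sample rows are hypothetical, and the bundled pivot_mapping.tsv may be laid out differently. Check the actual header row before relying on it.

```python
import csv
import io

# Hypothetical sample data; the real pivot_mapping.tsv may use other columns.
sample_tsv = (
    "platform\tsource_tag\tpivot_tag\n"
    "ByT5\tAcc\tAccusative\n"
    "ByT5\tMasc\tMasculine\n"
)


def load_mapping(handle):
    """Build {(platform, source_tag): pivot_tag} from a tab-separated file object."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["platform"], row["source_tag"]): row["pivot_tag"] for row in reader}


mapping = load_mapping(io.StringIO(sample_tsv))
print(mapping[("ByT5", "Acc")])  # Accusative
```

To read a file on disk instead of the in-memory sample, pass an open file handle, for example open(path, newline="", encoding="utf-8").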

📜 License

This project is licensed under the GNU General Public License v3. See the LICENSE file for details.

Download files

Source Distribution

sanskrit_morph_converter-0.1.1.tar.gz (52.8 kB)

Uploaded Source

Built Distribution

sanskrit_morph_converter-0.1.1-py3-none-any.whl (52.0 kB)

Uploaded Python 3

File details

Details for the file sanskrit_morph_converter-0.1.1.tar.gz.

File metadata

  • Download URL: sanskrit_morph_converter-0.1.1.tar.gz
  • Upload date:
  • Size: 52.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sanskrit_morph_converter-0.1.1.tar.gz
Algorithm Hash digest
SHA256 7a04856b45a5cd4b5abd3ac02a1d8b1d811780c6b80eb84ba18cd27da441e899
MD5 ed8e13e71d0e4d8ab241b2b654dd1bb7
BLAKE2b-256 2f8339972a75803c32a8f9f0835faf64224606e207390f17081d8cbc315c5edc

Provenance

The following attestation bundles were made for sanskrit_morph_converter-0.1.1.tar.gz:

Publisher: publish.yml on SriramKrishnan8/sanskrit-morph-converter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sanskrit_morph_converter-0.1.1-py3-none-any.whl.

File hashes

Hashes for sanskrit_morph_converter-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 660ea53c2ede46ca72325b91f55eb6f725cc9a2c3e6243314998009b0a52538f
MD5 b5a029b2f484b0bc9308b7d693be3607
BLAKE2b-256 f338feb55e71b7e7d3322f0e6a217e6a3c35d26736172f9652470e1197fc55c4

Provenance

The following attestation bundles were made for sanskrit_morph_converter-0.1.1-py3-none-any.whl:

Publisher: publish.yml on SriramKrishnan8/sanskrit-morph-converter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
