Sanskrit grammar processing using the Dharmamitra API

Sanskrit Processor

A Python package for processing Sanskrit text using the Dharmamitra API.

Installation

pip install dharmamitra-sanskrit-grammar

Usage

from dharmamitra_sanskrit_grammar import DharmamitraSanskritProcessor

# Initialize the processor
processor = DharmamitraSanskritProcessor()

# Process a batch of sentences
sentences = [
    "tapaḥsvādhyāyanirataṃ tapasvī vāgvidāṃ varam",
    "nāradaṃ paripapraccha vālmīkirmunipuṃgavam"
]

# Using different modes
results = processor.process_batch(
    sentences,
    mode="lemma",  # or 'unsandhied' or 'unsandhied-lemma-morphosyntax'
    human_readable_tags=True
)
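For larger corpora it may be convenient to send sentences in smaller groups rather than as one large list. The following is only a sketch: `process_in_chunks` and `chunk_size` are hypothetical names that are not part of the package, and the only call it relies on is `process_batch` with the arguments shown above. No batch-size limit is documented here, so the default of 50 is an arbitrary assumption.

```python
from typing import Any, Iterator, List, Sequence


def process_in_chunks(
    processor: Any,
    sentences: Sequence[str],
    mode: str = "lemma",
    chunk_size: int = 50,  # arbitrary assumption, not a documented limit
) -> Iterator[Any]:
    """Yield one process_batch result per chunk of sentences.

    `processor` is any object exposing
    process_batch(sentences, mode=..., human_readable_tags=...),
    for example a DharmamitraSanskritProcessor instance.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(sentences), chunk_size):
        chunk: List[str] = list(sentences[start:start + chunk_size])
        # Results are yielded exactly as returned; their structure is not assumed.
        yield processor.process_batch(chunk, mode=mode, human_readable_tags=True)
```

Because the helper is a generator, each chunk is only sent when the caller asks for the next result, so a long run can be stopped or resumed between chunks.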

Available Modes

  • lemma: Basic lemmatization
  • unsandhied: Word segmentation only
  • unsandhied-lemma-morphosyntax: Full analysis with word segmentation, lemmatization, and morphosyntax
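To see how the three modes differ on the same input, one option is a small wrapper that checks the mode name before calling the API and then runs every mode in turn. This is a sketch, not part of the package: `AVAILABLE_MODES`, `process_with_mode`, and `compare_modes` are hypothetical helpers, and how the package itself reacts to an unknown mode is not documented here.

```python
from typing import Any, Dict, Sequence

# Mode names as listed in this README.
AVAILABLE_MODES = ("lemma", "unsandhied", "unsandhied-lemma-morphosyntax")


def process_with_mode(processor: Any, sentences: Sequence[str], mode: str) -> Any:
    """Call process_batch only if `mode` is one of the listed modes."""
    if mode not in AVAILABLE_MODES:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of: {', '.join(AVAILABLE_MODES)}"
        )
    return processor.process_batch(
        list(sentences), mode=mode, human_readable_tags=True
    )


def compare_modes(processor: Any, sentences: Sequence[str]) -> Dict[str, Any]:
    """Run the same sentences through every mode, keyed by mode name."""
    return {
        mode: process_with_mode(processor, sentences, mode)
        for mode in AVAILABLE_MODES
    }
```

Checking the mode locally catches a typo such as `"lemmas"` before any request is made.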

Output format

The default output format is 'dict'. If you set it to 'string', you get a plain string instead: just the lemmas in 'lemma' mode, or just the unsandhied surface forms in 'unsandhied' mode. This is handy for information-retrieval setups.
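As an illustration of the information-retrieval use case, the sketch below builds a small inverted index from lemma strings. It does not call the package. It assumes you already hold one 'string'-format result per sentence and that the lemmas in each string are separated by whitespace, which this README does not confirm. `build_inverted_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set


def build_inverted_index(lemma_strings: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each lemma to the positions of the sentences that contain it.

    Assumes each input string holds whitespace-separated lemmas.
    """
    index: Dict[str, Set[int]] = defaultdict(set)
    for sentence_id, text in enumerate(lemma_strings):
        for lemma in text.split():
            index[lemma].add(sentence_id)
    return dict(index)


def search(index: Dict[str, Set[int]], query_lemmas: Sequence[str]) -> List[int]:
    """Return the sorted ids of sentences containing every query lemma."""
    postings = [index.get(lemma, set()) for lemma in query_lemmas]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

Indexing lemmas rather than inflected or sandhied surface forms means a query can match a word regardless of the form it takes in the text.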

Project

An interactive version of this tool is available at dharmamitra.org. The underlying model has its own GitHub repository.

Citation

The preprint is available on arXiv. If you like our work and use it in your research, feel free to cite the paper:

@inproceedings{
nehrdichetal2024,
title={One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit {NLP} Tasks},
author={Nehrdich, Sebastian and Hellwig, Oliver and Keutzer, Kurt},
booktitle={Findings of the 2024 Conference on Empirical Methods in Natural Language Processing},
year={2024},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
