
Sanskrit grammar processing using the Dharmamitra API


Sanskrit Processor

A Python package for processing Sanskrit text using the Dharmamitra API.

Installation

pip install dharmamitra-sanskrit-grammar

Usage

from dharmamitra_sanskrit_grammar import DharmamitraSanskritProcessor

# Initialize the processor
processor = DharmamitraSanskritProcessor()

# Process a batch of sentences
sentences = [
    "tapaḥsvādhyāyanirataṃ tapasvī vāgvidāṃ varam",
    "nāradaṃ paripapraccha vālmīkirmunipuṃgavam"
]

# Pick one of the available modes (listed below)
results = processor.process_batch(
    sentences,
    mode="lemma",  # or "unsandhied" or "unsandhied-lemma-morphosyntax"
    human_readable_tags=True
)
print(results)
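For a large corpus you may prefer to send sentences in several smaller calls instead of one. This README does not say whether the API limits batch size, so the following is only a client-side sketch under that uncertainty. `chunked` and `process_in_chunks` are hypothetical helpers, not part of the package, and the default chunk size of 32 is arbitrary.

```python
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_chunks(processor: Any, sentences: Sequence[str], size: int = 32, **kwargs: Any) -> List[Any]:
    """Call processor.process_batch once per chunk, forwarding keyword arguments.

    Returns one entry per chunk, because the exact shape of each batch result
    is not documented here and may not be safe to concatenate.
    """
    return [processor.process_batch(chunk, **kwargs) for chunk in chunked(sentences, size)]


# Hypothetical usage with the objects from the example above:
# per_chunk = process_in_chunks(processor, sentences, size=32, mode="lemma", human_readable_tags=True)
```

Keeping one result per chunk avoids guessing how the returned structures should be merged.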

Available Modes

  • lemma: Basic lemmatization
  • unsandhied: Word segmentation only
  • unsandhied-lemma-morphosyntax: Full analysis with word segmentation, lemmatization, and morphosyntax
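Because the mode is passed as a plain string, a typo only surfaces when the request is made. A small client-side guard built from the three mode names listed above can catch that earlier. Whether the package itself rejects unknown modes is not documented here; `VALID_MODES` and `check_mode` are hypothetical names used only for illustration.

```python
from typing import Tuple

# The three mode strings documented above. This check is client-side only;
# the package's own handling of unknown modes is not described in this README.
VALID_MODES: Tuple[str, ...] = ("lemma", "unsandhied", "unsandhied-lemma-morphosyntax")


def check_mode(mode: str) -> str:
    """Return `mode` unchanged if it is a documented mode, else raise ValueError."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of: {', '.join(VALID_MODES)}")
    return mode


# Hypothetical usage with the processor from the example above:
# results = processor.process_batch(sentences, mode=check_mode("unsandhied"), human_readable_tags=True)
```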

Output format

The default output format is 'dict'. If you set it to 'string', you instead get a plain string containing only the lemmas (in 'lemma' mode) or only the unsandhied surface forms (in 'unsandhied' mode). This is handy for information-retrieval setups.
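One possible information-retrieval setup is an inverted index from lemma to sentence ids, sketched below. It assumes that the 'string' output for each sentence is whitespace-separated and that `process_batch` returns one string per input sentence. Neither detail, nor the keyword used to select the format, is confirmed by this README. `build_inverted_index` and `lookup` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(texts: List[str]) -> Dict[str, Set[int]]:
    """Map each token to the set of sentence ids (list positions) where it occurs."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(texts):
        # Assumption: tokens in the 'string' output are whitespace-separated.
        for token in text.split():
            index[token].add(doc_id)
    return dict(index)


def lookup(index: Dict[str, Set[int]], token: str) -> List[int]:
    """Return the sorted ids of sentences containing `token` (empty if none)."""
    return sorted(index.get(token, set()))


# Hypothetical wiring: the keyword name `output_format` and the
# one-string-per-sentence return shape are assumptions.
# lemma_strings = processor.process_batch(sentences, mode="lemma", output_format="string")
# index = build_inverted_index(lemma_strings)
```

Indexing lemmas rather than surface forms lets a query for one lemma match every inflected form that the lemmatizer maps to it.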

Project

An interactive version is available at dharmamitra.org. A GitHub repository for the underlying model is also available.

Citation

The preprint is available on arXiv. If you like our work and use it in your research, feel free to cite the paper:

@inproceedings{nehrdichetal2024,
  title={One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit {NLP} Tasks},
  author={Nehrdich, Sebastian and Hellwig, Oliver and Keutzer, Kurt},
  booktitle={Findings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  year={2024},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files


Source Distribution

dharmamitra_sanskrit_grammar-0.1.7.tar.gz (4.2 kB)

Built Distribution

dharmamitra_sanskrit_grammar-0.1.7-py3-none-any.whl

File details

Details for the file dharmamitra_sanskrit_grammar-0.1.7.tar.gz.

Hashes for dharmamitra_sanskrit_grammar-0.1.7.tar.gz
Algorithm Hash digest
SHA256 079f4b6ce2ab4272cae3869fbf49f3d065fe733594ebe6db7ef90558e9d91062
MD5 92552da7e4781e9f1b7cbbef7418e987
BLAKE2b-256 8dfbb4b6687733bc9a47c6a613ae500ba4c44412bd5e39c0e9228d7ff8e82dc7
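To check a downloaded source archive against the SHA256 digest listed above, a minimal sketch using Python's standard `hashlib` is shown below. The local file path is a placeholder, and `sha256_of` and `matches` are illustrative names.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the 0.1.7 source distribution.
EXPECTED_SHA256 = "079f4b6ce2ab4272cae3869fbf49f3d065fe733594ebe6db7ef90558e9d91062"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Placeholder path: point this at wherever you saved the download.
    archive = Path("dharmamitra_sanskrit_grammar-0.1.7.tar.gz")
    if archive.exists():
        print("OK" if matches(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check at install time in hash-checking mode, for example with a requirements line `dharmamitra-sanskrit-grammar==0.1.7 --hash=sha256:<digest>` and `pip install --require-hashes -r requirements.txt`. In that mode every requirement, including dependencies, must be pinned with a hash.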

File details

Details for the file dharmamitra_sanskrit_grammar-0.1.7-py3-none-any.whl.

Hashes for dharmamitra_sanskrit_grammar-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 0a88acb9cc5ab6c01084eaf9b9a5fe8d383336a8d5e0d0d1530f43e6afc51a77
MD5 acede6c22350ad517ea9c07636884911
BLAKE2b-256 408759653c15fcc79dc5beea9bf5baf8e4831cc5679dd05cf6aeee2bd9414e8f
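The table above lists three algorithms for the wheel. The sketch below computes all of them in a single pass and reports which ones differ; it relies only on `hashlib`, where BLAKE2b-256 corresponds to BLAKE2b with a 32-byte digest. `all_digests` and `mismatches` are illustrative names. MD5 is included only because it is listed; it is not collision-resistant, so SHA256 is the one to rely on.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Digests listed above for the 0.1.7 wheel.
EXPECTED_WHEEL = {
    "sha256": "0a88acb9cc5ab6c01084eaf9b9a5fe8d383336a8d5e0d0d1530f43e6afc51a77",
    "md5": "acede6c22350ad517ea9c07636884911",
    "blake2b_256": "408759653c15fcc79dc5beea9bf5baf8e4831cc5679dd05cf6aeee2bd9414e8f",
}


def all_digests(path: Path, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 in a single pass over the file."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest size.
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def mismatches(path: Path, expected: Dict[str, str] = EXPECTED_WHEEL) -> Dict[str, str]:
    """Return {algorithm: computed digest} for every mismatch (empty dict means all match)."""
    actual = all_digests(path)
    return {name: actual[name] for name, want in expected.items() if actual[name] != want.lower()}
```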
