Sanskrit grammar processing using the Dharmamitra API
Sanskrit Processor
A Python package for processing Sanskrit text using the Dharmamitra API.
Installation
```bash
pip install dharmamitra-sanskrit-grammar
```
Usage
```python
from dharmamitra_sanskrit_grammar import DharmamitraSanskritProcessor

# Initialize the processor
processor = DharmamitraSanskritProcessor()

# Process a batch of sentences
sentences = [
    "tapaḥsvādhyāyanirataṃ tapasvī vāgvidāṃ varam",
    "nāradaṃ paripapraccha vālmīkirmunipuṃgavam",
]

# Using different modes
results = processor.process_batch(
    sentences,
    mode="lemma",  # or 'unsandhied' or 'unsandhied-lemma-morphosyntax'
    human_readable_tags=True,
)
```
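`process_batch` takes a list of sentences. For a larger corpus, one option is to send the sentences in fixed-size chunks and concatenate the results. The sketch below is a hypothetical helper, not part of the package: `chunked`, `process_corpus`, and the default batch size of 32 are illustrative choices. It also assumes, since the README does not state the return type, that `process_batch` returns a list with exactly one result per input sentence, in input order.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_corpus(
    process_batch: Callable[..., List[Any]],
    sentences: Sequence[str],
    batch_size: int = 32,
    **kwargs: Any,
) -> List[Any]:
    """Call `process_batch` on successive chunks and concatenate the results.

    Hypothetical helper (not part of the package). Assumes `process_batch`
    returns a list with one entry per input sentence, in input order.
    """
    results: List[Any] = []
    for batch in chunked(sentences, batch_size):
        batch_results = process_batch(batch, **kwargs)
        # Guard the one-result-per-sentence assumption explicitly.
        if len(batch_results) != len(batch):
            raise ValueError("expected one result per input sentence")
        results.extend(batch_results)
    return results


# Example (requires the package and network access to the Dharmamitra API):
# processor = DharmamitraSanskritProcessor()
# results = process_corpus(processor.process_batch, sentences, batch_size=16,
#                          mode="lemma", human_readable_tags=True)
```

Passing the bound method `processor.process_batch` rather than the processor object keeps the helper independent of the package, so it can be tested with any stand-in function.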
Available Modes
- lemma: Basic lemmatization
- unsandhied: Word segmentation only
- unsandhied-lemma-morphosyntax: Full analysis with word segmentation, lemmatization, and morphosyntax
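Because the mode is passed as a plain string, a typo only surfaces when the request is made. A small client-side check against the three documented mode names can catch this earlier. This is a hypothetical sketch: `AVAILABLE_MODES` and `check_mode` are illustrative names, and the package itself may validate modes differently or support additional ones in later versions.

```python
# The three mode names documented in the "Available Modes" list.
AVAILABLE_MODES = {
    "lemma": "Basic lemmatization",
    "unsandhied": "Word segmentation only",
    "unsandhied-lemma-morphosyntax": (
        "Full analysis with word segmentation, lemmatization, and morphosyntax"
    ),
}


def check_mode(mode: str) -> str:
    """Return `mode` unchanged if it is a documented mode, else raise ValueError.

    Hypothetical client-side check, not part of the package.
    """
    if mode not in AVAILABLE_MODES:
        options = ", ".join(sorted(AVAILABLE_MODES))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {options}")
    return mode


# Example: processor.process_batch(sentences, mode=check_mode("unsandhied"))
```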
Output format
The default output format is 'dict'. If you set it to 'string', you instead get a plain string containing only the lemmas (in 'lemma' mode) or only the unsandhied surface forms (in 'unsandhied' mode). This is handy for information-retrieval setups.
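As an illustration of the information-retrieval use case, the sketch below builds a simple inverted index over one 'string' result per sentence and answers AND queries over lemmas. It assumes, since the README does not specify the separator, that the 'string' output separates tokens with whitespace; `build_inverted_index` and `search` are hypothetical names. Indexing lemmas rather than raw text lets a query match a word regardless of its inflected form or of how sandhi has fused it with its neighbors.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(lemma_strings: List[str]) -> Dict[str, Set[int]]:
    """Map each token to the set of sentence indices that contain it.

    Assumes (not confirmed by the package docs) that the 'string' output
    separates lemmas or surface forms with whitespace.
    """
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(lemma_strings):
        for token in text.split():
            index[token].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[int]], query_lemmas: List[str]) -> Set[int]:
    """Return indices of sentences containing every query lemma (AND query)."""
    if not query_lemmas:
        return set()
    postings = [index.get(lemma, set()) for lemma in query_lemmas]
    return set.intersection(*postings)
```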
Project
An interactive version is available at dharmamitra.org. A GitHub repository for the underlying model is also available.
Citation
The preprint is available on arXiv. If you use our work in your research, please cite the paper:
```bibtex
@inproceedings{nehrdichetal2024,
  title={One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit {NLP} Tasks},
  author={Nehrdich, Sebastian and Hellwig, Oliver and Keutzer, Kurt},
  booktitle={Findings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  year={2024},
}
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file dharmamitra_sanskrit_grammar-0.1.6.tar.gz
File metadata
- Download URL: dharmamitra_sanskrit_grammar-0.1.6.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | f62a2958fdb924583bf1600f76aff5d37798210614cab0fa64dd40bdddb0b87e
MD5 | 28963e2455abc6dc6f0e669a411cfb4c
BLAKE2b-256 | 61761797cd865460b78ed89d920bbdf91b3e4fb5520ac12733ae6fa96b0e8df3
File details
Details for the file dharmamitra_sanskrit_grammar-0.1.6-py3-none-any.whl
File metadata
- Download URL: dharmamitra_sanskrit_grammar-0.1.6-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0a1c88e98d35ee1051ec6c2262b65783fe464c45ccdbf38e50123a9a593bb875
MD5 | b3acb7a34c8be5eb759f8f06d3a227e9
BLAKE2b-256 | caadee5ced9f7c332fe8ff5cf10b43ed6df53d60d32cf0fa9366f44845349447
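To check that a downloaded distribution file matches its listed SHA256 digest, you can hash it locally. A minimal sketch using Python's standard `hashlib`; `EXPECTED_SHA256`, `sha256_of`, and `verify` are illustrative names, not part of the package, and the digests are copied from the 0.1.6 file-hash tables.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for version 0.1.6.
EXPECTED_SHA256 = {
    "dharmamitra_sanskrit_grammar-0.1.6.tar.gz":
        "f62a2958fdb924583bf1600f76aff5d37798210614cab0fa64dd40bdddb0b87e",
    "dharmamitra_sanskrit_grammar-0.1.6-py3-none-any.whl":
        "0a1c88e98d35ee1051ec6c2262b65783fe464c45ccdbf38e50123a9a593bb875",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, after `pip download dharmamitra-sanskrit-grammar==0.1.6 --no-deps -d dist`, call `verify(Path("dist") / name, EXPECTED_SHA256[name])` for whichever file was downloaded. pip's hash-checking mode (a `--hash=sha256:...` entry in a requirements file, used with `--require-hashes`) performs the same check automatically at install time.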