Deterministic, offline Marathi word analysis library (shabda = word in Marathi)
marathi-shabda
What is marathi-shabda?
marathi-shabda is a production-quality Python library for analyzing Marathi words. It provides:
- Lemma (stem) extraction from inflected Marathi words
- Dictionary lookup (Marathi ↔ English) with meanings
- Morphological analysis (रूप परिचय) including POS, vibhakti, and kāl detection
Why "pratham" (प्रथम)?
Pratham means "first" in Marathi. This library provides the first step in Marathi text analysis: understanding individual words before tackling sentences or documents.
Motivation
Marathi language tooling lags behind that available for other Indian languages. Existing solutions typically:
- Require network access (API-based)
- Hallucinate meanings (LLM-based)
- Lack linguistic grounding (pure ML)
marathi-shabda is different:
- ✅ Offline-first: No network, no API keys
- ✅ Dictionary-backed: Authoritative meanings, no hallucinations
- ✅ Explainable: Shows reasoning for every decision
- ✅ Honest about limitations: Surfaces ambiguity instead of hiding it
What It Does
✅ Supported Features
- Lemma extraction: पाण्यावर → पाणी (water)
- Vibhakti detection: Identifies case markers (तृतीया, सप्तमी, संबंध, etc.)
- Dictionary lookup: Marathi → English meanings
- POS tagging: Conservative noun/verb/adjective classification
- Kāl inference: Basic tense detection for verbs
- Roman input: Accepts romanized Marathi (e.g., pani → पाणी)
- Stem alternations: Handles oblique forms (पाण्य → पाणी)
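Romanized input is ambiguous: `n` may stand for न or ण, and `i` for ि or ी. One way to reconcile this with dictionary-first validation is to generate every spelling a romanized word could have and keep only the ones the dictionary confirms. The sketch below illustrates that idea under stated assumptions: the mapping tables, `DICTIONARY`, and all function names are hypothetical, it handles only simple consonant-initial words without conjuncts, and it is not the library's actual transliteration scheme.

```python
from itertools import product
from typing import List, Set

# Hypothetical toy tables: only a few letters, consonant-initial words, no conjuncts.
CONSONANTS = {"gh": ["घ"], "g": ["ग"], "p": ["प"], "n": ["न", "ण"], "r": ["र"], "m": ["म"]}
# Vowel signs written after a consonant; "" keeps the consonant's inherent vowel.
VOWEL_SIGNS = {"a": ["", "ा"], "i": ["ि", "ी"], "u": ["ु", "ू"], "e": ["े"], "o": ["ो"]}
DICTIONARY = {"पाणी", "घर"}  # hypothetical stand-in for the bundled dictionary


def tokenize(roman: str) -> List[str]:
    """Split romanized text into table keys, trying longer keys first ("gh" before "g")."""
    keys = sorted(set(CONSONANTS) | set(VOWEL_SIGNS), key=len, reverse=True)
    units: List[str] = []
    i = 0
    while i < len(roman):
        for key in keys:
            if roman.startswith(key, i):
                units.append(key)
                i += len(key)
                break
        else:
            raise ValueError(f"unsupported character {roman[i]!r}")
    return units


def devanagari_candidates(roman: str) -> Set[str]:
    """Every Devanagari spelling the toy tables allow for a romanized word."""
    options = [CONSONANTS.get(unit) or VOWEL_SIGNS[unit] for unit in tokenize(roman.lower())]
    return {"".join(parts) for parts in product(*options)}


def romanized_lookup(roman: str) -> Set[str]:
    """Keep only the spellings the dictionary confirms."""
    return devanagari_candidates(roman) & DICTIONARY


print(sorted(romanized_lookup("pani")))  # ['पाणी']
```

For "pani" the tables produce eight spellings (पनि, पाणी, पानी, ...), and only पाणी survives the dictionary check. The candidate set grows multiplicatively with each ambiguous letter, so a practical implementation would likely prune earlier; this sketch enumerates everything for clarity.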
❌ Explicit Non-Goals
This library does NOT:
- Parse sentences or multi-word phrases
- Claim grammatical correctness in all contexts
- Infer semantics beyond dictionary meanings
- Require network access
- Use machine learning (v0.1.0)
Installation
pip install marathi-shabda
Requirements: Python 3.8+, no external dependencies
Quick Start
1. Lemma Extraction
from marathi_shabda import get_lemma
result = get_lemma("पाण्यावर")
print(result.lemma) # पाणी
print(result.confidence) # 0.9
print(result.detected_vibhakti) # VibhaktiType.SAPTAMI (सप्तमी)
print(result.explanation) # "Detected सप्तमी vibhakti"
2. Dictionary Lookup
from marathi_shabda import lookup_word
result = lookup_word("पाणी")
print(result.english_meanings) # ['water']
print(result.found) # True
# Also works with Roman input
result = lookup_word("pani")
print(result.lemma) # पाणी
3. Morphological Analysis
from marathi_shabda import analyze_word
result = analyze_word("मुलाने")
print(result.lemma) # मुल
print(result.pos) # POSTag.NOUN
print(result.vibhakti) # VibhaktiType.TRUTIYA (तृतीया)
print(result.confidence) # 0.9
print(result.explanation)
# "Detected तृतीया vibhakti; Inferred noun"
How It Works
Architecture
Input Word
↓
Normalization (Roman → Devanagari)
↓
Dictionary Check (exact match?)
↓
Vibhakti Detection (longest-first)
↓
Stem Alternations (पाण्य → पाणी)
↓
Dictionary Validation (lemma exists?)
↓
POS & Kāl Inference
↓
Result with Confidence
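The stages above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in, not the library's internals: `DICTIONARY`, `VIBHAKTI_SUFFIXES`, `STEM_ALTERNATIONS`, `stem_variants`, and `find_lemmas` are illustrative names with toy contents. Roman normalization is omitted, so input is assumed to be Devanagari already.

```python
from typing import List, Tuple

# Hypothetical toy tables; the real library uses a bundled SQLite dictionary and its own rules.
DICTIONARY = {"पाणी", "घर"}
VIBHAKTI_SUFFIXES = {"ावर": "सप्तमी", "ात": "सप्तमी", "ाने": "तृतीया"}
STEM_ALTERNATIONS = {"्य": "ी"}  # oblique ending -> direct ending, e.g. पाण्य -> पाणी


def stem_variants(stem: str) -> List[str]:
    """The stripped stem plus any forms produced by a stem alternation."""
    variants = [stem]
    for oblique, direct in STEM_ALTERNATIONS.items():
        if stem.endswith(oblique):
            variants.append(stem[: -len(oblique)] + direct)
    return variants


def find_lemmas(word: str) -> Tuple[List[str], str]:
    """Return (dictionary-validated lemmas, explanation) for a Devanagari word."""
    # Dictionary check: is the word itself a headword?
    if word in DICTIONARY:
        return [word], "Exact dictionary match"
    # Vibhakti detection: try longer suffixes before shorter ones.
    for suffix in sorted(VIBHAKTI_SUFFIXES, key=len, reverse=True):
        if not word.endswith(suffix) or len(word) == len(suffix):
            continue
        stem = word[: -len(suffix)]
        # Stem alternations, then dictionary validation of every variant.
        lemmas = [v for v in stem_variants(stem) if v in DICTIONARY]
        if lemmas:
            return lemmas, f"Detected {VIBHAKTI_SUFFIXES[suffix]} vibhakti"
    # Conservative fallback: no rule produced a lemma the dictionary confirms.
    return [], "UNKNOWN"


print(find_lemmas("पाण्यावर"))  # (['पाणी'], 'Detected सप्तमी vibhakti')
```

The rules only propose stems (पाण्यावर → पाण्य → पाणी); nothing is returned unless the dictionary confirms it, which is what keeps the output conservative.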
Key Principles
- Dictionary-first validation: Rules generate candidates; the dictionary decides truth
- Longest-match-first: Detects मध्ये before ये
- Conservative inference: Returns UNKNOWN when uncertain
- Explainable decisions: Every result includes reasoning
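Longest-match-first matters because short case markers are often the tails of longer ones: घरामध्ये ("in the house") ends in both मध्ये and ये. A minimal sketch, assuming a hypothetical two-entry suffix list and illustrative helper names, shows what each ordering strips:

```python
from typing import Iterable, Optional, Tuple

SUFFIXES = ["ये", "मध्ये"]  # hypothetical; deliberately listed shortest first


def strip_first_match(word: str, suffixes: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Strip the first suffix, in the given order, that the word ends with."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return None


def strip_longest_match(word: str, suffixes: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Try suffixes longest first, so मध्ये wins over its tail ये."""
    return strip_first_match(word, sorted(suffixes, key=len, reverse=True))


print(strip_first_match("घरामध्ये", SUFFIXES))    # ('घरामध्', 'ये')
print(strip_longest_match("घरामध्ये", SUFFIXES))  # ('घरा', 'मध्ये')
```

Shortest-first leaves the fragment घरामध्, which no dictionary will confirm. Longest-first leaves घरा, the oblique stem of घर, which a stem-alternation step can then map back to the headword.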
Confidence & Ambiguity
Confidence Scores
- 1.0: Exact dictionary match
- 0.9: Vibhakti detected, lemma validated
- 0.7: Ambiguous (multiple possible lemmas)
- 0.0: Word not in dictionary
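The four tiers can be read as a decision over the pipeline's outcome. A minimal sketch, assuming a hypothetical `score` helper that sees the input word and the list of dictionary-validated lemmas. The tiers are the README's; checking ambiguity before exact match is this sketch's own choice, suggested by the घरात example, where the word itself is one of two candidates.

```python
from typing import List


def score(word: str, lemmas: List[str]) -> float:
    """Map a lookup outcome onto the documented confidence tiers (hypothetical helper)."""
    if not lemmas:
        return 0.0  # nothing the dictionary could confirm
    if len(lemmas) > 1:
        return 0.7  # ambiguous: several dictionary-validated lemmas
    if lemmas[0] == word:
        return 1.0  # the word itself is a dictionary headword
    return 0.9  # suffix stripped and the single resulting lemma validated


print(score("पाण्यावर", ["पाणी"]))  # 0.9
```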
Handling Ambiguity
from marathi_shabda import get_lemma
result = get_lemma("घरात")
if result.ambiguous:
    print(f"Multiple interpretations: {result.candidates}")
    # Multiple interpretations: ['घर', 'घरात']  (could be noun or compound)
Philosophy: We surface ambiguity instead of making false claims.
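On the calling side, the same philosophy suggests deferring rather than guessing. The helper below is hypothetical; it relies only on the `lemma`, `confidence`, `ambiguous`, and `candidates` attributes shown in the examples above, and uses `SimpleNamespace` objects with illustrative values as stand-ins for real results so it runs without the library installed. The 0.9 default threshold is an arbitrary choice.

```python
from types import SimpleNamespace
from typing import Any, Optional


def lemma_or_defer(result: Any, min_confidence: float = 0.9) -> Optional[str]:
    """Return the lemma only when it is unambiguous and confident enough; otherwise None."""
    if result.ambiguous or result.confidence < min_confidence:
        return None  # defer: leave the decision to a human or a later stage
    return result.lemma


# Stand-ins shaped like get_lemma() results; the values are illustrative.
confident = SimpleNamespace(lemma="पाणी", confidence=0.9, ambiguous=False, candidates=["पाणी"])
unclear = SimpleNamespace(lemma="घर", confidence=0.7, ambiguous=True, candidates=["घर", "घरात"])

print(lemma_or_defer(confident))  # पाणी
print(lemma_or_defer(unclear))    # None
```

Returning `None` instead of the first candidate keeps the caller honest in the same way the library is: an uncertain answer is surfaced as uncertain.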
Offline Guarantee
marathi-shabda works completely offline:
- ✅ No network requests
- ✅ No API keys
- ✅ No telemetry
- ✅ Bundled SQLite database
- ✅ Pure Python (stdlib only)
Perfect for:
- Privacy-sensitive applications
- Offline environments
- Embedded systems
- Research reproducibility
Limitations
Current Limitations (v0.1.0)
- Single words only: No sentence parsing
- Conservative POS tagging: Limited to obvious cases
- Basic kāl detection: Only common verb patterns
- No semantic analysis: Dictionary meanings only
- Limited verb conjugation analysis: The focus is on nouns and vibhakti
Known Edge Cases
- Compound words may not split correctly
- Rare vibhaktis may not be detected
- Ambiguous forms return multiple candidates
- Roman transliteration is approximate
We document limitations honestly. If you encounter issues, please report them!
Future Roadmap
v0.2.0 (Planned)
- Extended database schema (POS, gender, number)
- Improved verb conjugation analysis
- Compound word splitting
- Performance optimizations
v0.3.0 (Planned)
- Optional SLM integration for ambiguity resolution
- Sentence-level analysis (experimental)
- Batch processing API
Long-term
- Hybrid rule-based + ML approach
- Community-contributed dictionary expansions
- Web API (optional deployment)
Command-Line Interface
# Extract lemma
marathi-shabda lemma पाण्यावर
# Dictionary lookup
marathi-shabda lookup पाणी
# Full analysis
marathi-shabda analyze मुलाने
Contributing
We welcome contributions! See CONTRIBUTING.md for:
- How to add vibhakti rules
- How to improve transliteration
- Code style guidelines
- Testing requirements
License
MIT License - see LICENSE for details
Acknowledgments
- Marathi language scholars and grammarians
- Open-source NLP community
- Contributors and testers
Citation
If you use marathi-shabda in research, please cite:
@software{marathi_shabda,
title = {marathi-shabda: Deterministic Marathi Word Analysis},
author = {Marathi Pratham Contributors},
year = {2026},
url = {https://github.com/yourusername/marathi-shabda}
}
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [your-email@example.com]
Philosophy: When unsure, defer. When confident, explain why.
Built with respect for the Marathi language and its speakers. 🙏
Download files
File details
Details for the file marathi_shabda-0.1.0.tar.gz.
File metadata
- Download URL: marathi_shabda-0.1.0.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bda78ed68885cefad79a02cf608a755e984e19e7b5556a537ec90991b734cd6e |
| MD5 | 4c21a9c8cedebbaab0ef4576c9d93ebd |
| BLAKE2b-256 | 7b9e3d0a482d4fd1ad8607306223552fa6fbb52cc0f9b110393a76d45c0b51b3 |
File details
Details for the file marathi_shabda-0.1.0-py3-none-any.whl.
File metadata
- Download URL: marathi_shabda-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2bc5a31abb9bc4b61b30b5c662958b1de43aa67600969d3b927b8b822a2e6b3 |
| MD5 | 36b5e7e3261da1d2b0b9e0171ca50ee3 |
| BLAKE2b-256 | 07688058a3258535ef36c76d381c719256650d440f78b9458e44c1652087f0bd |