Natural Language Processing for Zomi Language (Zopau)
Zomi NLP
Natural Language Processing toolkit for the Zomi language (Zopau).
Features
- 🔤 Tokenization - Smart tokenization with Zomi clitic handling
- 🏷️ POS Tagging - Part-of-speech tagging
- 🌲 Dependency Parsing - Grammatical structure analysis
- 📍 Named Entity Recognition - Entity extraction
- 🔌 Pluggable Backends - Use spaCy, Stanza, or native implementations
- 🚀 Production Ready - CI/CD, type hints, comprehensive testing
Requirements
- Python 3.9 or higher
- pip (latest version recommended)
Dependencies
Zomi NLP works with either spaCy or Stanza as its backend. If both are installed, it prefers Stanza (more accurate) and falls back to spaCy (faster) when Stanza is not available.
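The preference rule above (Stanza first, then spaCy, otherwise no backend) can be sketched with the standard library alone. This is a minimal illustration, not the library's actual selection code: `pick_backend` and `is_installed` are hypothetical names, and only the ordering comes from this README.

```python
import importlib.util
from typing import Callable, Optional, Sequence

# Preference order described in this README: Stanza first (accuracy), then spaCy (speed).
# Everything else here is an illustrative assumption, not zomi_nlp's code.
PREFERENCE: Sequence[str] = ("stanza", "spacy")


def is_installed(module_name: str) -> bool:
    """Return True if the module is importable, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(
    installed: Callable[[str], bool] = is_installed,
    preference: Sequence[str] = PREFERENCE,
) -> Optional[str]:
    """Return the first installed backend in preference order, or None."""
    for name in preference:
        if installed(name):
            return name
    # No backend found: a real pipeline would fall back to a simple tokenizer here.
    return None


print(pick_backend())
```

Passing `installed` as a parameter keeps the check swappable, which makes the fallback order easy to test without installing either backend.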
Installation Options
Minimal Installation (Basic Tokenization Only)
pip install zomi-nlp
With spaCy (Recommended for Speed)
pip install 'zomi-nlp[spacy]'
python -m spacy download en_core_web_sm
With Stanza (Recommended for Accuracy)
pip install 'zomi-nlp[stanza]'
Full Installation (Both Backends)
pip install 'zomi-nlp[full]'
Quick Start
from zomi_nlp import load
# Load the pipeline (auto-selects best available backend)
nlp = load()
# Process text
text = "An ka ne hi."
doc = nlp(text)
# Access tokens
for token in doc:
    print(f"{token.text}\t{token.pos_}\t{token.lemma_}")
# Output:
# An NOUN an
# ka PRON ka
# ne VERB ne
# hi PART hi
# . PUNCT .
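If you need the annotations as plain data (for JSON export or a DataFrame) rather than printed lines, a small helper can collect the same three attributes the Quick Start loop reads. This is a sketch: `doc_to_rows` and the `FakeToken` stand-in are hypothetical and exist only so the example runs without `zomi_nlp` installed; with the library you would pass `nlp(text)` instead.

```python
from typing import Any, Dict, Iterable, List, NamedTuple, Optional


def doc_to_rows(doc: Iterable[Any]) -> List[Dict[str, Optional[str]]]:
    """Collect text, POS tag and lemma of each token into plain dicts."""
    return [
        {"text": tok.text, "pos": tok.pos_, "lemma": tok.lemma_}
        for tok in doc
    ]


# Hypothetical stand-in exposing the same attribute names as the Quick Start tokens.
class FakeToken(NamedTuple):
    text: str
    pos_: Optional[str]
    lemma_: Optional[str]


sample = [FakeToken("An", "NOUN", "an"), FakeToken(".", "PUNCT", ".")]
rows = doc_to_rows(sample)
print(rows)
```

Keeping `None` values as-is (rather than replacing them with empty strings) makes it easy to spot tokens from the fallback tokenizer, which has no POS tags.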
Configuration
from zomi_nlp import ZomiConfig, ZomiPipeline
# Use spaCy for speed
config = ZomiConfig(tokenizer_backend="spacy", tagger_backend="spacy")
nlp = ZomiPipeline(config)
# Use Stanza for accuracy
config = ZomiConfig(tokenizer_backend="stanza", tagger_backend="stanza")
nlp = ZomiPipeline(config)
# Auto-select best available (recommended)
config = ZomiConfig(tokenizer_backend="auto")
nlp = ZomiPipeline(config)
Checking Installation
from zomi_nlp import check_installation
# Check what's installed
check_installation()
# Get status as dict
status = check_installation(verbose=False)
print(status)
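The exact keys in the dict returned by `check_installation(verbose=False)` are not documented here. If you want a similar report outside the library, for example in your own CI, a standard-library stand-in can map each backend package to its installed version. The function name `backend_status` and the dict shape are assumptions, not `check_installation`'s format.

```python
import importlib.metadata
from typing import Dict, Optional, Sequence


def backend_status(names: Sequence[str] = ("spacy", "stanza")) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    status: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            status[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            status[name] = None
    return status


print(backend_status())
```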
Troubleshooting
Check your installation
zomi-nlp --check
Diagnose issues automatically
zomi-nlp --doctor
"stanza not installed" Warning
If you see a warning that Stanza is not installed, you have two options:
- Install Stanza (better accuracy):
pip install stanza
- Use spaCy instead by changing your config:
config = ZomiConfig(tokenizer_backend="spacy", tagger_backend="spacy")
"No backend available" Error
Install at least one backend:
pip install 'zomi-nlp[full]'
Getting None Values for POS Tags
This happens when no backend is available: the library falls back to a simple tokenizer, which splits the text into tokens but does not assign POS tags. Install spaCy or Stanza for full functionality.
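To illustrate what a "simple tokenizer" fallback typically means, here is a minimal regex-based sketch. It is not the library's fallback: `simple_tokenize` and `SimpleToken` are hypothetical, the pattern is an arbitrary choice (it keeps word-internal apostrophes and hyphens together and splits off punctuation), and it performs no Zomi-specific clitic handling. As described above, `pos_` is left as `None` because no tagger is available.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A word (letters/digits, optionally joined by apostrophes or hyphens) or one punctuation mark.
_TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


@dataclass
class SimpleToken:
    text: str
    pos_: Optional[str] = None  # no tagger available, so no POS tag


def simple_tokenize(text: str) -> List[SimpleToken]:
    """Split on whitespace and detach punctuation; no tagging is attempted."""
    return [SimpleToken(m.group()) for m in _TOKEN_RE.finditer(text)]


for tok in simple_tokenize("An ka ne hi."):
    print(tok.text, tok.pos_)
```

On the Quick Start sentence this yields the same five token strings as the backend-powered output, but every POS value is `None`.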
Development
# Clone repository
git clone https://github.com/ZomiCommunity/zomi-nlp.git
cd zomi-nlp
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Run linting
ruff check zomi_nlp/
# Format code
black zomi_nlp/ tests/
Roadmap
- v0.1.0 - Core architecture + spaCy/Stanza adapters
- v0.2.0 - Zomi-native tokenizer
- v0.3.0 - Zomi POS tagger
- v0.4.0 - Zomi dependency parser
- v1.0.0 - Fully native implementation
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
License
Apache License 2.0
Citation
@software{zomi_nlp_2026,
title={Zomi NLP: Natural Language Processing for Zomi Language},
author={Zomi NLP Community},
year={2026},
url={https://github.com/ZomiCommunity/zomi-nlp}
}
Acknowledgments
- Built with ❤️ for the Zomi community
- Uses spaCy and Stanza as backends
- Inspired by the Universal Dependencies framework
Download files
File details
Details for the file zomi_nlp-0.4.0.tar.gz.
File metadata
- Download URL: zomi_nlp-0.4.0.tar.gz
- Upload date:
- Size: 52.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa6db3d8a4ab684f96a598314e7ffde60b1f04c06da45adf3fe8cb6373f29610 |
| MD5 | bde4d0c362ab8afcaffde3164f6b5a9c |
| BLAKE2b-256 | d69fc6bc7a4294acde7e2d8d63e5e88b1d620e38155b6b4e1834c5639c9a955a |
File details
Details for the file zomi_nlp-0.4.0-py3-none-any.whl.
File metadata
- Download URL: zomi_nlp-0.4.0-py3-none-any.whl
- Upload date:
- Size: 52.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e3cb6ca2f59a2d7da3ef73290b7c9c224dfb4ae2413f54f223015f6b3599d30a |
| MD5 | 284f2e6c9dd2e48ba9c388af1da14540 |
| BLAKE2b-256 | 7eb9bd3ca0e75e55c4aed5d0f08b94acecd812520249bc6aa6da47b64a5fa419 |
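If you download either release file manually, you can check it against the published SHA256 digest before installing. The sketch below uses only `hashlib` and `pathlib`; the helper names are illustrative, and it assumes the files sit in the current directory under their published names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the File hashes tables for release 0.4.0.
EXPECTED_SHA256 = {
    "zomi_nlp-0.4.0.tar.gz": "fa6db3d8a4ab684f96a598314e7ffde60b1f04c06da45adf3fe8cb6373f29610",
    "zomi_nlp-0.4.0-py3-none-any.whl": "e3cb6ca2f59a2d7da3ef73290b7c9c224dfb4ae2413f54f223015f6b3599d30a",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and stray whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Check whichever of the two release files are present in the current directory.
for name, expected in EXPECTED_SHA256.items():
    path = Path(name)
    if path.exists():
        print(name, "OK" if matches(path, expected) else "MISMATCH")
```

Alternatively, pip can enforce the digest itself: put a line such as `zomi-nlp==0.4.0 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.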