Natural Language Processing for Zomi Language (Zopau)

Zomi NLP

Natural Language Processing toolkit for the Zomi language (Zopau).

Features

  • 🔤 Tokenization - Smart tokenization with Zomi clitic handling
  • 🏷️ POS Tagging - Part-of-speech tagging
  • 🌲 Dependency Parsing - Grammatical structure analysis
  • 📍 Named Entity Recognition - Entity extraction
  • 🔌 Pluggable Backends - Use spaCy, Stanza, or native implementations
  • 🚀 Production Ready - CI/CD, type hints, comprehensive testing

Requirements

  • Python 3.9 or higher
  • pip (latest version recommended)

Dependencies

Zomi NLP works with either spaCy or Stanza as its backend. It prefers Stanza (more accurate) when it is installed and falls back to spaCy (faster) otherwise. With neither backend installed, only basic tokenization is available.
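The preference order described above can be sketched with the standard library alone. This is not the package's own selection code: `pick_backend` and `is_installed` are hypothetical names, and returning `None` to mean "basic tokenization only" is an assumption based on the minimal installation option in this README.

```python
import importlib.util
from typing import Callable, Optional


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found on the import path (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(available: Callable[[str], bool] = is_installed) -> Optional[str]:
    """Hypothetical sketch: prefer Stanza, fall back to spaCy, else None.

    None stands for "no backend found, basic tokenization only" (an assumption).
    """
    for backend in ("stanza", "spacy"):  # the order encodes the preference
        if available(backend):
            return backend
    return None


print(pick_backend())
```

Injecting `available` lets you test the rule without installing either backend; for example, `pick_backend(lambda name: name == "spacy")` returns `"spacy"`.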

Installation Options

Minimal Installation (Basic Tokenization Only)

pip install zomi-nlp

With spaCy (Recommended for Speed)

pip install 'zomi-nlp[spacy]'
python -m spacy download en_core_web_sm

With Stanza (Recommended for Accuracy)

pip install 'zomi-nlp[stanza]'

Full Installation (Both Backends)

pip install 'zomi-nlp[full]'

Quick Start

from zomi_nlp import load

# Load the pipeline (auto-selects best available backend)
nlp = load()

# Process text
text = "An ka ne hi."
doc = nlp(text)

# Access tokens
for token in doc:
    print(f"{token.text}\t{token.pos_}\t{token.lemma_}")

# Example output (exact tags depend on the installed backend):
# An      NOUN    an
# ka      PRON    ka
# ne      VERB    ne
# hi      PART    hi
# .       PUNCT   .
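If you want the results as data rather than printed lines, a small helper can collect the same three attributes the loop above uses (`text`, `pos_`, `lemma_`). `to_rows` is a hypothetical helper, not part of zomi-nlp; it writes `_` for a missing value (the CoNLL-U convention), which also covers the `None` tags you get when no backend is installed. The stand-in tokens exist only so the sketch runs without a backend; with the real pipeline you would pass `nlp(text)` instead.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def to_rows(doc: Iterable) -> List[Tuple[str, str, str]]:
    """Collect (text, POS, lemma) per token, writing "_" when a value is missing."""
    rows = []
    for token in doc:
        # None (no backend) and "" (unset) are both treated as missing.
        pos = getattr(token, "pos_", None) or "_"
        lemma = getattr(token, "lemma_", None) or "_"
        rows.append((token.text, pos, lemma))
    return rows


# Stand-in tokens so the helper can be tried without any backend installed.
fake_doc = [
    SimpleNamespace(text="An", pos_="NOUN", lemma_="an"),
    SimpleNamespace(text="ka", pos_=None, lemma_=None),
]
for row in to_rows(fake_doc):
    print("\t".join(row))
```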

Configuration

from zomi_nlp import ZomiConfig, ZomiPipeline

# Use spaCy for speed
config = ZomiConfig(tokenizer_backend="spacy", tagger_backend="spacy")
nlp = ZomiPipeline(config)

# Use Stanza for accuracy
config = ZomiConfig(tokenizer_backend="stanza", tagger_backend="stanza")
nlp = ZomiPipeline(config)

# Auto-select best available (recommended)
config = ZomiConfig(tokenizer_backend="auto")
nlp = ZomiPipeline(config)
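If your code picks a backend from a setting such as "speed" or "accuracy", it can help to validate that choice before building the pipeline. `BackendChoice` below is a hypothetical helper that only mirrors the `ZomiConfig` arguments shown above; the accepted backend names are taken from these examples and may not be the full set the library supports (the feature list also mentions native implementations).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Backend names seen in the examples above; the real accepted set may differ.
TOKENIZER_BACKENDS = {"auto", "spacy", "stanza"}
TAGGER_BACKENDS = {"spacy", "stanza"}


@dataclass(frozen=True)
class BackendChoice:
    """Hypothetical helper that mirrors the ZomiConfig arguments shown above."""

    tokenizer_backend: str = "auto"
    tagger_backend: Optional[str] = None  # None: leave the library default alone

    def __post_init__(self) -> None:
        if self.tokenizer_backend not in TOKENIZER_BACKENDS:
            raise ValueError(f"unknown tokenizer backend: {self.tokenizer_backend!r}")
        if self.tagger_backend is not None and self.tagger_backend not in TAGGER_BACKENDS:
            raise ValueError(f"unknown tagger backend: {self.tagger_backend!r}")

    @classmethod
    def for_goal(cls, goal: str) -> "BackendChoice":
        """Map "speed", "accuracy", or "auto" to the settings recommended above."""
        if goal == "speed":
            return cls("spacy", "spacy")
        if goal == "accuracy":
            return cls("stanza", "stanza")
        if goal == "auto":
            return cls("auto")
        raise ValueError(f"unknown goal: {goal!r}")

    def as_kwargs(self) -> Dict[str, str]:
        """Keyword arguments to pass on, omitting fields that were left unset."""
        kwargs = {"tokenizer_backend": self.tokenizer_backend}
        if self.tagger_backend is not None:
            kwargs["tagger_backend"] = self.tagger_backend
        return kwargs
```

With zomi-nlp installed you would then call `ZomiPipeline(ZomiConfig(**BackendChoice.for_goal("speed").as_kwargs()))`, so a typo in a setting fails early with a clear message.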

Checking Installation

from zomi_nlp import check_installation

# Check what's installed
check_installation()

# Get status as dict
status = check_installation(verbose=False)
print(status)
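The keys of the dictionary returned by `check_installation(verbose=False)` are not documented here. If you need a similar report in your own code, the standard library can tell you which backend packages are installed and at what version. `backend_status` is a hypothetical helper, not the library's function.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def backend_status(packages: Iterable[str] = ("spacy", "stanza")) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    status: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None
    return status


print(backend_status())
```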

Troubleshooting

Check your installation

zomi-nlp --check

Diagnose issues automatically

zomi-nlp --doctor

"stanza not installed" Warning

If you see warnings about Stanza, you have two options:

  1. Install Stanza (better accuracy):
     pip install stanza
  2. Use spaCy instead by changing your config:
     config = ZomiConfig(tokenizer_backend="spacy")

"No backend available" Error

Install at least one backend:

pip install 'zomi-nlp[full]'

Getting None Values for POS Tags

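This happens when no backend is available: the library falls back to a simple tokenizer, so tokens are produced but not tagged. Install spaCy or Stanza for full functionality (POS tagging, dependency parsing, and named entity recognition).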
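The README does not describe the fallback tokenizer's rules. As a rough picture of what "basic tokenization only" means, the regex sketch below splits words from punctuation and assigns no tags. It is an illustration under that assumption, not the library's tokenizer, and it does not attempt the Zomi clitic handling mentioned under Features; `simple_tokenize` is a hypothetical name.

```python
import re
from typing import List

# A run of word characters/apostrophes, or a single non-space punctuation mark.
TOKEN_RE = re.compile(r"[\w']+|[^\w\s]")


def simple_tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens; no tagging is done."""
    return TOKEN_RE.findall(text)


print(simple_tokenize("An ka ne hi."))
```

A tokenizer like this yields token strings only, which is consistent with the `None` POS values described above.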

Development

# Clone repository
git clone https://github.com/ZomiCommunity/zomi-nlp.git
cd zomi-nlp

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run linting
ruff check zomi_nlp/

# Format code
black zomi_nlp/ tests/

Roadmap

  • v0.1.0 - Core architecture + spaCy/Stanza adapters
  • v0.2.0 - Zomi-native tokenizer
  • v0.3.0 - Zomi POS tagger
  • v0.4.0 - Zomi dependency parser
  • v1.0.0 - Fully native implementation

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

Apache License 2.0

Citation

@software{zomi_nlp_2026,
  title={Zomi NLP: Natural Language Processing for Zomi Language},
  author={Zomi NLP Community},
  year={2026},
  url={https://github.com/ZomiCommunity/zomi-nlp}
}

Acknowledgments

  • Built with ❤️ for the Zomi community
  • Uses spaCy and Stanza as backends
  • Inspired by the Universal Dependencies framework

Download files

Download the file for your platform.

Source Distribution

zomi_nlp-0.4.0.tar.gz (52.3 kB)

Uploaded Source

Built Distribution

zomi_nlp-0.4.0-py3-none-any.whl (52.8 kB)

Uploaded Python 3

File details

Details for the file zomi_nlp-0.4.0.tar.gz.

File metadata

  • Download URL: zomi_nlp-0.4.0.tar.gz
  • Size: 52.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for zomi_nlp-0.4.0.tar.gz

Algorithm    Hash digest
SHA256       fa6db3d8a4ab684f96a598314e7ffde60b1f04c06da45adf3fe8cb6373f29610
MD5          bde4d0c362ab8afcaffde3164f6b5a9c
BLAKE2b-256  d69fc6bc7a4294acde7e2d8d63e5e88b1d620e38155b6b4e1834c5639c9a955a

File details

Details for the file zomi_nlp-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: zomi_nlp-0.4.0-py3-none-any.whl
  • Size: 52.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for zomi_nlp-0.4.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       e3cb6ca2f59a2d7da3ef73290b7c9c224dfb4ae2413f54f223015f6b3599d30a
MD5          284f2e6c9dd2e48ba9c388af1da14540
BLAKE2b-256  7eb9bd3ca0e75e55c4aed5d0f08b94acecd812520249bc6aa6da47b64a5fa419
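To confirm that a downloaded file matches the SHA256 digests listed above, you can hash it locally with Python's `hashlib`. The helper names `sha256_of` and `verify` are illustrative; the digests are copied from the file details.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a published value (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Published SHA256 digests from the file details above.
EXPECTED = {
    "zomi_nlp-0.4.0.tar.gz": "fa6db3d8a4ab684f96a598314e7ffde60b1f04c06da45adf3fe8cb6373f29610",
    "zomi_nlp-0.4.0-py3-none-any.whl": "e3cb6ca2f59a2d7da3ef73290b7c9c224dfb4ae2413f54f223015f6b3599d30a",
}
```

For example, `verify(Path("zomi_nlp-0.4.0.tar.gz"), EXPECTED["zomi_nlp-0.4.0.tar.gz"])` returns `True` for an intact download. pip can also enforce digests in its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).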
