Natural Language Processing for Zomi Language (Zopau)

Zomi NLP

Natural Language Processing toolkit for the Zomi language (Zopau).

Features

  • 🔤 Tokenization - Smart tokenization with Zomi clitic handling
  • 🏷️ POS Tagging - Part-of-speech tagging
  • 🌲 Dependency Parsing - Grammatical structure analysis
  • 📍 Named Entity Recognition - Entity extraction
  • 🔌 Pluggable Backends - Use spaCy, Stanza, or native implementations
  • 🚀 Production Ready - CI/CD, type hints, comprehensive testing

Requirements

  • Python 3.9 or higher
  • pip (latest version recommended)

Dependencies

Zomi NLP can use either spaCy or Stanza as its backend. If both are installed, it prefers Stanza (more accurate) and falls back to spaCy (faster) if needed.
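The preference order described above can be illustrated with a small standalone sketch. This is not the library's actual selection code: the names `PREFERRED_ORDER`, `installed_backends`, and `pick_backend` are hypothetical and exist only for this example. It relies on the standard-library `importlib.util.find_spec` to detect which packages are importable.

```python
import importlib.util

# Preference order as described above: Stanza first (accuracy), then spaCy (speed).
# These names are illustrative assumptions, not part of the zomi_nlp API.
PREFERRED_ORDER = ("stanza", "spacy")


def installed_backends(candidates=PREFERRED_ORDER):
    """Return the candidate packages that are importable in this environment."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def pick_backend(available=None):
    """Return the first preferred backend that is available, or None."""
    if available is None:
        available = installed_backends()
    for name in PREFERRED_ORDER:
        if name in available:
            return name
    return None  # no backend found: only basic tokenization would be possible


if __name__ == "__main__":
    print("Selected backend:", pick_backend())
```

Passing an explicit list to `pick_backend` makes the fallback easy to check without installing or uninstalling anything.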

Installation Options

Minimal Installation (Basic Tokenization Only)

pip install zomi-nlp

With spaCy (Recommended for Speed)

pip install 'zomi-nlp[spacy]'
python -m spacy download en_core_web_sm

With Stanza (Recommended for Accuracy)

pip install 'zomi-nlp[stanza]'

Full Installation (Both Backends)

pip install 'zomi-nlp[full]'

Quick Start

from zomi_nlp import load

# Load the pipeline (auto-selects best available backend)
nlp = load()

# Process text
text = "An ka ne hi."
doc = nlp(text)

# Access tokens
for token in doc:
    print(f"{token.text}\t{token.pos_}\t{token.lemma_}")

# Output:
# An      NOUN    an
# ka      PRON    ka
# ne      VERB    ne
# hi      PART    hi
# .       PUNCT   .

Configuration

from zomi_nlp import ZomiConfig, ZomiPipeline

# Use spaCy for speed
config = ZomiConfig(tokenizer_backend="spacy", tagger_backend="spacy")
nlp = ZomiPipeline(config)

# Use Stanza for accuracy
config = ZomiConfig(tokenizer_backend="stanza", tagger_backend="stanza")
nlp = ZomiPipeline(config)

# Auto-select best available (recommended)
config = ZomiConfig(tokenizer_backend="auto")
nlp = ZomiPipeline(config)

Checking Installation

from zomi_nlp import check_installation

# Check what's installed
check_installation()

# Get status as dict
status = check_installation(verbose=False)
print(status)

Troubleshooting

Check your installation

zomi-nlp --check

Diagnose issues automatically

zomi-nlp --doctor

"stanza not installed" Warning

If you see warnings about Stanza, you have two options:

  1. Install Stanza (better accuracy):
pip install stanza
  2. Use spaCy instead (change your config):
from zomi_nlp import ZomiConfig
config = ZomiConfig(tokenizer_backend="spacy")

"No backend available" Error

Install at least one backend:

pip install 'zomi-nlp[full]'

Getting None Values for POS Tags

This happens when no backend is available: the library falls back to a simple tokenizer, which produces tokens but no POS tags. Install spaCy or Stanza for full functionality.
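A quick way to detect this situation in your own code is to look for tokens without a tag. The sketch below assumes only that tokens expose `text` and `pos_` attributes, as in the Quick Start example. The helper `untagged_tokens` is hypothetical, and the tokens are stand-ins built with `SimpleNamespace` rather than real pipeline output.

```python
from types import SimpleNamespace


def untagged_tokens(doc):
    """Return the text of every token whose pos_ is None or empty."""
    return [tok.text for tok in doc if not getattr(tok, "pos_", None)]


# Stand-in tokens imitating what a tokenizer-only fallback might return
# (illustrative data, not real zomi_nlp output).
fallback_doc = [SimpleNamespace(text=t, pos_=None) for t in ["An", "ka", "ne", "hi", "."]]

missing = untagged_tokens(fallback_doc)
if missing:
    print(f"{len(missing)} token(s) have no POS tag; install spaCy or Stanza.")
```

With a real pipeline you would pass the `doc` returned by `nlp(text)` instead of the stand-in list.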

Development

# Clone repository
git clone https://github.com/ZomiCommunity/zomi-nlp.git
cd zomi-nlp

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run linting
ruff check zomi_nlp/

# Format code
black zomi_nlp/ tests/

Roadmap

  • v0.1.0 - Core architecture + spaCy/Stanza adapters
  • v0.2.0 - Zomi-native tokenizer
  • v0.3.0 - Zomi POS tagger
  • v0.4.0 - Zomi dependency parser
  • v1.0.0 - Fully native implementation

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

Apache License 2.0

Citation

@software{zomi_nlp_2026,
  title={Zomi NLP: Natural Language Processing for Zomi Language},
  author={Zomi NLP Community},
  year={2026},
  url={https://github.com/ZomiCommunity/zomi-nlp}
}

Acknowledgments

  • Built with ❤️ for the Zomi community
  • Uses spaCy and Stanza as backends
  • Inspired by the Universal Dependencies framework
