Natural Language Processing for Zomi Language (Zopau)

Zomi NLP

Natural Language Processing toolkit for the Zomi language (Zopau).

Features

  • 🔤 Tokenization - Smart tokenization with Zomi clitic handling
  • 🏷️ POS Tagging - Part-of-speech tagging
  • 🌲 Dependency Parsing - Grammatical structure analysis
  • 📍 Named Entity Recognition - Entity extraction
  • 🔌 Pluggable Backends - Use spaCy, Stanza, or native implementations
  • 🚀 Production Ready - CI/CD, type hints, comprehensive testing

Requirements

  • Python 3.9 or higher
  • pip (latest version recommended)

Dependencies

Zomi NLP can use either spaCy or Stanza as a backend. If both are installed, it prefers Stanza (more accurate) and falls back to spaCy (faster) if Stanza cannot be loaded. With neither installed, only basic tokenization is available.
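
The preference order described above can be illustrated with a short standalone sketch. It is not the library's actual selection logic: `pick_backend` and `PREFERRED_BACKENDS` are hypothetical names, and the sketch only checks whether a module is importable.

```python
from importlib.util import find_spec
from typing import Optional, Sequence

# Preference order taken from the text above: Stanza first, then spaCy.
# These names are illustrative and are not part of zomi_nlp.
PREFERRED_BACKENDS = ("stanza", "spacy")


def pick_backend(candidates: Sequence[str] = PREFERRED_BACKENDS) -> Optional[str]:
    """Return the first importable module name in `candidates`, or None."""
    for name in candidates:
        # find_spec returns None when a top-level module cannot be found.
        if find_spec(name) is not None:
            return name
    return None
```

Calling `pick_backend()` returns `"stanza"` when Stanza is importable, `"spacy"` when only spaCy is, and `None` when neither is installed.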

Installation Options

Minimal Installation (Basic Tokenization Only)

pip install zomi-nlp

With spaCy (Recommended for Speed)

pip install 'zomi-nlp[spacy]'
python -m spacy download en_core_web_sm

With Stanza (Recommended for Accuracy)

pip install 'zomi-nlp[stanza]'

Full Installation (Both Backends)

pip install 'zomi-nlp[full]'

Quick Start

from zomi_nlp import load

# Load the pipeline (auto-selects best available backend)
nlp = load()

# Process text
text = "An ka ne hi."
doc = nlp(text)

# Access tokens
for token in doc:
    print(f"{token.text}\t{token.pos_}\t{token.lemma_}")

# Output:
# An      NOUN    an
# ka      PRON    ka
# ne      VERB    ne
# hi      PART    hi
# .       PUNCT   .
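
For readers who want to see how the example sentence splits into the five tokens shown above without installing anything, here is a minimal sketch of word-and-punctuation tokenization. It is an assumption-laden stand-in, not the library's tokenizer: `simple_tokenize` is a hypothetical helper and it does no Zomi clitic handling or tagging.

```python
import re
from typing import List

# A token is either a run of word characters or a single
# non-space, non-word character (for example, punctuation).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens (illustrative only)."""
    return _TOKEN_RE.findall(text)


print(simple_tokenize("An ka ne hi."))  # ['An', 'ka', 'ne', 'hi', '.']
```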

Configuration

from zomi_nlp import ZomiConfig, ZomiPipeline

# Use spaCy for speed
config = ZomiConfig(tokenizer_backend="spacy", tagger_backend="spacy")
nlp = ZomiPipeline(config)

# Use Stanza for accuracy
config = ZomiConfig(tokenizer_backend="stanza", tagger_backend="stanza")
nlp = ZomiPipeline(config)

# Auto-select best available (recommended)
config = ZomiConfig(tokenizer_backend="auto")
nlp = ZomiPipeline(config)

Checking Installation

from zomi_nlp import check_installation

# Check what's installed
check_installation()

# Get status as dict
status = check_installation(verbose=False)
print(status)
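
If it helps to cross-check the report independently of `zomi_nlp`, the installed versions of the optional backends can be queried with the standard library. This is a sketch under assumptions: `backend_versions` is a hypothetical helper, and its output format is not necessarily the same as the dict returned by `check_installation`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def backend_versions(
    packages: Iterable[str] = ("spacy", "stanza"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


print(backend_versions())
```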

Troubleshooting

"stanza not installed" Warning

If you see warnings about Stanza, you have two options:

  1. Install Stanza (better accuracy):
pip install stanza
  2. Use spaCy instead (change your config):
config = ZomiConfig(tokenizer_backend="spacy")

"No backend available" Error

Install at least one backend:

pip install 'zomi-nlp[full]'

Getting None Values for POS Tags

This happens when no backend is available: the library falls back to a simple tokenizer, which does not assign POS tags. Install spaCy or Stanza for full functionality.
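
One way to catch this condition early is to check the tokens before using their tags. The sketch below assumes only that tokens expose a `pos_` attribute, as in the Quick Start. `has_missing_pos` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real tokens.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def has_missing_pos(tokens: Iterable[Any]) -> bool:
    """Return True if any token has no POS tag (pos_ is None or absent)."""
    return any(getattr(tok, "pos_", None) is None for tok in tokens)


# Stand-in tokens for illustration; in practice pass the doc from nlp(text).
tagged = [SimpleNamespace(text="ka", pos_="PRON"), SimpleNamespace(text="ne", pos_="VERB")]
untagged = [SimpleNamespace(text="ka", pos_=None), SimpleNamespace(text="ne", pos_=None)]

print(has_missing_pos(tagged))    # False
print(has_missing_pos(untagged))  # True
```

If the check returns True, installing one of the backends as described above is the likely fix.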

Development

# Clone repository
git clone https://github.com/ZomiCommunity/zomi-nlp.git
cd zomi-nlp

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run linting
ruff check zomi_nlp/

# Format code
black zomi_nlp/ tests/

Roadmap

  • v0.1.0 - Core architecture + spaCy/Stanza adapters
  • v0.2.0 - Zomi-native tokenizer
  • v0.3.0 - Zomi POS tagger
  • v0.4.0 - Zomi dependency parser
  • v1.0.0 - Fully native implementation

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

Apache License 2.0

Citation

@software{zomi_nlp_2026,
  title={Zomi NLP: Natural Language Processing for Zomi Language},
  author={Zomi NLP Community},
  year={2026},
  url={https://github.com/ZomiCommunity/zomi-nlp}
}

Acknowledgments

  • Built with ❤️ for the Zomi community
  • Uses spaCy and Stanza as backends
  • Inspired by the Universal Dependencies framework

Makefile

.PHONY: install install-dev test lint format clean build publish release help

help:
	@echo "Available commands:"
	@echo "  install     - Install package"
	@echo "  install-dev - Install with dev dependencies"
	@echo "  test        - Run tests"
	@echo "  lint        - Run linters"
	@echo "  format      - Format code"
	@echo "  clean       - Clean build artifacts"
	@echo "  build       - Build distribution"
	@echo "  publish     - Publish to PyPI"
	@echo "  release     - Lint, test, build, and publish"

install:
	pip install .

install-dev:
	pip install -e ".[dev]"

test:
	pytest tests/ -v --cov=zomi_nlp

lint:
	ruff check zomi_nlp/
	mypy zomi_nlp/ --ignore-missing-imports

format:
	black zomi_nlp/ tests/
	ruff check --fix zomi_nlp/

clean:
	rm -rf build/
	rm -rf dist/
	rm -rf *.egg-info
	rm -rf .pytest_cache/
	rm -rf .mypy_cache/
	rm -rf .ruff_cache/
	find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete

build: clean
	python -m build
	twine check dist/*

publish: build
	twine upload dist/*

release: lint test build publish
	@echo "Release completed!"
