Natural Language Processing for Zomi Language (Zopau)

Zomi NLP

Natural Language Processing toolkit for the Zomi language (Zopau).

Features

🔤 Tokenization - Smart tokenization with clitic splitting, reduplication handling, and compound word support
🏷️ POS Tagging - Rule-based part-of-speech tagging with 600+ lexicon entries
📖 Lemmatization - Morphological lemmatization with clitic removal and affix stripping
🌲 Dependency Parsing - Grammatical structure analysis with Zomi-specific rules
📍 Named Entity Recognition - Entity extraction for PERSON, LOCATION, GPE, DATE, NUMERIC
🔬 Morphological Analysis - Morpheme segmentation and feature extraction
🔌 Pluggable Backends - Use native Zomi, spaCy, or Stanza backends
📊 CoNLL-U Export - Standard 10-column and extended 16-column formats
🚀 Production Ready - CI/CD, type hints, comprehensive testing

Coming Soon (v0.5.0+)

🔤 Word Sense Disambiguation - Context-aware meaning disambiguation
📚 Sense Lexicon - Word sense inventory with examples
📈 Statistical Disambiguation - Frequency-based sense prediction
🏷️ Sense Tagger - Automatic sense annotation
🔧 Nominalizer Detector - Rule-based -na suffix detection with stem alternation handling

Requirements

Python 3.9 or higher
pip (latest version recommended)

Dependencies

Zomi NLP includes a native pipeline and can also use spaCy or Stanza as backends. When both are installed, automatic backend selection prefers Stanza (more accurate) and falls back to spaCy (faster).

Installation Options

Minimal Installation (Native Only)

pip install zomi-nlp

With spaCy (Recommended for Speed)

pip install 'zomi-nlp[spacy]'
python -m spacy download en_core_web_sm

With Stanza (Recommended for Accuracy)

pip install 'zomi-nlp[stanza]'

Full Installation (Both Backends)

pip install 'zomi-nlp[full]'

Quick Start

from zomi_nlp import load

# Load the pipeline (auto-selects best available backend)
nlp = load()

# Process text
text = "Tuni an ka ne hi."
doc = nlp(text)

# Access tokens
for token in doc:
    print(f"{token.text}\t{token.pos_}\t{token.lemma_}\t{token.ent_type_ or 'N/A'}")

# Output:
# Tuni    DATE    tuni    DATE
# an      NOUN    an      N/A  
# ka      PRON    ka      N/A
# ne      VERB    ne      N/A
# hi      PART    hi      N/A
# .       PUNCT   .       N/A

Native Pipeline Components

Zomi NLP v0.4.0 introduces a complete native pipeline with no external dependencies:

Component	Description
ZomiTokenizer	Clitic splitting, reduplication, compound words, punctuation
ZomiPOSTagger	Rule-based POS tagging with 600+ lexicon entries
ZomiLemmatizer	Morphological lemmatization with irregular form handling
ZomiDependencyParser	Zomi-specific dependency relations (nsubj, obj, case, etc.)
ZomiNER	Named entity recognition for 6+ entity types
ZomiMorphologicalAnalyzer	Morpheme segmentation and feature extraction
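ZomiPOSTagger is described only as rule-based with a lexicon of 600+ entries. A minimal sketch of that general approach (lexicon lookup first, then fallback rules for punctuation, digit strings, and unknown words) might look like the following. The five-entry `LEXICON` simply reuses tags from the example outputs in this README; the `tag` function, the fallback tags, and the rule order are assumptions, not the library's implementation.

```python
import string

# Toy lexicon reusing tags from the example outputs in this README.
# The real ZomiPOSTagger lexicon (600+ entries) is not reproduced here.
LEXICON = {
    "an": "NOUN",
    "ka": "PRON",
    "ne": "VERB",
    "pai": "VERB",
    "hi": "PART",
}


def tag(tokens, lexicon=LEXICON):
    """Assign a Universal POS tag to each token (hypothetical helper)."""
    tags = []
    for tok in tokens:
        key = tok.lower()
        if key in lexicon:
            tags.append(lexicon[key])  # 1. lexicon lookup wins
        elif tok and all(ch in string.punctuation for ch in tok):
            tags.append("PUNCT")  # 2. punctuation-only tokens
        elif tok.isdigit():
            tags.append("NUM")  # 3. digit strings
        else:
            tags.append("X")  # 4. unknown word ("other" in UPOS)
    return tags


print(tag(["Ka", "pai", "hi", "."]))  # ['PRON', 'VERB', 'PART', 'PUNCT']
```

A real rule-based tagger would add context rules on top of the lookup; this sketch only shows the lookup-then-fallback skeleton.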

CoNLL-U Export

from zomi_nlp import load

nlp = load()
doc = nlp("Ka pai ve.")

# Print selected columns (FORM, LEMMA, UPOS, HEAD, DEPREL) for each token
for token in doc:
    print(f"{token.text}\t{token.lemma_}\t{token.pos_}\t{token.head}\t{token.dep_}")

# Full CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
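The loop above prints only five fields per token. For reference, a full CoNLL-U line has ten tab-separated columns, uses `_` for unspecified values, numbers tokens from 1, gives the root a HEAD of `0`, and ends each sentence with a blank line. The sketch below formats plain dictionaries into that layout. The `to_conllu` helper and its dictionary keys are hypothetical and not part of the zomi_nlp API; the sample tokens reuse the lemma and UPOS values from this README's example outputs and leave HEAD and DEPREL unspecified.

```python
# Order of the ten CoNLL-U columns.
COLUMNS = ("id", "form", "lemma", "upos", "xpos", "feats",
           "head", "deprel", "deps", "misc")


def to_conllu(sentence):
    """Format one sentence (a list of token dicts) as a CoNLL-U block.

    Hypothetical helper: keys that are missing, None, or empty become "_",
    and IDs are assigned 1-based in token order.
    """
    lines = []
    for i, tok in enumerate(sentence, start=1):
        row = {**tok, "id": i}
        fields = []
        for col in COLUMNS:
            value = row.get(col)
            fields.append("_" if value is None or value == "" else str(value))
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n\n"  # blank line terminates the sentence


tokens = [
    {"form": "ka", "lemma": "ka", "upos": "PRON"},
    {"form": "pai", "lemma": "pai", "upos": "VERB"},
    {"form": "hi", "lemma": "hi", "upos": "PART"},
    {"form": ".", "lemma": ".", "upos": "PUNCT"},
]
print(to_conllu(tokens), end="")
```

Filling HEAD and DEPREL from a parser is then a matter of adding `"head"` and `"deprel"` keys to each dictionary.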

Configuration

from zomi_nlp import ZomiConfig, ZomiPipeline

# Use native Zomi pipeline (default, no dependencies)
config = ZomiConfig(parser_backend="native")
nlp = ZomiPipeline(config)

# Use spaCy for speed
config = ZomiConfig(parser_backend="spacy")
nlp = ZomiPipeline(config)

# Use Stanza for accuracy
config = ZomiConfig(parser_backend="stanza")
nlp = ZomiPipeline(config)

# Auto-select best available
config = ZomiConfig(parser_backend="auto")
nlp = ZomiPipeline(config)
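The `auto` setting and the Dependencies note describe a preference order: Stanza when installed, otherwise spaCy. A minimal sketch of that selection logic, assuming it depends only on whether each package is importable, is shown below. `select_backend` is a hypothetical name, falling back to `"native"` when neither package is present is an assumption, and the library's actual selection code may differ.

```python
from importlib.util import find_spec

# Preference order described for parser_backend="auto";
# the final "native" fallback is an assumption.
PREFERENCE = ("stanza", "spacy")


def is_installed(package):
    """True if `package` can be imported in the current environment."""
    return find_spec(package) is not None


def select_backend(available=is_installed):
    """Return the first installed backend name, or "native" (hypothetical helper)."""
    for name in PREFERENCE:
        if available(name):
            return name
    return "native"


print(select_backend())  # result depends on the local environment
```

Passing `available` as a parameter keeps the logic testable without installing or uninstalling packages.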

CLI Usage

# Check installation status
zomi-nlp --check

# Diagnose issues
zomi-nlp --doctor

# Process text directly
zomi-nlp "Tuni ka pai hi."

# Output:
# Tuni     DATE     tuni
# ka       PRON     ka
# pai      VERB     pai
# hi       PART     hi
# .        PUNCT    .

Checking Installation

from zomi_nlp import check_installation

# Check what's installed
check_installation()

# Get status as dict
status = check_installation(verbose=False)
print(status)
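`check_installation(verbose=False)` is documented as returning a status dictionary, but its keys are not listed here. If you need a comparable report without importing zomi_nlp (for example in a CI step), the sketch below maps each optional backend to its installed version, or `None` when it is absent, using the standard library's `importlib.metadata`. The `backend_versions` name and the output shape are assumptions, not the library's format.

```python
from importlib.metadata import PackageNotFoundError, version


def backend_versions(packages=("spacy", "stanza")):
    """Map each distribution name to its installed version, or None if absent."""
    status = {}
    for name in packages:
        try:
            status[name] = version(name)
        except PackageNotFoundError:
            status[name] = None
    return status


print(backend_versions())
```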

Troubleshooting

"stanza not installed" Warning

If you see warnings about Stanza, you have two options:

Install Stanza (better accuracy):

pip install stanza

Use spaCy instead (change your config):

config = ZomiConfig(parser_backend="spacy")

"No backend available" Error

Install at least one backend:

pip install 'zomi-nlp[full]'

Getting `None` Values for POS Tags

This happens when no backend is available: the library then falls back to a simple tokenizer that does not assign POS tags. Install spaCy or Stanza for full functionality.

Development

# Clone repository
git clone https://github.com/ZomiCommunity/zomi-nlp.git
cd zomi-nlp

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run linting
ruff check zomi_nlp/

# Format code
black zomi_nlp/ tests/

Roadmap

Version	Features	Status
v0.1.0	Core architecture + spaCy/Stanza adapters	✅ Released
v0.2.0	spaCy/Stanza backends	✅ Released
v0.3.0	ZomiRuleBasedParser	✅ Released
v0.4.0	Complete native pipeline	✅ Current
v0.5.0	Word embeddings, sense disambiguation	🔜 Planned
v0.6.0	ML-based components	🔜 Planned
v1.0.0	Production ready	🔜 Planned

Planned Features for v0.5.0

ZomiWordSenseDisambiguator - Context-aware meaning disambiguation
ZOMI_SENSE_LEXICON - Word sense inventory with examples
StatisticalDisambiguator - Frequency-based sense prediction
ZomiSenseTagger - Automatic sense annotation
ZomiNominalizerDetector - Rule-based -na suffix detection with stem alternation handling (e.g., pia → piakna, um → upna)
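ZomiNominalizerDetector is not yet released. As a rough illustration of the idea, the sketch below strips a final -na and maps the bound form back to its citation stem through an alternation table seeded with the two pairs above (pia → piakna, um → upna). The table, the `detect_nominalizer` name, and the optional `known_stems` filter (which avoids flagging unrelated words that merely end in -na) are assumptions, not the planned component's design.

```python
# Hypothetical alternation table seeded from the two examples above:
# the bound form before -na maps back to the citation stem.
STEM_ALTERNATIONS = {"piak": "pia", "up": "um"}

SUFFIX = "na"


def detect_nominalizer(word, known_stems=None):
    """Return (stem, "-na") if `word` looks like stem + -na, else None.

    If `known_stems` is given, only stems in that set are accepted, which
    filters out words that merely happen to end in "na".
    """
    w = word.lower()
    if len(w) <= len(SUFFIX) or not w.endswith(SUFFIX):
        return None
    bound = w[: -len(SUFFIX)]
    stem = STEM_ALTERNATIONS.get(bound, bound)  # undo the stem alternation
    if known_stems is not None and stem not in known_stems:
        return None
    return stem, "-" + SUFFIX


print(detect_nominalizer("piakna"))  # ('pia', '-na')
print(detect_nominalizer("upna"))    # ('um', '-na')
```

Without a stem list, any word ending in -na is treated as a candidate, so a real detector would need lexicon support to keep precision acceptable.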

Contributing

Contributions welcome! See CONTRIBUTING for guidelines.

License

Apache License 2.0

Citation

@software{zomi_nlp_2026,
  title={Zomi NLP: Natural Language Processing for Zomi Language},
  author={Zomi NLP Community},
  year={2026},
  url={https://github.com/ZomiCommunity/zomi-nlp}
}

Acknowledgments

Built with ❤️ for the Zomi community
Uses spaCy and Stanza as backends
Inspired by the Universal Dependencies framework

Release history

Version	Release date
0.4.1 (this version)	May 3, 2026
0.4.0	Apr 27, 2026
0.3.0	Apr 25, 2026
0.2.0	Apr 23, 2026

Download files

Source Distribution

zomi_nlp-0.4.1.tar.gz (54.8 kB)

Uploaded May 3, 2026 Source

Built Distribution

zomi_nlp-0.4.1-py3-none-any.whl (54.0 kB)

Uploaded May 3, 2026 Python 3

File details

Details for the file zomi_nlp-0.4.1.tar.gz.

File metadata

Download URL: zomi_nlp-0.4.1.tar.gz
Upload date: May 3, 2026
Size: 54.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for zomi_nlp-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`3a1b82deb6f987c90efa77254f9ae6e20ee12c1e57f1310831b72921631b525c`
MD5	`0580ed43ce6031ebb3e18c82430ab808`
BLAKE2b-256	`a2f1276c14a60ebc84654618b8a4eb805263b67fc949df92b99b4eef0f1bd341`

File details

Details for the file zomi_nlp-0.4.1-py3-none-any.whl.

File metadata

Download URL: zomi_nlp-0.4.1-py3-none-any.whl
Upload date: May 3, 2026
Size: 54.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for zomi_nlp-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`be6ac3af27f3d882364d185a14cddc7f550a3cb93453dbda4121ce63b74c51d2`
MD5	`a658b114df8e98fa3eef961ae7c33b0c`
BLAKE2b-256	`c2a4bfbe5367c48ca559ce0e97a2e9858833d60e5cf956d5da2f11e78ac1c5be`
