A modular, dialect-aware Zomi syllabification library with rule-based and CRF backends.

📦 zomi‑syl

zomi-syl provides a production‑ready syllabifier for Zomi, supporting multiple dialects, multiple backends, and a clean, extensible architecture. It includes:

  • A fast rule‑based syllabifier
  • A statistical CRF syllabifier
  • A unified API
  • A full CLI
  • A backend registry
  • Benchmarking tools
  • Dialect profiles
  • A clean, documented developer workflow

🚀 Features

  • Multiple backends: rule‑based, CRF, transformer‑ready
  • Dialect‑aware syllabification (csy [Siyin], ctd [Tedim], gnb [Gangte], kmm [Kom Rem], pck [Paite], vap [Vaiphei], smt [Simte], tcz [Thado/Thadou], zom [Zo/Zou], Mate, Thangkhal, Zolai Standard, Myanmar Zomi, India Zomi)
  • Unified API (zs.syllabify(), zs.analyze())
  • Full CLI (zomi-syl syllabify, zomi-syl models benchmark, zomi-syl models compare)
  • Benchmarking & evaluation tools
  • Extensible backend architecture
  • Clean developer documentation

📦 Installation

pip install zomi-syl

🧠 Quick Start

Syllabify a word

zomi-syl syllabify itna

Analyze a word

zomi-syl analyze itna --json

Batch syllabify

zomi-syl batch words.txt --output out.txt

🧰 Python API

import zomi_syl as zs

zs.syllabify("itna")
zs.analyze("itna")
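For scripted use, the batch command can be approximated in Python. The following is a minimal sketch, assuming `zs.syllabify` takes one word and returns either a string or a sequence of syllable strings (the return type is not documented here). The names `syllabify_lines` and `fake_syllabify` are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Union

SyllabifyFn = Callable[[str], Union[str, Sequence[str]]]


def syllabify_lines(lines: Iterable[str], syllabify: SyllabifyFn,
                    sep: str = "-") -> Iterator[str]:
    """Yield one formatted result per non-empty input line."""
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        result = syllabify(word)
        # Accept either a ready-made string or a sequence of syllables.
        yield result if isinstance(result, str) else sep.join(result)


def fake_syllabify(word: str) -> List[str]:
    """Stand-in only: two-character chunks, NOT real Zomi syllabification."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


demo = list(syllabify_lines(["itna\n", "\n", "abcde"], fake_syllabify))
print(demo)
```

With the package installed, `zs.syllabify` would be passed in place of `fake_syllabify`, and `lines` could be an open file handle.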

🧩 Backends

zomi-syl supports multiple backends through a unified registry:

  • rule — deterministic rule‑based syllabifier
  • crf — statistical CRF syllabifier
  • transformer — placeholder for future transformer models

List available backends:

zomi-syl models list

Show backend metadata:

zomi-syl models info crf

📊 Benchmarking

Single backend

zomi-syl models benchmark crf

Compare multiple backends

zomi-syl models compare rule crf

Compare all backends

zomi-syl models compare --all
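
The metrics reported by these commands are not described here. As a rough sketch of how syllabifier output is often scored, the example below computes word-level accuracy and boundary F1 from parallel gold and predicted segmentations. The function names and the sample data are hypothetical placeholders, not real Zomi words or library output.

```python
from typing import List, Sequence, Set, Tuple


def boundaries(syllables: Sequence[str]) -> Set[int]:
    """Character offsets where a syllable break occurs (word end excluded)."""
    cuts, pos = set(), 0
    for syl in syllables[:-1]:
        pos += len(syl)
        cuts.add(pos)
    return cuts


def evaluate(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float]:
    """Return (word accuracy, boundary F1) over parallel lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    exact = tp = fp = fn = 0
    for g, p in zip(gold, pred):
        exact += int(list(g) == list(p))   # whole word segmented correctly
        gb, pb = boundaries(g), boundaries(p)
        tp += len(gb & pb)                 # breaks found in both
        fp += len(pb - gb)                 # predicted but not in gold
        fn += len(gb - pb)                 # in gold but missed
    accuracy = exact / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Placeholder strings only.
gold = [["ab", "cd"], ["ef", "g"]]
pred = [["ab", "cd"], ["e", "fg"]]
print(evaluate(gold, pred))
```

Word accuracy is strict (one wrong break fails the whole word), while boundary F1 gives partial credit, so the two numbers can diverge on long words.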

🩺 Diagnostics

Run a full backend self‑test:

zomi-syl models doctor

This checks:

  • registry integrity
  • model metadata
  • backend loadability
  • single prediction
  • batch prediction
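
The five checks above can be sketched as a small self-test loop. This is an illustration under stated assumptions, not the code behind `zomi-syl models doctor`: the registry layout (`metadata` and `loader` keys), the required metadata fields (`name`, `version`), and the sanity rule that syllables must concatenate back to the input are all assumed for the example.

```python
from typing import Callable, Dict, List, Tuple


def toy_loader() -> Callable[[str], List[str]]:
    """Stand-in model: two-character chunks, not real syllabification."""
    return lambda w: [w[i:i + 2] for i in range(0, len(w), 2)]


def doctor(registry: Dict[str, dict], sample: str = "itna") -> List[Tuple[str, str, bool]]:
    """Run the five checks per backend; return (backend, check, passed) rows."""
    report = []
    for name, entry in registry.items():
        # 1. registry integrity: entry has the expected keys (assumed layout)
        ok_registry = isinstance(entry, dict) and {"metadata", "loader"} <= entry.keys()
        report.append((name, "registry integrity", ok_registry))
        # 2. model metadata: required fields present (field names are assumed)
        meta = entry.get("metadata", {}) if ok_registry else {}
        report.append((name, "model metadata", {"name", "version"} <= meta.keys()))
        # 3. backend loadability: the loader runs and returns a callable
        try:
            predict = entry["loader"]()
            loaded = callable(predict)
        except Exception:
            predict, loaded = None, False
        report.append((name, "backend loadability", loaded))
        # 4. single prediction and 5. batch prediction
        single = batch = False
        if loaded:
            try:
                single = "".join(predict(sample)) == sample
                batch = all("".join(predict(w)) == w for w in [sample, sample * 2])
            except Exception:
                pass
        report.append((name, "single prediction", single))
        report.append((name, "batch prediction", batch))
    return report


# Placeholder registry: one healthy entry and one deliberately broken entry.
registry = {
    "rule": {"metadata": {"name": "rule", "version": "0"}, "loader": toy_loader},
    "broken": {"metadata": {"name": "broken"}, "loader": lambda: 1 / 0},
}
for row in doctor(registry):
    print(row)
```

Each check records a result instead of raising, so one failing backend does not hide problems in the others.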

🌏 Dialect Profiles

Profiles live under:

src/zomi_syl/profiles/

Dialect support status:

  • Gangte | Not Yet
  • Kom | Not Yet
  • Mate | Not Yet
  • Paite | Yes
  • Simte | Not Yet
  • Siyin | Not Yet
  • Tedim | Yes
  • Thangkhal | Not Yet
  • Thado/Thadou | Not Yet
  • Vaiphei | Not Yet
  • Zo/Zou | Not Yet
  • India Zomi | Not Yet
  • Myanmar Zomi | Not Yet
  • Zolai Standard | Not Yet

Even though some dialects are not yet supported, zomi-syl still gives better than 90% accuracy for all of them.

List profiles:

zomi-syl profiles list

Show profile info:

zomi-syl profiles info tedim

🧪 Testing

Run all tests:

pytest

Golden CRF regression data:

tests/golden/crf_golden.tsv
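
A golden-file regression check can be sketched as follows. The column layout of `crf_golden.tsv` is not shown here, so this example assumes two tab-separated columns (word, expected syllables joined by `-`). That layout, the helper `check_golden`, the stand-in `toy` function, and the sample rows are all hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, List, Tuple


def check_golden(rows: Iterable[List[str]],
                 syllabify: Callable[[str], List[str]],
                 sep: str = "-") -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for every row that disagrees."""
    failures = []
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        word, expected = row[0], row[1]
        actual = sep.join(syllabify(word))
        if actual != expected:
            failures.append((word, expected, actual))
    return failures


def toy(word: str) -> List[str]:
    """Stand-in only: two-character chunks, NOT real Zomi syllabification."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


# Placeholder rows, not taken from tests/golden/crf_golden.tsv.
sample_tsv = "abcd\tab-cd\n\nabcde\tab-cde\n"
rows = csv.reader(io.StringIO(sample_tsv), delimiter="\t")
failures = check_golden(rows, toy)
print(failures)
```

In a pytest test, the whole check reduces to `assert not check_golden(rows, backend_fn)`, and the returned tuples make the failure message readable.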

🗂 Project Structure

src/zomi_syl/
    api.py
    cli.py
    backends/
    profiles/
    models/
    evaluation/
    rule_based/
    utils/
    ...
scripts/
docs/
tests/
training/

🛠 Development

Developer documentation lives in:

docs/Developer/

Key guides:

  • Adding new backends
  • Unified Metadata Schema (UMS)
  • CRF training
  • Backend loader
  • Test templates

📄 Changelog

The changelog is generated automatically:

make changelog

Template:

docs/Developer/CHANGELOG_template.md

📦 Release Checklist

See:

docs/RELEASE_CHECKLIST_v0.1.0.md

📜 License

MIT License — see LICENSE.


🙌 Contributing

See:

CONTRIBUTING.md

🔗 Command Reference

Full CLI command tree:

zomi-syl
│
├── syllabify
├── analyze
├── batch
├── benchmark
│
├── profiles list|info|validate
├── datasets list|download|validate
│
├── config show|path|validate|set
├── cache info|clear|remove
│
├── validate
├── download
├── version
│
└── models
    ├── list
    ├── info
    ├── benchmark
    ├── compare
    └── doctor
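
For readers extending the CLI, a nested command tree of this shape can be declared with the standard library's `argparse`. This is a sketch of the general pattern only; which CLI framework zomi-syl actually uses is not stated here, and only a small part of the tree is reproduced. The argument names (`word`, `backend`, `backends`) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level command tree: top-level commands plus `models` subcommands."""
    parser = argparse.ArgumentParser(prog="zomi-syl")
    sub = parser.add_subparsers(dest="command", required=True)

    # Top-level leaf command: zomi-syl syllabify <word>
    syl = sub.add_parser("syllabify")
    syl.add_argument("word")

    # Command group: zomi-syl models <action> ...
    models = sub.add_parser("models")
    models_sub = models.add_subparsers(dest="action", required=True)
    models_sub.add_parser("list")
    info = models_sub.add_parser("info")
    info.add_argument("backend")
    compare = models_sub.add_parser("compare")
    compare.add_argument("backends", nargs="*")
    compare.add_argument("--all", action="store_true")
    return parser


args = build_parser().parse_args(["models", "compare", "rule", "crf"])
print(args.command, args.action, args.backends)
```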
