A modular, dialect-aware Zomi syllabification library with rule-based, CRF, FST, and neural backends.

README.md (v0.1.0 — Pre‑Release)


zomi‑syl

A modular, dialect‑aware Zomi syllabification library with rule‑based and CRF backends.

zomi-syl provides a production‑ready syllabifier for Zomi, supporting multiple dialects, multiple backends, and a clean, extensible architecture. It includes:

  • A fast rule‑based syllabifier
  • A statistical CRF syllabifier
  • A unified API
  • A full CLI
  • A backend registry
  • Benchmarking tools
  • Dialect profiles
  • A clean, documented developer workflow

🚀 Features

  • Multiple backends: rule‑based, CRF, transformer‑ready
  • Dialect‑aware syllabification (csy [Siyin], ctd [Tedim], gnb [Gangte], kmm [Kom Rem], pck [Paite], vap [Vaiphei], smt [Simte], tcz [Thado/Thadou], zom [Zo/Zou], [Mate], [Thangkhal], Zolai Standard, Myanmar Zomi, India Zomi)
  • Unified API (zs.syllabify(), zs.analyze())
  • Full CLI (zomi-syl syllabify, zomi-syl models benchmark, zomi-syl models compare)
  • Benchmarking & evaluation tools
  • Extensible backend architecture
  • Clean developer documentation

📦 Installation

pip install zomi-syl

🧠 Quick Start

Syllabify a word

zomi-syl syllabify itna

Analyze a word

zomi-syl analyze itna --json

Batch syllabify

zomi-syl batch words.txt --output out.txt

🧰 Python API

import zomi_syl as zs

zs.syllabify("itna")
zs.analyze("itna")
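
The README does not state what `zs.syllabify()` returns, so the sketch below treats the result as opaque and takes the syllabifier as a parameter. The helper `syllabify_many` and the stand-in `fake_syllabify` are hypothetical names used for illustration only; they are not part of zomi-syl.

```python
from typing import Callable, Iterable, Iterator, Tuple


def syllabify_many(
    words: Iterable[str],
    syllabify: Callable[[str], object],
) -> Iterator[Tuple[str, object]]:
    """Yield (word, result) pairs, skipping blank entries.

    Hypothetical helper: `syllabify` is any callable that takes one word,
    for example `zs.syllabify` once zomi-syl is installed.
    """
    for raw in words:
        word = raw.strip()
        if word:  # ignore empty lines, e.g. from a word-list file
            yield word, syllabify(word)


def fake_syllabify(word: str) -> list:
    """Stand-in so the sketch runs without zomi-syl; it does NOT split anything."""
    return [word]


pairs = list(syllabify_many(["itna", "", "  itna  "], fake_syllabify))
print(pairs)
```

With the package installed, passing `zs.syllabify` in place of `fake_syllabify` should work the same way, assuming it accepts a single word as shown above.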

🧩 Backends

zomi-syl supports multiple backends through a unified registry:

  • rule — deterministic rule‑based syllabifier
  • crf — statistical CRF syllabifier
  • transformer — placeholder for future transformer models

List available backends:

zomi-syl models list

Show backend metadata:

zomi-syl models info crf

📊 Benchmarking

Single backend

zomi-syl models benchmark crf

Compare multiple backends

zomi-syl models compare rule crf

Compare all backends

zomi-syl models compare --all
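
The README does not say which metrics the benchmark reports. As one plausible example, the sketch below computes word-level accuracy (the share of words whose predicted syllable list matches the gold list exactly) for several backends. The function names and the toy data are assumptions; the strings are placeholders, not real Zomi syllabifications.

```python
from typing import Callable, Dict, List, Mapping, Sequence


def word_accuracy(
    gold: Sequence[Sequence[str]],
    predicted: Sequence[Sequence[str]],
) -> float:
    """Fraction of words whose predicted syllables equal the gold syllables."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if list(g) == list(p))
    return correct / len(gold)


def compare_backends(
    backends: Mapping[str, Callable[[str], List[str]]],
    words: Sequence[str],
    gold: Sequence[Sequence[str]],
) -> Dict[str, float]:
    """Score every backend on the same word list."""
    return {
        name: word_accuracy(gold, [predict(word) for word in words])
        for name, predict in backends.items()
    }


# Toy placeholder data, not real Zomi words.
words = ["abab", "cd"]
gold = [["ab", "ab"], ["cd"]]
backends = {
    "whole": lambda w: [w],                                      # never splits
    "pairs": lambda w: [w[i:i + 2] for i in range(0, len(w), 2)],  # splits every 2 chars
}

scores = compare_backends(backends, words, gold)
print(scores)
```

On this toy data the `whole` backend scores 0.5 and `pairs` scores 1.0. A stricter or more lenient metric, such as boundary-level F1, would follow the same shape.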

🩺 Diagnostics

Run a full backend self‑test:

zomi-syl models doctor

This checks:

  • registry integrity
  • model metadata
  • backend loadability
  • single prediction
  • batch prediction
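
To make the checklist concrete, here is a minimal sketch of a self-test along these lines. It is an illustration under assumptions, not the code behind `zomi-syl models doctor`: `BackendEntry`, `check_backend`, and the metadata layout are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BackendEntry:
    """Hypothetical registry entry; the real zomi-syl schema may differ."""
    predict: Callable[[str], List[str]]
    metadata: Dict[str, str] = field(default_factory=dict)


def check_backend(entry: BackendEntry, samples: List[str]) -> Dict[str, bool]:
    """Run a few smoke checks and report pass/fail for each one."""
    report = {
        "has_metadata": bool(entry.metadata),
        "loadable": callable(entry.predict),
        "single_prediction": False,
        "batch_prediction": False,
    }
    try:
        report["single_prediction"] = isinstance(entry.predict(samples[0]), list)
        batch = [entry.predict(word) for word in samples]
        report["batch_prediction"] = len(batch) == len(samples)
    except Exception:
        # Any failure leaves the remaining checks marked False.
        pass
    return report


def _not_ready(word: str) -> List[str]:
    raise NotImplementedError("placeholder backend")


healthy = BackendEntry(predict=lambda w: [w], metadata={"name": "toy"})
broken = BackendEntry(predict=_not_ready)

print(check_backend(healthy, ["itna"]))
print(check_backend(broken, ["itna"]))
```

Reporting each check separately, rather than stopping at the first failure, makes it easier to see which stage of a backend is misbehaving.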

🌏 Dialect Profiles

Profiles live under:

src/zomi_syl/profiles/

Supported dialects:

  • Gangte | Not Yet
  • Kom | Not Yet
  • Mate | Not Yet
  • Paite | Yes
  • Simte | Not Yet
  • Siyin | Not Yet
  • Tedim | Yes
  • Thangkhal | Not Yet
  • Thado/Thadou | Not Yet
  • Vaiphei | Not Yet
  • Zo/Zou | Not Yet
  • India Zomi | Not Yet
  • Myanmar Zomi | Not Yet
  • Zolai Standard | Not Yet

Even though some dialects are not yet supported, zomi-syl should still give better than 90% accuracy for all of the dialects.

List profiles:

zomi-syl profiles list

Show profile info:

zomi-syl profiles info tedim

🧪 Testing

Run all tests:

pytest

Golden CRF regression data:

tests/golden/crf_golden.tsv
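
A golden-file regression check usually loads expected outputs and reports every word whose current output differs. The sketch below assumes a two-column, tab-separated layout (word, then syllables joined by `-`); the actual format of `crf_golden.tsv` is not documented here, and `load_golden` and `find_regressions` are hypothetical helpers. The sample rows are placeholders, not real Zomi data.

```python
import csv
import io
from typing import Callable, Iterable, List, Tuple

GoldenRow = Tuple[str, List[str]]


def load_golden(handle: Iterable[str]) -> List[GoldenRow]:
    """Parse 'word<TAB>syl-syl' rows, skipping blank lines and '#' comments."""
    rows: List[GoldenRow] = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) < 2:
            raise ValueError(f"expected two columns, got: {row!r}")
        rows.append((row[0], row[1].split("-")))
    return rows


def find_regressions(
    golden: List[GoldenRow],
    syllabify: Callable[[str], List[str]],
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (word, expected, actual) for every mismatch."""
    failures = []
    for word, expected in golden:
        actual = syllabify(word)
        if actual != expected:
            failures.append((word, expected, actual))
    return failures


# Placeholder golden data in the assumed layout.
sample_tsv = "# word\tsyllables\nabab\tab-ab\n\ncd\tcd\n"
golden = load_golden(io.StringIO(sample_tsv))

# Stand-in backend that never splits, so 'abab' is reported as a regression.
failures = find_regressions(golden, lambda w: [w])
print(failures)
```

In a pytest suite, a test would typically assert that the list of regressions is empty.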

🗂 Project Structure

src/zomi_syl/
    api.py
    cli.py
    backends/
    profiles/
    models/
    evaluation/
    rule_based/
    utils/
    ...
scripts/
docs/
tests/
training/

🛠 Development

Developer documentation lives in:

docs/Developer/

Key guides:

  • Adding new backends
  • Unified Metadata Schema (UMS)
  • CRF training
  • Backend loader
  • Test templates

📄 Changelog

The changelog is generated automatically:

make changelog

Template:

docs/Developer/CHANGELOG_template.md

📦 Release Checklist

See:

docs/RELEASE_CHECKLIST_v0.1.0.md

📜 License

MIT License — see LICENSE.


🙌 Contributing

See:

CONTRIBUTING.md

🔗 Command Reference

Full CLI command tree:

zomi-syl
│
├── syllabify
├── analyze
├── batch
├── benchmark
│
├── profiles list|info|validate
├── datasets list|download|validate
│
├── config show|path|validate|set
├── cache info|clear|remove
│
├── validate
├── download
├── version
│
└── models
    ├── list
    ├── info
    ├── benchmark
    ├── compare
    └── doctor
