
A modular, dialect-aware Zomi syllabification library with rule-based and CRF backends.


📦 zomi‑syl



zomi-syl provides a production-ready syllabifier for Zomi that supports multiple dialects and multiple backends behind a clean, extensible architecture. It includes:

  • A fast rule‑based syllabifier
  • A statistical CRF syllabifier
  • A unified API
  • A full CLI
  • A backend registry
  • Benchmarking tools
  • Dialect profiles
  • A clean, documented developer workflow

🚀 Features

  • Multiple backends: rule‑based, CRF, transformer‑ready
  • Dialect-aware syllabification (csy [Siyin], ctd [Tedim], gnb [Gangte], kmm [Kom Rem], pck [Paite], vap [Vaiphei], smt [Simte], tcz [Thado/Thadou], zom [Zo/Zou], Mate, Thangkhal, Zolai Standard, Myanmar Zomi, India Zomi); dedicated profiles currently exist for Tedim and Paite
  • Unified API (zs.syllabify(), zs.analyze())
  • Full CLI (zomi-syl syllabify, zomi-syl models benchmark, zomi-syl models compare)
  • Benchmarking & evaluation tools
  • Extensible backend architecture
  • Clean developer documentation

📦 Installation

pip install zomi-syl

🧠 Quick Start

Syllabify a word

zomi-syl syllabify itna

Analyze a word

zomi-syl analyze itna --json

Batch syllabify

zomi-syl batch words.txt --output out.txt
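The batch command's behaviour can be approximated from Python with a short loop. This is a minimal sketch, assuming the input file holds one word per line and that blank lines should be kept so output lines stay aligned with input lines. batch_syllabify is a hypothetical helper, not part of zomi-syl; it accepts any callable that maps one word to its syllabified form, for example zs.syllabify, assuming that function takes a single string.

```python
from pathlib import Path
from typing import Callable


def batch_syllabify(in_path: str, out_path: str, syllabify: Callable[[str], object]) -> int:
    """Syllabify one word per line from in_path into out_path.

    Hypothetical helper: assumes a one-word-per-line input file. Blank lines
    are copied through unchanged so line N of the output matches line N of
    the input. Returns the number of words processed.
    """
    out_lines = []
    count = 0
    for line in Path(in_path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if not word:
            out_lines.append("")  # preserve alignment with the input file
            continue
        # str() because the return type of the supplied syllabifier is not assumed
        out_lines.append(str(syllabify(word)))
        count += 1
    text = "\n".join(out_lines) + ("\n" if out_lines else "")
    Path(out_path).write_text(text, encoding="utf-8")
    return count
```

Called as batch_syllabify("words.txt", "out.txt", zs.syllabify), it would write out.txt line by line in the same order as words.txt.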

🧰 Python API

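import zomi_syl as zs

zs.syllabify("itna")  # split a single word into syllables
zs.analyze("itna")    # analyze a single word (Python counterpart of zomi-syl analyze)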

🧩 Backends

zomi-syl supports multiple backends through a unified registry:

  • rule — deterministic rule‑based syllabifier
  • crf — statistical CRF syllabifier
  • transformer — placeholder for future transformer models

List available backends:

zomi-syl models list

Show backend metadata:

zomi-syl models info crf

📊 Benchmarking

Single backend

zomi-syl models benchmark crf

Compare multiple backends

zomi-syl models compare rule crf

Compare all backends

zomi-syl models compare --all
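Comparing backends comes down to scoring each one against the same gold data. The sketch below is a hypothetical illustration, not the metric zomi-syl reports: it computes word-level accuracy (the whole syllabification must match) and boundary-level precision, recall and F1 (each hyphen position scored separately), assuming gold and predicted forms both use "-" as the separator. The names evaluate and compare are illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple

Gold = List[Tuple[str, str]]  # (word, expected syllabification), e.g. ("itna", "it-na")


def boundaries(syllabified: str, sep: str = "-") -> Set[int]:
    """Offsets in the unsegmented word where a syllable boundary falls."""
    cuts, pos = set(), 0
    for piece in syllabified.split(sep)[:-1]:
        pos += len(piece)
        cuts.add(pos)
    return cuts


def evaluate(predict: Callable[[str], str], gold: Gold) -> Dict[str, float]:
    """Word accuracy and boundary precision/recall/F1 for one backend."""
    if not gold:
        raise ValueError("gold data is empty")
    exact = tp = fp = fn = 0
    for word, expected in gold:
        predicted = predict(word)
        exact += predicted == expected
        p, g = boundaries(predicted), boundaries(expected)
        tp += len(p & g)  # boundaries placed correctly
        fp += len(p - g)  # boundaries inserted where gold has none
        fn += len(g - p)  # gold boundaries the backend missed
    # Degenerate case: with no boundaries anywhere, these fall back to 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": exact / len(gold), "precision": precision, "recall": recall, "f1": f1}


def compare(backends: Dict[str, Callable[[str], str]], gold: Gold) -> List[Tuple[str, Dict[str, float]]]:
    """Score every backend and rank them by word accuracy, best first."""
    scores = {name: evaluate(fn, gold) for name, fn in backends.items()}
    return sorted(scores.items(), key=lambda item: item[1]["accuracy"], reverse=True)
```

Reporting boundary F1 alongside word accuracy helps separate a backend that is off by one boundary from one that fails outright.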

🩺 Diagnostics

Run a full backend self‑test:

zomi-syl models doctor

This checks:

  • registry integrity
  • model metadata
  • backend loadability
  • single prediction
  • batch prediction
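A self-test of this kind can be structured as a sequence of checks per backend that records pass or fail instead of stopping at the first error. This is a hypothetical sketch, not the doctor implementation: it assumes each registry entry is a dict with a "metadata" dict and a zero-argument "load" callable that returns a one-word predictor, and that metadata must contain "name" and "version".

```python
from typing import Callable, Dict, List, Tuple

REQUIRED_METADATA = ("name", "version")  # assumed required keys
Result = Tuple[str, str, bool, str]  # (backend, check, passed, detail)


def doctor(registry: Dict[str, dict], words: Tuple[str, ...] = ("itna", "zomi")) -> List[Result]:
    """Run the five checks listed above on every backend and collect results."""
    if not registry:
        return [("*", "registry integrity", False, "no backends registered")]
    results: List[Result] = []
    for name, entry in registry.items():
        # 1. Registry integrity: the entry exposes metadata and a loader.
        if (not isinstance(entry, dict) or not isinstance(entry.get("metadata"), dict)
                or not callable(entry.get("load"))):
            results.append((name, "registry integrity", False, "missing 'metadata' or 'load'"))
            continue
        results.append((name, "registry integrity", True, ""))
        # 2. Model metadata: the assumed required keys are present.
        missing = [k for k in REQUIRED_METADATA if k not in entry["metadata"]]
        results.append((name, "model metadata", not missing, ", ".join(missing)))
        # 3. Backend loadability: building the predictor must not raise.
        try:
            predict: Callable[[str], str] = entry["load"]()
        except Exception as exc:  # report the failure instead of aborting the self-test
            results.append((name, "backend loadability", False, repr(exc)))
            continue
        results.append((name, "backend loadability", True, ""))
        # 4. Single prediction and 5. batch prediction must return non-empty strings.
        for check, batch in (("single prediction", words[:1]), ("batch prediction", words)):
            try:
                outputs = [predict(w) for w in batch]
                ok = all(isinstance(o, str) and o for o in outputs)
                results.append((name, check, ok, repr(outputs)))
            except Exception as exc:
                results.append((name, check, False, repr(exc)))
    return results
```

Collecting results rather than raising lets one broken backend be reported without hiding the status of the others.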

🌏 Dialect Profiles

Profiles live under:

src/zomi_syl/profiles/

Dialect profile status (Yes = a dedicated profile is available):

  • Gangte | Not Yet
  • Kom | Not Yet
  • Mate | Not Yet
  • Paite | Yes
  • Simte | Not Yet
  • Siyin | Not Yet
  • Tedim | Yes
  • Thangkhaal | Not Yet
  • Thado/Thadou | Not Yet
  • Vaiphei | Not Yet
  • Zo/Zou | Not Yet
  • India Zomi | Not Yet
  • Myanmar Zomi | Not Yet
  • Zolai Standard | Not Yet

Even though some dialects do not yet have a dedicated profile, zomi-syl still achieves over 90% accuracy on all of the dialects listed.

List profiles:

zomi-syl profiles list

Show profile info:

zomi-syl profiles info tedim

🧪 Testing

Run all tests:

pytest

Golden CRF regression data:

tests/golden/crf_golden.tsv
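A regression check against a golden file can be written as a plain loop. This sketch assumes, without confirmation from the repository, that each line of crf_golden.tsv holds a word and its expected syllabification separated by a tab, with no header row; load_golden and find_regressions are hypothetical names.

```python
import csv
from typing import Callable, Iterable, List, Tuple


def load_golden(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse word<TAB>expected rows, skipping blank lines and # comments (assumed format)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        if len(row) < 2:
            raise ValueError(f"line {reader.line_num}: expected 2 tab-separated columns")
        pairs.append((row[0], row[1]))
    return pairs


def find_regressions(predict: Callable[[str], str],
                     golden: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (word, expected, got) for every golden entry the backend now gets wrong."""
    failures = []
    for word, expected in golden:
        got = predict(word)
        if got != expected:
            failures.append((word, expected, got))
    return failures
```

In a pytest test, the golden file would be opened with encoding="utf-8" and newline="", passed to load_golden, and the test would assert that find_regressions returns an empty list for the backend under test.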

🗂 Project Structure

src/zomi_syl/
    api.py
    cli.py
    backends/
    profiles/
    models/
    evaluation/
    rule_based/
    utils/
    ...
scripts/
docs/
tests/
training/

🛠 Development

Developer documentation lives in:

docs/Developer/

Key guides:

  • Adding new backends
  • Unified Metadata Schema (UMS)
  • CRF training
  • Backend loader
  • Test templates

📄 Changelog

The changelog is generated automatically by running:

make changelog

Template:

docs/Developer/CHANGELOG_template.md

📦 Release Checklist

See:

docs/RELEASE_CHECKLIST_v0.1.0.md

📜 License

MIT License — see LICENSE.


🙌 Contributing

See:

CONTRIBUTING.md

🔗 Command Reference

Full CLI command tree:

zomi-syl
│
├── syllabify
├── analyze
├── batch
├── benchmark
│
├── profiles list|info|validate
├── datasets list|download|validate
│
├── config show|path|validate|set
├── cache info|clear|remove
│
├── validate
├── download
├── version
│
└── models
    ├── list
    ├── info
    ├── benchmark
    ├── compare
    └── doctor



Download files


Source Distribution

zomi_syl-0.1.1.tar.gz (140.6 kB)

Built Distribution


zomi_syl-0.1.1-py3-none-any.whl (120.2 kB)

File details

Details for the file zomi_syl-0.1.1.tar.gz.

File metadata

  • Download URL: zomi_syl-0.1.1.tar.gz
  • Upload date:
  • Size: 140.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zomi_syl-0.1.1.tar.gz
Algorithm Hash digest
SHA256 785ed7bcc4b3cf16d2aba9156fabd5086f284f03de8c78ce17d63bdf5afca06a
MD5 a910268383d38032787271dd9582c326
BLAKE2b-256 82cc5fd01add4d9b2fdf1779d6cd2cdfa2ca8a82c34d77a9911ef5ef64c178e7


File details

Details for the file zomi_syl-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: zomi_syl-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 120.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zomi_syl-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 feb73255faa532e8de9665f6697ce14d82ceb069404ea54504facd16674fdc45
MD5 d90534de37703c1a2b06792adae3eace
BLAKE2b-256 25641c323b249d8180245e23ae2d903da0488b72e0f909ed8463b25f113c6307

