A modular, dialect-aware Zomi syllabification library with rule-based and CRF backends.

📦 zomi‑syl

zomi-syl provides a production‑ready syllabifier for Zomi, supporting multiple dialects, multiple backends, and a clean, extensible architecture. It includes:

  • A fast rule‑based syllabifier
  • A statistical CRF syllabifier
  • A unified API
  • A full CLI
  • A backend registry
  • Benchmarking tools
  • Dialect profiles
  • A clean, documented developer workflow

🚀 Features

  • Multiple backends: rule‑based, CRF, transformer‑ready
  • Dialect‑aware syllabification (csy [Siyin], ctd [Tedim], gnb [Gangte], kmm [Kom Rem], pck [Paite], vap [Vaiphei], smt [Simte], tcz [Thado/Thadou], zom [Zo/Zou], Mate, Thangkhal, Zolai Standard, Myanmar Zomi, India Zomi)
  • Unified API (zs.syllabify(), zs.analyze())
  • Full CLI (zomi-syl syllabify, zomi-syl models benchmark, zomi-syl models compare)
  • Benchmarking & evaluation tools
  • Extensible backend architecture
  • Clean developer documentation

📦 Installation

pip install zomi-syl

🧠 Quick Start

Syllabify a word

zomi-syl syllabify itna

Analyze a word

zomi-syl analyze itna --json

Batch syllabify

zomi-syl batch words.txt --output out.txt

🧰 Python API

import zomi_syl as zs

zs.syllabify("itna")
zs.analyze("itna")
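
For processing many words from Python, a small helper can wrap whichever syllabifier you use. The sketch below is not part of zomi-syl: `syllabify_lines` is a hypothetical name, and it takes the syllabifier as a parameter because the return type of `zs.syllabify()` is not documented here. The stand-in `fake_syllabify` exists only so the example runs without the package installed.

```python
from typing import Callable, Iterable, Iterator, Tuple


def syllabify_lines(
    lines: Iterable[str],
    syllabify: Callable[[str], object],
) -> Iterator[Tuple[str, object]]:
    """Yield (word, result) pairs for each non-blank line."""
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip empty lines instead of passing "" to the backend
        yield word, syllabify(word)


def fake_syllabify(word: str) -> list:
    # Placeholder only: fixed two-character chunks, NOT real Zomi rules.
    return [word[i:i + 2] for i in range(0, len(word), 2)]


results = list(syllabify_lines(["itna\n", "\n", "  zomi "], fake_syllabify))
for word, syllables in results:
    print(word, syllables)
```

With the package installed, `zs.syllabify` could presumably be passed in place of `fake_syllabify`, assuming it accepts a single string as in the calls above.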

🧩 Backends

zomi-syl supports multiple backends through a unified registry:

  • rule — deterministic rule‑based syllabifier
  • crf — statistical CRF syllabifier
  • transformer — placeholder for future transformer models

List available backends:

zomi-syl models list

Show backend metadata:

zomi-syl models info crf
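
To illustrate the idea of a backend registry, here is a minimal sketch of the general pattern: a name-to-factory mapping filled by a decorator. It is an assumption about how such a registry could look, not zomi-syl's actual implementation; `register_backend`, `list_backends`, `load_backend`, and the two toy classes are hypothetical names.

```python
from typing import Callable, Dict, List

# Hypothetical registry: backend name -> factory that builds the backend.
_BACKENDS: Dict[str, Callable[[], object]] = {}


def register_backend(name: str):
    """Decorator that records a backend factory under `name`."""
    def decorator(factory):
        if name in _BACKENDS:
            raise ValueError(f"backend already registered: {name}")
        _BACKENDS[name] = factory
        return factory
    return decorator


def list_backends() -> List[str]:
    """Return the registered backend names in sorted order."""
    return sorted(_BACKENDS)


def load_backend(name: str):
    """Instantiate a backend by name, with a readable error if unknown."""
    if name not in _BACKENDS:
        raise KeyError(f"unknown backend {name!r}; available: {list_backends()}")
    return _BACKENDS[name]()


@register_backend("rule")
class RuleBackend:
    name = "rule"  # toy stand-in, no real syllabification logic


@register_backend("crf")
class CrfBackend:
    name = "crf"  # toy stand-in, no trained model behind it


print(list_backends())
print(load_backend("rule").name)
```

Keeping construction behind a factory means a backend with a heavy model is only loaded when it is actually requested.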

📊 Benchmarking

Single backend

zomi-syl models benchmark crf

Compare multiple backends

zomi-syl models compare rule crf

Compare all backends

zomi-syl models compare --all
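
The metrics reported by the benchmark commands are not listed here. As a rough sketch of how syllabifiers are commonly scored, the example below computes word-level exact-match accuracy and boundary precision, recall, and F1. The function names and the sample data are illustrative assumptions, not zomi-syl output.

```python
from typing import Dict, List, Sequence, Set


def boundaries(syllables: Sequence[str]) -> Set[int]:
    """Character offsets of the internal syllable boundaries of one word."""
    offsets, pos = set(), 0
    for syl in syllables[:-1]:
        pos += len(syl)
        offsets.add(pos)
    return offsets


def evaluate(gold: List[List[str]], pred: List[List[str]]) -> Dict[str, float]:
    """Score predicted syllabifications against gold ones."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    exact = tp = fp = fn = 0
    for g, p in zip(gold, pred):
        exact += int(g == p)
        gb, pb = boundaries(g), boundaries(p)
        tp += len(gb & pb)   # boundaries found in both
        fp += len(pb - gb)   # predicted but not in gold
        fn += len(gb - pb)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = exact / len(gold) if gold else 0.0
    return {"word_accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative data only; the splits are placeholders, not verified Zomi.
gold = [["it", "na"], ["zo", "mi"]]
pred = [["it", "na"], ["zom", "i"]]
scores = evaluate(gold, pred)
print(scores)
```

Boundary-level scores give partial credit when a long word has one misplaced split, which exact-match accuracy alone would hide.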

🩺 Diagnostics

Run a full backend self‑test:

zomi-syl models doctor

This checks:

  • registry integrity
  • model metadata
  • backend loadability
  • single prediction
  • batch prediction
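
A self-test of this kind can be thought of as a list of named checks that each either pass or raise. The sketch below shows that pattern in isolation; `run_checks` and the toy checks are hypothetical and do not reproduce what `zomi-syl models doctor` runs internally.

```python
from typing import Callable, Dict, List, Tuple


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[Tuple[str, bool, str]]:
    """Run each check; report (name, passed, error message) in order."""
    report = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # a failing check must not stop the others
            report.append((name, False, f"{type(exc).__name__}: {exc}"))
        else:
            report.append((name, True, ""))
    return report


def toy_backend(word: str) -> list:
    # Stand-in backend that returns the word as a single syllable.
    return [word]


def check_single_prediction() -> None:
    if toy_backend("itna") != ["itna"]:
        raise ValueError("unexpected single prediction")


def check_batch_prediction() -> None:
    out = [toy_backend(w) for w in ["itna", "zomi"]]
    if len(out) != 3:  # deliberately wrong expectation, to show a failure row
        raise ValueError(f"expected 3 results, got {len(out)}")


report = run_checks({
    "single prediction": check_single_prediction,
    "batch prediction": check_batch_prediction,
})
for name, ok, message in report:
    print("PASS" if ok else "FAIL", name, message)
```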

🌏 Dialect Profiles

Profiles live under:

src/zomi_syl/profiles/

Dialect profile status (Yes = profile available, Not Yet = planned):

  • Gangte | Not Yet
  • Kom | Not Yet
  • Mate | Not Yet
  • Paite | Yes
  • Simte | Not Yet
  • Siyin | Not Yet
  • Tedim | Yes
  • Thangkhal | Not Yet
  • Thado/Thadou | Not Yet
  • Vaiphei | Not Yet
  • Zo/Zou | Not Yet
  • India Zomi | Not Yet
  • Myanmar Zomi | Not Yet
  • Zolai Standard | Not Yet

Even though some dialects are not yet supported with a dedicated profile, zomi-syl is expected to give higher than 90% accuracy for all of the dialects listed.

List profiles:

zomi-syl profiles list

Show profile info:

zomi-syl profiles info tedim

🧪 Testing

Run all tests:

pytest

Golden CRF regression data:

tests/golden/crf_golden.tsv
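
A golden-file regression check compares current predictions with stored expected outputs. The sketch below assumes a two-column TSV (word, then syllables joined by hyphens) with optional `#` comment rows; the real layout of `crf_golden.tsv` is not described here and may differ. `load_golden`, `find_regressions`, and the lookup-based stand-in syllabifier are hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, List, Tuple


def load_golden(handle: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Read (word, expected syllables) pairs from a two-column TSV."""
    rows = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment rows
        rows.append((row[0], row[1].split("-")))
    return rows


def find_regressions(
    golden: List[Tuple[str, List[str]]],
    syllabify: Callable[[str], List[str]],
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (word, expected, got) for every mismatch."""
    failures = []
    for word, expected in golden:
        got = syllabify(word)
        if got != expected:
            failures.append((word, expected, got))
    return failures


# In-memory sample standing in for a golden file (assumed format).
sample = io.StringIO("# word\tsyllables\nitna\tit-na\nzomi\tzo-mi\n")
golden = load_golden(sample)

# Stand-in predictions, with one deliberate mismatch.
predictions = {"itna": ["it", "na"], "zomi": ["zom", "i"]}
regressions = find_regressions(golden, predictions.__getitem__)
print(regressions)
```

In a pytest suite the same helper could back a single assertion that the regression list is empty.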

🗂 Project Structure

src/zomi_syl/
    api.py
    cli.py
    backends/
    profiles/
    models/
    evaluation/
    rule_based/
    utils/
    ...
scripts/
docs/
tests/
training/

🛠 Development

Developer documentation lives in:

docs/Developer/

Key guides:

  • Adding new backends
  • Unified Metadata Schema (UMS)
  • CRF training
  • Backend loader
  • Test templates

📄 Changelog

The changelog is generated automatically:

make changelog

Template:

docs/Developer/CHANGELOG_template.md

📦 Release Checklist

See:

docs/RELEASE_CHECKLIST_v0.1.0.md

📜 License

MIT License — see LICENSE.


🙌 Contributing

See:

CONTRIBUTING.md

🔗 Command Reference

Full CLI command tree:

zomi-syl
│
├── syllabify
├── analyze
├── batch
├── benchmark
│
├── profiles list|info|validate
├── datasets list|download|validate
│
├── config show|path|validate|set
├── cache info|clear|remove
│
├── validate
├── download
├── version
│
└── models
    ├── list
    ├── info
    ├── benchmark
    ├── compare
    └── doctor
