Central Kurdish Grapheme-to-Phoneme (G2P) converter and Syllabifier for TTS.

Central Kurdish G2P (ckb_g2p)

License: MIT | Python 3.8+

A linguistically accurate Grapheme-to-Phoneme (G2P) converter and Syllabifier for Central Kurdish (Sorani).

Designed specifically for training modern Text-to-Speech (TTS) models (VITS, FastSpeech2, Glow-TTS) by providing robust phonetization, stress marking, and syllable boundaries.

(Kurdish) About the Project

This project is an advanced tool for converting Central Kurdish (Sorani) text into phonemes and syllables. It is designed specifically for speech synthesis systems and for training artificial intelligence models.


🌟 Why Use This?

Generic G2P tools often fail on Kurdish phonology. ckb_g2p solves these specific challenges:

| Feature | Problem in Generic Tools | Solution in ckb_g2p |
| --- | --- | --- |
| Palatalization | Treats all 'k' and 'g' the same. | Distinguishes Heavy (Postalveolar t͡ʃ, d͡ʒ) vs Light (Dental t̪͡ʃ̟, d̪͡ʒ̟) based on vowel context. |
| Schwa Insertion | Fails on clusters like "grft". | Automatically inserts Bizroka (/ɪ/) to fix illegal consonant clusters (gɪ.ɾɪft). |
| Geminate Consonants | Merges double letters. | Preserves true geminates or splits them if phonologically required (e.g., dat̪͡ʃɛnn → da.t̪͡ʃ̟ɛ.ˈnɪn). |
| Stress (Prosody) | Ignores stress. | Assigns stress (ˈ) by rule. Handles Negative Verb shifts (nachu → ˈna.t̪͡ʃ̟uː) vs Nouns (kurd → kurd). |
| Complex Onsets | Incorrectly splits clusters. | Respects valid onsets like kw and cy (wusha → wu.ʃa). |
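
To make the palatalization row concrete, here is a minimal sketch of a context-dependent rewrite rule. It is not the ckb_g2p implementation: the function name `palatalize`, the Latin input symbols, and the front-vowel set (taken from the "i, e, y" listed in the phoneme inventory) are all assumptions for illustration.

```python
# Toy context rule, NOT the library's actual code.
# Assumption: front vowels are the Latin symbols i, e, y.
FRONT_VOWELS = {"i", "e", "y"}

# Heavy (postalveolar) affricates: t͡ʃ and d͡ʒ, written with explicit escapes.
PALATALIZED = {"k": "t\u0361\u0283", "g": "d\u0361\u0292"}


def palatalize(symbols):
    """Replace k/g with the heavy affricate when a front vowel follows."""
    out = []
    for i, sym in enumerate(symbols):
        nxt = symbols[i + 1] if i + 1 < len(symbols) else None
        if sym in PALATALIZED and nxt in FRONT_VOWELS:
            out.append(PALATALIZED[sym])
        else:
            out.append(sym)
    return out


print(palatalize(list("ke")))  # k is followed by a front vowel
print(palatalize(list("ka")))  # k is left unchanged
```

The real engine presumably works on Kurdish script and a richer vowel inventory; the sketch only shows the shape of a "look at the next symbol" rule.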

📦 Installation

pip install ckb_g2p

Dependencies: Installing this library also installs ckb-textify, which normalizes numbers (1991 → hazar...), dates, and symbols.


🚀 Usage

Basic Conversion

from ckb_g2p import Converter

# Default: Normalization=ON, Pauses=ON, Stress=OFF
converter = Converter()

text = "کوردستان"
ipa = converter.syllabify(text)
print(ipa)
# Output: kuɾ.dɪs.tän

TTS-Ready Output (With Stress)

For training TTS models, you want explicit stress markers and pause tokens.

from ckb_g2p import Converter

# Enable stress marking (pause markers are on by default; set explicitly here)
converter = Converter(use_stress=True, use_pause_markers=True)

# Handles negative verbs correctly (Stress on first syllable)
text = "نەچوو بۆ بازاڕ, لە ساڵی 1991."
ipa = converter.syllabify(text)

print(ipa)
# Output: ˈna.t̪͡ʃ̟uː bo̞ bä.ˈzäɾ | la sä.ˈɫiː ha.ˈzäɾ w no̞.ˈsad w na.ˈwa.du ˈjak ||
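
When feeding this output to a TTS pipeline, it often helps to break the string into phrases, words, and syllables. The sketch below does that with plain string handling, based only on the output format shown above (`|` and `||` for pauses, `.` between syllables, `ˈ` before a stressed syllable). The helper name `parse_ipa` is hypothetical and not part of ckb_g2p.

```python
import re

STRESS = "\u02c8"  # IPA primary stress mark (ˈ)


def parse_ipa(line):
    """Split a syllabified line into phrases -> words -> (syllable, stressed)."""
    phrases = []
    # Runs of '|' cover both the short (|) and long (||) pause markers.
    for chunk in re.split(r"\s*\|+\s*", line.strip()):
        if not chunk:
            continue  # trailing pause marker leaves an empty chunk
        words = []
        for word in chunk.split():
            syllables = [
                (syl.lstrip(STRESS), syl.startswith(STRESS))
                for syl in word.split(".")
            ]
            words.append(syllables)
        phrases.append(words)
    return phrases


sample = "ˈna.t̪͡ʃ̟uː bo̞ bä.ˈzäɾ | la sä.ˈɫiː ||"
for phrase in parse_ipa(sample):
    print(phrase)
```

Each syllable comes back with a boolean stress flag, so a downstream model can take stress as a separate feature instead of an inline symbol.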

Configuration Options

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| use_stress | bool | False | Adds primary stress marker (ˈ) to the appropriate syllable. |
| use_pause_markers | bool | True | Converts punctuation to IPA boundaries (\| short, \|\| long). |
| normalize | bool | True | Uses ckb-textify to convert numbers/symbols to text before processing. |

🗣️ Phoneme Inventory

We use a precise IPA set to capture allophonic variations critical for natural speech synthesis.

Consonants (Key Distinctions)

| Grapheme | IPA | Type | Description |
| --- | --- | --- | --- |
| چ | t̪͡ʃ̟ | Light (Dental) | Standard "ch". Tongue tip touches teeth. |
| ک | t͡ʃ | Heavy (Postalveolar) | Palatalized /k/ before front vowels (i, e, y). Like English "Chair". |
| ج | d̪͡ʒ̟ | Light (Dental) | Standard "j". Tongue tip touches teeth. |
| گ | d͡ʒ | Heavy (Postalveolar) | Palatalized /g/ before front vowels. Like English "Jack". |
| ڵ | ɫ | Velarized | "Dark L", distinct from clear l. |
| ڕ | r | Trill | Rolled R, distinct from tap ɾ. |
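
Because several symbols in this inventory are built from a base letter plus combining diacritics and a tie bar, splitting the output character by character would break one phoneme into several pieces. The following is a minimal sketch of a phoneme-level splitter for building a TTS symbol table. The function `split_phonemes` is hypothetical, not part of ckb_g2p, and it assumes stress marks and syllable dots have already been removed.

```python
import unicodedata

TIE_BARS = {"\u0361", "\u035c"}      # combining tie above / below
LENGTH_MARKS = {"\u02d0", "\u02d1"}  # long (ː) and half-long (ˑ)


def split_phonemes(syllable):
    """Group each base letter with its diacritics, tie bars, and length marks."""
    tokens = []
    join_next = False  # True right after a tie bar: next base letter joins
    for ch in syllable:
        attaches = unicodedata.combining(ch) != 0 or ch in LENGTH_MARKS
        if tokens and (attaches or join_next):
            tokens[-1] += ch
        else:
            tokens.append(ch)
        if ch in TIE_BARS:
            join_next = True
        elif not attaches:
            join_next = False  # a base letter consumed the pending tie
    return tokens


# Light "ch" followed by a long vowel, written with explicit code points.
print(split_phonemes("t\u032a\u0361\u0283\u031fu\u02d0"))
```

With this grouping, the light and heavy affricates each stay a single entry in the symbol table rather than a sequence of four or five code points.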

🛠️ Customizing Pronunciation

If the rule-based engine mispronounces a specific word (e.g., a foreign name), you can override it manually, either by editing src/ckb_g2p/resources/exceptions.csv inside the package or by applying your own exception mapping before processing.
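
One way to map exceptions locally is a small wrapper that checks a dictionary before calling the converter. This is a sketch under assumptions: the two-column `word,ipa` CSV layout is a guess (the actual format of exceptions.csv is not documented here), and `load_exceptions` and `convert_with_overrides` are hypothetical helpers, not part of ckb_g2p.

```python
import csv
import io


def load_exceptions(csv_text):
    """Read 'word,ipa' rows into a dict (assumed layout, see note above)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def convert_with_overrides(text, exceptions, fallback):
    """Use the override for known words; pass every other word to `fallback`."""
    converted = []
    for word in text.split():
        if word in exceptions:
            converted.append(exceptions[word])
        else:
            converted.append(fallback(word))
    return " ".join(converted)


# Stand-in fallback for demonstration; in practice this could be
# converter.syllabify from a ckb_g2p Converter instance.
overrides = load_exceptions("google,gu.gl\n")
print(convert_with_overrides("google test", overrides, str.upper))
```

Converting word by word keeps the override logic simple, but it may lose any cross-word context or punctuation handling the converter applies to full sentences, so it is best suited to isolated names.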


🤝 Contributing

Contributions are welcome!

  1. Fork the repository.
  2. Create a feature branch.
  3. Submit a Pull Request.

👨‍💻 Author

Developed by Razwan M. Haji.

Special thanks to the open-source community and the contributors of ckb-textify.
