Central Kurdish Grapheme-to-Phoneme (G2P) converter and Syllabifier for TTS.

Central Kurdish G2P (ckb_g2p)

License: MIT. Python 3.8+.

A linguistically accurate Grapheme-to-Phoneme (G2P) converter and Syllabifier for Central Kurdish (Sorani).

Designed specifically for training modern Text-to-Speech (TTS) models (VITS, FastSpeech2, Glow-TTS) by providing robust phonetization, stress marking, and syllable boundaries.

About the Project (translated from Kurdish)

This project is an advanced tool for converting Kurdish (Sorani) text into phonemes and syllables. It is designed especially for speech synthesis systems and for training artificial intelligence models.


🌟 Why Use This?

Generic G2P tools often fail on Kurdish phonology. ckb_g2p solves these specific challenges:

| Feature | Problem in Generic Tools | Solution in ckb_g2p |
| --- | --- | --- |
| Palatalization | Treats all 'k' and 'g' the same. | Distinguishes Heavy (Postalveolar t͡ʃ, d͡ʒ) vs Light (Dental t̪͡ʃ̟, d̪͡ʒ̟) based on vowel context. |
| Schwa Insertion | Fails on clusters like "grft". | Automatically inserts Bizroka (/ɪ/) to fix illegal consonant clusters (gɪ.ɾɪft). |
| Geminate Consonants | Merges double letters. | Preserves true geminates or splits them if phonologically required (e.g., dat̪͡ʃɛnn → da.t̪͡ʃ̟ɛ.ˈnɪn). |
| Stress (Prosody) | Ignores stress. | Smartly assigns stress (ˈ). Handles Negative Verb shifts (nachu → ˈna.t̪͡ʃ̟uː) vs Nouns (kurd → kurd). |
| Complex Onsets | Incorrectly splits clusters. | Respects valid onsets like kw and cy (wusha → wu.ʃa). |

📦 Installation

pip install ckb_g2p

Dependencies: Installing this library also installs ckb-textify, which normalizes numbers (1991 → hazar...), dates, and symbols into text.


🚀 Usage

Basic Conversion

from ckb_g2p import Converter

# Default: Normalization=ON, Pauses=ON, Stress=OFF
converter = Converter()

text = "کوردستان"
ipa = converter.syllabify(text)
print(ipa)
# Output: kuɾ.dɪs.tän

TTS-Ready Output (With Stress)

For training TTS models, you want explicit stress markers and pause tokens.

from ckb_g2p import Converter

# Enable stress marking and pause markers
converter = Converter(use_stress=True, use_pause_markers=True)

# Handles negative verbs correctly (Stress on first syllable)
text = "نەچوو بۆ بازاڕ, لە ساڵی 1991."
ipa = converter.syllabify(text)

print(ipa)
# Output: ˈna.t̪͡ʃ̟uː bo̞ bä.ˈzäɾ | la sä.ˈɫiː ha.ˈzäɾ w no̞.ˈsad w na.ˈwa.du ˈjak ||
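For TTS preprocessing it can help to split this output string into structured tokens. The following is a minimal sketch, assuming only the format visible in the output above: words separated by spaces, syllables separated by ".", stress written as a leading ˈ, and pauses written as | and ||. The helper parse_ipa is hypothetical and is not part of ckb_g2p.

```python
from typing import List, Tuple

STRESS = "\u02C8"  # the IPA primary stress mark (ˈ)
PAUSES = {"|": "short_pause", "||": "long_pause"}


def parse_ipa(ipa: str) -> List[Tuple[str, object]]:
    """Split a syllabified IPA string into word and pause tokens.

    Each word token carries a list of (syllable, is_stressed) pairs.
    Assumes the space / "." / "|" conventions shown in the README output.
    """
    tokens: List[Tuple[str, object]] = []
    for chunk in ipa.split():
        if chunk in PAUSES:
            tokens.append(("pause", PAUSES[chunk]))
            continue
        syllables = [
            (syl.lstrip(STRESS), syl.startswith(STRESS))
            for syl in chunk.split(".")
        ]
        tokens.append(("word", syllables))
    return tokens


if __name__ == "__main__":
    for token in parse_ipa("bo ba.\u02C8zar | la ||"):
        print(token)
```

Keeping the stress flag separate from the syllable text lets a training pipeline decide later whether to feed stress to the model as a symbol or as a feature.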

Configuration Options

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| use_stress | bool | False | Adds primary stress marker (ˈ) to the appropriate syllable. |
| use_pause_markers | bool | True | Converts punctuation to IPA pause boundaries (short and long). |
| normalize | bool | True | Uses ckb-textify to convert numbers/symbols to text before processing. |

Pause markers: | marks a short pause and || marks a long pause.
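To illustrate what the use_pause_markers option does conceptually, here is a minimal sketch of a punctuation-to-boundary mapping. It is not ckb_g2p's implementation. The mapping is an assumption inferred from the example output above, where a comma became | and a final period became ||; the function name mark_pauses and the exact punctuation sets are hypothetical.

```python
import re

# Assumed punctuation classes (not taken from the library):
# ASCII and Arabic-script comma/semicolon give a short pause,
# sentence-final marks give a long pause.
SHORT_PAUSE_CHARS = ",;\u060C\u061B"
LONG_PAUSE_CHARS = ".!?\u061F"

SHORT_RE = re.compile("[" + re.escape(SHORT_PAUSE_CHARS) + "]")
LONG_RE = re.compile("[" + re.escape(LONG_PAUSE_CHARS) + "]")


def mark_pauses(text: str) -> str:
    """Replace punctuation with | (short) or || (long) boundary tokens."""
    text = SHORT_RE.sub(" | ", text)
    text = LONG_RE.sub(" || ", text)
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())


if __name__ == "__main__":
    print(mark_pauses("a b, c d."))
```

A sketch like this would also split decimals such as "3.5", which is one reason to run number normalization before pause marking.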

🗣️ Phoneme Inventory

We use a precise IPA set to capture allophonic variations critical for natural speech synthesis.

Consonants (Key Distinctions)

| Grapheme | IPA | Type | Description |
| --- | --- | --- | --- |
| چ | t̪͡ʃ̟ | Light (Dental) | Standard "ch". Tongue tip touches teeth. |
| ک | t͡ʃ | Heavy (Postalveolar) | Palatalized /k/ before front vowels (i, e, y). Like English "Chair". |
| ج | d̪͡ʒ̟ | Light (Dental) | Standard "j". Tongue tip touches teeth. |
| گ | d͡ʒ | Heavy (Postalveolar) | Palatalized /g/ before front vowels. Like English "Jack". |
| ڵ | ɫ | Velarized | "Dark L", distinct from clear l. |
| ڕ | r | Trill | Rolled R, distinct from tap ɾ. |
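The context-dependent palatalization of /k/ and /g/ in the table above can be sketched as a rule over a phoneme list. This is an illustrative sketch only, not the library's code. The FRONT set is an assumption standing in for the "front vowels (i, e, y)" named above, and palatalize is a hypothetical helper.

```python
from typing import List

# Assumption: these symbols represent the front-vowel contexts (i, e, y).
FRONT = {"i", "iː", "e", "eː", "j"}

# Heavy (postalveolar) affricates used for palatalized /k/ and /g/.
HEAVY = {"k": "t͡ʃ", "g": "d͡ʒ"}


def palatalize(phonemes: List[str]) -> List[str]:
    """Replace k/g with their heavy affricate when a front vowel follows."""
    out: List[str] = []
    for idx, phoneme in enumerate(phonemes):
        following = phonemes[idx + 1] if idx + 1 < len(phonemes) else None
        if phoneme in HEAVY and following in FRONT:
            out.append(HEAVY[phoneme])
        else:
            out.append(phoneme)
    return out


if __name__ == "__main__":
    print(palatalize(["k", "i"]))            # k is followed by a front vowel
    print(palatalize(["k", "u", "ɾ", "d"]))  # k is followed by a back vowel
```

Working on a list of phoneme strings rather than a raw string avoids accidentally splitting multi-character symbols such as affricates with tie bars.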

🛠️ Customizing Pronunciation

If the rule-based engine fails on a specific word (e.g., a foreign name), you can override it manually, either by editing src/ckb_g2p/resources/exceptions.csv inside the package or by mapping exceptions locally before processing.
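The local-mapping approach can be sketched as a thin wrapper around any word-level conversion function. This is a minimal sketch under stated assumptions: with_overrides is a hypothetical helper, the override dictionary is kept in your own code rather than in exceptions.csv (whose column layout is not described here), and text is split on whitespace only.

```python
from typing import Callable, Dict


def with_overrides(
    convert: Callable[[str], str],
    overrides: Dict[str, str],
) -> Callable[[str], str]:
    """Wrap a converter so listed words use a fixed pronunciation instead."""

    def convert_text(text: str) -> str:
        pieces = []
        for word in text.split():
            if word in overrides:
                pieces.append(overrides[word])  # manual pronunciation wins
            else:
                pieces.append(convert(word))    # fall back to the rule engine
        return " ".join(pieces)

    return convert_text


if __name__ == "__main__":
    # Stand-in converter for demonstration; in practice you might pass
    # Converter().syllabify from ckb_g2p here.
    fake_convert = lambda word: word.upper()
    g2p = with_overrides(fake_convert, {"abc": "OVERRIDE"})
    print(g2p("abc def"))
```

Because this sketch converts word by word, punctuation attached to a word prevents a match, and any cross-word context the converter relies on is lost; treat it as a starting point rather than a drop-in replacement.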


🤝 Contributing

Contributions are welcome!

  1. Fork the repository.
  2. Create a feature branch.
  3. Submit a Pull Request.

👨‍💻 Author

Developed by Razwan M. Haji.

Special thanks to the open-source community and the contributors of ckb-textify.
