Central Kurdish Grapheme-to-Phoneme (G2P) converter and Syllabifier for TTS.
Central Kurdish G2P (ckb_g2p)
A linguistically accurate Grapheme-to-Phoneme (G2P) converter and Syllabifier for Central Kurdish (Sorani), designed specifically for modern Text-to-Speech (TTS) pipelines (VITS, FastSpeech2, etc.).
🔗 Live Demos & Resources
| Project | Description | Links |
|---|---|---|
| ckb_g2p | Phonemizer & Syllabifier | Live G2P Demo • GitHub |
| ckb-textify | Text Normalizer (Prerequisite) | Live Normalizer Demo • GitHub |
✨ Key Features
This library handles the complex phonological rules that generic G2P tools miss:
- Context-Aware Palatalization:
  - Distinguishes between the Dental Affricates (standard `چ`/`ج`) and Postalveolar Affricates (palatalized `ک`/`گ`).
  - Example: `کێوار` → `t͡ʃɛ.wäɾ` (Heavy) vs `چێوار` → `t̪͡ʃ̟ɛ.wäɾ` (Light).
- Schwa (Bizroka) Insertion:
  - Automatically inserts `/ɪ/` to break illegal consonant clusters based on sonority rules.
  - Example: `گرفت` (grft) → `gɪ.ɾɪft`.
- Advanced Syllabification:
  - Respects complex onsets (e.g., `kw`, `cy`) while splitting others correctly.
  - Example: `ووشە` → `wu.ʃa`.
- Prosody & Stress (Configurable):
  - Noun/Adj: Stress on final syllable (`ˈ`).
  - Negative Verbs: Stress shifts to initial syllable (e.g., `نەچوو` → `ˈna.t̪͡ʃ̟uː`).
- Foreign Text Support:
  - Powered by ckb-textify. Automatically converts numbers (`1991`), symbols (`$`), and English text to Kurdish phonemes before processing.
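
The palatalization rule from the first feature can be illustrated with a toy sketch. This is not the library's implementation: the helper name `initial_consonant_ipa` and the trigger set are assumptions, with the trigger letters chosen to match the "Before i, e, y" note in the phoneme table, and it only looks at the first letter of a word.

```python
# Toy sketch of context-aware palatalization; NOT the ckb_g2p implementation.
# Assumed triggers, following the "Before i, e, y" note: ی (i/y) and ێ (e).
FRONT_TRIGGERS = {"ی", "ێ"}

# Plain and palatalized (heavy, postalveolar) realizations of the velars.
VELARS = {
    "ک": ("k", "t͡ʃ"),
    "گ": ("g", "d͡ʒ"),
}

# The standard affricates stay light/dental regardless of what follows.
STANDARD_AFFRICATES = {"چ": "t̪͡ʃ̟", "ج": "d̪͡ʒ̟"}


def initial_consonant_ipa(word: str) -> str:
    """Return the IPA for the first letter of `word` (hypothetical helper)."""
    first, following = word[0], word[1:2]
    if first in STANDARD_AFFRICATES:
        return STANDARD_AFFRICATES[first]
    if first in VELARS:
        plain, palatalized = VELARS[first]
        return palatalized if following in FRONT_TRIGGERS else plain
    raise ValueError(f"unsupported letter: {first!r}")


print(initial_consonant_ipa("کێوار"))     # heavy affricate (velar before ێ)
print(initial_consonant_ipa("چێوار"))     # light affricate
print(initial_consonant_ipa("کوردستان"))  # plain velar (no front trigger)
```

A real rule set would also need to handle word-internal positions and exceptions, which this sketch leaves out.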
📦 Installation
```bash
pip install ckb_g2p
```
🚀 Usage
1. Basic Usage (Python)
```python
from ckb_g2p import Converter

# Initialize (Default: Stress=OFF, Pauses=ON, Normalization=ON)
converter = Converter()

text = "کوردستان"
ipa = converter.syllabify(text)
print(ipa)
# Output: kuɾ.dɪs.tän
```
2. Advanced TTS Configuration
You can toggle specific features to match your TTS model's requirements.
```python
from ckb_g2p import Converter

# Initialize with specific TTS options
converter = Converter(
    use_stress=True,         # Mark primary stress (ˈ)
    use_pause_markers=True,  # Convert punctuation to | and ||
    normalize=True,          # Use ckb-textify to clean text first
)

text = "نەچوو بۆ بازاڕ, لە ساڵی 1991."
ipa = converter.syllabify(text)
print(ipa)
# Output: ˈna.t̪͡ʃ̟uː bo̞ bä.ˈzäɾ | la sä.ˈɫiː ha.ˈzäɾ w no̞.ˈsad w na.ˈwa.du ˈjak ||
```
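
A TTS frontend often needs structured tokens rather than a flat string. The sketch below parses the output format shown above (spaces between words, `.` between syllables, `ˈ` for stress, `|` and `||` for pauses). It works on a plain string and does not call the library; `Syllable`, `parse_g2p_output`, and the `"minor"`/`"major"` pause labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

STRESS = "ˈ"
# Pause labels are an assumption, borrowed from IPA minor/major group usage.
PAUSES = {"|": "minor", "||": "major"}


@dataclass
class Syllable:
    phonemes: str
    stressed: bool


def parse_g2p_output(ipa: str) -> list:
    """Split a G2P string into words (lists of Syllable) and pause labels."""
    items = []
    for token in ipa.split():
        if token in PAUSES:
            items.append(PAUSES[token])
            continue
        word = [
            Syllable(phonemes=s.lstrip(STRESS), stressed=s.startswith(STRESS))
            for s in token.split(".")
        ]
        items.append(word)
    return items


sample = "ˈna.t̪͡ʃ̟uː bo̞ bä.ˈzäɾ | la sä.ˈɫiː"
for item in parse_g2p_output(sample):
    print(item)
```

Keeping stress as a boolean on each syllable lets a model consume it as a separate feature instead of as an extra symbol in the phoneme sequence.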
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `use_stress` | `bool` | `False` | Adds `ˈ` to the stressed syllable. Shifts stress to the initial syllable for negative verbs. |
| `use_pause_markers` | `bool` | `True` | Converts punctuation to IPA pause boundaries (`\|` and `\|\|`). |
| `normalize` | `bool` | `True` | Uses ckb-textify to convert numbers, symbols, and Latin text before G2P. |
🗣️ Phoneme Set
To ensure high-quality audio generation, we use precise IPA notation to distinguish allophones:
| Grapheme | Sound Type | IPA | Description |
|---|---|---|---|
| چ | Standard | `t̪͡ʃ̟` | Light / Dental: Tongue touches teeth. |
| ج | Standard | `d̪͡ʒ̟` | Light / Dental: Tongue touches teeth. |
| ک | Palatalized | `t͡ʃ` | Heavy / Postalveolar: Like English "Chair". (Before i, e, y) |
| گ | Palatalized | `d͡ʒ` | Heavy / Postalveolar: Like English "Jack". (Before i, e, y) |
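
Because the light and heavy affricates differ only in combining diacritics, splitting the IPA output character by character would break them apart. The following is a minimal sketch of a phoneme segmenter that keeps diacritics, the tie bar, and the length mark attached to their base letters. It uses only the standard `unicodedata` module; the function name `segment_phonemes` is hypothetical, and the input is assumed to be a single syllable with stress marks and `.` already removed.

```python
import unicodedata

TIE_BAR = "\u0361"      # combining double inverted breve, joins two letters
LENGTH_MARK = "\u02d0"  # IPA length mark (ː), not a combining character


def segment_phonemes(syllable: str) -> list[str]:
    """Split an IPA syllable into phonemes, keeping tied affricates whole."""
    phonemes: list[str] = []
    join_next = False  # True right after a tie bar: the next base letter joins
    for ch in syllable:
        attaches = unicodedata.combining(ch) != 0 or ch == LENGTH_MARK
        if phonemes and (attaches or join_next):
            phonemes[-1] += ch
        else:
            phonemes.append(ch)
        # Stay in "join" mode across any marks that follow the tie bar.
        join_next = ch == TIE_BAR or (join_next and attaches)
    return phonemes


light = segment_phonemes("t̪͡ʃ̟uː")
heavy = segment_phonemes("t͡ʃɛ")
print(len(light), light)
print(len(heavy), heavy)
```

The resulting symbols can then be mapped to integer IDs for a TTS vocabulary, so that the light and heavy affricates receive distinct entries.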
🤝 Contributing
Contributions are welcome! Whether it's fixing a bug, improving phonological rules, or adding documentation, please feel free to submit a Pull Request on GitHub.
- Fork the project.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
👨‍💻 Author
Developed by Razwan M. Haji.
Special thanks to the open-source community and the contributors of ckb-textify, eng-to-ipa, and anyascii.
Download files
File details
Details for the file ckb_g2p-1.0.0.tar.gz.
File metadata
- Download URL: ckb_g2p-1.0.0.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `724ff3b033d61ea6e5f91b171f681e9ce1af5fc67eb822c4732032a6e966b991` |
| MD5 | `a69a63678823bf418aece54b150d1684` |
| BLAKE2b-256 | `aff41a53c8f4b427dafd1d31deecb6ddf847f78853fbdbb46674fc2b9377f577` |
File details
Details for the file ckb_g2p-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ckb_g2p-1.0.0-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `763a6f82f839c9747b75de597ce13095c29b908f7a619f9f41a9bc98fb435761` |
| MD5 | `c4b221c5ce643de6218c0f15fb37b839` |
| BLAKE2b-256 | `5b192b86a7b8a1c5e39c3e0ca6a6a968e90be4f16a8bfd228d86808e9de2d327` |
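
A downloaded file can be checked against the published SHA256 digest with the standard `hashlib` module. This is a minimal sketch: the helper name `sha256_of` is hypothetical, and the local path assumes the wheel was saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the hash table for ckb_g2p-1.0.0-py3-none-any.whl.
EXPECTED = "763a6f82f839c9747b75de597ce13095c29b908f7a619f9f41a9bc98fb435761"

wheel = Path("ckb_g2p-1.0.0-py3-none-any.whl")  # assumed download location
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED else "MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```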