
Central Kurdish Grapheme-to-Phoneme (G2P) converter and Syllabifier for TTS.

Central Kurdish G2P (ckb_g2p)

License: MIT

A linguistically accurate Grapheme-to-Phoneme (G2P) converter and Syllabifier for Central Kurdish (Sorani), designed specifically for modern Text-to-Speech (TTS) pipelines (VITS, FastSpeech2, etc.).

🔗 Live Demos & Resources

| Project | Description | Links |
| --- | --- | --- |
| ckb_g2p | Phonemizer & Syllabifier | Live G2P Demo, GitHub |
| ckb-textify | Text Normalizer (Prerequisite) | Live Normalizer Demo, GitHub |

✨ Key Features

This library handles the complex phonological rules that generic G2P tools miss:

  1. Context-Aware Palatalization:
    • Distinguishes between the Dental Affricates (standard چ / ج) and Postalveolar Affricates (palatalized ک / گ).
    • Example: کێوار → t͡ʃɛ.wäɾ (Heavy) vs. چێوار → t̪͡ʃ̟ɛ.wäɾ (Light).
  2. Schwa (Bizroka) Insertion:
    • Automatically inserts the epenthetic vowel /ɪ/ to break up illegal consonant clusters, following sonority rules.
    • Example: گرفت (grft) → gɪ.ɾɪft.
  3. Advanced Syllabification:
    • Respects complex onsets (e.g., kw, cy) while splitting others correctly.
    • Example: ووشە → wu.ʃa.
  4. Prosody & Stress (Configurable):
    • Noun/Adj: Stress on final syllable (ˈ).
    • Negative Verbs: Stress shifts to initial syllable (e.g., نەچوو → ˈna.t̪͡ʃ̟uː).
  5. Foreign Text Support:
    • Powered by ckb-textify. Automatically converts numbers (1991), symbols ($), and English text to Kurdish phonemes before processing.
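The stress rule in item 4 can be illustrated with a minimal sketch. The helper `mark_stress` below is hypothetical and not part of ckb_g2p; it assumes the word is already syllabified and that the caller knows whether it is a negative verb. It also marks every word it is given, whereas the library's own output leaves some short function words unstressed.

```python
STRESS = "\u02c8"  # IPA primary stress mark (ˈ)


def mark_stress(syllables, negative_verb=False):
    """Join syllables with '.', prefixing one syllable with the stress mark.

    Hypothetical helper for illustration only (not the ckb_g2p API).
    The final syllable is stressed by default (nouns/adjectives); the
    initial syllable is stressed when negative_verb is True.
    """
    if not syllables:
        return ""
    target = 0 if negative_verb else len(syllables) - 1
    marked = [
        STRESS + syl if i == target else syl
        for i, syl in enumerate(syllables)
    ]
    return ".".join(marked)


print(mark_stress(["bä", "zäɾ"]))                      # stress on the final syllable
print(mark_stress(["na", "t̪͡ʃ̟uː"], negative_verb=True))  # stress on the initial syllable
```

Deciding which words count as negative verbs is the hard part and is left to the library; the sketch only shows where the mark lands once that decision is made.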

📦 Installation

pip install ckb_g2p

🚀 Usage

1. Basic Usage (Python)

from ckb_g2p import Converter

# Initialize (Default: Stress=OFF, Pauses=ON, Normalization=ON)
converter = Converter()

text = "کوردستان"
ipa = converter.syllabify(text)
print(ipa)
# Output: kuɾ.dɪs.tän

2. Advanced TTS Configuration

You can toggle specific features to match your TTS model's requirements.

# Initialize with specific TTS options
converter = Converter(
    use_stress=True,        # Mark primary stress (ˈ)
    use_pause_markers=True, # Convert punctuation to | and ||
    normalize=True          # Use ckb-textify to clean text first
)

text = "نەچوو بۆ بازاڕ, لە ساڵی 1991."
ipa = converter.syllabify(text)

print(ipa)
# Output: ˈna.t̪͡ʃ̟uː bo̞ bä.ˈzäɾ | la sä.ˈɫiː ha.ˈzäɾ w no̞.ˈsad w na.ˈwa.du ˈjak ||
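A TTS front end often needs the phonemized string broken into phrases and syllables. The sketch below post-processes a string shaped like the output above. It assumes that pause markers appear as separate space-delimited tokens (`|` and `||`), that syllables are joined by `.`, and that stress is marked with `ˈ`, as in the sample. The helpers `split_phrases` and `syllables` are hypothetical names, not part of ckb_g2p.

```python
STRESS = "\u02c8"  # IPA primary stress mark (ˈ)
PAUSES = ("|", "||")


def split_phrases(ipa):
    """Split an IPA string at pause markers.

    Returns a list of (phrase, pause) pairs, where pause is "|", "||",
    or "" for trailing text that is not followed by a marker.
    """
    phrases = []
    current = []
    for token in ipa.split():
        if token in PAUSES:
            phrases.append((" ".join(current), token))
            current = []
        else:
            current.append(token)
    if current:
        phrases.append((" ".join(current), ""))
    return phrases


def syllables(word):
    """Split one phonemized word into syllables, dropping the stress mark."""
    return [syl.lstrip(STRESS) for syl in word.split(".")]


sample = "ˈna.t̪͡ʃ̟uː bo̞ bä.ˈzäɾ | la sä.ˈɫiː ||"
for phrase, pause in split_phrases(sample):
    words = [syllables(word) for word in phrase.split()]
    print(pause or "(none)", words)
```

Keeping the pause marker next to its phrase lets a downstream model map `|` and `||` to different silence durations.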

Configuration Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| use_stress | bool | False | Adds ˈ to the stressed syllable. Negative verbs take stress on the initial syllable. |
| use_pause_markers | bool | True | Converts punctuation to IPA pause boundaries (`\|` and `\|\|`). |
| normalize | bool | True | Uses ckb-textify to convert numbers, symbols, and Latin text before G2P. |

🗣️ Phoneme Set

To ensure high-quality audio generation, we use precise IPA notation to distinguish allophones:

| Grapheme | Sound Type | IPA | Description |
| --- | --- | --- | --- |
| چ | Standard | t̪͡ʃ̟ | Light / Dental: tongue touches teeth. |
| ج | Standard | d̪͡ʒ̟ | Light / Dental: tongue touches teeth. |
| ک | Palatalized | t͡ʃ | Heavy / Postalveolar: like English "chair" (before i, e, y). |
| گ | Palatalized | d͡ʒ | Heavy / Postalveolar: like English "Jack" (before i, e, y). |
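The context rule behind this table can be sketched as a lookup on the following grapheme. This is a simplified illustration, not the library's implementation. It assumes that ی and ێ are the graphemes standing for i/y and e, and that ک and گ are otherwise plain /k/ and /g/ (as in kuɾ.dɪs.tän and gɪ.ɾɪft above). The function name `consonant_ipa` is hypothetical.

```python
# Code points are written as escapes so the letters cannot be confused
# with look-alike Arabic characters.
KAF, GAF = "\u06a9", "\u06af"    # ک and گ
CHE, JIM = "\u0686", "\u062c"    # چ and ج
YEH, E_YEH = "\u06cc", "\u06ce"  # ی and ێ (assumed front-vowel triggers)

LIGHT = {CHE: "t̪͡ʃ̟", JIM: "d̪͡ʒ̟"}  # standard, dental affricates
HEAVY = {KAF: "t͡ʃ", GAF: "d͡ʒ"}    # palatalized, postalveolar affricates
PLAIN = {KAF: "k", GAF: "g"}       # assumed default outside the front context
FRONT = {YEH, E_YEH}


def consonant_ipa(grapheme, next_grapheme=None):
    """Pick the IPA symbol for one of the four graphemes in the table.

    Hypothetical helper: چ / ج always map to the light affricates, while
    ک / گ map to the heavy affricates only before a front grapheme.
    """
    if grapheme in LIGHT:
        return LIGHT[grapheme]
    if grapheme in HEAVY:
        table = HEAVY if next_grapheme in FRONT else PLAIN
        return table[grapheme]
    raise ValueError(f"unsupported grapheme: {grapheme!r}")


print(consonant_ipa(KAF, E_YEH))     # heavy affricate, as in کێوار
print(consonant_ipa(CHE, E_YEH))     # light affricate, as in چێوار
print(consonant_ipa(KAF, "\u0648"))  # plain stop before و, as in کوردستان
```

A one-grapheme lookahead is the smallest context that reproduces the examples in this README; the library may use a wider context.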

🤝 Contributing

Contributions are welcome! Whether it's fixing a bug, improving phonological rules, or adding documentation, please feel free to submit a Pull Request on GitHub.

  1. Fork the project.
  2. Create your feature branch (git checkout -b feature/AmazingFeature).
  3. Commit your changes (git commit -m 'Add some AmazingFeature').
  4. Push to the branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

👨‍💻 Author

Developed by Razwan M. Haji.

Special thanks to the open-source community and the contributors of ckb-textify, eng-to-ipa, and anyascii.
