A comprehensive Text Normalization library for Central Kurdish (Sorani).

ckb-textify

ckb-textify is a Text Normalization and Transliteration library designed specifically for Central Kurdish (Sorani).

It transforms "messy" real-world text (mixed languages, symbols, math, codes) into clean, spoken-form Kurdish text, which makes it a suitable pre-processor for Text-to-Speech (TTS) and NLP models.

🚀 Live Demo

Try the library without installing anything: 👉 open the Live App (Streamlit).

📦 Installation

pip install ckb-textify

Dependencies:

  • eng-to-ipa: For accurate English pronunciation.
  • anyascii: For universal transliteration (Chinese, Russian, etc.).

⚡ Usage

from ckb_textify import convert_all

text = """
سڵاو، پەیوەندی بکە بە 07501234567.
نرخی زێڕ ≈ $2500.
کۆدەکە A1-B2 یە.
سڵاو لە Peter و Xi Jinping.
"""

normalized = convert_all(text)

print(normalized)

# Output:
# سڵاو, پەیوەندی بکە بە سفر حەوت سەد و پەنجا سەد و بیست و سێ چل و پێنج شەست و حەوت.
# نرخی زێڕ نزیکەی دوو هەزار و پێنج سەد دۆلار یە.
# کۆدەکە ئەی یەک داش بی دوو یە.
# سڵاو لە پیتەر و سی جینپینگ.

🌟 Features (v3.0.0)

1. 🌍 Universal Script Support

Transliterates text from most writing systems into Sorani script using a "Latin Bridge" technique: the text is first romanized, then the Latin form is mapped to Sorani letters.

  • Chinese: 你好 → نی هاو
  • Russian: Путин → پوتین
  • Greek: Χαίρετε → چایرێت
  • German/French: Handles accents and special letters (Straße → ستراسسە, République → ڕیپەبلیک).
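
The two-stage idea behind the "Latin Bridge" can be sketched as follows. This is a minimal illustration and not the library's code: both tables are hypothetical and cover only the Путин example, and the first table is a toy stand-in for a general-purpose romanizer (the README lists anyascii for that role).

```python
# Stage 1: toy stand-in for a universal romanizer (assumption: covers one word only).
CYRILLIC_TO_LATIN = {"п": "p", "у": "u", "т": "t", "и": "i", "н": "n"}

# Stage 2: hypothetical Latin-to-Sorani letter table for the same example.
LATIN_TO_SORANI = {"p": "پ", "u": "و", "t": "ت", "i": "ی", "n": "ن"}


def to_latin(text: str) -> str:
    """Romanize the input; characters missing from the table pass through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text.lower())


def latin_to_sorani(text: str) -> str:
    """Map each Latin letter to a Sorani letter; unknown characters pass through."""
    return "".join(LATIN_TO_SORANI.get(ch, ch) for ch in text)


def latin_bridge(text: str) -> str:
    """Source script -> Latin -> Sorani."""
    return latin_to_sorani(to_latin(text))


print(to_latin("Путин"))      # putin
print(latin_bridge("Путин"))  # پوتین
```

A real Latin-to-Sorani stage would need pronunciation rules rather than a one-letter-per-letter table, since vowels and digraphs rarely map one to one.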

2. ➗ Advanced Math & Science

  • Scientific Notation: 5e-23 → پێنج جارانی دە توانی سالب بیست و سێ
  • Functions: ln 4 → لۆگاریتمی سروشتی چوار
  • Context-Aware: Distinguishes Division (7/6) from Rates (km/h).
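
One way to tell a division from a rate is to look at what surrounds the slash. The sketch below is an assumption about how such a check could work, not the library's actual rule: `classify_slash` and both patterns are hypothetical names introduced for illustration.

```python
import re

# Numbers on both sides of the slash: treat as division.
DIVISION = re.compile(r"^\d+(?:\.\d+)?/\d+(?:\.\d+)?$")
# Letters on both sides, with an optional leading number: treat as a rate.
RATE = re.compile(r"^(?:\d+(?:\.\d+)?)?[A-Za-z]+/[A-Za-z]+$")


def classify_slash(token: str) -> str:
    """Return 'division', 'rate', or 'unknown' for a token containing a slash."""
    if DIVISION.match(token):
        return "division"
    if RATE.match(token):
        return "rate"
    return "unknown"


print(classify_slash("7/6"))      # division
print(classify_slash("km/h"))     # rate
print(classify_slash("120km/h"))  # rate
```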

3. 📞 Smart Phone Numbers

Handles Iraqi/Kurdish formats with intelligent grouping (4-3-2-2).

  • 07501234567 → سفر حەوت سەد و پەنجا...
  • +964... → کۆ نۆ سەد و شەست و چوار...
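
The 4-3-2-2 grouping can be sketched as a simple slicing step. This is an illustration under the assumption that a local number has exactly 11 digits; `group_phone` is a hypothetical helper, not part of the library's API. Each group would then be read as a number, with the leading zero spoken on its own, as in the example output above.

```python
from typing import List, Sequence


def group_phone(digits: str, pattern: Sequence[int] = (4, 3, 2, 2)) -> List[str]:
    """Split a digit string into consecutive groups of the given sizes."""
    if not digits.isdigit() or len(digits) != sum(pattern):
        raise ValueError(f"expected {sum(pattern)} digits, got {digits!r}")
    groups, start = [], 0
    for size in pattern:
        groups.append(digits[start:start + size])
        start += size
    return groups


print(group_phone("07501234567"))  # ['0750', '123', '45', '67']
```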

4. 🔡 English Transliteration (IPA)

Uses the International Phonetic Alphabet to pronounce English words correctly.

  • Phone → فۆن (Not "پھۆنە")
  • Google → گووگڵ
  • Acronyms: GPT → جی پی تی

5. 💻 Web & Technical

Reads technical strings character by character, while familiar parts such as gmail, com, and net are read as whole words.

  • Emails: info@gmail.com → ئای ئێن ئێف ئۆ ئەت جیمەیڵ دۆت کۆم
  • URLs: www.razwan.net → دابڵیو دابڵیو دابڵیو دۆت ئاڕ ئەی زێت یو ئەی ئێن دۆت نێت
  • Codes: A1-B2 → ئەی یەک داش بی دوو
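
Character-by-character reading amounts to a lookup per character. The sketch below uses only spoken forms that appear in the examples above; the `SPOKEN` table and `spell_out` are hypothetical and far smaller than what a real normalizer would need, and the sketch does not attempt the whole-word reading shown for gmail or com.

```python
# Spoken forms taken from the README examples (letters, digits, symbols).
SPOKEN = {
    "A": "ئەی", "B": "بی",
    "1": "یەک", "2": "دوو",
    "-": "داش", ".": "دۆت", "@": "ئەت",
}


def spell_out(code: str) -> str:
    """Read a code one character at a time; unknown characters are kept as-is."""
    return " ".join(SPOKEN.get(ch.upper(), ch) for ch in code)


print(spell_out("A1-B2"))  # ئەی یەک داش بی دوو
```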

6. 📏 Units & Measurements

Resolves the ambiguity between unit symbols and ordinary letters or nouns: a symbol is read as a unit only when it follows a number.

  • 10m → دە مەتر (Unit) vs m → m (Noun/Letter)
  • 120km/h → سەد و بیست کیلۆمەتر بۆ هەر کاتژمێرێک
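
The "unit only after a number" rule can be sketched with a single regular expression. This is an assumption about the approach, not the library's implementation: `UNIT_NAMES` holds just the two units from the examples above, and the numbers are left as digits because number verbalization is a separate step.

```python
import re

# Unit names copied from the README examples.
UNIT_NAMES = {
    "km/h": "کیلۆمەتر بۆ هەر کاتژمێرێک",
    "m": "مەتر",
}

# Try longer symbols first so "km/h" wins over "m".
_UNITS = "|".join(re.escape(u) for u in sorted(UNIT_NAMES, key=len, reverse=True))
# A symbol counts as a unit only right after a number and not before another letter.
UNIT_AFTER_NUMBER = re.compile(rf"(\d+(?:\.\d+)?)\s*({_UNITS})(?![A-Za-z])")


def expand_units(text: str) -> str:
    """Replace unit symbols that follow a number; leave bare letters untouched."""
    return UNIT_AFTER_NUMBER.sub(
        lambda m: f"{m.group(1)} {UNIT_NAMES[m.group(2)]}", text
    )


print(expand_units("10m"))      # 10 مەتر
print(expand_units("m"))        # m
print(expand_units("120km/h"))  # 120 کیلۆمەتر بۆ هەر کاتژمێرێک
```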

🎛️ Configuration

You can disable specific modules if needed:

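from ckb_textify import convert_all

# Turn off individual modules by setting their keys to False.
config = {
    "phone_numbers": False,  # Disable the phone number module
    "foreign": False,  # Disable Chinese/Russian transliteration
}

text = "سڵاو لە Peter و Xi Jinping."
normalized = convert_all(text, config=config)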
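
A partial config like the one above is typically merged over a set of defaults. The sketch below shows that pattern only; it is not taken from the library. `DEFAULT_CONFIG` and `resolve_config` are hypothetical, and the assumption that both modules are enabled by default is inferred from the wording "disable".

```python
from typing import Dict, Optional

# Hypothetical defaults: only the two keys shown above, assumed on by default.
DEFAULT_CONFIG: Dict[str, bool] = {"phone_numbers": True, "foreign": True}


def resolve_config(overrides: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Merge user overrides over the defaults and reject unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}


print(resolve_config({"foreign": False}))
# {'phone_numbers': True, 'foreign': False}
```

Rejecting unknown keys is a design choice in this sketch: it catches typos such as "phone_number" early instead of silently ignoring them.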

🤝 Contributing

Contributions are welcome! If you have ideas for new rules, have found a bug, or want to add support for more units, feel free to open a Pull Request.

  1. Fork the repository on GitHub.
  2. Clone your fork locally.
  3. Create a new branch for your feature (git checkout -b feature/amazing-feature).
  4. Run the tests to make sure everything works (python -m unittest discover tests).
  5. Commit your changes.
  6. Push to the branch and open a Pull Request.

📄 License

This project is licensed under the MIT License.
