kokorog2p

A unified multi-language G2P (Grapheme-to-Phoneme) library for Kokoro TTS.

kokorog2p converts text to phonemes optimized for the Kokoro text-to-speech system. It provides:

  • Multi-language support: English (US/GB), German, French, Czech, Chinese, Japanese
  • Dictionary-based lookup with comprehensive lexicons
    • English: 100k+ entries (gold/silver tiers)
    • German: 738k+ entries from Olaph/IPA-Dict
    • French: Gold-tier dictionary
    • Czech, Chinese, Japanese: Rule-based and specialized engines
  • espeak-ng integration as a fallback for out-of-vocabulary words
  • Automatic IPA to Kokoro phoneme conversion
  • Number and currency handling for supported languages
  • Stress assignment based on linguistic rules
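The lookup order listed above (lexicon first, then a fallback for out-of-vocabulary words) can be sketched roughly as follows. This is a minimal illustration, not kokorog2p's internal code: `TOY_LEXICON`, `placeholder_fallback`, and `lookup` are hypothetical names, and the fallback only stands in for a real backend such as espeak-ng.

```python
from typing import Callable, Mapping

# Hypothetical toy lexicon; the real dictionaries hold 100k+ entries.
TOY_LEXICON: Mapping[str, str] = {
    "hello": "həlˈoʊ",
    "world": "wˈɜːld",
}


def placeholder_fallback(word: str) -> str:
    """Stand-in for a real fallback backend such as espeak-ng.

    It only marks the word as unresolved so the control flow stays visible.
    """
    return f"<oov:{word}>"


def lookup(
    word: str,
    lexicon: Mapping[str, str] = TOY_LEXICON,
    fallback: Callable[[str], str] = placeholder_fallback,
) -> str:
    """Return the lexicon entry if present, otherwise defer to the fallback."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    return fallback(word)


print(lookup("Hello"))   # found in the toy lexicon
print(lookup("kokoro"))  # out of vocabulary, handled by the fallback
```

Passing the fallback as a callable keeps the lexicon logic independent of any particular backend, which is one plausible way to make the fallback optional.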

Installation

# Core package (no dependencies)
pip install kokorog2p

# With English support
pip install "kokorog2p[en]"

# With German support
pip install "kokorog2p[de]"

# With French support
pip install "kokorog2p[fr]"

# With espeak-ng backend
pip install "kokorog2p[espeak]"

# Full installation (all languages and backends)
# The quotes keep shells such as zsh from treating the brackets as a glob pattern.
pip install "kokorog2p[all]"
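After installing, it can be useful to confirm which version ended up in the active environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper, not part of kokorog2p.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("kokorog2p")
print(f"kokorog2p: {version or 'not installed'}")
```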

Quick Start

from kokorog2p import phonemize

# English (US)
phonemes = phonemize("Hello world!", language="en-us")
print(phonemes)  # həlˈoʊ wˈɜːld!

# British English
phonemes = phonemize("Hello world!", language="en-gb")
print(phonemes)  # həlˈəʊ wˈɜːld!

# German
phonemes = phonemize("Guten Tag!", language="de")
print(phonemes)  # ɡuːtn̩ taːk!

# French
phonemes = phonemize("Bonjour!", language="fr")
print(phonemes)

# Chinese
phonemes = phonemize("你好", language="zh")
print(phonemes)

Advanced Usage

from kokorog2p import get_g2p

# English with custom settings
g2p_en = get_g2p("en-us", use_espeak_fallback=True)
tokens = g2p_en("The quick brown fox jumps over the lazy dog.")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

# German with lexicon and number handling
g2p_de = get_g2p("de")
tokens = g2p_de("Es kostet 42 Euro.")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

# French with fallback support
g2p_fr = get_g2p("fr", use_espeak_fallback=True)
tokens = g2p_fr("C'est magnifique!")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")
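The loops above print one token at a time. To collect the same information as a list of strings, a small helper along these lines may be convenient. It is a sketch that relies only on the `text` and `phonemes` attributes used above; `format_tokens` is a hypothetical name, and the assumption that some tokens (for example punctuation) may carry no phonemes is mine, not confirmed by the source. The demo uses stand-in objects so it runs without kokorog2p installed.

```python
from types import SimpleNamespace
from typing import Iterable, List


def format_tokens(tokens: Iterable, separator: str = " -> ") -> List[str]:
    """Render each token as 'text -> phonemes', skipping tokens without phonemes."""
    lines = []
    for token in tokens:
        phonemes = getattr(token, "phonemes", None)
        if not phonemes:
            continue  # assumed case: a token with no phoneme string
        lines.append(f"{token.text}{separator}{phonemes}")
    return lines


# Stand-in tokens; real tokens would come from a callable returned by get_g2p.
demo = [
    SimpleNamespace(text="Hello", phonemes="həlˈoʊ"),
    SimpleNamespace(text="!", phonemes=None),
]
print("\n".join(format_tokens(demo)))
```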

Supported Languages

| Language | Code | Dictionary Size | Number Support | Status |
|---|---|---|---|---|
| English (US) | en-us | 100k+ entries | Yes | Production |
| English (GB) | en-gb | 100k+ entries | Yes | Production |
| German | de | 738k+ entries | Yes | Production |
| French | fr | Gold dictionary | Yes | Production |
| Czech | cs | Rule-based | - | Production |
| Chinese | zh | pypinyin | - | Production |
| Japanese | ja | pyopenjtalk | - | Production |
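When language codes come from user input or configuration, it can help to validate them against the codes listed above before calling the library. The following is a minimal sketch: `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical names, and whether kokorog2p itself accepts variants such as `EN_US` is not stated in the source.

```python
# Codes and names copied from the Supported Languages listing.
SUPPORTED_LANGUAGES = {
    "en-us": "English (US)",
    "en-gb": "English (GB)",
    "de": "German",
    "fr": "French",
    "cs": "Czech",
    "zh": "Chinese",
    "ja": "Japanese",
}


def normalize_language(code: str) -> str:
    """Lower-case the code, accept '_' as a separator, and reject unknown codes."""
    normalized = code.strip().lower().replace("_", "-")
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {known}")
    return normalized


print(normalize_language("EN_US"))  # en-us
```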

Phoneme Inventory

kokorog2p uses Kokoro's 45-phoneme vocabulary:

Vowels (US)

  • Monophthongs: æ ɑ ə ɚ ɛ ɪ i ʊ u ʌ ɔ
  • Diphthongs: aɪ aʊ eɪ oʊ ɔɪ

Consonants

  • Stops: p b t d k ɡ
  • Fricatives: f v θ ð s z ʃ ʒ h
  • Affricates: tʃ dʒ
  • Nasals: m n ŋ
  • Liquids: l ɹ
  • Glides: w j

Suprasegmentals

  • Primary stress: ˈ
  • Secondary stress: ˌ
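A quick way to spot unexpected symbols in a phoneme string is to compare its characters against the groups listed above. The sketch below is a character-level check only, so it cannot tell whether a diphthong or affricate is well formed, and `unlisted_chars` is a hypothetical helper rather than part of kokorog2p. Note that the Quick Start output contains symbols such as `ɜ` and the length mark `ː` that do not appear in the groups above, so the listing may not be exhaustive.

```python
import unicodedata
from typing import Set

# Symbols copied from the inventory listed above, grouped as in the text.
LISTED_SYMBOLS = (
    "æ ɑ ə ɚ ɛ ɪ i ʊ u ʌ ɔ "  # monophthongs
    "aɪ aʊ eɪ oʊ ɔɪ "          # diphthongs
    "p b t d k ɡ "             # stops
    "f v θ ð s z ʃ ʒ h "       # fricatives
    "tʃ dʒ "                   # affricates
    "m n ŋ l ɹ w j "           # nasals, liquids, glides
    "ˈ ˌ"                      # primary and secondary stress
)
LISTED_CHARS: Set[str] = set(LISTED_SYMBOLS.replace(" ", ""))


def unlisted_chars(phonemes: str) -> Set[str]:
    """Return characters that are not listed above, ignoring whitespace and punctuation."""
    return {
        ch
        for ch in phonemes
        if ch not in LISTED_CHARS
        and not ch.isspace()
        and not unicodedata.category(ch).startswith("P")
    }


print(unlisted_chars("ðə kwˈɪk"))  # every character is in the listed groups
print(unlisted_chars("wˈɜːld!"))   # reports the symbols missing from the listing
```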

License

Apache License 2.0 - see LICENSE for details.

Credits

kokorog2p consolidates functionality from:
