kokorog2p
A unified multi-language G2P (Grapheme-to-Phoneme) library for Kokoro TTS.
kokorog2p converts text to phonemes optimized for the Kokoro text-to-speech system. It provides:
- Multi-language support: English (US/GB), German, French, Czech, Chinese, Japanese
- Dictionary-based lookup with comprehensive lexicons
  - English: 100k+ entries (gold/silver tiers)
  - German: 738k+ entries from Olaph/IPA-Dict
  - French: Gold-tier dictionary
  - Czech, Chinese, Japanese: Rule-based and specialized engines
- espeak-ng integration as a fallback for out-of-vocabulary words
- Automatic IPA to Kokoro phoneme conversion
- Number and currency handling for supported languages
- Stress assignment based on linguistic rules
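The dictionary-first, fallback-second behaviour listed above can be pictured with a small sketch. This is not kokorog2p's internal code: `TOY_LEXICON`, `lookup_with_fallback`, and `fake_fallback` are hypothetical names invented for illustration, and the fallback here is a stand-in rather than a real espeak-ng call.

```python
from typing import Callable, Mapping, Optional

# Hypothetical toy lexicon; the real dictionaries are far larger.
TOY_LEXICON = {
    "hello": "həlˈoʊ",
    "world": "wˈɜːld",
}


def lookup_with_fallback(
    word: str,
    lexicon: Mapping[str, str],
    fallback: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return the lexicon entry for a word, else the fallback result, else None."""
    entry = lexicon.get(word.lower())
    if entry is not None:
        return entry
    if fallback is not None:
        # Out-of-vocabulary word: delegate to the fallback backend.
        return fallback(word)
    return None


def fake_fallback(word: str) -> str:
    """Stand-in for an espeak-ng call; it only tags the word as unresolved."""
    return f"<oov:{word}>"


print(lookup_with_fallback("Hello", TOY_LEXICON, fake_fallback))
print(lookup_with_fallback("kokoro", TOY_LEXICON, fake_fallback))
```

Passing the fallback as a callable keeps the lookup usable when no backend is installed: without one, unknown words simply come back as `None`.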
Installation
```bash
# Extras are quoted so that shells such as zsh do not expand the brackets.

# Core package (no dependencies)
pip install kokorog2p

# With English support
pip install "kokorog2p[en]"

# With German support
pip install "kokorog2p[de]"

# With French support
pip install "kokorog2p[fr]"

# With espeak-ng backend
pip install "kokorog2p[espeak]"

# Full installation (all languages and backends)
pip install "kokorog2p[all]"
```
Quick Start
```python
from kokorog2p import phonemize

# English (US)
phonemes = phonemize("Hello world!", language="en-us")
print(phonemes)  # həlˈoʊ wˈɜːld!

# British English
phonemes = phonemize("Hello world!", language="en-gb")
print(phonemes)  # həlˈəʊ wˈɜːld!

# German
phonemes = phonemize("Guten Tag!", language="de")
print(phonemes)  # ɡuːtn̩ taːk!

# French
phonemes = phonemize("Bonjour!", language="fr")
print(phonemes)

# Chinese
phonemes = phonemize("你好", language="zh")
print(phonemes)
```
Advanced Usage
```python
from kokorog2p import get_g2p

# English with custom settings
g2p_en = get_g2p("en-us", use_espeak_fallback=True)
tokens = g2p_en("The quick brown fox jumps over the lazy dog.")
for token in tokens:
    print(f"{token.text} → {token.phonemes}")

# German with lexicon and number handling
g2p_de = get_g2p("de")
tokens = g2p_de("Es kostet 42 Euro.")
for token in tokens:
    print(f"{token.text} → {token.phonemes}")

# French with fallback support
g2p_fr = get_g2p("fr", use_espeak_fallback=True)
tokens = g2p_fr("C'est magnifique!")
for token in tokens:
    print(f"{token.text} → {token.phonemes}")
```
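When a single phoneme string is needed rather than per-token output, the tokens can be joined by hand. The sketch below is an illustration only: `FakeToken` is a hypothetical stand-in that mirrors the two attributes used above (`text` and `phonemes`), and the assumption that some tokens may carry no phonemes is mine, not something the library documents here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeToken:
    """Hypothetical stand-in exposing only the two attributes used above."""

    text: str
    phonemes: Optional[str]


def join_phonemes(tokens: Iterable[FakeToken], sep: str = " ") -> str:
    """Concatenate token phonemes, skipping tokens that have none."""
    return sep.join(t.phonemes for t in tokens if t.phonemes)


tokens = [
    FakeToken("Hello", "həlˈoʊ"),
    FakeToken("world", "wˈɜːld"),
    FakeToken("\n", None),  # assumed case: a token without phonemes
]
print(join_phonemes(tokens))
```

Because `join_phonemes` only reads `.phonemes`, the same helper should work on any token objects that expose that attribute.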
Supported Languages
| Language | Code | Dictionary Size | Number Support | Status |
|---|---|---|---|---|
| English (US) | en-us | 100k+ entries | ✓ | Production |
| English (GB) | en-gb | 100k+ entries | ✓ | Production |
| German | de | 738k+ entries | ✓ | Production |
| French | fr | Gold dictionary | ✓ | Production |
| Czech | cs | Rule-based | - | Production |
| Chinese | zh | pypinyin | - | Production |
| Japanese | ja | pyopenjtalk | - | Production |
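For applications that accept language codes from users, it can help to validate them before calling the library. The following is a minimal sketch that copies the table above into a dictionary. `SUPPORTED_LANGUAGES`, `normalize_language`, and `has_number_support` are hypothetical helpers and not part of kokorog2p, and whether the library itself accepts variants such as `EN_US` is not stated here.

```python
# Language code -> (display name, number support), transcribed from the table.
SUPPORTED_LANGUAGES = {
    "en-us": ("English (US)", True),
    "en-gb": ("English (GB)", True),
    "de": ("German", True),
    "fr": ("French", True),
    "cs": ("Czech", False),
    "zh": ("Chinese", False),
    "ja": ("Japanese", False),
}


def normalize_language(code: str) -> str:
    """Lower-case a code, map underscores to hyphens, and check it is listed."""
    key = code.strip().lower().replace("_", "-")
    if key not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return key


def has_number_support(code: str) -> bool:
    """Report whether the table marks number handling for this language."""
    return SUPPORTED_LANGUAGES[normalize_language(code)][1]


print(normalize_language("EN_US"))
print(has_number_support("de"), has_number_support("cs"))
```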
Phoneme Inventory
kokorog2p uses Kokoro's 45-phoneme vocabulary:
Vowels (US)
- Monophthongs: æ ɑ ə ɚ ɛ ɪ i ʊ u ʌ ɔ
- Diphthongs: aɪ aʊ eɪ oʊ ɔɪ
Consonants
- Stops: p b t d k ɡ
- Fricatives: f v θ ð s z ʃ ʒ h
- Affricates: tʃ dʒ
- Nasals: m n ŋ
- Liquids: l ɹ
- Glides: w j
Suprasegmentals
- Primary stress: ˈ
- Secondary stress: ˌ
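One way to sanity-check a phoneme string against the symbols listed above is to segment it with a greedy longest-match scan, so that two-character symbols such as `tʃ` and `oʊ` are recognised before their single-character parts. This is a sketch under my own assumptions: `INVENTORY` contains only the symbols shown in this section, and `segment` is a hypothetical helper, not a kokorog2p function.

```python
from typing import List, Tuple

MONOPHTHONGS = "æ ɑ ə ɚ ɛ ɪ i ʊ u ʌ ɔ".split()
DIPHTHONGS = "aɪ aʊ eɪ oʊ ɔɪ".split()
CONSONANTS = "p b t d k ɡ f v θ ð s z ʃ ʒ h tʃ dʒ m n ŋ l ɹ w j".split()
STRESS = ["ˈ", "ˌ"]

INVENTORY = set(MONOPHTHONGS + DIPHTHONGS + CONSONANTS + STRESS)
MAX_LEN = max(len(symbol) for symbol in INVENTORY)


def segment(phonemes: str) -> Tuple[List[str], List[str]]:
    """Split a string into listed symbols and a list of unlisted characters."""
    known: List[str] = []
    unknown: List[str] = []
    i = 0
    while i < len(phonemes):
        if phonemes[i].isspace():
            i += 1
            continue
        # Try the longest candidate first so "tʃ" wins over "t" followed by "ʃ".
        for size in range(MAX_LEN, 0, -1):
            chunk = phonemes[i:i + size]
            if chunk in INVENTORY:
                known.append(chunk)
                i += len(chunk)
                break
        else:
            unknown.append(phonemes[i])
            i += 1
    return known, unknown


print(segment("tʃˈeɪs"))
print(segment("wˈɜːld"))
```

Greedy matching always merges `t` followed by `ʃ` into an affricate, which is a simplification. Characters outside the list above, such as `ɜ` and the length mark `ː`, are reported as unknown rather than rejected.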
License
Apache License 2.0 - see LICENSE for details.
Credits
kokorog2p consolidates functionality from:
- misaki - G2P engine for Kokoro TTS
- phonemizer - espeak-ng wrapper