Pure-Python, dependency-free reimplementation of espeak-ng's grapheme-to-phoneme (G2P) engine

espyak

A pure-Python reimplementation of espeak-ng's grapheme-to-phoneme (G2P) front-end. Text → phonemes only: no synthesis, no audio, no C extension, no runtime dependencies.

Reproduces the espeak-ng binary (pinned to 1.52.0) byte-for-byte on espyak's two test sets: a per-language headword sweep (1703/1703, 86 languages) and a real-sentence corpus (438/438, 31 languages). 117 languages are bundled. Inputs outside those sets are not all covered yet; see Coverage.

from espyak import G2P

g2p = G2P("en")
g2p.phonemize("hello world")              # 'həlˈəʊ wˈɜːld'   (IPA)
g2p.phonemize("hello world", ipa=False)   # "h@l'oU w'3:ld"   (Kirshenbaum / -x)

G2P("es").phonemize("buenos días")        # 'bwˈenos dˈias'
G2P("de").phonemize("straße")             # 'ʃtɾˈɑːsə'
G2P("ru").phonemize("привет")             # 'prʲivʲˈet'

Why

espyak gives projects espeak-ng's phonemes without the native dependency: nothing to shell out to, no C extension to build, and rules that are readable and patchable in Python. It works as a drop-in backend for phoonnx.

Install

pip install -e .          # from a clone (the espeak-ng source data is bundled, ~44 MB)
# or:  uv pip install -e .

Python ≥ 3.9. The espeak-ng dictsource/, phsource/, and lang/ data are bundled under espyak/data/ at the pinned 1.52.0 tag, so nothing is needed system-wide.
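To confirm that an install picked up the bundled data, a check along these lines could be run. This is a minimal sketch: `missing_data_dirs` is a hypothetical helper, not part of espyak, and it assumes only the `espyak/data/` layout with `dictsource/`, `phsource/` and `lang/` described above.

```python
from pathlib import Path
from typing import List

# Subdirectories the README says are bundled under espyak/data/.
REQUIRED = ("dictsource", "phsource", "lang")


def missing_data_dirs(data_root: Path) -> List[str]:
    """Return the names of required data subdirectories absent from data_root."""
    return [name for name in REQUIRED if not (data_root / name).is_dir()]
```

With espyak installed, `data_root` would presumably be `Path(espyak.__file__).parent / "data"`; an empty list means all three trees are present.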

Usage

Python

from espyak import G2P

g2p = G2P("en")                      # one translator per language — construct once, reuse

g2p.phonemize("read")                # 'ɹˈiːd'
g2p.phonemize("2024 dogs")           # numbers expand to words, then phonemes

g2p.phonemize("cat", ipa=True)       # 'kˈat'        — Unicode IPA (default)
g2p.phonemize("cat", ipa=False)      # "k'at"        — Kirshenbaum ASCII (espeak -x)
g2p.phonemize("cat", separator="_")  # 'k_ˈa_t'      — separate phonemes
g2p.phonemize("cat", tie="͡")         # tie multi-char phoneme names
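Because a G2P is meant to be constructed once per language and then reused, an application that handles several languages might keep one instance per language code. `per_language` below is a hypothetical helper, not part of espyak; it only wraps a one-argument constructor in `functools.lru_cache`.

```python
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def per_language(factory: Callable[[str], T]) -> Callable[[str], T]:
    """Wrap a one-argument constructor so each language code is built only once."""

    @lru_cache(maxsize=None)
    def build(lang: str) -> T:
        # First call for a code runs the constructor; later calls hit the cache.
        return factory(lang)

    return build
```

With espyak installed, `get_g2p = per_language(G2P)` followed by `get_g2p("en").phonemize("cat")` would build the English translator on first use only. `lru_cache` does not hold a lock while the wrapped function runs, so two threads that miss at the same moment could both build an instance; whether a G2P is safe to share across threads is not stated here.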

Command line

espyak -v en "hello world"           # həlˈəʊ wˈɜːld
espyak -v es "díganme"               # dˈiɣanme
espyak -v fr -x "bonjour"            # bO~Z'ur    (Kirshenbaum)
espyak -v de --sep _ "haus"          # h_ˈaʊ_s
echo "привет" | espyak -v ru -       # read from stdin

Output formats

| API argument | CLI flag | Effect |
|---|---|---|
| (default) | `--ipa` | Unicode IPA with `ˈ`/`ˌ` stress |
| `ipa=False` | `-x` | Kirshenbaum ASCII |
| `separator="_"` | `--sep=_` | insert a separator between phonemes |
| `tie="͡"` | `--tie` | tie character within multi-character phoneme names |

G2P(lang).phonemize(text, ipa=True, tie=None, separator=None) covers normal use; docs/usage.md has the details and also documents render(), which renders raw phoneme strings.
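The separator and tie options can be pictured as a join over phoneme names. The sketch below is not espyak's render(): `render_sketch` is hypothetical, it assumes an input of (stress mark, phoneme name) pairs, and it puts the tie between every code point of a multi-character name, which would misplace it around combining diacritics.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical input shape: (stress_mark, phoneme_name), e.g. ("ˈ", "aʊ").
Phoneme = Tuple[str, str]


def render_sketch(phonemes: Sequence[Phoneme],
                  separator: Optional[str] = None,
                  tie: Optional[str] = None) -> str:
    parts = []
    for stress, name in phonemes:
        if tie and len(name) > 1:
            # Join the characters of a multi-character name with the tie, e.g. a͡ʊ.
            name = tie.join(name)
        # The stress mark stays attached to its phoneme, as in 'k_ˈa_t'.
        parts.append(stress + name)
    # Without a separator the phonemes are simply concatenated.
    return (separator or "").join(parts)


haus = [("", "h"), ("ˈ", "aʊ"), ("", "s")]
print(render_sketch(haus, separator="_"))  # h_ˈaʊ_s
print(render_sketch(haus, tie="\u0361"))   # hˈa͡ʊs
```

The first call reproduces the `--sep _` output shown for "haus" in the command-line examples.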

How it works

espyak parses espeak-ng's own source data at load time and replays its pipeline in Python:

text → dictionary _list lookup → prefix/suffix retranslation → letter-to-sound rules
     → SetWordStress → phoneme programs (ChangePhoneme/InsertPhoneme) → render (IPA / -x)

Fidelity is inherited from the bundled data; the matcher, stress, number, and phoneme-program logic are re-implemented to match the binary, espeak-ng's quirks included. docs/architecture.md has the module map and pipeline.
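The order of the early stages can be illustrated with a toy: look the word up in a lexicon, otherwise fall back to letter-to-sound rules, then place a stress mark. Everything here is hypothetical and far simpler than espeak-ng: the greedy longest-match rules, the tiny vowel set and the fixed first-vowel stress stand in for espeak-ng's context-sensitive rule matching (MatchRule) and its language-specific SetWordStress.

```python
from typing import Dict, List

# Toy vowel inventory used only to decide where the stress mark goes.
VOWELS = {"a", "e", "i", "ɪ", "ə", "əʊ"}


def letter_to_sound(word: str, rules: Dict[str, str]) -> List[str]:
    """Greedy longest-match grapheme -> phoneme conversion (toy)."""
    graphemes = sorted(rules, key=len, reverse=True)  # try "sh" before "s"
    phonemes, i = [], 0
    while i < len(word):
        for g in graphemes:
            if word.startswith(g, i):
                phonemes.append(rules[g])
                i += len(g)
                break
        else:
            i += 1  # no rule for this letter: skip it
    return phonemes


def mark_stress(phonemes: List[str], stressed_vowel: int = 0) -> str:
    """Insert 'ˈ' directly before the n-th vowel, where the examples above put it ('kˈat')."""
    out, seen = [], 0
    for ph in phonemes:
        if ph in VOWELS:
            if seen == stressed_vowel:
                out.append("ˈ")
            seen += 1
        out.append(ph)
    return "".join(out)


def translate_word(word: str, lexicon: Dict[str, str], rules: Dict[str, str]) -> str:
    if word in lexicon:  # dictionary hit: use the stored pronunciation as-is
        return lexicon[word]
    return mark_stress(letter_to_sound(word, rules))
```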

Verification

pytest -q                            # unit + fixture tests
python test/sweep.py 25              # per-language _list-headword sweep vs the oracle
python test/corpus_sweep.py          # real-sentence corpus vs the oracle

The reference ("oracle") is a pinned espeak-ng 1.52.0 build, used only to generate expected outputs — espyak never calls it at runtime. Every dictionary *_list headword is a free test case; test/report.md holds the per-language pass rate.
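A per-language pass rate against oracle outputs reduces to exact string comparison. The sketch below is hypothetical and is not test/sweep.py or test/corpus_sweep.py; the fixture shape `{lang: [(input, expected), ...]}` is an assumption, and `phonemize` is any callable taking a language code and a text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fixture shape: {lang: [(input_text, expected_oracle_output), ...]}
Fixtures = Dict[str, List[Tuple[str, str]]]


def pass_rates(fixtures: Fixtures,
               phonemize: Callable[[str, str], str]) -> Dict[str, Tuple[int, int]]:
    """Return {lang: (passed, total)}, counting only exact string matches."""
    report = {}
    for lang, cases in fixtures.items():
        # Exact equality: no Unicode normalization or whitespace trimming,
        # so a single differing code point counts as a failure.
        passed = sum(phonemize(lang, text) == expected for text, expected in cases)
        report[lang] = (passed, len(cases))
    return report
```

With espyak installed, `phonemize` would wrap `G2P(lang).phonemize(text)`, ideally reusing one G2P per language as the Usage section recommends.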

Coverage

The headword sweep samples the first N alphabetic headwords of length ≥ 3 per language (1703 words at N=25); on that set and on the real-sentence corpus, espyak reproduces espeak-ng exactly. Inputs outside those sets can still differ, for example isolated accented letters spoken by name (á → "a acute"), bare ordinal suffixes (th, nd), Unicode code-point names (U+5c1), and some uncommon words. To exercise more of the dictionary, raise N in test/sweep.py or widen its word filter.
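The sampling rule can be sketched as follows. `sample_headwords` is hypothetical and assumes a simplified view of a *_list file: `//` starts a comment and the first whitespace-separated token is the headword. Lines that begin with a condition marker such as `?3`, or with a parenthesised multi-word phrase, fail the alphabetic test and are skipped; test/sweep.py may treat them differently.

```python
from typing import List


def sample_headwords(list_text: str, n: int = 25) -> List[str]:
    """Return the first n distinct alphabetic headwords of length >= 3."""
    seen = set()
    picked: List[str] = []
    for raw in list_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        head = line.split()[0]
        # str.isalpha() accepts letters from any script but rejects digits,
        # punctuation and combining marks.
        if head.isalpha() and len(head) >= 3 and head not in seen:
            seen.add(head)
            picked.append(head)
            if len(picked) == n:
                break
    return picked
```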

Project layout

espyak/            the engine (one module per espeak-ng translation unit)
  api.py           public G2P entry point
  dictionary.py    MatchRule / TranslateRules / SetWordStress / LookupDict2
  rule_compiler.py compiledict.c — rule byte encoding + groups
  phoneme_tab.py   phsource loader
  phoneme_program.py ChangePhoneme/InsertPhoneme
  language_data.py per-language translator config (tr_languages.c + voice files)
  numbers.py       TranslateNumber + ordinals/fractions
  render.py        phoneme list → IPA / Kirshenbaum / stress / tie / separator
  data/            bundled espeak-ng dictsource/ phsource/ lang/ @ 1.52.0
docs/              architecture, usage
examples/          runnable usage examples
test/              unit tests, oracle fixtures, sweep + corpus harnesses

Provenance

espyak is an AI-assisted port. The Python was written by an AI coding assistant that read and instrumented espeak-ng's C source; human review has been minimal. It is not an independent clean-room implementation.

License

espyak is licensed GPL-3.0-or-later, the same license as espeak-ng, from which it is derived and whose data it bundles under espyak/data/. See LICENSE.

Project details

This version

0.0.2a1 pre-release

Jun 19, 2026

Download files

Source Distribution

espyak-0.0.2a1.tar.gz (11.8 MB)

Uploaded Jun 19, 2026 Source

Built Distribution

espyak-0.0.2a1-py3-none-any.whl (12.1 MB)

Uploaded Jun 19, 2026 Python 3

File details

Details for the file espyak-0.0.2a1.tar.gz.

File metadata

Download URL: espyak-0.0.2a1.tar.gz
Upload date: Jun 19, 2026
Size: 11.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for espyak-0.0.2a1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9ff8bf27580570a8b5f8d377d21db23cb4ededa374bf000dc2929c8968715d60` |
| MD5 | `0135be7736032b266317a00d19ac8729` |
| BLAKE2b-256 | `d4e420a8dc1205bdb284399920c3b923bffde9812f11ef4f44a44fa656c20374` |

File details

Details for the file espyak-0.0.2a1-py3-none-any.whl.

File metadata

Download URL: espyak-0.0.2a1-py3-none-any.whl
Upload date: Jun 19, 2026
Size: 12.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for espyak-0.0.2a1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `cd4fb2218dd5a6cb3e0c989cf7a7e325889af985c4f2471a5ee59824a6277678` |
| MD5 | `ce51c7523f2ecb29900293b8301d5002` |
| BLAKE2b-256 | `720f094b0cec622f150a62ab473eedfe2780cb0218e2343863d4799334d11503` |
