OVOS TTS plugin for Kokoro — 82M parameter multilingual TTS by hexgrad

ovos-tts-plugin-kokoro

Status: Proof of Concept

POC status — experimental, not for production, may be abandoned. No API stability promise.

OVOS TTS plugin for Kokoro, an 82M-parameter multilingual TTS model by hexgrad. It uses the same engine as VoiceMode, wired up for the standard OVOS voice assistant.

Install

pip install ovos-tts-plugin-kokoro

espeak-ng is required for the underlying G2P stack:

# Debian/Ubuntu
sudo apt-get install espeak-ng
# macOS
brew install espeak-ng

English voices also need spaCy's en_core_web_sm model. Misaki (the Kokoro G2P library) attempts to download it on first use but does not reload it in the same process, so you'll want to install it ahead of time:

python -m spacy download en_core_web_sm
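A quick preflight check can confirm both prerequisites before the first synthesis call. This is a minimal sketch, not part of the plugin: the function name check_kokoro_prereqs is hypothetical, and it assumes the espeak-ng binary is on PATH and that the spaCy model is installed as the importable package en_core_web_sm.

```python
import importlib.util
import shutil


def check_kokoro_prereqs() -> dict:
    """Report which prerequisites from the install notes are present.

    Hypothetical helper: it only inspects the local environment and
    never calls into the plugin, Kokoro, or Misaki.
    """
    return {
        # Assumption: the espeak-ng executable is discoverable on PATH.
        "espeak-ng": shutil.which("espeak-ng") is not None,
        # spaCy models install as regular importable Python packages.
        "en_core_web_sm": importlib.util.find_spec("en_core_web_sm") is not None,
    }


if __name__ == "__main__":
    for name, present in check_kokoro_prereqs().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

If either entry is reported missing, the install commands above should fix it.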

For Japanese or Chinese voices, install the optional G2P extras:

pip install "ovos-tts-plugin-kokoro[ja,zh]"

Linux: CPU-only torch (saves ~2GB)

On Linux, pip defaults to the CUDA torch wheel (~2.5GB). If you don't need GPU support, install torch from the CPU index first:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install ovos-tts-plugin-kokoro

On macOS, this is not needed — PyPI torch is already CPU-only (~60MB). With uv, torch automatically resolves to the CPU-only wheel via the tool.uv.sources block in pyproject.toml.
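To confirm which torch build ended up in the environment, something like the following may help. It is a sketch under the assumption that torch may or may not be installed; the helper name torch_build_info is hypothetical. On CPU-only wheels, torch.version.cuda is expected to be None.

```python
import importlib.util


def torch_build_info():
    """Return basic facts about the installed torch build, or None if absent.

    Hypothetical helper for checking the result of the install steps above.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,
        # None when the wheel was built without CUDA support.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_build_info())
```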

Configuration

{
  "tts": {
    "module": "ovos-tts-plugin-kokoro",
    "ovos-tts-plugin-kokoro": {
      "voice": "af_bella"
    }
  }
}

Voice options

Kokoro ships 56 built-in voices across 9 languages. The voice id encodes language + gender:

| Prefix | Language | Examples |
| --- | --- | --- |
| af_ | American English (F) | af_bella, af_heart, af_nicole |
| am_ | American English (M) | am_michael, am_onyx, am_eric |
| bf_ | British English (F) | bf_alice, bf_emma, bf_lily |
| bm_ | British English (M) | bm_george, bm_fable, bm_daniel |
| jf_ / jm_ | Japanese | jf_alpha, jm_kumo (needs [ja] extra) |
| zf_ / zm_ | Mandarin | zf_xiaoxiao, zm_yunjian (needs [zh] extra) |
| ef_ / em_ | Spanish | ef_dora, em_alex |
| ff_ | French (F) | ff_siwis |
| hf_ / hm_ | Hindi | hf_alpha, hm_omega |
| if_ / im_ | Italian | if_sara, im_nicola |
| pf_ / pm_ | Brazilian Portuguese | pf_dora, pm_alex |

The voice id determines which Kokoro language pipeline is used, regardless of the OVOS active language. Picking bm_george will speak through the British pipeline even if lang is en-US.
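The prefix convention can be illustrated with a small parser. This is a sketch based only on the table above, not the plugin's actual code: parse_voice_id and both lookup dicts are hypothetical names, and the assumption is that the first prefix letter is the language and the second is the gender.

```python
# Language letters as they appear in the voice prefix table (assumed complete).
VOICE_LANGS = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin",
}
GENDERS = {"f": "female", "m": "male"}


def parse_voice_id(voice: str) -> tuple:
    """Split a voice id such as 'bm_george' into (lang letter, gender, name)."""
    prefix, sep, name = voice.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"not a '<lang><gender>_<name>' voice id: {voice!r}")
    lang, gender = prefix[0], prefix[1]
    if lang not in VOICE_LANGS or gender not in GENDERS:
        raise ValueError(f"unknown voice prefix: {prefix!r}")
    return lang, GENDERS[gender], name


if __name__ == "__main__":
    lang, gender, name = parse_voice_id("bm_george")
    print(VOICE_LANGS[lang], gender, name)
```

Under this reading, bm_george resolves to the British pipeline from the voice id alone, whatever the OVOS lang is set to.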

See the full hexgrad/Kokoro-82M VOICES.md for samples.

Language support

The plugin maps the active OVOS language (BCP-47, e.g. fr-FR) to a Kokoro single-letter language code:

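| OVOS lang | Kokoro code | Language |
| --- | --- | --- |
| en / en-us | a | American English |
| en-gb | b | British English |
| es | e | Spanish |
| fr | f | French |
| hi | h | Hindi |
| it | i | Italian |
| ja | j | Japanese |
| pt / pt-br | p | Brazilian Portuguese |
| zh | z | Mandarin |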

Lookup tries the full BCP-47 tag first (e.g. en-gb), then the base subtag (e.g. en). Languages that match neither fall back to American English, with a log line. The voice id always wins over the language map: a voice prefixed bm_ always uses the British pipeline.
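The lookup order described above can be sketched as follows. This is an illustration, not the plugin's implementation: resolve_kokoro_code, DEFAULT_LANG_MAP, and the logger name are hypothetical, and the map simply mirrors the table in this section.

```python
import logging

log = logging.getLogger("kokoro_lang_sketch")  # hypothetical logger name

# Mirrors the OVOS lang -> Kokoro code table in this section.
DEFAULT_LANG_MAP = {
    "en": "a", "en-us": "a", "en-gb": "b",
    "es": "e", "fr": "f", "hi": "h", "it": "i",
    "ja": "j", "pt": "p", "pt-br": "p", "zh": "z",
}


def resolve_kokoro_code(lang: str, aliases=None) -> str:
    """Map a BCP-47 tag to a Kokoro language code.

    Order: full tag, then base subtag, then American English ('a').
    """
    table = dict(DEFAULT_LANG_MAP)
    # User-supplied aliases override or extend the defaults.
    table.update({k.lower(): v for k, v in (aliases or {}).items()})

    tag = lang.lower().replace("_", "-")
    if tag in table:
        return table[tag]
    base = tag.split("-", 1)[0]
    if base in table:
        return table[base]
    log.warning("No Kokoro language for %r, falling back to American English", lang)
    return "a"


if __name__ == "__main__":
    print(resolve_kokoro_code("fr-FR"))
    print(resolve_kokoro_code("en-AU", {"en": "b"}))
```

One consequence in this sketch: an alias for en changes the result for tags such as en-AU that have no entry of their own, but not for en-us, because the full tag is matched first.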

Override the language map

{
  "tts": {
    "module": "ovos-tts-plugin-kokoro",
    "ovos-tts-plugin-kokoro": {
      "voice": "af_bella",
      "speed": 1.0,
      "language_aliases": {
        "en": "b"
      },
      "preload_languages": ["en", "fr"]
    }
  }
}

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| voice | str | af_bella | Any built-in voice id (see table above). |
| speed | float | 1.0 | Playback speed multiplier passed to KPipeline. |
| sample_rate | int | 16000 | Output sample rate in Hz. Kokoro's native rate is 24000; the plugin resamples. |
| device | str or null | "cpu" | Torch device: "cpu", "cuda", "mps", or null to let Kokoro auto-select. |
| language_aliases | dict | {} | Override or extend the BCP-47 → Kokoro code map. |
| preload_languages | list[str] | [] | BCP-47 codes to load eagerly during plugin init instead of lazy-loading. |
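The keys and defaults above can be modelled as a small dataclass, which shows how a partial config block fills in the rest. This is a sketch only: KokoroConfig and from_dict are hypothetical names, and the plugin may well read its config differently.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class KokoroConfig:
    """Hypothetical mirror of the documented config keys and defaults."""

    voice: str = "af_bella"
    speed: float = 1.0
    sample_rate: int = 16000
    device: Optional[str] = "cpu"  # None lets Kokoro auto-select
    language_aliases: dict = field(default_factory=dict)
    preload_languages: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "KokoroConfig":
        # Keep only documented keys; everything else falls back to defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


if __name__ == "__main__":
    print(KokoroConfig.from_dict({"voice": "bm_george", "device": None}))
```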

Memory note: Each loaded language pipeline holds the 82M parameter model + a g2p stack. The plugin caches one pipeline per (language, device) pair, so leaving preload_languages empty and letting the cache warm on demand keeps the resident set small.
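The caching behaviour in the memory note can be sketched with a dictionary keyed by (language, device). This is an illustration under assumptions: PipelineCache is a hypothetical class, and the factory argument stands in for whatever actually builds a Kokoro pipeline.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


class PipelineCache:
    """Hypothetical lazy cache holding one pipeline per (lang_code, device)."""

    def __init__(self, factory: Callable[[str, Optional[str]], Any]):
        self._factory = factory  # stand-in for the real pipeline constructor
        self._pipelines: Dict[Tuple[str, Optional[str]], Any] = {}

    def get(self, lang_code: str, device: Optional[str]) -> Any:
        key = (lang_code, device)
        if key not in self._pipelines:
            # Built on first use only, so unused languages cost no memory.
            self._pipelines[key] = self._factory(lang_code, device)
        return self._pipelines[key]

    def preload(self, lang_codes: Iterable[str], device: Optional[str]) -> None:
        # Eager loading trades memory for lower first-utterance latency.
        for code in lang_codes:
            self.get(code, device)

    def __len__(self) -> int:
        return len(self._pipelines)


if __name__ == "__main__":
    cache = PipelineCache(lambda code, device: f"pipeline({code}, {device})")
    cache.get("a", "cpu")
    cache.get("a", "cpu")  # served from the cache, factory not called again
    print(len(cache))
```

With preload_languages left empty, only languages that are actually spoken would ever occupy an entry in such a cache.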

Apple Silicon note: Despite MPS being available on M-series Macs, CPU is the fastest device for Kokoro on Apple Silicon. The vocoder leans heavily on torch.stft/istft, which are weak spots on the Metal backend — measured RTF on an M3 Max was ~0.08 on CPU vs ~0.40 on MPS. The default of "cpu" is intentional; only set device to "cuda" if you actually have a discrete NVIDIA GPU.
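For readers unfamiliar with RTF (real-time factor), it is synthesis time divided by the duration of the audio produced, so lower is faster. A small worked example, using the figures quoted above and hypothetical helper names:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesising / duration of the resulting audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Expected wall-clock time to synthesise audio of the given length."""
    return rtf * audio_seconds


if __name__ == "__main__":
    for device, rtf in (("cpu", 0.08), ("mps", 0.40)):
        seconds = synthesis_time(rtf, 10.0)
        print(f"{device}: about {seconds:.1f} s to synthesise 10 s of speech")
```

At those rates, a 10-second utterance would take roughly 0.8 s on CPU and 4 s on MPS, a fivefold difference.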

License

Apache-2.0
