fastkokoro
Lightweight OpenAI-compatible Kokoro TTS server powered by ONNX Runtime.
fastkokoro runs the 82M-parameter Kokoro text-to-speech model with low startup
overhead, fast local inference, and a small dependency footprint. It supports CPU
and GPU execution through ONNX Runtime providers, including CUDA, TensorRT, and
OpenVINO when the matching runtime package is installed. The default model is
NVIDIA's optimized ONNX export: nvidia/kokoro-82M-onnx-opt.
The NVIDIA repo's voices.bin uses a raw float32 layout. fastkokoro converts it
once into the .npz voice format expected by kokoro-onnx, so the default model
and voices both come from nvidia/kokoro-82M-onnx-opt.
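The one-time conversion described above might look roughly like the sketch below. It is an illustration, not fastkokoro's actual code: the function name `convert_raw_voices`, the idea that voices are stored back to back in the order given by an index such as voices.txt, and the per-voice shape `(510, 1, 256)` are all assumptions to verify against the real files.

```python
from pathlib import Path

import numpy as np


def convert_raw_voices(raw_path, names, out_path, style_shape=(510, 1, 256)):
    """Split a raw float32 blob into one named array per voice and save as .npz.

    Assumption: voices are stored contiguously, in the same order as `names`,
    and every voice has the same `style_shape`. `out_path` should end in .npz.
    """
    data = np.fromfile(raw_path, dtype=np.float32)
    per_voice = int(np.prod(style_shape))
    expected = per_voice * len(names)
    if data.size != expected:
        raise ValueError(f"expected {expected} floats, found {data.size}")
    # One row per voice, each reshaped to the assumed style shape.
    packs = data.reshape(len(names), *style_shape)
    np.savez(out_path, **{name: packs[i] for i, name in enumerate(names)})
    return Path(out_path)
```

The size check is there so that a wrong shape or a mismatched index fails loudly instead of silently producing shifted voices.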
Install
From a source checkout:
uv sync
From PyPI:
pip install fastkokoro
For GPU builds on platforms supported by onnxruntime-gpu:
uv sync --extra gpu
Run
uv run fastkokoro
The server starts on http://0.0.0.0:8880 by default.
Docker CPU:
docker build -f Dockerfile.cpu -t fastkokoro:cpu .
docker run -p 8880:8880 fastkokoro:cpu
Docker Hub CPU:
docker run -p 8880:8880 msgflux/fastkokoro:cpu
Docker GPU:
docker build -f Dockerfile.gpu -t fastkokoro:gpu .
docker run --gpus all -p 8880:8880 fastkokoro:gpu
Docker Hub GPU:
docker run --gpus all -p 8880:8880 msgflux/fastkokoro:gpu
Environment variables:
| Variable | Default |
|---|---|
| FASTKOKORO_HOST | 0.0.0.0 |
| FASTKOKORO_PORT | 8880 |
| FASTKOKORO_MODEL_REPO | nvidia/kokoro-82M-onnx-opt |
| FASTKOKORO_MODEL_FILE | kokoro-82m-v1.0.onnx |
| FASTKOKORO_MODEL_PATH | unset; downloads from Hugging Face |
| FASTKOKORO_VOICES_FILE | voices.bin |
| FASTKOKORO_VOICES_INDEX_FILE | voices.txt |
| FASTKOKORO_VOICES_PATH | unset; downloads and converts NVIDIA voices |
| FASTKOKORO_DEFAULT_VOICE | af_heart |
| FASTKOKORO_DEFAULT_LANG | en-us |
| FASTKOKORO_WARMUP | true |
| FASTKOKORO_WARMUP_TEXT | hello |
| FASTKOKORO_ONNX_PROVIDERS | CPUExecutionProvider |
| FASTKOKORO_ONNX_AUTO_PROVIDERS | false |
| FASTKOKORO_ONNX_INTRA_OP_NUM_THREADS | unset |
| FASTKOKORO_ONNX_INTER_OP_NUM_THREADS | unset |
FASTKOKORO_WARMUP=true runs a short synthesis during startup. The server takes
a little longer to become ready, but most of the one-time initialization cost is
paid at startup instead of on the first user request.
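Because warmup delays readiness, a client or deployment script may want to wait before sending traffic. The sketch below polls the /health endpoint shown in the API section. The helper name `wait_until_ready` is hypothetical, and it assumes /health answers with HTTP 200 once the server is usable; whether that happens before or after warmup finishes is not stated here.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url="http://localhost:8880/health", timeout=60.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1.0) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet, or returned an error
        time.sleep(interval)
    return False
```

Calling `wait_until_ready()` with no arguments targets the default address; the function returns `False` rather than raising so the caller can decide how to handle a timeout.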
ONNX Runtime Providers
fastkokoro creates the ONNX Runtime session directly, so provider selection is
explicit and predictable.
CPU:
FASTKOKORO_ONNX_PROVIDERS=CPUExecutionProvider uv run fastkokoro
CUDA with CPU fallback:
FASTKOKORO_ONNX_PROVIDERS=CUDAExecutionProvider,CPUExecutionProvider uv run fastkokoro
TensorRT with CUDA and CPU fallback:
FASTKOKORO_ONNX_PROVIDERS=TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider uv run fastkokoro
Intel/OpenVINO builds can use:
FASTKOKORO_ONNX_PROVIDERS=OpenVINOExecutionProvider,CPUExecutionProvider uv run fastkokoro
Set FASTKOKORO_ONNX_AUTO_PROVIDERS=true to pass every provider available in the
installed ONNX Runtime build to the session. Use this mostly for quick local
experiments; production deployments should pin an explicit provider order.
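The selection rules above can be sketched as a small pure function. This is not fastkokoro's implementation: `resolve_providers` is a hypothetical name, the accepted truthy strings are an assumption, and raising on an unavailable provider is one possible policy (the real server may skip or warn instead). In a real process, `available` would come from `onnxruntime.get_available_providers()` and the result would be passed as `providers=` to `onnxruntime.InferenceSession`.

```python
import os


def resolve_providers(available, env=None):
    """Return the ordered provider list implied by the FASTKOKORO_ONNX_* variables."""
    env = os.environ if env is None else env

    # Auto mode: hand every provider in the installed build to the session.
    auto = env.get("FASTKOKORO_ONNX_AUTO_PROVIDERS", "false").strip().lower()
    if auto in {"1", "true", "yes"}:  # assumed truthy spellings
        return list(available)

    # Explicit mode: keep the order the user wrote, since it is the fallback order.
    raw = env.get("FASTKOKORO_ONNX_PROVIDERS", "CPUExecutionProvider")
    requested = [name.strip() for name in raw.split(",") if name.strip()]
    missing = [name for name in requested if name not in available]
    if missing:
        raise ValueError(f"providers not available in this build: {missing}")
    return requested
```

Failing fast on a missing provider makes a misconfigured GPU deployment visible at startup instead of letting it fall back to CPU unnoticed.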
API
Health:
curl http://localhost:8880/health
Models:
curl http://localhost:8880/v1/models
The server exposes the local Kokoro model as kokoro. For client compatibility,
/v1/audio/speech also accepts tts-1 and gpt-4o-mini-tts as aliases, but
they are not listed by /v1/models because the server is not running OpenAI TTS
models.
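The alias behaviour can be pictured as a lookup that accepts three names but advertises only one. The names `LOCAL_MODEL`, `MODEL_ALIASES`, `resolve_model`, and `list_models` below are illustrative, not taken from the fastkokoro source, and the error raised for unknown models is an assumption.

```python
LOCAL_MODEL = "kokoro"

# Names accepted by /v1/audio/speech for client compatibility only.
MODEL_ALIASES = {"tts-1": LOCAL_MODEL, "gpt-4o-mini-tts": LOCAL_MODEL}


def resolve_model(requested):
    """Map a requested model name to the local model, rejecting unknown names."""
    if requested == LOCAL_MODEL:
        return LOCAL_MODEL
    try:
        return MODEL_ALIASES[requested]
    except KeyError:
        raise ValueError(f"unknown model: {requested!r}") from None


def list_models():
    """Models advertised by /v1/models: aliases are deliberately left out."""
    return [LOCAL_MODEL]
```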
Voices and Languages
The official Kokoro voice list maps voices to language codes. fastkokoro
accepts the Kokoro language code and common locale aliases, then validates that
the requested voice belongs to the resolved language.
| Language | Request lang values | Voices |
|---|---|---|
| American English | a, en-us, american | af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa |
| British English | b, en-gb, british | bf_alice, bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis |
| Japanese | j, ja, ja-jp | jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo |
| Mandarin Chinese | z, zh, zh-cn, mandarin | zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang |
| Spanish | e, es, es-es | ef_dora, em_alex, em_santa |
| French | f, fr, fr-fr | ff_siwis |
| Hindi | h, hi, hi-in | hf_alpha, hf_beta, hm_omega, hm_psi |
| Italian | i, it, it-it | if_sara, im_nicola |
| Brazilian Portuguese | p, pt, pt-br | pf_dora, pm_alex, pm_santa |
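A rough sketch of the resolve-then-validate step, built only from the table above. The names `LANG_ALIASES`, `resolve_lang`, and `validate_voice` are hypothetical, case-insensitive matching is an assumption, and the prefix check relies on the naming pattern visible in the table (every voice starts with its language code); a real implementation would more likely check against the actual voice list.

```python
# Kokoro language code -> accepted request values, copied from the table.
LANG_ALIASES = {
    "a": ("a", "en-us", "american"),
    "b": ("b", "en-gb", "british"),
    "j": ("j", "ja", "ja-jp"),
    "z": ("z", "zh", "zh-cn", "mandarin"),
    "e": ("e", "es", "es-es"),
    "f": ("f", "fr", "fr-fr"),
    "h": ("h", "hi", "hi-in"),
    "i": ("i", "it", "it-it"),
    "p": ("p", "pt", "pt-br"),
}

# Reverse index: any accepted value -> Kokoro language code.
_ALIAS_TO_CODE = {
    alias: code for code, aliases in LANG_ALIASES.items() for alias in aliases
}


def resolve_lang(lang):
    """Return the one-letter Kokoro code for a code or locale alias."""
    key = lang.strip().lower()
    if key not in _ALIAS_TO_CODE:
        raise ValueError(f"unsupported language: {lang!r}")
    return _ALIAS_TO_CODE[key]


def validate_voice(voice, lang):
    """Check that `voice` belongs to the language `lang` resolves to."""
    code = resolve_lang(lang)
    if not voice.startswith(code):  # shortcut based on the voice naming pattern
        raise ValueError(f"voice {voice!r} does not belong to language {lang!r}")
    return code
```

For example, `validate_voice("pf_dora", "pt-br")` resolves to `p` and passes, while `validate_voice("af_heart", "en-gb")` is rejected because the voice is American English.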
Speech:
curl http://localhost:8880/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro",
"input": "Hello from fastkokoro.",
"voice": "af_heart",
"response_format": "wav"
}' \
--output speech.wav
Streaming PCM:
curl http://localhost:8880/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro",
"input": "Streaming from fastkokoro.",
"voice": "af_heart",
"response_format": "pcm",
"stream": true
}' \
--output speech.pcm
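The saved speech.pcm has no header, so most players cannot open it directly. The sketch below wraps it in a WAV container with the standard-library `wave` module. The format parameters are assumptions, not documented behaviour of this server: 24 kHz, mono, 16-bit signed samples is the usual format for OpenAI-style `pcm` output and for Kokoro audio, but check it against what the server actually returns. `pcm_to_wav` is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, sample_rate=24000, channels=1, sample_width=2):
    """Wrap headerless PCM bytes in a WAV container; returns the frame count.

    Assumed defaults: 24 kHz, mono, 16-bit samples (sample_width is in bytes).
    """
    with open(pcm_path, "rb") as source:
        frames = source.read()
    with wave.open(str(wav_path), "wb") as target:
        target.setnchannels(channels)
        target.setsampwidth(sample_width)
        target.setframerate(sample_rate)
        target.writeframes(frames)
    return len(frames) // (channels * sample_width)
```

If the resulting file plays too fast or too slow, the assumed sample rate is the first thing to revisit.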
OpenAI SDK Examples
The examples use inline script dependencies, so they can run directly with uv
without adding the OpenAI SDK to the project environment.
Start fastkokoro first:
uv run fastkokoro
Save synthesized audio to a file:
uv run examples/tts_save_file.py
Consume streamed audio chunks:
uv run examples/tts_stream_chunks.py
Useful environment variables:
| Variable | Default |
|---|---|
| FASTKOKORO_BASE_URL | http://localhost:8880/v1 |
| FASTKOKORO_API_KEY | fastkokoro |
| FASTKOKORO_VOICE | pf_dora |
| FASTKOKORO_TEXT | Ola, tudo bem? |
| FASTKOKORO_TTS_OUTPUT | speech.wav |
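For orientation, a client driven by these variables could look like the sketch below. It is not the content of examples/tts_save_file.py, which is not reproduced here; `load_settings` and `save_speech` are hypothetical names, and the `wav` response format is chosen to match the default output file name. The OpenAI SDK is imported lazily, so it is only needed when `save_speech` is actually called.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "FASTKOKORO_BASE_URL": "http://localhost:8880/v1",
    "FASTKOKORO_API_KEY": "fastkokoro",
    "FASTKOKORO_VOICE": "pf_dora",
    "FASTKOKORO_TEXT": "Ola, tudo bem?",
    "FASTKOKORO_TTS_OUTPUT": "speech.wav",
}


def load_settings(env=None):
    """Read the example settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


def save_speech(settings):
    """Request speech from a running fastkokoro server and write it to a file."""
    from openai import OpenAI  # requires the `openai` package

    client = OpenAI(
        base_url=settings["FASTKOKORO_BASE_URL"],
        api_key=settings["FASTKOKORO_API_KEY"],
    )
    with client.audio.speech.with_streaming_response.create(
        model="kokoro",
        voice=settings["FASTKOKORO_VOICE"],
        input=settings["FASTKOKORO_TEXT"],
        response_format="wav",
    ) as response:
        response.stream_to_file(settings["FASTKOKORO_TTS_OUTPUT"])
```

With the server running, `save_speech(load_settings())` would write the synthesized audio to the configured output path.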
Python
from fastkokoro import FastKokoro

engine = FastKokoro()
audio = engine.create(
    "Hello from fastkokoro.",
    voice="af_heart",
    response_format="wav",
)