fastkokoro

Lightweight OpenAI-compatible Kokoro TTS server powered by ONNX Runtime.

fastkokoro runs the 82M-parameter Kokoro text-to-speech model with low startup overhead, fast local inference, and a small dependency footprint. It supports CPU and GPU execution through ONNX Runtime providers, including CUDA, TensorRT, and OpenVINO when the matching runtime package is installed. The default model is NVIDIA's optimized ONNX export: nvidia/kokoro-82M-onnx-opt.

The NVIDIA repo's voices.bin uses a raw float32 layout. fastkokoro converts it once into the .npz voice format expected by kokoro-onnx, so the default model and voices both come from nvidia/kokoro-82M-onnx-opt.
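To make the conversion step more concrete, here is a minimal sketch of splitting a raw float32 buffer into named per-voice blocks. It assumes, hypothetically, that the voices are stored back to back as little-endian float32 values in the order given by the index file, and that every voice has the same number of values. The real layout of voices.bin is not described here, and the helper name split_voices is illustrative, not part of fastkokoro. Writing the blocks to an .npz archive (for example with numpy.savez) is left out so the sketch stays standard-library only.

```python
import struct


def split_voices(raw: bytes, names: list[str]) -> dict[str, tuple[float, ...]]:
    """Split a raw little-endian float32 buffer into one equal-size block per name.

    Assumption (not confirmed by the source): voices are stored back to back,
    in the same order as `names`, with the same number of floats each.
    """
    if not names:
        raise ValueError("at least one voice name is required")
    count, remainder = divmod(len(raw), 4)  # 4 bytes per float32
    if remainder:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    per_voice, leftover = divmod(count, len(names))
    if leftover:
        raise ValueError("buffer does not divide evenly across the voice names")
    values = struct.unpack(f"<{count}f", raw)
    return {
        name: values[index * per_voice:(index + 1) * per_voice]
        for index, name in enumerate(names)
    }
```

Calling split_voices(data, names) with the contents of the raw file and the list of names from the index file would return a dictionary keyed by voice name, which is the shape an .npz archive stores.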

Install

From a source checkout:

uv sync

From PyPI:

pip install fastkokoro

For GPU builds on platforms supported by onnxruntime-gpu:

uv sync --extra gpu

Run

uv run fastkokoro

The server starts on http://0.0.0.0:8880 by default.

Docker CPU:

docker build -f Dockerfile.cpu -t fastkokoro:cpu .
docker run -p 8880:8880 fastkokoro:cpu

Docker Hub CPU:

docker run -p 8880:8880 msgflux/fastkokoro:cpu

Docker GPU:

docker build -f Dockerfile.gpu -t fastkokoro:gpu .
docker run --gpus all -p 8880:8880 fastkokoro:gpu

Docker Hub GPU:

docker run --gpus all -p 8880:8880 msgflux/fastkokoro:gpu

Environment variables:

Variable	Default
`FASTKOKORO_HOST`	`0.0.0.0`
`FASTKOKORO_PORT`	`8880`
`FASTKOKORO_MODEL_REPO`	`nvidia/kokoro-82M-onnx-opt`
`FASTKOKORO_MODEL_FILE`	`kokoro-82m-v1.0.onnx`
`FASTKOKORO_MODEL_PATH`	unset; downloads from Hugging Face
`FASTKOKORO_VOICES_FILE`	`voices.bin`
`FASTKOKORO_VOICES_INDEX_FILE`	`voices.txt`
`FASTKOKORO_VOICES_PATH`	unset; downloads and converts NVIDIA voices
`FASTKOKORO_DEFAULT_VOICE`	`af_heart`
`FASTKOKORO_DEFAULT_LANG`	`en-us`
`FASTKOKORO_WARMUP`	`true`
`FASTKOKORO_WARMUP_TEXT`	`hello`
`FASTKOKORO_ONNX_PROVIDERS`	`CPUExecutionProvider`
`FASTKOKORO_ONNX_AUTO_PROVIDERS`	`false`
`FASTKOKORO_ONNX_INTRA_OP_NUM_THREADS`	unset
`FASTKOKORO_ONNX_INTER_OP_NUM_THREADS`	unset

FASTKOKORO_WARMUP=true runs a short synthesis during startup. The server takes slightly longer to become ready, but the first user request no longer pays most of the one-time initialization latency.
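As an illustration of how settings like these are typically read, the sketch below loads a few of the variables from the table with their documented defaults. The ServerSettings class, the load_settings and parse_bool helpers, and the set of accepted truthy spellings are assumptions made for this example; they are not fastkokoro's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings (this accepted set is an assumption)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container for a subset of the documented variables."""
    host: str
    port: int
    default_voice: str
    default_lang: str
    warmup: bool
    warmup_text: str


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read FASTKOKORO_* variables, falling back to the documented defaults."""
    return ServerSettings(
        host=env.get("FASTKOKORO_HOST", "0.0.0.0"),
        port=int(env.get("FASTKOKORO_PORT", "8880")),
        default_voice=env.get("FASTKOKORO_DEFAULT_VOICE", "af_heart"),
        default_lang=env.get("FASTKOKORO_DEFAULT_LANG", "en-us"),
        warmup=parse_bool(env.get("FASTKOKORO_WARMUP", "true")),
        warmup_text=env.get("FASTKOKORO_WARMUP_TEXT", "hello"),
    )
```

Passing a plain dictionary instead of os.environ makes the function easy to exercise in tests without touching the process environment.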

ONNX Runtime Providers

fastkokoro creates the ONNX Runtime session directly, so provider selection is explicit and predictable.

CPU:

FASTKOKORO_ONNX_PROVIDERS=CPUExecutionProvider uv run fastkokoro

CUDA with CPU fallback:

FASTKOKORO_ONNX_PROVIDERS=CUDAExecutionProvider,CPUExecutionProvider uv run fastkokoro

TensorRT with CUDA and CPU fallback:

FASTKOKORO_ONNX_PROVIDERS=TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider uv run fastkokoro

Intel/OpenVINO builds can use:

FASTKOKORO_ONNX_PROVIDERS=OpenVINOExecutionProvider,CPUExecutionProvider uv run fastkokoro

Set FASTKOKORO_ONNX_AUTO_PROVIDERS=true to pass every provider available in the installed ONNX Runtime build to the session. Use this mostly for quick local experiments; production deployments should pin an explicit provider order.
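The selection rules above can be sketched as a small pure function. The name resolve_providers and the choice to fail fast on an unavailable provider are assumptions of this example, not fastkokoro's confirmed behavior. In real use, the available list would come from onnxruntime.get_available_providers(), and the result would be passed as the providers argument of onnxruntime.InferenceSession.

```python
def resolve_providers(requested: str, available: list[str], auto: bool = False) -> list[str]:
    """Turn a comma-separated provider string into an ordered provider list.

    `available` stands in for onnxruntime.get_available_providers().
    Raising on unknown providers is a design choice of this sketch.
    """
    if auto:
        # Auto mode: hand every available provider to the session, in the
        # order the runtime reports them.
        return list(available)
    wanted = [name.strip() for name in requested.split(",") if name.strip()]
    if not wanted:
        raise ValueError("no execution provider requested")
    missing = [name for name in wanted if name not in available]
    if missing:
        raise ValueError(f"providers not available in this build: {missing}")
    return wanted
```

Order matters: the first provider in the list is tried first, and later entries act as fallbacks, which is why a pinned explicit order is easier to reason about than auto mode.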

API

Health:

curl http://localhost:8880/health
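When warmup is enabled, a client or deployment script may want to wait for the server before sending requests. The following sketch polls the health endpoint until it responds. It assumes that /health returns HTTP 200 once the server is ready, which the text above does not state explicitly; the wait_until_ready helper and its probe parameter are hypothetical names introduced for this example.

```python
import time
import urllib.request
from typing import Callable, Optional


def wait_until_ready(
    url: str = "http://localhost:8880/health",
    timeout: float = 60.0,
    interval: float = 0.5,
    probe: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Poll `url` until the probe succeeds or `timeout` seconds have elapsed."""

    def http_probe(target: str) -> bool:
        # Assumption: a ready server answers /health with status 200.
        try:
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status == 200
        except OSError:  # covers URLError, HTTPError, and connection errors
            return False

    check = probe or http_probe
    deadline = time.monotonic() + timeout
    while True:
        if check(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The probe parameter exists only so the loop can be tested without a running server; normal callers would leave it unset.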

Models:

curl http://localhost:8880/v1/models

The server exposes the local Kokoro model as kokoro. For client compatibility, /v1/audio/speech also accepts tts-1 and gpt-4o-mini-tts as aliases, but they are not listed by /v1/models because the server is not running OpenAI TTS models.
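A minimal sketch of this alias behavior, assuming a simple lookup: accepted request names resolve to the single served model, while the listing returns only the real model. The function and constant names are illustrative and not taken from fastkokoro.

```python
SERVED_MODEL = "kokoro"
# Names accepted on /v1/audio/speech for client compatibility only.
ACCEPTED_ALIASES = frozenset({"tts-1", "gpt-4o-mini-tts"})


def resolve_model(requested: str) -> str:
    """Map a requested model name to the model that actually runs."""
    if requested == SERVED_MODEL or requested in ACCEPTED_ALIASES:
        return SERVED_MODEL
    raise ValueError(f"unknown model: {requested!r}")


def list_models() -> list[str]:
    """Return only the real model; aliases are deliberately not listed."""
    return [SERVED_MODEL]
```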

Voices and Languages

The official Kokoro voice list maps voices to language codes. fastkokoro accepts the Kokoro language code and common locale aliases, then validates that the requested voice belongs to the resolved language.

Language	Request `lang` values	Voices
American English	`a`, `en-us`, `american`	`af_heart`, `af_alloy`, `af_aoede`, `af_bella`, `af_jessica`, `af_kore`, `af_nicole`, `af_nova`, `af_river`, `af_sarah`, `af_sky`, `am_adam`, `am_echo`, `am_eric`, `am_fenrir`, `am_liam`, `am_michael`, `am_onyx`, `am_puck`, `am_santa`
British English	`b`, `en-gb`, `british`	`bf_alice`, `bf_emma`, `bf_isabella`, `bf_lily`, `bm_daniel`, `bm_fable`, `bm_george`, `bm_lewis`
Japanese	`j`, `ja`, `ja-jp`	`jf_alpha`, `jf_gongitsune`, `jf_nezumi`, `jf_tebukuro`, `jm_kumo`
Mandarin Chinese	`z`, `zh`, `zh-cn`, `mandarin`	`zf_xiaobei`, `zf_xiaoni`, `zf_xiaoxiao`, `zf_xiaoyi`, `zm_yunjian`, `zm_yunxi`, `zm_yunxia`, `zm_yunyang`
Spanish	`e`, `es`, `es-es`	`ef_dora`, `em_alex`, `em_santa`
French	`f`, `fr`, `fr-fr`	`ff_siwis`
Hindi	`h`, `hi`, `hi-in`	`hf_alpha`, `hf_beta`, `hm_omega`, `hm_psi`
Italian	`i`, `it`, `it-it`	`if_sara`, `im_nicola`
Brazilian Portuguese	`p`, `pt`, `pt-br`	`pf_dora`, `pm_alex`, `pm_santa`
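The resolution and validation described above might look like the sketch below. In the table, every voice name starts with its Kokoro language code, so the sketch checks that prefix instead of keeping a full voice list. Whether fastkokoro validates this way, and whether it matches aliases case-insensitively, are assumptions; the helper names are hypothetical.

```python
# Kokoro language code -> accepted request values, copied from the table.
LANGUAGE_ALIASES = {
    "a": ("a", "en-us", "american"),
    "b": ("b", "en-gb", "british"),
    "j": ("j", "ja", "ja-jp"),
    "z": ("z", "zh", "zh-cn", "mandarin"),
    "e": ("e", "es", "es-es"),
    "f": ("f", "fr", "fr-fr"),
    "h": ("h", "hi", "hi-in"),
    "i": ("i", "it", "it-it"),
    "p": ("p", "pt", "pt-br"),
}
ALIAS_TO_CODE = {
    alias: code for code, aliases in LANGUAGE_ALIASES.items() for alias in aliases
}


def resolve_language(lang: str) -> str:
    """Resolve a request `lang` value to its one-letter Kokoro language code."""
    code = ALIAS_TO_CODE.get(lang.strip().lower())
    if code is None:
        raise ValueError(f"unsupported language: {lang!r}")
    return code


def validate_voice(voice: str, lang: str) -> str:
    """Check that `voice` belongs to the resolved language and return the code.

    Assumption: the first letter of a voice name is its language code.
    """
    code = resolve_language(lang)
    if not voice.startswith(code):
        raise ValueError(f"voice {voice!r} does not belong to language {lang!r}")
    return code
```

For example, validate_voice("pf_dora", "pt-br") resolves to the code p and passes, while validate_voice("pf_dora", "en-us") raises because the voice prefix does not match the code a.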

Speech:

curl http://localhost:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro",
    "input": "Hello from fastkokoro.",
    "voice": "af_heart",
    "response_format": "wav"
  }' \
  --output speech.wav
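The same request can be sent from Python without extra dependencies. This sketch mirrors the curl call above using only the standard library; the helper names build_speech_request and synthesize_to_file are illustrative, and the code assumes the server is reachable at the default address.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    voice: str = "af_heart",
    response_format: str = "wav",
    base_url: str = "http://localhost:8880/v1",
    model: str = "kokoro",
) -> urllib.request.Request:
    """Build the POST request for /v1/audio/speech without sending it."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, **options: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(text, **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

With the server running, synthesize_to_file("Hello from fastkokoro.", "speech.wav") would produce the same file as the curl command.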

Streaming PCM:

curl http://localhost:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro",
    "input": "Streaming from fastkokoro.",
    "voice": "af_heart",
    "response_format": "pcm",
    "stream": true
  }' \
  --output speech.pcm
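Raw PCM has no header, so most players cannot open speech.pcm directly. The sketch below wraps PCM bytes in a WAV container using the standard library. The sample rate (24 kHz), mono channel layout, and 16-bit signed little-endian samples are assumptions based on the usual OpenAI pcm convention and Kokoro's typical output; verify them against the server before relying on this.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int = 24000,  # assumed, not stated by the source
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # assumed 16-bit samples (2 bytes)
) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

For a streamed response, the chunks can be concatenated as they arrive and passed to pcm_to_wav once the stream ends.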

OpenAI SDK Examples

The examples use inline script dependencies, so they can run directly with uv without adding the OpenAI SDK to the project environment.

Start fastkokoro first:

uv run fastkokoro

Save synthesized audio to a file:

uv run examples/tts_save_file.py

Consume streamed audio chunks:

uv run examples/tts_stream_chunks.py

The example scripts read these environment variables:

Variable	Default
`FASTKOKORO_BASE_URL`	`http://localhost:8880/v1`
`FASTKOKORO_API_KEY`	`fastkokoro`
`FASTKOKORO_VOICE`	`pf_dora`
`FASTKOKORO_TEXT`	`Ola, tudo bem?`
`FASTKOKORO_TTS_OUTPUT`	`speech.wav`

Python

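from fastkokoro import FastKokoro

# Construct the engine with no arguments, so the default settings apply.
engine = FastKokoro()

# Synthesize the text with the af_heart voice and request WAV output.
audio = engine.create(
    "Hello from fastkokoro.",
    voice="af_heart",
    response_format="wav",
)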

Release history

Version	Release date
0.3.0	Jun 11, 2026
0.2.0	Jun 2, 2026
