fastkokoro

Lightweight OpenAI-compatible Kokoro TTS server powered by ONNX Runtime.

fastkokoro runs the 82M-parameter Kokoro text-to-speech model with low startup overhead, fast local inference, and a small dependency footprint. It supports CPU and GPU execution through ONNX Runtime providers, including CUDA, TensorRT, and OpenVINO when the matching runtime package is installed. The default model is NVIDIA's optimized ONNX export: nvidia/kokoro-82M-onnx-opt.

The NVIDIA repo's voices.bin uses a raw float32 layout. fastkokoro converts it once into the .npz voice format expected by kokoro-onnx, so the default model and voices both come from nvidia/kokoro-82M-onnx-opt.

Install

From a source checkout:

uv sync

From PyPI:

pip install fastkokoro

For GPU builds on platforms supported by onnxruntime-gpu:

uv sync --extra gpu

Run

uv run fastkokoro

By default the server binds to 0.0.0.0 on port 8880, so it is reachable at http://localhost:8880 from the same machine.

Docker CPU:

docker build -f Dockerfile.cpu -t fastkokoro:cpu .
docker run -p 8880:8880 fastkokoro:cpu

Docker Hub CPU:

docker run -p 8880:8880 msgflux/fastkokoro:cpu

Docker GPU:

docker build -f Dockerfile.gpu -t fastkokoro:gpu .
docker run --gpus all -p 8880:8880 fastkokoro:gpu

Docker Hub GPU:

docker run --gpus all -p 8880:8880 msgflux/fastkokoro:gpu

Environment variables:

Variable                              Default
FASTKOKORO_HOST                       0.0.0.0
FASTKOKORO_PORT                       8880
FASTKOKORO_MODEL_REPO                 nvidia/kokoro-82M-onnx-opt
FASTKOKORO_MODEL_FILE                 kokoro-82m-v1.0.onnx
FASTKOKORO_MODEL_PATH                 unset; downloads from Hugging Face
FASTKOKORO_VOICES_FILE                voices.bin
FASTKOKORO_VOICES_INDEX_FILE          voices.txt
FASTKOKORO_VOICES_PATH                unset; downloads and converts NVIDIA voices
FASTKOKORO_DEFAULT_VOICE              af_heart
FASTKOKORO_DEFAULT_LANG               en-us
FASTKOKORO_WARMUP                     true
FASTKOKORO_WARMUP_TEXT                hello
FASTKOKORO_ONNX_PROVIDERS             CPUExecutionProvider
FASTKOKORO_ONNX_AUTO_PROVIDERS        false
FASTKOKORO_ONNX_INTRA_OP_NUM_THREADS  unset
FASTKOKORO_ONNX_INTER_OP_NUM_THREADS  unset

FASTKOKORO_WARMUP=true runs a short synthesis during startup. The server takes slightly longer to become ready, but most of the one-time initialization cost is paid at startup instead of on the first user request.
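As a rough illustration of how defaults like these can be resolved from the environment, here is a minimal sketch. The Settings class, load_settings, and the accepted boolean spellings are hypothetical and are not taken from fastkokoro's source; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumption: common truthy spellings are accepted; the server's real
    # parsing rules may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:  # hypothetical container, not fastkokoro's own class
    host: str
    port: int
    warmup: bool
    warmup_text: str
    intra_op_threads: Optional[int]  # None stands for "unset"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the FASTKOKORO_* variables, falling back to the documented defaults."""
    threads = env.get("FASTKOKORO_ONNX_INTRA_OP_NUM_THREADS")
    return Settings(
        host=env.get("FASTKOKORO_HOST", "0.0.0.0"),
        port=int(env.get("FASTKOKORO_PORT", "8880")),
        warmup=_as_bool(env.get("FASTKOKORO_WARMUP", "true")),
        warmup_text=env.get("FASTKOKORO_WARMUP_TEXT", "hello"),
        intra_op_threads=int(threads) if threads else None,
    )
```

Passing a plain dict instead of os.environ makes it easy to check how an override such as FASTKOKORO_PORT=9000 would be interpreted.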

ONNX Runtime Providers

fastkokoro creates the ONNX Runtime session directly, so provider selection is explicit and predictable.

CPU:

FASTKOKORO_ONNX_PROVIDERS=CPUExecutionProvider uv run fastkokoro

CUDA with CPU fallback:

FASTKOKORO_ONNX_PROVIDERS=CUDAExecutionProvider,CPUExecutionProvider uv run fastkokoro

TensorRT with CUDA and CPU fallback:

FASTKOKORO_ONNX_PROVIDERS=TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider uv run fastkokoro

Intel/OpenVINO builds can use:

FASTKOKORO_ONNX_PROVIDERS=OpenVINOExecutionProvider,CPUExecutionProvider uv run fastkokoro

Set FASTKOKORO_ONNX_AUTO_PROVIDERS=true to pass every provider available in the installed ONNX Runtime build to the session. Use this mostly for quick local experiments; production deployments should pin an explicit provider order.
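The selection logic described above can be sketched roughly as follows. resolve_providers is a hypothetical helper, and dropping requested providers that the installed build lacks is an assumption about the behavior, not something confirmed by the fastkokoro source. In a real process the available list would come from onnxruntime.get_available_providers(), and the result would be passed as the providers argument of onnxruntime.InferenceSession.

```python
from typing import Iterable, List


def resolve_providers(requested: str, available: Iterable[str], auto: bool = False) -> List[str]:
    """Return the ordered provider list to hand to the ONNX Runtime session.

    `requested` mirrors FASTKOKORO_ONNX_PROVIDERS (comma-separated) and
    `auto` mirrors FASTKOKORO_ONNX_AUTO_PROVIDERS.
    """
    available_list = list(available)
    if auto:
        # Auto mode: pass every provider the installed build reports.
        return available_list
    wanted = [name.strip() for name in requested.split(",") if name.strip()]
    # The requested order is the priority order, so keep it. Providers that
    # the installed build does not offer are dropped (assumed behavior).
    chosen = [name for name in wanted if name in available_list]
    if not chosen:
        raise ValueError(f"none of {wanted} are available; installed: {available_list}")
    return chosen
```

Keeping CPUExecutionProvider last in the requested list gives a fallback when a GPU provider is missing from the installed build.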

API

Health:

curl http://localhost:8880/health
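Because warmup can delay readiness, a client or deployment script may want to poll the health endpoint before sending traffic. The sketch below uses only the standard library; http_ok and wait_until_ready are hypothetical helpers, and treating any HTTP 200 as "ready" is an assumption, since the response body of /health is not documented here.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True when a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


# Usage, with a server running locally:
# wait_until_ready(lambda: http_ok("http://localhost:8880/health"))
```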

Models:

curl http://localhost:8880/v1/models

The server exposes the local Kokoro model as kokoro. For client compatibility, /v1/audio/speech also accepts tts-1 and gpt-4o-mini-tts as aliases, but they are not listed by /v1/models because the server is not running OpenAI TTS models.
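The alias behavior can be pictured with a small lookup. This is an illustrative sketch only: the names below are hypothetical, and rejecting unknown model names with an error is an assumption about how the server responds.

```python
LOCAL_MODEL = "kokoro"

# Aliases accepted by /v1/audio/speech but deliberately not listed by /v1/models.
MODEL_ALIASES = {"tts-1": LOCAL_MODEL, "gpt-4o-mini-tts": LOCAL_MODEL}


def resolve_model(requested: str) -> str:
    """Map a requested model name to the local model, or raise for unknown names."""
    if requested == LOCAL_MODEL:
        return LOCAL_MODEL
    try:
        return MODEL_ALIASES[requested]
    except KeyError:
        raise ValueError(f"unknown model: {requested!r}") from None


def list_models() -> list[str]:
    """Only the model that is really being served is advertised."""
    return [LOCAL_MODEL]
```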

Voices and Languages

The official Kokoro voice list maps voices to language codes. fastkokoro accepts the Kokoro language code and common locale aliases, then validates that the requested voice belongs to the resolved language.

Language              Request lang values     Voices
American English      a, en-us, american      af_heart, af_alloy, af_aoede, af_bella, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa
British English       b, en-gb, british       bf_alice, bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis
Japanese              j, ja, ja-jp            jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
Mandarin Chinese      z, zh, zh-cn, mandarin  zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
Spanish               e, es, es-es            ef_dora, em_alex, em_santa
French                f, fr, fr-fr            ff_siwis
Hindi                 h, hi, hi-in            hf_alpha, hf_beta, hm_omega, hm_psi
Italian               i, it, it-it            if_sara, im_nicola
Brazilian Portuguese  p, pt, pt-br            pf_dora, pm_alex, pm_santa
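A minimal sketch of the alias resolution and voice check, built only from the table above. The helper names are hypothetical, and the check relies on an observation from the table (every voice name starts with its Kokoro language code) rather than on fastkokoro's actual validation, which presumably consults the real voice list.

```python
# Every accepted lang value from the table, mapped to its Kokoro language code.
LANG_ALIASES = {
    "a": "a", "en-us": "a", "american": "a",
    "b": "b", "en-gb": "b", "british": "b",
    "j": "j", "ja": "j", "ja-jp": "j",
    "z": "z", "zh": "z", "zh-cn": "z", "mandarin": "z",
    "e": "e", "es": "e", "es-es": "e",
    "f": "f", "fr": "f", "fr-fr": "f",
    "h": "h", "hi": "h", "hi-in": "h",
    "i": "i", "it": "i", "it-it": "i",
    "p": "p", "pt": "p", "pt-br": "p",
}


def resolve_lang(lang: str) -> str:
    """Turn a request lang value into a single-letter Kokoro language code."""
    key = lang.strip().lower()
    if key not in LANG_ALIASES:
        raise ValueError(f"unsupported lang: {lang!r}")
    return LANG_ALIASES[key]


def validate_voice(voice: str, lang: str) -> str:
    """Return the language code if `voice` belongs to `lang`, else raise.

    Simplification: only the first letter of the voice name is compared,
    so a misspelled voice with the right prefix would still pass.
    """
    code = resolve_lang(lang)
    if not voice or voice[0] != code:
        raise ValueError(f"voice {voice!r} does not belong to lang {lang!r}")
    return code
```

For example, validate_voice("pf_dora", "pt-br") passes, while validate_voice("af_heart", "en-gb") raises because af_heart is an American English voice.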

Speech:

curl http://localhost:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro",
    "input": "Hello from fastkokoro.",
    "voice": "af_heart",
    "response_format": "wav"
  }' \
  --output speech.wav
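The same request can be made from Python without extra dependencies. The following is a sketch using urllib from the standard library; build_speech_request and synthesize_to_file are hypothetical helper names, and the payload mirrors the curl call above.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    voice: str = "af_heart",
    response_format: str = "wav",
    base_url: str = "http://localhost:8880/v1",
) -> urllib.request.Request:
    """Build the POST request for /v1/audio/speech without sending it."""
    payload = {
        "model": "kokoro",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, **kwargs: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


# Usage, with a server running locally:
# synthesize_to_file("Hello from fastkokoro.", "speech.wav")
```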

Streaming PCM:

curl http://localhost:8880/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro",
    "input": "Streaming from fastkokoro.",
    "voice": "af_heart",
    "response_format": "pcm",
    "stream": true
  }' \
  --output speech.pcm
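Raw PCM has no header, so most players cannot open speech.pcm directly. The sketch below wraps received chunks in a WAV container using the standard wave module. The defaults of 24 kHz, mono, 16-bit samples are assumptions based on the format OpenAI documents for its pcm output and on Kokoro's 24 kHz sample rate; verify them against the server before relying on them. The function name is hypothetical.

```python
import wave
from typing import Iterable


def write_pcm_chunks_as_wav(
    chunks: Iterable[bytes],
    path: str,
    sample_rate: int = 24000,  # assumed, not confirmed for fastkokoro
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # assumed 16-bit samples (2 bytes)
) -> int:
    """Write streamed PCM chunks into a WAV file and return the PCM byte count."""
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Chunks are appended as they arrive; the WAV header is
            # finalized when the file is closed.
            wav.writeframes(chunk)
            total += len(chunk)
    return total
```

Any iterable of bytes works as input, for example the chunks read from a streaming HTTP response.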

OpenAI SDK Examples

The examples use inline script dependencies, so they can run directly with uv without adding the OpenAI SDK to the project environment.

Start fastkokoro first:

uv run fastkokoro

Save synthesized audio to a file:

uv run examples/tts_save_file.py

Consume streamed audio chunks:

uv run examples/tts_stream_chunks.py

Useful environment variables:

Variable               Default
FASTKOKORO_BASE_URL    http://localhost:8880/v1
FASTKOKORO_API_KEY     fastkokoro
FASTKOKORO_VOICE       pf_dora
FASTKOKORO_TEXT        Ola, tudo bem?
FASTKOKORO_TTS_OUTPUT  speech.wav
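For orientation, a script in this style might look roughly like the sketch below. It is not the content of examples/tts_save_file.py; the function names are hypothetical, and only the environment variables and their defaults come from the table above. The comment header is inline script metadata, which lets uv install the OpenAI SDK for this script alone.

```python
# /// script
# dependencies = ["openai"]
# ///
import os
from typing import Mapping


def load_example_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the example settings, falling back to the documented defaults."""
    return {
        "base_url": env.get("FASTKOKORO_BASE_URL", "http://localhost:8880/v1"),
        "api_key": env.get("FASTKOKORO_API_KEY", "fastkokoro"),
        "voice": env.get("FASTKOKORO_VOICE", "pf_dora"),
        "text": env.get("FASTKOKORO_TEXT", "Ola, tudo bem?"),
        "output": env.get("FASTKOKORO_TTS_OUTPUT", "speech.wav"),
    }


def save_speech(settings: dict) -> str:
    """Synthesize settings["text"] and write the audio to settings["output"]."""
    # Imported here so the settings helper works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(base_url=settings["base_url"], api_key=settings["api_key"])
    with client.audio.speech.with_streaming_response.create(
        model="kokoro",
        voice=settings["voice"],
        input=settings["text"],
        response_format="wav",
    ) as response:
        response.stream_to_file(settings["output"])
    return settings["output"]


# Usage, with a server running locally:
# save_speech(load_example_settings())
```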

Python

from fastkokoro import FastKokoro

engine = FastKokoro()
audio = engine.create(
    "Hello from fastkokoro.",
    voice="af_heart",
    response_format="wav",
)
