Multi-provider TTS gateway server with engine fallback, text chunking, and audio stitching
tts-gateway
A local text-to-speech gateway with a pluggable engine architecture. New open-source voice models ship constantly; tts-gateway gives clients a stable HTTP API with canonical POST /v1/speech and POST /v1/jobs endpoints, and keeps the legacy /tts routes as compatibility shims. Swapping or adding a model means implementing a small engine class, not rewiring your workflow.
Currently supports Kokoro and Pocket TTS. Each engine runs natively in-process.
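To give a feel for what "a small engine class" could mean, here is a minimal sketch. This README does not show the real base class, so `Engine`, `synthesize`, and `SilenceEngine` are hypothetical names chosen for illustration; the actual interface in tts-gateway may look different.

```python
import io
import wave
from typing import Optional, Protocol


class Engine(Protocol):
    """Hypothetical engine interface; the real tts-gateway base class may differ."""

    name: str

    def synthesize(self, text: str, voice: Optional[str] = None) -> bytes:
        """Return WAV bytes for the given text."""
        ...


class SilenceEngine:
    """Toy engine that emits 10 ms of silence per character (illustration only)."""

    name = "silence"
    sample_rate = 16_000

    def synthesize(self, text: str, voice: Optional[str] = None) -> bytes:
        # 16-bit mono PCM: two zero bytes per frame, 160 frames per character.
        frames = b"\x00\x00" * (self.sample_rate // 100) * len(text)
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(self.sample_rate)
            wav.writeframes(frames)
        return buf.getvalue()
```

The point of the sketch is the shape: one class, one synthesis method, bytes out. A real engine would load model weights and call into Kokoro or Pocket TTS instead of writing silence.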
Install
Requires uv.
# With Kokoro support (recommended)
uv tool install tts-gateway[kokoro]
# With Pocket TTS support
uv tool install tts-gateway[pocket]
# Both engines
uv tool install tts-gateway[all]
This installs a tts binary in ~/.local/bin/.
spaCy model (Kokoro only)
Kokoro depends on misaki for grapheme-to-phoneme conversion, which needs a spaCy English model. On first request, misaki tries to download en_core_web_sm via spacy.cli.download, but that shells out to pip install, and pip doesn't exist inside uv tool environments. The result is a SystemExit: 1 crash on the first TTS call.
Install the model manually into the tool's venv:
uv pip install \
--python ~/.local/share/uv/tools/tts-gateway/bin/python \
en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl
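To confirm the model landed in the right environment, a quick check like the following may help. It is a sketch that relies only on the fact that spaCy model wheels install as importable Python packages; `spacy_model_installed` is a name made up for this example, and the script needs to be run with the tool venv's Python (the same interpreter path used above) to be meaningful.

```python
import importlib.util


def spacy_model_installed(package: str = "en_core_web_sm") -> bool:
    """Return True if the model package is importable in this interpreter."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("en_core_web_sm installed:", spacy_model_installed())
```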
For local development, see Development below.
Docker
This repo now publishes a container image to GHCR from GitHub Actions.
docker pull ghcr.io/abpai/tts-gateway:latest
docker run --rm -p 8080:8080 \
-e TTS_PRIMARY_ENGINE=kokoro \
-e TTS_OUTPUT_FORMAT=mp3 \
ghcr.io/abpai/tts-gateway:latest
The published image installs both native engine stacks and the Kokoro spaCy
model. By default it does not bake model weights into the image, so the first
/warmup or /tts request may still download engine weights unless you build a
preloaded image yourself.
To build a production image with baked model weights:
docker build \
--build-arg PRELOAD_KOKORO=true \
--build-arg PRELOAD_POCKET=false \
-t tts-gateway:local .
Verify the container:
docker run --rm -d --name tts-gateway-test -p 8080:8080 tts-gateway:local
docker ps --filter name=tts-gateway-test
curl http://127.0.0.1:8080/health
curl -X POST http://127.0.0.1:8080/warmup
curl -X POST http://127.0.0.1:8080/v1/speech -F 'text=Hello world' -o output.mp3
For bookmark.bunny, the intended final-state deployment is to reference the
published image from Compose rather than vendoring this repo's Python source.
Usage
Start the server:
tts serve --provider kokoro
tts serve --provider kokoro --port 9000 --device cpu --format mp3
tts serve --provider kokoro --fallback pocket
Synthesize speech:
# Canonical sync API
curl -X POST http://localhost:8000/v1/speech -F 'text=Hello world' -o output.mp3
# With a specific voice
curl -X POST http://localhost:8000/v1/speech -F 'text=Hello world' -F 'voice=af_heart' -o output.mp3
# Legacy compatibility route
curl -X POST http://localhost:8000/tts -F 'text=Hello world' -o output.mp3
# Async job submission
curl -X POST http://localhost:8000/v1/jobs -F 'text=Hello world' | jq
# Chunk-level audio streaming (always returns MP3)
curl -X POST http://localhost:8000/tts/stream \
-H 'Content-Type: application/json' \
-d '{"text":"Hello world"}' \
-o output.mp3
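The same synchronous call can be made from Python without third-party packages. The sketch below mirrors `curl -F` by encoding the `text` and `voice` fields as multipart/form-data; `build_multipart` and `synthesize` are helper names invented here, and the base URL assumes the default host and port.

```python
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str]) -> Tuple[bytes, str]:
    """Encode simple text fields as multipart/form-data, like `curl -F`."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def synthesize(text: str, voice: Optional[str] = None,
               base_url: str = "http://localhost:8000") -> bytes:
    """POST to /v1/speech and return the audio bytes from the response."""
    fields = {"text": text}
    if voice:
        fields["voice"] = voice
    body, content_type = build_multipart(fields)
    request = urllib.request.Request(
        f"{base_url}/v1/speech",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With a server running, writing the result of `synthesize("Hello world", voice="af_heart")` to a file in binary mode should produce the same output as the curl examples.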
Check server status:
curl http://localhost:8000/health
Pre-load models into memory:
curl -X POST http://localhost:8000/warmup
When both a primary and fallback engine are configured, the gateway tries the primary first and falls back on failure. Long texts are chunked automatically, synthesized concurrently across native chunks, and stitched into one final output file. The canonical API surface is /v1/speech, /v1/jobs, and /v1/jobs/{key}/audio; /tts and /tts/sync remain available as compatibility shims.
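The chunk, fallback, and stitch flow described above can be sketched roughly as follows. This is not the gateway's implementation: the sentence-based splitting, the thread pool, and every function name here are assumptions made for illustration, and engines are modeled as plain callables that take text and return bytes.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

Synth = Callable[[str], bytes]


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_with_fallback(chunk: str, engines: Sequence[Synth]) -> bytes:
    """Try each engine in order; return the first successful result."""
    last_error: Optional[Exception] = None
    for engine in engines:
        try:
            return engine(chunk)
        except Exception as error:  # a real gateway would catch narrower errors
            last_error = error
    raise RuntimeError("all engines failed") from last_error


def synthesize_long_text(text: str, engines: Sequence[Synth],
                         max_chars: int = 500) -> bytes:
    """Chunk, synthesize chunks concurrently, and join results in order."""
    chunks = chunk_text(text, max_chars)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() preserves input order, so parts come back in reading order.
        parts = list(pool.map(lambda c: synthesize_with_fallback(c, engines), chunks))
    # Placeholder stitch: real audio needs format-aware joining (e.g. via ffmpeg).
    return b"".join(parts)
```

Passing the primary engine first and the fallback second in `engines` gives the "primary first, fall back on failure" order per chunk. The byte-level join only stands in for real stitching, since concatenating encoded audio files naively does not generally yield a valid file.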
Running with PM2
For a persistent local server, use PM2:
// ~/.pm2/ecosystem.config.js
module.exports = {
  apps: [
    {
      name: "tts-gateway",
      script: "~/.local/bin/tts", // output of: which tts
      args: "serve --provider kokoro",
      interpreter: "none",
      autorestart: true,
      max_restarts: 10,
      restart_delay: 2000,
      time: true,
    },
  ],
};
pm2 start ~/.pm2/ecosystem.config.js --only tts-gateway
pm2 logs tts-gateway
Configuration
All settings can be controlled via environment variables. CLI flags take precedence (the CLI sets these env vars before starting the server).
| Variable | Default | Description |
|---|---|---|
| TTS_PRIMARY_ENGINE | kokoro | Primary engine: kokoro or pocket |
| TTS_FALLBACK_ENGINE | none | Fallback engine: kokoro, pocket, or none |
| TTS_OUTPUT_FORMAT | mp3 | Output audio format: wav or mp3 |
| TTS_DEVICE_MODE | auto | Torch device: auto, cpu, mps, cuda |
| TTS_DEFAULT_VOICE | (none) | Default voice name |
| TTS_MODELS_DIR | ~/.cache/tts-gateway/models | Model storage directory |
| TTS_GATEWAY_HOST | 127.0.0.1 | Bind address |
| TTS_GATEWAY_PORT | 8000 | Bind port |
| TTS_CHUNK_MAX_CHARS | 500 | Max characters per chunk |
| TTS_REQUEST_TIMEOUT_SECONDS | 3600 | Total request timeout |
| TTS_ENGINE_TIMEOUT_SECONDS | 360 | Per-engine call timeout |
| TTS_FFMPEG_PATH | ffmpeg | Path to ffmpeg binary (for MP3 encoding) |
| TTS_DATA_DIR | ~/.cache/tts-gateway/data | Job store and artifact directory |
| TTS_PIPELINE_VERSION | 1 | Cache-busting version for synthesis pipeline |
| TTS_WORKER_POLL_SECONDS | 1.0 | Background worker poll interval |
| KOKORO_TTS_ENABLED | true | Enable/disable Kokoro engine |
| POCKET_TTS_ENABLED | false | Enable/disable Pocket TTS engine |
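As a rough illustration of the precedence rule, the sketch below reads a handful of these variables with their documented defaults and shows how a CLI layer could win simply by writing into the environment first. `Settings`, `load_settings`, and `apply_cli_overrides` are hypothetical names; the gateway's real settings code is not shown in this README and may be organized differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, MutableMapping, Optional


@dataclass(frozen=True)
class Settings:
    primary_engine: str
    fallback_engine: str
    output_format: str
    port: int
    chunk_max_chars: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a few gateway settings from env, using the documented defaults."""
    return Settings(
        primary_engine=env.get("TTS_PRIMARY_ENGINE", "kokoro"),
        fallback_engine=env.get("TTS_FALLBACK_ENGINE", "none"),
        output_format=env.get("TTS_OUTPUT_FORMAT", "mp3"),
        port=int(env.get("TTS_GATEWAY_PORT", "8000")),
        chunk_max_chars=int(env.get("TTS_CHUNK_MAX_CHARS", "500")),
    )


def apply_cli_overrides(env: MutableMapping[str, str],
                        provider: Optional[str] = None,
                        port: Optional[int] = None) -> None:
    """Mimic CLI precedence: flags overwrite env vars before settings load."""
    if provider is not None:
        env["TTS_PRIMARY_ENGINE"] = provider
    if port is not None:
        env["TTS_GATEWAY_PORT"] = str(port)
```

Because the overrides are applied to the environment before `load_settings` runs, the settings loader never needs to know whether a value came from a flag or from the shell.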
Development
make setup # Create venv, install deps, set up pre-commit hooks
make test # Run tests with coverage
make lint # Run ruff linter with auto-fix
make format # Run ruff formatter
make typecheck # Run ty type checker
make run # Start server (PROVIDER=kokoro by default)
make setup creates the local venv, installs dev dependencies plus all engine
extras, installs the Kokoro spaCy model, preloads engine weights, and sets up
pre-commit hooks. After it completes, the repo checkout is ready for real local
synthesis.
If you only want the dev toolchain without engine extras, use:
make install-dev
After that, you can verify the local server the same way as the container:
make run
curl http://127.0.0.1:8000/health
curl -X POST http://127.0.0.1:8000/warmup
curl -X POST http://127.0.0.1:8000/v1/speech -F 'text=Hello world' -o output.mp3
Releasing
Use the repo helper to do the whole release flow in one command:
make release
That command:
- bumps project.version in pyproject.toml by one patch version
- runs lint, typecheck, tests, and packaging checks
- commits the version bump
- creates the matching git tag
- pushes the branch and the tag
You can choose a different bump strategy:
make release BUMP=minor
make release BUMP=major
make release VERSION=0.2.0
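For reference, the three bump strategies map onto version strings as in the sketch below. It assumes plain MAJOR.MINOR.PATCH versions; `bump_version` is an illustrative helper, not the code that `make release` actually runs.

```python
def bump_version(version: str, bump: str = "patch") -> str:
    """Return the next version for a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump strategy: {bump!r}")
```

Minor and major bumps reset the lower components to zero, which is the usual semantic-versioning convention.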
To preview the exact commands first:
make release-dry-run