Multi-provider TTS gateway server with engine fallback, text chunking, and audio stitching

tts-gateway

A local text-to-speech gateway with a pluggable engine architecture. New open-source voice models ship constantly; tts-gateway gives clients a stable HTTP API, with canonical POST /v1/speech and POST /v1/jobs endpoints plus legacy /tts compatibility shims. Swapping or adding a model means implementing a small engine class, not rewiring your workflow.

Currently supports Kokoro and Pocket TTS. Each engine runs natively in-process.

Install

Requires uv.

# With Kokoro support (recommended)
# The extras are quoted so shells such as zsh do not treat [..] as a glob pattern.
uv tool install 'tts-gateway[kokoro]'

# With Pocket TTS support
uv tool install 'tts-gateway[pocket]'

# Both engines
uv tool install 'tts-gateway[all]'

This installs a tts binary in ~/.local/bin/.

spaCy model (Kokoro only)

Kokoro depends on misaki for grapheme-to-phoneme conversion, which needs a spaCy English model. On the first request, misaki tries to download en_core_web_sm via spacy.cli.download, which shells out to pip install. pip is not available inside uv tool environments, so the first TTS call crashes with SystemExit: 1.

Install the model manually into the tool's venv:

uv pip install \
  --python ~/.local/share/uv/tools/tts-gateway/bin/python \
  en_core_web_sm@https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl
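
To confirm the model landed in the right environment before the first request, a small check like the one below can help. It is only a sketch: the helper name spacy_model_installed is made up for illustration, and it only tests whether the package is importable, not whether misaki can load it.

```python
import importlib.util


def spacy_model_installed(name: str = "en_core_web_sm") -> bool:
    """Return True if the named package is importable by this interpreter."""
    # find_spec returns None for a top-level package that is not installed.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    status = "installed" if spacy_model_installed() else "missing"
    print(f"en_core_web_sm: {status}")
```

Run it with the tool's own interpreter (~/.local/share/uv/tools/tts-gateway/bin/python) so the check looks inside the same venv the server uses.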

For local development, see Development below.

Docker

This repo publishes a container image to GHCR from GitHub Actions.

docker pull ghcr.io/abpai/tts-gateway:latest
docker run --rm -p 8080:8080 \
  -e TTS_PRIMARY_ENGINE=kokoro \
  -e TTS_OUTPUT_FORMAT=mp3 \
  ghcr.io/abpai/tts-gateway:latest

The published image installs both native engine stacks and the Kokoro spaCy model. By default it does not bake model weights into the image, so the first /warmup or /tts request may still download engine weights unless you build a preloaded image yourself.

To build a production image with baked model weights:

docker build \
  --build-arg PRELOAD_KOKORO=true \
  --build-arg PRELOAD_POCKET=false \
  -t tts-gateway:local .

Verify the container:

docker run --rm -d --name tts-gateway-test -p 8080:8080 tts-gateway:local
docker ps --filter name=tts-gateway-test
curl http://127.0.0.1:8080/health
curl -X POST http://127.0.0.1:8080/warmup
curl -X POST http://127.0.0.1:8080/v1/speech -F 'text=Hello world' -o output.mp3

For bookmark.bunny, the intended final-state deployment is to reference the published image from Compose rather than vendoring this repo's Python source.

Usage

Start the server:

tts serve --provider kokoro
tts serve --provider kokoro --port 9000 --device cpu --format mp3
tts serve --provider kokoro --fallback pocket

Synthesize speech:

# Canonical sync API
curl -X POST http://localhost:8000/v1/speech -F 'text=Hello world' -o output.mp3

# With a specific voice
curl -X POST http://localhost:8000/v1/speech -F 'text=Hello world' -F 'voice=af_heart' -o output.mp3

# Legacy compatibility route
curl -X POST http://localhost:8000/tts -F 'text=Hello world' -o output.mp3

# Async job submission
curl -X POST http://localhost:8000/v1/jobs -F 'text=Hello world' | jq

# Chunk-level audio streaming (always returns MP3)
curl -X POST http://localhost:8000/tts/stream \
  -H 'Content-Type: application/json' \
  -d '{"text":"Hello world"}' \
  -o output.mp3
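
The same sync call can be made from Python. The sketch below uses only the standard library and mirrors what curl -F sends (multipart/form-data with text and optional voice fields). The helper names build_multipart and synthesize are illustrative, not part of tts-gateway, and the default base URL assumes the server runs on the default port.

```python
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str]) -> Tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data, as curl -F does."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def synthesize(
    text: str,
    voice: Optional[str] = None,
    base_url: str = "http://localhost:8000",  # assumption: default host/port
    timeout: float = 60.0,
) -> bytes:
    """POST text to /v1/speech and return the raw audio bytes."""
    fields = {"text": text}
    if voice:
        fields["voice"] = voice
    body, content_type = build_multipart(fields)
    request = urllib.request.Request(
        f"{base_url}/v1/speech",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example usage (requires a running server):
#   audio = synthesize("Hello world", voice="af_heart")
#   with open("output.mp3", "wb") as fh:
#       fh.write(audio)
```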

Check server status:

curl http://localhost:8000/health

Pre-load models into memory:

curl -X POST http://localhost:8000/warmup

When both a primary and fallback engine are configured, the gateway tries the primary first and falls back on failure. Long texts are chunked automatically, synthesized concurrently across native chunks, and stitched into one final output file. The canonical API surface is /v1/speech, /v1/jobs, and /v1/jobs/{key}/audio; /tts and /tts/sync remain available as compatibility shims.
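
The chunk, fall back, and stitch flow described above can be pictured roughly as follows. This is a simplified sketch and not the gateway's actual implementation: the function names are hypothetical, engines are modeled as plain callables, the sentence splitter is a naive regex, and stitching is shown as byte concatenation, whereas real audio stitching has to be format-aware.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Assumption: an engine is any callable that turns text into audio bytes.
Engine = Callable[[str], bytes]


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def synthesize_with_fallback(chunk: str, engines: Sequence[Engine]) -> bytes:
    """Try each engine in order (primary first) and return the first success."""
    last_error = None
    for engine in engines:
        try:
            return engine(chunk)
        except Exception as exc:  # any engine failure triggers the fallback
            last_error = exc
    raise RuntimeError("all engines failed") from last_error


def synthesize_long(text: str, engines: Sequence[Engine], max_chars: int = 500) -> bytes:
    """Chunk the text, synthesize chunks concurrently, and join them in order."""
    chunks = chunk_text(text, max_chars)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order, so the audio parts stay in sequence.
        parts = list(pool.map(lambda c: synthesize_with_fallback(c, engines), chunks))
    return b"".join(parts)  # placeholder for real audio stitching
```

The max_chars default of 500 matches the documented TTS_CHUNK_MAX_CHARS default; the worker count of 4 is an arbitrary choice for the sketch.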

Running with PM2

For a persistent local server, use PM2:

// ~/.pm2/ecosystem.config.js
module.exports = {
  apps: [
    {
      name: "tts-gateway",
      script: "~/.local/bin/tts", // output of: which tts
      args: "serve --provider kokoro",
      interpreter: "none",
      autorestart: true,
      max_restarts: 10,
      restart_delay: 2000,
      time: true,
    },
  ],
};

pm2 start ~/.pm2/ecosystem.config.js --only tts-gateway
pm2 logs tts-gateway

Configuration

All settings can be controlled via environment variables. CLI flags take precedence (the CLI sets these env vars before starting the server).

| Variable | Default | Description |
| --- | --- | --- |
| TTS_PRIMARY_ENGINE | kokoro | Primary engine: kokoro or pocket |
| TTS_FALLBACK_ENGINE | none | Fallback engine: kokoro, pocket, or none |
| TTS_OUTPUT_FORMAT | mp3 | Output audio format: wav or mp3 |
| TTS_DEVICE_MODE | auto | Torch device: auto, cpu, mps, cuda |
| TTS_DEFAULT_VOICE | (none) | Default voice name |
| TTS_MODELS_DIR | ~/.cache/tts-gateway/models | Model storage directory |
| TTS_GATEWAY_HOST | 127.0.0.1 | Bind address |
| TTS_GATEWAY_PORT | 8000 | Bind port |
| TTS_CHUNK_MAX_CHARS | 500 | Max characters per chunk |
| TTS_REQUEST_TIMEOUT_SECONDS | 3600 | Total request timeout |
| TTS_ENGINE_TIMEOUT_SECONDS | 360 | Per-engine call timeout |
| TTS_FFMPEG_PATH | ffmpeg | Path to ffmpeg binary (for MP3 encoding) |
| TTS_DATA_DIR | ~/.cache/tts-gateway/data | Job store and artifact directory |
| TTS_PIPELINE_VERSION | 1 | Cache-busting version for synthesis pipeline |
| TTS_WORKER_POLL_SECONDS | 1.0 | Background worker poll interval |
| KOKORO_TTS_ENABLED | true | Enable/disable Kokoro engine |
| POCKET_TTS_ENABLED | false | Enable/disable Pocket TTS engine |
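
To make the precedence rule concrete, here is a minimal sketch of how environment-driven settings with CLI overrides could work. It is not the gateway's real settings code: GatewaySettings, load_settings, and apply_cli_overrides are hypothetical names, only a few variables are modeled, and the mapping of --provider to TTS_PRIMARY_ENGINE is an assumption based on the usage examples.

```python
import os
from dataclasses import dataclass
from typing import Mapping, MutableMapping, Optional


@dataclass(frozen=True)
class GatewaySettings:
    primary_engine: str
    fallback_engine: str
    output_format: str
    host: str
    port: int
    chunk_max_chars: int


def load_settings(env: Mapping[str, str] = os.environ) -> GatewaySettings:
    """Read settings from the environment, using the documented defaults."""
    return GatewaySettings(
        primary_engine=env.get("TTS_PRIMARY_ENGINE", "kokoro"),
        fallback_engine=env.get("TTS_FALLBACK_ENGINE", "none"),
        output_format=env.get("TTS_OUTPUT_FORMAT", "mp3"),
        host=env.get("TTS_GATEWAY_HOST", "127.0.0.1"),
        port=int(env.get("TTS_GATEWAY_PORT", "8000")),
        chunk_max_chars=int(env.get("TTS_CHUNK_MAX_CHARS", "500")),
    )


def apply_cli_overrides(
    env: MutableMapping[str, str],
    provider: Optional[str] = None,
    port: Optional[int] = None,
) -> None:
    """Write CLI flag values into the environment so they win over existing values."""
    if provider is not None:
        env["TTS_PRIMARY_ENGINE"] = provider  # assumed mapping for --provider
    if port is not None:
        env["TTS_GATEWAY_PORT"] = str(port)
```

Because the flags are written into the environment before the settings are read, a flag always overrides a variable that was already set in the shell.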

Development

make setup       # Create venv, install deps, set up pre-commit hooks
make test        # Run tests with coverage
make lint        # Run ruff linter with auto-fix
make format      # Run ruff formatter
make typecheck   # Run ty type checker
make run         # Start server (PROVIDER=kokoro by default)

make setup creates the local venv, installs dev dependencies plus all engine extras, installs the Kokoro spaCy model, preloads engine weights, and sets up pre-commit hooks. After it completes, the repo checkout is ready for real local synthesis.

If you only want the dev toolchain without engine extras, use:

make install-dev

After that, you can verify the local server the same way as the container:

make run
curl http://127.0.0.1:8000/health
curl -X POST http://127.0.0.1:8000/warmup
curl -X POST http://127.0.0.1:8000/v1/speech -F 'text=Hello world' -o output.mp3

Releasing

Use the repo helper to do the whole release flow in one command:

make release

That command:

  1. bumps project.version in pyproject.toml by one patch version
  2. runs lint, typecheck, tests, and packaging checks
  3. commits the version bump
  4. creates the matching git tag
  5. pushes the branch and the tag

You can choose a different bump strategy:

make release BUMP=minor
make release BUMP=major
make release VERSION=0.2.0
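
For reference, the three bump strategies follow the usual major.minor.patch arithmetic. The sketch below illustrates that arithmetic only; bump_version is a hypothetical helper, not the code that make release actually runs, and it assumes a plain three-part version with no pre-release suffix.

```python
def bump_version(version: str, bump: str = "patch") -> str:
    """Return the next version string for a major, minor, or patch bump."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump strategy: {bump!r}")
```

A minor or major bump resets the lower components to zero, which is why BUMP=minor on 0.1.9 yields 0.2.0 rather than 0.2.9.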

To preview the exact commands first:

make release-dry-run
