supertonic-server

OpenAI-compatible HTTP server for the Supertonic-3 on-device TTS model — with streaming, voice aliases, multilingual support, and CPU/CoreML/CUDA acceleration.

Drop-in replacement for OpenAI's /v1/audio/speech endpoint. Works with the OpenAI Python SDK, Pipecat, LiveKit Agents, OpenWebUI, or anything else that speaks the OpenAI TTS protocol — just point it at http://localhost:8000/v1.

Why

                    Supertonic-3 (via this server)
Model size          ~99M params (ONNX)
Runtime             ONNX Runtime; runs on CPU, CoreML (Apple Silicon), or CUDA
Speed               ~6–10× real-time on an M4 Pro CPU/CoreML
Languages           31 + a na fallback
Voices              10 presets (F1–F5, M1–M5) + OpenAI aliases (alloy, nova, echo, …)
First-byte latency  ~450–650 ms after warmup (default settings)
Privacy             Fully local, no cloud calls
License             MIT code, OpenRAIL-M weights

Quick start (local, Apple Silicon / Linux / Windows)

# 1. Create venv and install
uv venv --python 3.12
uv pip install -e .

# 2. Run the server (the first run downloads the model, a one-time cost)
supertonic-server --port 8000

# 3. Speak
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello, world.","voice":"alloy","response_format":"mp3"}' \
  --output hello.mp3

--device auto is the default and picks the best available execution provider: CUDA (if onnxruntime-gpu is installed and a GPU is present) → CoreML (macOS) → CPU.
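
The fallback order can be sketched with ONNX Runtime's standard provider names. This is an illustration only, not the server's actual selection code; `pick_provider` is a hypothetical helper. Note that a provider appearing in ONNX Runtime's list reflects the installed build and does not by itself prove that a usable GPU is present.

```python
from typing import Sequence

# Preference order described above: CUDA, then CoreML, then CPU.
PREFERENCE = (
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def pick_provider(available: Sequence[str]) -> str:
    """Return the most preferred provider found in `available` (hypothetical helper)."""
    for name in PREFERENCE:
        if name in available:
            return name
    # Fall back to CPU when nothing in the preference list is reported.
    return "CPUExecutionProvider"


# Live check (requires onnxruntime to be installed):
#   import onnxruntime as ort
#   print(pick_provider(ort.get_available_providers()))
```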

Docker

docker build -t supertonic-server .

# CPU (works on any platform incl. Linux, Windows containers, macOS)
docker run --rm -p 8000:8000 -v supertonic-cache:/root/.cache supertonic-server

# NVIDIA GPU (see Dockerfile for full instructions)
docker run --rm --gpus all -p 8000:8000 -v supertonic-cache:/root/.cache \
  -e SUPERTONIC_DEVICE=cuda supertonic-server

The mounted volume caches the model weights so subsequent starts skip the download.

CLI

supertonic-server --help
  --host TEXT                     Bind address.
  --port INTEGER                  Bind port.
  --device [auto|cpu|coreml|cuda] ONNX execution provider.
  --model [supertonic|supertonic-2|supertonic-3]
  --model-dir PATH                Local model cache dir.
  --voice TEXT                    Default voice (F1-F5, M1-M5).
  --lang TEXT                     Default language code.
  --speed FLOAT                   Default speed (0.5..2.0).
  --total-steps INTEGER           Diffusion steps (4..16). Lower = faster.
  --intra-threads INTEGER         ONNX intra-op threads.
  --inter-threads INTEGER         ONNX inter-op threads.
  --max-concurrent INTEGER        Concurrent synthesis ops.
  --no-warmup                     Skip startup warmup.
  --warmup-text TEXT              Custom warmup utterance.
  --log-level TEXT                debug | info | warning | error.
  --reload                        Auto-reload (dev only).

Every CLI flag also reads from SUPERTONIC_* environment variables (e.g. SUPERTONIC_PORT=9000).
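
A small sketch of the flag-to-variable naming, assuming every flag follows the pattern visible in SUPERTONIC_PORT and SUPERTONIC_DEVICE (prefix, upper case, dashes turned into underscores). Only those two names appear in this README; the others are an extrapolation, and `env_var_for` is a hypothetical helper.

```python
def env_var_for(flag: str, prefix: str = "SUPERTONIC_") -> str:
    """Map a CLI flag such as '--max-concurrent' to its assumed env var name."""
    return prefix + flag.lstrip("-").replace("-", "_").upper()


# Build an environment for launching the server from a dict of flag values.
options = {"--port": 9000, "--device": "cpu", "--total-steps": 4}
env = {env_var_for(flag): str(value) for flag, value in options.items()}
print(env)
```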

Endpoints

POST /v1/audio/speech — OpenAI-compatible

Body:

{
  "model": "supertonic-3",                     // any string; informational
  "input": "Text to speak (up to 20k chars).",
  "voice": "alloy",                            // see Voices below
  "response_format": "mp3",                    // "mp3" | "wav" | "pcm"
  "speed": 1.05,                               // 0.5..2.0
  "lang": "en",                                // extension: 31 codes, see below
  "total_steps": 8                             // extension: 4..16
}
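
Clients may want to check a request against the documented ranges before sending it. The sketch below is one way to do that; `build_speech_body` is a hypothetical helper, the 20,000-character cap is an assumed reading of "up to 20k chars", and how the server itself reacts to out-of-range values is not specified here.

```python
from typing import Any, Dict, Optional

ALLOWED_FORMATS = {"mp3", "wav", "pcm"}
MAX_INPUT_CHARS = 20_000  # assumption: "20k chars" means exactly 20,000


def build_speech_body(
    text: str,
    voice: str = "alloy",
    response_format: str = "mp3",
    speed: Optional[float] = None,
    total_steps: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate fields against the documented ranges and return a JSON-ready dict."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError("input must be 1..20000 characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    body: Dict[str, Any] = {
        "model": "supertonic-3",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if speed is not None:
        if not 0.5 <= speed <= 2.0:
            raise ValueError("speed must be within 0.5..2.0")
        body["speed"] = speed
    if total_steps is not None:
        if not 4 <= total_steps <= 16:
            raise ValueError("total_steps must be within 4..16")
        body["total_steps"] = total_steps
    return body
```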

The response uses HTTP/1.1 chunked transfer encoding: audio bytes stream out as each sentence finishes synthesizing. Useful response headers:

  • X-Sample-Rate: 44100
  • X-Voice: F1 (the actual Supertonic voice selected, after alias resolution)
  • X-Language: en
  • X-Audio-Encoding: pcm_s16le_44100_1ch (PCM only)
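
With response_format "pcm" the body is raw samples, so a client needs the X-Audio-Encoding header to interpret it. The sketch below parses that header and wraps streamed chunks in a WAV container using only the standard library. The header grammar is inferred from the single example value above, and both function names are hypothetical.

```python
import re
import wave
from typing import Iterable, Tuple


def parse_audio_encoding(value: str) -> Tuple[int, int, int]:
    """Parse e.g. 'pcm_s16le_44100_1ch' into (bytes_per_sample, sample_rate, channels)."""
    match = re.fullmatch(r"pcm_s(\d+)le_(\d+)_(\d+)ch", value)
    if match is None:
        raise ValueError(f"unrecognized encoding: {value!r}")
    bits, rate, channels = (int(group) for group in match.groups())
    return bits // 8, rate, channels


def write_wav(chunks: Iterable[bytes], path: str, encoding: str) -> int:
    """Write raw PCM chunks to a WAV file; return the number of frames written."""
    sample_width, rate, channels = parse_audio_encoding(encoding)
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes // (sample_width * channels)
```

In practice `chunks` would be the iterator over the HTTP response body and `encoding` the value of the X-Audio-Encoding header.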

GET /v1/voices

Returns every accepted voice name (OpenAI aliases + Supertonic IDs) with the underlying Supertonic voice each one maps to.

GET /v1/models

Returns an OpenAI-style model list: supertonic-3 plus tts-1, tts-1-hd, and gpt-4o-mini-tts as aliases, so clients that hard-code those names keep working.

GET /healthz

{"status":"ok","model":"supertonic-3","sample_rate":44100,"voices":[…],"languages":[…]}
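
Because the first run downloads the model and startup includes a warmup, scripts that launch the server may want to poll /healthz before sending requests. A minimal sketch, assuming the endpoint is reachable at the default port and that "status": "ok" signals readiness; `fetch_health` and `wait_until_ready` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(url: str = "http://localhost:8000/healthz") -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)


def wait_until_ready(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 1.0,
) -> Optional[dict]:
    """Poll until the payload reports status 'ok'; return it, or None on timeout."""
    for _ in range(attempts):
        try:
            payload = fetch()
            if payload.get("status") == "ok":
                return payload
        except (OSError, ValueError):
            pass  # connection refused, timeout, or a body that is not valid JSON
        time.sleep(delay)
    return None
```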

Voices

10 Supertonic presets + OpenAI's 13 standard voice names mapped onto them:

OpenAI alias  Supertonic    OpenAI alias  Supertonic
alloy         F1            marin         F3
coral         F2            nova          F4
sage          F5            shimmer       F2
verse         F1            onyx          M1
ash           M1            ballad        M2
cedar         M3            echo          M4
fable         M5

F1–F5 and M1–M5 also pass through unchanged.
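
The mapping above can be mirrored on the client side, for example to predict the X-Voice header. The dictionary is copied from the table; the exact-match lookup and the error on unknown names are assumptions, since this README does not say how the server treats case or unrecognized voices.

```python
# OpenAI alias -> Supertonic preset, copied from the table above.
OPENAI_ALIASES = {
    "alloy": "F1", "marin": "F3",
    "coral": "F2", "nova": "F4",
    "sage": "F5", "shimmer": "F2",
    "verse": "F1", "onyx": "M1",
    "ash": "M1", "ballad": "M2",
    "cedar": "M3", "echo": "M4",
    "fable": "M5",
}

# The ten native presets: F1..F5 and M1..M5.
PRESETS = {f"{group}{index}" for group in "FM" for index in range(1, 6)}


def resolve_voice(name: str) -> str:
    """Return the Supertonic preset for an alias or preset name (hypothetical helper)."""
    if name in PRESETS:
        return name  # native IDs pass through unchanged
    if name in OPENAI_ALIASES:
        return OPENAI_ALIASES[name]
    raise ValueError(f"unknown voice: {name!r}")
```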

Languages

31 supported language codes plus na (fallback): en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi, na.

Pass via the lang field, e.g. {"input": "안녕하세요.", "lang": "ko"}.
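
The OpenAI Python SDK has no `lang` parameter, but its `extra_body` option merges extra keys into the JSON request. A sketch of how the extension field might be passed that way, assuming the server reads `lang` from the top level of the body as shown in the request schema; `speech_kwargs` is a hypothetical helper.

```python
from typing import Any, Dict


def speech_kwargs(text: str, lang: str, voice: str = "alloy") -> Dict[str, Any]:
    """Build keyword arguments for client.audio.speech.create with a lang extension."""
    return {
        "model": "supertonic-3",
        "voice": voice,
        "input": text,
        "response_format": "mp3",
        # extra_body keys are merged into the request JSON by the SDK.
        "extra_body": {"lang": lang},
    }


# Usage (requires the openai package and a running server):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
#   kwargs = speech_kwargs("안녕하세요.", "ko")
#   with client.audio.speech.with_streaming_response.create(**kwargs) as resp:
#       resp.stream_to_file("hello_ko.mp3")
```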

Use it from Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
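# Stream the response to disk as chunks arrive instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="supertonic-3",
    voice="alloy",
    input="Drop-in replacement for OpenAI TTS.",
    response_format="mp3",
) as response:
    response.stream_to_file("hello.mp3")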

Use it from Pipecat

from pipecat.services.openai.tts import OpenAITTSService, OpenAITTSSettings

tts = OpenAITTSService(
    api_key="not-needed",
    base_url="http://localhost:8000/v1",
    settings=OpenAITTSSettings(model="supertonic-3", voice="nova"),
    sample_rate=44100,  # supertonic-3 native rate
)
# Plug into any Pipecat pipeline as the TTS service.

A standalone smoke test (no full pipeline) lives at examples/pipecat_smoke.py.

Use it from LiveKit Agents

The LiveKit openai.TTS plugin works the same way:

from livekit.plugins import openai

tts = openai.TTS(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="supertonic-3",
    voice="nova",
)

Performance — what to expect

Numbers from an Apple M4 Pro with --device auto (CoreML EP):

Workload                                 First-byte latency  RTF
Short single sentence (~3 s audio)       ~450–650 ms         0.10–0.25
Multi-sentence (~13 s audio, streaming)  ~620 ms             0.18
Long form (~20 s audio)                  ~600 ms             0.15
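
The README does not define RTF; the arithmetic below assumes the usual meaning of real-time factor, synthesis time divided by audio duration, so lower is faster. Under that assumption the multi-sentence row works out to roughly 2.3 s of total compute for 13 s of audio, while streaming lets the first bytes leave after about 620 ms.

```python
def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Total compute time implied by an RTF (assumed: synthesis time / audio time)."""
    return audio_seconds * rtf


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


print(round(synthesis_seconds(13.0, 0.18), 2))  # multi-sentence row from the table
print(round(speedup(0.15), 1))                  # long-form row from the table
```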

Warmup runs a short utterance on startup so the first real request doesn't pay the CoreML graph-compile tax (~2 s on cold start). Use --no-warmup to skip if you really want to.

Tuning

  • --total-steps 4 — lower diffusion steps, faster but slightly less expressive.
  • --total-steps 12 — higher quality, ~50% slower.
  • --max-concurrent 2 — allow two simultaneous syntheses (default 1 to avoid CPU thrashing).
  • --device cpu — skip CoreML/CUDA even when available (more predictable cold start).

Architecture (one paragraph)

engine.SupertonicEngine wraps the supertonic Python SDK, owns the ONNX sessions, and exposes an async sentence-level streaming generator. Each request splits the input on sentence/clause boundaries, runs each chunk through the diffusion pipeline in a ThreadPoolExecutor (ONNX releases the GIL), converts float32 audio to int16 PCM, and yields the bytes through a small async queue that pipelines chunk N+1's synthesis with chunk N's network send. The HTTP layer wraps that PCM stream in a format-specific transformer (passthrough, streaming-header WAV, lameenc MP3) and returns it as a StreamingResponse.
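
The pipelining described above can be sketched with the standard library alone. This is not the engine's code: `synthesize` stands in for the real diffusion call, the regex splitter is a naive placeholder for the sentence/clause splitter, and the 32767 scale factor for float-to-int16 conversion is an assumption.

```python
import asyncio
import re
import struct
from concurrent.futures import ThreadPoolExecutor
from typing import AsyncIterator, Callable, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clamp floats to [-1, 1] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


async def stream_pcm(
    text: str,
    synthesize: Callable[[str], Sequence[float]],
    queue_size: int = 2,
) -> AsyncIterator[bytes]:
    """Yield PCM chunks per sentence while the next sentence synthesizes in a thread."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    pool = ThreadPoolExecutor(max_workers=1)

    async def producer() -> None:
        try:
            for sentence in split_sentences(text):
                # Blocking synthesis runs off the event loop.
                samples = await loop.run_in_executor(pool, synthesize, sentence)
                await queue.put(float_to_pcm16(samples))
            await queue.put(None)  # sentinel: no more chunks
        except Exception as exc:
            await queue.put(exc)  # forward synthesis errors to the consumer

    task = asyncio.create_task(producer())
    try:
        while True:
            item = await queue.get()
            if item is None:
                break
            if isinstance(item, Exception):
                raise item
            yield item  # consumer sends chunk N while chunk N+1 is produced
    finally:
        task.cancel()
        pool.shutdown(wait=False)
```

The bounded queue is the design choice worth noting: it lets synthesis run ahead of the network send by a small, fixed number of chunks without buffering the whole utterance.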

Limitations

  • Only mp3, wav, pcm response formats. (Opus/AAC/FLAC are TODO.)
  • No voice cloning at runtime — use Supertone's separate Voice Builder for that.
  • Diffusion pipeline is per-chunk, so we stream at sentence granularity, not sub-sentence. This is the standard granularity Pipecat / LiveKit expect.

License

  • Server code: MIT
  • Supertonic-3 model weights: OpenRAIL-M (downloaded automatically from Hugging Face on first run)
