OpenAI-compatible HTTP server for Supertonic-3 TTS with streaming, voice aliases, and CPU / CoreML / CUDA acceleration.
supertonic-server
OpenAI-compatible HTTP server for the Supertonic-3 on-device TTS model — with streaming, voice aliases, multilingual support, and CPU/CoreML/CUDA acceleration.
Drop-in replacement for OpenAI's /v1/audio/speech endpoint. Works with the OpenAI Python SDK, Pipecat, LiveKit Agents, OpenWebUI, or anything else that speaks the OpenAI TTS protocol — just point it at http://localhost:8000/v1.
Why
| | Supertonic-3 (via this server) |
|---|---|
| Model size | ~99M params (ONNX) |
| Runtime | ONNX Runtime — runs on CPU, CoreML (Apple Silicon), or CUDA |
| Speed | ~6–10× real-time on an M4 Pro CPU/CoreML |
| Languages | 31 + a na fallback |
| Voices | 10 presets (F1–F5, M1–M5) + OpenAI aliases (alloy, nova, echo, …) |
| First-byte latency | ~450–650 ms after warmup (default settings) |
| Privacy | Fully local — no cloud calls |
| License | MIT code, OpenRAIL-M weights |
Quick start (local, Apple Silicon / Linux / Windows)
# 1. Create venv and install
uv venv --python 3.12
uv pip install -e .
# 2. Run the server (the first run downloads the model, a one-time cost)
supertonic-server --port 8000
# 3. Speak
curl -X POST http://localhost:8000/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input":"Hello, world.","voice":"alloy","response_format":"mp3"}' \
--output hello.mp3
--device auto is the default and picks the best available execution provider:
CUDA (if onnxruntime-gpu is installed and a GPU is present) → CoreML (macOS) → CPU.
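The fallback order above can be sketched as a small preference lookup. This is an illustration only, not the server's actual selection code: `pick_provider` is a hypothetical helper, and the list it receives is assumed to come from `onnxruntime.get_available_providers()`.

```python
# Preference order described above: CUDA, then CoreML, then CPU.
PREFERENCE = (
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
)


def pick_provider(available):
    """Return the most preferred execution provider present in `available`.

    `available` is assumed to be the list returned by
    onnxruntime.get_available_providers(). That list reflects the installed
    ONNX Runtime build, so a real implementation may also need to check that
    a GPU is actually usable before committing to CUDA.
    """
    for name in PREFERENCE:
        if name in available:
            return name
    # ONNX Runtime always ships the CPU provider, so fall back to it.
    return "CPUExecutionProvider"


if __name__ == "__main__":
    # Example input resembling a macOS build of ONNX Runtime.
    print(pick_provider(["CoreMLExecutionProvider", "CPUExecutionProvider"]))
```

Passing `--device cpu`, `--device coreml`, or `--device cuda` bypasses this kind of auto-detection entirely.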
Docker
docker build -t supertonic-server .
# CPU (works on any platform incl. Linux, Windows containers, macOS)
docker run --rm -p 8000:8000 -v supertonic-cache:/root/.cache supertonic-server
# NVIDIA GPU (see Dockerfile for full instructions)
docker run --rm --gpus all -p 8000:8000 -v supertonic-cache:/root/.cache \
-e SUPERTONIC_DEVICE=cuda supertonic-server
The mounted volume caches the model weights so subsequent starts skip the download.
CLI
supertonic-server --help
--host TEXT Bind address.
--port INTEGER Bind port.
--device [auto|cpu|coreml|cuda] ONNX execution provider.
--model [supertonic|supertonic-2|supertonic-3]
--model-dir PATH Local model cache dir.
--voice TEXT Default voice (F1-F5, M1-M5).
--lang TEXT Default language code.
--speed FLOAT Default speed (0.5..2.0).
--total-steps INTEGER Diffusion steps (4..16). Lower = faster.
--intra-threads INTEGER ONNX intra-op threads.
--inter-threads INTEGER ONNX inter-op threads.
--max-concurrent INTEGER Concurrent synthesis ops.
--no-warmup Skip startup warmup.
--warmup-text TEXT Custom warmup utterance.
--log-level TEXT debug | info | warning | error.
--reload Auto-reload (dev only).
Every CLI flag also reads from SUPERTONIC_* environment variables (e.g. SUPERTONIC_PORT=9000).
Endpoints
POST /v1/audio/speech — OpenAI-compatible
Body:
{
"model": "supertonic-3", // any string; informational
"input": "Text to speak (up to 20k chars).",
"voice": "alloy", // see Voices below
"response_format": "mp3", // "mp3" | "wav" | "pcm"
"speed": 1.05, // 0.5..2.0
"lang": "en", // extension: 31 codes, see below
"total_steps": 8 // extension: 4..16
}
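The `//` annotations in the body above are for the reader; a real JSON payload cannot contain comments. As a rough client-side sketch, the documented limits can be checked before sending. `build_speech_body` is a hypothetical helper, and its default values are placeholders rather than the server's defaults.

```python
import json

ALLOWED_FORMATS = ("mp3", "wav", "pcm")


def build_speech_body(text, voice="alloy", response_format="mp3",
                      speed=1.0, lang="en", total_steps=8):
    """Check fields against the documented ranges and return a JSON string."""
    if len(text) > 20_000:
        raise ValueError("input exceeds the documented 20k-character limit")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {ALLOWED_FORMATS}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5..2.0")
    if not 4 <= total_steps <= 16:
        raise ValueError("total_steps must be within 4..16")
    body = {
        "model": "supertonic-3",  # informational; any string is accepted
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "lang": lang,                # extension field
        "total_steps": total_steps,  # extension field
    }
    # ensure_ascii=False keeps non-Latin input readable in the payload.
    return json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    print(build_speech_body("Hello, world.", speed=1.05))
```

The resulting string can be sent as the request body with any HTTP client, using `Content-Type: application/json`.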
The response is HTTP/1.1 chunked transfer — audio bytes stream out as each sentence finishes synthesizing. Useful headers:
- X-Sample-Rate: 44100
- X-Voice: F1 (the actual Supertonic voice selected, after alias resolution)
- X-Language: en
- X-Audio-Encoding: pcm_s16le_44100_1ch (PCM only)
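With `response_format` set to `pcm`, the stream is headerless, so a client has to add its own container. A minimal sketch using only the standard library follows, assuming the chunks are 16-bit little-endian mono at 44100 Hz as the headers above indicate. `write_wav` is a hypothetical helper, and `pcm_chunks` can be any iterable of bytes read from the HTTP response.

```python
import wave


def write_wav(path, pcm_chunks, sample_rate=44100):
    """Wrap raw 16-bit mono PCM chunks in a WAV file.

    Returns the audio duration in seconds.
    """
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # 1ch
        wav.setsampwidth(2)          # s16le: two bytes per sample
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)   # chunks can be written as they arrive
            total_bytes += len(chunk)
    return total_bytes // 2 / sample_rate


if __name__ == "__main__":
    # One second of silence, delivered as two chunks.
    silence = [b"\x00\x00" * 22050, b"\x00\x00" * 22050]
    print(write_wav("silence.wav", silence))
```

Requesting `wav` directly avoids this step; the sketch is mainly useful when a pipeline consumes raw PCM and only occasionally needs to save it.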
GET /v1/voices
Returns every accepted voice name (OpenAI aliases + Supertonic IDs) with the underlying Supertonic voice each one maps to.
GET /v1/models
OpenAI-style model list (returns supertonic-3 plus tts-1, tts-1-hd,
gpt-4o-mini-tts as aliases so clients that hard-code those names work).
GET /healthz
{"status":"ok","model":"supertonic-3","sample_rate":44100,"voices":[…],"languages":[…]}
Voices
10 Supertonic presets + OpenAI's 13 standard voice names mapped onto them:
| OpenAI alias | Supertonic | OpenAI alias | Supertonic |
|---|---|---|---|
| alloy | F1 | marin | F3 |
| coral | F2 | nova | F4 |
| sage | F5 | shimmer | F2 |
| verse | F1 | onyx | M1 |
| ash | M1 | ballad | M2 |
| cedar | M3 | echo | M4 |
| fable | M5 | | |
F1–F5 and M1–M5 also pass through unchanged.
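The table above amounts to a lookup with pass-through for native IDs. The sketch below mirrors it on the client side; `resolve_voice` is a hypothetical helper, and raising `ValueError` for unknown names is an assumption, since this README does not say how the server treats them.

```python
# OpenAI alias -> Supertonic preset, copied from the table above.
OPENAI_ALIASES = {
    "alloy": "F1", "marin": "F3",
    "coral": "F2", "nova": "F4",
    "sage": "F5", "shimmer": "F2",
    "verse": "F1", "onyx": "M1",
    "ash": "M1", "ballad": "M2",
    "cedar": "M3", "echo": "M4",
    "fable": "M5",
}

# Native presets F1..F5 and M1..M5.
PRESETS = {f"{prefix}{n}" for prefix in "FM" for n in range(1, 6)}


def resolve_voice(name):
    """Map an accepted voice name to the underlying Supertonic preset."""
    if name in PRESETS:
        return name  # native IDs pass through unchanged
    if name in OPENAI_ALIASES:
        return OPENAI_ALIASES[name]
    raise ValueError(f"unknown voice: {name!r}")  # assumed behavior


if __name__ == "__main__":
    print(resolve_voice("nova"), resolve_voice("M3"))
```

For the authoritative mapping at runtime, query `GET /v1/voices` instead of hard-coding the table.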
Languages
31 supported language codes plus na (fallback): en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi, na.
Pass via the lang field, e.g. {"input": "안녕하세요.", "lang": "ko"}.
Use it from Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
audio = client.audio.speech.create(
model="supertonic-3",
voice="alloy",
input="Drop-in replacement for OpenAI TTS.",
response_format="mp3",
)
audio.stream_to_file("hello.mp3")
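Because the server streams audio sentence by sentence, a client can also write bytes as they arrive instead of waiting for the whole response. The sketch below uses the SDK's `with_streaming_response` wrapper and passes the server's extension fields through `extra_body`. `stream_speech_to_file` is a hypothetical helper, and the example assumes the server is running on localhost:8000.

```python
def stream_speech_to_file(text, path, base_url="http://localhost:8000/v1",
                          voice="alloy", lang="en", total_steps=8):
    """Stream synthesized speech to `path` as the chunks arrive."""
    # Imported lazily so that defining this helper does not require the SDK.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="not-needed")
    with client.audio.speech.with_streaming_response.create(
        model="supertonic-3",
        voice=voice,
        input=text,
        response_format="mp3",
        # Fields the OpenAI API does not define go into extra_body, which the
        # SDK merges into the JSON request body.
        extra_body={"lang": lang, "total_steps": total_steps},
    ) as response:
        response.stream_to_file(path)


if __name__ == "__main__":
    # Requires a running server and `pip install openai`.
    stream_speech_to_file("안녕하세요.", "hello_ko.mp3", lang="ko")
```

To feed a player or another pipeline stage rather than a file, iterate over `response.iter_bytes()` inside the same `with` block.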
Use it from Pipecat
from pipecat.services.openai.tts import OpenAITTSService, OpenAITTSSettings
tts = OpenAITTSService(
api_key="not-needed",
base_url="http://localhost:8000/v1",
settings=OpenAITTSSettings(model="supertonic-3", voice="nova"),
sample_rate=44100, # supertonic-3 native rate
)
# Plug into any Pipecat pipeline as the TTS service.
A standalone smoke test (no full pipeline) lives at examples/pipecat_smoke.py.
Use it from LiveKit Agents
Any LiveKit openai.TTS plugin works the same way:
from livekit.plugins import openai
tts = openai.TTS(
base_url="http://localhost:8000/v1",
api_key="not-needed",
model="supertonic-3",
voice="nova",
)
Performance — what to expect
Numbers from an Apple M4 Pro with --device auto (CoreML EP). RTF is the real-time factor, synthesis time divided by audio duration, so lower is faster:
| Workload | First-byte latency | RTF |
|---|---|---|
| Short single sentence (~3 s audio) | ~450–650 ms | 0.10 – 0.25 |
| Multi-sentence (~13 s audio, streaming) | ~620 ms | 0.18 |
| Long form (~20 s audio) | ~600 ms | 0.15 |
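To turn the RTF column into wall-clock numbers, take RTF as synthesis time over audio duration (the usual definition, assumed here). The helpers below are illustrative arithmetic only and are not part of the server.

```python
def synth_seconds(rtf, audio_seconds):
    """Total synthesis time implied by an RTF for a clip of the given length."""
    return rtf * audio_seconds


def realtime_speedup(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


if __name__ == "__main__":
    # Multi-sentence row: ~13 s of audio at RTF 0.18.
    print(f"{synth_seconds(0.18, 13):.2f} s of compute")
    # RTF 0.10 corresponds to roughly ten times real time.
    print(f"{realtime_speedup(0.10):.1f}x real time")
```

Total synthesis time and first-byte latency measure different things: with streaming, the first bytes go out once the first sentence is done, which is why the latency column stays roughly flat as clips get longer.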
Warmup runs a short utterance on startup so the first real request doesn't pay
the CoreML graph-compile tax (~2 s on cold start). Pass --no-warmup to skip it,
at the cost of a slower first request.
Tuning
- --total-steps 4: fewer diffusion steps; faster but slightly less expressive.
- --total-steps 12: higher quality, ~50% slower.
- --max-concurrent 2: allow two simultaneous syntheses (the default of 1 avoids CPU thrashing).
- --device cpu: skip CoreML/CUDA even when available (more predictable cold start).
Architecture (one paragraph)
engine.SupertonicEngine wraps the supertonic Python SDK, owns the ONNX
sessions, and exposes an async sentence-level streaming generator. Each request
splits the input on sentence/clause boundaries, runs each chunk through the
diffusion pipeline in a ThreadPoolExecutor (ONNX Runtime releases the GIL), converts
float32 audio to int16 PCM, and yields the bytes through a small async queue
that pipelines chunk N+1's synthesis with chunk N's network send. The HTTP
layer wraps that PCM stream in a format-specific transformer (passthrough,
streaming-header WAV, lameenc MP3) and returns it as a StreamingResponse.
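The pipelining described above can be sketched with a bounded `asyncio.Queue` between a producer that synthesizes in a thread pool and a consumer that yields bytes. This is an illustration under stated assumptions, not the code in `engine.SupertonicEngine`: `fake_synthesize` stands in for the model, the regex splitter is a simplification, and all names are hypothetical.

```python
import asyncio
import math
import re
import struct
from concurrent.futures import ThreadPoolExecutor

SAMPLE_RATE = 44100


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace (simplified splitter)."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def fake_synthesize(sentence):
    """Stand-in for the model: 10 ms of a 440 Hz tone per input character."""
    n = len(sentence) * SAMPLE_RATE // 100
    return [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


def float_to_pcm16(samples):
    """Clip floats to [-1, 1] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


async def stream_pcm(text, pool):
    """Yield one PCM chunk per sentence while the next one is synthesized."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue(maxsize=2)  # small buffer: stay one chunk ahead

    async def producer():
        try:
            for sentence in split_sentences(text):
                samples = await loop.run_in_executor(pool, fake_synthesize, sentence)
                await queue.put(float_to_pcm16(samples))
            await queue.put(None)  # end-of-stream marker
        except Exception as exc:
            await queue.put(exc)   # surface synthesis errors to the consumer

    task = asyncio.create_task(producer())
    try:
        while True:
            item = await queue.get()
            if item is None:
                break
            if isinstance(item, Exception):
                raise item
            yield item
    finally:
        task.cancel()  # stop synthesizing if the client disconnects early


async def collect(text):
    """Usage example: gather every chunk into a list."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return [chunk async for chunk in stream_pcm(text, pool)]


if __name__ == "__main__":
    chunks = asyncio.run(collect("Hello there. How are you?"))
    print([len(c) for c in chunks])
```

The bounded queue is the design point: the producer can run ahead of the network send, but only by a couple of chunks, so a slow client cannot cause unbounded buffering.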
Limitations
- Only mp3, wav, pcm response formats. (Opus/AAC/FLAC are TODO.)
- No voice cloning at runtime; use Supertone's separate Voice Builder for that.
- Diffusion pipeline is per-chunk, so we stream at sentence granularity, not sub-sentence. This is the standard granularity Pipecat / LiveKit expect.
License
- Server code: MIT
- Supertonic-3 model weights: OpenRAIL-M (downloaded automatically from Hugging Face on first run)