supertonic-server

OpenAI-compatible HTTP server for the Supertonic-3 on-device TTS model — with streaming, voice aliases, multilingual support, and CPU/CoreML/CUDA acceleration.

Drop-in replacement for OpenAI's /v1/audio/speech endpoint. Works with the OpenAI Python SDK, Pipecat, LiveKit Agents, OpenWebUI, or anything else that speaks the OpenAI TTS protocol — just point it at http://localhost:8000/v1.

Why

| | Supertonic-3 (via this server) |
|---|---|
| Model size | ~99M params (ONNX) |
| Runtime | ONNX Runtime: runs on CPU, CoreML (Apple Silicon), or CUDA |
| Speed | ~6–10× real-time on an M4 Pro CPU/CoreML |
| Languages | 31 + a `na` fallback |
| Voices | 10 presets (F1–F5, M1–M5) + OpenAI aliases (`alloy`, `nova`, `echo`, …) |
| First-byte latency | ~450–650 ms after warmup (default settings) |
| Privacy | Fully local, no cloud calls |
| License | MIT code, OpenRAIL-M weights |

Quick start (local, Apple Silicon / Linux / Windows)

# 1. Create venv and install
uv venv --python 3.12
uv pip install -e .

# 2. Run the server (the first run downloads the model, a one-time cost)
supertonic-server --port 8000

# 3. Speak
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello, world.","voice":"alloy","response_format":"mp3"}' \
  --output hello.mp3

--device auto is the default and picks the best available execution provider: CUDA (if onnxruntime-gpu is installed and a GPU is present) → CoreML (macOS) → CPU.
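
The fallback order described above can be sketched as a small pure function. This is an illustration, not the server's actual code: `pick_provider` and `PREFERENCE` are hypothetical names, and the final CPU fallback for an empty list is an assumption. The provider strings are the ones ONNX Runtime reports from `onnxruntime.get_available_providers()`.

```python
# Sketch only: the preference order comes from the text above; the function
# name and the CPU fallback for an empty list are assumptions.
PREFERENCE = (
    "CUDAExecutionProvider",    # needs onnxruntime-gpu and a visible GPU
    "CoreMLExecutionProvider",  # macOS
    "CPUExecutionProvider",     # last resort
)


def pick_provider(available):
    """Return the first preferred provider that appears in `available`."""
    for name in PREFERENCE:
        if name in available:
            return name
    return "CPUExecutionProvider"


# On a Mac without CUDA, CoreML is chosen ahead of CPU.
print(pick_provider(["CoreMLExecutionProvider", "CPUExecutionProvider"]))
```

In a real process the `available` list would typically come from `onnxruntime.get_available_providers()`.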

Docker

docker build -t supertonic-server .

# CPU (works on any platform incl. Linux, Windows containers, macOS)
docker run --rm -p 8000:8000 -v supertonic-cache:/root/.cache supertonic-server

# NVIDIA GPU (see Dockerfile for full instructions)
docker run --rm --gpus all -p 8000:8000 -v supertonic-cache:/root/.cache \
  -e SUPERTONIC_DEVICE=cuda supertonic-server

The mounted volume caches the model weights so subsequent starts skip the download.

CLI

supertonic-server --help
  --host TEXT                     Bind address.
  --port INTEGER                  Bind port.
  --device [auto|cpu|coreml|cuda] ONNX execution provider.
  --model [supertonic|supertonic-2|supertonic-3]
  --model-dir PATH                Local model cache dir.
  --voice TEXT                    Default voice (F1-F5, M1-M5).
  --lang TEXT                     Default language code.
  --speed FLOAT                   Default speed (0.5..2.0).
  --total-steps INTEGER           Diffusion steps (4..16). Lower = faster.
  --intra-threads INTEGER         ONNX intra-op threads.
  --inter-threads INTEGER         ONNX inter-op threads.
  --max-concurrent INTEGER        Concurrent synthesis ops.
  --no-warmup                     Skip startup warmup.
  --warmup-text TEXT              Custom warmup utterance.
  --log-level TEXT                debug | info | warning | error.
  --reload                        Auto-reload (dev only).

Every CLI flag also reads from SUPERTONIC_* environment variables (e.g. SUPERTONIC_PORT=9000).

Endpoints

`POST /v1/audio/speech` — OpenAI-compatible

Body:

{
  "model": "supertonic-3",                     // any string; informational
  "input": "Text to speak (up to 20k chars).",
  "voice": "alloy",                            // see Voices below
  "response_format": "mp3",                    // "mp3" | "wav" | "pcm"
  "speed": 1.05,                               // 0.5..2.0
  "lang": "en",                                // extension: 31 codes, see below
  "total_steps": 8                             // extension: 4..16
}
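
For clients that do not use the OpenAI SDK, a request with this body can be assembled with the standard library alone. The sketch below checks the ranges listed above before building the request. `build_speech_request` is a hypothetical helper, the 20,000-character limit is one reading of "20k chars", and the client-side default of `speed=1.0` is an assumption.

```python
import json
import urllib.request

MAX_INPUT_CHARS = 20_000  # assumption: "up to 20k chars" read as 20,000


def build_speech_request(text, voice="alloy", response_format="mp3",
                         speed=1.0, lang=None, total_steps=None,
                         base_url="http://localhost:8000/v1"):
    """Validate the documented ranges and return an unsent urllib Request."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError("input must be 1..20000 characters")
    if response_format not in ("mp3", "wav", "pcm"):
        raise ValueError("response_format must be mp3, wav, or pcm")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5..2.0")
    if total_steps is not None and not 4 <= total_steps <= 16:
        raise ValueError("total_steps must be within 4..16")

    body = {"model": "supertonic-3", "input": text, "voice": voice,
            "response_format": response_format, "speed": speed}
    # Extension fields are sent only when set, so server defaults apply otherwise.
    if lang is not None:
        body["lang"] = lang
    if total_steps is not None:
        body["total_steps"] = total_steps

    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage against a running server:
#   with urllib.request.urlopen(build_speech_request("Hello, world.")) as resp:
#       open("hello.mp3", "wb").write(resp.read())
```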

The response is HTTP/1.1 chunked transfer — audio bytes stream out as each sentence finishes synthesizing. Useful headers:

X-Sample-Rate: 44100
X-Voice: F1 (the actual Supertonic voice selected, after alias resolution)
X-Language: en
X-Audio-Encoding: pcm_s16le_44100_1ch (PCM only)
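
When `response_format` is `pcm`, the body is raw samples with no container, so a client needs the header values to interpret it. The sketch below parses the `X-Audio-Encoding` value and wraps the bytes in a WAV container with Python's `wave` module. Both helper names are hypothetical, and the `pcm_<format>_<rate>_<n>ch` layout is inferred from the single example value shown above.

```python
import io
import wave


def parse_audio_encoding(value):
    """Split e.g. 'pcm_s16le_44100_1ch' into (sample_format, rate, channels)."""
    kind, sample_format, rate, channels = value.split("_")
    if kind != "pcm" or not channels.endswith("ch"):
        raise ValueError(f"unexpected encoding: {value!r}")
    return sample_format, int(rate), int(channels[:-2])


def pcm_to_wav(pcm_bytes, encoding="pcm_s16le_44100_1ch"):
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes."""
    sample_format, rate, channels = parse_audio_encoding(encoding)
    if sample_format != "s16le":
        raise ValueError("this sketch only handles 16-bit little-endian PCM")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # s16le means 2 bytes per sample
        wav.setframerate(rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()
```

A client would pass the response body and the `X-Audio-Encoding` header value straight into `pcm_to_wav`.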

`GET /v1/voices`

Returns every accepted voice name (OpenAI aliases + Supertonic IDs) with the underlying Supertonic voice each one maps to.

`GET /v1/models`

Returns an OpenAI-style model list: supertonic-3 plus tts-1, tts-1-hd, and gpt-4o-mini-tts as aliases, so clients that hard-code those names keep working.

`GET /healthz`

{"status":"ok","model":"supertonic-3","sample_rate":44100,"voices":[…],"languages":[…]}

Voices

10 Supertonic presets + OpenAI's 13 standard voice names mapped onto them:

| OpenAI alias | Supertonic | OpenAI alias | Supertonic |
|---|---|---|---|
| alloy | F1 | marin | F3 |
| coral | F2 | nova | F4 |
| sage | F5 | shimmer | F2 |
| verse | F1 | onyx | M1 |
| ash | M1 | ballad | M2 |
| cedar | M3 | echo | M4 |
| fable | M5 | | |

F1–F5 and M1–M5 also pass through unchanged.
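
The mapping in the table amounts to a dictionary lookup with a pass-through for the preset IDs. A minimal sketch follows; `resolve_voice` is a hypothetical name, and exact, case-sensitive matching plus the `ValueError` for unknown names are assumptions about behaviour the text does not specify.

```python
# Alias table copied from the Voices section above.
ALIASES = {
    "alloy": "F1", "coral": "F2", "sage": "F5", "verse": "F1",
    "marin": "F3", "nova": "F4", "shimmer": "F2",
    "ash": "M1", "cedar": "M3", "fable": "M5",
    "onyx": "M1", "ballad": "M2", "echo": "M4",
}
PRESETS = {f"{gender}{n}" for gender in "FM" for n in range(1, 6)}  # F1..F5, M1..M5


def resolve_voice(name):
    """Map an OpenAI alias or a Supertonic ID to the Supertonic voice ID."""
    if name in PRESETS:
        return name  # preset IDs pass through unchanged
    try:
        return ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown voice: {name!r}") from None
```

The resolved ID is what the server reports back in the `X-Voice` response header.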

Languages

31 supported language codes plus na (fallback): en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi, na.

Pass via the lang field, e.g. {"input": "안녕하세요.", "lang": "ko"}.

Use it from Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
audio = client.audio.speech.create(
    model="supertonic-3",
    voice="alloy",
    input="Drop-in replacement for OpenAI TTS.",
    response_format="mp3",
)
audio.stream_to_file("hello.mp3")

Use it from Pipecat

from pipecat.services.openai.tts import OpenAITTSService, OpenAITTSSettings

tts = OpenAITTSService(
    api_key="not-needed",
    base_url="http://localhost:8000/v1",
    settings=OpenAITTSSettings(model="supertonic-3", voice="nova"),
    sample_rate=44100,  # supertonic-3 native rate
)
# Plug into any Pipecat pipeline as the TTS service.

A standalone smoke test (no full pipeline) lives at examples/pipecat_smoke.py.

Use it from LiveKit Agents

The LiveKit openai.TTS plugin works the same way:

from livekit.plugins import openai

tts = openai.TTS(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="supertonic-3",
    voice="nova",
)

Performance — what to expect

Numbers from an Apple M4 Pro with --device auto (CoreML EP):

| Workload | First-byte latency | RTF |
|---|---|---|
| Short single sentence (~3 s audio) | ~450–650 ms | 0.10–0.25 |
| Multi-sentence (~13 s audio, streaming) | ~620 ms | 0.18 |
| Long form (~20 s audio) | ~600 ms | 0.15 |
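
RTF is not defined in the table. Assuming the usual meaning, synthesis time divided by audio duration, the figures convert to wall-clock time and to a "times real-time" multiple as sketched below. The helper names are hypothetical.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF under the assumed definition: compute time / audio duration."""
    return synthesis_seconds / audio_seconds


def synthesis_seconds(rtf, audio_seconds):
    """Total compute time implied by an RTF for a clip of the given length."""
    return rtf * audio_seconds


def speed_multiple(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


# Multi-sentence row: ~13 s of audio at RTF 0.18.
print(round(synthesis_seconds(0.18, 13.0), 2))  # total compute time in seconds
print(round(speed_multiple(0.18), 1))           # multiple of real time
```

Because audio streams per sentence, first-byte latency depends on the first chunk only, which is why it stays near 600 ms even when total synthesis time grows with the input.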

Warmup runs a short utterance on startup so the first real request doesn't pay the CoreML graph-compile tax (~2 s on cold start). Pass --no-warmup to skip it.

Tuning

--total-steps 4 — lower diffusion steps, faster but slightly less expressive.
--total-steps 12 — higher quality, ~50% slower.
--max-concurrent 2 — allow two simultaneous syntheses (default 1 to avoid CPU thrashing).
--device cpu — skip CoreML/CUDA even when available (more predictable cold start).
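
A concurrency cap like `--max-concurrent` is commonly implemented with a semaphore around the synthesis call. Whether this server does it that way is an assumption, and `SynthesisGate` is a hypothetical name. The sketch shows the effect: extra requests wait instead of competing for CPU.

```python
import asyncio


class SynthesisGate:
    """Allow at most `max_concurrent` synthesis coroutines to run at once."""

    def __init__(self, max_concurrent=1):
        self._sem = asyncio.Semaphore(max_concurrent)
        self.active = 0  # currently running
        self.peak = 0    # highest concurrency observed, for inspection

    async def run(self, synth, *args):
        async with self._sem:  # waits here while the gate is full
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await synth(*args)
            finally:
                self.active -= 1


async def demo():
    async def fake_synth(text):  # stand-in for real synthesis
        await asyncio.sleep(0.01)
        return text.upper()

    gate = SynthesisGate(max_concurrent=2)
    results = await asyncio.gather(*(gate.run(fake_synth, t) for t in "abcd"))
    return results, gate.peak
```

Running `asyncio.run(demo())` submits four requests at once, yet no more than two are ever inside `fake_synth` together.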

Architecture (one paragraph)

engine.SupertonicEngine wraps the supertonic Python SDK, owns the ONNX sessions, and exposes an async sentence-level streaming generator. Each request splits the input on sentence/clause boundaries, runs each chunk through the diffusion pipeline in a ThreadPoolExecutor (ONNX releases the GIL), converts float32 audio to int16 PCM, and yields the bytes through a small async queue that pipelines chunk N+1's synthesis with chunk N's network send. The HTTP layer wraps that PCM stream in a format-specific transformer (passthrough, streaming-header WAV, lameenc MP3) and returns it as a StreamingResponse.
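
The pipeline in this paragraph can be sketched with the standard library. This is not the project's code: the regex split rule, the `synthesize` callable (a stand-in for the Supertonic SDK call), the queue size, and the 32767 scale factor are all assumptions chosen to keep the example small.

```python
import asyncio
import re
import struct
from concurrent.futures import ThreadPoolExecutor

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # assumed split rule


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def float_to_pcm16(samples):
    """Clip float samples to [-1, 1] and pack as little-endian int16 bytes."""
    ints = [int(max(-1.0, min(1.0, x)) * 32767) for x in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


async def stream_pcm(text, synthesize, executor, queue_size=2):
    """Yield one PCM chunk per sentence; chunk N+1 is synthesized in a worker
    thread while the consumer is still sending chunk N."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue(maxsize=queue_size)
    done = object()  # sentinel marking the end of the stream

    async def producer():
        try:
            for sentence in split_sentences(text):
                samples = await loop.run_in_executor(executor, synthesize, sentence)
                await queue.put(float_to_pcm16(samples))
            await queue.put(done)
        except Exception as exc:  # hand synthesis errors to the consumer
            await queue.put(exc)

    task = asyncio.create_task(producer())
    try:
        while True:
            item = await queue.get()
            if item is done:
                break
            if isinstance(item, Exception):
                raise item
            yield item
    finally:
        task.cancel()  # no-op if the producer already finished


async def collect(text, synthesize):
    with ThreadPoolExecutor(max_workers=1) as pool:
        return [chunk async for chunk in stream_pcm(text, synthesize, pool)]
```

The bounded queue is what provides the overlap: the producer can run at most `queue_size` chunks ahead, so memory stays flat on long inputs while the network send never waits on an idle synthesizer.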

Limitations

Only mp3, wav, pcm response formats. (Opus/AAC/FLAC are TODO.)
No voice cloning at runtime — use Supertone's separate Voice Builder for that.
The diffusion pipeline runs per chunk, so audio streams at sentence granularity, not sub-sentence. This is the standard granularity Pipecat / LiveKit expect.

License

Server code: MIT
Supertonic-3 model weights: OpenRAIL-M (downloaded automatically from Hugging Face on first run)

Release history

0.2.0: May 17, 2026
0.1.0: May 15, 2026 (this version)
