Python client and CLI for Volcengine/ByteDance Doubao seed-tts-2.0 bidirectional streaming TTS.
doubao-tts
A small, production-minded Python client and CLI for Volcengine Doubao seed-tts-2.0 bidirectional streaming TTS — native-quality Chinese voices with emotion control, ready for agents, scripts, and serving pipelines.
Why
doubao-tts is the first PyPI package targeting Volcengine's seed-tts-2.0
bidirectional-streaming endpoint. Existing Python TTS wrappers either:
- hit the older SAMI HTTP endpoint (no streaming, older voice quality), or
- aren't published to PyPI at all.
This package fills that gap with:
- A clean synthesize(text, out_path) interface
- A CLI that drops straight into agent frameworks (Hermes, Dify, LangChain, n8n, …)
- Strict mypy on every public module
- 95% unit test coverage, atomic output writes, proper credential redaction
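The "atomic output writes" guarantee is usually achieved by writing to a temporary file in the destination directory and then renaming it into place. The sketch below illustrates that general technique only; it is not taken from the package, and atomic_write_bytes is a hypothetical helper name.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def atomic_write_bytes(out_path: str | os.PathLike[str], data: bytes) -> None:
    """Write data so that readers never observe a half-written file."""
    target = Path(out_path)
    # Create the temp file next to the target: a rename is only atomic
    # when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        # os.replace overwrites the destination; on POSIX the rename is atomic.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial temp file, then re-raise the original error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With this pattern, an interrupted synthesis leaves either the previous file or no file at the output path, never a truncated MP3.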
Install
pip install doubao-tts
# or with uv:
uv add doubao-tts
# CLI-only, installed as a standalone tool:
uv tool install doubao-tts
Quick start
Python
from doubao_tts import synthesize
synthesize("你好,世界", "hello.mp3")
Async
import asyncio

from doubao_tts import synthesize_async

async def main() -> None:
    await synthesize_async(
        "Hello from Doubao seed-tts-2.0!",
        "hello.mp3",
        voice="en-female-assistant",
        speed=1.1,
    )

asyncio.run(main())
CLI
# simplest
doubao-tts say "你好" --out hello.mp3
# pick a voice + adjust speed
doubao-tts say "好激动!" --voice zh-female-warm --speed 1.2 --out excited.mp3
# read from a file
doubao-tts say --text-file script.txt --out narration.mp3
# browse available voices
doubao-tts list-voices --lang zh
# inspect resolved config (tokens are redacted)
doubao-tts config show
Credentials
Credentials resolve in this order — first match wins:
- Keyword arguments to synthesize(...)
- Environment variables: VOLCENGINE_APP_ID, VOLCENGINE_ACCESS_TOKEN (also accepted as DOUBAO_APP_ID, DOUBAO_ACCESS_TOKEN)
- ~/.doubao-tts/config.yaml
- Built-in defaults (speaker, audio format, sample rate)
Example ~/.doubao-tts/config.yaml:
app_id: "1234567890"
access_token: "volc_...."
speaker: zh_female_vv_uranus_bigtts
audio_format: mp3
sample_rate: 24000
Get your app ID and access token from the Volcengine Speech console. You need the seed-tts-2.0 product activated on your account.
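The resolution order described above can be sketched as a simple first-match lookup. This is an illustration rather than the package's implementation: resolve_setting, ENV_NAMES, and DEFAULTS are hypothetical names, the default values are borrowed from the example config and may not match the real built-ins, and checking the VOLCENGINE_ names before the DOUBAO_ names is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any

# Assumed defaults, copied from the example config for illustration only.
DEFAULTS: dict[str, Any] = {
    "speaker": "zh_female_vv_uranus_bigtts",
    "audio_format": "mp3",
    "sample_rate": 24000,
}

# Assumption: the VOLCENGINE_ name is consulted before the DOUBAO_ alias.
ENV_NAMES: dict[str, tuple[str, ...]] = {
    "app_id": ("VOLCENGINE_APP_ID", "DOUBAO_APP_ID"),
    "access_token": ("VOLCENGINE_ACCESS_TOKEN", "DOUBAO_ACCESS_TOKEN"),
}


def resolve_setting(
    key: str,
    kwargs: Mapping[str, Any],
    file_config: Mapping[str, Any],
    env: Mapping[str, str] | None = None,
) -> Any:
    """Return the first value found: kwargs, then env, then file, then defaults."""
    env = os.environ if env is None else env
    if kwargs.get(key) is not None:
        return kwargs[key]
    for name in ENV_NAMES.get(key, ()):
        if env.get(name):  # skip unset and empty variables
            return env[name]
    if file_config.get(key) is not None:
        return file_config[key]
    return DEFAULTS.get(key)
```

Passing env explicitly keeps the function easy to test without touching the real process environment.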
Integration: Hermes Agent
Hermes Agent v0.x+ supports
declarative TTS command providers via its tts.providers.<name> config
block. Plug doubao-tts in:
# ~/.hermes/config.yaml
tts:
  provider: doubao
  providers:
    doubao:
      type: command
      command: 'doubao-tts say --text-file {input_path} --out {output_path}'
That's it. Any Hermes voice-out path now routes through Doubao seed-tts-2.0.
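For readers wiring the same command template into their own tooling, the sketch below shows one way a command-type provider could expand the {input_path} and {output_path} placeholders. How Hermes itself performs the substitution is not described here, so build_argv is a hypothetical helper and the approach is an assumption.

```python
import shlex


def build_argv(template: str, input_path: str, output_path: str) -> list[str]:
    """Expand a command template into an argument list."""
    # Split first and substitute second, so a path containing spaces
    # remains a single argument instead of being broken apart.
    return [
        part.format(input_path=input_path, output_path=output_path)
        for part in shlex.split(template)
    ]
```

The resulting list can be handed to subprocess.run(argv, check=True), which avoids shell quoting problems entirely.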
Voices
The CLI ships with a curated alias catalogue:
| Alias | Language | Gender | Style |
|---|---|---|---|
| zh-female-warm (default) | zh-CN | female | warm, conversational |
| zh-female-reporter | zh-CN | female | crisp, news-reporter |
| zh-male-warm | zh-CN | male | warm, narrator |
| zh-male-energetic | zh-CN | male | energetic host |
| en-female-assistant | en-US | female | assistant, neutral |
| en-male-assistant | en-US | male | assistant, neutral |
Volcengine publishes hundreds more speaker IDs. You can pass any raw
speaker ID to voice= directly — aliases are a convenience, not a gate.
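The "convenience, not a gate" behaviour amounts to a dictionary lookup that falls back to its input. The sketch below is illustrative only: resolve_voice is a hypothetical name, and the speaker IDs in ALIASES are placeholders, since the real alias-to-speaker mapping ships inside the package.

```python
# Placeholder values: the actual speaker IDs behind each alias are not
# listed in this README.
ALIASES = {
    "zh-female-warm": "<speaker-id-for-zh-female-warm>",
    "en-female-assistant": "<speaker-id-for-en-female-assistant>",
}


def resolve_voice(voice: str) -> str:
    """Map a known alias to its speaker ID; pass anything else through unchanged."""
    return ALIASES.get(voice, voice)
```

Because unknown names pass straight through, a raw ID such as zh_female_vv_uranus_bigtts reaches the API untouched.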
Emotion control
seed-tts-2.0 supports per-utterance emotion tags:
from doubao_tts import synthesize

synthesize(
    "好激动,我终于做到了!",
    "out.mp3",
    emotion="excited",
    emotion_scale=4.0,  # 0-5; higher = more intense
)
Model-supported emotions vary by voice; consult the Volcengine console for the up-to-date list per speaker.
Performance notes
- One synthesize() call opens a fresh WebSocket and tears it down at the end. End-to-end latency to a 24 kHz MP3 of ~2 seconds of speech is ~750 ms on a healthy connection; network time dominates.
- The seed-tts-2.0 bidi-stream session currently accepts one synthesis per session (empirically verified), so connection reuse saves only TCP+TLS setup (~180 ms / call). A daemon mode with connection pooling is planned for a future release, but most users don't need it.
- import doubao_tts is cheap (~3 ms) because websockets and yaml are only imported on the first synthesis call.
Error handling
All user-facing errors inherit from DoubaoTTSError:
from doubao_tts import (
    DoubaoTTSError, DoubaoConfigError,
    DoubaoAuthError, DoubaoAPIError, DoubaoTimeoutError,
    synthesize,
)

try:
    synthesize("你好", "out.mp3")
except DoubaoAuthError:
    ...  # rotate your token
except DoubaoTimeoutError:
    ...  # retry or check network
except DoubaoTTSError as exc:
    ...  # catch-all
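For the "retry or check network" branch, a small exponential-backoff wrapper is often enough. The helper below is a generic sketch, not part of doubao-tts; retry and its parameters are hypothetical, and it is written against an arbitrary exception type so it can be tried without credentials.

```python
from __future__ import annotations

import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    retry_on: type[BaseException] | tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on retry_on with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)
    # Only reachable when attempts < 1, so the loop body never ran.
    raise ValueError("attempts must be at least 1")
```

With the package installed, a call might look like retry(lambda: synthesize("你好", "out.mp3"), DoubaoTimeoutError). Authentication and configuration errors are deliberately not retried, since repeating the call cannot fix them.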
Security
- Access tokens are redacted in all logs and CLI output; see SECURITY.md for the exact policy.
- User text is not logged by default. To troubleshoot protocol issues, opt in with DOUBAO_TTS_TRACE_PAYLOADS=1.
- ~/.doubao-tts/config.yaml is user-scoped; the shipped .gitignore excludes .env files at the project level.
- Vulnerability reports: hypnus.yuan@gmail.com or a private GitHub security advisory.
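To give a feel for what token redaction can look like in application code, here is a minimal sketch. The masking rule shown (keep a short prefix, hide the rest) is an assumption for illustration; the policy the package actually applies is the one defined in SECURITY.md, and redact is a hypothetical helper.

```python
def redact(secret: str, keep: int = 4) -> str:
    """Mask a secret, keeping at most `keep` leading characters."""
    if len(secret) <= keep * 2:
        # Too short to reveal any part without exposing most of the value.
        return "***"
    return secret[:keep] + "***"
```

Applying a helper like this before any value reaches a logger or an error message keeps tokens out of shared logs and bug reports.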
Development
git clone https://github.com/Hypnus-Yuan/doubao-tts.git
cd doubao-tts
uv sync --all-extras --group dev
uv run pre-commit install
uv run pytest
See CONTRIBUTING.md for the full workflow.
Roadmap
- v0.2 — connection-reuse daemon (saves ~180 ms / call on chained requests), streaming callback API, richer voice metadata.
- v0.3 — integration recipes for LangChain, LlamaIndex, Dify.
- v1.0 — API frozen, semver guarantees.
License
MIT — see LICENSE.
Credits
Protocol framing extracted and hardened from Hermes Agent community work. Thanks to the Volcengine Speech team for the seed-tts-2.0 bidirectional-streaming API.