
Python client and CLI for Volcengine/ByteDance Doubao seed-tts-2.0 bidirectional streaming TTS.

doubao-tts


A small, production-minded Python client and CLI for Volcengine Doubao seed-tts-2.0 bidirectional streaming TTS — native-quality Chinese voices with emotion control, ready for agents, scripts, and serving pipelines.

Why

doubao-tts is the first PyPI package targeting Volcengine's seed-tts-2.0 bidirectional-streaming endpoint. Existing Python TTS wrappers either:

  • hit the older SAMI HTTP endpoint (no streaming, older voice quality), or
  • aren't published to PyPI at all.

This package fills that gap with:

  • A clean synthesize(text, out_path) interface
  • A CLI that drops straight into agent frameworks (Hermes, Dify, LangChain, n8n, …)
  • Strict mypy on every public module
  • 95% unit test coverage, atomic output writes, proper credential redaction
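"Atomic output writes" usually means one specific pattern. The audio is first written to a temporary file in the same directory. That file is then renamed over the destination, so a crash or timeout never leaves a half-written MP3 behind. The following is a minimal sketch of that general pattern. It makes no claim about how doubao-tts implements it internally, and `write_atomic` is a hypothetical helper name.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def write_atomic(out_path: str | os.PathLike[str], data: bytes) -> Path:
    """Write `data` so readers see either the old file or the complete new one."""
    target = Path(out_path)
    # The temp file must live in the destination directory: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(target.parent), prefix=f".{target.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, target)  # atomically swap in the finished file
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # don't leave stray temp files
        raise
    return target
```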

Install

pip install doubao-tts

# or with uv:
uv add doubao-tts

# CLI-only, installed as a standalone tool:
uv tool install doubao-tts

Quick start

Python

from doubao_tts import synthesize

synthesize("你好,世界", "hello.mp3")

Async

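import asyncio

from doubao_tts import synthesize_async


async def main() -> None:
    await synthesize_async(
        "Hello from Doubao seed-tts-2.0!",
        "hello.mp3",
        voice="en-female-assistant",
        speed=1.1,
    )


asyncio.run(main())  # inside an already-running loop (e.g. Jupyter), just `await main()`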
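Each call opens its own WebSocket. If you render many clips from async code, you may want to cap how many sessions run at once. The following is a minimal sketch using `asyncio.Semaphore`. `synthesize_many` is a hypothetical helper and is not part of doubao-tts. Its `synth` parameter is meant to receive `synthesize_async`, or any coroutine function taking `(text, out_path)`. The default limit of 4 is an arbitrary placeholder, not a documented Volcengine quota.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import Any

Synth = Callable[[str, str], Awaitable[Any]]


async def synthesize_many(
    jobs: Iterable[tuple[str, str]],
    synth: Synth,
    limit: int = 4,
) -> list[Any]:
    """Run synth(text, out_path) for every job, with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def run_one(text: str, out_path: str) -> Any:
        async with sem:  # waits while `limit` calls are already running
            return await synth(text, out_path)

    # gather returns results in the same order as `jobs`.
    return await asyncio.gather(*(run_one(t, p) for t, p in jobs))
```

Usage would look like `asyncio.run(synthesize_many([("你好", "a.mp3"), ("再见", "b.mp3")], synthesize_async))`. Wrap `synthesize_async` in `functools.partial(..., voice=..., speed=...)` if every job should share extra keyword arguments.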

CLI

# simplest
doubao-tts say "你好" --out hello.mp3

# pick a voice + adjust speed
doubao-tts say "好激动!" --voice zh-female-warm --speed 1.2 --out excited.mp3

# read from a file
doubao-tts say --text-file script.txt --out narration.mp3

# browse available voices
doubao-tts list-voices --lang zh

# inspect resolved config (tokens are redacted)
doubao-tts config show

Credentials

Credentials and settings resolve in this order (first match wins):

  1. Keyword arguments to synthesize(...)
  2. Environment variables: VOLCENGINE_APP_ID, VOLCENGINE_ACCESS_TOKEN (also accepted as DOUBAO_APP_ID, DOUBAO_ACCESS_TOKEN)
  3. ~/.doubao-tts/config.yaml
  4. Built-in defaults (speaker, audio format, sample rate)
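"First match wins" is a simple layered lookup. The following is a minimal sketch of that precedence. `resolve_setting`, `DEFAULTS` and `ENV_NAMES` are hypothetical names, and the default values are placeholders taken from the example config below. When both `VOLCENGINE_*` and `DOUBAO_*` are set, this sketch assumes the `VOLCENGINE_*` name wins; the README does not say which one takes priority.

```python
import os
from collections.abc import Mapping
from typing import Any

# Hypothetical defaults; the real built-in values live inside doubao-tts.
DEFAULTS: dict[str, Any] = {"audio_format": "mp3", "sample_rate": 24000}

# Env var names from the list above; primary name first (assumed priority).
ENV_NAMES: dict[str, tuple[str, ...]] = {
    "app_id": ("VOLCENGINE_APP_ID", "DOUBAO_APP_ID"),
    "access_token": ("VOLCENGINE_ACCESS_TOKEN", "DOUBAO_ACCESS_TOKEN"),
}


def resolve_setting(
    key: str,
    kwargs: Mapping[str, Any],
    file_config: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Any:
    """Return the first value found: kwargs > env vars > config file > defaults."""
    if kwargs.get(key) is not None:
        return kwargs[key]
    for name in ENV_NAMES.get(key, ()):
        if env.get(name):  # empty strings are treated as unset
            return env[name]
    if file_config.get(key) is not None:
        return file_config[key]
    return DEFAULTS.get(key)
```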

Example ~/.doubao-tts/config.yaml:

app_id: "1234567890"
access_token: "volc_...."
speaker: zh_female_vv_uranus_bigtts
audio_format: mp3
sample_rate: 24000

Get your app ID and access token from the Volcengine Speech console. You need the seed-tts-2.0 product activated on your account.

Integration: Hermes Agent

Hermes Agent v0.x+ supports declarative TTS command providers via its tts.providers.<name> config block. Plug doubao-tts in:

# ~/.hermes/config.yaml
tts:
  provider: doubao
  providers:
    doubao:
      type: command
      command: 'doubao-tts say --text-file {input_path} --out {output_path}'

That's it. Any Hermes voice-out path now routes through Doubao seed-tts-2.0.
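Hermes fills in `{input_path}` and `{output_path}` on its own side. You may want to reproduce the same invocation from Python, for example to test the provider config outside Hermes. The following sketch expands the template and runs it. It does not claim to mirror Hermes internals, and `build_argv` and `run_provider` are hypothetical names. Splitting with `shlex` *before* substituting keeps paths with spaces as single arguments, and no shell is involved.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path

TEMPLATE = "doubao-tts say --text-file {input_path} --out {output_path}"


def build_argv(template: str, input_path: str, output_path: str) -> list[str]:
    """Split the command template first, then fill placeholders per argument."""
    return [
        tok.format(input_path=input_path, output_path=output_path)
        for tok in shlex.split(template)
    ]


def run_provider(text: str, output_path: str, template: str = TEMPLATE) -> None:
    """Write `text` to a temp file and invoke the CLI (needs doubao-tts on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "input.txt"
        input_path.write_text(text, encoding="utf-8")  # UTF-8 input is an assumption
        argv = build_argv(template, str(input_path), output_path)
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```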

Voices

The CLI ships with a curated alias catalogue:

| Alias | Language | Gender | Style |
|---|---|---|---|
| zh-female-warm (default) | zh-CN | female | warm, conversational |
| zh-female-reporter | zh-CN | female | crisp, news-reporter |
| zh-male-warm | zh-CN | male | warm, narrator |
| zh-male-energetic | zh-CN | male | energetic host |
| en-female-assistant | en-US | female | assistant, neutral |
| en-male-assistant | en-US | male | assistant, neutral |

Volcengine publishes hundreds more speaker IDs. You can pass any raw speaker ID to voice= directly — aliases are a convenience, not a gate.
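"Aliases are a convenience, not a gate" amounts to a dictionary lookup with a pass-through fallback. The following is a minimal sketch. `resolve_voice` is hypothetical, and the mapped values are placeholders rather than the real speaker IDs behind the aliases.

```python
from collections.abc import Mapping

# Placeholder alias table; the real alias -> speaker-ID mapping ships with doubao-tts.
ALIASES: Mapping[str, str] = {
    "zh-female-warm": "<speaker-id-for-zh-female-warm>",
    "en-female-assistant": "<speaker-id-for-en-female-assistant>",
}


def resolve_voice(voice: str, aliases: Mapping[str, str] = ALIASES) -> str:
    """Map a known alias to its speaker ID; pass anything else through unchanged."""
    return aliases.get(voice, voice)
```

In this sketch, unknown strings are not rejected on the client. Newly published Volcengine speakers therefore work without a package update, and an invalid ID is reported by the server instead.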

Emotion control

seed-tts-2.0 supports per-utterance emotion tags:

from doubao_tts import synthesize

synthesize(
    "好激动,我终于做到了!",
    "out.mp3",
    emotion="excited",
    emotion_scale=4.0,  # 0-5; higher = more intense
)

Supported emotions vary by voice; consult the Volcengine console for the up-to-date list per speaker.

Performance notes

  • One synthesize() call opens a fresh WebSocket and tears it down at the end. End-to-end latency for ~2 seconds of speech rendered as a 24 kHz MP3 is ~750 ms on a healthy connection; network time dominates.
  • The seed-tts-2.0 bidi-stream session currently accepts one synthesis per session (empirically verified), so connection reuse saves only TCP+TLS setup (~180 ms / call). A daemon mode with connection pooling is planned for a future release, but most users don't need it.
  • import doubao_tts is cheap (~3 ms) because websockets and yaml are only imported on the first synthesis call.
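Deferring imports to the first call is a standard Python technique: move the `import` statement into the function body. The following is a minimal sketch of that idiom. It is not doubao-tts's actual code. `synthesize_stub` is hypothetical, and stdlib `json` stands in for websockets/yaml so the sketch runs without third-party packages.

```python
def synthesize_stub(text: str) -> str:
    """Stand-in for a synthesis entry point that defers its heavy imports."""
    # Function-level import: the module is loaded on the first call only;
    # later calls find it already cached in sys.modules, so the repeat cost is tiny.
    import json

    return json.dumps({"text": text}, ensure_ascii=False)
```

You can check an import-time figure like the ~3 ms above with CPython's `-X importtime` flag, which prints per-module import costs: `python -X importtime -c "import doubao_tts"`.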

Error handling

All user-facing errors inherit from DoubaoTTSError:

from doubao_tts import (
    DoubaoTTSError, DoubaoConfigError,
    DoubaoAuthError, DoubaoAPIError, DoubaoTimeoutError,
    synthesize,
)

try:
    synthesize("你好", "out.mp3")
except DoubaoAuthError:
    ...  # rotate your token
except DoubaoTimeoutError:
    ...  # retry or check network
except DoubaoTTSError as exc:
    ...  # catch-all
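For the `DoubaoTimeoutError` branch, one option is a bounded retry with exponential backoff. The following is a minimal sketch. `retry` is a hypothetical helper, and the attempt count and delays are arbitrary. Only retry transient errors; retrying a `DoubaoAuthError` will not fix a bad token.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call fn(); on an exception in `retry_on`, back off and retry, `attempts` tries total."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

With doubao-tts this would read `retry(lambda: synthesize("你好", "out.mp3"), (DoubaoTimeoutError,))`.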

Security

  • Access tokens are redacted in all logs and CLI output — see SECURITY.md for the exact policy.
  • User text is not logged by default. To troubleshoot protocol issues, opt in with DOUBAO_TTS_TRACE_PAYLOADS=1.
  • ~/.doubao-tts/config.yaml is user-scoped; the shipped .gitignore excludes .env files at the project level.
  • Vulnerability reports: hypnus.yuan@gmail.com or a private GitHub security advisory.
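The following is a minimal sketch of token redaction applied to Python logging. The exact doubao-tts policy is the one in SECURITY.md. `redact` and `RedactFilter` are hypothetical names, and keeping a 4-character prefix is an illustrative choice. The prefix lets you tell tokens apart in logs without exposing them.

```python
from __future__ import annotations

import logging


def redact(token: str | None, keep: int = 4) -> str:
    """Mask a secret for display, keeping at most `keep` leading characters."""
    if not token:
        return "<unset>"
    if len(token) <= keep * 2:
        return "****"  # too short to reveal any part safely
    return token[:keep] + "****"


class RedactFilter(logging.Filter):
    """Logging filter that masks one known secret in every formatted message."""

    def __init__(self, secret: str) -> None:
        super().__init__()
        self._secret = secret

    def filter(self, record: logging.LogRecord) -> bool:
        if self._secret:
            # Format first so secrets passed via %-style args are caught too.
            record.msg = record.getMessage().replace(self._secret, redact(self._secret))
            record.args = None
        return True  # never drop the record, only rewrite it
```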

Development

git clone https://github.com/Hypnus-Yuan/doubao-tts.git
cd doubao-tts

uv sync --all-extras --group dev
uv run pre-commit install
uv run pytest

See CONTRIBUTING.md for the full workflow.

Roadmap

  • v0.2 — connection-reuse daemon (saves ~180 ms / call on chained requests), streaming callback API, richer voice metadata.
  • v0.3 — integration recipes for LangChain, LlamaIndex, Dify.
  • v1.0 — API frozen, semver guarantees.

License

MIT — see LICENSE.

Credits

Protocol framing extracted and hardened from Hermes Agent community work. Thanks to the Volcengine Speech team for the seed-tts-2.0 bidirectional-streaming API.

Download files

Source Distribution

doubao_tts-0.1.0.tar.gz (29.6 kB)

Built Distribution

doubao_tts-0.1.0-py3-none-any.whl (23.9 kB)

File details

Details for the file doubao_tts-0.1.0.tar.gz.

File metadata

  • Download URL: doubao_tts-0.1.0.tar.gz
  • Size: 29.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18 (publish subcommand, macOS)

File hashes

Hashes for doubao_tts-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ebe56efffc69036624d464cfda7da76f85c6a6c2e5c9807a5de3b07054ef94c2 |
| MD5 | 7e006218f25be8bf7eefe134fde9888e |
| BLAKE2b-256 | ff0b453f2ae7f04c29800a64b4cbdf6ef623f6d41095d86e3a5dab3498e478cc |


File details

Details for the file doubao_tts-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: doubao_tts-0.1.0-py3-none-any.whl
  • Size: 23.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18 (publish subcommand, macOS)

File hashes

Hashes for doubao_tts-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf4c90d0cd34a477b2dd037b67cebc236c84e91cb4f0d379748d0ddd5be3c82c |
| MD5 | dfea286d1fdc8300bd9cebd6212e33a7 |
| BLAKE2b-256 | 3604e8ba4dfb6349168dc00790333e5d193baa06daafd9987831edf7497554f3 |

