ByteDance plugin for LiveKit Agents

Community-maintained LiveKit Agents plugin for ByteDance and Volcengine AI services.

This package is unofficial and is not currently maintained by ByteDance, Volcengine, or LiveKit.

Current Scope

livekit-plugins-bytedance is intentionally narrow today. The package name is reserved for the broader ByteDance/Volcengine ecosystem, but version 0.1.x only implements:

| Service | API | LiveKit class | Status |
| --- | --- | --- | --- |
| Volcengine TTS V3 bidirectional streaming | wss://openspeech.bytedance.com/api/v3/tts/bidirection | livekit.plugins.bytedance.TTS | Supported |
| Volcengine BigModel streaming ASR, optimized bidirectional mode | wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async | livekit.plugins.bytedance.STT | Supported |

The implemented clients follow the Volcengine WebSocket APIs served at these request paths:

wss://openspeech.bytedance.com/api/v3/tts/bidirection
wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async

The TTS binary protocol constants are also checked against ByteDance's reference helper package named TTS Websocket Bidirection protocols, including the downstream event codes for UsageResponse (154), AudioMuted (250), TTSResponse (352), TTSEnded (359), and TTSSubtitle (364).
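
The downstream event codes listed above can be collected in a small enum. This is a minimal sketch, not the plugin's internal code: the numeric values come from this README, while the names `DownstreamEvent` and `carries_audio` are hypothetical. Treating only TTSResponse as the audio-bearing event is also an assumption.

```python
from enum import IntEnum


class DownstreamEvent(IntEnum):
    """Downstream TTS event codes named in this README (hypothetical enum name)."""

    USAGE_RESPONSE = 154
    AUDIO_MUTED = 250
    TTS_RESPONSE = 352
    TTS_ENDED = 359
    TTS_SUBTITLE = 364


def carries_audio(code: int) -> bool:
    """Assumption: only TTSResponse events carry synthesized audio."""
    return code == DownstreamEvent.TTS_RESPONSE


print(carries_audio(352))  # True
print(carries_audio(364))  # False
```

An IntEnum keeps the codes comparable with the raw integers read from a frame header while still giving each code a readable name.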

Explicitly Not Supported Yet

This package does not currently implement:

  • Volcengine legacy TTS v1 (/api/v1/tts/ws_binary)
  • Volcengine ASR batch/offline APIs
  • Volcengine ASR bigmodel_nostream streaming-input mode
  • Volcengine ASR legacy non-optimized bidirectional path (/api/v3/sauc/bigmodel)
  • Doubao/Ark LLM APIs
  • Volcengine realtime dialogue APIs
  • ByteDance video, image, embedding, or moderation APIs
  • Non-streaming LiveKit TTS.synthesize()

For those services, use a provider-specific package if one exists. The existing third-party livekit-plugins-volcengine package is separate from this package and uses the livekit.plugins.volcengine import namespace.

Supported TTS Features

  • Streaming synthesis through TTS.stream()
  • Volcengine TTS V3 connection/session/task binary protocol
  • X-Api-Key authentication for the current Volcengine console
  • Legacy console authentication through X-Api-App-Key and X-Api-Access-Key
  • resource_id values documented for this API:
    • seed-tts-2.0
    • seed-icl-2.0
  • Optional cloned-voice model selection:
    • seed-tts-2.0-standard
    • seed-tts-2.0-expressive
  • speaker
  • ssml
  • audio_format: pcm, mp3, ogg_opus, or wav
  • sample_rate
  • bit_rate
  • speech_rate
  • loudness_rate
  • enable_subtitle request flag
  • disable_markdown_filter
  • disable_emoji_filter
  • enable_latex_tn
  • latex_parser
  • explicit_language
  • explicit_dialect
  • aigc_watermark
  • aigc_metadata
  • cache_config
  • post_process
  • TTS 2.0 context_texts
  • use_tag_parser
  • X-Control-Require-Usage-Tokens-Return
  • Server-side sentence splitting
  • LiveKit retry behavior for transient websocket failures before audio is emitted

Subtitle and usage payloads can be requested from Volcengine, but this LiveKit TTS plugin currently exposes only synthesized audio frames through the LiveKit TTS stream. Non-audio protocol events such as usage responses, muted-audio signals, sentence boundaries, subtitles, and TTS-ended markers are parsed and ignored for now rather than surfaced as LiveKit TTS events.

Supported STT Features

  • Streaming recognition through STT.stream()
  • Volcengine BigModel ASR WebSocket binary protocol v3
  • Optimized bidirectional streaming endpoint: wss://openspeech.bytedance.com/api/v3/sauc/bigmodel_async
  • X-Api-Key authentication for the current Volcengine console
  • Legacy console authentication through X-Api-App-Key and X-Api-Access-Key
  • X-Api-Resource-Id, X-Api-Request-Id, X-Api-Sequence, and X-Api-Connect-Id headers
  • Default ASR 2.0 resource ID: volc.seedasr.sauc.duration
  • PCM input at 16 kHz, 16-bit, mono
  • Server-side VAD/final segmentation through enable_nonstream=True
  • Interim and final LiveKit transcript events
  • Word timestamps when Volcengine returns utterances[*].words
  • Selected BigModel request options, including enable_itn, enable_punc, enable_ddc, show_utterances, enable_speaker_info, ssd_version, result_type, VAD timing options, sensitive-word filtering, and corpus
  • Escape hatches through audio_options and request_options for provider fields that are not first-class constructor arguments yet
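
The PCM input format listed above (16 kHz, 16-bit, mono) fixes how many bytes each slice of audio occupies. The sketch below works through that arithmetic. The helper names and the 100 ms chunk duration are assumptions chosen for illustration, not values required by the plugin or by Volcengine.

```python
SAMPLE_RATE_HZ = 16_000  # 16 kHz, from the feature list above
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 1             # mono


def chunk_size_bytes(chunk_ms: int) -> int:
    """Return the byte length of `chunk_ms` milliseconds of PCM audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def split_pcm(pcm: bytes, chunk_ms: int = 100) -> list[bytes]:
    """Split raw PCM into fixed-duration chunks; the last one may be shorter."""
    size = chunk_size_bytes(chunk_ms)
    return [pcm[i : i + size] for i in range(0, len(pcm), size)]


one_second = bytes(chunk_size_bytes(1000))  # one second of silence
print(chunk_size_bytes(100))       # 3200
print(len(split_pcm(one_second)))  # 10
```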

The plugin does not send the ASR audio.language field by default because the provider document scopes that field to the bigmodel_nostream endpoint, which this package does not currently support.

The plugin sends credentials with Volcengine's V3 websocket headers:

  • X-Api-Key
  • X-Api-Resource-Id
  • X-Api-Connect-Id

For legacy console credentials, it sends:

  • X-Api-App-Key
  • X-Api-Access-Key
  • X-Api-Resource-Id
  • X-Api-Connect-Id
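
The two header sets above can be sketched as a single helper. This is an illustration under stated assumptions, not the plugin's API: `build_auth_headers` is a hypothetical name, and using a random UUID for X-Api-Connect-Id is an assumption about what the server accepts.

```python
import uuid
from typing import Optional


def build_auth_headers(
    resource_id: str,
    api_key: Optional[str] = None,
    app_key: Optional[str] = None,
    access_key: Optional[str] = None,
) -> dict[str, str]:
    """Build V3 websocket headers (hypothetical helper, not the plugin's API)."""
    headers = {
        "X-Api-Resource-Id": resource_id,
        # Assumption: a random UUID is an acceptable connection ID.
        "X-Api-Connect-Id": str(uuid.uuid4()),
    }
    if api_key:
        # Current console: a single API key.
        headers["X-Api-Key"] = api_key
    elif app_key and access_key:
        # Legacy console: app key plus access key.
        headers["X-Api-App-Key"] = app_key
        headers["X-Api-Access-Key"] = access_key
    else:
        raise ValueError("pass api_key, or both app_key and access_key")
    return headers


print(sorted(build_auth_headers("seed-tts-2.0", api_key="your-api-key")))
```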

Installation

pip install livekit-plugins-bytedance

Credentials

Create or locate your Volcengine TTS V3 credentials in the Volcengine console, then pass them to the plugin explicitly:

from livekit.plugins import bytedance

tts = bytedance.TTS(
    api_key="your-api-key",
    resource_id="seed-tts-2.0",
)

If your application prefers environment variables, load them in your own config layer and pass them to TTS. The plugin does not read environment variables by itself.

Suggested variable names:

export VOLCENGINE_TTS_V3_API_KEY=...
export VOLCENGINE_TTS_V3_RESOURCE_ID=seed-tts-2.0
export VOLCENGINE_ASR_API_KEY=...
export VOLCENGINE_ASR_RESOURCE_ID=volc.seedasr.sauc.duration
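
Because the plugin does not read environment variables itself, an application needs a small config layer of its own. The sketch below shows one possible shape for the TTS variables suggested above. `tts_kwargs_from_env` is a hypothetical helper, and falling back to seed-tts-2.0 when the resource ID is unset is an assumption.

```python
import os
from typing import Mapping


def tts_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the suggested variables and return keyword arguments for TTS."""
    api_key = env.get("VOLCENGINE_TTS_V3_API_KEY")
    if not api_key:
        raise RuntimeError("VOLCENGINE_TTS_V3_API_KEY is not set")
    return {
        "api_key": api_key,
        # Assumption: fall back to the resource ID used in this README's examples.
        "resource_id": env.get("VOLCENGINE_TTS_V3_RESOURCE_ID", "seed-tts-2.0"),
    }


# Usage, with the plugin installed:
#   from livekit.plugins import bytedance
#   tts = bytedance.TTS(**tts_kwargs_from_env())
```

Accepting the environment as a mapping keeps the helper easy to exercise in tests without touching the real process environment.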

The API also supports old-console authentication. If you still use those credentials, pass both app_key and access_key instead of api_key.

Usage

Use the default TTS V3 model and speaker:

from livekit.plugins import bytedance

tts = bytedance.TTS(
    api_key="your-api-key",
)

Use a specific Seed TTS resource and speaker:

tts = bytedance.TTS(
    api_key="your-api-key",
    resource_id="seed-tts-2.0",
    speaker="zh_female_vv_uranus_bigtts",
)

Use TTS 2.0 style controls:

tts = bytedance.TTS(
    api_key="your-api-key",
    resource_id="seed-tts-2.0",
    speaker="zh_female_vv_uranus_bigtts",
    # Style prompt: "Speak naturally, professionally, and kindly, like an interviewer"
    context_texts=["自然、专业、和善,像面试官一样说话"],
    speech_rate=0,
    loudness_rate=0,
)

Use the descriptive class name if you prefer:

from livekit.plugins.bytedance import VolcengineV3TTS

tts = VolcengineV3TTS(
    api_key="your-api-key",
)

Use streaming ASR:

from livekit.plugins import bytedance

stt = bytedance.STT(
    api_key="your-api-key",
    resource_id="volc.seedasr.sauc.duration",
)

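async def transcribe(audio_frame):
    # Push audio, signal end of input, then consume transcript events.
    stream = stt.stream()
    stream.push_frame(audio_frame)
    stream.end_input()

    async for event in stream:
        if event.type == "final_transcript":
            print(event.alternatives[0].text)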

Testing

Run the default suite:

uv run pytest livekit-plugins/livekit-plugins-bytedance

The default tests are hermetic and do not require Volcengine credentials. They cover the TTS V3 binary protocol, ASR WebSocket v3 binary protocol, websocket handshake headers, retry behavior, zombie websocket handling, server error classification, partial audio drain behavior, and ASR transcript event mapping.

Real end-to-end tests should use a separate marker and require:

export VOLCENGINE_TTS_V3_API_KEY=...
export VOLCENGINE_TTS_V3_RESOURCE_ID=seed-tts-2.0
export VOLCENGINE_ASR_API_KEY=...
export VOLCENGINE_ASR_RESOURCE_ID=volc.seedasr.sauc.duration
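
One way to keep end-to-end tests out of the default run is to skip them whenever any of the variables above is missing. The following is a sketch only: `missing_e2e_vars` is a hypothetical helper, and the `e2e` marker name in the commented usage is an assumption rather than a marker this package defines.

```python
import os
from typing import Iterable, Mapping

REQUIRED_E2E_VARS = (
    "VOLCENGINE_TTS_V3_API_KEY",
    "VOLCENGINE_TTS_V3_RESOURCE_ID",
    "VOLCENGINE_ASR_API_KEY",
    "VOLCENGINE_ASR_RESOURCE_ID",
)


def missing_e2e_vars(
    env: Mapping[str, str] = os.environ,
    required: Iterable[str] = REQUIRED_E2E_VARS,
) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


# In a test module this could gate marked tests (marker name is hypothetical):
#   import pytest
#   pytestmark = [
#       pytest.mark.e2e,
#       pytest.mark.skipif(bool(missing_e2e_vars()), reason="credentials not set"),
#   ]
```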

License

Apache-2.0.
