
Pipecat community TTS integration for the XTTSv2-vLLM streaming server


pipecat-xtts-vllm

A Pipecat community TTS integration that streams synthesized speech from an XTTSv2-vLLM streaming server.

XTTSVLLMTTSService is a drop-in Pipecat TTSService that:

  • Clones a voice from a reference audio clip.
  • Computes XTTSv2 conditioning once (via POST /v1/tts/conditioning) and caches it for the lifetime of the service, so individual requests carry no conditioning overhead.
  • Streams raw PCM audio chunks (via POST /v1/audio/speech) directly into the Pipecat pipeline.
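The compute-once caching described above follows a common asyncio pattern. The sketch below is illustrative, not the package's implementation: ComputeOnce is a hypothetical helper, and the factory coroutine stands in for the POST /v1/tts/conditioning request. The lock ensures that several concurrent first callers trigger only one request.

```python
import asyncio
from typing import Awaitable, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ComputeOnce(Generic[T]):
    """Hypothetical helper: run an async factory at most once and reuse its result."""

    def __init__(self, factory: Callable[[], Awaitable[T]]) -> None:
        self._factory = factory  # e.g. a coroutine that requests conditioning
        self._value: Optional[T] = None
        self._done = False
        self._lock = asyncio.Lock()

    async def get(self) -> T:
        # Fast path: the value was already computed.
        if self._done:
            return self._value  # type: ignore[return-value]
        # Slow path: only one caller runs the factory; the others wait on the lock
        # and then see the cached value.
        async with self._lock:
            if not self._done:
                self._value = await self._factory()
                self._done = True
        return self._value  # type: ignore[return-value]
```

Later calls to get() return the cached object without touching the network, which is the property the service relies on for every synthesis request after the first.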

Installation

pip install pipecat-xtts-vllm

To work on the package from source instead:

pip install -e .

Start the XTTSv2-vLLM server

The client itself is lightweight; the model runs in the XTTSv2-vLLM streaming server (wuxuedaifu/xttsv2-vllm-streaming-server), which is distributed as a GPU Docker image. Follow its README to pull and run the image, for example:

docker run --gpus all -p 8000:8000 ghcr.io/wuxuedaifu/xttsv2-vllm-streaming-server:latest
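This README does not document a health-check endpoint, so the following sketch only waits until the server's port accepts TCP connections before you start a pipeline. wait_for_port is a hypothetical helper, and an open port does not necessarily mean the model has finished loading.

```python
import asyncio
import time
from urllib.parse import urlsplit


async def wait_for_port(base_url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once base_url's host/port accepts a TCP connection, False on timeout."""
    parts = urlsplit(base_url)
    host = parts.hostname or "localhost"
    # Fall back to the scheme's default port when the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Bound each attempt so an unresponsive host cannot stall the loop.
            _reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), timeout=5.0
            )
        except (OSError, asyncio.TimeoutError):
            # Refused, unreachable, or slow: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            await asyncio.sleep(interval)
        else:
            # The port is open; close the probe connection and report success.
            writer.close()
            try:
                await writer.wait_closed()
            except OSError:
                pass  # the peer may drop the probe first; the port was still open
            return True
```

For example, `asyncio.run(wait_for_port("http://localhost:8000"))` returns True as soon as the container is listening on port 8000.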

Usage with a Pipeline

The snippet below shows the essential setup. See examples/foundational/xtts_vllm_say_one_thing.py for the full, runnable version.

import asyncio
from pathlib import Path

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.workers.runner import WorkerRunner

from pipecat_xtts_vllm import XTTSVLLMTTSService

async def main():
    reference_audio = Path("reference.wav").read_bytes()

    tts = XTTSVLLMTTSService(
        base_url="http://localhost:8000",
        reference_audio=reference_audio,
        language="en",
        sample_rate=24000,
    )

    # Append any downstream processors (for example, a transport output) after tts.
    pipeline = Pipeline([tts])

    worker = PipelineWorker(
        pipeline,
        params=PipelineParams(audio_out_sample_rate=24000),
        idle_timeout_secs=None,
    )

    runner = WorkerRunner()
    await runner.add_workers(worker)

    async def say():
        await worker.queue_frames([
            TTSSpeakFrame("Hello from the XTTSv2 vLLM streaming server."),
            EndFrame(),
        ])

    await asyncio.gather(runner.run(), say())

if __name__ == "__main__":
    asyncio.run(main())

Alternatively, pass a precomputed XTTSVLLMConditioning object instead of reference_audio if you have cached the conditioning data externally:

from pipecat_xtts_vllm import XTTSVLLMConditioning, XTTSVLLMTTSService

conditioning = XTTSVLLMConditioning(
    gpt_cond_latent_b64="<base64-encoded latent>",
    speaker_embeddings_b64="<base64-encoded embeddings>",
)

tts = XTTSVLLMTTSService(
    base_url="http://localhost:8000",
    conditioning=conditioning,
)
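One way to cache the conditioning externally is to store the two base64 strings in a JSON file. This sketch assumes those two fields are the only arguments XTTSVLLMConditioning needs, as in the example above; save_conditioning and load_conditioning_kwargs are hypothetical helpers, and how you first obtain the strings is outside its scope.

```python
import json
from pathlib import Path

# Keyword arguments of XTTSVLLMConditioning, as used in the example above.
FIELDS = ("gpt_cond_latent_b64", "speaker_embeddings_b64")


def save_conditioning(path: Path, gpt_cond_latent_b64: str, speaker_embeddings_b64: str) -> None:
    """Persist the two base64-encoded conditioning strings as JSON."""
    data = {
        "gpt_cond_latent_b64": gpt_cond_latent_b64,
        "speaker_embeddings_b64": speaker_embeddings_b64,
    }
    path.write_text(json.dumps(data), encoding="utf-8")


def load_conditioning_kwargs(path: Path) -> dict[str, str]:
    """Load the JSON file and return kwargs for XTTSVLLMConditioning(**kwargs)."""
    data = json.loads(path.read_text(encoding="utf-8"))
    missing = [name for name in FIELDS if name not in data]
    if missing:
        raise ValueError(f"conditioning file is missing fields: {missing}")
    return {name: data[name] for name in FIELDS}
```

The loaded values can then be passed straight through: `conditioning = XTTSVLLMConditioning(**load_conditioning_kwargs(Path("voice.json")))`.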

Running the Example

Set the required environment variables, then run the script:

export XTTS_VLLM_BASE_URL=http://localhost:8000
export XTTS_VLLM_REFERENCE_AUDIO=/path/to/reference.wav

python examples/foundational/xtts_vllm_say_one_thing.py

The script synthesizes one sentence and writes the output to output.wav in the current directory.
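Writing output.wav amounts to wrapping the raw PCM stream in a WAV header. The sketch below uses the standard-library wave module and assumes the audio is 16-bit signed mono PCM at the configured sample_rate; that format is an assumption here, so check the server's README. At 24000 Hz it works out to 48,000 bytes per second of audio. pcm_chunks_to_wav is a hypothetical helper, not part of the package.

```python
import wave
from typing import BinaryIO, Iterable, Union


def pcm_chunks_to_wav(chunks: Iterable[bytes], out: Union[str, BinaryIO],
                      sample_rate: int = 24000, channels: int = 1,
                      sample_width: int = 2) -> float:
    """Wrap raw PCM chunks in a WAV container and return the duration in seconds.

    Assumes 16-bit (sample_width=2) signed little-endian PCM, the layout WAV
    expects for that sample width.
    """
    bytes_per_frame = channels * sample_width
    total_bytes = 0
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # header sizes are patched when the file closes
            total_bytes += len(chunk)
    # Duration = frames / sample_rate, where frames = bytes / bytes_per_frame.
    return total_bytes / bytes_per_frame / sample_rate
```

Passing a path such as "output.wav" writes a file; passing an io.BytesIO keeps the result in memory.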


Configuration

XTTSVLLMTTSService accepts keyword-only arguments. At least one of reference_audio or conditioning must be given; if both are provided, conditioning takes precedence.

| Parameter | Default | Description |
| --- | --- | --- |
| base_url | (required) | Base URL of the XTTSv2-vLLM streaming server (e.g. http://localhost:8000). |
| reference_audio | None | Raw bytes of a reference WAV clip (about 6 s) used for voice cloning. Required unless conditioning is given. |
| conditioning | None | Precomputed XTTSVLLMConditioning; skips the /v1/tts/conditioning call and takes precedence over reference_audio. |
| language | "en" | Language code passed to the server (see Supported languages). |
| chunk_size | 20 | Number of tokens per streaming chunk. |
| speed | 1.0 | Speech rate multiplier. |
| sample_rate | 24000 | PCM sample rate in Hz; should match the server's output rate. |
| aiohttp_session | None | External aiohttp.ClientSession to reuse. If None, the service creates and closes its own session. |
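The precedence rule for reference_audio and conditioning fits in a few lines. resolve_voice_source is a hypothetical helper that mirrors the documented behavior; it is not the service's actual code.

```python
from typing import Optional


def resolve_voice_source(*, reference_audio: Optional[bytes] = None,
                         conditioning: Optional[object] = None) -> tuple[str, object]:
    """Pick the voice input following the documented precedence (keyword-only, like the service)."""
    # conditioning wins when both are given, so no /v1/tts/conditioning call is needed.
    if conditioning is not None:
        return ("conditioning", conditioning)
    # Otherwise a reference clip is required; this sketch treats empty bytes as missing.
    if reference_audio:
        return ("reference_audio", reference_audio)
    raise ValueError("either reference_audio or conditioning must be provided")
```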

Supported languages

XTTSv2 supports 17 languages. Pass the matching code as the language argument:

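| Code | Language | Code | Language |
| --- | --- | --- | --- |
| en | English | nl | Dutch |
| es | Spanish | cs | Czech |
| fr | French | ar | Arabic |
| de | German | zh-cn | Chinese (Simplified) |
| it | Italian | hu | Hungarian |
| pt | Portuguese | ko | Korean |
| pl | Polish | ja | Japanese |
| tr | Turkish | hi | Hindi |
| ru | Russian | | |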

Pass "auto" as the language to let the server detect it automatically.
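If language codes come from user input, a small check against the table above catches typos before they reach the server. normalize_language is hypothetical, and mapping underscores to hyphens (zh_CN to zh-cn) is a convenience of this sketch, not documented server behavior.

```python
# The 17 language codes listed in the table above.
XTTS_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja", "hi",
})


def normalize_language(code: str) -> str:
    """Lower-case a language code, map '_' to '-', and check it against the table."""
    normalized = code.strip().lower().replace("_", "-")
    if normalized == "auto" or normalized in XTTS_LANGUAGES:
        return normalized
    raise ValueError(f"unsupported XTTSv2 language code: {code!r}")
```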


Compatibility

Supports Python 3.11+; last tested with pipecat-ai v1.4.0 on Python 3.12.
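A quick way to see whether your environment matches these requirements is sketched below, using sys.version_info and importlib.metadata from the standard library. It only reports versions and does not enforce compatibility; describe_environment is a hypothetical helper.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def describe_environment() -> dict[str, object]:
    """Report the running Python version and the installed pipecat-ai version."""
    try:
        pipecat_version = version("pipecat-ai")  # distribution name on PyPI
    except PackageNotFoundError:
        pipecat_version = None  # pipecat-ai is not installed
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "python_supported": sys.version_info >= (3, 11),
        "pipecat_ai": pipecat_version,
    }
```

Printing `describe_environment()` before filing a bug report makes it easy to compare against the tested combination above.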


License

Integration code: MIT (see LICENSE).

XTTSv2 model weights: distributed under the Coqui Public Model License (non-commercial use only). See the server repository for details.


Attribution

Developed by wuxuedaifu.



Download files

Download the file for your platform.

Source Distribution

pipecat_xtts_vllm-0.1.1.tar.gz (7.9 kB)

Uploaded Source

Built Distribution


pipecat_xtts_vllm-0.1.1-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file pipecat_xtts_vllm-0.1.1.tar.gz.

File metadata

  • Download URL: pipecat_xtts_vllm-0.1.1.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pipecat_xtts_vllm-0.1.1.tar.gz
Algorithm Hash digest
SHA256 23e4d4948af9412b9e9dc805a4ae1b13381c3d9521400aa5abc7aad9f68ab70f
MD5 6acfccea4810321a0c25e6fb0aa51fd2
BLAKE2b-256 2dc04937a34602282445334505330fa62592994410b40a995523c8848302c095


File details

Details for the file pipecat_xtts_vllm-0.1.1-py3-none-any.whl.


File hashes

Hashes for pipecat_xtts_vllm-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8fa6c615b4edf93784ac8d4bcda3d12c3ebf97cef098a29b1dca0d30edd8e8e1
MD5 a369503fa202a839aae0ce4994e41e10
BLAKE2b-256 88861f0fe26b38f0940dabfb2afdf743abac14ea6478ee12e13d7da685308085
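To check a downloaded file against the SHA256 digests listed above, hash it in chunks with hashlib. sha256_of and verify are hypothetical helpers; the expected digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for release 0.1.1.
EXPECTED_SHA256 = {
    "pipecat_xtts_vllm-0.1.1.tar.gz":
        "23e4d4948af9412b9e9dc805a4ae1b13381c3d9521400aa5abc7aad9f68ab70f",
    "pipecat_xtts_vllm-0.1.1-py3-none-any.whl":
        "8fa6c615b4edf93784ac8d4bcda3d12c3ebf97cef098a29b1dca0d30edd8e8e1",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches the published one."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check in hash-checking mode: put `pipecat-xtts-vllm==0.1.1 --hash=sha256:<digest>` in a requirements file (one --hash option per distribution file) and run `pip install --require-hashes -r requirements.txt`. In that mode every dependency must also be pinned with a hash.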

