
Pipecat community TTS integration for the XTTSv2-vLLM streaming server


pipecat-xtts-vllm

A Pipecat community TTS integration that streams synthesized speech from an XTTSv2-vLLM streaming server.

XTTSVLLMTTSService is a drop-in Pipecat TTSService that:

  • Clones a voice from a reference audio clip.
  • Computes XTTSv2 conditioning once (via POST /v1/tts/conditioning) and caches it for the service lifetime — no per-request conditioning overhead.
  • Streams raw PCM audio chunks (via POST /v1/audio/speech) directly into the Pipecat pipeline.
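The "compute once, then cache" behaviour described above can be illustrated with a small async helper. This is only a sketch of the general pattern, not the package's actual implementation: `OnceCache`, `fake_conditioning_request`, and `demo_once_cache` are hypothetical names, and the fake request stands in for the real POST /v1/tts/conditioning call.

```python
import asyncio
from typing import Awaitable, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class OnceCache(Generic[T]):
    """Hypothetical helper: run an async computation once and reuse the result."""

    def __init__(self, compute: Callable[[], Awaitable[T]]) -> None:
        self._compute = compute
        self._value: Optional[T] = None
        self._ready = False
        self._lock = asyncio.Lock()

    async def get(self) -> T:
        # The lock ensures concurrent callers trigger only one computation.
        async with self._lock:
            if not self._ready:
                self._value = await self._compute()
                self._ready = True
        return self._value  # type: ignore[return-value]


async def demo_once_cache() -> tuple[int, list[str]]:
    calls = 0

    async def fake_conditioning_request() -> str:
        # Stand-in for the real conditioning request (assumption).
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)
        return "conditioning-payload"

    cache = OnceCache(fake_conditioning_request)
    results = await asyncio.gather(*(cache.get() for _ in range(3)))
    return calls, list(results)
```

Three concurrent `get()` calls share a single underlying request, which is the property that removes per-request conditioning overhead.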

Installation

Install from PyPI:

pip install pipecat-xtts-vllm

To work from a source checkout instead, run the following from the repository root:

pip install -e .

Start the XTTSv2-vLLM server

The client connects to the XTTSv2-vLLM streaming server, which is distributed as a large Docker image at wuxuedaifu/xttsv2-vllm-streaming-server. Follow its README to pull and run the image, for example:

docker run --gpus all -p 8000:8000 ghcr.io/wuxuedaifu/xttsv2-vllm-streaming-server:latest
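After starting the container, it can help to wait until the published port accepts connections before building the pipeline. The helper below is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical name, and an open TCP port only indicates that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, else False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For the command above, a call such as `wait_for_port("localhost", 8000)` would be the natural usage.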

Usage with a Pipeline

The snippet below shows the essential setup. See examples/foundational/xtts_vllm_say_one_thing.py for the full, runnable version.

import asyncio
from pathlib import Path

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.workers.runner import WorkerRunner

from pipecat_xtts_vllm import XTTSVLLMTTSService

async def main():
    reference_audio = Path("reference.wav").read_bytes()

    tts = XTTSVLLMTTSService(
        base_url="http://localhost:8000",
        reference_audio=reference_audio,
        language="en",
        sample_rate=24000,
    )

    # Add tts (and any downstream processors) to a Pipeline.
    pipeline = Pipeline([tts, ...])

    worker = PipelineWorker(
        pipeline,
        params=PipelineParams(audio_out_sample_rate=24000),
        idle_timeout_secs=None,
    )

    runner = WorkerRunner()
    await runner.add_workers(worker)

    async def say():
        await worker.queue_frames([
            TTSSpeakFrame("Hello from the XTTSv2 vLLM streaming server."),
            EndFrame(),
        ])

    await asyncio.gather(runner.run(), say())

asyncio.run(main())

Alternatively, pass a precomputed XTTSVLLMConditioning object instead of reference_audio if you have cached the conditioning data externally:

from pipecat_xtts_vllm import XTTSVLLMConditioning, XTTSVLLMTTSService

conditioning = XTTSVLLMConditioning(
    gpt_cond_latent_b64="<base64-encoded latent>",
    speaker_embeddings_b64="<base64-encoded embeddings>",
)

tts = XTTSVLLMTTSService(
    base_url="http://localhost:8000",
    conditioning=conditioning,
)
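One way to cache the conditioning data externally is to store the two base64 strings in a JSON file and load them on startup. The sketch below assumes only the two field names shown above; `save_conditioning`, `load_conditioning`, and the file layout are hypothetical and not part of the package.

```python
import json
from pathlib import Path

# Field names taken from the XTTSVLLMConditioning example above.
FIELDS = ("gpt_cond_latent_b64", "speaker_embeddings_b64")


def save_conditioning(path: Path, gpt_cond_latent_b64: str,
                      speaker_embeddings_b64: str) -> None:
    """Write both base64 strings to a JSON file (hypothetical cache format)."""
    payload = {
        "gpt_cond_latent_b64": gpt_cond_latent_b64,
        "speaker_embeddings_b64": speaker_embeddings_b64,
    }
    path.write_text(json.dumps(payload), encoding="utf-8")


def load_conditioning(path: Path) -> dict[str, str]:
    """Read the cache file and return keyword arguments for the conditioning object."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("conditioning cache must contain a JSON object")
    missing = [name for name in FIELDS if not isinstance(payload.get(name), str)]
    if missing:
        raise ValueError(f"conditioning cache is missing fields: {missing}")
    return {name: payload[name] for name in FIELDS}
```

The returned dictionary could then be unpacked as `XTTSVLLMConditioning(**load_conditioning(path))`, assuming the constructor accepts exactly these two keyword arguments as in the snippet above.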

Running the Example

Set the required environment variables, then run the script:

export XTTS_VLLM_BASE_URL=http://localhost:8000
export XTTS_VLLM_REFERENCE_AUDIO=/path/to/reference.wav

python examples/foundational/xtts_vllm_say_one_thing.py

The script synthesizes one sentence and writes the output to output.wav in the current directory.
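For readers who want to save streamed audio themselves, raw PCM chunks can be wrapped in a WAV container with the standard library. This is a sketch, not the example script's code: `write_pcm_to_wav` is a hypothetical name, and it assumes 16-bit mono PCM, which this page does not state explicitly.

```python
import wave
from pathlib import Path
from typing import Iterable


def write_pcm_to_wav(path: Path, chunks: Iterable[bytes], sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> int:
    """Write raw PCM chunks to a WAV file and return the number of bytes written.

    Assumes 16-bit (sample_width=2) mono audio unless told otherwise.
    """
    total = 0
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            total += len(chunk)
    return total
```

Under these assumptions, one second of audio at 24000 Hz is 24000 frames, or 48000 bytes.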


Configuration

XTTSVLLMTTSService accepts keyword-only arguments. At least one of reference_audio or conditioning must be given; if both are provided, conditioning takes precedence.

| Parameter | Default | Description |
| --- | --- | --- |
| base_url | (required) | Base URL of the XTTSv2-vLLM streaming server (e.g. http://localhost:8000). |
| reference_audio | None | Raw bytes of a reference WAV clip (~6 s) used for voice cloning. Required unless conditioning is given. |
| conditioning | None | Optional precomputed XTTSVLLMConditioning (skips the /v1/tts/conditioning call); if set, it takes precedence over reference_audio. |
| language | "en" | BCP-47 language code passed to the server. |
| chunk_size | 20 | Number of tokens per streaming chunk. |
| speed | 1.0 | Speech rate multiplier. |
| sample_rate | 24000 | PCM sample rate in Hz (should match the server output). |
| aiohttp_session | None | External aiohttp.ClientSession to reuse. If None, a session is created and closed by the service. |
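The rule for reference_audio and conditioning can be summarised as a small decision function. This is an illustrative sketch of the documented precedence only; `select_voice_source` is a hypothetical name, and the exception type the real service raises is not stated here.

```python
from typing import Optional


def select_voice_source(reference_audio: Optional[bytes],
                        conditioning: Optional[object]) -> str:
    """Return which input would be used, following the documented precedence."""
    if conditioning is not None:
        # Precomputed conditioning wins, so no conditioning request is needed.
        return "conditioning"
    if reference_audio is not None:
        return "reference_audio"
    # ValueError is an assumption; the real service may signal this differently.
    raise ValueError("provide reference_audio or conditioning")
```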

Compatibility

Supports Python 3.11+; last tested with pipecat-ai v1.4.0 on Python 3.12.


License

Integration code: MIT — see LICENSE.

XTTSv2 model weights: distributed under the Coqui Public Model License (non-commercial use only). See the server repository for details.


Attribution

Developed by wuxuedaifu.
