Pipecat community TTS integration for the XTTSv2-vLLM streaming server


pipecat-xtts-vllm

A Pipecat community TTS integration that streams synthesized speech from an XTTSv2-vLLM streaming server.

XTTSVLLMTTSService is a drop-in Pipecat TTSService that:

Clones a voice from a reference audio clip.
Computes XTTSv2 conditioning once (via POST /v1/tts/conditioning) and caches it for the service lifetime, so there is no per-request conditioning overhead.
Streams raw PCM audio chunks (via POST /v1/audio/speech) directly into the Pipecat pipeline.

Installation

pip install pipecat-xtts-vllm

To work on the package from source instead:

pip install -e .

Start the XTTSv2-vLLM server

The client talks to the GPU-backed Docker server from wuxuedaifu/xttsv2-vllm-streaming-server. Follow that project's README to pull and run the image; for example:

docker run --gpus all -p 8000:8000 ghcr.io/wuxuedaifu/xttsv2-vllm-streaming-server:latest
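Loading the model inside the container can take a while, so a script may want to wait for the server before starting a pipeline. The sketch below (a hypothetical wait_for_http helper, standard library only) treats any HTTP response from the base URL as "up". It checks HTTP rather than a bare TCP connect because Docker's port forwarding can accept connections before the application inside the container is listening. Polling the root path is an assumption: if the server's README documents a health endpoint, poll that instead, and note that answering HTTP may not guarantee the model has finished loading.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 300.0, interval: float = 2.0) -> None:
    """Block until `url` returns any HTTP response, or raise TimeoutError.

    Any status code (even 404 or 501) counts as "up": it means the web
    application, not just Docker's port forwarder, is answering.
    """
    # Bypass any configured HTTP proxy so localhost is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return  # 2xx/3xx response
        except urllib.error.HTTPError as err:
            err.close()
            return  # 4xx/5xx still means a live HTTP server
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeout, or malformed reply: not ready.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} did not respond within {timeout} s")
        time.sleep(interval)
```

Usage, with the port from the docker command: `wait_for_http("http://localhost:8000/")`.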

Usage with a Pipeline

The snippet below shows the essential setup. See examples/foundational/xtts_vllm_say_one_thing.py for the full, runnable version.

import asyncio
from pathlib import Path

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.workers.runner import WorkerRunner

from pipecat_xtts_vllm import XTTSVLLMTTSService

async def main():
    reference_audio = Path("reference.wav").read_bytes()

    tts = XTTSVLLMTTSService(
        base_url="http://localhost:8000",
        reference_audio=reference_audio,
        language="en",
        sample_rate=24000,
    )

    # Add any downstream processors (e.g. an audio output transport) after tts.
    pipeline = Pipeline([tts])

    worker = PipelineWorker(
        pipeline,
        params=PipelineParams(audio_out_sample_rate=24000),
        idle_timeout_secs=None,
    )

    runner = WorkerRunner()
    await runner.add_workers(worker)

    async def say():
        await worker.queue_frames([
            TTSSpeakFrame("Hello from the XTTSv2 vLLM streaming server."),
            EndFrame(),
        ])

    await asyncio.gather(runner.run(), say())

if __name__ == "__main__":
    asyncio.run(main())

Alternatively, pass a precomputed XTTSVLLMConditioning object instead of reference_audio if you have cached the conditioning data externally:

from pipecat_xtts_vllm import XTTSVLLMConditioning, XTTSVLLMTTSService

conditioning = XTTSVLLMConditioning(
    gpt_cond_latent_b64="<base64-encoded latent>",
    speaker_embeddings_b64="<base64-encoded embeddings>",
)

tts = XTTSVLLMTTSService(
    base_url="http://localhost:8000",
    conditioning=conditioning,
)
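One way to cache conditioning externally is a small JSON file holding the two base64 strings. The helpers below are hypothetical and use only the standard library; they assume the fields are plain strings named exactly as in the XTTSVLLMConditioning constructor call shown above. How you first obtain the strings (for example from an earlier conditioning request) is up to you.

```python
import json
from pathlib import Path
from typing import Dict

# Field names mirror the XTTSVLLMConditioning keyword arguments shown above.
FIELDS = ("gpt_cond_latent_b64", "speaker_embeddings_b64")


def save_conditioning(path: Path, gpt_cond_latent_b64: str, speaker_embeddings_b64: str) -> None:
    """Write the two base64 strings to a small JSON file."""
    data = {
        "gpt_cond_latent_b64": gpt_cond_latent_b64,
        "speaker_embeddings_b64": speaker_embeddings_b64,
    }
    path.write_text(json.dumps(data), encoding="utf-8")


def load_conditioning(path: Path) -> Dict[str, str]:
    """Read the JSON file back and check that both fields are strings."""
    data = json.loads(path.read_text(encoding="utf-8"))
    missing = [name for name in FIELDS if not isinstance(data.get(name), str)]
    if missing:
        raise ValueError(f"{path}: missing or non-string fields: {missing}")
    return {name: data[name] for name in FIELDS}
```

The loaded dict can then be unpacked into the constructor, e.g. `XTTSVLLMConditioning(**load_conditioning(Path("voice.json")))`, assuming it accepts exactly these two keyword arguments.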

Running the Example

Set the required environment variables, then run the script:

export XTTS_VLLM_BASE_URL=http://localhost:8000
export XTTS_VLLM_REFERENCE_AUDIO=/path/to/reference.wav

python examples/foundational/xtts_vllm_say_one_thing.py
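The bundled example's own source is not reproduced on this page. As a sketch, a hypothetical helper could map the two environment variables onto XTTSVLLMTTSService keyword arguments, failing early with a clear message if either is unset:

```python
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional


def service_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build service keyword arguments from XTTS_VLLM_* variables.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    base_url = env.get("XTTS_VLLM_BASE_URL")
    audio_path = env.get("XTTS_VLLM_REFERENCE_AUDIO")
    if not base_url or not audio_path:
        raise RuntimeError("Set XTTS_VLLM_BASE_URL and XTTS_VLLM_REFERENCE_AUDIO")
    return {
        "base_url": base_url,
        # The service expects raw bytes of the reference WAV, not a path.
        "reference_audio": Path(audio_path).read_bytes(),
    }
```

Usage: `tts = XTTSVLLMTTSService(**service_kwargs_from_env(), language="en")`.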

The script synthesizes one sentence and writes the output to output.wav in the current directory.
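How the example writes output.wav is not shown here. As a sketch, raw PCM chunks can be wrapped in a WAV container with Python's standard wave module. This assumes the audio is 16-bit signed little-endian mono PCM at the configured sample_rate; confirm the actual format against the server documentation.

```python
import wave
from typing import BinaryIO, Iterable, Union


def write_pcm_to_wav(
    dest: Union[str, BinaryIO],
    chunks: Iterable[bytes],
    sample_rate: int = 24000,
    channels: int = 1,
    sample_width: int = 2,
) -> int:
    """Write raw PCM chunks into a WAV container; return frames written."""
    frames = 0
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 = 16-bit
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            frames += len(chunk) // (sample_width * channels)
    return frames
```

Usage: `write_pcm_to_wav("output.wav", chunks)`, where `chunks` is any iterable of PCM byte strings.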

Configuration

XTTSVLLMTTSService accepts keyword-only arguments. At least one of reference_audio or conditioning must be given; if both are provided, conditioning takes precedence.
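The precedence rule can be expressed as a short sketch. resolve_voice_source and VoiceSource are hypothetical names that restate the documented behaviour; they are not the package's internal code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceSource:
    """Which input would be used to obtain conditioning."""

    kind: str  # "conditioning" or "reference_audio"
    value: object


def resolve_voice_source(
    *, reference_audio: Optional[bytes] = None, conditioning: Optional[object] = None
) -> VoiceSource:
    # Precomputed conditioning wins, so no /v1/tts/conditioning call is needed.
    if conditioning is not None:
        return VoiceSource("conditioning", conditioning)
    if reference_audio is not None:
        return VoiceSource("reference_audio", reference_audio)
    raise ValueError("Provide reference_audio or conditioning")
```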

Parameter	Default	Description
`base_url`	(required)	Base URL of the XTTSv2-vLLM streaming server (e.g. `http://localhost:8000`).
`reference_audio`	`None`	Raw bytes of a reference WAV clip (~6 s) used for voice cloning. Required unless `conditioning` is given.
`conditioning`	`None`	Optional precomputed `XTTSVLLMConditioning` (skips the `/v1/tts/conditioning` call); if set, it takes precedence over `reference_audio`.
`language`	`"en"`	Language code passed to the server (see Supported languages).
`chunk_size`	`20`	Number of tokens per streaming chunk.
`speed`	`1.0`	Speech rate multiplier.
`sample_rate`	`24000`	PCM sample rate in Hz (should match the server output).
`aiohttp_session`	`None`	External `aiohttp.ClientSession` to reuse. If `None`, a session is created and closed by the service.
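Streamed HTTP bodies arrive at arbitrary byte boundaries, so a 16-bit sample can in principle be split across two chunks; whether this service already re-aligns chunks is not stated here. The sketch below shows one way to re-slice a byte stream into whole frames, plus the arithmetic that ties byte counts to playback time at a given sample_rate (and why a mismatched sample_rate changes speed and pitch). The 2-bytes-per-frame default assumes 16-bit mono PCM.

```python
from typing import Iterable, Iterator


def align_pcm_chunks(chunks: Iterable[bytes], frame_bytes: int = 2) -> Iterator[bytes]:
    """Re-slice a byte stream so every yielded chunk holds whole frames.

    frame_bytes = bytes per sample * channels (2 for 16-bit mono, assumed).
    """
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        usable = len(data) - (len(data) % frame_bytes)
        leftover = data[usable:]  # carry a partial frame into the next chunk
        if usable:
            yield data[:usable]
    # A trailing partial frame cannot be played, so it is dropped.


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000, frame_bytes: int = 2) -> float:
    """Playback time of raw PCM: bytes / (frame_bytes * sample_rate)."""
    return num_bytes / (frame_bytes * sample_rate)
```

At 24000 Hz and 2 bytes per frame, one second of audio is 48000 bytes; playing that data at a different rate stretches or compresses it proportionally.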

Supported languages

XTTSv2 supports 17 languages. Pass the matching code as the language argument:

Code	Language	Code	Language
`en`	English	`nl`	Dutch
`es`	Spanish	`cs`	Czech
`fr`	French	`ar`	Arabic
`de`	German	`zh-cn`	Chinese (Simplified)
`it`	Italian	`hu`	Hungarian
`pt`	Portuguese	`ko`	Korean
`pl`	Polish	`ja`	Japanese
`tr`	Turkish	`hi`	Hindi
`ru`	Russian

Pass auto as the language value to let the server detect the language automatically.
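To catch typos before a request reaches the server, a caller could validate codes against the table above. normalize_language is a hypothetical helper; lower-casing input is a convenience choice made here, not documented server behaviour.

```python
# The 17 codes from the Supported languages table.
SUPPORTED_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja", "hi",
})


def normalize_language(code: str) -> str:
    """Lower-case and validate a language code, allowing "auto"."""
    normalized = code.strip().lower()
    if normalized == "auto" or normalized in SUPPORTED_LANGUAGES:
        return normalized
    raise ValueError(f"Unsupported XTTSv2 language code: {code!r}")
```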

Compatibility

Supports Python 3.11+; last tested with pipecat-ai v1.4.0 on Python 3.12.

License

Integration code: MIT — see LICENSE.

XTTSv2 model weights: distributed under the Coqui Public Model License (non-commercial use only). See the server repository for details.

Attribution

Developed by wuxuedaifu.

Release history

0.1.1 (this version)	Jun 26, 2026
0.1.0	Jun 26, 2026

Download files


Source Distribution

pipecat_xtts_vllm-0.1.1.tar.gz (7.9 kB)

Uploaded Jun 26, 2026 Source

Built Distribution


pipecat_xtts_vllm-0.1.1-py3-none-any.whl (6.7 kB)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file pipecat_xtts_vllm-0.1.1.tar.gz.

File metadata

Download URL: pipecat_xtts_vllm-0.1.1.tar.gz
Upload date: Jun 26, 2026
Size: 7.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pipecat_xtts_vllm-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`23e4d4948af9412b9e9dc805a4ae1b13381c3d9521400aa5abc7aad9f68ab70f`
MD5	`6acfccea4810321a0c25e6fb0aa51fd2`
BLAKE2b-256	`2dc04937a34602282445334505330fa62592994410b40a995523c8848302c095`


File details

Details for the file pipecat_xtts_vllm-0.1.1-py3-none-any.whl.

File metadata

Download URL: pipecat_xtts_vllm-0.1.1-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 6.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pipecat_xtts_vllm-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8fa6c615b4edf93784ac8d4bcda3d12c3ebf97cef098a29b1dca0d30edd8e8e1`
MD5	`a369503fa202a839aae0ce4994e41e10`
BLAKE2b-256	`88861f0fe26b38f0940dabfb2afdf743abac14ea6478ee12e13d7da685308085`
