Asynchronous streaming TTS client for CosyVoice

CosyVoice Python SDK

Production-Ready Async TTS Client for CosyVoice Services

An asynchronous text-to-speech SDK designed for high-concurrency, low-latency, real-time voice synthesis, such as intelligent voice interaction applications.

Overview

CosyVoice Python SDK is an async-first TTS client library that provides:

🚀 Real-time Streaming Synthesis: Text stream input, audio stream output with minimized time-to-first-byte
🎭 Custom Voice Management: Create unique voices from audio samples with zero-shot voice cloning
⚡ High-Performance Async Architecture: Support thousands of concurrent requests with auto-reconnection
🔧 Production-Ready: Complete error handling, monitoring metrics, and load balancing support
📡 Multi-Protocol Support: WebSocket streaming synthesis + HTTP RESTful voice management
🎵 Multi-Format Output: Support WAV, MP3, PCM and other audio formats

Quick Start

Installation

pip install cosyvoice-client
# or using uv
uv add cosyvoice-client

Authentication & Configuration

The SDK supports multiple authentication and configuration methods:

Environment Variables (Recommended)

export COSYVOICE_BASE_URL="https://api.cosyvoice.com"
export COSYVOICE_API_KEY="your_api_key_here"

Configuration in Code

import cosyvoice

# Note: `await` and `async with` must run inside an async function.

# Method 1: Explicit server URL and API key
client = await cosyvoice.create_client(
    server_url="wss://api.cosyvoice.com",
    api_key="your_api_key"
)

# Method 2: Explicit server URL and API key, plus timeout settings
client = await cosyvoice.create_client(
    server_url="wss://api.cosyvoice.com",
    api_key="your_api_key",
    connect_timeout=30.0,
    request_timeout=60.0,
    ping_interval=20.0
)

# Method 3: Using context manager (recommended)
async with cosyvoice.connect_client(
    server_url="wss://api.cosyvoice.com",
    api_key="your_api_key"
) as client:
    # Use client
    pass

Basic Usage Example

import asyncio
import cosyvoice

async def basic_tts_example():
    # Connect to CosyVoice service
    async with cosyvoice.create_client() as client:

        # 1. Create custom speaker
        speaker = await client.speaker.create(
            prompt_text="Hello, this is my voice sample.",
            prompt_audio_path="https://example.com/voice_sample.wav"
        )

        # 2. Configure synthesis parameters
        config = cosyvoice.SynthesisConfig(
            speaker_id=speaker.zero_shot_spk_id,
            mode=cosyvoice.SynthesisMode.ZERO_SHOT,
            speed=1.2,
            output_format=cosyvoice.AudioFormat.WAV
        )

        # 3. Synthesize speech
        audio_data = await client.collect_audio(
            "Welcome to CosyVoice TTS service!",
            config
        )

        # 4. Save audio file
        with open("output.wav", "wb") as f:
            f.write(audio_data)

asyncio.run(basic_tts_example())

API Reference

Authentication

The SDK uses Bearer token authentication for HTTP APIs and a `token` query parameter for WebSocket connections.

Method	HTTP Header	WebSocket URL Parameter
Bearer Token	`Authorization: Bearer {token}`	`?token={token}`
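
The two schemes in the table can be sketched with the standard library alone. This is a minimal illustration, not the SDK's internal code: the helper names and the `/ws` path in the usage note are hypothetical, and only the `Authorization: Bearer {token}` header and the `?token={token}` parameter come from the table above.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def http_auth_headers(token: str) -> Dict[str, str]:
    """Build the Bearer header used for HTTP requests."""
    return {"Authorization": f"Bearer {token}"}


def websocket_url_with_token(base_url: str, token: str) -> str:
    """Append token=... to a WebSocket URL, keeping any existing query."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    # urlencode percent-encodes the token so special characters stay valid
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `websocket_url_with_token("wss://api.cosyvoice.com/ws", "abc")` yields a URL ending in `?token=abc`. Because the token travels in the URL, it may appear in proxy or server logs, which is worth keeping in mind when choosing token lifetimes.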

Core Interfaces

1. Client Management

Create Client Connection

# Create client with auto-configuration from environment
client = await cosyvoice.create_client()

# Create client with explicit configuration
# (parameter names match the Quick Start examples)
client = await cosyvoice.create_client(
    server_url="wss://api.cosyvoice.com",
    api_key="your_api_key",
    connect_timeout=30.0
)

# Using context manager (recommended)
async with cosyvoice.create_client() as client:
    # Use client
    pass

2. Speaker Management API

Create Speaker

Creates a new custom voice from reference audio.

Request:

speaker = await client.speaker.create(
    prompt_text=prompt_text,              # str: reference text (1-500 chars)
    prompt_audio_path=prompt_audio_path,  # str: HTTP/HTTPS URL to audio file
    zero_shot_spk_id=None                 # str, optional: custom ID (auto-generated if not provided)
)

Response:

class SpeakerInfo:
    zero_shot_spk_id: str     # Speaker unique identifier
    prompt_text: str          # Reference text
    created_at: str           # Creation timestamp (ISO format)
    audio_url: str            # Reference audio URL

Get Speaker Information

speaker_info = await client.speaker.get_info(speaker_id)  # speaker_id: str

Update Speaker

await client.speaker.update(
    speaker_id,              # str: ID of the speaker to update
    prompt_text=None,        # str, optional: new reference text
    prompt_audio_path=None   # str, optional: new reference audio
)

Delete Speaker

await client.speaker.delete(speaker_id)  # speaker_id: str

Check Speaker Existence

exists = await client.speaker.exists(speaker_id)  # speaker_id: str; returns bool

3. Speech Synthesis API

Synthesis Configuration

config = cosyvoice.SynthesisConfig(
    speaker_id=speaker_id,                      # str, required: speaker ID
    mode=cosyvoice.SynthesisMode.ZERO_SHOT,     # SynthesisMode: synthesis mode (default ZERO_SHOT)
    speed=1.0,                                  # float: speed multiplier (0.5-3.0)
    output_format=cosyvoice.AudioFormat.WAV,    # AudioFormat: audio format (default WAV)
    sample_rate=22050,                          # int: sample rate (Hz)
    instruct_text=None,                         # str, optional: instruction text (instruct mode only)
    bit_rate=192000,                            # int: bit rate for MP3 (bps)
    compression_level=2                         # int: compression level (0-9)
)

Supported Synthesis Modes:

ZERO_SHOT: Custom voice cloning mode
SFT: Pre-trained voice mode
CROSS_LINGUAL: Cross-lingual synthesis
INSTRUCT: Natural language instruction mode

Supported Audio Formats:

WAV: Uncompressed audio
MP3: Compressed audio with configurable bit rate
PCM: Raw audio data
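
Since the configuration documents value ranges, it can be useful to check user-supplied values before opening a synthesis session. The sketch below is a hypothetical client-side helper, not part of the SDK: the function name is invented, the lowercase mode strings are assumed to mirror the `"zero_shot"` value shown in the protocol message, and whether the SDK itself performs these checks is not stated in this document.

```python
from typing import List, Optional


def check_synthesis_params(
    speed: float = 1.0,
    compression_level: int = 2,
    mode: str = "zero_shot",
    instruct_text: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not 0.5 <= speed <= 3.0:
        problems.append(f"speed {speed} is outside the documented 0.5-3.0 range")
    if not 0 <= compression_level <= 9:
        problems.append(f"compression_level {compression_level} is outside 0-9")
    # instruct_text is documented as applying to instruct mode only
    if instruct_text is not None and mode != "instruct":
        problems.append("instruct_text is only used in instruct mode")
    return problems
```

Returning a list rather than raising on the first failure lets a web handler report every invalid field in a single response.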

Batch Synthesis

Synthesize entire text at once.

audio_data: bytes = await client.collect_audio(
    text,    # str: text to synthesize
    config   # SynthesisConfig
)

Streaming Synthesis

Process audio chunks as they arrive for low-latency playback.

# text: str, config: SynthesisConfig
async for result in client.synthesize_text(text, config):
    # result.audio_data: bytes - Audio chunk data
    # result.text_index: int - Text segment index
    # result.chunk_index: int - Audio chunk index within text segment

    # Process audio chunk immediately for real-time playback
    # (audio_player is a placeholder for your own playback component)
    await audio_player.play(result.audio_data)

Text Stream Synthesis

Synthesize text as it arrives (ideal for LLM integration).

import asyncio

async def text_generator():
    # Simulate streaming text from LLM
    sentences = ["Hello", "How are you?", "Welcome to our service!"]
    for sentence in sentences:
        yield sentence
        await asyncio.sleep(0.1)

# audio_player is a placeholder for your own playback component
async for result in client.synthesize_stream(text_generator(), config):
    await audio_player.play(result.audio_data)

Quick Synthesis

One-shot synthesis with automatic speaker creation.

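audio_data = await client.quick_synthesize(
    text=text,                                # str: text to synthesize
    speaker_prompt_text=speaker_prompt_text,  # str: reference text for the voice sample
    speaker_audio_file=speaker_audio_file,    # str: reference audio for the voice sample
    speed=1.0,                                # float, optional (default 1.0)
    output_file=None                          # str, optional (default None)
)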

Data Models

SynthesisResult

class SynthesisResult:
    audio_data: bytes      # Audio chunk data
    text_index: int        # Text segment index
    chunk_index: int       # Audio chunk index
    session_id: str        # Synthesis session ID
    metadata: dict         # Additional metadata
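
The `text_index` and `chunk_index` fields make it possible to restore playback order if chunks are buffered or handled out of order. The following is a minimal sketch using a stand-in dataclass rather than the SDK's `SynthesisResult`; it assumes headerless PCM chunks, since container formats such as WAV or MP3 may not survive plain byte concatenation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Chunk:
    """Stand-in for SynthesisResult with only the fields used here."""
    audio_data: bytes
    text_index: int
    chunk_index: int


def assemble_audio(chunks: Iterable[Chunk]) -> bytes:
    """Concatenate chunks ordered by text segment, then by chunk within it."""
    ordered = sorted(chunks, key=lambda c: (c.text_index, c.chunk_index))
    return b"".join(c.audio_data for c in ordered)
```

Sorting by the `(text_index, chunk_index)` pair keeps all audio for one text segment together before moving to the next.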

Error Handling

# Exception hierarchy
CosyVoiceError                 # Base exception
├── ConnectionError            # Network connection issues
├── AuthenticationError        # Authentication failures
├── SpeakerError              # Speaker management errors
├── SynthesisError            # Speech synthesis errors
├── InvalidStateError         # Client state errors
└── ValidationError           # Input validation errors

# Error handling example
try:
    async with cosyvoice.create_client() as client:
        audio = await client.collect_audio("Hello world", config)

except cosyvoice.ConnectionError as e:
    print(f"Connection failed: {e}")
except cosyvoice.SpeakerError as e:
    print(f"Speaker error: {e}")
except cosyvoice.SynthesisError as e:
    print(f"Synthesis error: {e}")

Advanced Usage

Production Integration Patterns

1. High-Concurrency Server Integration

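import asyncio
import cosyvoice
from typing import Dict
import uuid

class ProductionTTSService:
    def __init__(self, endpoint: str, api_key: str):
        self.endpoint = endpoint
        self.api_key = api_key
        self.active_sessions: Dict[str, asyncio.Task] = {}
        self.session_semaphore = asyncio.Semaphore(1000)  # Max concurrent sessions

    def create_session_client(self):
        """Create a dedicated client for each session.

        Returns the un-awaited create_client() result so that it can be used
        with `async with`, as in the other examples in this document.
        """
        return cosyvoice.create_client(server_url=self.endpoint, api_key=self.api_key)

    async def send_to_user(self, user_id: str, audio_data: bytes) -> None:
        """Placeholder: deliver audio over your transport (WebSocket/SSE/etc.)."""
        raise NotImplementedError

    async def handle_error(self, user_id: str, error: Exception) -> None:
        """Placeholder: log the error and notify the user."""
        raise NotImplementedError

    async def handle_user_request(self, user_id: str, text: str, config: cosyvoice.SynthesisConfig):
        """Handle individual user TTS request"""
        async with self.session_semaphore:
            session_id = f"{user_id}_{uuid.uuid4().hex[:8]}"
            # Register the running task so the cleanup below has an entry to remove
            self.active_sessions[session_id] = asyncio.current_task()

            try:
                async with self.create_session_client() as client:
                    # Stream synthesis results
                    async for result in client.synthesize_text(text, config):
                        # Send to user immediately (WebSocket/SSE/etc.)
                        await self.send_to_user(user_id, result.audio_data)

            except Exception as e:
                await self.handle_error(user_id, e)
            finally:
                # Cleanup: pop() does not fail if the entry is already gone
                self.active_sessions.pop(session_id, None)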

# FastAPI integration example
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
tts_service = ProductionTTSService("wss://api.cosyvoice.com", "your_key")

@app.websocket("/tts/{user_id}")
async def tts_websocket(websocket: WebSocket, user_id: str):
    await websocket.accept()
    # Keep references so background tasks are not garbage-collected mid-flight
    background_tasks = set()

    try:
        while True:
            # Receive TTS request
            data = await websocket.receive_json()

            config = cosyvoice.SynthesisConfig(
                speaker_id=data["speaker_id"],
                speed=data.get("speed", 1.0)
            )

            # Process in background
            task = asyncio.create_task(
                tts_service.handle_user_request(user_id, data["text"], config)
            )
            background_tasks.add(task)
            task.add_done_callback(background_tasks.discard)

    except WebSocketDisconnect:
        pass  # Client closed the connection
    except Exception as e:
        print(f"WebSocket error: {e}")
    finally:
        # Cancel any synthesis still running for this connection
        for task in list(background_tasks):
            task.cancel()

2. LLM + TTS Integration

async def llm_with_voice_response(user_question: str, voice_config: cosyvoice.SynthesisConfig):
    """Stream LLM response directly to voice synthesis"""

    async def llm_text_stream():
        # Replace with your LLM client (OpenAI, Anthropic, etc.)
        async for text_chunk in your_llm_client.stream(user_question):
            yield text_chunk

    async with cosyvoice.create_client() as tts_client:
        # Stream voice synthesis from LLM output
        async for audio_result in tts_client.synthesize_stream(llm_text_stream(), voice_config):
            # Send audio to user in real-time
            await send_audio_to_user(audio_result.audio_data)
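
LLM clients often emit fragments smaller than a sentence, while the Throughput Optimization notes in this document suggest grouping small text segments. One possible approach, sketched below with a hypothetical `sentences` helper that is not part of the SDK, is to buffer fragments and yield complete sentences. The punctuation set is an assumption and would need tuning for other languages or for abbreviations such as "e.g.".

```python
import asyncio
from typing import AsyncIterator

SENTENCE_END = ".!?。！？"  # assumed sentence terminators


async def sentences(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Regroup arbitrary text fragments into whole sentences."""
    buffer = ""
    async for chunk in chunks:
        buffer += chunk
        while True:
            # Position of the first sentence terminator in the buffer, or -1
            cut = next((i for i, ch in enumerate(buffer) if ch in SENTENCE_END), -1)
            if cut == -1:
                break
            sentence, buffer = buffer[: cut + 1].strip(), buffer[cut + 1:]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends without punctuation
    if buffer.strip():
        yield buffer.strip()
```

The wrapped generator could then be passed where `llm_text_stream()` is used above, for example `sentences(llm_text_stream())`. The trade-off is a small delay before the first sentence completes in exchange for fewer, larger synthesis requests.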

Performance Monitoring

from prometheus_client import Counter, Histogram

# Metrics
tts_requests_total = Counter('cosyvoice_requests_total', 'Total TTS requests')
tts_duration_seconds = Histogram('cosyvoice_duration_seconds', 'TTS request duration')
tts_errors_total = Counter('cosyvoice_errors_total', 'Total TTS errors', ['error_type'])

async def monitored_synthesis(client: cosyvoice.StreamClient, text: str, config: cosyvoice.SynthesisConfig):
    """TTS with monitoring metrics"""
    tts_requests_total.inc()

    try:
        # Histogram.time() records the wall-clock duration of the block
        with tts_duration_seconds.time():
            audio_data = await client.collect_audio(text, config)
            return audio_data

    except cosyvoice.ConnectionError:
        tts_errors_total.labels(error_type='connection').inc()
        raise
    except cosyvoice.SynthesisError:
        tts_errors_total.labels(error_type='synthesis').inc()
        raise

Environment Variables Reference

Variable	Default	Description
`COSYVOICE_BASE_URL`	`http://localhost:8080`	Service endpoint URL
`COSYVOICE_API_KEY`	None	API authentication key
`COSYVOICE_CONNECTION_TIMEOUT`	`30.0`	Connection timeout (seconds)
`COSYVOICE_READ_TIMEOUT`	`60.0`	Read timeout (seconds)
`COSYVOICE_MAX_RECONNECT_ATTEMPTS`	`3`	Maximum reconnection attempts
`COSYVOICE_PING_INTERVAL`	`20.0`	WebSocket ping interval (seconds)
`COSYVOICE_PING_TIMEOUT`	`10.0`	WebSocket ping timeout (seconds)
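
To illustrate how the variables and defaults in the table fit together, here is a minimal sketch that reads them into a typed object. The `EnvConfig` class and `load_env_config` function are hypothetical names for illustration; the SDK's own loading logic is not shown in this document and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvConfig:
    base_url: str
    api_key: Optional[str]
    connection_timeout: float
    read_timeout: float
    max_reconnect_attempts: int
    ping_interval: float
    ping_timeout: float


def load_env_config(env: Mapping[str, str] = os.environ) -> EnvConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return EnvConfig(
        base_url=env.get("COSYVOICE_BASE_URL", "http://localhost:8080"),
        api_key=env.get("COSYVOICE_API_KEY"),
        connection_timeout=float(env.get("COSYVOICE_CONNECTION_TIMEOUT", "30.0")),
        read_timeout=float(env.get("COSYVOICE_READ_TIMEOUT", "60.0")),
        max_reconnect_attempts=int(env.get("COSYVOICE_MAX_RECONNECT_ATTEMPTS", "3")),
        ping_interval=float(env.get("COSYVOICE_PING_INTERVAL", "20.0")),
        ping_timeout=float(env.get("COSYVOICE_PING_TIMEOUT", "10.0")),
    )
```

Accepting any mapping instead of reading `os.environ` directly makes the function easy to exercise in tests, for example `load_env_config({"COSYVOICE_API_KEY": "k"})`.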

Protocol Specifications

WebSocket Protocol

The SDK communicates with CosyVoice servers using a structured WebSocket protocol:

Message Format:

{
  "header": {
    "version": "1.0",
    "message_type": "TEXT_REQUEST",
    "timestamp": "2024-01-01T12:00:00Z",
    "sequence": 1
  },
  "payload": {
    "session_id": "session_123",
    "params": {
      "text": "Hello world",
      "mode": "zero_shot",
      "speed": 1.0,
      "output_format": "wav"
    }
  }
}

Message Types:

Client → Server: CONNECT_REQUEST, SESSION_REQUEST, TEXT_REQUEST, SYNTHESIS_END
Server → Client: AUDIO_RESPONSE, AUDIO_COMPLETE, ERROR_RESPONSE
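
The envelope above can be assembled with a small helper. This is a sketch for illustration, not the SDK's serializer: the function name is hypothetical, and it assumes the caller tracks the `sequence` counter, since the document does not say how sequence numbers are scoped.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_message(
    message_type: str,
    session_id: str,
    params: Dict[str, Any],
    sequence: int,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build a header/payload envelope in the documented shape."""
    now = now or datetime.now(timezone.utc)
    return {
        "header": {
            "version": "1.0",
            "message_type": message_type,
            # UTC timestamp with a trailing Z, as in the example message
            "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "sequence": sequence,
        },
        "payload": {"session_id": session_id, "params": params},
    }


def encode_message(message: Dict[str, Any]) -> str:
    """Serialize the envelope to a JSON text frame."""
    return json.dumps(message)
```

Passing `now` explicitly keeps the helper deterministic in tests, while production callers can omit it.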

HTTP API Endpoints

Speaker Management:

POST   /v1/speakers              # Create speaker
GET    /v1/speakers/{id}         # Get speaker info
PUT    /v1/speakers/{id}         # Update speaker
DELETE /v1/speakers/{id}         # Delete speaker
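
For readers calling these endpoints without the SDK, the sketch below builds (but does not send) requests with the standard library. The helper name and route keys are hypothetical, and the JSON request body is an assumption: this document lists the paths and the Bearer scheme but not the body format.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

ROUTES = {
    "create": ("POST", "/v1/speakers"),
    "get": ("GET", "/v1/speakers/{id}"),
    "update": ("PUT", "/v1/speakers/{id}"),
    "delete": ("DELETE", "/v1/speakers/{id}"),
}


def build_speaker_request(
    base_url: str,
    token: str,
    operation: str,
    speaker_id: Optional[str] = None,
    body: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build an unsent Request for one of the speaker endpoints."""
    method, path = ROUTES[operation]
    if "{id}" in path:
        if not speaker_id:
            raise ValueError(f"{operation} requires a speaker_id")
        # Percent-encode the ID so it is safe inside a URL path segment
        path = path.replace("{id}", urllib.parse.quote(speaker_id, safe=""))
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")  # assumed JSON body
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )
```

The returned object can be passed to `urllib.request.urlopen` or inspected in tests without touching the network.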

Development

Environment Setup

# Clone repository
git clone https://github.com/cosyvoice/cosyvoice-python.git
cd cosyvoice-python

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync --dev

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=cosyvoice --cov-report=html

# Run specific test types
uv run pytest -m unit           # Unit tests only
uv run pytest -m integration   # Integration tests only
uv run pytest -m slow          # Network-dependent tests

Code Quality

# Format code
uv run black cosyvoice tests examples
uv run isort cosyvoice tests examples

# Lint code
uv run ruff check cosyvoice tests examples
uv run ruff check --fix cosyvoice tests examples

# Type checking
uv run mypy cosyvoice

Running Examples

# Basic synthesis example
uv run python examples/basic_synthesis.py

# Real-time streaming example
uv run python examples/realtime_streaming.py

# Speaker management example
uv run python examples/speaker_management.py

Performance Guidelines

Latency Optimization

TTFB Target: < 300ms for optimal user experience
RTF Target: < 0.3 for real-time performance
Connection Reuse: Maintain persistent WebSocket connections
Streaming: Use synthesize_stream() for lowest latency
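
To check a deployment against the TTFB and RTF targets above, both figures can be measured on any stream of audio chunks. The sketch below takes RTF to be synthesis time divided by audio duration and assumes 16-bit mono PCM at the default 22050 Hz; the function names are hypothetical, and other formats would need their own duration calculation.

```python
import asyncio
import time
from typing import AsyncIterator, Tuple


def pcm_duration_seconds(
    num_bytes: int, sample_rate: int = 22050, sample_width: int = 2, channels: int = 1
) -> float:
    """Duration of raw PCM audio given its size in bytes."""
    return num_bytes / (sample_rate * sample_width * channels)


async def measure_stream(
    chunks: AsyncIterator[bytes], sample_rate: int = 22050
) -> Tuple[float, float]:
    """Return (ttfb_seconds, rtf) for a stream of PCM chunks."""
    start = time.perf_counter()
    ttfb = None
    total_bytes = 0
    async for chunk in chunks:
        if ttfb is None:
            ttfb = time.perf_counter() - start  # time to first audio chunk
        total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if ttfb is None or total_bytes == 0:
        raise ValueError("stream produced no audio")
    return ttfb, elapsed / pcm_duration_seconds(total_bytes, sample_rate)
```

An RTF below 1.0 means audio is produced faster than it plays back; the 0.3 target leaves headroom for network jitter.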

Throughput Optimization

Connection Pooling: Pre-create client connections
Concurrent Sessions: Support multiple parallel synthesis requests
Batch Processing: Group small text segments when possible
Format Selection: Use PCM for lowest processing overhead
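
Connection pooling can be sketched with an `asyncio.Queue` holding pre-created clients. This is a generic illustration under stated assumptions: `ClientPool` is a hypothetical class, the factory is any coroutine function that returns a connected client, and whether a single SDK client can safely be reused across sessions is not stated in this document.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, Awaitable, Callable, Optional


class ClientPool:
    """Fixed-size pool of pre-created clients."""

    def __init__(self, factory: Callable[[], Awaitable[Any]], size: int):
        self._factory = factory
        self._size = size
        self._queue: Optional[asyncio.Queue] = None

    async def start(self) -> None:
        """Create all clients up front, inside the running event loop."""
        self._queue = asyncio.Queue()
        for _ in range(self._size):
            self._queue.put_nowait(await self._factory())

    @property
    def available(self) -> int:
        return self._queue.qsize() if self._queue else 0

    @asynccontextmanager
    async def acquire(self):
        """Borrow a client; waits if all clients are in use."""
        if self._queue is None:
            raise RuntimeError("call start() before acquire()")
        client = await self._queue.get()
        try:
            yield client
        finally:
            # Always hand the client back, even if the caller raised
            self._queue.put_nowait(client)
```

A caller would write `async with pool.acquire() as client:` around each synthesis request. A production pool would also need health checks and replacement of broken connections, which this sketch omits.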

Resource Management

Memory: Process audio chunks immediately, avoid accumulation
Connections: Use context managers for automatic cleanup
Error Recovery: Implement exponential backoff for reconnections
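
The exponential backoff recommendation can be sketched as a generic retry helper. The function name and parameters are hypothetical, the default of three attempts mirrors the `COSYVOICE_MAX_RECONNECT_ATTEMPTS` default, and the built-in `OSError` family stands in for whichever SDK exceptions should trigger a retry.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Tuple, Type


async def retry_with_backoff(
    operation: Callable[[], Awaitable[Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], Awaitable[Any]] = asyncio.sleep,
) -> Any:
    """Run operation, doubling the wait after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so many clients do not retry in lockstep
            await sleep(delay + random.uniform(0, delay / 10))
```

With the defaults, the waits are roughly 0.5 s and then 1 s before the third and final attempt. The `sleep` parameter exists so tests can substitute a fake that records delays instead of waiting.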

Troubleshooting

Common Issues

Connection Timeout

# Increase timeout values
client = await cosyvoice.create_client(
    server_url="wss://api.cosyvoice.com",
    connect_timeout=60.0,
    request_timeout=120.0
)

Speaker Not Found

# Always check speaker existence
if not await client.speaker.exists(speaker_id):
    speaker = await client.speaker.create(prompt_text, audio_url)
    speaker_id = speaker.zero_shot_spk_id

Audio Format Issues

# PCM format requires explicit WAV conversion for playback
from cosyvoice.utils.audio import write_wav_file
write_wav_file(pcm_data, "output.wav", sample_rate=22050)
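
If the SDK helper is unavailable, the same conversion can be done with Python's standard `wave` module. This sketch assumes 16-bit mono PCM at 22050 Hz, which should be adjusted to match the actual synthesis settings; `pcm_to_wav_bytes` is a hypothetical name.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm_data: bytes, sample_rate: int = 22050, sample_width: int = 2, channels: int = 1
) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_data)
    return buffer.getvalue()
```

The result can be written with `open("output.wav", "wb")` or served directly from memory.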

Debug Logging

import logging
logging.basicConfig(level=logging.DEBUG)

# Enable detailed WebSocket and HTTP logging
logger = logging.getLogger("cosyvoice")
logger.setLevel(logging.DEBUG)

Support & Community

Examples: Complete integration samples in /examples directory

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

We welcome contributions! Please see our Contributing Guide for details on how to submit pull requests, report issues, and suggest improvements.

Release history

1.0.4 (this version): Oct 17, 2025
1.0.3: Sep 15, 2025
1.0.1: Sep 15, 2025
