Official Python SDK for KugelAudio TTS API

KugelAudio Python SDK

Official Python SDK for the KugelAudio Text-to-Speech API.

Installation

pip install kugelaudio

Or with uv:

uv add kugelaudio

Quick Start

from kugelaudio import KugelAudio

# Initialize the client - just needs an API key!
client = KugelAudio(api_key="your_api_key")

# Generate speech
audio = client.tts.generate(
    text="Hello, world!",
    model_id="kugel-1-turbo",
)

# Save to file
audio.save("output.wav")

Client Configuration

from kugelaudio import KugelAudio

# Simple setup - single URL handles everything
client = KugelAudio(api_key="your_api_key")

# Or with custom options
client = KugelAudio(
    api_key="your_api_key",           # Required: Your API key
    api_url="https://api.kugelaudio.com",  # Optional: API base URL (default)
    timeout=60.0,                      # Optional: Request timeout in seconds
)

Single URL Architecture

The SDK uses a single base URL for both the REST API and WebSocket streaming. The TTS server serves the REST endpoints (/v1/models, /v1/voices) and the WebSocket endpoint (/ws/tts) directly, so no proxy is needed and latency stays low.
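
To make the single-URL idea concrete, the sketch below derives a WebSocket address from the same base URL used for REST calls. It is an illustration only: `websocket_url` is a hypothetical helper, and this README does not show how the SDK itself builds the address. Only the `/ws/tts` path and the example base URLs come from the text above.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url(api_url: str, path: str = "/ws/tts") -> str:
    """Map an HTTP(S) base URL to the matching WS(S) URL for `path`."""
    parts = urlsplit(api_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # https maps to the encrypted wss scheme, http to plain ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Keep host and port; drop any query or fragment from the base URL.
    base_path = parts.path.rstrip("/")
    return urlunsplit((scheme, parts.netloc, base_path + path, "", ""))
```

Under these assumptions, `websocket_url("http://localhost:8000")` gives `ws://localhost:8000/ws/tts`.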

Local Development

For local development, point directly to your TTS server:

client = KugelAudio(
    api_key="your_api_key",
    api_url="http://localhost:8000",   # TTS server handles everything
)

Or, if you run separate backend and TTS servers:

client = KugelAudio(
    api_key="your_api_key",
    api_url="http://localhost:8001",   # Backend for REST API
    tts_url="http://localhost:8000",   # TTS server for WebSocket streaming
)
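
One way to switch between the hosted API and a local setup without editing code is to read the constructor arguments from environment variables. The sketch below is an assumption-laden example: the variable names (`KUGELAUDIO_API_KEY`, `KUGELAUDIO_API_URL`, `KUGELAUDIO_TTS_URL`, `KUGELAUDIO_TIMEOUT`) and the helper are hypothetical, and this README does not say whether the SDK reads any environment variables itself. Only the keyword names `api_key`, `api_url`, `tts_url`, and `timeout` come from the examples above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build KugelAudio(...) keyword arguments from (hypothetical) env variables."""
    env = os.environ if env is None else env
    api_key = env.get("KUGELAUDIO_API_KEY")
    if not api_key:
        raise RuntimeError("KUGELAUDIO_API_KEY is not set")
    kwargs: Dict[str, Any] = {"api_key": api_key}
    # Optional settings are only passed when present, so SDK defaults still apply.
    if env.get("KUGELAUDIO_API_URL"):
        kwargs["api_url"] = env["KUGELAUDIO_API_URL"]
    if env.get("KUGELAUDIO_TTS_URL"):
        kwargs["tts_url"] = env["KUGELAUDIO_TTS_URL"]
    if env.get("KUGELAUDIO_TIMEOUT"):
        kwargs["timeout"] = float(env["KUGELAUDIO_TIMEOUT"])
    return kwargs
```

The result could then be passed on as `KugelAudio(**client_kwargs_from_env())`.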

Available Models

Model ID	Name	Description
`kugel-1-turbo`	Kugel 1 Turbo	Fast, low-latency model for real-time applications
`kugel-1`	Kugel 1	Premium quality model for pre-recorded content

List Available Models

models = client.models.list()

for model in models:
    print(f"{model.id}: {model.name}")
    print(f"  Description: {model.description}")
    print(f"  Max Input: {model.max_input_length} characters")
    print(f"  Sample Rate: {model.sample_rate} Hz")
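
Because each model reports a `max_input_length`, longer texts may need to be split before generation. The sketch below shows one possible client-side approach: it packs whole sentences greedily into pieces that fit a character limit. `split_text` is a hypothetical helper, not part of the SDK, and the README does not say how the API reacts to over-long input.

```python
import re
from typing import List


def split_text(text: str, max_chars: int) -> List[str]:
    """Greedily pack sentences into pieces of at most `max_chars` characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: List[str] = []
    current = ""
    for sentence in sentences:
        # A single over-long sentence is hard-cut into fixed-size slices.
        while len(sentence) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current piece, start a new one.
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

Each piece could then be sent in its own `client.tts.generate(...)` call, using `model.max_input_length` as `max_chars`.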

Voices

List Available Voices

# List all available voices
voices = client.voices.list()

for voice in voices:
    print(f"{voice.id}: {voice.name}")
    print(f"  Category: {voice.category}")
    print(f"  Languages: {', '.join(voice.supported_languages)}")

# Filter by language
german_voices = client.voices.list(language="de")

# Include public voices in the results
public_voices = client.voices.list(include_public=True)

# Limit results
first_10 = client.voices.list(limit=10)
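
Once a list of voices is in hand, choosing one can also happen on the client. The sketch below is a minimal example under stated assumptions: `VoiceInfo` is a stand-in dataclass that mirrors a few of the documented `Voice` fields (`id`, `name`, `supported_languages`), and `pick_voice` is a hypothetical helper rather than an SDK function.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class VoiceInfo:
    """Stand-in for the SDK's Voice object; only the fields used below."""
    id: int
    name: str
    supported_languages: List[str] = field(default_factory=list)


def pick_voice(voices: Iterable[VoiceInfo], language: str) -> Optional[VoiceInfo]:
    """Return the first voice that lists `language`, or None if there is none."""
    for voice in voices:
        if language in voice.supported_languages:
            return voice
    return None
```

Since the helper relies only on attribute access, it should also accept the objects returned by `client.voices.list()`, provided they expose `supported_languages` as shown in the Data Models section.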

Get a Specific Voice

voice = client.voices.get(voice_id=123)
print(f"Voice: {voice.name}")
print(f"Sample text: {voice.sample_text}")

Text-to-Speech Generation

Basic Generation (Non-Streaming)

Generate complete audio and receive it all at once:

audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",  # 'kugel-1-turbo' (fast) or 'kugel-1' (quality)
    voice_id=123,              # Optional: specific voice ID
    cfg_scale=2.0,             # Guidance scale (1.0-5.0)
    max_new_tokens=2048,       # Maximum tokens to generate
    sample_rate=24000,         # Output sample rate
    normalize=True,            # Enable text normalization (see below)
    language="en",             # Language for normalization
)

# Audio properties
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}")  # Real-time factor

# Save to WAV file
audio.save("output.wav")

# Get raw PCM bytes
pcm_data = audio.audio

# Get WAV bytes (with header)
wav_bytes = audio.to_wav_bytes()
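
The difference between raw PCM bytes and WAV bytes is only the container header. As an illustration, the sketch below wraps raw 16-bit PCM in a WAV container with Python's standard `wave` module. It is not the SDK's implementation of `to_wav_bytes()`; the helper name is hypothetical and mono output is an assumption, since the README does not state the channel count.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw little-endian 16-bit PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)  # assumption: mono unless told otherwise
        wav_file.setsampwidth(2)         # 16-bit samples are 2 bytes wide
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

This can be handy when PCM data from `audio.audio` has been processed further and needs a header again.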

Streaming Audio Output

Receive audio chunks as they are generated for lower latency:

# Synchronous streaming
for item in client.tts.stream(
    text="Hello, this is streaming audio.",
    model_id="kugel-1-turbo",
):
    if hasattr(item, 'audio'):  # AudioChunk
        # Process audio chunk immediately
        print(f"Chunk {item.index}: {len(item.audio)} bytes, {item.samples} samples")
        # play_audio(item.audio)
    elif isinstance(item, dict) and item.get('final'):
        # Final stats
        print(f"Total duration: {item.get('dur_ms', 0):.0f}ms")
        print(f"Generation time: {item.get('gen_ms', 0):.0f}ms")
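
When the chunks should be kept as well as played, the same loop can gather them into one buffer. The sketch below shows one way to do that. `FakeChunk` is a stand-in with only the `audio` and `index` fields of `AudioChunk`, and `collect_stream` is a hypothetical helper; the check for a dict with a `final` key follows the loop shown above.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional, Tuple


@dataclass
class FakeChunk:
    """Stand-in with the AudioChunk fields used here (audio, index)."""
    audio: bytes
    index: int


def collect_stream(items: Iterable[Any]) -> Tuple[bytes, Optional[dict]]:
    """Join chunk audio in index order and return it with the final stats dict."""
    chunks = []
    final_stats = None
    for item in items:
        if hasattr(item, "audio"):
            chunks.append(item)
        elif isinstance(item, dict) and item.get("final"):
            final_stats = item
    # Sort by index in case chunks were gathered out of order.
    chunks.sort(key=lambda chunk: chunk.index)
    return b"".join(chunk.audio for chunk in chunks), final_stats
```

With the SDK, the call might look like `pcm, stats = collect_stream(client.tts.stream(text="...", model_id="kugel-1-turbo"))`.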

Async Streaming

For async applications:

import asyncio

async def generate_speech():
    async for item in client.tts.stream_async(
        text="Async streaming example.",
        model_id="kugel-1-turbo",
    ):
        if hasattr(item, 'audio'):
            # Process chunk
            pass

asyncio.run(generate_speech())
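
In an async application, chunk handling (playback, file writes) is often separated from receiving, so that slow handling does not stall the rest of the program. The sketch below shows that pattern with a bounded `asyncio.Queue`. It runs on its own: `fake_stream` stands in for a real chunk stream, and `pump` is a hypothetical helper, not an SDK function.

```python
import asyncio
from typing import AsyncIterator, Callable


async def fake_stream() -> AsyncIterator[bytes]:
    """Stand-in for an async chunk stream; yields three small payloads."""
    for payload in (b"aa", b"bb", b"cc"):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield payload


async def pump(stream: AsyncIterator[bytes], handle: Callable[[bytes], None]) -> int:
    """Feed chunks through a bounded queue and return how many were handled."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)

    async def consume() -> int:
        count = 0
        while True:
            payload = await queue.get()
            if payload is None:  # sentinel: the producer is done
                return count
            handle(payload)
            count += 1

    consumer = asyncio.create_task(consume())
    async for payload in stream:
        # put() waits while the queue is full, which applies backpressure.
        await queue.put(payload)
    await queue.put(None)
    return await consumer
```

To use it with the SDK, the audio bytes of each item from `client.tts.stream_async(...)` would take the place of the payloads from `fake_stream()`.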

Async Generation

import asyncio

async def main():
    audio = await client.tts.generate_async(
        text="Async generation example.",
        model_id="kugel-1-turbo",
    )
    audio.save("async_output.wav")

asyncio.run(main())

Text Normalization

Text normalization converts numbers, dates, times, currency amounts, and other non-word tokens into spoken words. For example:

"I have 3 apples" → "I have three apples"
"The meeting is at 2:30 PM" → "The meeting is at two thirty PM"
"€50.99" → "fifty euros and ninety-nine cents"

Usage

# With explicit language (recommended - fastest)
audio = client.tts.generate(
    text="I bought 3 items for €50.99 on 01/15/2024.",
    normalize=True,
    language="en",  # Specify language for best performance
)

# With auto-detection (adds ~150ms latency)
audio = client.tts.generate(
    text="Ich habe 3 Artikel für 50,99€ gekauft.",
    normalize=True,
    # language not specified - will auto-detect
)

Supported Languages

Code	Language	Code	Language
`de`	German	`nl`	Dutch
`en`	English	`pl`	Polish
`fr`	French	`sv`	Swedish
`es`	Spanish	`da`	Danish
`it`	Italian	`no`	Norwegian
`pt`	Portuguese	`fi`	Finnish
`cs`	Czech	`hu`	Hungarian
`ro`	Romanian	`el`	Greek
`uk`	Ukrainian	`bg`	Bulgarian
`tr`	Turkish	`vi`	Vietnamese
`ar`	Arabic	`hi`	Hindi
`zh`	Chinese	`ja`	Japanese
`ko`	Korean

Performance Warning

⚠️ Latency Warning: Using normalize=True without specifying language adds approximately 150ms latency for language auto-detection. For best performance in latency-sensitive applications, always specify the language parameter.
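
To avoid paying the auto-detection cost by accident, the language argument can be validated and attached in one place. The sketch below is one possible approach, using the language codes from the table above. `SUPPORTED_LANGUAGES` and `normalization_kwargs` are hypothetical names, not SDK members; only the `normalize` and `language` parameters come from the README.

```python
from typing import Any, Dict, Optional

# Language codes copied from the Supported Languages table.
SUPPORTED_LANGUAGES = frozenset({
    "de", "en", "fr", "es", "it", "pt", "cs", "ro", "uk", "tr", "ar", "zh", "ko",
    "nl", "pl", "sv", "da", "no", "fi", "hu", "el", "bg", "vi", "hi", "ja",
})


def normalization_kwargs(language: Optional[str]) -> Dict[str, Any]:
    """Build normalize/language keyword arguments for a generate call."""
    kwargs: Dict[str, Any] = {"normalize": True}
    if language is None:
        # No language given: the service auto-detects it (roughly 150 ms slower).
        return kwargs
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    kwargs["language"] = code
    return kwargs
```

A call could then read `client.tts.generate(text="...", **normalization_kwargs("de"))`.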

LLM Integration: Streaming Text Input

For real-time TTS while text streams in from an LLM (such as GPT-4 or Claude):

Async Streaming Session

import asyncio

async def stream_from_llm():
    # Simulate LLM token stream
    llm_tokens = ["Hello, ", "this ", "is ", "a ", "streamed ", "response."]
    
    async with client.tts.streaming_session(
        voice_id=123,
        cfg_scale=2.0,
        flush_timeout_ms=500,  # Auto-flush after 500ms of no input
    ) as session:
        # Send tokens as they arrive from LLM
        for token in llm_tokens:
            async for chunk in session.send(token):
                # Play audio chunk immediately (play_audio is your own playback function)
                play_audio(chunk.audio)
        
        # Flush any remaining text
        async for chunk in session.flush():
            play_audio(chunk.audio)

asyncio.run(stream_from_llm())

Synchronous Streaming Session

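# Same token list as in the async example; play_audio is your own playback function
llm_tokens = ["Hello, ", "this ", "is ", "a ", "streamed ", "response."]

with client.tts.streaming_session_sync(voice_id=123) as session:
    for token in llm_tokens:
        for chunk in session.send(token):
            play_audio(chunk.audio)

    # Flush any remaining text
    for chunk in session.flush():
        play_audio(chunk.audio)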
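
The sessions above accept tokens one at a time. If whole sentences are preferred as the unit of input, tokens could be grouped on the client before they are sent. The sketch below shows that idea only; `sentences_from_tokens` is a hypothetical helper, and the README does not describe how the session itself segments text, so whether this helps is an open question. Note that the simple punctuation check also splits after abbreviations and decimal points.

```python
from typing import Iterable, Iterator


def sentences_from_tokens(tokens: Iterable[str], enders: str = ".!?") -> Iterator[str]:
    """Group streamed tokens into sentence-sized strings."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit once the buffered text ends a sentence (ignoring trailing spaces).
        if buffer.rstrip().endswith(tuple(enders)):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # leftover text without closing punctuation
```

Each yielded sentence could then be passed to `session.send(...)` in place of a single token.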

Error Handling

from kugelaudio import KugelAudio
from kugelaudio.exceptions import (
    KugelAudioError,
    AuthenticationError,
    RateLimitError,
    InsufficientCreditsError,
    ValidationError,
)

client = KugelAudio(api_key="your_api_key")

try:
    audio = client.tts.generate(text="Hello!")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded, please wait")
except InsufficientCreditsError:
    print("Not enough credits, please top up")
except ValidationError as e:
    print(f"Invalid request: {e}")
except KugelAudioError as e:
    print(f"API error: {e}")
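
Rate-limit errors are usually transient, so a retry with exponential backoff is a common way to handle them. The sketch below is a generic helper written with the standard library only. `call_with_backoff` is a hypothetical name, the delays are arbitrary defaults, and the README does not say whether the SDK already retries internally.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base, 2x base, 4x base, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With the SDK, this might be called as `call_with_backoff(lambda: client.tts.generate(text="Hello!"), (RateLimitError,))`.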

Data Models

AudioChunk

Represents a single audio chunk from streaming:

class AudioChunk:
    audio: bytes          # Raw PCM16 audio data
    encoding: str         # 'pcm_s16le'
    index: int           # Chunk index (0-based)
    sample_rate: int     # Sample rate (24000)
    samples: int         # Number of samples in chunk
    
    @property
    def duration_seconds(self) -> float:
        """Duration of this chunk in seconds."""

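The relationship between a chunk's byte length, sample count, and duration is simple arithmetic for PCM16 audio. The sketch below spells it out. Both helpers are hypothetical and not the SDK's code, and the mono default is an assumption, since the README does not state the channel count.

```python
def chunk_duration_seconds(samples: int, sample_rate: int = 24000) -> float:
    """Duration in seconds for a given number of samples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return samples / sample_rate


def pcm16_sample_count(audio: bytes, channels: int = 1) -> int:
    """Samples per channel in a PCM16 buffer (2 bytes per sample per channel)."""
    frame_bytes = 2 * channels
    if len(audio) % frame_bytes:
        raise ValueError("buffer length is not a whole number of frames")
    return len(audio) // frame_bytes
```

For example, a 4800-byte mono chunk holds 2400 samples, which is 0.1 seconds at 24000 Hz.
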
AudioResponse

Complete audio response from generation:

class AudioResponse:
    audio: bytes              # Complete PCM16 audio
    sample_rate: int          # Sample rate (24000)
    samples: int              # Total samples
    duration_ms: float        # Duration in milliseconds
    generation_ms: float      # Generation time in milliseconds
    rtf: float               # Real-time factor
    
    @property
    def duration_seconds(self) -> float:
        """Duration in seconds."""
    
    def save(self, path: str) -> None:
        """Save as WAV file."""
    
    def to_wav_bytes(self) -> bytes:
        """Get WAV file as bytes."""

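The real-time factor relates generation time to audio length. The sketch below uses the common convention of generation time divided by audio duration, so values below 1.0 mean faster than real time. That convention is an assumption: the README does not define how `rtf` is computed, and both helper names are hypothetical.

```python
def duration_ms_from_samples(samples: int, sample_rate: int = 24000) -> float:
    """Audio length in milliseconds for a given sample count."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return samples / sample_rate * 1000.0


def real_time_factor(generation_ms: float, duration_ms: float) -> float:
    """Generation time divided by audio duration (assumed RTF convention)."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return generation_ms / duration_ms
```

Under this convention, two seconds of audio generated in 500 ms has a real-time factor of 0.25.
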
Model

TTS model information:

class Model:
    id: str                   # 'kugel-1-turbo' or 'kugel-1'
    name: str                 # Human-readable name
    description: str          # Model description
    max_input_length: int     # Maximum input characters
    sample_rate: int          # Output sample rate

Voice

Voice information:

class Voice:
    id: int                          # Voice ID
    name: str                        # Voice name
    description: Optional[str]       # Description
    category: Optional[VoiceCategory]  # 'premade', 'cloned', 'generated'
    sex: Optional[VoiceSex]          # 'male', 'female', 'neutral'
    age: Optional[VoiceAge]          # 'young', 'middle_aged', 'old'
    supported_languages: List[str]   # ['en', 'de', ...]
    sample_text: Optional[str]       # Sample text for preview
    avatar_url: Optional[str]        # Avatar image URL
    sample_url: Optional[str]        # Sample audio URL
    is_public: bool                  # Whether voice is public
    verified: bool                   # Whether voice is verified

Complete Example

from kugelaudio import KugelAudio

# Initialize client
client = KugelAudio(api_key="your_api_key")

# List available models
print("Available Models:")
for model in client.models.list():
    print(f"  - {model.id}: {model.name}")

# List available voices
print("\nAvailable Voices:")
for voice in client.voices.list(limit=5):
    print(f"  - {voice.id}: {voice.name}")

# Generate audio
print("\nGenerating audio...")
audio = client.tts.generate(
    text="Welcome to KugelAudio. This is an example of high-quality text-to-speech synthesis.",
    model_id="kugel-1-turbo",
)

print(f"Generated {audio.duration_seconds:.2f}s of audio in {audio.generation_ms:.0f}ms")
print(f"Real-time factor: {audio.rtf:.2f}x")

# Save to file
audio.save("example.wav")
print("Saved to example.wav")

# Close client
client.close()

License

MIT

