Official Python SDK for KugelAudio TTS API
Project description
KugelAudio Python SDK
Official Python SDK for the KugelAudio Text-to-Speech API.
Installation
pip install kugelaudio
Or with uv:
uv add kugelaudio
Quick Start
from kugelaudio import KugelAudio
# Initialize the client - just needs an API key!
client = KugelAudio(api_key="your_api_key")
# Generate speech
audio = client.tts.generate(
text="Hello, world!",
model_id="kugel-1-turbo",
)
# Save to file
audio.save("output.wav")
Client Configuration
from kugelaudio import KugelAudio
# Simple setup - single URL handles everything
client = KugelAudio(api_key="your_api_key")
# Or with custom options
client = KugelAudio(
api_key="your_api_key", # Required: Your API key
api_url="https://api.kugelaudio.com", # Optional: API base URL (default)
timeout=60.0, # Optional: Request timeout in seconds
)
Single URL Architecture
The SDK uses a single URL for both REST API and WebSocket streaming. The TTS server provides both REST endpoints (/v1/models, /v1/voices) and WebSocket (/ws/tts) - no proxy needed, minimal latency.
Local Development
For local development, point directly to your TTS server:
client = KugelAudio(
api_key="your_api_key",
api_url="http://localhost:8000", # TTS server handles everything
)
Or if you have separate backend and TTS servers:
client = KugelAudio(
api_key="your_api_key",
api_url="http://localhost:8001", # Backend for REST API
tts_url="http://localhost:8000", # TTS server for WebSocket streaming
)
Available Models
| Model ID | Name | Parameters | Description |
|---|---|---|---|
kugel-1-turbo |
Kugel 1 Turbo | 1.5B | Fast, low-latency model for real-time applications |
kugel-1 |
Kugel 1 | 7B | Premium quality model for pre-recorded content |
List Available Models
models = client.models.list()
for model in models:
print(f"{model.id}: {model.name}")
print(f" Description: {model.description}")
print(f" Parameters: {model.parameters}")
print(f" Max Input: {model.max_input_length} characters")
print(f" Sample Rate: {model.sample_rate} Hz")
Voices
List Available Voices
# List all available voices
voices = client.voices.list()
for voice in voices:
print(f"{voice.id}: {voice.name}")
print(f" Category: {voice.category}")
print(f" Languages: {', '.join(voice.supported_languages)}")
# Filter by language
german_voices = client.voices.list(language="de")
# Get only public voices
public_voices = client.voices.list(include_public=True)
# Limit results
first_10 = client.voices.list(limit=10)
Get a Specific Voice
voice = client.voices.get(voice_id=123)
print(f"Voice: {voice.name}")
print(f"Sample text: {voice.sample_text}")
Text-to-Speech Generation
Basic Generation (Non-Streaming)
Generate complete audio and receive it all at once:
audio = client.tts.generate(
text="Hello, this is a test of the KugelAudio text-to-speech system.",
model_id="kugel-1-turbo", # 'kugel-1-turbo' (fast) or 'kugel-1' (quality)
voice_id=123, # Optional: specific voice ID
cfg_scale=2.0, # Guidance scale (1.0-5.0)
max_new_tokens=2048, # Maximum tokens to generate
sample_rate=24000, # Output sample rate
normalize=True, # Enable text normalization (see below)
language="en", # Language for normalization
)
# Audio properties
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}") # Real-time factor
# Save to WAV file
audio.save("output.wav")
# Get raw PCM bytes
pcm_data = audio.audio
# Get WAV bytes (with header)
wav_bytes = audio.to_wav_bytes()
Streaming Audio Output
Receive audio chunks as they are generated for lower latency:
# Synchronous streaming
for item in client.tts.stream(
text="Hello, this is streaming audio.",
model_id="kugel-1-turbo",
):
if hasattr(item, 'audio'): # AudioChunk
# Process audio chunk immediately
print(f"Chunk {item.index}: {len(item.audio)} bytes, {item.samples} samples")
# play_audio(item.audio)
elif isinstance(item, dict) and item.get('final'):
# Final stats
print(f"Total duration: {item.get('dur_ms', 0):.0f}ms")
print(f"Time to first audio: {item.get('ttfa_ms', 0):.0f}ms")
Async Streaming
For async applications:
import asyncio
async def generate_speech():
async for item in client.tts.stream_async(
text="Async streaming example.",
model_id="kugel-1-turbo",
):
if hasattr(item, 'audio'):
# Process chunk
pass
asyncio.run(generate_speech())
Async Generation
import asyncio
async def main():
audio = await client.tts.generate_async(
text="Async generation example.",
model_id="kugel-1-turbo",
)
audio.save("async_output.wav")
asyncio.run(main())
Text Normalization
Text normalization converts numbers, dates, times, and other non-verbal text into spoken words. For example:
- "I have 3 apples" → "I have three apples"
- "The meeting is at 2:30 PM" → "The meeting is at two thirty PM"
- "€50.99" → "fifty euros and ninety-nine cents"
Usage
# With explicit language (recommended - fastest)
audio = client.tts.generate(
text="I bought 3 items for €50.99 on 01/15/2024.",
normalize=True,
language="en", # Specify language for best performance
)
# With auto-detection (adds ~150ms latency)
audio = client.tts.generate(
text="Ich habe 3 Artikel für 50,99€ gekauft.",
normalize=True,
# language not specified - will auto-detect
)
Supported Languages
| Code | Language | Code | Language |
|---|---|---|---|
de |
German | nl |
Dutch |
en |
English | pl |
Polish |
fr |
French | sv |
Swedish |
es |
Spanish | da |
Danish |
it |
Italian | no |
Norwegian |
pt |
Portuguese | fi |
Finnish |
cs |
Czech | hu |
Hungarian |
ro |
Romanian | el |
Greek |
uk |
Ukrainian | bg |
Bulgarian |
tr |
Turkish | vi |
Vietnamese |
ar |
Arabic | hi |
Hindi |
zh |
Chinese | ja |
Japanese |
ko |
Korean |
Performance Warning
⚠️ Latency Warning: Using
normalize=Truewithout specifyinglanguageadds approximately 150ms latency for language auto-detection. For best performance in latency-sensitive applications, always specify thelanguageparameter.
LLM Integration: Streaming Text Input
For real-time TTS when streaming text from an LLM (like GPT-4, Claude, etc.):
Async Streaming Session
import asyncio
async def stream_from_llm():
# Simulate LLM token stream
llm_tokens = ["Hello, ", "this ", "is ", "a ", "streamed ", "response."]
async with client.tts.streaming_session(
voice_id=123,
cfg_scale=2.0,
flush_timeout_ms=500, # Auto-flush after 500ms of no input
) as session:
# Send tokens as they arrive from LLM
for token in llm_tokens:
async for chunk in session.send(token):
# Play audio chunk immediately
play_audio(chunk.audio)
# Flush any remaining text
async for chunk in session.flush():
play_audio(chunk.audio)
asyncio.run(stream_from_llm())
Synchronous Streaming Session
with client.tts.streaming_session_sync(voice_id=123) as session:
for token in llm_tokens:
for chunk in session.send(token):
play_audio(chunk.audio)
for chunk in session.flush():
play_audio(chunk.audio)
Error Handling
from kugelaudio import KugelAudio
from kugelaudio.exceptions import (
KugelAudioError,
AuthenticationError,
RateLimitError,
InsufficientCreditsError,
ValidationError,
ConnectionError,
)
try:
audio = client.tts.generate(text="Hello!")
except AuthenticationError:
print("Invalid API key")
except RateLimitError:
print("Rate limit exceeded, please wait")
except InsufficientCreditsError:
print("Not enough credits, please top up")
except ValidationError as e:
print(f"Invalid request: {e}")
except ConnectionError:
print("Failed to connect to server")
except KugelAudioError as e:
print(f"API error: {e}")
Data Models
AudioChunk
Represents a single audio chunk from streaming:
class AudioChunk:
audio: bytes # Raw PCM16 audio data
encoding: str # 'pcm_s16le'
index: int # Chunk index (0-based)
sample_rate: int # Sample rate (24000)
samples: int # Number of samples in chunk
@property
def duration_seconds(self) -> float:
"""Duration of this chunk in seconds."""
AudioResponse
Complete audio response from generation:
class AudioResponse:
audio: bytes # Complete PCM16 audio
sample_rate: int # Sample rate (24000)
samples: int # Total samples
duration_ms: float # Duration in milliseconds
generation_ms: float # Generation time in milliseconds
rtf: float # Real-time factor
@property
def duration_seconds(self) -> float:
"""Duration in seconds."""
def save(self, path: str) -> None:
"""Save as WAV file."""
def to_wav_bytes(self) -> bytes:
"""Get WAV file as bytes."""
Model
TTS model information:
class Model:
id: str # 'kugel-1-turbo' or 'kugel-1'
name: str # Human-readable name
description: str # Model description
parameters: str # Parameter count ('1.5B', '7B')
max_input_length: int # Maximum input characters
sample_rate: int # Output sample rate
Voice
Voice information:
class Voice:
id: int # Voice ID
name: str # Voice name
description: Optional[str] # Description
category: Optional[VoiceCategory] # 'premade', 'cloned', 'generated'
sex: Optional[VoiceSex] # 'male', 'female', 'neutral'
age: Optional[VoiceAge] # 'young', 'middle_aged', 'old'
supported_languages: List[str] # ['en', 'de', ...]
sample_text: Optional[str] # Sample text for preview
avatar_url: Optional[str] # Avatar image URL
sample_url: Optional[str] # Sample audio URL
is_public: bool # Whether voice is public
verified: bool # Whether voice is verified
Complete Example
from kugelaudio import KugelAudio
# Initialize client
client = KugelAudio(api_key="your_api_key")
# List available models
print("Available Models:")
for model in client.models.list():
print(f" - {model.id}: {model.name} ({model.parameters})")
# List available voices
print("\nAvailable Voices:")
for voice in client.voices.list(limit=5):
print(f" - {voice.id}: {voice.name}")
# Generate audio
print("\nGenerating audio...")
audio = client.tts.generate(
text="Welcome to KugelAudio. This is an example of high-quality text-to-speech synthesis.",
model_id="kugel-1-turbo",
)
print(f"Generated {audio.duration_seconds:.2f}s of audio in {audio.generation_ms:.0f}ms")
print(f"Real-time factor: {audio.rtf:.2f}x")
# Save to file
audio.save("example.wav")
print("Saved to example.wav")
# Close client
client.close()
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file kugelaudio-0.1.12.tar.gz.
File metadata
- Download URL: kugelaudio-0.1.12.tar.gz
- Upload date:
- Size: 26.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
407b911bc88258a262c595b39b1618e2df6924920fe6a96bd50ed8fc91c3a8bb
|
|
| MD5 |
698d926f707303c317a22657f84e7a0a
|
|
| BLAKE2b-256 |
2ceabe38c66aa873f045d1d1e1e701e1b9a0b72cad6142c73877595e7d8ef798
|
File details
Details for the file kugelaudio-0.1.12-py3-none-any.whl.
File metadata
- Download URL: kugelaudio-0.1.12-py3-none-any.whl
- Upload date:
- Size: 34.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f43c1a35a8276b186783f3d78c468c9c80674688a79d39bfce532a68dceb58c7
|
|
| MD5 |
cf50b1fc2eb3a3fa2a1f67a3bd0b16d4
|
|
| BLAKE2b-256 |
462d86986177eed494d805138ce16953f6796132b3975345459e8c248c8a2543
|