
Shunyalabs Python SDK


The official Python SDK for Shunyalabs Speech AI APIs — ASR (speech-to-text) and TTS (text-to-speech).

Supports HTTP batch and WebSocket streaming modes with a fully async client.

Installation

pip install shunyalabs[all]

Install only what you need:

pip install shunyalabs[ASR]     # Speech-to-text only
pip install shunyalabs[TTS]     # Text-to-speech only
pip install shunyalabs[extras]  # Audio playback helpers (sounddevice)

Authentication

All API calls authenticate with an Authorization: Bearer <api_key> header.

from shunyalabs import AsyncShunyaClient

client = AsyncShunyaClient(api_key="your-api-key")

Or set the SHUNYALABS_API_KEY environment variable and omit api_key=.
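
For example:

# With SHUNYALABS_API_KEY set in the environment (e.g. export SHUNYALABS_API_KEY=...):
client = AsyncShunyaClient()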


Quick Start

TTS — Batch (HTTP)

import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient(api_key="your-api-key") as client:
        result = await client.tts.synthesize(
            "Hello, world!",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        )
        result.save("output.mp3")
        print(f"{len(result.audio_data)} bytes saved")

asyncio.run(main())

TTS — Streaming (WebSocket)

import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.tts import TTSConfig

async def main():
    async with AsyncShunyaClient(api_key="your-api-key") as client:
        chunks = []
        async for audio in await client.tts.stream(
            "Hello, world!",
            config=TTSConfig(model="zero-indic", voice="Varun"),
        ):
            chunks.append(audio)
        print(f"{len(chunks)} chunks, {sum(len(c) for c in chunks)} bytes")

asyncio.run(main())

ASR — Batch (HTTP)

import asyncio
from shunyalabs import AsyncShunyaClient
from shunyalabs.asr import TranscriptionConfig

async def main():
    async with AsyncShunyaClient(api_key="your-api-key") as client:
        result = await client.asr.transcribe(
            "audio.wav",
            config=TranscriptionConfig(model="zero-indic"),
        )
        print(result.text)

asyncio.run(main())

ASR — Streaming (WebSocket)

import asyncio, subprocess
from shunyalabs import AsyncShunyaClient
from shunyalabs.asr import StreamingConfig, StreamingMessageType

async def main():
    async with AsyncShunyaClient(api_key="your-api-key") as client:
        conn = await client.asr.stream(
            config=StreamingConfig(language="en", sample_rate=16000),
        )

        @conn.on(StreamingMessageType.FINAL_SEGMENT)
        def on_seg(msg):
            print(f"[seg] {msg.text}")

        @conn.on(StreamingMessageType.FINAL)
        def on_final(msg):
            print(f"[final] {msg.text}")

        # Convert audio to 16kHz mono PCM and stream
        pcm = subprocess.run(
            ["ffmpeg", "-i", "audio.wav", "-ar", "16000", "-ac", "1", "-f", "s16le", "-"],
            capture_output=True,
            check=True,  # fail loudly if ffmpeg errors instead of streaming empty audio
        ).stdout

        for i in range(0, len(pcm), 4096):
            await conn.send_audio(pcm[i : i + 4096])

        await conn.end()
        await conn.close()

asyncio.run(main())

API Reference

Client Configuration

AsyncShunyaClient

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | None | API key. Falls back to the SHUNYALABS_API_KEY env var. |
| timeout | float | 60.0 | Request timeout in seconds. |
| max_retries | int | 2 | Retries for failed requests (5xx, connection errors). |
| asr_url | str | https://asr.shunyalabs.ai | ASR batch API base URL. |
| asr_ws_url | str | wss://asr.shunyalabs.ai/ws | ASR streaming WebSocket URL. |
| tts_url | str | https://tts.shunyalabs.ai | TTS batch API base URL. |
| tts_ws_url | str | wss://tts.shunyalabs.ai/ws | TTS streaming WebSocket URL. |

All URL parameters can also be set via environment variables: SHUNYALABS_ASR_URL, SHUNYALABS_ASR_WS_URL, SHUNYALABS_TTS_URL, SHUNYALABS_TTS_WS_URL.

Examples:

# Default — uses production endpoints
client = AsyncShunyaClient(api_key="your-api-key")

# Custom timeout and retries
client = AsyncShunyaClient(api_key="your-api-key", timeout=120.0, max_retries=5)

# Self-hosted endpoints
client = AsyncShunyaClient(
    api_key="your-api-key",
    asr_url="https://my-asr-server.example.com",
    tts_url="https://my-tts-server.example.com",
    tts_ws_url="wss://my-tts-server.example.com/ws",
)
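
The same self-hosted setup can also come from the environment variables listed above:

# export SHUNYALABS_ASR_URL=https://my-asr-server.example.com
# export SHUNYALABS_TTS_URL=https://my-tts-server.example.com
# export SHUNYALABS_TTS_WS_URL=wss://my-tts-server.example.com/ws
client = AsyncShunyaClient(api_key="your-api-key")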

TTS API

TTSConfig

Configuration for synthesis requests. Passed as config= to synthesize() and stream().

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | required | Model name (e.g. "zero-indic"). |
| voice | str | required | Speaker voice name. See Available Speakers. |
| response_format | OutputFormat | "mp3" | Output audio format. See Output Formats. |
| speed | float | 1.0 | Speaking speed multiplier (0.25–4.0). |
| trim_silence | bool | False | Trim leading/trailing silence from audio. |
| volume_normalization | str | None | "peak" or "loudness". |
| word_timestamps | bool | False | Return word-level timestamps (batch only). |
| background_audio | str | None | Preset name or base64-encoded background audio. |
| background_volume | float | 0.1 | Background volume relative to speech (0.0–1.0). |

TTS Parameter Examples

model — Select the TTS model

# Currently available: "zero-indic"
config = TTSConfig(model="zero-indic", voice="Rajesh")
result = await client.tts.synthesize("Hello!", config=config)
# Output: 48000 bytes of MP3 audio

voice — Choose a speaker

# Male English speaker
config = TTSConfig(model="zero-indic", voice="Varun")

# Female Hindi speaker
config = TTSConfig(model="zero-indic", voice="Sunita")

# Any speaker can speak any language — voice only controls vocal characteristics
config = TTSConfig(model="zero-indic", voice="Murugan")  # Tamil-native male speaking English
result = await client.tts.synthesize("Good morning, how are you?", config=config)

response_format — Output audio format

Values: "pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"

# MP3 (default) — compressed, good for storage
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mp3")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.mp3")
# Output: 12480 bytes (compressed)

# WAV — uncompressed, good for processing
config = TTSConfig(model="zero-indic", voice="Varun", response_format="wav")
result = await client.tts.synthesize("Hello!", config=config)
result.save("output.wav")
# Output: 96044 bytes (uncompressed with header)

# PCM — raw samples, for real-time pipelines
config = TTSConfig(model="zero-indic", voice="Varun", response_format="pcm")
result = await client.tts.synthesize("Hello!", config=config)
# Output: 96000 bytes (raw 16-bit samples)

# OGG Opus — compressed, good for web streaming
config = TTSConfig(model="zero-indic", voice="Varun", response_format="ogg_opus")

# mu-law / A-law — for telephony systems
config = TTSConfig(model="zero-indic", voice="Varun", response_format="mulaw")
config = TTSConfig(model="zero-indic", voice="Varun", response_format="alaw")

speed — Speaking speed multiplier

Range: 0.25 (very slow) to 4.0 (very fast). Default: 1.0.

# Slow — good for language learning
config = TTSConfig(model="zero-indic", voice="Nisha", speed=0.75)
result = await client.tts.synthesize("Take your time to understand this.", config=config)
# Output: longer audio, ~33% slower than normal

# Normal speed (default)
config = TTSConfig(model="zero-indic", voice="Nisha", speed=1.0)

# Fast — good for notifications or summaries
config = TTSConfig(model="zero-indic", voice="Nisha", speed=1.5)
result = await client.tts.synthesize("Quick update: your order has shipped.", config=config)
# Output: shorter audio, ~50% faster than normal

# Very fast
config = TTSConfig(model="zero-indic", voice="Nisha", speed=2.0)

trim_silence — Remove silence padding

# Without trim (default) — audio may have leading/trailing silence
config = TTSConfig(model="zero-indic", voice="Rajesh", trim_silence=False)
result = await client.tts.synthesize("Hello.", config=config)
# Output: 64000 bytes (includes silence padding)

# With trim — tighter audio, no dead air
config = TTSConfig(model="zero-indic", voice="Rajesh", trim_silence=True)
result = await client.tts.synthesize("Hello.", config=config)
# Output: 48000 bytes (silence stripped)

volume_normalization — Normalize audio loudness

Values: None (off), "peak", "loudness"

# No normalization (default)
config = TTSConfig(model="zero-indic", voice="Rajesh")

# Peak normalization — scale so the loudest sample hits 0 dBFS
config = TTSConfig(model="zero-indic", voice="Rajesh", volume_normalization="peak")
result = await client.tts.synthesize("This audio will have consistent peak levels.", config=config)

# Loudness normalization — perceptually even loudness (EBU R128)
config = TTSConfig(model="zero-indic", voice="Rajesh", volume_normalization="loudness")
result = await client.tts.synthesize("This audio will sound equally loud regardless of content.", config=config)

word_timestamps — Get timing for each word (batch only)

config = TTSConfig(model="zero-indic", voice="Varun", word_timestamps=True)
result = await client.tts.synthesize("Hello world, how are you?", config=config)

for wt in result.word_timestamps:
    print(f"  '{wt.word}' — {wt.start:.2f}s to {wt.end:.2f}s")

# Output:
#   'Hello' — 0.00s to 0.32s
#   'world,' — 0.32s to 0.68s
#   'how' — 0.72s to 0.88s
#   'are' — 0.88s to 1.02s
#   'you?' — 1.02s to 1.28s

background_audio + background_volume — Add background music

import base64

# Using a preset name
config = TTSConfig(
    model="zero-indic",
    voice="Nisha",
    background_audio="cafe-ambience",
    background_volume=0.15,  # 15% volume relative to speech
)
result = await client.tts.synthesize("Welcome to our podcast.", config=config)

# Using custom audio (base64-encoded)
with open("background.mp3", "rb") as f:
    bg_b64 = base64.b64encode(f.read()).decode()

config = TTSConfig(
    model="zero-indic",
    voice="Nisha",
    background_audio=bg_b64,
    background_volume=0.1,  # 10% volume (subtle background)
)
result = await client.tts.synthesize("Welcome to our podcast.", config=config)
result.save("podcast_intro.mp3")

Available Speakers

Each speaker has a native language listed below, but every speaker can speak in any language — the native language only indicates the speaker's voice characteristics and accent.

| Language | Male Speaker | Female Speaker |
| --- | --- | --- |
| Assamese | Bimal | Anjana |
| Bengali | Arjun | Priyanka |
| Bodo | Daimalu | Hasina |
| Dogri | Vishal | Neelam |
| English | Varun | Nisha |
| Gujarati | Rakesh | Pooja |
| Hindi | Rajesh | Sunita |
| Kannada | Kiran | Shreya |
| Kashmiri | Farooq | Habba |
| Konkani | Mohan | Sarita |
| Maithili | Suresh | Meera |
| Malayalam | Krishnan | Deepa |
| Manipuri | Tomba | Ibemhal |
| Marathi | Siddharth | Ananya |
| Nepali | Bikash | Sapana |
| Odia | Bijay | Sujata |
| Punjabi | Gurpreet | Simran |
| Sanskrit | Vedant | Gayatri |
| Santali | Chandu | Roshni |
| Sindhi | Amjad | Kavita |
| Tamil | Murugan | Thangam |
| Telugu | Vishnu | Lakshmi |
| Urdu | Salman | Fatima |

23 languages, 46 speakers (1 male + 1 female per language).

Expression Styles

Control the emotional tone by prefixing the text with the speaker name and a style tag (e.g. "Rajesh: <Happy> Hello!").

| Style Tag | Description |
| --- | --- |
| <Happy> | Joyful, upbeat tone |
| <Sad> | Somber, melancholic tone |
| <Angry> | Forceful, intense tone |
| <Fearful> | Anxious, trembling tone |
| <Surprised> | Exclamatory, astonished tone |
| <Disgust> | Repulsed, disapproving tone |
| <News> | Formal news-anchor style |
| <Conversational> | Casual, everyday speech |
| <Narrative> | Storytelling / audiobook style |
| <Enthusiastic> | Energetic, passionate tone |
| <Neutral> | Clean read-speech (default, no tag needed) |

Expression style examples:

# Happy greeting
config = TTSConfig(model="zero-indic", voice="Rajesh")
result = await client.tts.synthesize("Rajesh: <Happy> Welcome aboard! We're thrilled to have you.", config=config)

# News anchor reading
config = TTSConfig(model="zero-indic", voice="Nisha")
result = await client.tts.synthesize("Nisha: <News> Breaking news: the markets rallied today.", config=config)

# Storytelling
config = TTSConfig(model="zero-indic", voice="Krishnan")
result = await client.tts.synthesize("Krishnan: <Narrative> Once upon a time, in a land far away...", config=config)

# Conversational chatbot
config = TTSConfig(model="zero-indic", voice="Simran")
result = await client.tts.synthesize("Simran: <Conversational> Hey! How's it going?", config=config)

# Neutral (default — no tag needed)
config = TTSConfig(model="zero-indic", voice="Varun")
result = await client.tts.synthesize("Varun: Your account balance is five thousand rupees.", config=config)

Output Formats

| Format | Value |
| --- | --- |
| PCM (raw) | "pcm" |
| WAV | "wav" |
| MP3 | "mp3" |
| OGG Opus | "ogg_opus" |
| FLAC | "flac" |
| mu-law | "mulaw" |
| A-law | "alaw" |

TTS Methods

Batch (HTTP)

result = await client.tts.synthesize("text", config=TTSConfig(...))
result.save("output.mp3")       # Save to file
result.audio_data               # Raw bytes
result.duration_seconds         # Audio duration
result.sample_rate              # Sample rate (Hz)
result.word_timestamps          # List[WordTimestamp] if requested

Streaming (WebSocket)

# Iterate audio chunks
async for audio_bytes in await client.tts.stream("text", config=TTSConfig(...)):
    play(audio_bytes)  # play() is a placeholder for your audio sink

# With chunk metadata
async for chunk_meta, audio_bytes in await client.tts.stream("text", config=TTSConfig(...), detailed=True):
    print(chunk_meta.chunk_index, len(audio_bytes))

# Collect all and return combined bytes
audio = await client.tts.synthesize_stream("text", config=TTSConfig(...))

# Stream directly to file
await client.tts.stream_to_file("text", "output.pcm", config=TTSConfig(...))
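
If the extras install is present, a batch result can be played locally with sounddevice. A minimal sketch, assuming response_format="pcm" so that audio_data holds raw 16-bit mono samples:

import numpy as np
import sounddevice as sd

samples = np.frombuffer(result.audio_data, dtype=np.int16)  # raw 16-bit mono PCM
sd.play(samples, samplerate=result.sample_rate)
sd.wait()  # block until playback finishes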

TTSResult

Returned by synthesize().

| Attribute | Type | Description |
| --- | --- | --- |
| request_id | str | Unique request identifier. |
| audio_data | bytes | Decoded audio bytes. |
| sample_rate | int | Audio sample rate in Hz. |
| duration_seconds | float | Total audio duration. |
| format | str | Audio format string. |
| word_timestamps | list[WordTimestamp] | Word-level timestamps (if requested). |

ASR API

TranscriptionConfig

Configuration for batch transcription. Passed as config= to transcribe().

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | required | Model name (e.g. "zero-indic"). |
| language_code | str | "auto" | Language code or "auto" for auto-detection. |
| task | str | "transcribe" | Task type ("transcribe"). |
| output_script | str | "auto" | Output script ("auto", "latin", "native"). |
| enable_diarization | bool | False | Enable speaker diarization. |

NLP Features:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enable_intent_detection | bool | False | Detect intent from transcript. |
| intent_choices | list[str] | None | Constrain intent to specific choices. |
| enable_summarization | bool | False | Generate transcript summary. |
| summary_max_length | int | 150 | Maximum summary length. |
| enable_sentiment_analysis | bool | False | Analyze sentiment. |
| enable_emotion_diarization | bool | False | Detect emotions per segment. |

Post-processing:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enable_profanity_hashing | bool | False | Hash profane words. |
| hash_keywords | list[str] | None | Custom keywords to hash. |
| enable_keyterm_normalization | bool | False | Normalize key terms. |
| enable_translation | bool | False | Translate transcript. |
| target_language | str | None | Target language for translation. |
| enable_transliteration | bool | False | Transliterate transcript. |
| project | str | None | Project name for tracking. |

ASR Parameter Examples

model + language_code — Basic transcription

# Auto-detect language (default)
config = TranscriptionConfig(model="zero-indic", language_code="auto")
result = await client.asr.transcribe("audio.wav", config=config)
print(result.text)
print(f"Detected: {result.detected_language}")
# Output:
#   "Hello, how are you doing today?"
#   Detected: en

# Specify language for better accuracy
config = TranscriptionConfig(model="zero-indic", language_code="hi")
result = await client.asr.transcribe("hindi_audio.wav", config=config)
print(result.text)
# Output: "नमस्ते, आप कैसे हैं?"

output_script — Control output script

Values: "auto", "latin", "native"

# Native script (default for auto)
config = TranscriptionConfig(model="zero-indic", language_code="hi", output_script="native")
result = await client.asr.transcribe("hindi_audio.wav", config=config)
print(result.text)
# Output: "नमस्ते, आप कैसे हैं?"

# Latin/Roman script — transliterated output
config = TranscriptionConfig(model="zero-indic", language_code="hi", output_script="latin")
result = await client.asr.transcribe("hindi_audio.wav", config=config)
print(result.text)
# Output: "namaste, aap kaise hain?"

# Auto — server decides based on language
config = TranscriptionConfig(model="zero-indic", output_script="auto")

enable_diarization — Speaker identification

config = TranscriptionConfig(model="zero-indic", enable_diarization=True)
result = await client.asr.transcribe("meeting.wav", config=config)
for seg in result.segments:
    print(f"  [{seg.start:.1f}s - {seg.end:.1f}s] {seg.text}")
# Output:
#   [0.0s - 3.2s] [Speaker 1] Good morning, let's begin the meeting.
#   [3.5s - 6.8s] [Speaker 2] Sure, I have the report ready.
#   [7.0s - 10.1s] [Speaker 1] Great, please go ahead.

enable_intent_detection + intent_choices — Detect user intent

# Open intent detection
config = TranscriptionConfig(
    model="zero-indic",
    enable_intent_detection=True,
)
result = await client.asr.transcribe("customer_call.wav", config=config)
print(result.text)
print(result.nlp_analysis.intent)
# Output:
#   "I want to cancel my subscription"
#   {"intent": "cancellation", "confidence": 0.94}

# Constrained intent — pick from specific choices
config = TranscriptionConfig(
    model="zero-indic",
    enable_intent_detection=True,
    intent_choices=["booking", "cancellation", "complaint", "inquiry"],
)
result = await client.asr.transcribe("customer_call.wav", config=config)
print(result.nlp_analysis.intent)
# Output: {"intent": "cancellation", "confidence": 0.97}

enable_summarization + summary_max_length — Summarize transcript

config = TranscriptionConfig(
    model="zero-indic",
    enable_summarization=True,
    summary_max_length=100,  # max 100 characters
)
result = await client.asr.transcribe("meeting_recording.wav", config=config)
print(f"Full transcript: {result.text[:80]}...")
print(f"Summary: {result.nlp_analysis.summary}")
# Output:
#   Full transcript: Good morning everyone. Today we'll review Q3 results. Revenue grew by...
#   Summary: Q3 review meeting covering revenue growth, cost optimization, and next quarter targets.

enable_sentiment_analysis — Detect sentiment

config = TranscriptionConfig(
    model="zero-indic",
    enable_sentiment_analysis=True,
)
result = await client.asr.transcribe("feedback.wav", config=config)
print(result.text)
print(result.nlp_analysis.sentiment)
# Output:
#   "The product is amazing, I absolutely love it!"
#   {"label": "positive", "score": 0.96}

enable_emotion_diarization — Detect emotions per segment

config = TranscriptionConfig(
    model="zero-indic",
    enable_emotion_diarization=True,
)
result = await client.asr.transcribe("conversation.wav", config=config)
print(result.nlp_analysis.emotion)
# Output:
#   {"segments": [
#     {"start": 0.0, "end": 3.2, "emotion": "neutral", "text": "Hello, how can I help?"},
#     {"start": 3.5, "end": 7.1, "emotion": "angry", "text": "I've been waiting for an hour!"},
#     {"start": 7.4, "end": 10.0, "emotion": "empathetic", "text": "I'm sorry about that."}
#   ]}

enable_profanity_hashing + hash_keywords — Redact sensitive words

# Hash common profanity
config = TranscriptionConfig(
    model="zero-indic",
    enable_profanity_hashing=True,
)
result = await client.asr.transcribe("audio.wav", config=config)
print(result.text)
# Output: "What the #### is going on?"

# Hash custom keywords (e.g., names, account numbers)
config = TranscriptionConfig(
    model="zero-indic",
    enable_profanity_hashing=True,
    hash_keywords=["John", "Acme Corp", "Project Alpha"],
)
result = await client.asr.transcribe("meeting.wav", config=config)
print(result.text)
# Output: "#### from ######### said ############# is on track."

enable_translation + target_language — Translate transcript

config = TranscriptionConfig(
    model="zero-indic",
    language_code="hi",
    enable_translation=True,
    target_language="en",
)
result = await client.asr.transcribe("hindi_audio.wav", config=config)
print(f"Original: {result.text}")
print(f"Translation: {result.nlp_analysis.translation}")
# Output:
#   Original: नमस्ते, आज मौसम बहुत अच्छा है।
#   Translation: {"text": "Hello, the weather is very nice today.", "target_language": "en"}

enable_transliteration — Transliterate to Latin script

config = TranscriptionConfig(
    model="zero-indic",
    language_code="hi",
    enable_transliteration=True,
)
result = await client.asr.transcribe("hindi_audio.wav", config=config)
print(f"Native: {result.text}")
print(f"Transliterated: {result.nlp_analysis.transliteration}")
# Output:
#   Native: नमस्ते, आज मौसम बहुत अच्छा है।
#   Transliterated: {"text": "namaste, aaj mausam bahut achha hai."}

enable_keyterm_normalization — Normalize domain terms

config = TranscriptionConfig(
    model="zero-indic",
    enable_keyterm_normalization=True,
)
result = await client.asr.transcribe("tech_audio.wav", config=config)
print(result.text)
# Output: "The API returns JSON over HTTPS." (normalized from "A P I", "jay son", "H T T P S")

project — Tag requests for tracking

config = TranscriptionConfig(
    model="zero-indic",
    project="customer-support-q1",
)
result = await client.asr.transcribe("call.wav", config=config)
# Request is tagged with project name for usage tracking and analytics

ASR Methods

Batch (HTTP)

# From file path
result = await client.asr.transcribe("audio.wav", config=TranscriptionConfig(model="zero-indic"))

# From file object
with open("audio.wav", "rb") as f:
    result = await client.asr.transcribe_file(f, config=TranscriptionConfig(model="zero-indic"))

# From URL
result = await client.asr.transcribe_url("https://example.com/audio.wav", config=TranscriptionConfig(model="zero-indic"))

Streaming (WebSocket)

conn = await client.asr.stream(config=StreamingConfig(language="en"))

StreamingConfig

Configuration for the WebSocket streaming session.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | str | "auto" | Language code or "auto". |
| sample_rate | int | 16000 | Audio sample rate in Hz. |
| dtype | str | "int16" | Audio data type ("int16", "float32"). |
| chunk_size_sec | float | 1.0 | Processing chunk size in seconds. |
| silence_threshold_sec | float | 0.5 | Silence duration to trigger segmentation. |

Streaming Parameter Examples

language — Set recognition language

# Auto-detect (default)
conn = await client.asr.stream(config=StreamingConfig(language="auto"))

# Specific language for better accuracy
conn = await client.asr.stream(config=StreamingConfig(language="en"))
conn = await client.asr.stream(config=StreamingConfig(language="hi"))
conn = await client.asr.stream(config=StreamingConfig(language="ta"))

sample_rate + dtype — Match your audio source

# Standard microphone input: 16kHz, 16-bit integer (default)
conn = await client.asr.stream(config=StreamingConfig(
    sample_rate=16000,
    dtype="int16",
))

# High-quality audio: 48kHz, 32-bit float
conn = await client.asr.stream(config=StreamingConfig(
    sample_rate=48000,
    dtype="float32",
))

chunk_size_sec — Processing window size

# Smaller chunks = lower latency, more partial results
conn = await client.asr.stream(config=StreamingConfig(chunk_size_sec=0.5))

# Larger chunks = more context, potentially better accuracy
conn = await client.asr.stream(config=StreamingConfig(chunk_size_sec=2.0))

silence_threshold_sec — Control segment boundaries

# Quick segmentation — short pauses trigger a new segment
conn = await client.asr.stream(config=StreamingConfig(silence_threshold_sec=0.3))
# Good for: fast-paced dialogue, command recognition

# Patient segmentation — only split on longer pauses
conn = await client.asr.stream(config=StreamingConfig(silence_threshold_sec=1.5))
# Good for: lectures, monologues, dictation

Streaming Events

Register event handlers on the connection object:

conn = await client.asr.stream(config=StreamingConfig(language="en"))

@conn.on(StreamingMessageType.PARTIAL)
def on_partial(msg):
    print(f"Interim: {msg.text}")

@conn.on(StreamingMessageType.FINAL_SEGMENT)
def on_segment(msg):
    print(f"Segment: {msg.text}")

@conn.on(StreamingMessageType.FINAL)
def on_final(msg):
    print(f"Final: {msg.text} ({msg.audio_duration_sec}s)")

@conn.on(StreamingMessageType.DONE)
def on_done(msg):
    print(f"Done. {msg.total_segments} segments, {msg.total_audio_duration_sec}s")

@conn.on(StreamingMessageType.ERROR)
def on_error(msg):
    print(f"Error: {msg.message}")

| Event | Model | Key Attributes |
| --- | --- | --- |
| PARTIAL | StreamingPartial | text, language, segment_id, latency_ms |
| FINAL_SEGMENT | StreamingFinalSegment | text, language, segment_id, silence_duration_ms |
| FINAL | StreamingFinal | text, language, audio_duration_sec, inference_time_ms |
| DONE | StreamingDone | total_segments, total_audio_duration_sec |
| ERROR | StreamingError | message, code |

Streaming Connection Methods

await conn.send_audio(pcm_bytes)   # Send raw PCM audio
await conn.end()                    # Signal end of audio stream
await conn.close()                  # Close WebSocket connection
conn.is_closed                      # Check connection status
conn.session_id                     # Server-assigned session ID
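
Putting these together, here is a minimal end-to-end sketch that streams live microphone audio with sounddevice (from the extras install). The capture settings and the helper name are illustrative assumptions, not part of the SDK:

import sounddevice as sd
from shunyalabs.asr import StreamingConfig

async def stream_microphone(client, seconds=10):
    conn = await client.asr.stream(config=StreamingConfig(language="en", sample_rate=16000))
    # Capture raw 16-bit mono PCM from the default input device.
    with sd.RawInputStream(samplerate=16000, channels=1, dtype="int16") as mic:
        for _ in range(seconds * 10):
            data, _overflowed = mic.read(1600)  # 100 ms per blocking read
            await conn.send_audio(bytes(data))
    await conn.end()
    await conn.close()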

TranscriptionResult

Returned by transcribe().

| Attribute | Type | Description |
| --- | --- | --- |
| success | bool | Whether transcription succeeded. |
| request_id | str | Unique request identifier. |
| text | str | Full transcription text. |
| segments | list[SegmentResult] | Time-aligned segments (start, end, text). |
| detected_language | str | Detected language code. |
| audio_duration | float | Audio duration in seconds. |
| inference_time_ms | float | Server inference time in ms. |
| nlp_analysis | NLPAnalysis | NLP results (if any enable_* flags were set). |
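
For example, reading these attributes after a batch call:

result = await client.asr.transcribe("audio.wav", config=TranscriptionConfig(model="zero-indic"))
if result.success:
    print(f"[{result.detected_language}] {result.text}")
    for seg in result.segments:
        print(f"  {seg.start:.1f}s-{seg.end:.1f}s: {seg.text}")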

Exceptions

All exceptions inherit from ShunyalabsError.

| Exception | Description |
| --- | --- |
| AuthenticationError | Invalid or missing API key (401). |
| PermissionDeniedError | Insufficient permissions (403). |
| NotFoundError | Resource not found (404). |
| RateLimitError | Rate limit exceeded (429). |
| ServerError | Server-side error (5xx). |
| TimeoutError | Request timed out. |
| ConnectionError | Network connectivity issue. |
| TranscriptionError | ASR-specific error. |
| SynthesisError | TTS-specific error. |
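
A typical handling pattern. This sketch assumes the exception classes are importable from the top-level shunyalabs package; adjust the import path to match your installed version:

import asyncio
from shunyalabs import AsyncShunyaClient, RateLimitError, ShunyalabsError
from shunyalabs.asr import TranscriptionConfig

async def main():
    async with AsyncShunyaClient(api_key="your-api-key") as client:
        try:
            result = await client.asr.transcribe("audio.wav", config=TranscriptionConfig(model="zero-indic"))
            print(result.text)
        except RateLimitError:
            await asyncio.sleep(5)  # back off, then retry
        except ShunyalabsError as e:
            print(f"Request failed: {e}")

asyncio.run(main())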

Framework Plugins

| Framework | Package | Install |
| --- | --- | --- |
| LiveKit Agents | livekit-plugins-shunyalabs | pip install livekit-plugins-shunyalabs |
| Pipecat | pipecat-shunyalabs | pip install pipecat-shunyalabs |

Development

git clone https://github.com/Shunyalabsai/shunyalabs-python-sdk.git
cd shunyalabs-python-sdk

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/
black --check src/
mypy src/

License

MIT
