Shunyalabs ASR & TTS services for Pipecat
pipecat-shunyalabs
Shunyalabs STT and TTS services for Pipecat.
Provides ShunyalabsSTTService and ShunyalabsTTSService that integrate with Pipecat's pipeline framework, backed by the Shunyalabs Python SDK.
Key capabilities:
- Real-time streaming ASR with interim and final transcription frames
- High-fidelity voice synthesis with 46 speakers across 23 languages
- 11 emotion/delivery style tags for expressive voice responses
- Native Pipecat frame protocol — drop-in with any Pipecat pipeline
- Persistent WebSocket for STT; per-request WebSocket for TTS
- Output formats: PCM, WAV, MP3, OGG Opus, FLAC, mu-law, A-law
Installation
Requirements: Python 3.8+, Pipecat framework, a valid Shunyalabs API key.
pip install pipecat-shunyalabs
Install with a transport:
# Daily WebRTC transport
pip install pipecat-shunyalabs "pipecat-ai[daily]"
Authentication
Set your API key as an environment variable (recommended):
export SHUNYALABS_API_KEY="your-api-key"
Or pass it directly:
stt = ShunyalabsSTTService(api_key="your-api-key")
tts = ShunyalabsTTSService(api_key="your-api-key")
Security: Never commit API keys to source control. Use a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) in production.
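The fallback order described above (explicit argument first, then the environment variable) can be sketched as a small helper. This is an illustration only, not the package's internal code; `resolve_api_key` is a hypothetical name, and failing fast with `RuntimeError` is an assumption about what you might want at startup.

```python
import os


def resolve_api_key(explicit_key=None, env_var="SHUNYALABS_API_KEY"):
    """Return the explicit key if given, else the environment value.

    Hypothetical helper: raises RuntimeError when neither source provides a key,
    so a misconfigured deployment fails at startup rather than mid-call.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found: pass api_key or set {env_var}")
    return key
```

The result can then be passed as `api_key=resolve_api_key()` when constructing either service.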
Quick Start
import asyncio
import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService
async def main():
    # Local microphone and speaker transport; the constructor expects a params object.
    transport = LocalAudioTransport(
        LocalAudioTransportParams(audio_in_enabled=True, audio_out_enabled=True)
    )
    # Speech-to-text: streams microphone audio to the Shunyalabs ASR gateway.
    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="en",
    )
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )
    # Text-to-speech: synthesizes the LLM response with a conversational style.
    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="en",
        style="<Conversational>",
    )
    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
if __name__ == "__main__":
    asyncio.run(main())
STT — ShunyalabsSTTService
Real-time streaming speech-to-text over WebSocket. Maintains a persistent connection for the lifetime of the pipeline. Supports 23 Indian and international languages with automatic language detection.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to the SHUNYALABS_API_KEY env var. |
| language | str | "auto" | Language code (e.g. "en", "hi") or "auto" for auto-detection. |
| url | str | wss://asr.shunyalabs.ai/ws | WebSocket endpoint URL. |
| sample_rate | int | 16000 | Expected audio sample rate in Hz. Must match the transport input. |
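Because sample_rate must match the transport input, it helps to know how many bytes a chunk of audio should contain at a given rate. The arithmetic below is a sketch that assumes 16-bit mono PCM input and a 20 ms chunk; neither the sample width nor the chunk duration is stated by this package, and `pcm_chunk_bytes` is a hypothetical helper name.

```python
def pcm_chunk_bytes(sample_rate=16000, chunk_ms=20, sample_width=2, channels=1):
    """Bytes in one chunk of raw PCM audio.

    Assumes 16-bit (2-byte) mono samples by default; adjust for other inputs.
    """
    samples = sample_rate * chunk_ms // 1000
    return samples * sample_width * channels


print(pcm_chunk_bytes())                    # 640 bytes at 16 kHz
print(pcm_chunk_bytes(sample_rate=48000))   # 1920 bytes at 48 kHz
```

If the transport delivers 48 kHz audio while the service is configured for 16000, each chunk carries three times the expected samples, which is one way a mismatch can surface as missing or garbled transcripts.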
How It Works
- On pipeline start, opens a WebSocket connection to the Shunyalabs ASR gateway.
- Audio chunks from the pipeline input are forwarded via send_audio().
- The gateway's built-in VAD detects speech boundaries and emits transcription events.
- Events are mapped to Pipecat frames and pushed into the pipeline.
Frame Mapping
| Shunyalabs Event | Pipecat Frame |
|---|---|
| PARTIAL | InterimTranscriptionFrame, emitted continuously as speech is recognized |
| FINAL_SEGMENT | TranscriptionFrame, emitted at a speech segment boundary |
| FINAL | TranscriptionFrame, emitted when the full utterance is finalized |
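The mapping in this table can be pictured as a simple dispatch dictionary. The sketch below uses simplified stand-in dataclasses rather than Pipecat's real frame classes (which carry extra fields such as a user ID and timestamp), and `event_to_frame` is a hypothetical function, not part of this package.

```python
from dataclasses import dataclass


@dataclass
class InterimTranscriptionFrame:  # simplified stand-in for Pipecat's class
    text: str


@dataclass
class TranscriptionFrame:  # simplified stand-in for Pipecat's class
    text: str


# Event names as listed in the Frame Mapping table.
EVENT_TO_FRAME = {
    "PARTIAL": InterimTranscriptionFrame,
    "FINAL_SEGMENT": TranscriptionFrame,
    "FINAL": TranscriptionFrame,
}


def event_to_frame(event_type, text):
    """Return the frame for a gateway event, or None for unknown event types."""
    frame_cls = EVENT_TO_FRAME.get(event_type)
    return frame_cls(text) if frame_cls else None
```

Downstream processors that only care about committed text can ignore InterimTranscriptionFrame and act on TranscriptionFrame alone.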
Example
from pipecat_shunyalabs import ShunyalabsSTTService
stt = ShunyalabsSTTService(
    language="hi",  # Hindi; or "auto" for detection
    sample_rate=16000,
)
Auto-Reconnect
If the WebSocket connection drops during audio streaming, the service automatically reconnects and resumes sending audio.
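The service handles reconnection internally, so no user code is needed. For readers curious about the general pattern, the following is a minimal sketch of reconnect-and-resume with exponential backoff. It is not the package's implementation: `connect` is a hypothetical async factory returning an object with a `send_audio()` coroutine, and the retry limits are illustrative assumptions.

```python
import asyncio


async def send_with_reconnect(connect, chunks, max_retries=3, base_delay=0.5):
    """Send every chunk, reconnecting with exponential backoff on ConnectionError.

    `connect` is a hypothetical async factory; the returned object must expose
    an async send_audio(chunk) method.
    """
    conn = await connect()
    for chunk in chunks:
        for attempt in range(max_retries + 1):
            try:
                await conn.send_audio(chunk)
                break  # chunk delivered, move on to the next one
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the final retry
                await asyncio.sleep(base_delay * 2 ** attempt)
                conn = await connect()  # open a fresh connection and retry
```

The chunk that failed is resent on the new connection, so no audio is skipped in this sketch; whether the real service resends or drops the in-flight chunk is not documented here.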
TTS — ShunyalabsTTSService
Streaming text-to-speech over WebSocket. Each synthesis request opens a new connection and streams audio chunks back as TTSAudioRawFrame frames. Supports 46 speakers across 23 languages; any speaker can synthesize in any language.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to the SHUNYALABS_API_KEY env var. |
| url | str | wss://tts.shunyalabs.ai/ws | WebSocket endpoint URL. |
| model | str | "zero-indic" | TTS model identifier. |
| voice | str | "Rajesh" | Speaker voice. See Available Speakers. |
| speaker | str | "Rajesh" | Speaker identifier (typically the same as voice). |
| style | str | "<Neutral>" | Emotion/delivery style tag. See Style Tags. |
| language | str | "en" | Output language code (e.g. "en", "hi", "ta"). |
| output_format | str | "pcm" | Audio encoding. See Output Formats. |
| speed | float | 1.0 | Speaking speed multiplier (0.25–4.0). |
Output Formats
| Format | Value | Recommended Use |
|---|---|---|
| PCM (raw 16-bit) | pcm | Real-time pipelines, Pipecat TTSAudioRawFrame |
| WAV | wav | Uncompressed storage, offline processing |
| MP3 | mp3 | Compressed storage, web delivery |
| OGG Opus | ogg_opus | Compressed web streaming |
| FLAC | flac | Lossless compressed storage |
| mu-law | mulaw | Telephony systems (G.711) |
| A-law | alaw | Telephony systems (G.711 European) |
Style Tags
| Tag | Description |
|---|---|
| <Neutral> | Clean read-speech (default) |
| <Happy> | Joyful, upbeat tone |
| <Sad> | Somber, melancholic tone |
| <Angry> | Forceful, intense tone |
| <Fearful> | Anxious, trembling tone |
| <Surprised> | Exclamatory, astonished tone |
| <Disgust> | Repulsed, disapproving tone |
| <News> | Formal news-anchor style |
| <Conversational> | Casual, everyday speech; recommended for voice agents |
| <Narrative> | Storytelling / audiobook delivery style |
| <Enthusiastic> | Energetic, passionate tone |
Text Formatting
The service automatically formats text as "<Style> text" before sending to the API:
tts = ShunyalabsTTSService(speaker="Rajesh", style="<Happy>")
# Input: "Welcome!"
# Sent: "<Happy> Welcome!"
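The prefixing behaviour shown above can be reproduced in a few lines, which is handy for logging or unit-testing what will be sent. This is a sketch rather than the service's own code: `format_styled_text` is a hypothetical helper, and rejecting unknown tags on the client side is an assumption (the README does not say whether the service validates tags).

```python
# The 11 style tags listed in the Style Tags table.
STYLE_TAGS = {
    "<Neutral>", "<Happy>", "<Sad>", "<Angry>", "<Fearful>", "<Surprised>",
    "<Disgust>", "<News>", "<Conversational>", "<Narrative>", "<Enthusiastic>",
}


def format_styled_text(text, style="<Neutral>"):
    """Prefix text with a style tag, rejecting tags outside the documented set."""
    if style not in STYLE_TAGS:
        raise ValueError(f"Unknown style tag: {style!r}")
    return f"{style} {text}"


print(format_styled_text("Welcome!", "<Happy>"))  # <Happy> Welcome!
```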
Available Speakers
46 speakers across 23 languages (1 male + 1 female per language). Every speaker can synthesize in any language.
| Language | Male | Female |
|---|---|---|
| English | Varun | Nisha |
| Hindi | Rajesh (default) | Sunita |
| Bengali | Arjun | Priyanka |
| Tamil | Murugan | Thangam |
| Telugu | Vishnu | Lakshmi |
| Kannada | Kiran | Shreya |
| Malayalam | Krishnan | Deepa |
| Marathi | Siddharth | Ananya |
| Gujarati | Rakesh | Pooja |
| Punjabi | Gurpreet | Simran |
| Urdu | Salman | Fatima |
| Odia | Bijay | Sujata |
| Assamese | Bimal | Anjana |
| Maithili | Suresh | Meera |
| Nepali | Bikash | Sapana |
| Sanskrit | Vedant | Gayatri |
| Kashmiri | Farooq | Habba |
| Konkani | Mohan | Sarita |
| Dogri | Vishal | Neelam |
| Sindhi | Amjad | Kavita |
| Manipuri | Tomba | Ibemhal |
| Santali | Chandu | Roshni |
| Bodo | Daimalu | Hasina |
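When the target language is only known at runtime, a lookup built from this table can choose a matching voice. The sketch below includes only a subset of rows for brevity; `pick_speaker` is a hypothetical helper, and falling back to "Rajesh" mirrors the documented default voice.

```python
# Subset of the speaker table; extend with the remaining rows as needed.
SPEAKERS = {
    "English": {"male": "Varun", "female": "Nisha"},
    "Hindi": {"male": "Rajesh", "female": "Sunita"},
    "Bengali": {"male": "Arjun", "female": "Priyanka"},
    "Tamil": {"male": "Murugan", "female": "Thangam"},
    "Telugu": {"male": "Vishnu", "female": "Lakshmi"},
}


def pick_speaker(language, gender="male", default="Rajesh"):
    """Return the speaker for a language and gender, or the default voice."""
    return SPEAKERS.get(language, {}).get(gender, default)


print(pick_speaker("Tamil", "female"))  # Thangam
```

Since every speaker can synthesize in any language, the fallback still produces audio for languages missing from the dictionary.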
Frame Output
| Frame | Description |
|---|---|
| TTSStartedFrame | Emitted when synthesis begins. |
| TTSAudioRawFrame | Emitted for each audio chunk (PCM, 16 kHz, mono). |
| TTSStoppedFrame | Emitted when synthesis completes. |
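For debugging, it can be useful to save the raw audio from a run and listen to it. The sketch below wraps raw PCM chunks in a WAV container using Python's standard `wave` module, assuming the 16-bit, 16 kHz, mono layout described in the table. `pcm_chunks_to_wav` is a hypothetical helper, and how you collect the chunk bytes from TTSAudioRawFrame objects is left to your pipeline.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM chunks in a WAV container and return the file bytes.

    Defaults assume 16-bit mono PCM at 16 kHz.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()
```

Writing the returned bytes to a `.wav` file gives something any audio player can open, which helps separate synthesis problems from transport playback problems.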
Example
from pipecat_shunyalabs import ShunyalabsTTSService
tts = ShunyalabsTTSService(
    model="zero-indic",
    voice="Nisha",
    speaker="Nisha",
    style="<Enthusiastic>",
    language="en",
    speed=1.1,
    output_format="pcm",
)
Full Pipeline Example
A complete voice agent using Shunyalabs STT and TTS with OpenAI LLM on the Daily WebRTC transport:
import asyncio
import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService
async def run_voice_agent(room_url: str, token: str):
    # Daily's built-in transcription is disabled because Shunyalabs handles STT.
    transport = DailyTransport(
        room_url, token, "Shunyalabs Agent",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            transcription_enabled=False,
        ),
    )
    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="auto",
        sample_rate=16000,
    )
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )
    messages = [{
        "role": "system",
        "content": (
            "You are a helpful voice assistant powered by Shunyalabs. "
            "Keep responses concise and natural for voice delivery."
        ),
    }]
    # The context aggregator tracks user and assistant turns for the LLM.
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)
    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="hi",
        style="<Conversational>",
    )
    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])
    task = PipelineTask(
        pipeline,
        params=PipelineParams(allow_interruptions=True, enable_metrics=True),
    )
    # Kick off the conversation as soon as the first participant joins.
    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await task.queue_frames([context_aggregator.user().get_context_frame()])
    await PipelineRunner().run(task)
if __name__ == "__main__":
    asyncio.run(run_voice_agent(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
    ))
Multilingual Example
# Hindi conversational bot
tts = ShunyalabsTTSService(
    voice="Rajesh",
    language="hi",
    style="<Conversational>",
)
# English news-style bot
tts = ShunyalabsTTSService(
    voice="Varun",
    language="en",
    style="<News>",
)
Error Reference
All Shunyalabs SDK exceptions inherit from ShunyalabsError.
| Exception | HTTP Code | Description |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key. |
| PermissionDeniedError | 403 | API key lacks permission for the resource. |
| NotFoundError | 404 | Requested resource not found. |
| RateLimitError | 429 | Rate limit exceeded. Implement exponential backoff. |
| ServerError | 5xx | Server-side error. Retried automatically. |
| TimeoutError | n/a | Request exceeded timeout (default 60s). |
| ConnectionError | n/a | Network connectivity issue. |
| TranscriptionError | n/a | ASR-specific failure (e.g. unsupported audio format). |
| SynthesisError | n/a | TTS-specific failure (e.g. invalid voice parameter). |
from shunyalabs.exceptions import AuthenticationError, RateLimitError, ShunyalabsError
# client, text, and config are assumed to be defined; run this inside an async function.
try:
    result = await client.tts.synthesize(text, config=config)
except AuthenticationError:
    print("Invalid API key: check SHUNYALABS_API_KEY")
except RateLimitError as e:
    print(f"Rate limited: retry after {e.retry_after}s")
except ShunyalabsError as e:
    print(f"Unexpected error: {e}")
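The table recommends exponential backoff for RateLimitError. A minimal sketch of that pattern follows. To keep it self-contained it defines a stand-in `RateLimitError` with a `retry_after` attribute in seconds; in real code you would import the SDK's class instead. `call_with_backoff` and its default limits are hypothetical, not part of the SDK.

```python
import asyncio


class RateLimitError(Exception):
    """Stand-in for the SDK's RateLimitError; retry_after is in seconds."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


async def call_with_backoff(make_call, max_attempts=4, base_delay=1.0):
    """Await make_call(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except RateLimitError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Prefer the server's hint when present, else back off exponentially.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay)
```

A call such as the synthesis request above could then be wrapped as `await call_with_backoff(lambda: client.tts.synthesize(text, config=config))`.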
Troubleshooting
| Symptom | Resolution |
|---|---|
| AuthenticationError on startup | Verify SHUNYALABS_API_KEY is set and valid. |
| WebSocket connection refused | Ensure outbound WSS (port 443) is open to asr.shunyalabs.ai and tts.shunyalabs.ai. |
| No transcription output | Check sample_rate matches your transport input. Verify audio source is active. |
| TTS audio silent or missing | Ensure output_format=pcm matches transport output. Verify TTSStartedFrame is received. |
| High latency on first TTS chunk | Deploy closer to the Shunyalabs gateway region (asia-south1). |
| RateLimitError | Implement exponential backoff. Check e.retry_after. |
| ImportError: pipecat_shunyalabs | Run pip install pipecat-shunyalabs. Confirm virtual environment is activated. |
License