Shunyalabs ASR & TTS services for Pipecat
pipecat-shunyalabs
Shunyalabs STT and TTS services for Pipecat.
Provides ShunyalabsSTTService and ShunyalabsTTSService that integrate with Pipecat's pipeline framework, backed by the Shunyalabs Python SDK.
Key capabilities:
- Real-time streaming ASR with interim and final transcription frames
- High-fidelity voice synthesis with 46 speakers across 23 languages
- 11 emotion/delivery style tags for expressive voice responses
- Native Pipecat frame protocol — drop-in with any Pipecat pipeline
- Persistent WebSocket for STT; per-request WebSocket for TTS
- Output formats: PCM, WAV, MP3, OGG Opus, FLAC, mu-law, A-law
⚠️ Upgrading from 1.0.0 → 1.0.1
If you're already using pipecat-shunyalabs, read this first. Two breaking changes affect TTS (items 1 and 2), and one parameter is now optional (item 3):
1. language is now required
Version 1.0.0 silently accepted a missing language; the gateway now returns HTTP 422 when it is absent. Always pass an ISO 639 code:
tts = ShunyalabsTTSService(
voice="Rajesh",
+ language="en", # required — pass "en", "hi", "ta", etc.
)
2. speaker parameter removed; _format_text no longer prepends speaker name
Old behaviour produced text like "Rajesh: <Neutral> Hello" — but the gateway already prepends the speaker name server-side, which caused "Rajesh: Rajesh: <Neutral> Hello" in the LLM prompt and resulted in muddied output. Fixed in 1.0.1.
tts = ShunyalabsTTSService(
voice="Rajesh",
- speaker="Rajesh", # remove — was a duplicate of `voice`
- style="<Neutral>", # optional now — gateway defaults to <Conversational>
+ style="<Happy>", # only set this if you want a non-default style
language="en",
)
3. style is now optional
If you don't pass style, the gateway automatically applies <Conversational>. You only need to set style when you want a specific emotion (e.g. <Happy>, <Sad>, <News>).
Quick install upgrade
pip install --upgrade pipecat-shunyalabs
That's it for migrations. Everything else works as before.
Installation (New Users)
Requirements: Python 3.9+, Pipecat framework, a valid Shunyalabs API key.
pip install pipecat-shunyalabs
Install with a transport:
# Daily WebRTC transport
pip install pipecat-shunyalabs pipecat-ai[daily]
Authentication
Set your API key as an environment variable (recommended):
export SHUNYALABS_API_KEY="your-api-key"
Or pass it directly:
stt = ShunyalabsSTTService(api_key="your-api-key")
tts = ShunyalabsTTSService(api_key="your-api-key")
Security: Never commit API keys to source control. Use a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) in production.
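The lookup order described above (explicit argument first, then the environment variable) can be sketched as follows. `resolve_api_key` is a hypothetical helper written for illustration; it is not part of the plugin, and the plugin's own error handling may differ.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env_var: str = "SHUNYALABS_API_KEY") -> str:
    """Return the explicit key if given, else the environment variable.

    Hypothetical helper: raises early so a missing key is caught before
    any connection is attempted.
    """
    key = explicit or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found: pass api_key=... or set {env_var}")
    return key
```

Failing fast like this keeps a missing key from surfacing later as an authentication error in the middle of a pipeline run.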
Quick Start
import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def main():
    transport = LocalAudioTransport()

    # Speech-to-text: streams microphone audio to the Shunyalabs ASR gateway.
    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="en",
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    # Text-to-speech: language is required as of 1.0.1.
    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="en",
        # style is optional; defaults to <Conversational>
    )

    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    # Pass params by keyword so the call works across Pipecat versions.
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(main())
STT — ShunyalabsSTTService
Real-time streaming speech-to-text over WebSocket. Maintains a persistent connection for the lifetime of the pipeline. Supports 23 Indian and international languages with automatic language detection.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| language | str | "auto" | Language code (e.g. "en", "hi") or "auto" for auto-detection. |
| url | str | wss://asr.shunyalabs.ai/ws | WebSocket endpoint URL. |
| sample_rate | int | 16000 | Expected audio sample rate in Hz. Must match transport input. |
How It Works
- On pipeline start, opens a WebSocket connection to the Shunyalabs ASR gateway.
- Audio chunks from the pipeline input are forwarded via send_audio().
- The gateway's built-in VAD detects speech boundaries and emits transcription events.
- Events are mapped to Pipecat frames and pushed into the pipeline.
Frame Mapping
| Shunyalabs Event | Pipecat Frame |
|---|---|
| PARTIAL | InterimTranscriptionFrame (emitted continuously as speech is recognized) |
| FINAL_SEGMENT | TranscriptionFrame (emitted at a speech segment boundary) |
| FINAL | TranscriptionFrame (emitted when the full utterance is finalized) |
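As a rough illustration of the mapping in the table, the sketch below dispatches on the event name. It assumes each gateway event carries a type string and a text payload, which the source does not spell out, and it uses a stand-in `MappedFrame` dataclass instead of the real Pipecat frame classes so that it runs without Pipecat installed.

```python
from dataclasses import dataclass
from typing import Optional

# Event name -> Pipecat frame class name, as listed in the Frame Mapping table.
EVENT_TO_FRAME = {
    "PARTIAL": "InterimTranscriptionFrame",
    "FINAL_SEGMENT": "TranscriptionFrame",
    "FINAL": "TranscriptionFrame",
}


@dataclass
class MappedFrame:
    """Stand-in for a Pipecat frame (hypothetical; real frames carry more fields)."""
    frame_type: str
    text: str


def map_event(event_type: str, text: str) -> Optional[MappedFrame]:
    """Return the frame for a gateway event, or None for unknown event types."""
    frame_type = EVENT_TO_FRAME.get(event_type)
    if frame_type is None:
        return None
    return MappedFrame(frame_type, text)
```

Downstream processors can treat interim frames as provisional text and act only on `TranscriptionFrame` results.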
Auto-Reconnect
If the WebSocket connection drops during audio streaming, the service automatically reconnects and resumes sending audio.
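The source does not describe the reconnect strategy, so the following is only a sketch of one common approach: retrying with exponentially growing delays. `connect_with_backoff` and its parameters are hypothetical names, and the plugin's actual behaviour may differ.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_backoff(
    connect: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await connect()
        except OSError:  # ConnectionError is a subclass of OSError
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(min(base_delay * 2 ** attempt, max_delay))
    raise AssertionError("unreachable")
```

Capping the delay with `max_delay` keeps a long outage from pushing the wait between attempts to impractical lengths.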
TTS — ShunyalabsTTSService
Streaming text-to-speech over WebSocket. Each synthesis request opens a new connection, streams audio chunks back as TTSAudioRawFrame frames. Supports 46 speakers across 23 languages — any speaker can synthesize in any language.
Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| api_key | str | env SHUNYALABS_API_KEY | ✓ | API key. Pass directly or via env var. |
| voice | str | "Rajesh" | ✓ | Speaker voice. See Available Speakers. |
| language | str | "en" | ✓ | ISO 639 language code ("en", "hi", "ta", etc.). Now required by gateway. |
| model | str | "zero-indic" | | TTS model identifier. |
| style | str | None | | Emotion/delivery style tag. If omitted, gateway uses <Conversational>. |
| url | str | wss://tts.shunyalabs.ai/ws | | WebSocket endpoint URL. |
| output_format | str | "pcm" | | Audio encoding. See Output Formats. |
| speed | float | 1.0 | | Speaking speed multiplier (0.25–4.0). |
| sample_rate | int | 16000 | | Output sample rate in Hz. |
Note for upgraders: The speaker parameter was removed in 1.0.1; use voice only. See the migration notes.
Output Formats
| Format | Value | Recommended Use |
|---|---|---|
| PCM (raw 16-bit) | pcm | Real-time pipelines, Pipecat TTSAudioRawFrame |
| WAV | wav | Uncompressed storage, offline processing |
| MP3 | mp3 | Compressed storage, web delivery |
| OGG Opus | ogg_opus | Compressed web streaming |
| FLAC | flac | Lossless compressed storage |
| mu-law | mulaw | Telephony systems (G.711) |
| A-law | alaw | Telephony systems (G.711 European) |
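The constraints documented in the TTS parameter and output format tables (required language, speed between 0.25 and 4.0, a fixed set of format values) can be checked before a service is constructed. The helper below is a hypothetical sketch, not part of the plugin; the gateway remains the authority on what it accepts.

```python
# Format values copied from the Output Formats table.
VALID_OUTPUT_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"}


def validate_tts_params(language: str, speed: float = 1.0,
                        output_format: str = "pcm") -> None:
    """Raise ValueError if a parameter falls outside the documented constraints."""
    if not language:
        raise ValueError("language is required (for example 'en', 'hi', 'ta')")
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
```

Calling `validate_tts_params(language="en", speed=1.1)` before building `ShunyalabsTTSService` turns a gateway-side 422 into a local error with a clearer message.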
Style Tags
| Tag | Description |
|---|---|
| <Conversational> | Casual, everyday speech (default if style is omitted) |
| <Neutral> | Clean read-speech |
| <Happy> | Joyful, upbeat tone |
| <Sad> | Somber, melancholic tone |
| <Angry> | Forceful, intense tone |
| <Fearful> | Anxious, trembling tone |
| <Surprised> | Exclamatory, astonished tone |
| <Disgust> | Repulsed, disapproving tone |
| <News> | Formal news-anchor style |
| <Narrative> | Storytelling / audiobook delivery style |
| <Enthusiastic> | Energetic, passionate tone |
Text Formatting
The plugin only prepends the style tag (if you set one). The gateway handles the speaker prefix and default style tag server-side.
tts = ShunyalabsTTSService(voice="Rajesh", style="<Happy>", language="en")
# Plugin sends: "<Happy> Welcome!"
# Gateway expands: "Rajesh: <Happy> Welcome!"
If you omit style:
tts = ShunyalabsTTSService(voice="Rajesh", language="en")
# Plugin sends: "Welcome!"
# Gateway expands: "Rajesh: <Conversational> Welcome!"
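The two examples above can be reproduced with a small sketch. `plugin_format` mirrors what the plugin is described as sending; `gateway_expand` is only an approximation of the server-side behaviour inferred from those examples, and both function names are hypothetical.

```python
from typing import Optional

DEFAULT_STYLE = "<Conversational>"


def plugin_format(text: str, style: Optional[str] = None) -> str:
    """Prepend the style tag only when one was set (plugin side)."""
    return f"{style} {text}" if style else text


def gateway_expand(voice: str, plugin_text: str) -> str:
    """Approximate the gateway: add the default style if none, then the speaker prefix."""
    has_style = plugin_text.startswith("<")
    body = plugin_text if has_style else f"{DEFAULT_STYLE} {plugin_text}"
    return f"{voice}: {body}"
```

Because the speaker prefix is added in exactly one place, the "Rajesh: Rajesh:" duplication from 1.0.0 cannot occur in this arrangement.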
Available Speakers
46 speakers across 23 languages (1 male + 1 female per language). Every speaker can synthesize in any language.
| Language | Male | Female |
|---|---|---|
| English | Varun | Nisha |
| Hindi | Rajesh (default) | Sunita |
| Bengali | Arjun | Priyanka |
| Tamil | Murugan | Thangam |
| Telugu | Vishnu | Lakshmi |
| Kannada | Kiran | Shreya |
| Malayalam | Krishnan | Deepa |
| Marathi | Siddharth | Ananya |
| Gujarati | Rakesh | Pooja |
| Punjabi | Gurpreet | Simran |
| Urdu | Salman | Fatima |
| Odia | Bijay | Sujata |
| Assamese | Bimal | Anjana |
| Maithili | Suresh | Meera |
| Nepali | Bikash | Sapana |
| Sanskrit | Vedant | Gayatri |
| Kashmiri | Farooq | Habba |
| Konkani | Mohan | Sarita |
| Dogri | Vishal | Neelam |
| Sindhi | Amjad | Kavita |
| Manipuri | Tomba | Ibemhal |
| Santali | Chandu | Roshni |
| Bodo | Daimalu | Hasina |
Frame Output
| Frame | Description |
|---|---|
| TTSStartedFrame | Emitted when synthesis begins. |
| TTSAudioRawFrame | Emitted for each audio chunk (PCM, 16 kHz, mono). |
| TTSStoppedFrame | Emitted when synthesis completes. |
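For buffering or latency estimates it helps to convert chunk sizes into playback time. The sketch below assumes the defaults stated in this document: 16-bit PCM, 16 kHz, mono. The helper name is hypothetical, and chunk sizes emitted by the gateway are not specified here.

```python
SAMPLE_RATE_HZ = 16_000   # default sample_rate
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono


def pcm_duration_seconds(num_bytes: int,
                         sample_rate: int = SAMPLE_RATE_HZ,
                         bytes_per_sample: int = BYTES_PER_SAMPLE,
                         channels: int = CHANNELS) -> float:
    """Return how many seconds of audio a raw PCM buffer of num_bytes holds."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)
```

At these defaults one second of audio is 32,000 bytes, so a 640-byte chunk holds 20 ms of speech.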
Examples
Default (Conversational) — recommended for voice agents:
tts = ShunyalabsTTSService(
voice="Nisha",
language="en",
)
Custom emotion + speed:
tts = ShunyalabsTTSService(
voice="Nisha",
style="<Enthusiastic>",
language="en",
speed=1.1,
output_format="pcm",
)
Hindi news-style:
tts = ShunyalabsTTSService(
voice="Rajesh",
language="hi",
style="<News>",
)
Full Pipeline Example
A complete voice agent using Shunyalabs STT and TTS with OpenAI LLM on the Daily WebRTC transport:
import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import (
OpenAILLMContext, OpenAILLMContextAggregator,
)
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def run_voice_agent(room_url: str, token: str):
    transport = DailyTransport(
        room_url, token, "Shunyalabs Agent",
        DailyParams(audio_out_enabled=True, transcription_enabled=False),
    )

    # Speech-to-text with automatic language detection.
    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="auto",
        sample_rate=16000,
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    # Conversation context shared by the user and assistant aggregators.
    messages = [{
        "role": "system",
        "content": (
            "You are a helpful voice assistant powered by Shunyalabs. "
            "Keep responses concise and natural for voice delivery."
        ),
    }]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="hi",  # required
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    # Pass params by keyword so the call works across Pipecat versions.
    task = PipelineTask(
        pipeline,
        params=PipelineParams(allow_interruptions=True, enable_metrics=True),
    )

    # Kick off the conversation once the first participant joins the room.
    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    await PipelineRunner().run(task)


if __name__ == "__main__":
    asyncio.run(run_voice_agent(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
    ))
Multilingual Examples
# Hindi conversational bot (default style)
tts = ShunyalabsTTSService(voice="Rajesh", language="hi")
# English news-style bot
tts = ShunyalabsTTSService(voice="Varun", language="en", style="<News>")
# Tamil narrative voice
tts = ShunyalabsTTSService(voice="Murugan", language="ta", style="<Narrative>")
Error Reference
All Shunyalabs SDK exceptions inherit from ShunyalabsError.
| Exception | HTTP Code | Description |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key. |
| PermissionDeniedError | 403 | API key lacks permission for the resource. |
| NotFoundError | 404 | Requested resource not found. |
| ValidationError | 422 | Missing required field (e.g. language). |
| RateLimitError | 429 | Rate limit exceeded. Implement exponential backoff. |
| ServerError | 5xx | Server-side error. Retried automatically. |
| TimeoutError | n/a | Request exceeded timeout (default 60s). |
| ConnectionError | n/a | Network connectivity issue. |
| TranscriptionError | n/a | ASR-specific failure (e.g. unsupported audio format). |
| SynthesisError | n/a | TTS-specific failure (e.g. invalid voice parameter). |
from shunyalabs.exceptions import AuthenticationError, RateLimitError, ShunyalabsError

try:
    result = await client.tts.synthesize(text, config=config)
except AuthenticationError:
    print("Invalid API key — check SHUNYALABS_API_KEY")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except ShunyalabsError as e:
    print(f"Unexpected error: {e}")
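The advice to implement exponential backoff for RateLimitError can be sketched as a retry wrapper that prefers the server's retry_after hint when one is present. To stay runnable without the SDK, the sketch defines its own stand-in `RateLimitError`; with the real SDK you would import the exception from shunyalabs.exceptions instead. `call_with_backoff` is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the SDK's RateLimitError so this sketch runs on its own."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


async def call_with_backoff(call: Callable[[], Awaitable[T]],
                            max_attempts: int = 4,
                            base_delay: float = 1.0) -> T:
    """Retry `call` on RateLimitError, honouring retry_after when provided."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Prefer the server hint; otherwise double the delay each time.
            delay = e.retry_after if e.retry_after is not None else base_delay * 2 ** attempt
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

A call such as `await call_with_backoff(lambda: client.tts.synthesize(text, config=config))` would wrap the request shown above, assuming `client`, `text`, and `config` are defined as in that snippet.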
Troubleshooting
| Symptom | Resolution |
|---|---|
| HTTP 422 "language: Field required" | Add language="en" (or another ISO 639 code) to ShunyalabsTTSService(...). |
| TTS audio sounds wrong / muddied (after upgrade) | Remove speaker=... from the constructor. It is no longer needed and was causing a double-prefix bug in 1.0.0. |
| AuthenticationError on startup | Verify SHUNYALABS_API_KEY is set and valid. |
| WebSocket connection refused | Ensure outbound WSS (port 443) is open to asr.shunyalabs.ai and tts.shunyalabs.ai. |
| No transcription output | Check sample_rate matches your transport input. Verify audio source is active. |
| TTS audio silent or missing | Ensure output_format=pcm matches transport output. Verify TTSStartedFrame is received. |
| High latency on first TTS chunk | Deploy closer to the Shunyalabs gateway region (asia-south1). |
| RateLimitError | Implement exponential backoff. Check e.retry_after. |
| ImportError: pipecat_shunyalabs | Run pip install pipecat-shunyalabs. Confirm virtual environment is activated. |
Changelog
1.0.1 (2026-04-11)
- Breaking: language is now required by the gateway. Always pass an ISO 639 code.
- Breaking: Removed speaker parameter (was a duplicate of voice).
- Bug fix: _format_text no longer prepends the speaker name on top of the gateway's server-side prefix. Old 1.0.0 behaviour produced "Rajesh: Rajesh: <Neutral> ..." in the model prompt and degraded output quality.
- style is now optional; it defaults to <Conversational> server-side.
1.0.0
- Initial public release.
License