Shunyalabs ASR & TTS services for Pipecat

pipecat-shunyalabs

PyPI License: MIT

Shunyalabs STT and TTS services for Pipecat.

Provides ShunyalabsSTTService and ShunyalabsTTSService that integrate with Pipecat's pipeline framework, backed by the Shunyalabs Python SDK.

Key capabilities:

  • Real-time streaming ASR with interim and final transcription frames
  • High-fidelity voice synthesis with 46 speakers across 23 languages
  • 11 emotion/delivery style tags for expressive voice responses
  • Native Pipecat frame protocol — drop-in with any Pipecat pipeline
  • Persistent WebSocket for STT; per-request WebSocket for TTS
  • Output formats: PCM, WAV, MP3, OGG Opus, FLAC, mu-law, A-law

⚠️ Upgrading from 1.0.0 → 1.0.1

If you're already using pipecat-shunyalabs, read this first. There are two breaking changes that affect TTS (items 1 and 2), plus one non-breaking change (item 3):

1. language is now required

Version 1.0.0 silently accepted a missing language; the gateway now returns HTTP 422 when it is absent. Always pass an ISO 639 code:

  tts = ShunyalabsTTSService(
      voice="Rajesh",
+     language="en",        # required — pass "en", "hi", "ta", etc.
  )

2. speaker parameter removed; _format_text no longer prepends speaker name

Old behaviour produced text like "Rajesh: <Neutral> Hello" — but the gateway already prepends the speaker name server-side, which caused "Rajesh: Rajesh: <Neutral> Hello" in the LLM prompt and resulted in muddied output. Fixed in 1.0.1.

  tts = ShunyalabsTTSService(
      voice="Rajesh",
-     speaker="Rajesh",     # remove — was a duplicate of `voice`
-     style="<Neutral>",    # optional now — gateway defaults to <Conversational>
+     style="<Happy>",      # only set this if you want a non-default style
      language="en",
  )

3. style is now optional

If you don't pass style, the gateway automatically applies <Conversational>. You only need to set style when you want a specific emotion (e.g. <Happy>, <Sad>, <News>).
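
The three migration steps above can be applied mechanically to a dictionary of constructor keyword arguments. The following is a minimal sketch, not part of the plugin: migrate_tts_kwargs and its default_language argument are hypothetical names introduced here for illustration, and you should pick the default language that matches your own deployment.

```python
from typing import Any, Dict


def migrate_tts_kwargs(old: Dict[str, Any], default_language: str = "en") -> Dict[str, Any]:
    """Return a copy of 1.0.0-style TTS kwargs adjusted for 1.0.1 (hypothetical helper)."""
    new = dict(old)
    # `speaker` was removed in 1.0.1; `voice` is the only speaker selector now.
    speaker = new.pop("speaker", None)
    if "voice" not in new and speaker is not None:
        new["voice"] = speaker
    # The gateway now rejects requests without a language (HTTP 422).
    new.setdefault("language", default_language)
    # `style` is optional; drop an explicit None so the gateway default applies.
    if new.get("style") is None:
        new.pop("style", None)
    return new


old_kwargs = {"voice": "Rajesh", "speaker": "Rajesh", "style": "<Neutral>"}
new_kwargs = migrate_tts_kwargs(old_kwargs)
```

The result can then be passed as ShunyalabsTTSService(**new_kwargs). The input dictionary is copied, so the original kwargs stay untouched.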

Quick install upgrade

pip install --upgrade pipecat-shunyalabs

That's it for migrations. Everything else works as before.


Installation (New Users)

Requirements: Python 3.9+, Pipecat framework, a valid Shunyalabs API key.

pip install pipecat-shunyalabs

Install with a transport:

# Daily WebRTC transport
pip install pipecat-shunyalabs "pipecat-ai[daily]"   # quotes keep shells such as zsh from expanding the brackets

Authentication

Set your API key as an environment variable (recommended):

export SHUNYALABS_API_KEY="your-api-key"

Or pass it directly:

stt = ShunyalabsSTTService(api_key="your-api-key")
tts = ShunyalabsTTSService(api_key="your-api-key")

Security: Never commit API keys to source control. Use a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) in production.
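
The lookup order described above (explicit argument first, then the environment variable) can be sketched as follows. This is an illustration of the idea only; resolve_api_key is a hypothetical helper, not a function exported by the plugin, and failing fast with RuntimeError is a design choice made for this sketch.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None, env_var: str = "SHUNYALABS_API_KEY") -> str:
    """Prefer an explicitly passed key, then fall back to the environment (hypothetical helper)."""
    key = explicit or os.environ.get(env_var)
    if not key:
        # Fail at startup instead of waiting for the first request to be rejected.
        raise RuntimeError(f"No API key found: pass api_key=... or set {env_var}")
    return key
```

Calling this once at startup surfaces a missing key before any pipeline is built.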


Quick Start

import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def main():
    transport = LocalAudioTransport()

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="en",
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="en",
        # style is optional — defaults to <Conversational>
    )

    pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(main())

STT — ShunyalabsSTTService

Real-time streaming speech-to-text over WebSocket. Maintains a persistent connection for the lifetime of the pipeline. Supports 23 Indian and international languages with automatic language detection.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| language | str | "auto" | Language code (e.g. "en", "hi") or "auto" for auto-detection. |
| url | str | wss://asr.shunyalabs.ai/ws | WebSocket endpoint URL. |
| sample_rate | int | 16000 | Expected audio sample rate in Hz. Must match transport input. |

How It Works

  1. On pipeline start, opens a WebSocket connection to the Shunyalabs ASR gateway.
  2. Audio chunks from the pipeline input are forwarded via send_audio().
  3. The gateway's built-in VAD detects speech boundaries and emits transcription events.
  4. Events are mapped to Pipecat frames and pushed into the pipeline.

Frame Mapping

| Shunyalabs Event | Pipecat Frame |
|---|---|
| PARTIAL | InterimTranscriptionFrame (emitted continuously as speech is recognized) |
| FINAL_SEGMENT | TranscriptionFrame (emitted at a speech segment boundary) |
| FINAL | TranscriptionFrame (emitted when the full utterance is finalized) |
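
The mapping above can be expressed as a small lookup table. The sketch below is illustrative only and does not reproduce the plugin's internals: it uses frame class names as plain strings so it runs without Pipecat installed, and the is_final flag and the map_event function are assumptions introduced for the example.

```python
from typing import Optional, Tuple

# Event names and frame names come from the mapping above;
# the (frame name, is_final) tuple layout is an assumption for this sketch.
EVENT_TO_FRAME = {
    "PARTIAL": ("InterimTranscriptionFrame", False),
    "FINAL_SEGMENT": ("TranscriptionFrame", True),
    "FINAL": ("TranscriptionFrame", True),
}


def map_event(event_type: str) -> Optional[Tuple[str, bool]]:
    """Return (frame class name, is_final) for a gateway event, or None if unknown."""
    return EVENT_TO_FRAME.get(event_type.upper())
```

Returning None for unknown event types lets a caller ignore events it does not understand instead of failing mid-stream.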

Auto-Reconnect

If the WebSocket connection drops during audio streaming, the service automatically reconnects and resumes sending audio.
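
The plugin's actual reconnect logic is not shown in this document. As a generic illustration of the pattern, the sketch below retries a connection attempt with exponential backoff; connect_with_backoff, its parameters, and the choice to retry on OSError (which includes ConnectionError) are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_backoff(
    connect: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")
```

Capping the delay at max_delay keeps a long outage from pushing the wait between attempts to impractical lengths.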


TTS — ShunyalabsTTSService

Streaming text-to-speech over WebSocket. Each synthesis request opens a new connection and streams audio chunks back as TTSAudioRawFrame frames. Supports 46 speakers across 23 languages; any speaker can synthesize in any language.

Parameters

| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| api_key | str | env SHUNYALABS_API_KEY | | API key. Pass directly or via env var. |
| voice | str | "Rajesh" | | Speaker voice. See Available Speakers. |
| language | str | "en" | Yes | ISO 639 language code ("en", "hi", "ta", etc.). Now required by gateway. |
| model | str | "zero-indic" | | TTS model identifier. |
| style | str | None | | Emotion/delivery style tag. If omitted, gateway uses <Conversational>. |
| url | str | wss://tts.shunyalabs.ai/ws | | WebSocket endpoint URL. |
| output_format | str | "pcm" | | Audio encoding. See Output Formats. |
| speed | float | 1.0 | | Speaking speed multiplier (0.25–4.0). |
| sample_rate | int | 16000 | | Output sample rate in Hz. |

Note for upgraders: The speaker parameter has been removed in 1.0.1 — use voice only. See migration notes.

Output Formats

| Format | Value | Recommended Use |
|---|---|---|
| PCM (raw 16-bit) | pcm | Real-time pipelines, Pipecat TTSAudioRawFrame |
| WAV | wav | Uncompressed storage, offline processing |
| MP3 | mp3 | Compressed storage, web delivery |
| OGG Opus | ogg_opus | Compressed web streaming |
| FLAC | flac | Lossless compressed storage |
| mu-law | mulaw | Telephony systems (G.711) |
| A-law | alaw | Telephony systems (G.711 European) |
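
Because an unsupported format or an out-of-range speed would otherwise only surface once a request reaches the gateway, it can be useful to check them locally first. The sketch below is a hypothetical pre-flight check based on the values and the 0.25–4.0 speed range documented above; validate_tts_options is not part of the plugin, and how the gateway itself treats invalid values is not covered here.

```python
# Values copied from the Output Formats list above.
OUTPUT_FORMATS = frozenset({"pcm", "wav", "mp3", "ogg_opus", "flac", "mulaw", "alaw"})

# Documented range for the speed multiplier.
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_options(output_format: str = "pcm", speed: float = 1.0) -> None:
    """Raise ValueError if the options fall outside the documented values (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        allowed = ", ".join(sorted(OUTPUT_FORMATS))
        raise ValueError(f"unknown output_format {output_format!r}; expected one of: {allowed}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
```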

Style Tags

| Tag | Description |
|---|---|
| <Conversational> | Casual, everyday speech (default if style is omitted) |
| <Neutral> | Clean read-speech |
| <Happy> | Joyful, upbeat tone |
| <Sad> | Somber, melancholic tone |
| <Angry> | Forceful, intense tone |
| <Fearful> | Anxious, trembling tone |
| <Surprised> | Exclamatory, astonished tone |
| <Disgust> | Repulsed, disapproving tone |
| <News> | Formal news-anchor style |
| <Narrative> | Storytelling / audiobook delivery style |
| <Enthusiastic> | Energetic, passionate tone |
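
Style tags are case-sensitive strings wrapped in angle brackets, so a typo such as "happy" or "<happy>" is easy to make. The following sketch normalizes loose input to one of the 11 tags listed above; normalize_style is a hypothetical convenience function, and whether the gateway itself tolerates other spellings is not stated in this document.

```python
from typing import Optional

# The 11 tags listed above, in their canonical spelling.
STYLE_TAGS = (
    "<Conversational>", "<Neutral>", "<Happy>", "<Sad>", "<Angry>", "<Fearful>",
    "<Surprised>", "<Disgust>", "<News>", "<Narrative>", "<Enthusiastic>",
)
_BY_NAME = {tag.strip("<>").lower(): tag for tag in STYLE_TAGS}


def normalize_style(style: Optional[str]) -> Optional[str]:
    """Map 'happy', 'Happy' or '<happy>' to '<Happy>'; None stays None (gateway default)."""
    if style is None:
        return None
    key = style.strip().strip("<>").lower()
    if key not in _BY_NAME:
        raise ValueError(f"unknown style tag: {style!r}")
    return _BY_NAME[key]
```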

Text Formatting

The plugin only prepends the style tag (if you set one). The gateway handles the speaker prefix and default style tag server-side.

tts = ShunyalabsTTSService(voice="Rajesh", style="<Happy>", language="en")
# Plugin sends:    "<Happy> Welcome!"
# Gateway expands: "Rajesh: <Happy> Welcome!"

If you omit style:

tts = ShunyalabsTTSService(voice="Rajesh", language="en")
# Plugin sends:    "Welcome!"
# Gateway expands: "Rajesh: <Conversational> Welcome!"
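
The two-stage formatting described above can be modelled in a few lines. This is a sketch of the documented behaviour, not the plugin's or the gateway's source: plugin_format and gateway_expand are hypothetical names, and detecting an existing style tag by a leading "<" is an assumption made to keep the example short.

```python
from typing import Optional

DEFAULT_STYLE = "<Conversational>"


def plugin_format(text: str, style: Optional[str] = None) -> str:
    """Client side: prepend the style tag only when one was set."""
    return f"{style} {text}" if style else text


def gateway_expand(voice: str, plugin_text: str) -> str:
    """Server side (modelled): add the default style if missing, then the speaker prefix."""
    # Assumption for this sketch: text that starts with "<" already carries a style tag.
    body = plugin_text if plugin_text.startswith("<") else f"{DEFAULT_STYLE} {plugin_text}"
    return f"{voice}: {body}"


with_style = gateway_expand("Rajesh", plugin_format("Welcome!", "<Happy>"))
without_style = gateway_expand("Rajesh", plugin_format("Welcome!"))
```

Keeping the speaker prefix on one side only is what avoids the doubled "Rajesh: Rajesh:" prefix that 1.0.0 produced.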

Available Speakers

46 speakers across 23 languages (1 male + 1 female per language). Every speaker can synthesize in any language.

| Language | Male | Female |
|---|---|---|
| English | Varun | Nisha |
| Hindi | Rajesh (default) | Sunita |
| Bengali | Arjun | Priyanka |
| Tamil | Murugan | Thangam |
| Telugu | Vishnu | Lakshmi |
| Kannada | Kiran | Shreya |
| Malayalam | Krishnan | Deepa |
| Marathi | Siddharth | Ananya |
| Gujarati | Rakesh | Pooja |
| Punjabi | Gurpreet | Simran |
| Urdu | Salman | Fatima |
| Odia | Bijay | Sujata |
| Assamese | Bimal | Anjana |
| Maithili | Suresh | Meera |
| Nepali | Bikash | Sapana |
| Sanskrit | Vedant | Gayatri |
| Kashmiri | Farooq | Habba |
| Konkani | Mohan | Sarita |
| Dogri | Vishal | Neelam |
| Sindhi | Amjad | Kavita |
| Manipuri | Tomba | Ibemhal |
| Santali | Chandu | Roshni |
| Bodo | Daimalu | Hasina |

Frame Output

| Frame | Description |
|---|---|
| TTSStartedFrame | Emitted when synthesis begins. |
| TTSAudioRawFrame | Emitted for each audio chunk (PCM, 16 kHz, mono). |
| TTSStoppedFrame | Emitted when synthesis completes. |
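
Since each audio chunk is raw 16-bit PCM at 16 kHz mono, the playback duration of a chunk follows directly from its byte length: bytes / (sample_rate × channels × 2). The helper below is a small worked version of that arithmetic; pcm_duration_seconds is a name introduced here, and the defaults simply restate the format given above.

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 16000,  # Hz, the documented output rate
    channels: int = 1,         # mono
    sample_width: int = 2,     # bytes per sample for 16-bit PCM
) -> float:
    """Return the playback duration of a raw PCM buffer in seconds."""
    return num_bytes / (sample_rate * channels * sample_width)


one_second = pcm_duration_seconds(32000)  # 16000 samples * 2 bytes each
```

This is handy when logging how much audio has been received between TTSStartedFrame and TTSStoppedFrame.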

Examples

Default (Conversational) — recommended for voice agents:

tts = ShunyalabsTTSService(
    voice="Nisha",
    language="en",
)

Custom emotion + speed:

tts = ShunyalabsTTSService(
    voice="Nisha",
    style="<Enthusiastic>",
    language="en",
    speed=1.1,
    output_format="pcm",
)

Hindi news-style:

tts = ShunyalabsTTSService(
    voice="Rajesh",
    language="hi",
    style="<News>",
)

Full Pipeline Example

A complete voice agent using Shunyalabs STT and TTS with OpenAI LLM on the Daily WebRTC transport:

import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import (
    OpenAILLMContext, OpenAILLMContextAggregator,
)
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def run_voice_agent(room_url: str, token: str):
    transport = DailyTransport(
        room_url, token, "Shunyalabs Agent",
        DailyParams(audio_out_enabled=True, transcription_enabled=False),
    )

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="auto",
        sample_rate=16000,
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    messages = [{
        "role": "system",
        "content": (
            "You are a helpful voice assistant powered by Shunyalabs. "
            "Keep responses concise and natural for voice delivery."
        ),
    }]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="hi",  # required
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    task = PipelineTask(
        pipeline,
        params=PipelineParams(allow_interruptions=True, enable_metrics=True),
    )

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(run_voice_agent(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
    ))

Multilingual Examples

# Hindi conversational bot (default style)
tts = ShunyalabsTTSService(voice="Rajesh", language="hi")

# English news-style bot
tts = ShunyalabsTTSService(voice="Varun", language="en", style="<News>")

# Tamil narrative voice
tts = ShunyalabsTTSService(voice="Murugan", language="ta", style="<Narrative>")

Error Reference

All Shunyalabs SDK exceptions inherit from ShunyalabsError.

| Exception | HTTP Code | Description |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key. |
| PermissionDeniedError | 403 | API key lacks permission for the resource. |
| NotFoundError | 404 | Requested resource not found. |
| ValidationError | 422 | Missing required field (e.g. language). |
| RateLimitError | 429 | Rate limit exceeded. Implement exponential backoff. |
| ServerError | 5xx | Server-side error. Retried automatically. |
| TimeoutError | | Request exceeded timeout (default 60s). |
| ConnectionError | | Network connectivity issue. |
| TranscriptionError | | ASR-specific failure (e.g. unsupported audio format). |
| SynthesisError | | TTS-specific failure (e.g. invalid voice parameter). |
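
from shunyalabs.exceptions import AuthenticationError, RateLimitError, ShunyalabsError

# `synthesize_safely` is an illustrative wrapper name; `client` is an
# already-constructed Shunyalabs SDK client. `await` needs an async function.
async def synthesize_safely(client, text, config):
    try:
        return await client.tts.synthesize(text, config=config)
    except AuthenticationError:
        print("Invalid API key: check SHUNYALABS_API_KEY")
    except RateLimitError as e:
        print(f"Rate limited: retry after {e.retry_after}s")
    except ShunyalabsError as e:
        # Base class of all SDK exceptions, so this clause must come last.
        print(f"Unexpected error: {e}")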
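
The table recommends exponential backoff for RateLimitError. The sketch below shows one way to do that while honouring a retry_after hint when the server provides one. To stay runnable without the SDK installed it uses a stand-in exception class, RateLimited; in real code you would catch shunyalabs.exceptions.RateLimitError instead. The retry_on_rate_limit function and its parameters are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the SDK's RateLimitError so this sketch runs without the SDK."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


async def retry_on_rate_limit(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 4,
    base_delay: float = 1.0,
) -> T:
    """Retry `call` on rate limiting, waiting retry_after or base_delay * 2**attempt."""
    if max_retries < 0:
        raise ValueError("max_retries must not be negative")
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller decide what to do
            # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
            hinted = exc.retry_after
            delay = hinted if hinted is not None else base_delay * 2 ** attempt
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

Only rate-limit errors are retried here; authentication and validation errors will not succeed on a second attempt and are left to propagate.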

Troubleshooting

| Symptom | Resolution |
|---|---|
| HTTP 422 "language: Field required" | Add language="en" (or another ISO 639 code) to ShunyalabsTTSService(...). |
| TTS audio sounds wrong / muddied (after upgrade) | Remove speaker=... from the constructor. It is no longer needed and was causing a double-prefix bug in 1.0.0. |
| AuthenticationError on startup | Verify SHUNYALABS_API_KEY is set and valid. |
| WebSocket connection refused | Ensure outbound WSS (port 443) is open to asr.shunyalabs.ai and tts.shunyalabs.ai. |
| No transcription output | Check sample_rate matches your transport input. Verify audio source is active. |
| TTS audio silent or missing | Ensure output_format=pcm matches transport output. Verify TTSStartedFrame is received. |
| High latency on first TTS chunk | Deploy closer to the Shunyalabs gateway region (asia-south1). |
| RateLimitError | Implement exponential backoff. Check e.retry_after. |
| ImportError: pipecat_shunyalabs | Run pip install pipecat-shunyalabs. Confirm virtual environment is activated. |

Changelog

1.0.1 (2026-04-11)

  • Breaking: language is now required by the gateway. Always pass an ISO 639 code.
  • Breaking: Removed speaker parameter (was a duplicate of voice).
  • Bug fix: _format_text no longer prepends the speaker name on top of the gateway's server-side prefix. Old 1.0.0 behaviour produced "Rajesh: Rajesh: <Neutral> ..." in the model prompt and degraded output quality.
  • style is now optional — defaults to <Conversational> server-side.

1.0.0

  • Initial public release.

License

MIT
