Shunyalabs ASR & TTS services for Pipecat

pipecat-shunyalabs

Shunyalabs STT and TTS services for Pipecat.

Provides `ShunyalabsSTTService` and `ShunyalabsTTSService`, which plug into Pipecat's pipeline framework and are backed by the Shunyalabs Python SDK.

Key capabilities:

Real-time streaming ASR with interim and final transcription frames
High-fidelity voice synthesis with 46 speakers across 23 languages
11 emotion/delivery style tags for expressive voice responses
Native Pipecat frame protocol — drop-in with any Pipecat pipeline
Persistent WebSocket for STT; per-request WebSocket for TTS
Output formats: PCM, WAV, MP3, OGG Opus, FLAC, mu-law, A-law

⚠️ Upgrading from 1.0.0 → 1.0.1

If you're already using pipecat-shunyalabs, read this first. There are two breaking changes that affect TTS, plus one parameter that is now optional:

1. `language` is now required

In 1.0.0, a missing `language` was silently accepted; the gateway now rejects such requests with HTTP 422. Always pass an ISO 639 code:

  tts = ShunyalabsTTSService(
      voice="Rajesh",
+     language="en",        # required — pass "en", "hi", "ta", etc.
  )

2. `speaker` parameter removed; `_format_text` no longer prepends speaker name

In 1.0.0, the plugin formatted text as "Rajesh: <Neutral> Hello". Because the gateway already prepends the speaker name server-side, the TTS model actually received "Rajesh: Rajesh: <Neutral> Hello", which muddied the synthesized audio. This is fixed in 1.0.1.

  tts = ShunyalabsTTSService(
      voice="Rajesh",
-     speaker="Rajesh",     # remove — was a duplicate of `voice`
-     style="<Neutral>",    # optional now — gateway defaults to <Conversational>
+     style="<Happy>",      # only set this if you want a non-default style
      language="en",
  )

3. `style` is now optional

If you don't pass style, the gateway automatically applies <Conversational>. You only need to set style when you want a specific emotion (e.g. <Happy>, <Sad>, <News>).
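If your 1.0.0 settings live in a dict (for example, loaded from a YAML config), a small helper can apply the changes above before you construct the service. This is a minimal sketch: `migrate_tts_kwargs` is a hypothetical name, not part of pipecat-shunyalabs, and the `"en"` fallback language is an assumption you should adjust to your content.

```python
import warnings


def migrate_tts_kwargs(kwargs: dict, default_language: str = "en") -> dict:
    """Return a copy of 1.0.0-style TTS kwargs adjusted for 1.0.1.

    Hypothetical helper; not part of pipecat-shunyalabs.
    """
    migrated = dict(kwargs)

    # 1.0.1 removed `speaker`; `voice` is now the only speaker selector.
    speaker = migrated.pop("speaker", None)
    if speaker is not None:
        if "voice" not in migrated:
            migrated["voice"] = speaker
        elif migrated["voice"] != speaker:
            warnings.warn(
                f"Dropping speaker={speaker!r}; voice={migrated['voice']!r} is used instead."
            )

    # The gateway returns HTTP 422 without `language`, so always set one.
    if not migrated.get("language"):
        migrated["language"] = default_language

    return migrated


old = {"voice": "Rajesh", "speaker": "Rajesh", "style": "<Neutral>"}
print(migrate_tts_kwargs(old))
# {'voice': 'Rajesh', 'style': '<Neutral>', 'language': 'en'}
```

The result can then be splatted into the constructor, e.g. `ShunyalabsTTSService(api_key=..., **migrate_tts_kwargs(old))`.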

Quick install upgrade

pip install --upgrade pipecat-shunyalabs

That's it for migrations. Everything else works as before.

Installation (New Users)

Requirements: Python 3.9+, Pipecat framework, a valid Shunyalabs API key.

pip install pipecat-shunyalabs

Install with a transport:

# Daily WebRTC transport
pip install pipecat-shunyalabs "pipecat-ai[daily]"

Authentication

Set your API key as an environment variable (recommended):

export SHUNYALABS_API_KEY="your-api-key"

Or pass it directly:

stt = ShunyalabsSTTService(api_key="your-api-key")
tts = ShunyalabsTTSService(api_key="your-api-key")

Security: Never commit API keys to source control. Use a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) in production.
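Both services fall back to `SHUNYALABS_API_KEY` when `api_key` is omitted. If you would rather fail fast at startup with a clear message than discover a missing key as an `AuthenticationError` on the first request, a helper along these lines works. It is a sketch; `require_api_key` is a hypothetical name and not part of the plugin.

```python
import os


def require_api_key(var: str = "SHUNYALABS_API_KEY") -> str:
    """Read the Shunyalabs API key from the environment or raise immediately."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Export it or load it from your secrets manager."
        )
    return key
```

Usage: `stt = ShunyalabsSTTService(api_key=require_api_key())`.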

Quick Start

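import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.local.audio import LocalAudioTransport, LocalAudioTransportParams
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def main():
    # Pipecat transports keep audio in/out off unless enabled in their params.
    transport = LocalAudioTransport(
        LocalAudioTransportParams(audio_in_enabled=True, audio_out_enabled=True)
    )

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="en",
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    # The LLM service works on conversation context, so transcriptions go
    # through a context aggregator instead of straight into the LLM.
    context = OpenAILLMContext([
        {"role": "system", "content": "You are a concise, friendly voice assistant."},
    ])
    context_aggregator = llm.create_context_aggregator(context)

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="en",
        # style is optional; defaults to <Conversational>
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(main())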

STT — `ShunyalabsSTTService`

Real-time streaming speech-to-text over WebSocket. Maintains a persistent connection for the lifetime of the pipeline. Supports 23 Indian and international languages with automatic language detection.

Parameters

Parameter	Type	Default	Description
`api_key`	`str`	`None`	API key. Falls back to `SHUNYALABS_API_KEY` env var.
`language`	`str`	`"auto"`	Language code (e.g. `"en"`, `"hi"`) or `"auto"` for auto-detection.
`url`	`str`	`wss://asr.shunyalabs.ai/ws`	WebSocket endpoint URL.
`sample_rate`	`int`	`16000`	Expected audio sample rate in Hz. Must match transport input.

How It Works

On pipeline start, opens a WebSocket connection to the Shunyalabs ASR gateway.
Audio chunks from the pipeline input are forwarded via send_audio().
The gateway's built-in VAD detects speech boundaries and emits transcription events.
Events are mapped to Pipecat frames and pushed into the pipeline.

Frame Mapping

Shunyalabs Event	Pipecat Frame
`PARTIAL`	`InterimTranscriptionFrame` — emitted continuously as speech is recognized
`FINAL_SEGMENT`	`TranscriptionFrame` — emitted at speech segment boundary
`FINAL`	`TranscriptionFrame` — emitted when full utterance is finalized
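The mapping above amounts to a small dispatch on the event type. The sketch below is illustrative only: the event dictionary shape (`type`, `text`) and the stand-in frame classes are assumptions, not the plugin's internals or Pipecat's real frame classes (which carry additional fields such as `user_id` and `timestamp`).

```python
from dataclasses import dataclass
from typing import Optional, Union


# Stand-ins for Pipecat's frame classes, reduced to the text payload.
@dataclass
class InterimTranscriptionFrame:
    text: str


@dataclass
class TranscriptionFrame:
    text: str


Frame = Union[InterimTranscriptionFrame, TranscriptionFrame]


def event_to_frame(event: dict) -> Optional[Frame]:
    """Map an ASR event (assumed shape) to a frame, following the table above."""
    kind = event.get("type")
    text = (event.get("text") or "").strip()
    if not text:
        return None  # nothing worth pushing downstream
    if kind == "PARTIAL":
        return InterimTranscriptionFrame(text)
    if kind in ("FINAL_SEGMENT", "FINAL"):
        return TranscriptionFrame(text)
    return None  # unknown or non-transcript events are ignored


for ev in [
    {"type": "PARTIAL", "text": "namaste"},
    {"type": "FINAL_SEGMENT", "text": "namaste, kaise ho"},
    {"type": "FINAL", "text": "namaste, kaise ho?"},
]:
    print(event_to_frame(ev))
```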

Auto-Reconnect

If the WebSocket connection drops during audio streaming, the service automatically reconnects and resumes sending audio.
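Reconnect-and-resume can be pictured as a send loop that, when the connection drops, reopens the socket and retries the chunk that failed instead of discarding it. This is a sketch under stated assumptions, not the plugin's implementation: `connect` is a hypothetical coroutine that opens the ASR WebSocket, the returned object is assumed to expose an async `send(bytes)`, and a drop is assumed to surface as `ConnectionError`.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


async def stream_with_reconnect(
    chunks: AsyncIterator[bytes],
    connect: Callable[[], Awaitable[Any]],
    max_reconnects: int = 5,
    retry_delay: float = 0.5,
) -> int:
    """Forward audio chunks, reopening the connection when it drops.

    `connect` (hypothetical) returns an object with an async `send(bytes)`.
    `max_reconnects` is counted over the whole stream. Returns the number
    of reconnects performed.
    """
    conn = await connect()
    reconnects = 0
    async for chunk in chunks:
        while True:
            try:
                await conn.send(chunk)
                break  # delivered; move on to the next chunk
            except ConnectionError:
                if reconnects >= max_reconnects:
                    raise  # give up and surface the error
                reconnects += 1
                # Wait a little longer after each successive drop.
                await asyncio.sleep(retry_delay * reconnects)
                conn = await connect()  # then retry the same chunk
    return reconnects
```

Retrying the failed chunk avoids a gap in the audio the gateway receives; a production version would probably also bound how much audio it buffers while disconnected.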

TTS — `ShunyalabsTTSService`

Streaming text-to-speech over WebSocket. Each synthesis request opens a new connection and streams audio chunks back as `TTSAudioRawFrame` frames. Supports 46 speakers across 23 languages; any speaker can synthesize in any language.

Parameters

Parameter	Type	Default	Required	Description
`api_key`	`str`	env `SHUNYALABS_API_KEY`	✓	API key. Pass directly or via env var.
`voice`	`str`	`"Rajesh"`	✓	Speaker voice. See Available Speakers.
`language`	`str`	`"en"`	✓	ISO 639 language code (`"en"`, `"hi"`, `"ta"`, etc.). Now required by gateway.
`model`	`str`	`"zero-indic"`		TTS model identifier.
`style`	`str`	`None`		Emotion/delivery style tag. If omitted, gateway uses `<Conversational>`.
`url`	`str`	`wss://tts.shunyalabs.ai/ws`		WebSocket endpoint URL.
`output_format`	`str`	`"pcm"`		Audio encoding. See Output Formats.
`speed`	`float`	`1.0`		Speaking speed multiplier (0.25–4.0).
`sample_rate`	`int`	`16000`		Output sample rate in Hz.

Note for upgraders: the `speaker` parameter was removed in 1.0.1; use `voice` only. See the migration notes above.

Output Formats

Format	Value	Recommended Use
PCM (raw 16-bit)	`pcm`	Real-time pipelines, Pipecat `TTSAudioRawFrame`
WAV	`wav`	Uncompressed storage, offline processing
MP3	`mp3`	Compressed storage, web delivery
OGG Opus	`ogg_opus`	Compressed web streaming
FLAC	`flac`	Lossless compressed storage
mu-law	`mulaw`	Telephony systems (G.711)
A-law	`alaw`	Telephony systems (G.711 European)
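For real-time pipelines `pcm` is the natural choice, but if you also want to keep a copy of the audio for offline review, raw PCM chunks can be wrapped in a WAV container with Python's standard `wave` module. A sketch, assuming the chunks are 16-bit little-endian mono at 16 kHz (the output described under Frame Output); `save_pcm_as_wav` is a hypothetical helper, not part of the plugin.

```python
import io
import wave
from typing import BinaryIO, Iterable, Union


def save_pcm_as_wav(
    chunks: Iterable[bytes],
    dest: Union[str, BinaryIO],
    sample_rate: int = 16000,
    channels: int = 1,
) -> None:
    """Write raw 16-bit little-endian PCM chunks to a WAV file or file-like object."""
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)


# Usage: one second of silence (16000 samples x 2 bytes) into an in-memory buffer.
buf = io.BytesIO()
save_pcm_as_wav([b"\x00\x00" * 16000], buf)
```

If you change the TTS `sample_rate`, pass the same value here so the WAV header matches the data.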

Style Tags

Tag	Description
`<Conversational>`	Casual, everyday speech — default if `style` omitted
`<Neutral>`	Clean read-speech
`<Happy>`	Joyful, upbeat tone
`<Sad>`	Somber, melancholic tone
`<Angry>`	Forceful, intense tone
`<Fearful>`	Anxious, trembling tone
`<Surprised>`	Exclamatory, astonished tone
`<Disgust>`	Repulsed, disapproving tone
`<News>`	Formal news-anchor style
`<Narrative>`	Storytelling / audiobook delivery style
`<Enthusiastic>`	Energetic, passionate tone
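If you want to catch a mistyped tag before it reaches the gateway, you can normalize `style` against the 11 tags above. This is a sketch; `normalize_style` is a hypothetical helper, and accepting bare or lower-case names such as `"happy"` is a convenience of this example, not documented plugin behaviour.

```python
from typing import Optional

# The 11 tags from the Style Tags table above.
STYLE_TAGS = frozenset({
    "<Conversational>", "<Neutral>", "<Happy>", "<Sad>", "<Angry>", "<Fearful>",
    "<Surprised>", "<Disgust>", "<News>", "<Narrative>", "<Enthusiastic>",
})


def normalize_style(style: Optional[str]) -> Optional[str]:
    """Accept 'happy', 'Happy' or '<Happy>'; return the canonical tag or None."""
    if style is None or not style.strip():
        return None  # omit style: the gateway applies <Conversational>
    name = style.strip().strip("<>").strip()
    lookup = {tag[1:-1].lower(): tag for tag in STYLE_TAGS}
    try:
        return lookup[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown style {style!r}; expected one of {sorted(STYLE_TAGS)}"
        ) from None
```

Usage: `ShunyalabsTTSService(voice="Nisha", language="en", style=normalize_style("enthusiastic"))`.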

Text Formatting

The plugin only prepends the style tag (if you set one). The gateway handles the speaker prefix and default style tag server-side.

tts = ShunyalabsTTSService(voice="Rajesh", style="<Happy>", language="en")
# Plugin sends:    "<Happy> Welcome!"
# Gateway expands: "Rajesh: <Happy> Welcome!"

If you omit style:

tts = ShunyalabsTTSService(voice="Rajesh", language="en")
# Plugin sends:    "Welcome!"
# Gateway expands: "Rajesh: <Conversational> Welcome!"
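In code, the division of labour above looks roughly like this. Both functions are illustrative sketches: `format_text` mirrors the documented 1.0.1 behaviour of the plugin's `_format_text`, and `gateway_expand` only models what the gateway is described as doing server-side (treating a leading `<` as a style tag is an assumption of this sketch). Neither is the actual source.

```python
from typing import Optional

DEFAULT_STYLE = "<Conversational>"


def format_text(text: str, style: Optional[str] = None) -> str:
    """Client side (1.0.1): prepend the style tag only if one was set."""
    return f"{style} {text}" if style else text


def gateway_expand(voice: str, payload: str) -> str:
    """Server side, as described: add the default style, then the speaker prefix."""
    if not payload.startswith("<"):  # assumption: a leading '<' marks a style tag
        payload = f"{DEFAULT_STYLE} {payload}"
    return f"{voice}: {payload}"


print(gateway_expand("Rajesh", format_text("Welcome!", "<Happy>")))
# Rajesh: <Happy> Welcome!
print(gateway_expand("Rajesh", format_text("Welcome!")))
# Rajesh: <Conversational> Welcome!
```

Because the speaker prefix is added only once, on the server, the 1.0.0 double prefix cannot occur.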

Available Speakers

46 speakers across 23 languages (1 male + 1 female per language). Every speaker can synthesize in any language.

Language	Male	Female
English	Varun	Nisha
Hindi	Rajesh (default)	Sunita
Bengali	Arjun	Priyanka
Tamil	Murugan	Thangam
Telugu	Vishnu	Lakshmi
Kannada	Kiran	Shreya
Malayalam	Krishnan	Deepa
Marathi	Siddharth	Ananya
Gujarati	Rakesh	Pooja
Punjabi	Gurpreet	Simran
Urdu	Salman	Fatima
Odia	Bijay	Sujata
Assamese	Bimal	Anjana
Maithili	Suresh	Meera
Nepali	Bikash	Sapana
Sanskrit	Vedant	Gayatri
Kashmiri	Farooq	Habba
Konkani	Mohan	Sarita
Dogri	Vishal	Neelam
Sindhi	Amjad	Kavita
Manipuri	Tomba	Ibemhal
Santali	Chandu	Roshni
Bodo	Daimalu	Hasina
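Since every speaker can synthesize in every language, the table is a starting point rather than a constraint, but pairing a language with its native speaker is a sensible default. The helper below encodes the table as data; `SPEAKERS` and `pick_voice` are illustrative names, not part of the plugin, and you still need to pass the matching ISO 639 code as `language`.

```python
# (male, female) native speakers per language, transcribed from the table above.
SPEAKERS = {
    "English": ("Varun", "Nisha"),
    "Hindi": ("Rajesh", "Sunita"),
    "Bengali": ("Arjun", "Priyanka"),
    "Tamil": ("Murugan", "Thangam"),
    "Telugu": ("Vishnu", "Lakshmi"),
    "Kannada": ("Kiran", "Shreya"),
    "Malayalam": ("Krishnan", "Deepa"),
    "Marathi": ("Siddharth", "Ananya"),
    "Gujarati": ("Rakesh", "Pooja"),
    "Punjabi": ("Gurpreet", "Simran"),
    "Urdu": ("Salman", "Fatima"),
    "Odia": ("Bijay", "Sujata"),
    "Assamese": ("Bimal", "Anjana"),
    "Maithili": ("Suresh", "Meera"),
    "Nepali": ("Bikash", "Sapana"),
    "Sanskrit": ("Vedant", "Gayatri"),
    "Kashmiri": ("Farooq", "Habba"),
    "Konkani": ("Mohan", "Sarita"),
    "Dogri": ("Vishal", "Neelam"),
    "Sindhi": ("Amjad", "Kavita"),
    "Manipuri": ("Tomba", "Ibemhal"),
    "Santali": ("Chandu", "Roshni"),
    "Bodo": ("Daimalu", "Hasina"),
}


def pick_voice(language: str, gender: str) -> str:
    """Return the native speaker for `language`; gender is 'male' or 'female'."""
    key = language.strip().title()
    if key not in SPEAKERS:
        raise ValueError(f"No speaker listed for {language!r}")
    male, female = SPEAKERS[key]
    if gender == "male":
        return male
    if gender == "female":
        return female
    raise ValueError("gender must be 'male' or 'female'")
```

Usage: `ShunyalabsTTSService(voice=pick_voice("Tamil", "female"), language="ta")`.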

Frame Output

Frame	Description
`TTSStartedFrame`	Emitted when synthesis begins.
`TTSAudioRawFrame`	Emitted for each audio chunk (PCM, 16kHz, mono).
`TTSStoppedFrame`	Emitted when synthesis completes.
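With 16-bit mono PCM at 16 kHz, one second of audio is 16000 samples × 2 bytes = 32,000 bytes, so a chunk's playback duration follows directly from its length. This is handy when logging time-to-first-audio or sizing playback buffers. A sketch; `pcm_duration_ms` is an illustrative name, and if you change the TTS `sample_rate` you should pass the same value here.

```python
def pcm_duration_ms(num_bytes: int, sample_rate: int = 16000,
                    channels: int = 1, sample_width: int = 2) -> float:
    """Playback duration of a raw PCM buffer in milliseconds."""
    bytes_per_second = sample_rate * channels * sample_width
    return 1000.0 * num_bytes / bytes_per_second


print(pcm_duration_ms(32000))  # 1000.0
print(pcm_duration_ms(3200))   # 100.0
```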

Examples

Default (Conversational) — recommended for voice agents:

tts = ShunyalabsTTSService(
    voice="Nisha",
    language="en",
)

Custom emotion + speed:

tts = ShunyalabsTTSService(
    voice="Nisha",
    style="<Enthusiastic>",
    language="en",
    speed=1.1,
    output_format="pcm",
)

Hindi news-style:

tts = ShunyalabsTTSService(
    voice="Rajesh",
    language="hi",
    style="<News>",
)

Full Pipeline Example

A complete voice agent using Shunyalabs STT and TTS with OpenAI LLM on the Daily WebRTC transport:

import asyncio, os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat_shunyalabs import ShunyalabsSTTService, ShunyalabsTTSService

async def run_voice_agent(room_url: str, token: str):
    transport = DailyTransport(
        room_url, token, "Shunyalabs Agent",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True, transcription_enabled=False),
    )

    stt = ShunyalabsSTTService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        language="auto",
        sample_rate=16000,
    )

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        model="gpt-4o",
    )

    messages = [{
        "role": "system",
        "content": (
            "You are a helpful voice assistant powered by Shunyalabs. "
            "Keep responses concise and natural for voice delivery."
        ),
    }]
    context = OpenAILLMContext(messages)
    context_aggregator = llm.create_context_aggregator(context)

    tts = ShunyalabsTTSService(
        api_key=os.environ["SHUNYALABS_API_KEY"],
        voice="Rajesh",
        language="hi",  # required
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    task = PipelineTask(
        pipeline,
        params=PipelineParams(allow_interruptions=True, enable_metrics=True),
    )

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(run_voice_agent(
        room_url=os.environ["DAILY_ROOM_URL"],
        token=os.environ["DAILY_TOKEN"],
    ))

Multilingual Examples

# Hindi conversational bot (default style)
tts = ShunyalabsTTSService(voice="Rajesh", language="hi")

# English news-style bot
tts = ShunyalabsTTSService(voice="Varun", language="en", style="<News>")

# Tamil narrative voice
tts = ShunyalabsTTSService(voice="Murugan", language="ta", style="<Narrative>")

Error Reference

All Shunyalabs SDK exceptions inherit from ShunyalabsError.

Exception	HTTP Code	Description
`AuthenticationError`	401	Invalid or missing API key.
`PermissionDeniedError`	403	API key lacks permission for the resource.
`NotFoundError`	404	Requested resource not found.
`ValidationError`	422	Missing required field (e.g. `language`).
`RateLimitError`	429	Rate limit exceeded. Implement exponential backoff.
`ServerError`	5xx	Server-side error. Retried automatically.
`TimeoutError`	—	Request exceeded timeout (default 60s).
`ConnectionError`	—	Network connectivity issue.
`TranscriptionError`	—	ASR-specific failure (e.g. unsupported audio format).
`SynthesisError`	—	TTS-specific failure (e.g. invalid voice parameter).

from shunyalabs.exceptions import AuthenticationError, RateLimitError, ShunyalabsError

async def synthesize_safely(client, text, config):
    # `client` and `config` are created elsewhere with the Shunyalabs SDK.
    try:
        return await client.tts.synthesize(text, config=config)
    except AuthenticationError:
        print("Invalid API key: check SHUNYALABS_API_KEY")
    except RateLimitError as e:
        print(f"Rate limited: retry after {e.retry_after}s")
    except ShunyalabsError as e:
        print(f"Unexpected error: {e}")
    return None
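The table recommends exponential backoff for `RateLimitError`. One way to do that is a generic retry wrapper that prefers the server's `retry_after` hint when the exception carries one and otherwise doubles the delay, with a little jitter. This is a sketch: `retry_async` is a hypothetical helper, and with the SDK you would pass `retry_on=(RateLimitError,)` from `shunyalabs.exceptions`.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Call `func`, retrying on `retry_on` errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            hint = getattr(exc, "retry_after", None)
            if hint is not None:
                delay = float(hint)  # honour the server's hint
            else:
                # base_delay x 1, 2, 4, ... capped at max_delay, plus up to 10% jitter
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay += random.uniform(0, delay * 0.1)
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")


# Usage with the SDK (assumes `client`, `text`, `config` as in the snippet above):
#     audio = await retry_async(
#         lambda: client.tts.synthesize(text, config=config),
#         retry_on=(RateLimitError,),
#     )
```

Passing a zero-argument callable rather than a coroutine matters: each attempt needs a fresh coroutine, since an awaited coroutine cannot be awaited again.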

Troubleshooting

Symptom	Resolution
HTTP 422 "language: Field required"	Add `language="en"` (or another ISO 639 code) to `ShunyalabsTTSService(...)`.
TTS audio sounds wrong / muddied (after upgrade)	Remove `speaker=...` from the constructor — it's no longer needed and was causing a double-prefix bug in 1.0.0.
`AuthenticationError` on startup	Verify `SHUNYALABS_API_KEY` is set and valid.
WebSocket connection refused	Ensure outbound WSS (port 443) is open to `asr.shunyalabs.ai` and `tts.shunyalabs.ai`.
No transcription output	Check `sample_rate` matches your transport input. Verify audio source is active.
TTS audio silent or missing	Ensure `output_format="pcm"` matches the transport output. Verify that `TTSStartedFrame` is received.
High latency on first TTS chunk	Deploy closer to the Shunyalabs gateway region (`asia-south1`).
`RateLimitError`	Implement exponential backoff. Check `e.retry_after`.
`ImportError: pipecat_shunyalabs`	Run `pip install pipecat-shunyalabs`. Confirm virtual environment is activated.

Changelog

1.0.1 (2026-04-11)

Breaking: language is now required by the gateway. Always pass an ISO 639 code.
Breaking: Removed speaker parameter (was a duplicate of voice).
Bug fix: _format_text no longer prepends the speaker name on top of the gateway's server-side prefix. Old 1.0.0 behaviour produced "Rajesh: Rajesh: <Neutral> ..." in the model prompt and degraded output quality.
style is now optional — defaults to <Conversational> server-side.

1.0.0

Initial public release.

License

MIT

Release history

Version	Release date
1.1.1	May 28, 2026
1.1.0 (this version)	May 28, 2026
1.0.4	Apr 17, 2026
1.0.3	Apr 16, 2026
1.0.2	Apr 16, 2026
1.0.1	Apr 13, 2026
1.0.0	Mar 16, 2026
0.1.0	Mar 8, 2026

Download files

Source distribution: pipecat_shunyalabs-1.1.0.tar.gz (24.0 kB), uploaded May 28, 2026
Built distribution: pipecat_shunyalabs-1.1.0-py3-none-any.whl (20.1 kB, Python 3), uploaded May 28, 2026

Both files were uploaded via twine/6.2.0 (CPython/3.10.12), without Trusted Publishing.

File hashes

pipecat_shunyalabs-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5054696ceb9ab0d31ef153b28513291083b8a175ed22b3dd9c6424020ce63c83`
MD5	`745f3bd956b726d9fc1af8649d03b5ea`
BLAKE2b-256	`3814a20fbf5f3c0ed638360b12cbb8f14e1a829e67be40f7cece1ace6f69c9b8`

pipecat_shunyalabs-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`077d2c922cd5940517f07b2a075c0e49ee1df1719b4ce5d1009db2723b773c99`
MD5	`270f84c75607d163d75b88e6d09fd790`
BLAKE2b-256	`9bf8d67d34f2c4875c3e707775d2163969316cc25fcfea7510b059d99c11660c`
