Shunyalabs ASR & TTS plugin for LiveKit Agents

livekit-plugins-shunyalabs

Shunyalabs STT and TTS plugin for LiveKit Agents.

Provides STT (speech-to-text) and TTS (text-to-speech) classes that integrate with LiveKit's agent framework, backed by the Shunyalabs Python SDK.

Installation

pip install livekit-plugins-shunyalabs

Authentication

Set your API key as an environment variable:

export SHUNYALABS_API_KEY="your-api-key"

Or pass it directly:

stt = shunyalabs.STT(api_key="your-api-key")
tts = shunyalabs.TTS(api_key="your-api-key")
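The fallback order described above (explicit argument first, then the environment variable) can be sketched as follows. `resolve_api_key` is a hypothetical helper written for illustration, not part of the plugin, and raising an error when no key is found is an assumption about reasonable behavior rather than documented plugin behavior.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer the explicit key, else SHUNYALABS_API_KEY."""
    key = api_key or os.environ.get("SHUNYALABS_API_KEY")
    if not key:
        # Assumption: failing early is clearer than a later authentication error.
        raise ValueError("No API key: pass api_key or set SHUNYALABS_API_KEY")
    return key
```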

Quick Start

from livekit.agents import AgentSession
from livekit.plugins import shunyalabs, silero

session = AgentSession(
    stt=shunyalabs.STT(language="en"),
    tts=shunyalabs.TTS(speaker="Rajesh", style="<Neutral>"),
    vad=silero.VAD.load(),
)

STT (Speech-to-Text)

shunyalabs.STT

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| language | str | "auto" | BCP-47 language code or "auto" for auto-detection. |
| api_url | str | https://asr.shunyalabs.ai | REST batch endpoint base URL. |
| ws_url | str | wss://asr.shunyalabs.ai/ws | WebSocket streaming endpoint URL. |

Capabilities

| Capability | Supported |
|---|---|
| Streaming (real-time) | Yes |
| Interim results | Yes |
| Offline/batch recognition | Yes |

Streaming STT

Real-time transcription over WebSocket. Audio frames from LiveKit are forwarded to the Shunyalabs ASR gateway; transcription events are pushed back as SpeechEvents.

from livekit.agents import AgentSession
from livekit.plugins import shunyalabs, silero

session = AgentSession(
    stt=shunyalabs.STT(language="en"),
    vad=silero.VAD.load(),
)

# AgentSession emits "user_input_transcribed" for both interim and final text
@session.on("user_input_transcribed")
def on_transcript(ev):
    if ev.is_final:
        print(f"User said: {ev.transcript}")

Event mapping:

| Shunyalabs Event | LiveKit SpeechEventType |
|---|---|
| PARTIAL | INTERIM_TRANSCRIPT |
| FINAL_SEGMENT | FINAL_TRANSCRIPT + END_OF_SPEECH |
| FINAL | FINAL_TRANSCRIPT + RECOGNITION_USAGE |
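The event mapping can be read as a simple lookup from one incoming event to one or more outgoing events. The sketch below uses plain strings in place of LiveKit's SpeechEventType members so that it runs without the plugin installed. `EVENT_MAP` and `map_event` are hypothetical names, and returning an empty list for unknown events is an assumption.

```python
from typing import Dict, List

# Plain strings stand in for LiveKit SpeechEventType members.
EVENT_MAP: Dict[str, List[str]] = {
    "PARTIAL": ["INTERIM_TRANSCRIPT"],
    "FINAL_SEGMENT": ["FINAL_TRANSCRIPT", "END_OF_SPEECH"],
    "FINAL": ["FINAL_TRANSCRIPT", "RECOGNITION_USAGE"],
}


def map_event(shunyalabs_event: str) -> List[str]:
    """Return the LiveKit event names emitted for one Shunyalabs event."""
    # Assumption: unknown event names map to no LiveKit events.
    return list(EVENT_MAP.get(shunyalabs_event, []))
```

Note that a single FINAL_SEGMENT or FINAL event fans out into two LiveKit events, so a consumer should not assume a one-to-one correspondence.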

Batch STT

Single-shot transcription of an audio buffer. Uses POST /v1/audio/transcriptions via the SDK's AsyncBatchASR.

from livekit.plugins import shunyalabs

stt = shunyalabs.STT(language="en")

# Inside an async function (for example, an agent entrypoint):
event = await stt.recognize(audio_buffer)
print(event.alternatives[0].text)

TTS (Text-to-Speech)

shunyalabs.TTS

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| api_url | str | https://tts.shunyalabs.ai | HTTP batch endpoint base URL. |
| ws_url | str | wss://tts.shunyalabs.ai/ws | WebSocket streaming endpoint URL. |
| model | str | "zero-indic" | TTS model name. |
| voice | str | "Rajesh" | Voice name for the API. |
| speaker | str | "Rajesh" | Speaker name prefix for text formatting. |
| style | str | "<Neutral>" | Emotion style tag. See Style Tags. |
| language | str | "en" | Language code for transliteration. |
| sample_rate | int | 16000 | Output audio sample rate in Hz. |
| output_format | str | "pcm" | Audio format ("pcm", "wav", "mp3", "ogg_opus", "flac"). |
| speed | float | 1.0 | Speaking speed multiplier (0.25–4.0). |
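The constraints on `output_format` and `speed` listed in the table can be checked before a TTS instance is constructed. The following is a minimal sketch; `validate_tts_options` is a hypothetical helper, and whether the plugin itself performs this validation on the client side is not stated here.

```python
# Values taken from the parameter table above.
ALLOWED_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_options(output_format: str = "pcm", speed: float = 1.0) -> None:
    """Hypothetical pre-flight check; raises ValueError on out-of-range options."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
```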

Style Tags

| Tag | Description |
|---|---|
| <Neutral> | Neutral tone |
| <Happy> | Happy/cheerful |
| <Sad> | Sad/melancholic |
| <Angry> | Angry/intense |
| <Fearful> | Fearful/anxious |
| <Surprised> | Surprised/excited |
| <Disgust> | Disgusted |
| <News> | News anchor style |
| <Conversational> | Casual conversational |
| <Narrative> | Storytelling/narration |
| <Enthusiastic> | Enthusiastic/energetic |

Text Formatting

The plugin automatically prepends the style tag, formatting text as "<Style> text" before sending it to the API. For example:

tts = shunyalabs.TTS(speaker="Rajesh", style="<Happy>")
# Input: "Welcome to our platform"
# Sent:  "<Happy> Welcome to our platform"
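The formatting rule amounts to prefixing the text with the style tag and a space. The sketch below reproduces the example above. `format_tts_text` is a hypothetical function, not the plugin's internal implementation. The example sends only the style tag and the text, so the sketch leaves the speaker out; how the plugin uses `speaker` in formatting is not shown in this section.

```python
def format_tts_text(text: str, style: str = "<Neutral>") -> str:
    """Hypothetical formatter: prefix the text with the style tag."""
    return f"{style} {text}"
```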

Streaming TTS

Token-by-token streaming. The plugin collects incoming text tokens and, when the stream is flushed, synthesizes the collected text over the WebSocket streaming endpoint.

from livekit.agents import AgentSession
from livekit.plugins import shunyalabs

session = AgentSession(
    tts=shunyalabs.TTS(
        speaker="Nisha",
        style="<Conversational>",
        model="zero-indic",
        voice="Nisha",
    ),
)
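The collect-then-flush behavior described for streaming TTS can be illustrated with a small buffer. This is a sketch of the pattern only. `TokenBuffer` is a hypothetical class, and the plugin's real stream also handles the WebSocket connection and audio output, which are omitted here.

```python
from typing import List


class TokenBuffer:
    """Hypothetical buffer: accumulate text tokens until flush() is called."""

    def __init__(self) -> None:
        self._tokens: List[str] = []

    def push(self, token: str) -> None:
        # Tokens are stored as-is; no audio is produced at this point.
        self._tokens.append(token)

    def flush(self) -> str:
        # Join everything collected so far and reset for the next segment.
        text = "".join(self._tokens)
        self._tokens.clear()
        return text
```

In this pattern, synthesis latency depends on how often the caller flushes: flushing at sentence boundaries trades a short delay for more natural prosody per request.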

Chunked (Batch) TTS

Synthesizes a complete text string to audio in a single request via the HTTP batch API.

from livekit.plugins import shunyalabs

tts = shunyalabs.TTS(speaker="Varun", voice="Varun")
stream = tts.synthesize("Hello, how can I help you today?")

Full Agent Example

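from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomInputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import shunyalabs, silero


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant.",
        )


async def entrypoint(ctx: JobContext) -> None:
    # Connect to the room before starting the session
    await ctx.connect()

    # No llm is configured here; add one for the agent to generate replies
    session = AgentSession(
        stt=shunyalabs.STT(language="auto"),
        tts=shunyalabs.TTS(
            model="zero-indic",
            voice="Rajesh",
            speaker="Rajesh",
            style="<Conversational>",
        ),
        vad=silero.VAD.load(),
    )
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
    )


if __name__ == "__main__":
    # Run the worker so LiveKit can dispatch jobs to entrypoint
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))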

Multilingual Example

# Hindi speaker
tts_hindi = shunyalabs.TTS(
    speaker="Rajesh",
    voice="Rajesh",
    language="hi",
    style="<Neutral>",
)

# English speaker
tts_english = shunyalabs.TTS(
    speaker="Varun",
    voice="Varun",
    language="en",
    style="<Conversational>",
)
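When an application switches between languages at runtime, the two configurations above can be kept in a lookup table and selected by language code. This is a sketch under assumptions: `TTS_PRESETS` and `tts_kwargs_for` are hypothetical names, and falling back to English for unlisted languages is an illustrative choice.

```python
from typing import Dict

# Keyword arguments copied from the Hindi and English examples above.
TTS_PRESETS: Dict[str, Dict[str, str]] = {
    "hi": {"speaker": "Rajesh", "voice": "Rajesh", "language": "hi", "style": "<Neutral>"},
    "en": {"speaker": "Varun", "voice": "Varun", "language": "en", "style": "<Conversational>"},
}


def tts_kwargs_for(language: str, default: str = "en") -> Dict[str, str]:
    """Return a copy of the preset for a language, or the default preset."""
    return dict(TTS_PRESETS.get(language, TTS_PRESETS[default]))
```

With the plugin installed, the result can be unpacked into the constructor, for example `shunyalabs.TTS(**tts_kwargs_for("hi"))`.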

License

MIT
