Shunyalabs ASR & TTS plugin for LiveKit Agents

livekit-plugins-shunyalabs

Shunyalabs STT and TTS plugin for LiveKit Agents.

Provides STT (speech-to-text) and TTS (text-to-speech) classes that integrate with LiveKit's agent framework, backed by the Shunyalabs Python SDK.

Installation

pip install livekit-plugins-shunyalabs

Authentication

Set your API key as an environment variable:

export SHUNYALABS_API_KEY="your-api-key"

Or pass it directly:

from livekit.plugins import shunyalabs

stt = shunyalabs.STT(api_key="your-api-key")
tts = shunyalabs.TTS(api_key="your-api-key")
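
The fallback order described above (explicit argument first, then the environment variable) can be pictured with a minimal sketch. This is not the plugin's code: `resolve_api_key` is a hypothetical helper, and raising `ValueError` when no key is found is an assumption.

```python
import os
from typing import Optional

ENV_VAR = "SHUNYALABS_API_KEY"


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise the environment variable.

    Hypothetical helper: the plugin's real lookup and error type may differ.
    """
    key = api_key or os.environ.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key: pass api_key=... or set {ENV_VAR}")
    return key
```

An explicit `api_key` takes precedence here, so a key passed in code overrides whatever is exported in the shell.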

Quick Start

from livekit.agents import AgentSession
from livekit.plugins import shunyalabs, silero

session = AgentSession(
    stt=shunyalabs.STT(language="en"),
    tts=shunyalabs.TTS(speaker="Rajesh", style="<Neutral>"),
    vad=silero.VAD.load(),
)

STT (Speech-to-Text)

shunyalabs.STT

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| language | str | "auto" | BCP-47 language code or "auto" for auto-detection. |
| api_url | str | https://asr.shunyalabs.ai | REST batch endpoint base URL. |
| ws_url | str | wss://asr.shunyalabs.ai/ws | WebSocket streaming endpoint URL. |

Capabilities

| Capability | Supported |
|---|---|
| Streaming (real-time) | Yes |
| Interim results | Yes |
| Offline/batch recognition | Yes |

Streaming STT

Real-time transcription over WebSocket. Audio frames from LiveKit are forwarded to the Shunyalabs ASR gateway; transcription events are pushed back as SpeechEvents.

from livekit.agents import AgentSession
from livekit.plugins import shunyalabs, silero

session = AgentSession(
    stt=shunyalabs.STT(language="en"),
    vad=silero.VAD.load(),
)

@session.on("user_input_transcribed")
def on_transcript(ev):
    # Fires for interim and final transcripts; print only the final ones.
    if ev.is_final:
        print(f"User said: {ev.transcript}")

Event mapping:

| Shunyalabs Event | LiveKit SpeechEventType |
|---|---|
| PARTIAL | INTERIM_TRANSCRIPT |
| FINAL_SEGMENT | FINAL_TRANSCRIPT + END_OF_SPEECH |
| FINAL | FINAL_TRANSCRIPT + RECOGNITION_USAGE |
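
The mapping above can be expressed as a small lookup table. The sketch below is illustrative only: `SpeechEventType` is a local stand-in for LiveKit's enum (its string values are assumed), `to_livekit_events` is a hypothetical helper, and mapping unknown event names to nothing is an assumption.

```python
from enum import Enum


class SpeechEventType(str, Enum):
    """Local stand-in for LiveKit's SpeechEventType (string values assumed)."""

    INTERIM_TRANSCRIPT = "interim_transcript"
    FINAL_TRANSCRIPT = "final_transcript"
    END_OF_SPEECH = "end_of_speech"
    RECOGNITION_USAGE = "recognition_usage"


# Shunyalabs event name -> LiveKit event types emitted, in the table's order.
EVENT_MAP = {
    "PARTIAL": (SpeechEventType.INTERIM_TRANSCRIPT,),
    "FINAL_SEGMENT": (
        SpeechEventType.FINAL_TRANSCRIPT,
        SpeechEventType.END_OF_SPEECH,
    ),
    "FINAL": (
        SpeechEventType.FINAL_TRANSCRIPT,
        SpeechEventType.RECOGNITION_USAGE,
    ),
}


def to_livekit_events(shunyalabs_event: str) -> tuple:
    """Return the LiveKit event types for one Shunyalabs event name.

    Unknown names map to an empty tuple (an assumption, not plugin behavior).
    """
    return EVENT_MAP.get(shunyalabs_event, ())
```

Note that a single Shunyalabs event can fan out into two LiveKit events, which is why the values are tuples rather than single members.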

Batch STT

Single-shot transcription of an audio buffer. Uses POST /v1/audio/transcriptions via the SDK's AsyncBatchASR.

from livekit.plugins import shunyalabs

stt = shunyalabs.STT(language="en")

# In an agent context:
event = await stt.recognize(audio_buffer)
print(event.alternatives[0].text)
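
To show how the `api_url` parameter relates to the batch route named above, here is a minimal sketch that joins the two. `transcription_endpoint` is a hypothetical helper; whether the SDK composes the URL this way is an assumption.

```python
DEFAULT_API_URL = "https://asr.shunyalabs.ai"
TRANSCRIPTIONS_PATH = "/v1/audio/transcriptions"


def transcription_endpoint(api_url: str = DEFAULT_API_URL) -> str:
    """Join the base URL and the batch path without doubling the slash."""
    return api_url.rstrip("/") + TRANSCRIPTIONS_PATH
```

Stripping the trailing slash keeps the result stable whether or not a custom `api_url` ends with `/`.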

TTS (Text-to-Speech)

shunyalabs.TTS

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | API key. Falls back to SHUNYALABS_API_KEY env var. |
| api_url | str | https://tts.shunyalabs.ai | HTTP batch endpoint base URL. |
| ws_url | str | wss://tts.shunyalabs.ai/ws | WebSocket streaming endpoint URL. |
| model | str | "zero-indic" | TTS model name. |
| voice | str | "Rajesh" | Voice name for the API. |
| speaker | str | "Rajesh" | Speaker name prefix for text formatting. |
| style | str | "`<Neutral>`" | Emotion style tag. See Style Tags. |
| language | str | "en" | Language code for transliteration. |
| sample_rate | int | 16000 | Output audio sample rate in Hz. |
| output_format | str | "pcm" | Audio format ("pcm", "wav", "mp3", "ogg_opus", "flac"). |
| speed | float | 1.0 | Speaking speed multiplier (0.25–4.0). |
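
The documented constraints (the five output formats and the 0.25–4.0 speed range) can be checked before constructing the TTS object. The sketch below is a hypothetical pre-validation layer; `TTSOptions` is not part of the plugin, and whether the plugin itself raises on invalid values is not stated in this document.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"pcm", "wav", "mp3", "ogg_opus", "flac"}


@dataclass(frozen=True)
class TTSOptions:
    """Hypothetical container mirroring a few documented TTS parameters."""

    sample_rate: int = 16000
    output_format: str = "pcm"
    speed: float = 1.0

    def __post_init__(self) -> None:
        # Reject values outside the ranges listed in the parameter table.
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output_format: {self.output_format!r}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed must be within 0.25-4.0, got {self.speed}")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")
```

Validating early like this surfaces a typo such as `output_format="opus"` at configuration time rather than as an API error mid-call.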

Style Tags

| Tag | Description |
|---|---|
| `<Neutral>` | Neutral tone |
| `<Happy>` | Happy/cheerful |
| `<Sad>` | Sad/melancholic |
| `<Angry>` | Angry/intense |
| `<Fearful>` | Fearful/anxious |
| `<Surprised>` | Surprised/excited |
| `<Disgust>` | Disgusted |
| `<News>` | News anchor style |
| `<Conversational>` | Casual conversational |
| `<Narrative>` | Storytelling/narration |
| `<Enthusiastic>` | Enthusiastic/energetic |

Text Formatting

Before sending text to the API, the plugin automatically prefixes it with the style tag, producing "<Style> text". For example:

tts = shunyalabs.TTS(speaker="Rajesh", style="<Happy>")
# Input: "Welcome to our platform"
# Sent:  "<Happy> Welcome to our platform"
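
The transformation in the example above can be reproduced with a few lines. This is a minimal sketch of the documented "<Style> text" form, not the plugin's internal code: `format_tts_text` is a hypothetical helper, and the tag-shape check is an added assumption.

```python
import re

# Matches a tag such as "<Happy>" or "<Conversational>".
STYLE_TAG = re.compile(r"^<[A-Za-z]+>$")


def format_tts_text(text: str, style: str = "<Neutral>") -> str:
    """Prefix text with a style tag, following the "<Style> text" form."""
    if not STYLE_TAG.match(style):
        raise ValueError(f"style must look like '<Name>', got {style!r}")
    return f"{style} {text}"
```

Calling `format_tts_text("Welcome to our platform", "<Happy>")` yields the same string as the "Sent" line in the example above.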

Streaming TTS

Token-by-token streaming. Text tokens are collected as they arrive, and synthesis runs over the WebSocket streaming endpoint when the stream is flushed.

from livekit.agents import AgentSession
from livekit.plugins import shunyalabs

session = AgentSession(
    tts=shunyalabs.TTS(
        speaker="Nisha",
        style="<Conversational>",
        model="zero-indic",
        voice="Nisha",
    ),
)
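
The collect-then-flush behavior described for streaming TTS can be sketched as a small buffer. `TokenBuffer` is a hypothetical class for illustration; the plugin's actual buffering and the point at which it contacts the WebSocket endpoint are not shown in this document.

```python
class TokenBuffer:
    """Collects text tokens and releases them as one string on flush."""

    def __init__(self) -> None:
        self._tokens = []

    def push(self, token: str) -> None:
        # Tokens are stored in arrival order; nothing is synthesized yet.
        self._tokens.append(token)

    def flush(self) -> str:
        # Join everything collected so far and reset the buffer.
        text = "".join(self._tokens)
        self._tokens.clear()
        return text


buffer = TokenBuffer()
for token in ["Hello", ", ", "world", "!"]:
    buffer.push(token)
sentence = buffer.flush()  # this is the text that would be sent for synthesis
```

Flushing on sentence or turn boundaries trades a little latency for more natural prosody, since the synthesizer sees a complete phrase instead of isolated tokens.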

Chunked (Batch) TTS

Synthesizes a complete text into audio in a single request through the HTTP batch API.

from livekit.plugins import shunyalabs

tts = shunyalabs.TTS(speaker="Varun", voice="Varun")
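# In an async context: iterate the returned stream to receive audio chunks.
stream = tts.synthesize("Hello, how can I help you today?")
async for chunk in stream:
    frame = chunk.frame  # audio frame for this chunk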

Full Agent Example

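from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    RoomInputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import shunyalabs, silero


class MyAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant.",
        )


async def entrypoint(ctx: JobContext):
    # Connect to the room before starting the session.
    await ctx.connect()

    # Pass an llm=... plugin as well if the agent should generate replies.
    session = AgentSession(
        stt=shunyalabs.STT(language="auto"),
        tts=shunyalabs.TTS(
            model="zero-indic",
            voice="Rajesh",
            speaker="Rajesh",
            style="<Conversational>",
        ),
        vad=silero.VAD.load(),
    )
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
    )


if __name__ == "__main__":
    # Run the worker so that entrypoint() is called for each job.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))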

Multilingual Example

from livekit.plugins import shunyalabs

# Hindi speaker
tts_hindi = shunyalabs.TTS(
    speaker="Rajesh",
    voice="Rajesh",
    language="hi",
    style="<Neutral>",
)

# English speaker
tts_english = shunyalabs.TTS(
    speaker="Varun",
    voice="Varun",
    language="en",
    style="<Conversational>",
)
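
When the language is only known at runtime, the two configurations above can be kept as presets and selected by language code. This is a sketch under stated assumptions: `TTS_PRESETS` and `tts_kwargs_for` are hypothetical names, and falling back to English for unknown languages is an illustrative choice.

```python
# Keyword arguments per language, copied from the two configurations above.
TTS_PRESETS = {
    "hi": {
        "speaker": "Rajesh",
        "voice": "Rajesh",
        "language": "hi",
        "style": "<Neutral>",
    },
    "en": {
        "speaker": "Varun",
        "voice": "Varun",
        "language": "en",
        "style": "<Conversational>",
    },
}


def tts_kwargs_for(language: str, default: str = "en") -> dict:
    """Return a copy of the preset for `language`, falling back to `default`."""
    return dict(TTS_PRESETS.get(language, TTS_PRESETS[default]))
```

The result can then be unpacked into the constructor, for example `shunyalabs.TTS(**tts_kwargs_for("hi"))`.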

License

MIT

