LiveKit Agents plugin for text-to-speech with Rumik AI (muga & mulberry).

livekit-plugins-rumik-ai

Rumik AI text-to-speech plugin for LiveKit Agents.

Streams low-latency 24 kHz speech from Rumik's Silk models over a reusable WebSocket session:

  • muga — emotion-controlled via a leading [tone] tag (e.g. [happy], [sad]) plus optional <laugh>/<chuckle>/<sigh> events. Tuned for Romanized Hinglish.
  • mulberry — steered by a natural-language voice description or a preset speaker, with optional pitch shift (f0_up_key).
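As a rough illustration of the muga input format, the sketch below separates a leading tone tag from the rest of a reply. The helper name `split_tone` and the regular expression are assumptions made for this example (a tag is taken to be one bracketed lowercase word at the start of the text); the plugin's own parsing may differ.

```python
import re
from typing import Optional, Tuple

# Assumption: a tone tag is a single bracketed lowercase word at the very
# start of the reply, e.g. "[happy]". This pattern is illustrative only.
_TONE_RE = re.compile(r"^\s*\[([a-z_]+)\]\s*")


def split_tone(reply: str) -> Tuple[Optional[str], str]:
    """Return (tone, remaining_text); tone is None when the reply is untagged."""
    match = _TONE_RE.match(reply)
    if match is None:
        return None, reply.strip()
    return match.group(1), reply[match.end():].strip()


tone, text = split_tone("[happy] Arre waah, kya baat hai! <laugh>")
print(tone, "|", text)
```

Inline events such as <laugh> are left untouched in the remaining text, since they belong to the utterance rather than to the tag.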

Install

pip install livekit-plugins-rumik-ai

Requires livekit-agents 1.5 or later. Set your API key:

export RUMIK_API_KEY="your-rumik-api-key"
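The configuration notes say that api_key defaults to RUMIK_API_KEY. A minimal sketch of that kind of lookup is shown below; `resolve_api_key` is a hypothetical helper, and raising an error when no key is found is an assumption, not documented plugin behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Prefer an explicit key, else fall back to the RUMIK_API_KEY variable."""
    key = api_key or env.get("RUMIK_API_KEY")
    if not key:
        # Assumption: failing early is the desired behavior when no key is set.
        raise ValueError("Pass api_key or set the RUMIK_API_KEY environment variable")
    return key
```

Running a check like this at startup surfaces a missing key before the first synthesis request rather than in the middle of a call.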

Quickstart

from livekit.agents import AgentSession
from livekit.plugins import rumik_ai

# muga: the LLM should start each reply with one tone tag, e.g. "[happy] ..."
session = AgentSession(
    stt=...,
    llm=...,
    tts=rumik_ai.TTS(model="muga"),
)

Mulberry, steered by a voice description (or a preset speaker):

tts = rumik_ai.TTS(
    model="mulberry",
    description="warm, gentle female friend",
    # speaker="speaker_1",      # optional preset, overrides description
    # f0_up_key=2.0,            # optional pitch shift, -12..12 semitones
)
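To get a feel for the f0_up_key range, the sketch below converts a semitone shift into a frequency ratio using the standard equal-temperament relation ratio = 2^(n/12). It assumes f0_up_key is measured in semitones, as the comment above suggests; how Rumik applies the shift on the server is not described here, and `semitones_to_ratio` is a hypothetical helper.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Convert a pitch shift in semitones to a fundamental-frequency ratio."""
    # The README gives -12..12 as the accepted range for f0_up_key.
    if not -12.0 <= semitones <= 12.0:
        raise ValueError("f0_up_key is expected to be within -12..12")
    return 2.0 ** (semitones / 12.0)


for shift in (-12.0, 2.0, 12.0):
    print(shift, round(semitones_to_ratio(shift), 4))
```

Under this reading, the ends of the range correspond to one octave down (half the frequency) and one octave up (double the frequency).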

Changing the voice at runtime

description, speaker, f0_up_key, and the sampling parameters are sent with every request, so you can change mulberry's voice between turns without reconnecting. The pooled WebSocket is reused; only a model change creates a new session:

tts.update_options(description="excited young man, fast and energetic")
# the next synthesis request uses the new voice
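The behavior described in this section can be pictured with a toy model. Everything in the sketch below (`VoiceOptions`, `PooledSession`, the payload keys) is hypothetical and only mimics the described rule that voice options travel with each request while a model change forces a new session; it is not the plugin's implementation.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class VoiceOptions:
    model: str = "mulberry"
    description: Optional[str] = None
    speaker: Optional[str] = None
    f0_up_key: Optional[float] = None


class PooledSession:
    """Toy stand-in for the pooled WebSocket; counts how often a session is created."""

    def __init__(self, opts: VoiceOptions) -> None:
        self.opts = opts
        self.sessions_minted = 1

    def update_options(self, **changes: Any) -> None:
        new_opts = replace(self.opts, **changes)
        # Only a model change requires a fresh session in this model.
        if new_opts.model != self.opts.model:
            self.sessions_minted += 1
        self.opts = new_opts

    def request_payload(self, text: str) -> Dict[str, Any]:
        # Voice options are attached to every request, so updates apply next turn.
        return {
            "text": text,
            "description": self.opts.description,
            "speaker": self.opts.speaker,
            "f0_up_key": self.opts.f0_up_key,
        }


session = PooledSession(VoiceOptions(description="warm, gentle female friend"))
session.update_options(description="excited young man, fast and energetic")
print(session.sessions_minted, session.request_payload("hello")["description"])
```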

Latency vs. smoothness

The default is model-aware:

  • muga buffers the full LLM reply and synthesizes it in one request, so its leading [tone] tag conditions the whole utterance and there are no time-to-first-byte (TTFB) gaps between per-sentence requests.
  • mulberry streams sentence-by-sentence for lower time-to-first-word, since it has no tone tag to protect.

Override either with full_response_aggregation:

rumik_ai.TTS(model="muga", full_response_aggregation=False, tone="neutral")  # muga, lower latency
rumik_ai.TTS(model="mulberry", full_response_aggregation=True)               # mulberry, smoother

When you turn aggregation off for muga, set a fallback tone= so every sentence keeps a tone tag.
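The defaults and the fallback tone can be sketched as plain functions. The names `resolve_aggregation` and `plan_requests`, and the naive punctuation-based sentence splitter, are assumptions for illustration; the plugin relies on its own tokenizer and request logic.

```python
import re
from typing import List, Optional

_TONE_RE = re.compile(r"^\[[a-z_]+\]")        # assumed tag shape, e.g. "[happy]"
_SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")   # naive splitter for this sketch


def resolve_aggregation(model: str, override: Optional[bool] = None) -> bool:
    """Model-aware default: True for muga, False for mulberry, unless overridden."""
    if override is not None:
        return override
    return model == "muga"


def plan_requests(
    model: str,
    reply: str,
    override: Optional[bool] = None,
    tone: str = "neutral",
) -> List[str]:
    """Return the text of each synthesis request that would be sent for one reply."""
    if resolve_aggregation(model, override):
        return [reply.strip()]
    sentences = [s for s in _SENTENCE_RE.split(reply.strip()) if s]
    if model != "muga":
        return sentences
    # With aggregation off, untagged muga sentences get the fallback tone.
    return [s if _TONE_RE.match(s) else f"[{tone}] {s}" for s in sentences]


print(plan_requests("muga", "[happy] Hi there. How are you?", override=False))
```

In this sketch only the first sentence carries the tag written by the LLM, which is why later sentences need the fallback once aggregation is turned off.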

Barge-in & cancel

The plugin is built for live "call with AI" use. When the caller talks over the agent, LiveKit interrupts the TTS and the plugin sends an explicit cancel to Rumik, so the in-flight generation stops immediately and billing is finalized cleanly. The pooled WebSocket is kept warm across the interruption, so the next utterance doesn't pay for a reconnect.
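The cancel-then-reuse pattern can be illustrated with plain asyncio. In the sketch below, `FakeRumikSocket` and the "synthesize" and "cancel" message shapes are invented for the example and do not reflect Rumik's actual wire protocol; the point is only that cancelling the in-flight task sends a cancel message and leaves the connection open for the next utterance.

```python
import asyncio
from typing import Dict, List


class FakeRumikSocket:
    """Hypothetical stand-in for the pooled WebSocket; records what was sent."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, str]] = []
        self.open = True

    async def send(self, message: Dict[str, str]) -> None:
        self.sent.append(message)


async def synthesize(sock: FakeRumikSocket, text: str) -> None:
    await sock.send({"type": "synthesize", "text": text})
    try:
        await asyncio.sleep(10)  # pretend audio frames are streaming back
    except asyncio.CancelledError:
        # Barge-in: tell the server to stop, but leave the socket open.
        await sock.send({"type": "cancel"})
        raise


async def demo() -> FakeRumikSocket:
    sock = FakeRumikSocket()
    task = asyncio.create_task(synthesize(sock, "a long reply"))
    await asyncio.sleep(0)  # let the request start
    task.cancel()           # the caller talks over the agent
    try:
        await task
    except asyncio.CancelledError:
        pass
    # The same socket serves the next utterance without reconnecting.
    await sock.send({"type": "synthesize", "text": "next utterance"})
    return sock


if __name__ == "__main__":
    print([m["type"] for m in asyncio.run(demo()).sent])
```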

Configuration

| Argument | Models | Notes |
| --- | --- | --- |
| model | both | "muga" (default) or "mulberry" |
| tone | muga | fallback tone when input is untagged |
| description | mulberry | natural-language voice description |
| speaker | mulberry | speaker_1..speaker_4 |
| f0_up_key | mulberry | pitch shift, -12..12 |
| temperature, top_p, top_k, repetition_penalty, max_new_tokens | both | omitted unless set (Rumik defaults apply) |
| full_response_aggregation | both | buffer the full reply (True) vs. stream per sentence (False). Default: True for muga, False for mulberry |
| api_key | | defaults to RUMIK_API_KEY |
| base_url | | defaults to https://silk-api.rumik.ai |
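The "omitted unless set" rule for the sampling parameters amounts to dropping unset values before a request is built. The helper below is a hypothetical sketch of that idea; the parameter names come from the table, while the function itself and the dictionary shape are assumptions.

```python
from typing import Any, Dict, Optional


def build_sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    repetition_penalty: Optional[float] = None,
    max_new_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Keep only the parameters the caller set, so server defaults apply to the rest."""
    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_new_tokens,
    }
    return {name: value for name, value in candidates.items() if value is not None}


print(build_sampling_params(temperature=0.7, top_k=50))
```

Filtering on None rather than on truthiness keeps legitimate zero values, such as a temperature of 0.0, in the request.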

Examples

See examples/ for a full voice agent (rumik_ai_agent.py) and a record-to-WAV demo (rumik_ai_tts.py).

License

Apache-2.0
