
LiveKit Agents plugin for text-to-speech with Rumik AI (muga & mulberry).


livekit-plugins-rumik-ai

Rumik AI text-to-speech plugin for LiveKit Agents.

Streams low-latency 24 kHz speech from Rumik's Silk models over a reusable WebSocket session:

  • muga — emotion-controlled via a leading [tone] tag (e.g. [happy], [sad]) plus optional <laugh>/<chuckle>/<sigh> events. Tuned for Romanized Hinglish.
  • mulberry — steered by a natural-language voice description or a preset speaker, with optional pitch shift (f0_up_key).
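The muga convention above (one leading [tone] tag, then optional inline events) can be checked before text reaches the TTS. The sketch below is a minimal illustration under that assumption; parse_muga_text is a hypothetical helper, not part of livekit-plugins-rumik-ai, and the tag pattern only accepts the three event names listed above.

```python
from __future__ import annotations

import re

# Hypothetical helper, not part of livekit-plugins-rumik-ai: it only illustrates the
# muga text convention of one leading [tone] tag plus optional inline events.
TONE_TAG = re.compile(r"^\s*\[([A-Za-z_]+)\]\s*")
EVENT_TAG = re.compile(r"<(laugh|chuckle|sigh)>")


def parse_muga_text(text: str) -> tuple[str | None, list[str], str]:
    """Split a muga-style string into (tone, events, body)."""
    match = TONE_TAG.match(text)
    tone = match.group(1) if match else None  # None means no leading tone tag
    body = text[match.end():] if match else text
    events = EVENT_TAG.findall(body)  # inline non-verbal events, in order
    return tone, events, body


tone, events, body = parse_muga_text("[happy] Arre yaar, kya baat hai! <laugh> Chalo, kal milte hain.")
print(tone, events)  # happy ['laugh']
```

A reply that parses with tone None would rely on the fallback tone= setting described under Configuration.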

Install

pip install livekit-plugins-rumik-ai

This package depends on livekit-agents 1.5 or later. Set your Rumik API key:

export RUMIK_API_KEY="your-rumik-api-key"

Quickstart

from livekit.agents import AgentSession
from livekit.plugins import rumik_ai

# muga: the LLM should start each reply with one tone tag, e.g. "[happy] ..."
session = AgentSession(
    stt=...,
    llm=...,
    tts=rumik_ai.TTS(model="muga"),
)

Mulberry, steered by a voice description (or a preset speaker):

tts = rumik_ai.TTS(
    model="mulberry",
    description="warm, gentle female friend",
    # speaker="speaker_1",      # optional preset, overrides description
    # f0_up_key=2.0,            # optional pitch shift, -12..12 semitones
)
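The mulberry arguments above come with documented limits: presets speaker_1..speaker_4, f0_up_key within -12..12 semitones, and a preset speaker overriding the description. The following is a hypothetical client-side check that mirrors those limits; resolve_mulberry_voice is an illustrative name, not the plugin's own validation, and dropping the description when a speaker is set is an assumption about how the override could be resolved.

```python
from __future__ import annotations

# Hypothetical client-side check, not the plugin's own validation. It mirrors the
# documented mulberry constraints: presets speaker_1..speaker_4, f0_up_key within
# -12..12 semitones, and a preset speaker taking precedence over a description.
VALID_SPEAKERS = {f"speaker_{i}" for i in range(1, 5)}


def resolve_mulberry_voice(
    description: str | None = None,
    speaker: str | None = None,
    f0_up_key: float | None = None,
) -> dict[str, object]:
    if speaker is not None and speaker not in VALID_SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}; expected speaker_1..speaker_4")
    if f0_up_key is not None and not -12 <= f0_up_key <= 12:
        raise ValueError("f0_up_key must be within -12..12 semitones")
    # Assumption: when both are given, only the preset speaker is kept.
    voice: dict[str, object] = {"speaker": speaker} if speaker else {"description": description}
    if f0_up_key is not None:
        voice["f0_up_key"] = f0_up_key
    return voice


print(resolve_mulberry_voice(description="warm, gentle female friend", f0_up_key=2.0))
# {'description': 'warm, gentle female friend', 'f0_up_key': 2.0}
```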

Changing the voice at runtime

Because description, speaker, f0_up_key, and the sampling parameters are sent with every request, you can change mulberry's voice between turns without reconnecting. The pooled WebSocket session is reused; only a model change re-mints the session:

tts.update_options(description="excited young man, fast and energetic")
# the next synthesis request uses the new voice
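The reuse rule above (per-request fields change freely, a model change needs a new session) can be modeled in a few lines. This is a hypothetical sketch of the documented behavior, not the plugin's internals: VoiceOptions and PooledSession are illustrative names, and counting "sessions minted" stands in for actually opening a WebSocket.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# Hypothetical model of the behavior described above, not the plugin's internals:
# voice and sampling fields ride along with each request, so changing them keeps
# the pooled WebSocket session; only a model change mints a new one.


@dataclass(frozen=True)
class VoiceOptions:
    model: str = "mulberry"
    description: str | None = None
    speaker: str | None = None
    f0_up_key: float | None = None


class PooledSession:
    def __init__(self, opts: VoiceOptions) -> None:
        self.opts = opts
        self.sessions_minted = 1  # the initial connection

    def update_options(self, **changes: object) -> None:
        new_opts = replace(self.opts, **changes)  # unknown field names raise TypeError
        if new_opts.model != self.opts.model:
            self.sessions_minted += 1  # model change: re-mint the session
        self.opts = new_opts  # everything else is sent with the next request


pool = PooledSession(VoiceOptions(description="warm, gentle female friend"))
pool.update_options(description="excited young man, fast and energetic")
print(pool.sessions_minted)  # 1: same session, new voice
pool.update_options(model="muga")
print(pool.sessions_minted)  # 2: model change re-mints
```

Keeping the options immutable and swapping them wholesale makes it easy to compare old and new values in one place, which is where the reconnect decision lives.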

Latency vs. smoothness

The default depends on the model:

  • muga buffers the full LLM reply and synthesizes it in one request, so its leading [tone] tag conditions the whole utterance and there are no time-to-first-byte (TTFB) gaps between per-sentence requests.
  • mulberry streams sentence by sentence for lower time-to-first-word, since it has no tone tag to protect.
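The two aggregation modes can be sketched as a function that turns a stream of LLM text chunks into synthesis request texts. This is a hypothetical illustration, not the plugin's implementation: request_texts and default_full_response_aggregation are illustrative names, and the naive regex sentence splitter stands in for whatever tokenizer the plugin actually uses.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator

# Hypothetical sketch, not the plugin's implementation: the two aggregation modes
# applied to a stream of LLM text chunks. The naive regex sentence splitter stands
# in for whatever tokenizer the plugin actually uses.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def default_full_response_aggregation(model: str) -> bool:
    return model == "muga"  # muga buffers, mulberry streams


def request_texts(chunks: Iterable[str], full_response_aggregation: bool) -> Iterator[str]:
    """Yield the text of each synthesis request."""
    if full_response_aggregation:
        text = "".join(chunks).strip()
        if text:
            yield text  # one request for the whole reply
        return
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        *finished, buffer = SENTENCE_BREAK.split(buffer)
        for sentence in finished:  # complete sentences go out immediately
            if sentence.strip():
                yield sentence.strip()
    if buffer.strip():
        yield buffer.strip()  # flush the trailing partial sentence


chunks = ["Hello there! ", "How are", " you? Bye."]
print(list(request_texts(chunks, default_full_response_aggregation("muga"))))
# ['Hello there! How are you? Bye.']
print(list(request_texts(chunks, default_full_response_aggregation("mulberry"))))
# ['Hello there!', 'How are you?', 'Bye.']
```

In per-sentence mode a sentence is only released once the whitespace after its final punctuation arrives (or the stream ends), which is the price of not cutting a sentence in half.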

Override either with full_response_aggregation:

rumik_ai.TTS(model="muga", full_response_aggregation=False, tone="neutral")  # muga, lower latency
rumik_ai.TTS(model="mulberry", full_response_aggregation=True)               # mulberry, smoother

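When you turn aggregation off for muga, each sentence is sent as its own request and only the first one carries the LLM's leading tone tag, so set a fallback tone= to tag the sentences that follow.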
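The fallback rule can be pictured as a pass over the per-sentence requests that tags any sentence lacking a leading tone. This is a hypothetical sketch of the documented behavior, not the plugin's code; apply_fallback_tone is an illustrative name.

```python
from __future__ import annotations

import re

# Hypothetical sketch, not the plugin's code: with aggregation off, each sentence is
# its own request, and any sentence lacking a leading [tone] tag gets the fallback.
LEADING_TONE = re.compile(r"^\s*\[[A-Za-z_]+\]")


def apply_fallback_tone(sentences: list[str], fallback_tone: str) -> list[str]:
    return [s if LEADING_TONE.match(s) else f"[{fallback_tone}] {s}" for s in sentences]


print(apply_fallback_tone(["[happy] Arre wah!", "Kal milte hain."], "neutral"))
# ['[happy] Arre wah!', '[neutral] Kal milte hain.']
```

An alternative design would carry the previous sentence's tone forward instead of using a fixed fallback; the tone= argument as documented describes a fixed fallback.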

Configuration

| Argument | Models | Notes |
| --- | --- | --- |
| model | both | "muga" (default) or "mulberry" |
| tone | muga | fallback tone applied when input has no leading tone tag |
| description | mulberry | natural-language voice description |
| speaker | mulberry | preset speaker, speaker_1..speaker_4; overrides description |
| f0_up_key | mulberry | pitch shift in semitones, -12..12 |
| temperature, top_p, top_k, repetition_penalty, max_new_tokens | both | omitted unless set, so Rumik's defaults apply |
| full_response_aggregation | both | True buffers the full reply; False streams per sentence. Default: True for muga, False for mulberry |
| api_key | both | defaults to the RUMIK_API_KEY environment variable |
| base_url | both | defaults to https://silk-api.rumik.ai |
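Two rows in the table describe resolution rules: api_key falls back to RUMIK_API_KEY, and sampling parameters are left out of the request unless set. The sketch below illustrates both under stated assumptions; resolve_api_key and build_request are hypothetical names, and the payload field names ("model", "text", and so on) are assumptions, not Rumik's documented wire format.

```python
from __future__ import annotations

import os
from typing import Any

# Hypothetical request builder, not the plugin's wire format: the field names
# ("model", "text", ...) are assumptions. It shows the documented behavior that
# optional parameters are omitted unless set, so Rumik's server defaults apply.
OPTIONAL_FIELDS = {
    "tone", "description", "speaker", "f0_up_key",
    "temperature", "top_p", "top_k", "repetition_penalty", "max_new_tokens",
}


def resolve_api_key(api_key: str | None = None) -> str:
    key = api_key or os.environ.get("RUMIK_API_KEY")  # explicit argument wins
    if not key:
        raise ValueError("pass api_key or set RUMIK_API_KEY")
    return key


def build_request(text: str, model: str = "muga", **options: Any) -> dict[str, Any]:
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    payload: dict[str, Any] = {"model": model, "text": text}
    payload.update({k: v for k, v in options.items() if v is not None})  # drop unset
    return payload


print(build_request("[happy] Namaste!", temperature=0.7, top_k=None))
# {'model': 'muga', 'text': '[happy] Namaste!', 'temperature': 0.7}
```

Omitting unset keys, rather than sending client-side defaults, keeps server-side tuning authoritative.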

Examples

See examples/ for a full voice agent (rumik_ai_agent.py) and a record-to-WAV demo (rumik_ai_tts.py).

License

Apache-2.0

Download files

Source Distribution

livekit_plugins_rumik_ai-0.1.1.tar.gz (14.8 kB)

Built Distribution

livekit_plugins_rumik_ai-0.1.1-py3-none-any.whl (17.4 kB)

File hashes

Hashes for livekit_plugins_rumik_ai-0.1.1.tar.gz:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 020bd5a7092272c1c22cb6d85e2391f2ca06db99ded3133b811bf6e0617f7f4a |
| MD5 | 1b74cc740154b3d21652ddb3820e1e95 |
| BLAKE2b-256 | bdaa94c581b146ce1e198c219c5ffc2b18eb5ef4d84b1966d226ab3450bc44c2 |

Hashes for livekit_plugins_rumik_ai-0.1.1-py3-none-any.whl:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 2946132489ad04b608ef99a9d755bd04d6bc797da02440dbc43fdf02d42a8651 |
| MD5 | 684a555fcecf71dd9b3805e4ddaefcf8 |
| BLAKE2b-256 | 2f2a96aa397d91c83fd4f0493a7606bfd0553d8898d8242c7bfacd9e5884789f |
