
Agent Framework plugin for voice synthesis and speech-to-text with Inworld's API.


Inworld plugin for LiveKit Agents

Support for voice synthesis and speech-to-text with Inworld TTS and Inworld STT.

See Inworld TTS and Inworld STT for more information.

Installation

pip install livekit-plugins-inworld

Authentication

Set INWORLD_API_KEY in your .env file (get one here).
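A missing key usually only surfaces once the first request fails, so it can help to check for it at startup. The snippet below is a minimal sketch of such a check. `missing_env_vars` is a hypothetical helper, not part of this plugin, and the sample dictionary stands in for `os.environ`.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; it is not part of livekit-plugins-inworld.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# A plain dict stands in for os.environ so the result does not depend on your machine.
sample_env = {"INWORLD_API_KEY": "", "LIVEKIT_URL": "wss://example.invalid"}
print(missing_env_vars(["INWORLD_API_KEY", "LIVEKIT_URL"], sample_env))
```

In an agent script, call it without the second argument after `load_dotenv()` has run, so that values from the .env file are already in the process environment.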

Usage

TTS

Use Inworld TTS within an AgentSession or as a standalone speech generator.

from livekit.plugins import inworld

tts = inworld.TTS()

Or with options:

from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",                 # voice ID (default or custom cloned voice)
    model="inworld-tts-1",         # or "inworld-tts-1-max"
    encoding="OGG_OPUS",           # LINEAR16, MP3, OGG_OPUS, ALAW, MULAW, FLAC
    sample_rate=48000,             # 8000-48000 Hz
    bit_rate=64000,                # bits per second (for compressed formats)
    speaking_rate=1.0,             # 0.5-1.5
    temperature=1.1,               # 0-2
    timestamp_type="WORD",         # WORD, CHARACTER, or TIMESTAMP_TYPE_UNSPECIFIED
    text_normalization="OFF",      # ON, OFF, or APPLY_TEXT_NORMALIZATION_UNSPECIFIED
)
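The comments in the options example document a value range for several parameters. If those values come from user configuration, a pre-flight check can catch mistakes before a request is made. This is a sketch only: `check_tts_options` is a hypothetical helper, the ranges are copied from the comments above, and the plugin or the Inworld API may apply its own, possibly different, validation.

```python
# Encodings listed in the options example above.
ALLOWED_ENCODINGS = {"LINEAR16", "MP3", "OGG_OPUS", "ALAW", "MULAW", "FLAC"}


def check_tts_options(*, encoding, sample_rate, speaking_rate, temperature):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; not part of livekit-plugins-inworld.
    """
    problems = []
    if encoding not in ALLOWED_ENCODINGS:
        problems.append(f"encoding {encoding!r} is not one of {sorted(ALLOWED_ENCODINGS)}")
    if not 8000 <= sample_rate <= 48000:
        problems.append(f"sample_rate {sample_rate} is outside 8000-48000 Hz")
    if not 0.5 <= speaking_rate <= 1.5:
        problems.append(f"speaking_rate {speaking_rate} is outside 0.5-1.5")
    if not 0 <= temperature <= 2:
        problems.append(f"temperature {temperature} is outside 0-2")
    return problems


# The values from the options example pass every check.
print(check_tts_options(encoding="OGG_OPUS", sample_rate=48000, speaking_rate=1.0, temperature=1.1))
```

Returning a list instead of raising on the first failure lets a caller report every bad value at once.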

TTS Streaming

Inworld TTS supports WebSocket streaming for lower-latency, real-time synthesis. Use the stream() method to push text as it is generated:

from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",
    model="inworld-tts-1",
    buffer_char_threshold=100,     # chars before triggering synthesis (default: 100)
    max_buffer_delay_ms=3000,      # max buffer time in ms (default: 3000)
)

# Create a stream for real-time synthesis
stream = tts.stream()

# Push text incrementally
stream.push_text("Hello, ")
stream.push_text("how are you today?")
stream.flush()  # Flush any remaining buffered text
stream.end_input()  # Signal end of input

# Consume audio as it's generated (run this inside an async function)
async for audio in stream:
    # Process audio frames
    pass
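To build intuition for `buffer_char_threshold` and `max_buffer_delay_ms`, the toy model below buffers pushed text and releases it when either limit is reached or when it is flushed explicitly. It is an assumption-laden illustration of the idea, not the plugin's or Inworld's implementation. `TextBuffer` and its methods are hypothetical names, and timestamps are passed in by hand so the behavior is deterministic.

```python
class TextBuffer:
    """Toy model of threshold- and delay-based text buffering (illustrative only)."""

    def __init__(self, char_threshold=100, max_delay_ms=3000):
        self.char_threshold = char_threshold
        self.max_delay_ms = max_delay_ms
        self._parts = []
        self._size = 0
        self._first_push_ms = None  # time of the oldest text still buffered

    def push(self, text, now_ms):
        """Buffer text; return a chunk if the character threshold is reached, else None."""
        if text:
            if self._first_push_ms is None:
                self._first_push_ms = now_ms
            self._parts.append(text)
            self._size += len(text)
        if self._size >= self.char_threshold:
            return self.flush()
        return self.poll(now_ms)

    def poll(self, now_ms):
        """Return the buffered chunk if the oldest text has waited max_delay_ms, else None."""
        waited = self._first_push_ms is not None and now_ms - self._first_push_ms >= self.max_delay_ms
        return self.flush() if waited else None

    def flush(self):
        """Return everything buffered so far and reset; None if the buffer is empty."""
        if not self._parts:
            return None
        chunk = "".join(self._parts)
        self._parts, self._size, self._first_push_ms = [], 0, None
        return chunk


buf = TextBuffer(char_threshold=20, max_delay_ms=3000)
print(buf.push("Hello, ", now_ms=0))              # below the threshold: nothing released
print(buf.push("how are you today?", now_ms=50))  # 25 buffered chars >= 20: chunk released
print(buf.push("Bye", now_ms=100))                # short text stays buffered
print(buf.poll(now_ms=3100))                      # 3000 ms elapsed: released by the delay limit
```

In this model a lower threshold releases text sooner but in smaller pieces, while the delay limit bounds how long short fragments can wait.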

STT

Use Inworld STT for streaming speech-to-text. Multiple models are supported.

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(),
    # ... llm, tts, etc.
)

With a specific model and voice profile detection:

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(
        model="inworld/inworld-stt-1",
        enable_voice_profile=True,
    ),
    # ... llm, tts, etc.
)

Example

A full voice agent using Inworld for both STT and TTS:

"""Inworld STT + TTS voice agent example.

Demonstrates using Inworld for both speech-to-text and text-to-speech
in a LiveKit voice agent. Save this as ``inworld_agent.py`` and run:

    uv run inworld_agent.py console   # local console mode
    uv run inworld_agent.py dev       # LiveKit Cloud (requires LIVEKIT_URL,
                                      # LIVEKIT_API_KEY, LIVEKIT_API_SECRET)

Then connect via https://agents-playground.livekit.io
"""

import logging

from dotenv import load_dotenv

from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    JobProcess,
    cli,
    metrics,
    room_io,
)
from livekit.plugins import inworld, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("inworld-agent")

load_dotenv()


class InworldAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "Your name is Nova. You interact with users via voice. "
                "Keep your responses concise and to the point. "
                "Do not use emojis, asterisks, markdown, or other special characters. "
                "You are helpful, curious, and friendly."
            ),
        )

    async def on_enter(self):
        self.session.generate_reply()


server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inworld.STT(model="inworld/inworld-stt-1"),
        llm="openai/gpt-4.1-mini",
        tts=inworld.TTS(voice="Clive"),
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
    )

    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics(ev):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        logger.info(f"Usage: {usage_collector.get_summary()}")

    ctx.add_shutdown_callback(log_usage)

    await session.start(
        agent=InworldAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(),
    )


if __name__ == "__main__":
    cli.run_app(server)

Combined TTS + STT

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    tts=inworld.TTS(voice="Hades"),
    stt=inworld.STT(),
    # ... llm, etc.
)
