Agent Framework plugin for voice synthesis and speech-to-text with Inworld's API.

Inworld plugin for LiveKit Agents

Support for voice synthesis and speech-to-text with Inworld TTS and Inworld STT.

See Inworld TTS and Inworld STT for more information.

Installation

pip install livekit-plugins-inworld

Authentication

Set INWORLD_API_KEY in your .env file. API keys are issued by Inworld.
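
It can help to fail fast with a readable message when the key is missing. The sketch below uses only the standard library. `require_api_key` is a hypothetical helper, not part of the plugin, and it assumes the .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from collections.abc import Mapping


def require_api_key(
    env: Mapping[str, str] = os.environ,
    name: str = "INWORLD_API_KEY",
) -> str:
    """Return the API key from `env`, or raise with a readable message.

    Hypothetical helper for illustration; not part of livekit-plugins-inworld.
    """
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

Calling `require_api_key()` once at startup, before constructing `inworld.TTS()` or `inworld.STT()`, keeps configuration errors separate from network errors.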

Usage

TTS

Use Inworld TTS within an AgentSession or as a standalone speech generator.

from livekit.plugins import inworld

tts = inworld.TTS()

Or with options:

from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",                 # voice ID (default or custom cloned voice)
    model="inworld-tts-1",         # or "inworld-tts-1-max"
    encoding="OGG_OPUS",           # LINEAR16, MP3, OGG_OPUS, ALAW, MULAW, FLAC
    sample_rate=48000,             # 8000-48000 Hz
    bit_rate=64000,                # bits per second (for compressed formats)
    speaking_rate=1.0,             # 0.5-1.5
    temperature=1.1,               # 0-2
    timestamp_type="WORD",         # WORD, CHARACTER, or TIMESTAMP_TYPE_UNSPECIFIED
    text_normalization="OFF",      # ON, OFF, or APPLY_TEXT_NORMALIZATION_UNSPECIFIED
)
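
The ranges in the comments above can be checked before the TTS object is built. This is a minimal sketch, assuming the documented bounds are inclusive. `check_tts_options` is a hypothetical helper, and the plugin or the Inworld API may validate these values differently.

```python
# Names and ranges are copied from the comments in the options example.
ENCODINGS = {"LINEAR16", "MP3", "OGG_OPUS", "ALAW", "MULAW", "FLAC"}


def check_tts_options(
    encoding: str = "OGG_OPUS",
    sample_rate: int = 48000,
    speaking_rate: float = 1.0,
    temperature: float = 1.1,
) -> list[str]:
    """Return a list of problems; an empty list means every value is in range.

    Hypothetical helper for illustration; bounds are assumed to be inclusive.
    """
    problems = []
    if encoding not in ENCODINGS:
        problems.append(f"encoding {encoding!r} is not one of {sorted(ENCODINGS)}")
    if not 8000 <= sample_rate <= 48000:
        problems.append(f"sample_rate {sample_rate} is outside 8000-48000 Hz")
    if not 0.5 <= speaking_rate <= 1.5:
        problems.append(f"speaking_rate {speaking_rate} is outside 0.5-1.5")
    if not 0 <= temperature <= 2:
        problems.append(f"temperature {temperature} is outside 0-2")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every out-of-range option at once.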

TTS Streaming

Inworld TTS supports WebSocket streaming for lower-latency, real-time synthesis. Use the stream() method to push text as it is generated:

from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",
    model="inworld-tts-1",
    buffer_char_threshold=100,     # chars before triggering synthesis (default: 100)
    max_buffer_delay_ms=3000,      # max buffer time in ms (default: 3000)
)

async def synthesize() -> None:
    # Create a stream for real-time synthesis
    stream = tts.stream()

    # Push text incrementally
    stream.push_text("Hello, ")
    stream.push_text("how are you today?")
    stream.flush()  # Flush any remaining buffered text
    stream.end_input()  # Signal end of input

    # Consume audio as it's generated ("async for" must run inside a coroutine)
    async for audio in stream:
        # Process audio frames
        pass
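
The two buffering options can be read as "send a chunk once enough characters have accumulated, or once the oldest buffered text has waited long enough". The toy model below illustrates that reading only. `TextBuffer` is hypothetical, is not the plugin's implementation, and checks the delay only when new text arrives, whereas a real implementation would presumably use a timer.

```python
from typing import Optional


class TextBuffer:
    """Toy model of threshold-or-timeout buffering (not the plugin's code)."""

    def __init__(self, char_threshold: int = 100, max_delay_ms: int = 3000) -> None:
        self.char_threshold = char_threshold
        self.max_delay_ms = max_delay_ms
        self._parts: list[str] = []
        self._first_push_ms: Optional[float] = None

    def push(self, text: str, now_ms: float) -> Optional[str]:
        """Buffer `text`; return a chunk to synthesize when a limit is hit."""
        if self._first_push_ms is None:
            self._first_push_ms = now_ms
        self._parts.append(text)
        buffered = sum(len(part) for part in self._parts)
        waited = now_ms - self._first_push_ms
        if buffered >= self.char_threshold or waited >= self.max_delay_ms:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Return whatever is buffered (or None if empty) and reset the buffer."""
        chunk = "".join(self._parts)
        self._parts.clear()
        self._first_push_ms = None
        return chunk or None
```

With `char_threshold=10`, pushing "Hello, " returns nothing, and pushing "world" afterwards returns the combined "Hello, world" because the buffer has reached 12 characters. A lower threshold trades larger, more natural-sounding chunks for earlier audio.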

STT

Use Inworld STT for streaming speech-to-text. Multiple models are supported.

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(),
    # ... llm, tts, etc.
)

With a specific model and voice profile detection:

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    stt=inworld.STT(
        model="inworld/inworld-stt-1",
        enable_voice_profile=True,
    ),
    # ... llm, tts, etc.
)

Example

A full voice agent using Inworld for both STT and TTS:

"""Inworld STT + TTS voice agent example.

Demonstrates using Inworld for both speech-to-text and text-to-speech
in a LiveKit voice agent. Save this as ``inworld_agent.py`` and run:

    uv run inworld_agent.py console   # local console mode
    uv run inworld_agent.py dev       # LiveKit Cloud (requires LIVEKIT_URL,
                                      # LIVEKIT_API_KEY, LIVEKIT_API_SECRET)

Then connect via https://agents-playground.livekit.io
"""

import logging

from dotenv import load_dotenv

from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    JobProcess,
    cli,
    metrics,
    room_io,
)
from livekit.plugins import inworld, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("inworld-agent")

load_dotenv()


class InworldAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "Your name is Nova. You interact with users via voice. "
                "Keep your responses concise and to the point. "
                "Do not use emojis, asterisks, markdown, or other special characters. "
                "You are helpful, curious, and friendly."
            ),
        )

    async def on_enter(self):
        self.session.generate_reply()


server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inworld.STT(model="inworld/inworld-stt-1"),
        llm="openai/gpt-4.1-mini",
        tts=inworld.TTS(voice="Clive"),
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
    )

    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics(ev):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        logger.info(f"Usage: {usage_collector.get_summary()}")

    ctx.add_shutdown_callback(log_usage)

    await session.start(
        agent=InworldAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(),
    )


if __name__ == "__main__":
    cli.run_app(server)

Combined TTS + STT

from livekit.agents import AgentSession
from livekit.plugins import inworld

session = AgentSession(
    tts=inworld.TTS(voice="Hades"),
    stt=inworld.STT(),
    # ... llm, etc.
)

Download files

Source Distribution

livekit_plugins_inworld-1.5.12.tar.gz (21.2 kB)

Built Distribution

livekit_plugins_inworld-1.5.12-py3-none-any.whl (23.2 kB, Python 3)

File details

Details for the file livekit_plugins_inworld-1.5.12.tar.gz.

File metadata

  • Download URL: livekit_plugins_inworld-1.5.12.tar.gz
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for livekit_plugins_inworld-1.5.12.tar.gz
Algorithm Hash digest
SHA256 294016306aa7e91f4d192042393ebdf9405bf08516839ee7a113913fdc9ca3d3
MD5 323b91c689bd77afab8db28b1560a6c6
BLAKE2b-256 c677fffb038e9dd345cef3b5d793c87a7d98c5b74dd28d7a977d5f1a1d4eedee

Provenance

The following attestation bundles were made for livekit_plugins_inworld-1.5.12.tar.gz:

Publisher: publish.yml on livekit/agents

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file livekit_plugins_inworld-1.5.12-py3-none-any.whl.

File hashes

Hashes for livekit_plugins_inworld-1.5.12-py3-none-any.whl
Algorithm Hash digest
SHA256 22ccfde170e7d2164e3d5f40b139f3e111647bf05264e150a0335de11843a1ab
MD5 d701e5aaea4832dab621f9e6e7b3124c
BLAKE2b-256 c55b3095744e9a80d084c661b612f1ea37e9f2548828073b3ceb345ee4c3f630

Provenance

The following attestation bundles were made for livekit_plugins_inworld-1.5.12-py3-none-any.whl:

Publisher: publish.yml on livekit/agents
