Skip to main content

Agent Framework plugin for voice synthesis and speech-to-text with Inworld's API.

Project description

Inworld plugin for LiveKit Agents

Support for voice synthesis and speech-to-text with Inworld TTS and Inworld STT.

See Inworld TTS and Inworld STT for more information.

Installation

pip install livekit-plugins-inworld

Authentication

Set INWORLD_API_KEY in your .env file (get one here).

Usage

TTS

Use Inworld TTS within an AgentSession or as a standalone speech generator.

from livekit.plugins import inworld

tts = inworld.TTS()

Or with options:

from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",                 # voice ID (default or custom cloned voice)
    model="inworld-tts-1",         # or "inworld-tts-1-max"
    encoding="OGG_OPUS",           # LINEAR16, MP3, OGG_OPUS, ALAW, MULAW, FLAC
    sample_rate=48000,             # 8000-48000 Hz
    bit_rate=64000,                # bits per second (for compressed formats)
    speaking_rate=1.0,             # 0.5-1.5
    temperature=1.1,               # 0-2
    timestamp_type="WORD",         # WORD, CHARACTER, or TIMESTAMP_TYPE_UNSPECIFIED
    text_normalization="OFF",      # ON, OFF, or APPLY_TEXT_NORMALIZATION_UNSPECIFIED
)

TTS Streaming

Inworld TTS supports WebSocket streaming for lower latency real-time synthesis. Use the stream() method for streaming text as it's generated:

from livekit.plugins import inworld

tts = inworld.TTS(
    voice="Hades",
    model="inworld-tts-1",
    buffer_char_threshold=100,     # chars before triggering synthesis (default: 100)
    max_buffer_delay_ms=3000,      # max buffer time in ms (default: 3000)
)

# Create a stream for real-time synthesis
stream = tts.stream()

# Push text incrementally
stream.push_text("Hello, ")
stream.push_text("how are you today?")
stream.flush()  # Flush any remaining buffered text
stream.end_input()  # Signal end of input

# Consume audio as it's generated
async for audio in stream:
    # Process audio frames
    pass

STT

Use Inworld STT for streaming speech-to-text. Multiple models are supported.

from livekit.plugins import inworld

session = AgentSession(
   stt=inworld.STT()
   # ... llm, tts, etc.
)

With a specific model and voice profile detection:

from livekit.plugins import inworld

session = AgentSession(
   stt=inworld.STT(
       model="inworld/inworld-stt-1",
       enable_voice_profile=True,
   )
   # ... llm, tts, etc.
)

Example

A full voice agent using Inworld for both STT and TTS:

"""Inworld STT + TTS voice agent example.

Demonstrates using Inworld for both speech-to-text and text-to-speech
in a LiveKit voice agent. Save this as ``inworld_agent.py`` and run:

    uv run inworld_agent.py console   # local console mode
    uv run inworld_agent.py dev       # LiveKit Cloud (requires LIVEKIT_URL,
                                      # LIVEKIT_API_KEY, LIVEKIT_API_SECRET)

Then connect via https://agents-playground.livekit.io
"""

import logging

from dotenv import load_dotenv

from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    JobProcess,
    cli,
    metrics,
    room_io,
)
from livekit.plugins import inworld, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("inworld-agent")

load_dotenv()


class InworldAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "Your name is Nova. You interact with users via voice. "
                "Keep your responses concise and to the point. "
                "Do not use emojis, asterisks, markdown, or other special characters. "
                "You are helpful, curious, and friendly."
            ),
        )

    async def on_enter(self):
        self.session.generate_reply()


server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inworld.STT(model="inworld/inworld-stt-1"),
        llm="openai/gpt-4.1-mini",
        tts=inworld.TTS(voice="Clive"),
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
    )

    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics(ev):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        logger.info(f"Usage: {usage_collector.get_summary()}")

    ctx.add_shutdown_callback(log_usage)

    await session.start(
        agent=InworldAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(),
    )


if __name__ == "__main__":
    cli.run_app(server)

Combined TTS + STT

from livekit.plugins import inworld

session = AgentSession(
   tts=inworld.TTS(voice="Hades"),
   stt=inworld.STT(),
   # ... llm, etc.
)

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

livekit_plugins_inworld-1.5.17.tar.gz (21.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

livekit_plugins_inworld-1.5.17-py3-none-any.whl (23.5 kB view details)

Uploaded Python 3

File details

Details for the file livekit_plugins_inworld-1.5.17.tar.gz.

File metadata

  • Download URL: livekit_plugins_inworld-1.5.17.tar.gz
  • Upload date:
  • Size: 21.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for livekit_plugins_inworld-1.5.17.tar.gz
Algorithm Hash digest
SHA256 7295ef859e522d45b026853cbb4a44ecb356565886a1cb7e895d570ef1c0650b
MD5 683c4e7cabaefbe8edf080da28ad274f
BLAKE2b-256 690526fba5ed5353a40331ddefc010d6ce500be2811db374e102f8c8cf9c3822

See more details on using hashes here.

Provenance

The following attestation bundles were made for livekit_plugins_inworld-1.5.17.tar.gz:

Publisher: publish.yml on livekit/agents

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file livekit_plugins_inworld-1.5.17-py3-none-any.whl.

File metadata

File hashes

Hashes for livekit_plugins_inworld-1.5.17-py3-none-any.whl
Algorithm Hash digest
SHA256 fd3d4df150bd64c0e34b7b3c72542826d07a200bc6ee4cd299f931c4f8b73f1e
MD5 daa45d0eb7a454c5b08253973293006b
BLAKE2b-256 c92d8e3f8fc8e24eb4e5316ce3b46f25ca2b336cae93119760dad16c27023eb6

See more details on using hashes here.

Provenance

The following attestation bundles were made for livekit_plugins_inworld-1.5.17-py3-none-any.whl:

Publisher: publish.yml on livekit/agents

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page