
Run multiple LiveKit voice agents in a single shared worker process.



openrtc-python

Run N LiveKit voice agents in one worker. Pay the model-load cost once.

PyPI package name: openrtc.



Table of Contents
  1. The problem
  2. What openrtc does
  3. Installation
  4. Quick start: explicit registration with add()
  5. Quick start: one Python file per agent with discover()
  6. Memory: before and after
  7. Routing
  8. Greetings and session options
  9. Provider configuration
  10. CLI and TUI
  11. Public API at a glance
  12. Project structure
  13. Contributing
  14. License

The problem

You already ship three voice agents with livekit-agents. Each agent runs as its own worker on the same VPS. Every worker process loads the same shared stack: Python runtime, Silero VAD, and the turn-detection model. You are not loading three different models. You are loading the same stack three times because the process boundary forces it. On a 1–2 GB instance, that shows up as duplicated resident memory (RSS) for every idle worker. You pay RAM for copies you do not need.

What openrtc does

openrtc gives you one AgentPool in one worker: prewarm runs once, each incoming call still gets its own AgentSession, and you register multiple Agent subclasses on the pool so dispatch can pick one per session from metadata or fallbacks. This package does not replace your agent code. It does not sit between you and livekit.agents.Agent, @function_tool, RunContext, on_enter, on_exit, llm_node, stt_node, or tts_node. You keep your subclasses and tools as they are. You change how many workers you run, not how you write an agent.

Installation

OpenRTC requires Python 3.11 or newer. The LiveKit Silero and turn-detector plugins depend on onnxruntime, which does not ship supported wheels for Python 3.10 in current releases, so use 3.11+ to avoid install failures.

pip install openrtc

The base install pulls in livekit-agents[openai,silero,turn-detector] so shared prewarm has the plugins it expects. The package ships a PEP 561 py.typed marker for downstream type checkers.

With uv (recommended in CONTRIBUTING.md):

uv add openrtc
uv add "openrtc[cli,tui]"

Optional CLI, which puts openrtc on your PATH:

pip install 'openrtc[cli]'

Optional Textual sidecar for live metrics:

pip install 'openrtc[cli,tui]'

Set the same variables you use for any LiveKit worker:

export LIVEKIT_URL=ws://localhost:7880
export LIVEKIT_API_KEY=devkey
export LIVEKIT_API_SECRET=secret

For OpenAI-backed plugins, set OPENAI_API_KEY as you already do.
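
A worker started without these variables tends to fail late, so it can help to check them up front. The following is a minimal sketch, not part of openrtc; the helper name missing_livekit_vars is hypothetical.

```python
import os

# The three variables listed above; OPENAI_API_KEY is only needed for OpenAI-backed plugins.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_vars(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_livekit_vars() before pool.run() and aborting when the list is non-empty gives a clearer error than a failed connection attempt.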

Quick start: explicit registration with add()

Use this when you want every agent registered in one place with explicit names and providers.

from livekit.agents import Agent
from livekit.plugins import openai
from openrtc import AgentPool


class RestaurantAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You help callers make restaurant bookings.")


class DentalAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You help callers manage dental appointments.")


pool = AgentPool()
pool.add(
    "restaurant",
    RestaurantAgent,
    stt=openai.STT(model="gpt-4o-mini-transcribe"),
    llm=openai.responses.LLM(model="gpt-4.1-mini"),
    tts=openai.TTS(model="gpt-4o-mini-tts"),
    greeting="Welcome to reservations.",
)
pool.add(
    "dental",
    DentalAgent,
    stt=openai.STT(model="gpt-4o-mini-transcribe"),
    llm=openai.responses.LLM(model="gpt-4.1-mini"),
    tts=openai.TTS(model="gpt-4o-mini-tts"),
)
pool.run()

Quick start: one Python file per agent with discover()

Use this when you prefer one module per agent and optional @agent_config(...) on each class.

Create a directory (for example agents/) and add one .py file per agent. Then:

from pathlib import Path

from livekit.plugins import openai
from openrtc import AgentPool

pool = AgentPool(
    default_stt=openai.STT(model="gpt-4o-mini-transcribe"),
    default_llm=openai.responses.LLM(model="gpt-4.1-mini"),
    default_tts=openai.TTS(model="gpt-4o-mini-tts"),
)
pool.discover(Path("./agents"))
pool.run()

Example file agents/restaurant.py:

from livekit.agents import Agent
from openrtc import agent_config


@agent_config(name="restaurant", greeting="Welcome to reservations.")
class RestaurantAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You help callers make restaurant bookings.")

If a module has no @agent_config, the agent name defaults to the filename stem. STT, LLM, TTS, and greeting fall back to the pool defaults.
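
The naming fallback can be pictured as follows. This is an illustrative sketch of the rule described above, not openrtc's actual discovery code; default_agent_name is a hypothetical helper.

```python
from pathlib import Path


def default_agent_name(module_path, configured_name=None):
    """Use the @agent_config name when given, else the filename stem (illustrative only)."""
    return configured_name or Path(module_path).stem
```

Under this rule, agents/dental.py without a decorator would register as "dental", while a file decorated with @agent_config(name="restaurant") keeps that name regardless of the filename.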

Discovered agents work with livekit dev and spawn-based workers on macOS. For add(), define agent classes at module scope so worker reload can import them.

Memory: before and after

Assume an illustrative ~400 MB idle baseline per worker for the shared stack (VAD, turn detector, and similar). Your measured RSS will differ by provider, model, and OS.

| | Setup | Idle baseline |
| --- | --- | --- |
| Before openrtc | Three workers, same stack | about 3 × 400 MB ≈ 1.2 GB (three loads) |
| After openrtc | One worker, three registered agents | about 1 × 400 MB (one load) plus per-session overhead |

Exact numbers depend on your providers, concurrency, and call patterns. The win is that the shared stack loads once per worker instead of once per agent.
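
The arithmetic behind the comparison, using the same illustrative 400 MB figure. This is a back-of-the-envelope sketch only; per-session overhead is deliberately left out because it depends on your providers.

```python
def idle_baseline_mb(worker_count, stack_mb=400):
    """Idle memory spent on the shared stack, assuming one load per worker."""
    return worker_count * stack_mb


before_mb = idle_baseline_mb(3)   # three workers, three loads: 1200 MB
after_mb = idle_baseline_mb(1)    # one pooled worker, one load: 400 MB
saved_mb = before_mb - after_mb   # 800 MB under these assumptions
```

Replace stack_mb with your own measured idle RSS before drawing conclusions.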

Isolation modes

AgentPool accepts an isolation argument that picks how each session runs inside the worker. The v0.1 default is "coroutine"; pass isolation="process" to opt back into the v0.0.x behavior:

pool = AgentPool(
    isolation="coroutine",          # default in v0.1
    max_concurrent_sessions=50,     # backpressure threshold (coroutine only)
)

| Aspect | coroutine (default) | process |
| --- | --- | --- |
| Sessions per worker | Many (one asyncio.Task per session, shared JobProcess) | One (each session is its own subprocess via livekit-agents ProcPool) |
| Prewarm cost (VAD, turn detector) | Paid once per worker | Paid once per session subprocess |
| Crash isolation | Cooperative: an unhandled exception in one session is logged and marked FAILED; siblings continue. After consecutive_failure_limit (default 5) the worker calls aclose() so the platform restarts it. | Hard: each subprocess crashes independently; siblings unaffected. |
| Per-session memory cap | Not enforced (asyncio shares one process) | Enforced via livekit-agents job_memory_limit_mb |
| Backpressure | current_load() = active / max_concurrent_sessions reported as worker load; LiveKit dispatch routes elsewhere at >= load_threshold | livekit-agents default load math (CPU-based) |
| When to pick | High density on a single host; cost-sensitive deployments. | Regulatory/compliance requires hard process isolation; per-session memory caps required. |
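
The coroutine-mode backpressure formula from the table can be sketched in a few lines. This is an illustration of the stated ratio, not openrtc's implementation; the load_threshold default of 1.0 here is an assumption chosen to match the "load >= 1.0" behavior described for max_concurrent_sessions.

```python
def current_load(active_sessions, max_concurrent_sessions=50):
    """Worker load as the fraction of the session budget in use."""
    if max_concurrent_sessions <= 0:
        raise ValueError("max_concurrent_sessions must be positive")
    return active_sessions / max_concurrent_sessions


def accepts_new_jobs(active_sessions, max_concurrent_sessions=50, load_threshold=1.0):
    """True while the reported load is still below the dispatch threshold (assumed 1.0)."""
    return current_load(active_sessions, max_concurrent_sessions) < load_threshold
```

With the default of 50, a worker holding 25 sessions reports a load of 0.5, and the 50th session is the point at which dispatch would start routing elsewhere.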

Density: coroutine vs process at 10/25/50/100 sessions

From the v0.1 stub-workload benchmark (tests/benchmarks/density.py, results recorded at docs/benchmarks/density-v0.1.md):

| Sessions | coroutine peak RSS | process est. peak RSS | Within 4 GB budget |
| --- | --- | --- | --- |
| 10 | 165 MB | ~30 GB | coroutine ✓ / process ✗ |
| 25 | 230 MB | ~75 GB | coroutine ✓ / process ✗ |
| 50 | 367 MB | ~150 GB | coroutine ✓ / process ✗ |
| 100 | 617 MB | ~300 GB | coroutine ✓ / process ✗ |

The same harness scales cleanly to 200 sessions (1073 MB) and 500 sessions (1370 MB) without breaching the 4 GB budget — see the full results doc for the headroom sweep.

How the process column was estimated. Each livekit-agents subprocess loads the Silero VAD + turn-detector models per worker process (~250-400 MB of model weights) plus the WebRTC peer connection and Python runtime, settling at the ~3 GB per process baseline documented in docs/audit-2026-05-02.md. Multiply by N for the total. We do not run process mode at N>2 in CI for this reason.
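
The estimate is plain multiplication, which the sketch below reproduces from the figures quoted above (~3 GB per process, 4 GB budget, and the coroutine peaks from the table). The function names are illustrative and are not part of the benchmark harness.

```python
PROCESS_BASELINE_GB = 3   # per-process baseline quoted above
BUDGET_GB = 4             # memory budget used in the table

# Coroutine-mode peak RSS in MB, copied from the density table.
COROUTINE_PEAK_MB = {10: 165, 25: 230, 50: 367, 100: 617}


def process_estimate_gb(sessions):
    """Estimated process-mode peak: one ~3 GB subprocess per session."""
    return sessions * PROCESS_BASELINE_GB


def within_budget(peak_mb, budget_gb=BUDGET_GB):
    """Compare a peak in MB against the budget in GB (1 GB taken as 1024 MB)."""
    return peak_mb <= budget_gb * 1024
```

Even at 10 sessions the process estimate is several times the budget, while every coroutine row stays well under it.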

Stub-workload caveat. The benchmark allocates ~5 MB per session to stress task scheduling, not a realistic ~60 MB/session WebRTC + LLM footprint. Validate against the §8.4 real-LiveKit integration test (which needs docker compose -f docker-compose.test.yml up -d and OPENAI_API_KEY) before quoting a per-session memory number to your operators.

Routing

One process hosts several agent classes, so each session must resolve to a single registered name. AgentPool resolves the agent in this order:

  1. ctx.job.metadata["agent"]
  2. ctx.job.metadata["demo"]
  3. ctx.room.metadata["agent"]
  4. ctx.room.metadata["demo"]
  5. room name prefix match, such as restaurant-call-123
  6. the first registered agent

If metadata names an agent that is not registered, you get a ValueError instead of a silent fallback.
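
The resolution order can be sketched as a small pure function. This is an illustration of the documented order, not AgentPool's code. It assumes metadata has already been parsed into dictionaries and that "prefix match" means the room name starts with a registered agent name; both are assumptions, and resolve_agent_name is a hypothetical helper.

```python
def resolve_agent_name(registered, job_metadata=None, room_metadata=None, room_name=""):
    """Pick one registered agent name following the documented order (illustrative)."""
    if not registered:
        raise ValueError("no agents registered")

    # Steps 1-4: job metadata first, then room metadata; "agent" before "demo".
    for metadata in (job_metadata or {}, room_metadata or {}):
        for key in ("agent", "demo"):
            name = metadata.get(key)
            if name:
                if name not in registered:
                    # Unknown names fail loudly instead of falling back.
                    raise ValueError(f"agent {name!r} is not registered")
                return name

    # Step 5: room name prefix, e.g. "restaurant-call-123" -> "restaurant".
    for name in registered:
        if room_name.startswith(name):
            return name

    # Step 6: fall back to the first registered agent.
    return registered[0]
```

For example, with "restaurant" and "dental" registered in that order, a room named dental-call-7 with no metadata resolves to "dental", and a room with neither metadata nor a matching prefix resolves to "restaurant".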

Greetings and session options

You can pass a greeting and extra AgentSession options per registration.

pool.add(
    "restaurant",
    RestaurantAgent,
    greeting="Welcome to reservations.",
    session_kwargs={"turn_handling": {"interruption": {"enabled": False}}},
    max_tool_steps=4,
    preemptive_generation=True,
)

Direct keyword arguments win over the same keys inside session_kwargs.
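
The precedence rule amounts to a dictionary merge in which direct arguments are applied last. This is a minimal sketch of that rule only; merge_session_options is a hypothetical helper, and how openrtc merges nested values such as turn_handling is not covered here.

```python
def merge_session_options(session_kwargs=None, **direct_kwargs):
    """Combine options so that direct keyword arguments override session_kwargs."""
    merged = dict(session_kwargs or {})  # copy so the caller's dict is not mutated
    merged.update(direct_kwargs)         # direct arguments win on key collisions
    return merged
```

So if session_kwargs contains max_tool_steps=2 and the call also passes max_tool_steps=4 directly, the session would see 4.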

By default, OpenRTC sets explicit turn_handling with the multilingual turn detector and VAD-based interruption. To opt into adaptive interruption, pass session_kwargs={"turn_handling": {"interruption": {"mode": "adaptive"}}}.

Provider configuration

Pass instantiated provider objects through to livekit-agents unchanged, for example:

  • openai.STT(model="gpt-4o-mini-transcribe")
  • openai.responses.LLM(model="gpt-4.1-mini")
  • openai.TTS(model="gpt-4o-mini-tts")

If you pass strings such as openai/gpt-4.1-mini, OpenRTC leaves them as-is and the LiveKit runtime interprets them for your deployment.

CLI and TUI

Install openrtc[cli] to get openrtc on your PATH. Subcommands follow the LiveKit Agents CLI shape (dev, start, console, connect, download-files), plus list and tui. For most commands you can pass the agents directory (or, for tui, the metrics JSONL file) as the first path argument instead of --agents-dir / --watch.

List what discovery would register (defaults are string passthroughs for livekit-agents, not constructed provider objects):

openrtc list \
  ./agents \
  --default-stt openai/gpt-4o-mini-transcribe \
  --default-llm openai/gpt-4.1-mini \
  --default-tts openai/gpt-4o-mini-tts

Run a production worker (after exporting LIVEKIT_*):

openrtc start ./agents

Run a development worker:

openrtc dev ./agents

This is the same as openrtc dev --agents-dir ./agents. The metrics JSONL file is optional: add a second path only when you want JSONL output (same as --metrics-jsonl), e.g. openrtc dev ./agents ./openrtc-metrics.jsonl to feed openrtc tui.

Optional visibility: --dashboard prints a Rich summary in the terminal. --metrics-json-file ./runtime.json overwrites a JSON snapshot on each tick. Use that for scripts, dashboards, or CI. For JSON Lines plus a separate terminal UI, use --metrics-jsonl ./openrtc-metrics.jsonl on the worker and openrtc tui in another terminal (it tails ./openrtc-metrics.jsonl by default; override with --watch) after pip install 'openrtc[cli,tui]'.
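
If you want to consume the JSONL output from a script rather than the TUI, a generic JSON Lines reader is enough to get started. The sketch below makes no assumptions about record fields (the schema lives in observability/stream.py); parse_jsonl_lines is a hypothetical helper, not an openrtc function.

```python
import json


def parse_jsonl_lines(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        records.append(json.loads(line))
    return records
```

An open file handle is an iterable of lines, so parse_jsonl_lines(open("./openrtc-metrics.jsonl", encoding="utf-8")) would read a file written with --metrics-jsonl.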

Stable machine output: openrtc list --json and --plain. Add --resources when you want footprint hints. OpenRTC-only flags are stripped before the handoff to LiveKit’s CLI parser.

Full flag lists live in docs/cli.md.

Public API at a glance

Everything openrtc exposes publicly is listed here. Anything else is internal and not treated as stable.

  • AgentPool
  • AgentConfig
  • AgentDiscoveryConfig
  • agent_config(...)
  • ProviderValue — type alias for STT/LLM/TTS slot values (provider ID strings or LiveKit plugin instances)

AgentPool(...) constructor (all keyword-only, all optional):

  • default_stt, default_llm, default_tts, default_greeting — pool-wide defaults applied when add() / discover() doesn't override them.
  • isolation: "coroutine" | "process" (v0.1) — worker isolation mode. Default "coroutine" runs every session as an asyncio.Task in one worker; "process" keeps the v0.0.x one-subprocess-per-session behavior.
  • max_concurrent_sessions: int (v0.1) — coroutine-mode backpressure threshold. Default 50. The worker reports load >= 1.0 to LiveKit dispatch once this many sessions are in flight; ignored under isolation="process".
  • consecutive_failure_limit: int (v0.1) — coroutine-mode supervisor threshold. Default 5. After this many consecutive non-SUCCESS session terminations the worker calls aclose() so the deployment platform can restart it; ignored under isolation="process".
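
One way to picture the consecutive_failure_limit threshold is a counter that trips once enough failures occur in a row. This is a sketch under an assumption: that a successful session resets the count, which the name suggests but this README does not state. FailureSupervisor is a hypothetical class, not openrtc's supervisor.

```python
class FailureSupervisor:
    """Track failed sessions in a row and signal when the worker should close."""

    def __init__(self, limit=5):
        self.limit = limit
        self.consecutive_failures = 0

    def record(self, succeeded):
        """Record one session outcome; return True once the limit is reached."""
        if succeeded:
            self.consecutive_failures = 0  # assumption: success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.limit
```

With the default limit, four failures followed by a success would not trip the supervisor; five failures in a row would.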

On AgentPool:

  • add(...)
  • discover(...)
  • list_agents()
  • get(name)
  • remove(name)
  • run()
  • runtime_snapshot()
  • drain_metrics_stream_events() — for JSONL export paths (mainly CLI; rare in app code)
  • server
  • isolation (read-only property, v0.1)
  • max_concurrent_sessions (read-only property, v0.1)
  • consecutive_failure_limit (read-only property, v0.1)

Project structure

src/openrtc/
├── __init__.py
├── py.typed
├── types.py               # ProviderValue and related typing
├── tui/
│   ├── __init__.py
│   └── app.py             # optional Textual sidecar
├── cli/
│   ├── __init__.py        # re-exports `main` and `app`
│   ├── entry.py           # lazy console entry / missing-extra hint
│   ├── commands.py        # Typer commands and programmatic main()
│   ├── types.py           # shared CLI option aliases
│   ├── dashboard.py       # Rich dashboard and list output
│   ├── reporter.py        # background metrics reporter thread
│   ├── livekit.py         # LiveKit argv/env handoff, pool run
│   └── params.py          # shared worker handoff option bundles
├── core/
│   └── pool.py            # AgentPool, discovery, routing
└── observability/
    ├── metrics.py         # RuntimeMetricsStore, footprint helpers
    ├── snapshot.py        # PoolRuntimeSnapshot dataclass
    └── stream.py          # JSONL metrics schema

  • core/pool.py: AgentPool, discovery, routing
  • cli/: Typer/Rich CLI (openrtc[cli])
  • observability/stream.py: JSONL metrics schema
  • tui/app.py: optional Textual sidecar (openrtc[tui])

Contributing

See CONTRIBUTING.md. CI runs Ruff and mypy on pull requests alongside the test suite.

License

MIT. See LICENSE.
