Pure async Python library for multi-channel conversations

RoomKit

Pure async Python 3.12+ framework for multi-channel conversation orchestration.

RoomKit gives you one abstraction, the room, to wire together any combination of SMS, WhatsApp, Email, Teams, Telegram, Voice, Video, WebSocket, and AI channels. Messages flow in, pass through a hook pipeline, are routed to the right agent, and are broadcast to every attached channel. You focus on the conversation logic; the framework handles routing, transcoding, audio processing, video processing, and agent handoffs.

Website: roomkit.live | Docs: roomkit.live/docs | API Reference: roomkit.live/docs/api

How it works

[Figure: RoomKit architecture]

Every channel implements the same interface: handle_inbound() converts a provider message into a RoomEvent, and deliver() pushes events out. Channels fall into two categories: transport (delivers to external systems) and intelligence (generates content, as AI agents do).

Message flow

[Figure: Message flow through a room]

An inbound message is normalized into a RoomEvent, passes through the hook pipeline (where it can be blocked, modified, or enriched), gets stored, then fans out to every attached channel. AI agents generate responses that re-enter the same pipeline.
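The flow described above can be sketched in plain Python. This is an illustrative toy, not RoomKit's implementation: `Event`, `Room`, and `redact_secrets` are hypothetical names, and skipping the originating channel during fan-out is an assumption based on the bridge example later in this README.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """Hypothetical stand-in for a normalized room event."""
    source: str
    body: str


@dataclass
class Room:
    """Toy room: hooks run first, then the event is stored and fanned out."""
    hooks: list = field(default_factory=list)
    channels: dict = field(default_factory=dict)  # channel_id -> outbox list
    log: list = field(default_factory=list)       # stored events

    async def process(self, event: Event) -> bool:
        # 1. Hook pipeline: a hook returns None to block,
        #    or a (possibly modified) event to continue.
        for hook in self.hooks:
            event = await hook(event)
            if event is None:
                return False
        # 2. Store the event.
        self.log.append(event)
        # 3. Fan out to every attached channel except the sender's.
        for channel_id, outbox in self.channels.items():
            if channel_id != event.source:
                outbox.append(event.body)
        return True


async def redact_secrets(event: Event) -> Optional[Event]:
    """Example hook: block messages mentioning 'password', trim the rest."""
    if "password" in event.body.lower():
        return None
    return Event(event.source, event.body.strip())


async def demo() -> Room:
    room = Room(hooks=[redact_secrets], channels={"sms": [], "ws": []})
    await room.process(Event("sms", "  hello  "))
    await room.process(Event("ws", "my password is hunter2"))
    return room
```

Running `asyncio.run(demo())` leaves one stored event and delivers the trimmed text only to the `ws` outbox; the second message is blocked before it is stored.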

Quickstart

pip install roomkit

Example: AI chatbot in 20 lines

import asyncio
from roomkit import (
    ChannelCategory, InboundMessage, RoomKit,
    TextContent, WebSocketChannel,
)
from roomkit.channels.ai import AIChannel
from roomkit.providers.anthropic import AnthropicAIProvider, AnthropicConfig

async def main():
    kit = RoomKit()

    # One channel for the user, one for AI
    ws = WebSocketChannel("ws-user")
    ai = AIChannel("assistant", provider=AnthropicAIProvider(
        AnthropicConfig(api_key="sk-...")
    ), system_prompt="You are a helpful assistant.")

    kit.register_channel(ws)
    kit.register_channel(ai)

    # Create a room and wire everything together
    await kit.create_room(room_id="chat")
    await kit.attach_channel("chat", "ws-user")
    await kit.attach_channel("chat", "assistant", category=ChannelCategory.INTELLIGENCE)

    # Process a message — AI responds automatically
    await kit.process_inbound(InboundMessage(
        channel_id="ws-user", sender_id="user-1",
        content=TextContent(body="What is RoomKit?"),
    ))

asyncio.run(main())

That's it. The message flows through the hook pipeline, gets routed to the AI channel, and the response is broadcast back to the WebSocket.

Example: Multi-channel bridge

The same room can bridge any mix of channels — a user on SMS, another on WhatsApp, and an AI assistant all sharing one conversation:

# Run this inside an async function (as in main() above); imports are omitted for brevity.
kit = RoomKit()

sms = SMSChannel("sms", provider=TwilioSMSProvider(...))
wa = WhatsAppChannel("whatsapp", provider=...)
ai = AIChannel("assistant", provider=...)

for ch in [sms, wa, ai]:
    kit.register_channel(ch)

await kit.create_room(room_id="support-case-42")
await kit.attach_channel("support-case-42", "sms")
await kit.attach_channel("support-case-42", "whatsapp")
await kit.attach_channel("support-case-42", "assistant", category=ChannelCategory.INTELLIGENCE)

# Message from SMS → broadcast to WhatsApp + AI
# AI reply → broadcast to SMS + WhatsApp

Content is automatically transcoded between channel capabilities (rich → text fallback, media handling, etc.).
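As a rough illustration of a rich-to-text fallback, the sketch below flattens a card into plain text when the target channel cannot render it. `RichCard` and `transcode` are hypothetical names invented for this example; RoomKit's own content models and transcoding rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RichCard:
    """Hypothetical rich message: a title, a body, and optional buttons."""
    title: str
    body: str
    buttons: tuple = ()


def transcode(card, supports_rich):
    """Return the card unchanged for rich channels, else a plain-text fallback."""
    if supports_rich:
        return card
    lines = [card.title, card.body]
    # Buttons become a numbered list the user can answer by number.
    lines += [f"{i}. {label}" for i, label in enumerate(card.buttons, start=1)]
    return "\n".join(lines)


card = RichCard("Order shipped", "Track it?", ("Yes", "No"))
sms_text = transcode(card, supports_rich=False)
```

Here `sms_text` is a four-line string (title, body, then the two numbered options), while a rich-capable channel would receive the original `card` object.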

More examples in examples/.

Installation

RoomKit's core has a single dependency (pydantic). Everything else is optional:

pip install roomkit                    # core only
pip install roomkit[anthropic]         # + Anthropic Claude
pip install roomkit[openai]            # + OpenAI GPT
pip install roomkit[gemini]            # + Google Gemini

# Voice & video
pip install roomkit[fastrtc]           # WebRTC audio
pip install roomkit[sip]               # SIP voice + video
pip install roomkit[deepgram]          # Deepgram STT
pip install roomkit[elevenlabs]        # ElevenLabs TTS
pip install roomkit[sherpa-onnx]       # Local STT/TTS/VAD/Denoiser (ONNX)
pip install roomkit[realtime-gemini]   # Gemini Live (speech-to-speech)
pip install roomkit[realtime-openai]   # OpenAI Realtime (speech-to-speech)

# Messaging
pip install roomkit[httpx]             # SMS, RCS, Email providers
pip install roomkit[teams]             # Microsoft Teams
pip install roomkit[telegram]          # Telegram
pip install roomkit[neonize]           # WhatsApp Personal

# Infrastructure
pip install roomkit[postgres]          # PostgreSQL storage
pip install roomkit[opentelemetry]     # Distributed tracing
pip install roomkit[mcp]               # Model Context Protocol tools

# Everything
pip install roomkit[all]

For development:

git clone https://github.com/roomkit-live/roomkit.git
cd roomkit
uv sync --extra dev
make all                               # ruff check + mypy --strict + pytest

Requires Python 3.12+.

Multi-Agent Orchestration

[Figure: Multi-agent orchestration]

RoomKit has four built-in orchestration strategies, all configured through RoomKit(orchestration=...). The framework handles agent registration, routing, handoff tools, and conversation state — you just define agents and pick a strategy.

Agents

Agent extends AIChannel with identity metadata (role, scope, voice, greeting) that gets auto-injected into the system prompt:

from roomkit import Agent
from roomkit.providers.anthropic import AnthropicAIProvider, AnthropicConfig
from roomkit.orchestration.handoff import HandoffMemoryProvider
from roomkit.memory.sliding_window import SlidingWindowMemory

triage = Agent(
    "agent-triage",
    provider=AnthropicAIProvider(AnthropicConfig(api_key="sk-...")),
    role="Triage receptionist",
    description="Routes callers to the right specialist",
    system_prompt="You triage incoming requests.",
    voice="Zephyr",              # TTS voice ID
    language="French",
    greeting="Greet the caller warmly and ask how you can help.",
    memory=HandoffMemoryProvider(SlidingWindowMemory(max_events=20)),
)

Pipeline — linear handoff chain

Agents hand off to the next in a fixed sequence. Each agent gets a handoff_conversation tool and can only move forward:

from roomkit import Agent, Pipeline, RoomKit

kit = RoomKit(
    orchestration=Pipeline(agents=[triage, handler, resolver]),
)

Swarm — any-to-any handoff

Every agent can hand off to any other. The AI decides when a topic change requires a different specialist:

from roomkit import Agent, Swarm, RoomKit

kit = RoomKit(
    orchestration=Swarm(
        agents=[sales, support, billing],
        entry="agent-sales",
    ),
)
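The difference between the Pipeline and Swarm handoff rules can be sketched as a single function over agent IDs. This is only an illustration of the two policies as described here; `allowed_targets` and the `mode` strings are hypothetical and not part of RoomKit's API.

```python
def allowed_targets(agents, current, mode):
    """Return the agent IDs that `current` may hand off to under a policy."""
    index = agents.index(current)
    if mode == "pipeline":
        # Forward-only: at most the next agent in the fixed sequence.
        return agents[index + 1 : index + 2]
    if mode == "swarm":
        # Any-to-any: every agent except the current one.
        return [agent for agent in agents if agent != current]
    raise ValueError(f"unknown mode: {mode}")


chain = ["agent-triage", "agent-handler", "agent-resolver"]
```

With `chain`, the triage agent can reach only `agent-handler` in pipeline mode, the last agent in a pipeline has no targets, and in swarm mode every agent can reach the other two.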

Loop — iterative refinement

A producer agent generates content, one or more reviewers evaluate it (sequentially or in parallel), and the cycle repeats until every reviewer approves or max_iterations is reached:

from roomkit import Agent, Loop, RoomKit

kit = RoomKit(
    orchestration=Loop(
        agent=writer,
        reviewers=[quality, accuracy, style],
        strategy="parallel",
        max_iterations=3,
    ),
)
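The produce-and-review cycle can be sketched with plain coroutines. This is a minimal model of the idea, assuming each reviewer returns an `(approved, comment)` pair; `refine`, `writer`, and the two reviewers are hypothetical names and do not reflect how RoomKit's Loop is implemented.

```python
import asyncio


async def refine(produce, reviewers, max_iterations=3):
    """Cycle produce -> review until all reviewers approve or the budget ends.

    Returns (draft, iterations_used, approved).
    """
    feedback = []
    draft = None
    for iteration in range(1, max_iterations + 1):
        draft = await produce(feedback)
        # "parallel" strategy: all reviewers see the same draft concurrently.
        verdicts = await asyncio.gather(*(review(draft) for review in reviewers))
        if all(ok for ok, _ in verdicts):
            return draft, iteration, True
        # Only the comments from rejecting reviewers go back to the producer.
        feedback = [comment for ok, comment in verdicts if not ok]
    return draft, max_iterations, False


async def writer(feedback):
    # Toy producer: adds the period once a reviewer has asked for one.
    return "Hello world." if feedback else "Hello world"


async def punctuation_reviewer(draft):
    return draft.endswith("."), "end with a period"


async def length_reviewer(draft):
    return len(draft) < 80, "keep it under 80 characters"
```

Calling `asyncio.run(refine(writer, [punctuation_reviewer, length_reviewer]))` takes two iterations: the first draft is rejected for punctuation, the second is approved by both reviewers.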

Supervisor — delegating to workers

A supervisor agent talks to the user and delegates tasks to workers that run in isolated child rooms:

from roomkit import Agent, Supervisor, RoomKit

kit = RoomKit(
    orchestration=Supervisor(
        supervisor=manager,
        workers=[researcher, coder],
    ),
)

Voice orchestration

All orchestration strategies work seamlessly on live voice calls. The voice/realtime channel is a transport — swapping the active agent doesn't touch the audio session:

[Figure: Voice orchestration flow]

For speech-to-speech mode (Gemini Live, OpenAI Realtime), the realtime session is reconfigured on handoff — system prompt, voice, and tools change with ~200-500ms latency while the audio stream stays connected.

Audio Pipeline

[Figure: Voice audio pipeline]

All stages are optional. AEC and AGC are automatically skipped when the backend declares native support.

Stage	Role	Implementations
VAD	Voice activity detection	SherpaOnnx, Energy-based
Denoiser	Noise reduction	RNNoise, SherpaOnnx
AEC	Acoustic echo cancellation	Speex
STT	Speech-to-text	Deepgram, SherpaOnnx, Qwen, Gradium
TTS	Text-to-speech	ElevenLabs, SherpaOnnx, Qwen, Gradium, Grok
Diarization	Speaker identification	Pluggable
DTMF	Tone detection (parallel)	Pluggable

Interruption strategies control how user speech during TTS playback is handled: IMMEDIATE, CONFIRMED (wait for sustained speech), SEMANTIC (backchannel detection ignores "uh-huh"), or DISABLED.

voice = VoiceChannel(
    "voice", stt=stt, tts=tts, backend=backend,
    pipeline=AudioPipelineConfig(vad=vad, denoiser=denoiser, aec=aec),
    interruption=InterruptionConfig(
        strategy=InterruptionStrategy.CONFIRMED, min_speech_ms=300
    ),
)
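To make the CONFIRMED strategy concrete, here is a small sketch of "wait for sustained speech" over a stream of VAD decisions. It assumes one boolean per fixed-size audio frame; the function name and the 20 ms frame size are illustrative assumptions, not RoomKit internals.

```python
def confirmed_interrupt_at(frames, frame_ms=20, min_speech_ms=300):
    """Return the time in ms at which a barge-in is confirmed, or None.

    `frames` holds one VAD decision (True = speech) per `frame_ms` of audio
    captured during TTS playback. The barge-in is confirmed once speech has
    been continuous for at least `min_speech_ms`.
    """
    run_ms = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            run_ms += frame_ms
        else:
            run_ms = 0  # silence resets the streak
        if run_ms >= min_speech_ms:
            return (index + 1) * frame_ms
    return None


cough = [True] * 5 + [False] * 10             # 100 ms burst, then silence
real_speech = [False] * 10 + [True] * 15      # 200 ms silence, 300 ms speech
```

A short burst such as `cough` never reaches the threshold, so playback continues; `real_speech` confirms the interruption at the 500 ms mark.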

Hooks

Hooks intercept events at specific points in the pipeline. They can block, modify, or observe events:

@kit.hook(HookTrigger.BEFORE_BROADCAST, name="compliance_check")
async def check(event: RoomEvent, ctx: RoomContext) -> HookResult:
    if contains_pii(event.content):
        return HookResult.block("PII detected")
    return HookResult.allow()

35 hook triggers across the full lifecycle: event pipeline (BEFORE_BROADCAST, AFTER_BROADCAST), room lifecycle, channel lifecycle, identity resolution, voice events (speech start/end, transcription, barge-in, VAD, DTMF, speaker change), TTS events, tool execution, orchestration (phase transitions, handoffs), and side effects (delivery status, errors, protocol traces).

Hooks support filtering by channel type, channel ID, and direction.
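The filtering behaviour can be pictured with a toy registry that skips hooks whose filters do not match the event. Everything below (`HookRegistry`, `HookEvent`, the filter keyword names, and the convention that a hook returns a reason string to block) is a hypothetical sketch, not RoomKit's hook engine.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class HookEvent:
    """Hypothetical event carrying just the fields the filters look at."""
    channel_type: str
    direction: str
    body: str


class HookRegistry:
    def __init__(self):
        self._hooks = []

    def hook(self, trigger, *, channel_types=None, directions=None):
        """Register a hook; a filter left as None matches everything."""
        def decorator(fn):
            self._hooks.append((trigger, channel_types, directions, fn))
            return fn
        return decorator

    async def run(self, trigger, event):
        """Return (allowed, reason). The first blocking hook wins."""
        for hook_trigger, channel_types, directions, fn in self._hooks:
            if hook_trigger != trigger:
                continue
            if channel_types is not None and event.channel_type not in channel_types:
                continue
            if directions is not None and event.direction not in directions:
                continue
            reason = await fn(event)  # None means "allow"
            if reason is not None:
                return False, reason
        return True, None


registry = HookRegistry()


@registry.hook("before_broadcast", channel_types={"sms"})
async def no_links(event):
    return "links not allowed on SMS" if "http" in event.body else None
```

An SMS event containing a link is blocked with the hook's reason, while the same text on an email channel passes because the hook's channel filter does not match.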

Channels & Providers

Channel	Media	Provider examples
SMS / RCS	text, MMS, rich cards	Twilio, Telnyx, Sinch
Email	text, rich, media	ElasticEmail, SendGrid
WhatsApp	text, media, location, templates	Cloud API, Neonize (Personal)
Messenger	text, rich, templates	Facebook Messenger
Teams	text, rich	Bot Framework
Telegram	text, rich, media	Telegram Bot API
WebSocket	text, rich, media	Built-in
HTTP	text, rich	Generic webhook
Voice	audio ↔ text	STT/TTS pipeline
Realtime Voice	audio (S2S)	Gemini Live, OpenAI Realtime
Video	video	SIP/RTP, Webcam, Screen capture
Audio+Video	audio + video	SIP A/V (VP9/H.264)
AI / Agent	text, rich	Claude, GPT, Gemini, Mistral, vLLM

Every AI and transport provider has a mock counterpart for testing without credentials.

Video Pipeline

[Figure: Video processing pipeline]

Like the audio pipeline, the video subsystem processes frames through pluggable stages:

Inbound:   Backend → [Decoder] → [Resizer] → [Transforms...] → [Filters...] → Vision / Taps

All stages are optional — configure only what you need.

Stage	Role	Implementations
Decoder	Encoded → raw pixels	PyAV (H.264, VP9, VP8)
Resizer	Scale to target dimensions	PyAV
Transforms	Modify pixel data	Grayscale, blur, effects (OpenCV)
Filters	Inspect or replace frames	YOLO object detection, Censor, Watermark
Vision	Periodic frame analysis → AI context	OpenAI, Gemini

from roomkit import VideoChannel
from roomkit.video.pipeline import VideoPipelineConfig
from roomkit.video.pipeline.filter.yolo import YOLODetectorFilter
from roomkit.video.pipeline.filter.watermark import WatermarkFilter

video = VideoChannel(
    "video",
    backend=backend,
    pipeline=VideoPipelineConfig(
        filters=[
            YOLODetectorFilter(model="yolo11n.pt", confidence=0.5),
            WatermarkFilter(text="CONFIDENTIAL", position="bottom-right"),
        ],
        vision=gemini_vision,
    ),
)
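The "every stage is optional" ordering (resizer, then transforms, then filters) can be sketched with plain functions over a tiny frame. This toy represents a frame as rows of `(r, g, b)` tuples and leaves out decoding and vision; `build_pipeline`, `halve`, and `grayscale` are hypothetical helpers, not RoomKit classes.

```python
def grayscale(frame):
    """Transform: replace each (r, g, b) pixel with its integer average."""
    return [[(r + g + b) // 3 for r, g, b in row] for row in frame]


def halve(frame):
    """Resizer: keep every second row and column (nearest-neighbour)."""
    return [row[::2] for row in frame[::2]]


def build_pipeline(resizer=None, transforms=(), filters=()):
    """Chain only the stages that were configured, in a fixed order."""
    stages = ([resizer] if resizer is not None else [])
    stages += list(transforms) + list(filters)

    def run(frame):
        for stage in stages:
            frame = stage(frame)
        return frame

    return run


frame = [
    [(30, 60, 90), (1, 1, 1)],
    [(2, 2, 2), (3, 3, 3)],
]
small_gray = build_pipeline(resizer=halve, transforms=[grayscale])(frame)
```

With nothing configured the pipeline returns the frame untouched; with the resizer and one transform, the 2x2 frame becomes a single grayscale pixel.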

Video backends

Backend	Role	Dependency
`SIPVideoBackend`	SIP A/V calls (VP9/H.264/VP8)	`roomkit[sip]`
`RTPVideoBackend`	Raw RTP video transport	`roomkit[rtp]`
`LocalVideoBackend`	Webcam capture (OpenCV)	`roomkit[local-video]`
`ScreenCaptureBackend`	Screen capture (mss)	`roomkit[screen-capture]`

Talking avatars

Avatar providers generate lip-synced video from TTS audio — the visual counterpart of text-to-speech:

[Figure: Avatar pipeline]

Implementations: MuseTalk (local inference), WebSocket (remote), Anam (cloud).

Recording

Room-level A/V recording to MP4 with VP9 → H.264 transcoding, per-track sync, and NVENC hardware acceleration:

from roomkit import VideoChannel
from roomkit.video.pipeline import VideoPipelineConfig
from roomkit.video.recorder.pyav import PyAVVideoRecorder
from roomkit.video.recorder import VideoRecordingConfig

video = VideoChannel(
    "video",
    backend=backend,
    pipeline=VideoPipelineConfig(
        recorder=PyAVVideoRecorder(),
        recording_config=VideoRecordingConfig(
            storage="./recordings", codec="auto", fps=15.0,
        ),
    ),
)

Production Features

Storage

kit = RoomKit()                          # InMemoryStore (development)
kit = RoomKit(store=PostgresStore(...))   # PostgreSQL (production)

The store persists rooms, events, bindings, participants, identities, tasks, and observations.

Resilience

Built-in retry with exponential backoff, circuit breaker isolation, token bucket rate limiting, content transcoding, chain depth tracking (prevents infinite loops), and idempotency keys.

await kit.attach_channel("room-1", "sms-out",
    retry_policy=RetryPolicy(max_retries=3, base_delay_seconds=1.0),
    rate_limit=RateLimit(max_per_second=5.0),
)
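For readers unfamiliar with these mechanisms, the sketch below shows exponential backoff and a token bucket in isolation. It reuses the parameter names `max_retries` and `base_delay_seconds` from the snippet above, but the doubling formula, the `retry` helper, and the `TokenBucket` class are assumptions for illustration; RoomKit's actual policy (for example jitter or delay caps) may differ.

```python
import asyncio


async def retry(operation, max_retries=3, base_delay_seconds=1.0,
                sleep=asyncio.sleep):
    """Await `operation()` until it succeeds, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            await sleep(base_delay_seconds * 2 ** attempt)


class TokenBucket:
    """Allow `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = 0.0

    def allow(self, now):
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The `sleep` parameter exists so tests can inject a fake sleeper instead of waiting in real time; `now` is passed explicitly to `allow` for the same reason.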

Room lifecycle

Rooms transition automatically based on activity timers:

[Figure: Room lifecycle]

Telemetry

kit = RoomKit(telemetry=TelemetryConfig(provider=ConsoleTelemetryProvider()))     # dev
kit = RoomKit(telemetry=TelemetryConfig(provider=OpenTelemetryProvider()))         # production

Identity resolution

Resolve unknown senders to known identities with a pluggable pipeline:

class MyResolver(IdentityResolver):
    async def resolve(self, message, context):
        user = await lookup(message.sender_id)
        if user:
            return IdentityResult(
                status=IdentificationStatus.IDENTIFIED,
                identity=Identity(id=user.id, display_name=user.name),
            )
        return IdentityResult(status=IdentificationStatus.UNKNOWN)

kit = RoomKit(identity_resolver=MyResolver())

MCP Tools

from roomkit import MCPToolProvider, compose_tool_handlers

mcp = MCPToolProvider(server_url="http://localhost:3000")
handler = compose_tool_handlers(mcp.handler, my_custom_handler)

Skills

Extensible AI capabilities via a skill registry:

registry = SkillRegistry()
registry.register(Skill(
    metadata=SkillMetadata(name="weather", description="Get weather forecasts"),
    handler=my_weather_handler,
))

Realtime events

Handle typing indicators, presence, read receipts, and tool call notifications:

sub_id = await kit.subscribe_room("room-1", handle_realtime)
await kit.publish_typing("room-1", "user-1")
await kit.publish_presence("room-1", "user-1", "online")
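The subscribe-and-publish pattern behind these calls can be modelled with a small in-memory bus. This is a generic sketch of the pattern only: `RoomEvents`, its method names, and the event dictionary shape are hypothetical and are not the RoomKit realtime API.

```python
import asyncio
import itertools
from collections import defaultdict


class RoomEvents:
    """Toy per-room pub/sub for ephemeral events such as typing or presence."""

    def __init__(self):
        self._subs = defaultdict(dict)  # room_id -> {sub_id: callback}
        self._ids = itertools.count(1)

    def subscribe(self, room_id, callback):
        sub_id = f"sub-{next(self._ids)}"
        self._subs[room_id][sub_id] = callback
        return sub_id

    def unsubscribe(self, room_id, sub_id):
        self._subs[room_id].pop(sub_id, None)

    async def publish(self, room_id, kind, user_id, **data):
        event = {"room": room_id, "type": kind, "user": user_id, **data}
        # Copy the callbacks so a handler may unsubscribe while we iterate.
        for callback in list(self._subs[room_id].values()):
            await callback(event)


seen = []


async def handle_realtime(event):
    seen.append(event)


async def demo():
    bus = RoomEvents()
    sub_id = bus.subscribe("room-1", handle_realtime)
    await bus.publish("room-1", "typing", "user-1")
    bus.unsubscribe("room-1", sub_id)
    await bus.publish("room-1", "presence", "user-1", status="online")
    return sub_id
```

After `asyncio.run(demo())`, only the typing event has been recorded, because the handler was unsubscribed before the presence update was published.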

Project Structure

src/roomkit/
  core/            Framework, hooks, routing, retry, circuit breaker
  channels/        Channel implementations (Voice, AI, Agent, WebSocket, ...)
  orchestration/   Multi-agent routing, handoff, conversation state (Pipeline, Loop, Supervisor, Swarm)
  providers/       Provider implementations (AI, SMS, Email, Teams, ...)
  voice/           Voice subsystem
    backends/        Audio transports (FastRTC, RTP, SIP, Local)
    stt/             Speech-to-text providers
    tts/             Text-to-speech providers
    pipeline/        Audio processing stages (VAD, AEC, AGC, Denoiser, ...)
    realtime/        Speech-to-speech (Gemini Live, OpenAI Realtime)
  video/           Video subsystem (RTP, SIP, Local, Screen, Vision AI)
  recorder/        Room-level A/V recording (PyAV)
  models/          Pydantic data models and enums
  memory/          AI context construction (SlidingWindow, Handoff-aware)
  store/           Conversation persistence (Memory, Postgres)
  identity/        User identification resolution
  telemetry/       Tracing (Console, OpenTelemetry)

AI Assistant Support

RoomKit includes files to help AI coding assistants understand the library:

llms.txt — structured documentation for LLM context windows
AGENTS.md — coding guidelines and patterns for AI assistants
MCP Integration — Model Context Protocol support

Contributing

See CONTRIBUTING.md and CODE_OF_CONDUCT.md.

uv sync --extra dev
make all                # ruff check + mypy --strict + pytest

All new code needs tests. Aim for >90% coverage.

License

MIT

Release history

0.7.0

May 15, 2026

0.7.0a17 pre-release

Apr 30, 2026

0.7.0a16 pre-release

Apr 23, 2026

0.7.0a15 pre-release

Apr 23, 2026

0.7.0a14 pre-release

Apr 17, 2026

0.7.0a13 pre-release

Apr 17, 2026

0.7.0a12 pre-release

Apr 8, 2026

0.7.0a11 pre-release

Apr 4, 2026

0.7.0a10 pre-release

Apr 3, 2026

This version

0.7.0a9 pre-release

Apr 1, 2026

0.7.0a8 pre-release

Apr 1, 2026

0.7.0a7 pre-release

Mar 28, 2026

0.7.0a6 pre-release

Mar 28, 2026

0.7.0a5 pre-release

Mar 27, 2026

0.7.0a4 pre-release

Mar 26, 2026

0.7.0a3 pre-release

Mar 25, 2026

0.7.0a2 pre-release

Mar 24, 2026

0.7.0a1 pre-release

Mar 24, 2026

0.6.13

Mar 5, 2026

0.6.12

Mar 5, 2026

0.6.11

Mar 4, 2026

0.6.10

Mar 3, 2026

0.6.9

Mar 2, 2026

0.6.8

Mar 2, 2026

0.6.7

Mar 1, 2026

0.6.6

Feb 28, 2026

0.6.5

Feb 28, 2026

0.6.4

Feb 28, 2026

0.6.3

Feb 28, 2026

0.6.2

Feb 27, 2026

0.6.1

Feb 26, 2026

0.6.0

Feb 25, 2026

0.5.3

Feb 17, 2026

0.5.2

Feb 17, 2026

0.5.1

Feb 17, 2026

0.5.0

Feb 16, 2026

0.4.18

Feb 14, 2026

0.4.17

Feb 13, 2026

0.4.16

Feb 13, 2026

0.4.15

Feb 13, 2026

0.4.14

Feb 12, 2026

0.4.13

Feb 11, 2026

0.4.12

Feb 11, 2026

0.4.11

Feb 11, 2026

0.4.10

Feb 11, 2026

0.4.9

Feb 11, 2026

0.4.8

Feb 10, 2026

0.4.7

Feb 10, 2026

0.4.6

Feb 10, 2026

0.4.5

Feb 10, 2026

0.4.4

Feb 9, 2026

0.4.3

Feb 8, 2026

0.4.2

Feb 8, 2026

0.4.1

Feb 7, 2026

0.4.0

Feb 7, 2026

0.3.2

Feb 6, 2026

0.3.1

Feb 6, 2026

0.3.0

Feb 5, 2026

0.2.1

Feb 4, 2026

0.2.0

Feb 4, 2026

0.1.1

Feb 2, 2026

0.1.0

Feb 2, 2026

Download files

Source Distribution

roomkit-0.7.0a9.tar.gz (1.2 MB)

Uploaded Apr 1, 2026 Source

Built Distribution

roomkit-0.7.0a9-py3-none-any.whl (932.9 kB)

Uploaded Apr 1, 2026 Python 3

File details

Details for the file roomkit-0.7.0a9.tar.gz.

File metadata

Download URL: roomkit-0.7.0a9.tar.gz
Upload date: Apr 1, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.24 {"installer":{"name":"uv","version":"0.9.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for roomkit-0.7.0a9.tar.gz
Algorithm	Hash digest
SHA256	`8a3babf86d504b3c128e6df27c79db5e298af597b0fd5cace6b9dd8cc034cfc7`
MD5	`423f07439a675ae8de6c2bf7230ba3ad`
BLAKE2b-256	`2d90b9152dbe7de04b4f615327cf7c8087899f9e3abb6461e3b03ff503eab3bb`

File details

Details for the file roomkit-0.7.0a9-py3-none-any.whl.

File metadata

Download URL: roomkit-0.7.0a9-py3-none-any.whl
Upload date: Apr 1, 2026
Size: 932.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.24 {"installer":{"name":"uv","version":"0.9.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for roomkit-0.7.0a9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db9d7c319cb62cf4636cd8888d56c7ccde5573085cacfc606978da596e3d7b93`
MD5	`80f710aa45207e48458466c78a0ebae5`
BLAKE2b-256	`7a2ad1225daff318d01db42ea4bedd764e29ea30bcfc4cd04e7d13da2dbc1ca1`
