Real-time voice assistant built on OpenAI's Realtime API


rtvoice

A Python framework for building voice agents on top of OpenAI's Realtime API. Handles audio streaming, interruption detection, tool calling, transcription, subagents, and MCP servers — so you can focus on your application logic.

Installation

pip install rtvoice

Requires Python 3.13+ and an OPENAI_API_KEY environment variable (or pass api_key= directly).

Quick Start

import asyncio
from rtvoice import RealtimeAgent

async def main():
    agent = RealtimeAgent(
        instructions="You are a helpful voice assistant. Answer concisely.",
    )
    await agent.run()

asyncio.run(main())

Configuration

RealtimeAgent accepts the following parameters:

Parameter	Type	Default	Description
`instructions`	`str`	`""`	System prompt for the assistant
`model`	`RealtimeModel`	`GPT_REALTIME_MINI`	Which Realtime model to use
`voice`	`AssistantVoice`	`MARIN`	Voice of the assistant
`speech_speed`	`float`	`1.0`	Playback speed, clamped to `0.5–1.5`
`transcription_model`	`TranscriptionModel | None`	`None`	Enable speech-to-text (user + assistant)
`noise_reduction`	`NoiseReduction`	`FAR_FIELD`	Microphone noise reduction mode
`turn_detection`	`TurnDetection | None`	defaults	VAD sensitivity settings
`tools`	`Tools | None`	`None`	Callable tools the assistant can invoke
`subagents`	`list[SubAgent] | None`	`None`	Specialist agents to delegate tasks to
`mcp_servers`	`list[MCPServer] | None`	`None`	MCP servers to connect to
`audio_input`	`AudioInputDevice | None`	`MicrophoneInput()`	Custom audio source
`audio_output`	`AudioOutputDevice | None`	`SpeakerOutput()`	Custom audio sink
`transcript_listener`	`TranscriptListener | None`	`None`	Callbacks for transcript events
`agent_listener`	`AgentListener | None`	`None`	Callbacks for lifecycle events
`inactivity_timeout_seconds`	`float`	`10.0`	Auto-stop after this many seconds of silence
`api_key`	`str | None`	`None`	OpenAI API key (falls back to env var)

Models & Voices

from rtvoice.views import RealtimeModel, AssistantVoice

# Models
RealtimeModel.GPT_REALTIME       # gpt-realtime
RealtimeModel.GPT_REALTIME_MINI  # gpt-realtime-mini (default)

# Voices
AssistantVoice.MARIN    # default
AssistantVoice.ALLOY
AssistantVoice.ASH
AssistantVoice.CORAL
AssistantVoice.ECHO
AssistantVoice.NOVA
AssistantVoice.SAGE
AssistantVoice.SHIMMER
# ... and more

Turn Detection

from rtvoice import RealtimeAgent
from rtvoice.views import TurnDetection

agent = RealtimeAgent(
    instructions="...",
    turn_detection=TurnDetection(
        threshold=0.5,              # VAD sensitivity (0.0–1.0)
        prefix_padding_ms=300,      # Audio included before speech onset
        silence_duration_ms=500,    # Silence needed to end a turn
    ),
)
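These three fields share their names with the OpenAI Realtime API's `server_vad` turn-detection settings, so detection presumably happens on OpenAI's side rather than locally. To build intuition for how they interact, here is a toy, frame-based segmenter. It assumes 20 ms frames with a per-frame speech probability, and `segment_turns` is a hypothetical illustration, not rtvoice or OpenAI code: a turn opens when a frame reaches `threshold` (backdated by `prefix_padding_ms`) and closes after `silence_duration_ms` of sub-threshold frames.

```python
def segment_turns(
    probs: list[float],
    frame_ms: int = 20,
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) for each detected turn."""
    turns: list[tuple[int, int]] = []
    start: int | None = None
    silence_ms = 0
    last_speech_end = 0
    for i, p in enumerate(probs):
        t = i * frame_ms
        if p >= threshold:
            if start is None:
                # Include some audio from before the detected speech onset
                start = max(0, t - prefix_padding_ms)
            silence_ms = 0
            last_speech_end = t + frame_ms
        elif start is not None:
            silence_ms += frame_ms
            if silence_ms >= silence_duration_ms:
                # Enough continuous silence: the turn is over
                turns.append((start, last_speech_end))
                start = None
                silence_ms = 0
    if start is not None:
        # Input ended mid-turn: close the open turn
        turns.append((start, last_speech_end))
    return turns


probs = [0.0] * 20 + [0.9] * 10 + [0.0] * 30  # speech from 400 ms to 600 ms
print(segment_turns(probs))  # [(100, 600)]
```

Raising `silence_duration_ms` lets users pause mid-sentence without ending their turn; raising `threshold` makes the detector ignore quieter background sound.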

Tools

Tools are Python functions decorated with @tools.action(...). The assistant can call them during a conversation. Both sync and async functions are supported.

import asyncio
from typing import Annotated
from rtvoice import RealtimeAgent, Tools

tools = Tools()

@tools.action("Look up the current weather for a city.")
async def get_weather(
    city: Annotated[str, "The city to get weather for."],
) -> str:
    return f"Weather in {city}: 18°C, partly cloudy."

@tools.action("Send an email to a recipient.")
async def send_email(
    recipient: Annotated[str, "Email address."],
    subject: Annotated[str, "Email subject."],
    body: Annotated[str, "Email body."],
) -> str:
    # ... your email logic
    return f"Email sent to {recipient}."

agent = RealtimeAgent(
    instructions="You are a helpful assistant. You can check weather and send emails.",
    tools=tools,
)

asyncio.run(agent.run())
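The README does not spell out how `@tools.action` turns a function signature into something the model can call. A common approach, assumed here rather than confirmed for rtvoice, is to read the `Annotated` metadata and emit an OpenAI-style JSON-schema function definition. `to_function_schema` is a hypothetical helper that only maps a few primitive types.

```python
import inspect
import json
from typing import Annotated, Any, get_args, get_origin, get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_function_schema(fn, description: str) -> dict[str, Any]:
    """Build a function-tool definition from a type-annotated Python function."""
    hints = get_type_hints(fn, include_extras=True)  # keep Annotated metadata
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        hint = hints.get(name, str)
        doc = ""
        if get_origin(hint) is Annotated:
            # Annotated[T, "description"] unpacks to (T, "description")
            hint, *meta = get_args(hint)
            doc = next((m for m in meta if isinstance(m, str)), "")
        properties[name] = {"type": _JSON_TYPES.get(hint, "string"), "description": doc}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "name": fn.__name__,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


async def get_weather(city: Annotated[str, "The city to get weather for."]) -> str:
    return f"Weather in {city}: 18°C, partly cloudy."


print(json.dumps(to_function_schema(get_weather, "Look up the current weather for a city."), indent=2))
```

This is why the `Annotated[str, "..."]` descriptions matter: they are the only per-argument guidance the model sees.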

Injected Parameters

Tools can declare special parameters that the framework injects automatically, so the LLM never has to supply them:

Parameter name	Type	Description
`event_bus`	`EventBus`	The agent's internal event bus
`context`	`T`	Custom context object passed to `RealtimeAgent(context=...)`
`conversation_history`	`ConversationHistory`	Full conversation history so far

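from rtvoice import Tools
from rtvoice.conversation import ConversationHistory

tools = Tools()

@tools.action("Summarize the conversation so far.")
async def summarize(conversation_history: ConversationHistory) -> str:
    # conversation_history is injected by the framework, not passed by the LLM
    return conversation_history.format()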
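Since injection is keyed on the parameter name, one plausible mechanism is to inspect the signature, hide the injectable names from the model, and merge them back in at call time. The sketch below assumes exactly that; `split_params`, `call_tool`, and the `runtime` dict are hypothetical names, not rtvoice internals.

```python
import asyncio
import inspect
from typing import Any, Callable

# Parameter names the framework fills in itself (per the table above)
INJECTABLE = {"event_bus", "context", "conversation_history"}


def split_params(fn: Callable[..., Any]) -> tuple[list[str], list[str]]:
    """Return (llm_visible, injected) parameter names for a tool function."""
    names = list(inspect.signature(fn).parameters)
    injected = [n for n in names if n in INJECTABLE]
    visible = [n for n in names if n not in INJECTABLE]
    return visible, injected


async def call_tool(fn: Callable[..., Any], llm_args: dict[str, Any], runtime: dict[str, Any]) -> Any:
    """Merge model-supplied arguments with framework-injected ones, then call the tool."""
    _, injected = split_params(fn)
    kwargs = {**llm_args, **{n: runtime[n] for n in injected}}
    result = fn(**kwargs)
    # Support both sync and async tools
    return await result if inspect.isawaitable(result) else result


async def lookup(query: str, context: dict) -> str:
    return f"{context['user']} asked: {query}"


print(split_params(lookup))  # (['query'], ['context'])
print(asyncio.run(call_tool(lookup, {"query": "weather"}, {"context": {"user": "ada"}})))  # ada asked: weather
```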

Transcript Listener

Implement TranscriptListener to react to completed speech turns. Requires transcription_model to be set for user transcription.

from rtvoice import RealtimeAgent
from rtvoice.views import TranscriptionModel, TranscriptListener

class ConsolePrinter(TranscriptListener):
    async def on_user_completed(self, transcript: str) -> None:
        print(f"User: {transcript}")

    async def on_assistant_completed(self, transcript: str) -> None:
        print(f"Assistant: {transcript}")

agent = RealtimeAgent(
    instructions="...",
    transcription_model=TranscriptionModel.WHISPER_1,
    transcript_listener=ConsolePrinter(),
)

Both callbacks are optional — override only what you need.
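Because every hook has a default, a listener can record just one side of the conversation, which is handy in tests. The sketch below runs without rtvoice installed: `TranscriptListenerStandIn` is a hypothetical stand-in, assumed to behave like `TranscriptListener` (async no-op defaults). In real code, subclass `rtvoice.views.TranscriptListener` instead.

```python
import asyncio


class TranscriptListenerStandIn:
    """Hypothetical stand-in for TranscriptListener: every hook is a no-op by default."""

    async def on_user_completed(self, transcript: str) -> None: ...

    async def on_assistant_completed(self, transcript: str) -> None: ...


class UserLog(TranscriptListenerStandIn):
    """Keeps only what the user said; assistant turns fall through to the default."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    async def on_user_completed(self, transcript: str) -> None:
        self.lines.append(transcript)


async def demo() -> list[str]:
    log = UserLog()
    await log.on_user_completed("What's the weather in Berlin?")
    await log.on_assistant_completed("12°C and cloudy.")  # inherited no-op
    return log.lines


print(asyncio.run(demo()))  # ["What's the weather in Berlin?"]
```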

Agent Listener

AgentListener provides hooks into the agent's lifecycle. Useful for logging, metrics, or UI state.

from rtvoice import RealtimeAgent
from rtvoice.views import AgentListener

class MyListener(AgentListener):
    async def on_agent_started(self) -> None:
        """Called when the WebSocket session is established and the agent is ready."""
        print("Ready.")

    async def on_agent_stopped(self) -> None:
        """Called when the agent shuts down cleanly."""
        print("Stopped.")

    async def on_agent_interrupted(self) -> None:
        """Called when the assistant is interrupted mid-response by the user."""
        print("Interrupted.")

    async def on_subagent_called(self, agent_name: str, task: str) -> None:
        """Called when a subagent is dispatched with a task."""
        print(f"→ {agent_name}: {task}")

    async def on_agent_error(self, type: str, message: str, code: str | None, param: str | None) -> None:
        """Called on API-level errors."""
        print(f"Error [{code}]: {message}")

agent = RealtimeAgent(
    instructions="...",
    agent_listener=MyListener(),
)

SubAgents

SubAgents let the main voice agent delegate specialized tasks to dedicated LLM agents. The main agent sees them as regular tools and decides autonomously when to call them.

import asyncio
from typing import Annotated
from llmify import ChatOpenAI
from rtvoice import RealtimeAgent, SubAgent, Tools

# 1. Build tools for the subagent
tools = Tools()

@tools.action("Fetch the current weather for a city.")
def get_weather(city: Annotated[str, "The city name."]) -> str:
    return f"Weather in {city}: 12°C, cloudy."

# 2. Define the subagent
weather_agent = SubAgent(
    name="Weather Assistant",
    description=(
        "Looks up current weather conditions for any city. "
        "Use this whenever the user asks about weather or temperature."
    ),
    instructions="You are a weather assistant. Use the get_weather tool and answer concisely.",
    llm=ChatOpenAI(model="gpt-4o-mini"),
    tools=tools,
)

# 3. Attach to the main agent
agent = RealtimeAgent(
    instructions="You are a voice assistant. For weather questions, delegate to the Weather Assistant.",
    subagents=[weather_agent],
)

asyncio.run(agent.run())

SubAgent Options

Parameter	Description
`name`	Identifier shown to the main agent as the tool name
`description`	Tells the main agent when to call this subagent
`instructions`	System prompt for the subagent's own LLM
`llm`	The `BaseChatModel` to use (e.g. `ChatOpenAI`)
`tools`	Tools available to the subagent
`mcp_servers`	MCP servers to attach to the subagent
`max_iterations`	Maximum LLM turns before giving up (default: `10`)
`handoff_instructions`	Extra instructions appended to `description` — guides the main agent on how to hand off
`result_instructions`	Text the main agent receives immediately, before the subagent finishes (useful with `fire_and_forget`)
`fire_and_forget`	If `True`, the main agent continues immediately without waiting for the result

How SubAgents Work

When the main voice agent decides to call a subagent, the framework:

1. Dispatches a SubAgentCalledEvent (which triggers on_subagent_called on your listener).
2. Passes the current conversation history to the subagent as context.
3. Runs the subagent's internal ReAct loop (tool calls → LLM → tool calls …).
4. Returns the final result to the main voice agent as a tool result.

sequenceDiagram
    participant User
    participant VoiceAgent
    participant SubAgent
    participant SubAgentLLM

    User->>VoiceAgent: "What's the weather in Berlin?"
    VoiceAgent->>SubAgent: handoff(task="weather in Berlin", context=...)
    SubAgent->>SubAgentLLM: invoke with tools
    SubAgentLLM->>SubAgent: call get_weather("Berlin")
    SubAgent->>SubAgentLLM: tool result
    SubAgentLLM->>SubAgent: done("12°C, cloudy")
    SubAgent->>VoiceAgent: SubAgentResult(message="12°C, cloudy")
    VoiceAgent->>User: speaks the result

Fire & Forget

For long-running tasks (e.g. sending an email), use fire_and_forget=True. The main agent gets back result_instructions immediately and the subagent runs in the background.

from llmify import ChatOpenAI
from rtvoice import SubAgent, Tools

email_tools = Tools()  # register your email-sending actions on this instance

email_agent = SubAgent(
    name="email_agent",
    description="Sends an email. Use when the user wants to send an email.",
    instructions="You are an email assistant. Send the email and confirm.",
    llm=ChatOpenAI(model="gpt-4o-mini"),
    tools=email_tools,
    fire_and_forget=True,
    result_instructions="The email is being sent in the background.",
)

MCP Servers

Connect any MCP-compatible tool server to the agent or to individual subagents.

from rtvoice import RealtimeAgent
from rtvoice.mcp import MCPServerStdio

agent = RealtimeAgent(
    instructions="...",
    mcp_servers=[
        MCPServerStdio(
            command="python",
            args=["my_mcp_server.py"],
        )
    ],
)

MCPServerStdio spawns the server as a subprocess and talks to it over stdin/stdout using the Model Context Protocol. Every tool the server exposes is registered automatically and made available to the LLM.

Event Flow

sequenceDiagram
    participant User
    participant Microphone
    participant EventBus
    participant WebSocket
    participant OpenAI
    participant Speaker

    User->>Microphone: speaks
    Microphone->>EventBus: audio chunk
    EventBus->>WebSocket: forward audio
    WebSocket->>OpenAI: stream audio (WS)

    OpenAI->>WebSocket: speech detected
    OpenAI->>WebSocket: audio response delta
    WebSocket->>EventBus: audio delta event
    EventBus->>Speaker: play chunk
    Speaker->>User: hears response

    Note over User,Speaker: User interrupts mid-response
    User->>Microphone: speaks again
    Microphone->>EventBus: speech started
    EventBus->>WebSocket: cancel response
    WebSocket->>OpenAI: ResponseCancelEvent

    Note over User,Speaker: Tool call
    OpenAI->>WebSocket: function call requested
    WebSocket->>EventBus: tool call event
    EventBus->>EventBus: execute tool
    EventBus->>WebSocket: tool result
    WebSocket->>OpenAI: submit result

Custom Audio Devices

Implement AudioInputDevice or AudioOutputDevice to use any audio source or sink — useful for testing, embedded hardware, or telephony integrations.

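from collections.abc import AsyncIterator

from rtvoice import RealtimeAgent
from rtvoice.audio.devices import AudioInputDevice, AudioOutputDevice

class CustomMicrophone(AudioInputDevice):
    def __init__(self) -> None:
        super().__init__()
        self._active = False

    async def start(self) -> None:
        self._active = True  # open your audio source here

    async def stop(self) -> None:
        self._active = False  # release your audio source here

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        # Yield raw audio chunks until stop() is called
        while self.is_active:
            yield await self._read_audio_chunk()

    async def _read_audio_chunk(self) -> bytes:
        raise NotImplementedError  # read one chunk from your source

    @property
    def is_active(self) -> bool:
        return self._active

class CustomSpeaker(AudioOutputDevice):
    def __init__(self) -> None:
        super().__init__()
        self._playing = False

    async def start(self) -> None: ...

    async def stop(self) -> None:
        self._playing = False

    async def play_chunk(self, chunk: bytes) -> None:
        self._playing = True  # write the chunk to your audio sink here

    async def clear_buffer(self) -> None:
        self._playing = False  # drop any audio still queued for playback

    @property
    def is_playing(self) -> bool:
        return self._playing

agent = RealtimeAgent(
    instructions="...",
    audio_input=CustomMicrophone(),
    audio_output=CustomSpeaker(),
)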
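For tests, a device that replays pre-recorded bytes avoids needing real hardware. The sketch below is duck-typed so it runs without rtvoice; in real code you would subclass `AudioInputDevice` as shown above. `BytesMicrophone` is a hypothetical name, and the default chunk size assumes 24 kHz, 16-bit mono PCM (the Realtime API's `pcm16` format), where 4800 bytes is 100 ms of audio; the format rtvoice actually expects is not stated here.

```python
import asyncio
from collections.abc import AsyncIterator


class BytesMicrophone:
    """Feeds pre-recorded audio in fixed-size chunks (duck-typed AudioInputDevice sketch)."""

    def __init__(self, audio: bytes, chunk_size: int = 4800) -> None:
        self._audio = audio
        self._chunk_size = chunk_size
        self._active = False

    async def start(self) -> None:
        self._active = True

    async def stop(self) -> None:
        self._active = False

    @property
    def is_active(self) -> bool:
        return self._active

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        for offset in range(0, len(self._audio), self._chunk_size):
            if not self._active:
                break  # stop() was called, or start() never was
            yield self._audio[offset : offset + self._chunk_size]
            await asyncio.sleep(0)  # let other tasks (e.g. the WebSocket sender) run
        self._active = False  # recording exhausted


async def main() -> None:
    mic = BytesMicrophone(b"\x00\x00" * 12000)  # 0.5 s of silence at 24 kHz
    await mic.start()
    sizes = [len(chunk) async for chunk in mic.stream_chunks()]
    print(sizes)  # [4800, 4800, 4800, 4800, 4800]


asyncio.run(main())
```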

Requirements

Python 3.13+
OpenAI API key with Realtime API access (OPENAI_API_KEY env var)

Project details

Release history

0.5.0 (Apr 10, 2026)
0.4.0 (Mar 12, 2026)
0.3.0 (Mar 1, 2026)
0.2.0 (Mar 1, 2026)
0.1.8 (Mar 1, 2026)
0.1.7 (Mar 1, 2026)
0.1.6 (Mar 1, 2026, this version)
0.1.5 (Feb 28, 2026)
0.1.4 (Feb 28, 2026)
0.1.3 (Feb 27, 2026)
0.1.2 (Feb 26, 2026)
0.1.0 (Feb 22, 2026)

Download files


Source Distribution

rtvoice-0.1.6.tar.gz (53.1 kB)

Uploaded Mar 1, 2026 Source

Built Distribution


rtvoice-0.1.6-py3-none-any.whl (38.7 kB)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file rtvoice-0.1.6.tar.gz.

File metadata

Download URL: rtvoice-0.1.6.tar.gz
Upload date: Mar 1, 2026
Size: 53.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.2

File hashes

Hashes for rtvoice-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`d2c9642240b47409b8ce3e9aefc14e1d9697b1723b006c951345bb0b649be458`
MD5	`1a0d30b99cd06da4f73a1a5ab2807290`
BLAKE2b-256	`52a8a3db931c105913c457118ab88c91d6ceb6fc08ccc4becfbbace233e6b68a`


File details

Details for the file rtvoice-0.1.6-py3-none-any.whl.

File metadata

Download URL: rtvoice-0.1.6-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 38.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.2

File hashes

Hashes for rtvoice-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`25216b28ab19c3f4e6f9c0615b9716d6aaae25838e023847ded25ec0c6b0510a`
MD5	`1b969a8d1a600f9dcde0910aea59f1d3`
BLAKE2b-256	`65149369934c91353256ea6b2c1205b1b013c6eb9f4a337022497edf9b6a73f4`

