Real-time voice assistant built on OpenAI's Realtime API
rtvoice
A Python library for building real-time voice agents powered by the OpenAI Realtime API. It handles the full session lifecycle — microphone input, WebSocket streaming, turn detection, tool calling, and audio playback — so you can focus on what your agent does, not how it talks.
Installation
pip install "rtvoice[audio]"
Requires Python 3.13+ and an OPENAI_API_KEY environment variable (or pass api_key= directly).
Quickstart
import asyncio

from rtvoice import RealtimeAgent

async def main():
    agent = RealtimeAgent(
        instructions="You are Jarvis, a concise and helpful voice assistant.",
    )
    await agent.run()

asyncio.run(main())
Run it, speak into your microphone, and the agent responds through your speakers. Press Ctrl+C to end the session.
Tool calling
Register any async (or sync) function with @tools.action(...) and the model will call it when appropriate:
import asyncio
from typing import Annotated

from rtvoice import RealtimeAgent, Tools

tools = Tools()

@tools.action("Get the current weather for a given city")
async def get_weather(city: Annotated[str, "The city name"]) -> str:
    return f"It's 18°C and partly cloudy in {city}."

async def main():
    agent = RealtimeAgent(
        instructions="Answer weather questions using get_weather.",
        tools=tools,
    )
    await agent.run()

asyncio.run(main())
For long-running tools, set is_long_running=True and provide a holding_instruction so the assistant keeps the user informed while the tool runs. See the Tools guide in the documentation.
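The `Annotated[str, "The city name"]` hint in the weather example carries a parameter description alongside the type. As an illustration of how such metadata can be read with the standard library, here is a rough sketch that turns a function signature into a JSON-Schema-like dictionary. This is not rtvoice's implementation; `describe_parameters` and `JSON_TYPES` are hypothetical names, and the schema shape is an assumption:

```python
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints

# Rough mapping from Python types to JSON Schema type names (illustrative only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_parameters(func) -> dict:
    """Build a JSON-Schema-like description of func's parameters."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name, str)
        description = None
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            # Use the first string in the metadata as the description.
            description = next((m for m in metadata if isinstance(m, str)), None)
        else:
            base = hint
        prop = {"type": JSON_TYPES.get(base, "string")}
        if description:
            prop["description"] = description
        properties[name] = prop
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


async def get_weather(city: Annotated[str, "The city name"]) -> str:
    return f"It's 18°C and partly cloudy in {city}."


schema = describe_parameters(get_weather)
```

Keeping descriptions in the type hints means the function signature stays the single source of truth for what the model is told about each parameter.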
Subagents
Delegate complex, multi-step tasks to a dedicated LLM-driven sub-agent. The voice agent hands off the task, speaks a holding phrase, and presents the result when done:
from rtvoice import RealtimeAgent, SubAgent, Tools
from rtvoice.llm import ChatOpenAI

tools = Tools()

@tools.action("Book a restaurant table.")
async def book_table(restaurant: str, date: str, time: str, party_size: int) -> str:
    return f"Booked table for {party_size} at {restaurant} on {date} at {time}."

booking_agent = SubAgent(
    name="Booking Assistant",
    description="Books restaurant tables for the user.",
    holding_instruction="I'm checking availability, just a moment.",
    instructions="Use book_table to complete booking requests.",
    tools=tools,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)

agent = RealtimeAgent(
    instructions="Delegate restaurant bookings to the Booking Assistant.",
    subagents=[booking_agent],
)
If a subagent needs information from the user (e.g. party size), it automatically asks a clarifying question through the voice agent. See the Subagents guide in the documentation.
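The hand-off flow described in this section (start the task, speak a holding phrase, present the result) can be pictured with plain asyncio. This is a conceptual sketch only, not how rtvoice implements delegation; `delegate`, `fake_booking`, and the `say` callback are hypothetical stand-ins:

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


async def delegate(
    work: Coroutine[Any, Any, str],
    say: Callable[[str], None],
    holding_phrase: str,
) -> str:
    """Start the work, speak a holding phrase, then present the result."""
    job = asyncio.create_task(work)  # schedule the work to run concurrently
    say(holding_phrase)              # spoken while the work is still pending
    result = await job               # wait for the delegated task to finish
    say(result)
    return result


async def fake_booking() -> str:
    await asyncio.sleep(0.01)  # stands in for a slow LLM or API call
    return "Booked table for 2 at Luigi's."


spoken: list[str] = []
asyncio.run(
    delegate(fake_booking(), spoken.append, "I'm checking availability, just a moment.")
)
```

Here `spoken` ends up holding the holding phrase followed by the booking result, mirroring the order in which a user would hear them.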
MCP servers
Connect any MCP-compatible tool server via MCPServerStdio. Tools are discovered and registered automatically during prepare():
from rtvoice import RealtimeAgent
from rtvoice.mcp import MCPServerStdio

agent = RealtimeAgent(
    instructions="You can read and write files in /tmp.",
    mcp_servers=[
        MCPServerStdio(
            command="npx",
            args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        )
    ],
)
Prefer attaching MCP servers to a SubAgent rather than to the RealtimeAgent directly; this keeps the realtime model's tool list short. See the MCP guide in the documentation.
Custom audio devices
Implement AudioInputDevice or AudioOutputDevice to use any audio source or sink — useful for testing, telephony, or embedded hardware:
from collections.abc import AsyncIterator

from rtvoice import RealtimeAgent
from rtvoice.audio import AudioInputDevice

class CustomMicrophone(AudioInputDevice):
    async def start(self) -> None: ...

    async def stop(self) -> None: ...

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        # _read_audio_chunk() is a placeholder: implement it for your audio source.
        while self.is_active:
            yield await self._read_audio_chunk()

    @property
    def is_active(self) -> bool:
        # Set _active in start() and clear it in stop().
        return self._active

agent = RealtimeAgent(
    instructions="...",
    audio_input=CustomMicrophone(),
)
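For tests, an input device that replays a fixed buffer is often enough. The sketch below is self-contained so it can run without the library installed: the `AudioInputDevice` class here is a local stand-in that mirrors only the four members shown above, and `BufferMicrophone` and `collect` are hypothetical names. In real use you would subclass `rtvoice.audio.AudioInputDevice` instead, whose actual interface may include more than this.

```python
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator


class AudioInputDevice(ABC):
    """Local stand-in for rtvoice.audio.AudioInputDevice (assumed interface)."""

    @abstractmethod
    async def start(self) -> None: ...

    @abstractmethod
    async def stop(self) -> None: ...

    @abstractmethod
    def stream_chunks(self) -> AsyncIterator[bytes]: ...

    @property
    @abstractmethod
    def is_active(self) -> bool: ...


class BufferMicrophone(AudioInputDevice):
    """Replays a fixed byte buffer in fixed-size chunks."""

    def __init__(self, audio: bytes, chunk_size: int = 4) -> None:
        self._audio = audio
        self._chunk_size = chunk_size
        self._active = False

    async def start(self) -> None:
        self._active = True

    async def stop(self) -> None:
        self._active = False

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        for offset in range(0, len(self._audio), self._chunk_size):
            if not self._active:
                break  # stop() was called mid-stream
            yield self._audio[offset : offset + self._chunk_size]
            await asyncio.sleep(0)  # give other tasks a chance to run

    @property
    def is_active(self) -> bool:
        return self._active


async def collect(mic: BufferMicrophone) -> list[bytes]:
    """Start the device, drain all chunks, then stop it."""
    await mic.start()
    chunks = [chunk async for chunk in mic.stream_chunks()]
    await mic.stop()
    return chunks


chunks = asyncio.run(collect(BufferMicrophone(b"0123456789")))
```

With a 10-byte buffer and a chunk size of 4, the device yields two full chunks and one shorter final chunk.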
Documentation
Full documentation including guides and API reference: mathisarends.github.io/rtvoice