Real-time voice assistant built on OpenAI's Realtime API

rtvoice

A Python library for building real-time voice agents powered by the OpenAI Realtime API. It handles the full session lifecycle — microphone input, WebSocket streaming, turn detection, tool calling, and audio playback — so you can focus on what your agent does, not how it talks.


Installation

pip install "rtvoice[audio]"

Requires Python 3.13+ and an OPENAI_API_KEY environment variable (or pass api_key= directly).
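A missing key is easier to diagnose before a session starts than after the microphone is open. The helper below is a minimal sketch of that check: an explicitly passed key wins, otherwise OPENAI_API_KEY is read from the environment. The function name resolve_api_key is hypothetical and not part of rtvoice.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_api_key(explicit: str | None = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else OPENAI_API_KEY from env."""
    # Hypothetical helper, not part of rtvoice: fail fast with a clear message.
    key = explicit or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY or pass api_key= explicitly.")
    return key
```

The returned value can then be passed as api_key= when constructing the agent.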


Quickstart

import asyncio
from rtvoice import RealtimeAgent

async def main():
    agent = RealtimeAgent(
        instructions="You are Jarvis, a concise and helpful voice assistant.",
    )
    await agent.run()

asyncio.run(main())

Run it, speak into your microphone, and the agent responds through your speakers. Press Ctrl+C to end the session.


Tool calling

Register any async (or sync) function with @tools.action(...) and the model will call it when appropriate:

import asyncio
from typing import Annotated
from rtvoice import RealtimeAgent, Tools

tools = Tools()

@tools.action("Get the current weather for a given city")
async def get_weather(city: Annotated[str, "The city name"]) -> str:
    return f"It's 18°C and partly cloudy in {city}."

async def main():
    agent = RealtimeAgent(
        instructions="Answer weather questions using get_weather.",
        tools=tools,
    )
    await agent.run()

asyncio.run(main())
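The Annotated hint in get_weather carries the parameter description alongside the type. How rtvoice reads it internally is not shown in this README; the sketch below only illustrates, with the standard library, how such metadata can be recovered from a function signature. The helper name describe_parameters is an assumption made for this example.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


def describe_parameters(func) -> dict[str, dict[str, str]]:
    """Map each parameter to its base type name and Annotated description."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    hints.pop("return", None)
    params: dict[str, dict[str, str]] = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            # Use the first string in the metadata as the description.
            description = next((m for m in metadata if isinstance(m, str)), "")
        else:
            base, description = hint, ""
        params[name] = {
            "type": getattr(base, "__name__", str(base)),
            "description": description,
        }
    return params


async def get_weather(city: Annotated[str, "The city name"]) -> str:
    return f"It's 18°C and partly cloudy in {city}."


print(describe_parameters(get_weather))
```

A description attached this way stays next to the parameter it documents, so it does not drift from the signature when the function changes.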

For long-running tools, set is_long_running=True and provide a holding_instruction so the assistant keeps the user informed while it works. → Tools guide
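The holding behaviour can be pictured as a small asyncio pattern: start the slow work, say something right away, then deliver the result when it arrives. This is a conceptual sketch using only the standard library, not rtvoice's implementation; run_with_holding, slow_lookup, and the say callback are hypothetical names.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


async def run_with_holding(tool: Coroutine[Any, Any, str],
                           holding_message: str,
                           say: Callable[[str], None]) -> str:
    """Start the tool, announce the holding message, then await the result."""
    task = asyncio.create_task(tool)  # the tool starts running in the background
    say(holding_message)              # the user hears this while it works
    return await task


async def slow_lookup() -> str:
    await asyncio.sleep(0.05)  # stand-in for a slow network call
    return "Your order ships on Friday."


async def demo() -> list[str]:
    spoken: list[str] = []
    result = await run_with_holding(
        slow_lookup(), "One moment while I check.", spoken.append
    )
    spoken.append(result)
    return spoken


print(asyncio.run(demo()))
```

The holding message is emitted before the tool finishes, so the conversation never goes silent while the work is in progress.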


Subagents

Delegate complex, multi-step tasks to a dedicated LLM-driven sub-agent. The voice agent hands off the task, speaks a holding phrase, and presents the result when done:

from rtvoice.llm import ChatOpenAI
from rtvoice import RealtimeAgent, SubAgent, Tools

tools = Tools()

@tools.action("Book a restaurant table.")
async def book_table(restaurant: str, date: str, time: str, party_size: int) -> str:
    return f"Booked table for {party_size} at {restaurant} on {date} at {time}."

booking_agent = SubAgent(
    name="Booking Assistant",
    description="Books restaurant tables for the user.",
    holding_instruction="I'm checking availability, just a moment.",
    instructions="Use book_table to complete booking requests.",
    tools=tools,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)

agent = RealtimeAgent(
    instructions="Delegate restaurant bookings to the Booking Assistant.",
    subagents=[booking_agent],
)

If a subagent needs information from the user (e.g. party size), it asks a clarifying question through the voice agent automatically. → Subagents guide
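One way to picture the clarifying-question round trip is a subagent that receives an ask callback and awaits it whenever a required field is missing. The sketch below is an illustration under that assumption, written with the standard library only; booking_subagent, the Ask type, and fake_user are hypothetical and do not reflect rtvoice's internal design.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical callback: speaks a question to the user and returns the answer.
Ask = Callable[[str], Awaitable[str]]

REQUIRED_FIELDS = ("restaurant", "date", "time", "party_size")


async def booking_subagent(request: dict[str, str], ask: Ask) -> str:
    """Ask the user for any missing booking field, then confirm the booking."""
    details = dict(request)  # copy so the caller's dict is left untouched
    for field in REQUIRED_FIELDS:
        if field not in details:
            details[field] = await ask(f"What is the {field.replace('_', ' ')}?")
    return (
        f"Booked table for {details['party_size']} at {details['restaurant']} "
        f"on {details['date']} at {details['time']}."
    )


async def demo() -> tuple[list[str], str]:
    questions: list[str] = []

    async def fake_user(question: str) -> str:
        questions.append(question)  # record what the user was asked
        return "4"

    result = await booking_subagent(
        {"restaurant": "Luigi's", "date": "Friday", "time": "19:00"}, fake_user
    )
    return questions, result


print(asyncio.run(demo()))
```

Only the missing field triggers a question; fields already supplied in the request are used as given.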


MCP servers

Connect any MCP-compatible tool server via MCPServerStdio. Tools are discovered and registered automatically during prepare():

from rtvoice import RealtimeAgent
from rtvoice.mcp import MCPServerStdio

agent = RealtimeAgent(
    instructions="You can read and write files in /tmp.",
    mcp_servers=[
        MCPServerStdio(
            command="npx",
            args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        )
    ],
)

Prefer attaching MCP servers to a SubAgent rather than RealtimeAgent directly to keep the realtime model's tool list short. → MCP guide
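For context on what tool discovery involves: MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages with the server process, and a client lists tools with the tools/list method after the initialize handshake. The sketch below shows only the message shapes, using the standard library. The helper names are hypothetical, the response line is a hand-written sample rather than real server output, and rtvoice's own client code is not shown in this README.

```python
from __future__ import annotations

import json


def encode_request(request_id: int, method: str,
                   params: dict | None = None) -> bytes:
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line."""
    message: dict = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return (json.dumps(message) + "\n").encode("utf-8")


def tool_names(response_line: bytes) -> list[str]:
    """Extract the tool names from a tools/list response line."""
    response = json.loads(response_line)
    return [tool["name"] for tool in response["result"]["tools"]]


# Hand-written sample response, for illustration only.
sample_response = (
    b'{"jsonrpc": "2.0", "id": 1, "result": {"tools": ['
    b'{"name": "example_tool_a", "description": "...", "inputSchema": {}},'
    b'{"name": "example_tool_b", "description": "...", "inputSchema": {}}]}}\n'
)

print(encode_request(1, "tools/list"))
print(tool_names(sample_response))
```

Every tool a server reports ends up in the model's tool list, which is why a server with many tools is better placed behind a subagent.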


Custom audio devices

Implement AudioInputDevice or AudioOutputDevice to use any audio source or sink — useful for testing, telephony, or embedded hardware:

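from collections.abc import AsyncIterator
from rtvoice import RealtimeAgent
from rtvoice.audio import AudioInputDevice

class CustomMicrophone(AudioInputDevice):
    # Open and close the underlying audio source here.
    async def start(self) -> None: ...
    async def stop(self) -> None: ...

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        # _read_audio_chunk() is a placeholder for your own capture logic.
        while self.is_active:
            yield await self._read_audio_chunk()

    @property
    def is_active(self) -> bool:
        # _active is a placeholder flag that start() and stop() should maintain.
        return self._active

agent = RealtimeAgent(
    instructions="...",
    audio_input=CustomMicrophone(),
)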

→ Audio API reference
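For tests, an input device can replay bytes from memory instead of reading a microphone. The class below is a standalone sketch that mirrors the four members used above (start, stop, stream_chunks, is_active) without importing rtvoice, so it runs on its own. In real use it would presumably subclass AudioInputDevice; the name BufferMicrophone and the chunk_size parameter are assumptions made for this example.

```python
import asyncio
from collections.abc import AsyncIterator


class BufferMicrophone:
    """Streams fixed-size chunks from an in-memory byte buffer."""

    def __init__(self, data: bytes, chunk_size: int = 4) -> None:
        self._data = data
        self._chunk_size = chunk_size
        self._active = False

    async def start(self) -> None:
        self._active = True

    async def stop(self) -> None:
        self._active = False

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        for offset in range(0, len(self._data), self._chunk_size):
            if not self._active:
                break  # stop() was called while streaming
            yield self._data[offset:offset + self._chunk_size]
            await asyncio.sleep(0)  # hand control back to the event loop

    @property
    def is_active(self) -> bool:
        return self._active


async def collect(device: BufferMicrophone) -> list[bytes]:
    """Start the device, drain every chunk, then stop it."""
    await device.start()
    chunks = [chunk async for chunk in device.stream_chunks()]
    await device.stop()
    return chunks


print(asyncio.run(collect(BufferMicrophone(b"abcdefghij"))))
```

Because the chunks are deterministic, the same buffer can be replayed across test runs without any audio hardware.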


Documentation

Full documentation including guides and API reference: mathisarends.github.io/rtvoice
