
Real-time voice assistant built on OpenAI's Realtime API

Project description

rtvoice


A Python library for building real-time voice agents powered by the OpenAI Realtime API. It handles the full session lifecycle (microphone input, WebSocket streaming, turn detection, tool calling, and audio playback), so you can focus on what your agent does, not how it talks.


Installation

pip install "rtvoice[audio]"

Requires Python 3.13+ and an OPENAI_API_KEY environment variable (or pass api_key= directly).


Quickstart

import asyncio
from rtvoice import RealtimeAgent

async def main():
    agent = RealtimeAgent(
        instructions="You are Jarvis, a concise and helpful voice assistant.",
    )
    await agent.run()

asyncio.run(main())

Run it, speak into your microphone, and the agent responds through your speakers. Press Ctrl+C to end the session.


Tool calling

Register any async (or sync) function with @tools.action(...) and the model will call it when appropriate:

import asyncio
from typing import Annotated
from rtvoice import RealtimeAgent, Tools

tools = Tools()

@tools.action("Get the current weather for a given city")
async def get_weather(city: Annotated[str, "The city name"]) -> str:
    return f"It's 18°C and partly cloudy in {city}."

async def main():
    agent = RealtimeAgent(
        instructions="Answer weather questions using get_weather.",
        tools=tools,
    )
    await agent.run()

asyncio.run(main())
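How does a decorator turn a plain Python function into something the model can call? The Realtime API describes each tool as a function definition with a name, a description, and a JSON Schema for its parameters. The sketch below shows one plausible way a registry could build that definition from type hints and `Annotated` metadata, then dispatch the model's JSON-encoded arguments to either a sync or an async function. `ToolRegistry` is a hypothetical stand-in, not rtvoice's actual `Tools` class, and only simple scalar types are mapped.

```python
import asyncio
import inspect
import json
from collections.abc import Callable
from typing import Annotated, Any, get_args, get_origin, get_type_hints

# Map simple Python types to JSON Schema types; anything else falls back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class ToolRegistry:
    """Hypothetical stand-in for a registry like rtvoice's Tools."""

    def __init__(self) -> None:
        self.functions: dict[str, Callable[..., Any]] = {}
        self.schemas: list[dict[str, Any]] = []

    def action(self, description: str):
        def decorator(func):
            hints = get_type_hints(func, include_extras=True)
            properties: dict[str, Any] = {}
            required: list[str] = []
            for name, param in inspect.signature(func).parameters.items():
                hint = hints.get(name, str)
                prop: dict[str, Any] = {}
                if get_origin(hint) is Annotated:
                    base, *meta = get_args(hint)
                    # Use the first string metadata entry as the parameter description.
                    doc = next((m for m in meta if isinstance(m, str)), None)
                    if doc:
                        prop["description"] = doc
                else:
                    base = hint
                prop["type"] = _JSON_TYPES.get(base, "string")
                properties[name] = prop
                # Parameters without a default value are required.
                if param.default is inspect.Parameter.empty:
                    required.append(name)
            self.functions[func.__name__] = func
            self.schemas.append({
                "type": "function",
                "name": func.__name__,
                "description": description,
                "parameters": {"type": "object", "properties": properties, "required": required},
            })
            return func
        return decorator

    async def call(self, name: str, arguments_json: str) -> Any:
        # The model sends arguments as a JSON string; await the result only if it is awaitable.
        result = self.functions[name](**json.loads(arguments_json))
        if inspect.isawaitable(result):
            result = await result
        return result


tools = ToolRegistry()


@tools.action("Get the current weather for a given city")
async def get_weather(city: Annotated[str, "The city name"]) -> str:
    return f"It's 18°C and partly cloudy in {city}."


@tools.action("Add two integers")
def add(a: int, b: int = 0) -> int:
    return a + b


print(json.dumps(tools.schemas, indent=2))
print(asyncio.run(tools.call("get_weather", '{"city": "Berlin"}')))
```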

For long-running tools, set is_long_running=True and provide a holding_instruction so the assistant keeps the user informed while it works. → Tools guide
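One way to implement the holding behaviour is to start the tool as a task and speak the holding instruction only if the task is still running after a short grace period. In this sketch `run_long_tool`, `speak`, and `grace_period` are hypothetical names; rtvoice's `is_long_running` handling may well speak the holding instruction immediately instead.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_long_tool(
    tool_call: Awaitable[str],
    holding_instruction: str,
    speak: Callable[[str], Awaitable[None]],
    grace_period: float = 1.0,
) -> str:
    """Run a tool; if it is still busy after grace_period seconds, speak a holding message."""
    task = asyncio.ensure_future(tool_call)
    try:
        # shield() keeps the tool running when wait_for gives up on the grace period.
        return await asyncio.wait_for(asyncio.shield(task), timeout=grace_period)
    except asyncio.TimeoutError:
        await speak(holding_instruction)
        return await task
```

`asyncio.shield` matters here: without it, `wait_for` would cancel the tool itself when the grace period expires.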


Supervisor agents

Delegate complex, multi-step tasks to a dedicated LLM-driven sub-agent. The voice agent hands off the task, speaks a holding phrase, and presents the result when done:

import asyncio

from llmify import ChatOpenAI
from rtvoice import RealtimeAgent, SupervisorAgent, Tools

tools = Tools()

@tools.action("Book a restaurant table.")
async def book_table(restaurant: str, date: str, time: str, party_size: int) -> str:
    # Stub: replace with a real booking backend.
    return f"Booked table for {party_size} at {restaurant} on {date} at {time}."

booking_agent = SupervisorAgent(
    name="Booking Assistant",
    description="Books restaurant tables for the user.",
    holding_instruction="I'm checking availability, just a moment.",
    instructions="Use book_table to complete booking requests.",
    tools=tools,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)

async def main():
    agent = RealtimeAgent(
        instructions="Delegate restaurant bookings to the Booking Assistant.",
        supervisor_agent=booking_agent,
    )
    await agent.run()

asyncio.run(main())

If the supervisor needs information from the user (e.g. party size), it asks a clarifying question through the voice agent automatically. → Supervisor guide


MCP servers

Connect any MCP-compatible tool server via MCPServerStdio. Tools are discovered and registered automatically during prepare():

import asyncio

from rtvoice import RealtimeAgent
from rtvoice.mcp import MCPServerStdio

async def main():
    agent = RealtimeAgent(
        instructions="You can read and write files in /tmp.",
        mcp_servers=[
            # Launch the filesystem MCP server via npx, limited to /tmp.
            MCPServerStdio(
                command="npx",
                args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
            )
        ],
    )
    await agent.run()

asyncio.run(main())

Prefer attaching MCP servers to a SupervisorAgent rather than RealtimeAgent directly to keep the realtime model's tool list short. → MCP guide


Custom audio devices

Implement AudioInputDevice or AudioOutputDevice to use any audio source or sink. This is useful for testing, telephony, or embedded hardware:

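from collections.abc import AsyncIterator

from rtvoice import RealtimeAgent
from rtvoice.audio import AudioInputDevice

class CustomMicrophone(AudioInputDevice):
    _active: bool = False

    async def start(self) -> None:
        # Open the underlying source (sound card, socket, file, ...).
        self._active = True

    async def stop(self) -> None:
        # Release the underlying source.
        self._active = False

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        # Yield raw audio chunks for as long as the device is active.
        while self.is_active:
            yield await self._read_audio_chunk()

    @property
    def is_active(self) -> bool:
        return self._active

    async def _read_audio_chunk(self) -> bytes:
        # Placeholder: read one chunk from your actual audio source here.
        raise NotImplementedError

agent = RealtimeAgent(
    instructions="...",
    audio_input=CustomMicrophone(),
)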

→ Audio API reference
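For tests, a microphone double that replays a fixed buffer is often enough. The class below mirrors the methods of the CustomMicrophone example but does not subclass AudioInputDevice, so it runs without rtvoice installed; to plug it into an agent you would presumably subclass AudioInputDevice as shown above. The 24 kHz, 16-bit mono format and the 20 ms chunk size are assumptions, not values documented here.

```python
import asyncio
from collections.abc import AsyncIterator

# Assumption: 24 kHz, 16-bit mono PCM, so 20 ms of audio is 24_000 * 0.020 * 2 = 960 bytes.
CHUNK_BYTES = 960


class FakeMicrophone:
    """Duck-typed test double with the same methods as the CustomMicrophone example."""

    def __init__(self, pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> None:
        self._pcm = pcm
        self._chunk_bytes = chunk_bytes
        self._offset = 0
        self._active = False

    async def start(self) -> None:
        self._active = True

    async def stop(self) -> None:
        self._active = False

    @property
    def is_active(self) -> bool:
        return self._active

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        # Replay the buffer chunk by chunk; stop when inactive or when the buffer runs out.
        while self.is_active and self._offset < len(self._pcm):
            chunk = self._pcm[self._offset : self._offset + self._chunk_bytes]
            self._offset += len(chunk)
            yield chunk
            # Give other tasks (e.g. a WebSocket sender) a chance to run.
            await asyncio.sleep(0)
```

Unlike a real microphone, the stream ends once the buffer is exhausted, which keeps test runs finite.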


Documentation

Full documentation including guides and API reference: mathisarends.github.io/rtvoice



Download files


Source Distribution

rtvoice-0.4.0.tar.gz (116.5 kB)

Uploaded Source

Built Distribution


rtvoice-0.4.0-py3-none-any.whl (63.8 kB)

Uploaded Python 3

File details

Details for the file rtvoice-0.4.0.tar.gz.

File metadata

  • Download URL: rtvoice-0.4.0.tar.gz
  • Upload date:
  • Size: 116.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.2

File hashes

Hashes for rtvoice-0.4.0.tar.gz
Algorithm Hash digest
SHA256 bef545bac41ff83ed3a62b6349c7ba38c4b419bc08278e91f0d57de4795130f3
MD5 08bd391f7460eee9cb3adf888c197ff9
BLAKE2b-256 a728d7d1bf02839830c31e58e3f6eacd20200bfb0c8604784e70cf70d43a5976


File details

Details for the file rtvoice-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: rtvoice-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 63.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.2

File hashes

Hashes for rtvoice-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c8866c7ebe25863d4656d1e35a2aea0d10400cd324561311987556ddbf55168d
MD5 7dafb6bb12bc4ddd6dc6015fcafa2adc
BLAKE2b-256 0e8788e5e312e7cf7cf1a9fdd0962f378344036a95dddcb4b697417974964119

