Build real-time AI voice agents in Python. Zero Runtime runs the speech-to-speech pipeline (STT, LLM, TTS) for you.
ZRT — Zero Runtime Python SDK
Build real-time AI voice agents in Python — without running the infrastructure. You write the agent (instructions, tools, logic); Zero Runtime runs the live speech-to-speech pipeline — speech-to-text → LLM → text-to-speech, with turn detection, denoising, and interruptions — at low latency in the cloud.
Write the agent. We run the runtime.
A different kind of voice SDK
Most voice frameworks make you run the hard part — media servers, GPUs, turn-taking, autoscaling. No-code platforms hide all that but lock you into a dashboard. Zero Runtime is the middle: real Python and your own providers, with none of the real-time infrastructure to operate.
| | Self-hosted frameworks | No-code platforms | Zero Runtime |
|---|---|---|---|
| Write real Python + custom tools | ✅ | ❌ (dashboard) | ✅ |
| Run media servers / GPUs / scaling | ❌ you run it | ✅ managed | ✅ managed |
| Swap any STT / LLM / TTS provider | ✅ | limited | ✅ |
| Low-latency speech-to-speech | you tune it | managed | managed |
Requirements
- Python 3.11+
- A ZRT runtime endpoint + auth token (from your Zero Runtime account)
- API key(s) for the providers you use (e.g. Deepgram, Google, Cartesia)
Install
pip install --pre zrt
Public beta: `--pre` is required until the stable release.
Quickstart
1. Set your environment
export ZRT_RUNTIME_ADDRESS=us1.rt.zeroruntime.ai:443 # your ZRT runtime
export ZRT_AUTH_TOKEN=<your-token>
export DEEPGRAM_API_KEY=<key> # speech-to-text
export GOOGLE_API_KEY=<key> # the LLM (Gemini)
export CARTESIA_API_KEY=<key> # text-to-speech
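A missing variable is easy to overlook until the first call fails, so it can help to check the environment before starting the worker. The following is a minimal sketch using only the standard library; the helper names `missing_env_vars` and `require_env` are hypothetical and are not part of the ZRT SDK.

```python
import os

# Variables used by the quickstart above; adjust to the providers you use.
REQUIRED_VARS = [
    "ZRT_RUNTIME_ADDRESS",
    "ZRT_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "GOOGLE_API_KEY",
    "CARTESIA_API_KEY",
]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def require_env(required, environ=None):
    """Raise a RuntimeError naming every missing variable (hypothetical helper)."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env(REQUIRED_VARS)` at the top of `agent.py` would fail fast with a readable message instead of a provider error mid-call.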
2. Write your agent — agent.py
from zrt.agents import (
    Agent, AgentSession, Pipeline, WorkerJob, JobContext, RoomOptions,
    EOUConfig, InterruptConfig,
)
from zrt.plugins.deepgram import DeepgramSTT
from zrt.plugins.google import GoogleLLM
from zrt.plugins.cartesia import CartesiaTTS
from zrt.plugins.silero import SileroVAD
from zrt.plugins.turn_detector import NamoTurnDetectorV1
from zrt.plugins.rnnoise import RNNoise

IGNORE_PATTERNS = [r"\b(uh+|um+)\b"]  # filler words to drop from transcripts


class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="You are a friendly voice assistant. Keep replies short.")

    async def on_enter(self):
        await self.session.say("Hi! How can I help?")

    async def on_exit(self):
        pass


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        agent=Assistant(),
        pipeline=Pipeline(
            stt=DeepgramSTT(),
            llm=GoogleLLM(
                model="gemini-2.5-flash",
                thinking_budget=0,
                include_thoughts=False,
                max_output_tokens=8192,
            ),
            tts=CartesiaTTS(),
            vad=SileroVAD(threshold=0.4),
            turn_detector=NamoTurnDetectorV1(language="en", threshold=0.8),
            denoise=RNNoise(),
            eou_config=EOUConfig(mode="ADAPTIVE", min_max_speech_wait_timeout=[0.1, 0.3]),
            interrupt_config=InterruptConfig(
                interrupt_min_duration=0.5,
                interrupt_min_words=2,
                resume_on_false_interrupt=True,
            ),
            stt_filter_patterns=IGNORE_PATTERNS,
            stt_word_substitutions={"recording": "", "recorded": ""},
        ),
    )
    await session.start(wait_for_participant=True, run_until_shutdown=True)


if __name__ == "__main__":
    WorkerJob(
        entrypoint=entrypoint,
        jobctx=lambda: JobContext(room_options=RoomOptions(name="Assistant")),
    ).start()
3. Run it
python agent.py
That's it — speech in → your agent → speech out, in real time.
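The quickstart passes `stt_filter_patterns` and `stt_word_substitutions` to the pipeline. How the runtime applies them is not described here, so the sketch below only illustrates one plausible reading: regex matches are removed first, then whole words are replaced, and leftover whitespace is collapsed. The function `clean_transcript` is hypothetical and exists only to let you try patterns locally before deploying them.

```python
import re

IGNORE_PATTERNS = [r"\b(uh+|um+)\b"]  # same pattern as in agent.py
WORD_SUBSTITUTIONS = {"recording": "", "recorded": ""}


def clean_transcript(text, patterns, substitutions):
    """Assumed behavior: drop pattern matches, then substitute whole words."""
    for pattern in patterns:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    for word, replacement in substitutions.items():
        # Match the word only on word boundaries, ignoring case.
        text = re.sub(rf"\b{re.escape(word)}\b", replacement, text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed words.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_transcript("Umm I uh want a refund", IGNORE_PATTERNS, WORD_SUBSTITUTIONS))
```

Testing a pattern this way shows quickly whether it is too greedy; for example, `um+` matches "umm" but, because of the `\b` anchors, not the "um" inside "umbrella".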
How it works
| Piece | What it is |
|---|---|
| `Agent` | Your behavior: instructions, tools, and what it says on enter/exit. |
| `Pipeline` | The voice stack: STT (hear) → LLM (think) → TTS (speak), plus VAD, turn detection, and denoising. |
| `WorkerJob` | Runs your agent and connects it to Zero Runtime. |
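To make the STT → LLM → TTS flow concrete, here is a toy model of a single conversational turn. It is purely illustrative: the three stage functions are stand-ins invented for this sketch, they do no real speech processing, and they say nothing about how Zero Runtime implements the pipeline.

```python
import asyncio


async def fake_stt(audio: bytes) -> str:
    """Stand-in for speech-to-text: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")


async def fake_llm(transcript: str) -> str:
    """Stand-in for the LLM: echoes the transcript back."""
    return f"You said: {transcript}"


async def fake_tts(reply: str) -> bytes:
    """Stand-in for text-to-speech: encodes the reply as bytes."""
    return reply.encode("utf-8")


async def run_turn(audio: bytes) -> bytes:
    """One turn: hear, think, speak, in that order."""
    transcript = await fake_stt(audio)
    reply = await fake_llm(transcript)
    return await fake_tts(reply)


if __name__ == "__main__":
    print(asyncio.run(run_turn(b"hello")))
```

In the real SDK each stage is a provider plugin passed to `Pipeline`, and the VAD, turn detection, and denoising listed in the table sit around these three stages.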
Give your agent tools
Let the LLM call your Python functions — just decorate them:
from zrt.agents import function_tool


@function_tool
async def get_weather(city: str) -> dict:
    """Get the weather for a city.

    Args:
        city: City name
    """
    return {"city": city, "temp_c": 22}


# then pass them to your agent:
# super().__init__(instructions="...", tools=[get_weather])
Your tool runs in your worker; the runtime calls it when the LLM decides to.
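Tool decorators of this kind commonly read the function's signature and docstring to describe the tool to the LLM, which is why type hints and the `Args:` section matter. The sketch below shows that general technique with the standard `inspect` module, plus a name-based dispatch step. It is not ZRT's implementation; `describe_tool` and `dispatch` are hypothetical helpers, and the `@function_tool` decorator is left out so the example runs without the SDK.

```python
import asyncio
import inspect


def describe_tool(func):
    """Build a simple tool description from a signature and docstring."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        # The first docstring line serves as the short description.
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


async def get_weather(city: str) -> dict:
    """Get the weather for a city.

    Args:
        city: City name
    """
    return {"city": city, "temp_c": 22}


async def dispatch(tools, name, arguments):
    """Look up a tool by name and await it with keyword arguments."""
    return await tools[name](**arguments)


if __name__ == "__main__":
    print(describe_tool(get_weather))
    print(asyncio.run(dispatch({"get_weather": get_weather}, "get_weather", {"city": "Pune"})))
```

Whatever the SDK does internally, a clear first docstring line and accurate type hints give the LLM the best chance of calling the tool correctly.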
Providers
Mix and match: bring the best model for each stage, and swap any one of them with a one-line change:
- Speech-to-text (STT): Deepgram, AssemblyAI, Google, Azure, Gladia, NVIDIA, Sarvam
- LLM: OpenAI, Google Gemini, Anthropic Claude, Groq, Cerebras, xAI Grok, Sarvam
- Text-to-speech (TTS): Cartesia, ElevenLabs, Google, AWS Polly, Azure, Deepgram, Rime, LMNT, Neuphonic, Hume AI, Inworld, Murf, Resemble, Smallest, Speechify, CambAI, NVIDIA
- Realtime speech-to-speech: OpenAI Realtime, Gemini Live, Ultravox, Azure Voice Live
- Turn detection: Namo · VAD: Silero · Denoise: RNNoise
from zrt.plugins.elevenlabs import ElevenLabsTTS # different TTS
from zrt.plugins.anthropic import AnthropicLLM # different LLM
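If you switch providers often, the choice can be driven by configuration rather than by editing imports. The following is a sketch only: the registry and the helpers `tts_target` and `load_tts` are hypothetical, the module and class names are the ones shown in this README, and whether each class can be constructed without arguments is an assumption you should check against the plugin you use.

```python
import importlib

# Module path and class name per provider, as imported elsewhere in this README.
TTS_PROVIDERS = {
    "cartesia": ("zrt.plugins.cartesia", "CartesiaTTS"),
    "elevenlabs": ("zrt.plugins.elevenlabs", "ElevenLabsTTS"),
}


def tts_target(name):
    """Return (module_name, class_name) for a provider name, case-insensitively."""
    try:
        return TTS_PROVIDERS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(TTS_PROVIDERS))
        raise ValueError(f"Unknown TTS provider {name!r}; known: {known}") from None


def load_tts(name):
    """Import the plugin lazily and return its class (requires zrt installed)."""
    module_name, class_name = tts_target(name)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

With this in place, something like `tts=load_tts(os.environ.get("TTS_PROVIDER", "cartesia"))()` would select the provider at startup; `TTS_PROVIDER` is an illustrative variable name, not one the SDK reads.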
Use cases
Phone & telephony agents, IVR replacement, customer-support voice bots, voice assistants, outbound/inbound call automation, and any real-time conversational AI.
FAQ
How is this different from a voice-agent framework?
- Frameworks make you host and scale the real-time runtime (media, GPUs, turn-taking). ZRT runs that for you — you only write and deploy the agent.
How is it different from a no-code voice platform?
- You write real Python with your own tools, logic, and providers — not a dashboard configuration. Full code control, zero infrastructure.
Can I use my own STT / LLM / TTS providers?
- Yes — mix any supported providers, and bring your own API keys.
What do I need to run it?
- A ZRT runtime endpoint + token and the provider keys for the stages you use.
Examples
More complete examples: https://github.com/ZeroRuntimeAI/zrt-python-sdk-examples
Contact
Copyright © 2026 Zujo Tech Pvt Ltd. All rights reserved.
File details
Details for the file zrt-0.0.1b1.tar.gz.
File metadata
- Download URL: zrt-0.0.1b1.tar.gz
- Upload date:
- Size: 127.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af47adf8c0b4082d3d23a17c55ce3c2b3ff02cd0e811632f8553056c7f226552 |
| MD5 | 7ac6a1d7f02299117b4c9e1c9f20d754 |
| BLAKE2b-256 | 637cbd8fe754107ebc9ccfb79a90654e182bcc432195ac504acb5bcf40f0c558 |
File details
Details for the file zrt-0.0.1b1-py3-none-any.whl.
File metadata
- Download URL: zrt-0.0.1b1-py3-none-any.whl
- Upload date:
- Size: 161.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0afee24b2fede879589e0d61c750ff51f0f11492e36f2097175adffcfb5127e0 |
| MD5 | efce4fab0fd77951e2999c018053943b |
| BLAKE2b-256 | bf106a73607a7381b2b0fa80b6c9eaea1354209587a4c2091e859c0b8107fa91 |
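The SHA256 digests listed under File hashes can be used to check a downloaded archive before installing it. The sketch below uses only the standard library; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("zrt-0.0.1b1.tar.gz", "af47adf8...")` with the full digest from the table should return `True` for an intact download.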