Build real-time AI voice agents in Python. Zero Runtime runs the speech-to-speech pipeline (STT, LLM, TTS) for you.

ZRT — Zero Runtime Python SDK

Build real-time AI voice agents in Python — without running the infrastructure. You write the agent (instructions, tools, logic); Zero Runtime runs the live speech-to-speech pipeline — speech-to-text → LLM → text-to-speech, with turn detection, denoising, and interruptions — at low latency in the cloud.

Write the agent. We run the runtime.

A different kind of voice SDK

Most voice frameworks make you run the hard part — media servers, GPUs, turn-taking, autoscaling. No-code platforms hide all that but lock you into a dashboard. Zero Runtime is the middle: real Python and your own providers, with none of the real-time infrastructure to operate.

Capability | Self-hosted frameworks | No-code platforms | Zero Runtime
Write real Python + custom tools | ✅ | ❌ (dashboard) | ✅
Run media servers / GPUs / scaling | you run it | ✅ managed | ✅ managed
Swap any STT / LLM / TTS provider | ✅ | limited | ✅
Low-latency speech-to-speech | you tune it | managed | managed

Requirements

  • Python 3.11+
  • A ZRT runtime endpoint + auth token (from your Zero Runtime account)
  • API key(s) for the providers you use (e.g. Deepgram, Google, Cartesia)

Install

pip install --pre zrt

Public beta: the --pre flag is required until the stable release.

Quickstart

1. Set your environment

export ZRT_RUNTIME_ADDRESS=us1.rt.zeroruntime.ai:443   # your ZRT runtime
export ZRT_AUTH_TOKEN=<your-token>

export DEEPGRAM_API_KEY=<key>    # speech-to-text
export GOOGLE_API_KEY=<key>      # the LLM (Gemini)
export CARTESIA_API_KEY=<key>    # text-to-speech
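
Before launching the worker, it can help to fail fast when one of these variables is missing. The following is a minimal sketch, assuming the five variable names from the exports above. `missing_env` and `check_env` are hypothetical helpers written for this example, not part of the SDK.

```python
import os

# Variable names taken from the exports above; adjust to the providers you use.
REQUIRED_VARS = [
    "ZRT_RUNTIME_ADDRESS",
    "ZRT_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
    "GOOGLE_API_KEY",
    "CARTESIA_API_KEY",
]


def missing_env(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def check_env(env=None):
    """Raise a RuntimeError listing every required variable that is missing."""
    missing = missing_env(REQUIRED_VARS, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_env()` at the top of your script would surface a misconfigured shell before any connection is attempted.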

2. Write your agent (agent.py)

from zrt.agents import (
    Agent, AgentSession, Pipeline, WorkerJob, JobContext, RoomOptions,
    EOUConfig, InterruptConfig,
)
from zrt.plugins.deepgram import DeepgramSTT
from zrt.plugins.google import GoogleLLM
from zrt.plugins.cartesia import CartesiaTTS
from zrt.plugins.silero import SileroVAD
from zrt.plugins.turn_detector import NamoTurnDetectorV1
from zrt.plugins.rnnoise import RNNoise

IGNORE_PATTERNS = [r"\b(uh+|um+)\b"]   # filler words to drop from transcripts


class Assistant(Agent):
    def __init__(self):
        super().__init__(instructions="You are a friendly voice assistant. Keep replies short.")

    async def on_enter(self):
        await self.session.say("Hi! How can I help?")

    async def on_exit(self):
        pass


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        agent=Assistant(),
        pipeline=Pipeline(
            stt=DeepgramSTT(),
            llm=GoogleLLM(
                model="gemini-2.5-flash",
                thinking_budget=0,
                include_thoughts=False,
                max_output_tokens=8192,
            ),
            tts=CartesiaTTS(),
            vad=SileroVAD(threshold=0.4),
            turn_detector=NamoTurnDetectorV1(language="en", threshold=0.8),
            denoise=RNNoise(),
            eou_config=EOUConfig(mode="ADAPTIVE", min_max_speech_wait_timeout=[0.1, 0.3]),
            interrupt_config=InterruptConfig(
                interrupt_min_duration=0.5,
                interrupt_min_words=2,
                resume_on_false_interrupt=True,
            ),
            stt_filter_patterns=IGNORE_PATTERNS,
            stt_word_substitutions={"recording": "", "recorded": ""},
        ),
    )
    await session.start(wait_for_participant=True, run_until_shutdown=True)


if __name__ == "__main__":
    WorkerJob(
        entrypoint=entrypoint,
        jobctx=lambda: JobContext(room_options=RoomOptions(name="Assistant")),
    ).start()

3. Run it

python agent.py

That's it — speech in → your agent → speech out, in real time.
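
The quickstart passes `stt_filter_patterns` and `stt_word_substitutions` to the pipeline. How the runtime applies them is not documented here, so the sketch below is only one plausible reading: regex matches are removed, then whole-word substitutions are applied, both case-insensitively. It uses the standard `re` module so you can try patterns locally. `clean_transcript` is a hypothetical helper, not an SDK function.

```python
import re

IGNORE_PATTERNS = [r"\b(uh+|um+)\b"]  # same pattern as the quickstart
SUBSTITUTIONS = {"recording": "", "recorded": ""}


def clean_transcript(text, patterns=IGNORE_PATTERNS, substitutions=SUBSTITUTIONS):
    """Drop pattern matches, apply whole-word substitutions, tidy whitespace."""
    for pattern in patterns:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    for word, replacement in substitutions.items():
        text = re.sub(rf"\b{re.escape(word)}\b", replacement, text, flags=re.IGNORECASE)
    # Collapse the gaps left behind by removed words.
    return re.sub(r"\s+", " ", text).strip()
```

The `\b` word boundaries matter: the filler pattern removes a standalone "um" but leaves a word such as "umbrella" untouched.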

How it works

Piece | What it is
Agent | Your behavior: instructions, tools, and what it says on enter/exit.
Pipeline | The voice stack: STT (hear) → LLM (think) → TTS (speak), plus VAD, turn detection, and denoising.
WorkerJob | Runs your agent and connects it to Zero Runtime.
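
To picture the hear, think, speak flow, here is a toy model of a single conversational turn in plain asyncio. It is not how the managed runtime is implemented: the three stages are stand-in stubs invented for this example, and a real pipeline would also deal with streaming audio, VAD, turn detection, and interruptions.

```python
import asyncio


async def fake_stt(audio: bytes) -> str:
    """Stand-in for speech-to-text: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


async def fake_llm(transcript: str) -> str:
    """Stand-in for the LLM: produce a canned reply."""
    return f"You said: {transcript}"


async def fake_tts(reply: str) -> bytes:
    """Stand-in for text-to-speech: encode the reply back to bytes."""
    return reply.encode("utf-8")


async def handle_turn(audio: bytes) -> bytes:
    """One conversational turn: hear, think, speak."""
    transcript = await fake_stt(audio)
    reply = await fake_llm(transcript)
    return await fake_tts(reply)
```

Running `asyncio.run(handle_turn(b"hello"))` passes the input through all three stubs in order, which is the shape the Pipeline row describes.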

Give your agent tools

Let the LLM call your Python functions — just decorate them:

from zrt.agents import function_tool

@function_tool
async def get_weather(city: str) -> dict:
    """Get the weather for a city.

    Args:
        city: City name
    """
    return {"city": city, "temp_c": 22}

# then pass them to your agent:
#   super().__init__(instructions="...", tools=[get_weather])

Your tool runs in your worker; the runtime calls it when the LLM decides to.
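
Type hints and the docstring are worth writing carefully, because tool decorators of this kind typically derive the description the LLM sees from them. The SDK's actual mechanism is not shown here. The sketch below is a hypothetical `describe_tool` helper, built only on the standard `inspect` and `typing` modules, that illustrates the general idea.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func):
    """Build a simple JSON-schema-like description from a function signature."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)
    properties = {}
    required = []
    for name, param in signature.parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


async def get_weather(city: str) -> dict:
    """Get the weather for a city.

    Args:
        city: City name
    """
    return {"city": city, "temp_c": 22}
```

Calling `describe_tool(get_weather)` yields a dictionary whose name, one-line description, and parameter list all come straight from the function definition.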

Providers

Mix and match. Bring the best model for each stage and swap any one of them with a one-line change:

  • Speech-to-text (STT): Deepgram, AssemblyAI, Google, Azure, Gladia, NVIDIA, Sarvam
  • LLM: OpenAI, Google Gemini, Anthropic Claude, Groq, Cerebras, xAI Grok, Sarvam
  • Text-to-speech (TTS): Cartesia, ElevenLabs, Google, AWS Polly, Azure, Deepgram, Rime, LMNT, Neuphonic, Hume AI, Inworld, Murf, Resemble, Smallest, Speechify, CambAI, NVIDIA
  • Realtime speech-to-speech: OpenAI Realtime, Gemini Live, Ultravox, Azure Voice Live
  • Turn detection: Namo · VAD: Silero · Denoise: RNNoise

For example, to switch to a different TTS or LLM, import another plugin:

from zrt.plugins.elevenlabs import ElevenLabsTTS   # different TTS
from zrt.plugins.anthropic import AnthropicLLM      # different LLM

Use cases

Phone & telephony agents, IVR replacement, customer-support voice bots, voice assistants, outbound/inbound call automation, and any real-time conversational AI.

FAQ

How is this different from a voice-agent framework?

  • Frameworks make you host and scale the real-time runtime (media, GPUs, turn-taking). ZRT runs that for you — you only write and deploy the agent.

How is it different from a no-code voice platform?

  • You write real Python with your own tools, logic, and providers — not a dashboard configuration. Full code control, zero infrastructure.

Can I use my own STT / LLM / TTS providers?

  • Yes — mix any supported providers, and bring your own API keys.

What do I need to run it?

  • A ZRT runtime endpoint + token and the provider keys for the stages you use.

Examples

More complete examples: https://github.com/ZeroRuntimeAI/zrt-python-sdk-examples

Contact

support@videosdk.live

Copyright © 2026 Zujo Tech Pvt Ltd. All rights reserved.
