Testing framework for AI Voice Agents

rehearse

Testing framework for voice agents. Make testing voice AI as easy as testing web APIs.

Features

  • Pytest Integration: Write voice agent tests using familiar pytest patterns
  • Real Phone Calls: Test your agents via actual Twilio calls
  • LLM-Powered Assertions: Use semantic assertions to validate agent responses
  • Multi-Provider Support: ElevenLabs for TTS/STT, LiteLLM for LLM judging (OpenAI, Azure, Anthropic, etc.)
  • Async-First: Built with async/await for efficient call handling

Installation

pip install rehearse

Or with uv:

uv add rehearse

Quick Start

import pytest
from rehearse import TwilioCall, LLMJudge, expect
from rehearse.audio.tts import ElevenLabsTTS
from rehearse.audio.stt import ElevenLabsSTT

# Configure providers
tts = ElevenLabsTTS(api_key="your-elevenlabs-key")
stt = ElevenLabsSTT(api_key="your-elevenlabs-key")
judge = LLMJudge(model="gpt-4o-mini", api_key="your-openai-key")

@pytest.mark.asyncio
async def test_agent_greeting():
    """Test that the agent greets the caller."""
    async with TwilioCall(
        to="+15551234567",           # Agent's phone number
        account_sid="ACxxxxx",        # Twilio Account SID
        auth_token="xxxxx",           # Twilio Auth Token
        from_number="+15559876543",   # Your Twilio number
        ngrok_url="abc123.ngrok.io",  # Your ngrok domain
        tts=tts,
        stt=stt,
    ) as call:
        # Listen for agent's greeting
        response = await call.listen(max_duration=20.0, silence_threshold=5.0)

        # Assert response is not empty
        expect(response).to_not_be_empty()

        # Assert response matches intent using LLM
        await expect(response).to_satisfy("a friendly greeting", llm=judge)

Prerequisites

1. Environment Variables

Create a .env file with your credentials:

# Twilio (required)
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_FROM_NUMBER=+15559876543

# ElevenLabs (required for TTS/STT)
ELEVENLABS_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# LLM Judge - OpenAI
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Or Azure OpenAI
AZURE_OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AZURE_BASE_URL=https://your-resource.openai.azure.com

# Your voice agent's phone number
AGENT_PHONE=+15551234567

2. ngrok Setup

Rehearse needs a public URL to receive Twilio webhooks. Start ngrok before running tests:

ngrok http 8765

Copy the domain from the forwarding URL (e.g., abc123.ngrok-free.app) and set it as NGROK_URL in your environment. Pass the same value as ngrok_url when you create a TwilioCall.
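
The environment variables above can be collected into TwilioCall keyword arguments in one place. The following is a minimal sketch, not part of rehearse: load_call_settings is a hypothetical helper name, and stripping the scheme from NGROK_URL assumes that ngrok_url expects a bare domain, as the "Your ngrok domain" comment in the Quick Start suggests.

```python
import os
from typing import Mapping, Optional


def load_call_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect TwilioCall keyword arguments from environment variables.

    Hypothetical helper (not part of rehearse). Raises KeyError listing
    every missing variable so a misconfigured run fails early.
    """
    env = os.environ if env is None else env

    required = (
        "TWILIO_ACCOUNT_SID",
        "TWILIO_AUTH_TOKEN",
        "TWILIO_FROM_NUMBER",
        "AGENT_PHONE",
        "NGROK_URL",
    )
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")

    # Assumption: ngrok_url takes a bare domain, so drop any scheme and
    # trailing slash copied from the ngrok console.
    domain = env["NGROK_URL"].strip()
    for scheme in ("https://", "http://"):
        if domain.startswith(scheme):
            domain = domain[len(scheme):]
    domain = domain.rstrip("/")

    return {
        "to": env["AGENT_PHONE"],
        "account_sid": env["TWILIO_ACCOUNT_SID"],
        "auth_token": env["TWILIO_AUTH_TOKEN"],
        "from_number": env["TWILIO_FROM_NUMBER"],
        "ngrok_url": domain,
    }
```

A test could then open a call with TwilioCall(**load_call_settings(), tts=tts, stt=stt). The helper only reads variables that are already in the environment; loading the .env file itself is left to whatever tool you already use.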

Running Tests

Basic Command

pytest examples/ -v

Recommended Command

For better output during voice agent tests (real-time logs, shorter tracebacks, no warnings):

pytest examples/ -v -s --tb=short --log-cli-level=INFO --disable-warnings

Command Options Explained

Option                 Description
-v                     Verbose output - shows pass/fail status for each test
-s                     No capture - print statements and logs show in real time
--tb=short             Short tracebacks - less noise on failures
--log-cli-level=INFO   Show INFO-level logs as tests run
--disable-warnings     Suppress the warnings summary (including deprecation warnings)

Run a Specific Test

pytest examples/test_asterisk_agent.py::test_agent_greeting -v -s --tb=short --log-cli-level=INFO --disable-warnings

API Reference

TwilioCall

The main interface for making test calls.

async with TwilioCall(
    to="+15551234567",           # Phone number to call
    account_sid="ACxxxxx",        # Twilio Account SID
    auth_token="xxxxx",           # Twilio Auth Token
    from_number="+15559876543",   # Your Twilio phone number
    ngrok_url="abc123.ngrok.io",  # ngrok domain for webhooks
    tts=tts,                      # TTS provider instance
    stt=stt,                      # STT provider instance
    send_digits="www7",           # Optional: DTMF digits to send (w = 0.5s wait)
    audio_path="/tmp/debug.wav",  # Optional: Save call audio to WAV file for debugging
) as call:
    # Use call.listen() and call.say() inside the block
    response = await call.listen()
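
The send_digits comment above says each w adds a 0.5 second wait. As a rough sketch of what that implies for a given string, the hypothetical helper below (not part of rehearse) splits a sequence into its total pause and the digits that are dialed. The set of allowed characters is an assumption made for illustration.

```python
from typing import Tuple


def describe_send_digits(sequence: str) -> Tuple[float, str]:
    """Return (total pause in seconds, digits actually dialed).

    Assumes each 'w' is a 0.5 second wait, as stated in the comment on
    send_digits. Hypothetical helper, not part of rehearse.
    """
    # Assumption: only keypad digits, '*', '#', and 'w' are expected here.
    allowed = set("0123456789*#w")
    unknown = sorted(set(sequence) - allowed)
    if unknown:
        raise ValueError(f"unsupported characters in send_digits: {unknown}")

    pause = 0.5 * sequence.count("w")
    digits = sequence.replace("w", "")
    return pause, digits


print(describe_send_digits("www7"))  # (1.5, '7')
```

So "www7" waits 1.5 seconds and then presses 7, which is useful when the agent sits behind a menu that needs a moment to start playing.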

Saving Audio for Debugging

Use audio_path to save the call's audio to a WAV file for debugging:

async with TwilioCall(
    to="+15551234567",
    audio_path="./recordings/test_greeting.wav",
    # ... other params
) as call:
    response = await call.listen()
    # Audio will be saved to ./recordings/test_greeting.wav when call ends

call.listen()

Listen for the agent's response.

response = await call.listen(
    max_duration=20.0,      # Maximum recording duration in seconds
    silence_threshold=5.0,  # Stop after this many seconds of silence
    timeout=20.0,           # Maximum wait time for response
)
print(response.text)  # Transcribed text

call.say()

Speak to the agent.

await call.say("What are your business hours?")

expect()

Create assertions on responses. All assertions are chainable.

Text Assertions

# Check response contains text (case-insensitive)
expect(response).to_contain("hello")

# Check response contains any of the options
expect(response).to_contain_any(["hello", "hi", "hey"])

# Check response matches regex pattern
expect(response).to_match(r"order #\d+")

# Check exact equality
expect(response.text).to_equal("Hello, how can I help you?")

# Check empty/not empty
expect(response).to_not_be_empty()
expect(response).to_be_empty()
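
To make the semantics of the text checks concrete, here is a plain-Python sketch of what the comments above describe. It is an illustration only, not rehearse's implementation: the function names are hypothetical, and using re.search (a match anywhere in the text) for the regex check is an assumption.

```python
import re
from typing import Iterable


def contains(text: str, needle: str) -> bool:
    """Case-insensitive substring check, mirroring the to_contain comment."""
    return needle.lower() in text.lower()


def contains_any(text: str, options: Iterable[str]) -> bool:
    """True if at least one option is contained in the text."""
    return any(contains(text, option) for option in options)


def matches(text: str, pattern: str) -> bool:
    # Assumption: the pattern may match anywhere in the text (re.search).
    return re.search(pattern, text) is not None


reply = "Hello! Your order #4821 has shipped."
print(contains(reply, "hello"))                  # True
print(contains_any(reply, ["hi there", "hey"]))  # False
print(matches(reply, r"order #\d+"))             # True
```

If to_contain is a plain substring check like this one, short needles such as "hi" will also match inside longer words ("shipped"), so prefer distinctive phrases or a regex with word boundaries.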

Semantic Assertions (LLM-Powered)

# Single intent check
await expect(response).to_satisfy("a friendly greeting", llm=judge)

# Multiple intents (all must pass)
await expect(response).to_satisfy(
    "acknowledges the customer's request",
    "provides clear next steps",
    "maintains professional tone",
    llm=judge
)

# Synchronous version (for non-async contexts)
expect(response).to_satisfy_sync("a friendly greeting", llm=judge)
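
The "all must pass" rule for multiple intents can be pictured as follows. This sketch only illustrates the semantics and is not how rehearse is implemented: failed_intents and keyword_judge are hypothetical names, and keyword_judge is a trivial stand-in for a real LLM call so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List

# A judge takes (response text, intent) and answers True or False.
JudgeFn = Callable[[str, str], Awaitable[bool]]


async def failed_intents(text: str, intents: List[str], judge: JudgeFn) -> List[str]:
    """Return the intents the judge rejected; an empty list means all passed."""
    verdicts = await asyncio.gather(*(judge(text, intent) for intent in intents))
    return [intent for intent, ok in zip(intents, verdicts) if not ok]


async def keyword_judge(text: str, intent: str) -> bool:
    """Hypothetical stand-in for an LLM: pass if the intent's last word is in the text."""
    return intent.split()[-1].lower() in text.lower()


rejected = asyncio.run(
    failed_intents(
        "Hello, thanks for calling",
        ["a friendly hello", "mentions pricing"],
        keyword_judge,
    )
)
print(rejected)  # ['mentions pricing']
```

Reporting which intents failed, rather than a single pass/fail, makes it easier to see whether the agent or the wording of the intent needs attention.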

Numeric Assertions

# Check response latency
expect(response.latency).to_be_less_than(2.0)
expect(response.latency).to_be_greater_than(0.5)

Tool Call Assertions

# Check if agent made a tool call
expect(call.tool_calls).to_contain("transfer", department="sales")

# Check no tool calls were made
expect(call.tool_calls).to_be_empty()

LLMJudge

Configure the LLM used for semantic assertions. LLMJudge is built on LiteLLM, so it can use any provider LiteLLM supports, including OpenAI, Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral, and Cohere.

# OpenAI
judge = LLMJudge(model="gpt-4o-mini", api_key="sk-xxx")

# Azure OpenAI
judge = LLMJudge(
    model="azure/your-deployment-name",
    api_key="xxx",
    api_base="https://your-resource.openai.azure.com",
)

# Anthropic
judge = LLMJudge(model="claude-3-haiku-20240307", api_key="sk-ant-xxx")

# Google Gemini
judge = LLMJudge(model="gemini/gemini-pro", api_key="xxx")

# AWS Bedrock (no api_key here; LiteLLM reads AWS credentials from your environment)
judge = LLMJudge(model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0")

See LiteLLM providers for the full list of supported models.
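
Since the .env file in Prerequisites lists either OpenAI or Azure OpenAI credentials, the judge configuration can be chosen from whichever is present. The sketch below is one way to do that and is not part of rehearse: judge_kwargs is a hypothetical helper, azure_deployment is a placeholder you would replace, and preferring Azure when both sets of variables exist is an arbitrary choice.

```python
import os
from typing import Mapping, Optional


def judge_kwargs(
    env: Optional[Mapping[str, str]] = None,
    azure_deployment: str = "your-deployment-name",  # placeholder, replace with yours
) -> dict:
    """Build LLMJudge keyword arguments from environment variables.

    Hypothetical helper (not part of rehearse). Uses only the parameters
    shown in the examples above: model, api_key, and api_base.
    """
    env = os.environ if env is None else env

    # Arbitrary choice for this sketch: prefer Azure if it is fully configured.
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_BASE_URL"):
        return {
            "model": f"azure/{azure_deployment}",
            "api_key": env["AZURE_OPENAI_API_KEY"],
            "api_base": env["AZURE_BASE_URL"],
        }

    if env.get("OPENAI_API_KEY"):
        return {"model": "gpt-4o-mini", "api_key": env["OPENAI_API_KEY"]}

    raise KeyError("set OPENAI_API_KEY, or AZURE_OPENAI_API_KEY and AZURE_BASE_URL")
```

The result can be passed straight through, for example judge = LLMJudge(**judge_kwargs()).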

Example Test Patterns

Test Agent Greeting

@pytest.mark.asyncio
async def test_agent_greeting():
    async with TwilioCall(...) as call:
        response = await call.listen()
        expect(response).to_not_be_empty()
        await expect(response).to_satisfy("a friendly greeting", llm=judge)

Test Question and Answer

@pytest.mark.asyncio
async def test_agent_answers_question():
    async with TwilioCall(...) as call:
        # Wait for greeting
        await call.listen()

        # Ask a question
        await call.say("What are your business hours?")

        # Validate response
        response = await call.listen()
        await expect(response).to_satisfy(
            "mentions business hours are Monday, Wednesday, and Friday from 10am to 6pm",
            llm=judge
        )

Test Multi-Turn Conversation

@pytest.mark.asyncio
async def test_multi_turn_conversation():
    async with TwilioCall(...) as call:
        await call.listen()  # Greeting

        await call.say("I need help with my account")
        response1 = await call.listen()

        await call.say("My account number is 12345")
        response2 = await call.listen()

        expect(response1).to_not_be_empty()
        expect(response2).to_not_be_empty()
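
Multi-turn tests tend to repeat the same say-then-listen steps, so a small driver can keep them short. The sketch below assumes only that the call object exposes the say() and listen() coroutines shown above and that responses have a text attribute. run_script, FakeCall, and FakeResponse are hypothetical names; the fake classes exist so the example runs without placing a phone call.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResponse:
    text: str


class FakeCall:
    """Hypothetical stand-in exposing the say()/listen() coroutines."""

    def __init__(self, replies: List[str]):
        self._replies = list(replies)
        self.said: List[str] = []

    async def say(self, text: str) -> None:
        self.said.append(text)

    async def listen(self) -> FakeResponse:
        # Return canned replies in order, one per listen() call.
        return FakeResponse(self._replies.pop(0))


async def run_script(call, prompts: List[str]) -> List[str]:
    """Listen for the greeting, then say each prompt and collect each reply."""
    transcript = [(await call.listen()).text]
    for prompt in prompts:
        await call.say(prompt)
        transcript.append((await call.listen()).text)
    return transcript


fake = FakeCall(["Hi, how can I help?", "Sure, what is your account number?", "Thanks, found it."])
transcript = asyncio.run(
    run_script(fake, ["I need help with my account", "My account number is 12345"])
)
print(transcript[-1])  # Thanks, found it.
```

In a real test the same run_script could be given the call object from async with TwilioCall(...) as call, with assertions made on the transcribed responses afterwards.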

Why ngrok?

Twilio needs a publicly reachable URL to deliver webhooks and stream call audio back to your test runner. Because your local machine usually isn't reachable from the internet, ngrok creates a secure tunnel that exposes your local WebSocket server (port 8765). Twilio can then send real-time audio to Rehearse during the call.

Logging

Enable logging to see what's happening during tests:

from rehearse import setup_logging

setup_logging("INFO")   # Standard logging
setup_logging("DEBUG")  # Verbose logging

Roadmap

Connectors

  • Twilio
  • Direct WebSocket
  • Vapi
  • Retell
  • Bland.ai

Audio Providers

  • ElevenLabs (TTS/STT)
  • Deepgram
  • AssemblyAI
  • Google Cloud Speech
  • Azure Speech Services

Audio Simulation

  • Background noise injection (rain, traffic, crowd, office)
  • Different accents and speaking styles
  • Variable audio quality (simulate poor connections)

Assertions

  • Native audio assertions (volume, silence detection, audio quality)
  • Emotion detection assertions (angry, happy, frustrated)
  • Latency assertions (response time thresholds)
  • Interruption handling assertions

Advanced Testing

  • Voice agent vs voice agent testing
  • Load testing (concurrent calls)
  • Conversation replay and debugging

License

MIT
