Testing framework for AI Voice Agents

rehearse

Testing framework for voice agents. Make testing voice AI as easy as testing web APIs.

Features

Pytest Integration: Write voice agent tests using familiar pytest patterns
Real Phone Calls: Test your agents via actual Twilio calls
LLM-Powered Assertions: Use semantic assertions to validate agent responses
Multi-Provider Support: ElevenLabs for TTS/STT, LiteLLM for LLM judging (OpenAI, Azure, Anthropic, etc.)
Async-First: Built with async/await for efficient call handling

Installation

pip install rehearse

Or with uv:

uv add rehearse

Quick Start

import pytest
from rehearse import TwilioCall, LLMJudge, expect
from rehearse.audio.tts import ElevenLabsTTS
from rehearse.audio.stt import ElevenLabsSTT

# Configure providers
tts = ElevenLabsTTS(api_key="your-elevenlabs-key")
stt = ElevenLabsSTT(api_key="your-elevenlabs-key")
judge = LLMJudge(model="gpt-4o-mini", api_key="your-openai-key")

@pytest.mark.asyncio
async def test_restaurant_reservation():
    """Test booking a table through a voice agent."""
    async with TwilioCall(
        to="+15551234567",           # Restaurant agent's number
        account_sid="ACxxxxx",
        auth_token="xxxxx",
        from_number="+15559876543",
        ngrok_url="abc123.ngrok.io",
        tts=tts,
        stt=stt,
    ) as call:
        # Agent greets the caller
        greeting = await call.listen()
        expect(greeting).to_not_be_empty()
        await expect(greeting).to_satisfy("a polite greeting", llm=judge)

        # Try to book at invalid time (midnight)
        await call.say("I'd like to book a table for 2 at midnight")
        response = await call.listen()

        # Agent should refuse and mention valid hours
        await expect(response).to_satisfy(
            "politely declines the midnight booking",
            "mentions operating hours are 11am to 10pm, closed on Mondays",
            llm=judge
        )

        # Book at a valid time
        await call.say("Okay, how about 7pm tomorrow?")
        response = await call.listen()

        await expect(response).to_satisfy(
            "confirms the day and time as 7pm",
            "asks for name or party size",
            llm=judge
        )

        # Provide name for reservation
        await call.say("It's for 2 people, name is John Smith")
        confirmation = await call.listen()

        # Agent confirms the booking
        expect(confirmation).to_contain("smith")
        await expect(confirmation).to_satisfy("confirms the reservation is complete", llm=judge)
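
The Quick Start above inlines placeholder keys for readability. In a real test suite you would probably keep them out of the source. The following is a minimal, standard-library sketch of one way to do that. The environment variable names and the `VoiceTestSettings` and `load_settings` helpers are assumptions made for illustration; rehearse does not define them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceTestSettings:
    """Values the Quick Start passes to TwilioCall, the ElevenLabs providers, and LLMJudge."""
    account_sid: str
    auth_token: str
    from_number: str
    ngrok_url: str
    elevenlabs_key: str
    judge_key: str


# Hypothetical variable names chosen for this sketch; rehearse does not read them itself.
_ENV_NAMES = {
    "account_sid": "TWILIO_ACCOUNT_SID",
    "auth_token": "TWILIO_AUTH_TOKEN",
    "from_number": "TWILIO_FROM_NUMBER",
    "ngrok_url": "NGROK_URL",
    "elevenlabs_key": "ELEVENLABS_API_KEY",
    "judge_key": "OPENAI_API_KEY",
}


def load_settings(env=None):
    """Build settings from a mapping (os.environ by default), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in _ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(sorted(missing)))
    return VoiceTestSettings(**{field: env[name] for field, name in _ENV_NAMES.items()})
```

With this in place, the Quick Start could pass `settings.account_sid`, `settings.auth_token`, and so on instead of string literals, and a missing variable fails the run before any call is placed.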

Prerequisites

ngrok Setup

TwilioCall requires ngrok to receive Twilio webhooks. Start ngrok before running tests:

ngrok http 8765

Copy the forwarding URL (e.g., abc123.ngrok-free.app) and pass it as ngrok_url to TwilioCall.
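
ngrok prints the forwarding address with a scheme (for example `https://abc123.ngrok-free.app`), while the examples in this README pass a bare domain as ngrok_url. Assuming the bare domain is what TwilioCall expects, a small helper like the hypothetical `ngrok_domain` below can normalize whatever was copied. It is a sketch that uses only the standard library and is not part of rehearse.

```python
from urllib.parse import urlsplit


def ngrok_domain(forwarding_url: str) -> str:
    """Return the bare host from an ngrok forwarding URL or domain."""
    value = forwarding_url.strip()
    # urlsplit only fills in the host when a "//" separator is present.
    if "//" not in value:
        value = "//" + value
    host = urlsplit(value).hostname
    if not host:
        raise ValueError(f"Not a usable ngrok URL: {forwarding_url!r}")
    return host
```

Both `ngrok_domain("https://abc123.ngrok-free.app")` and `ngrok_domain("abc123.ngrok-free.app/")` return the same bare domain, so the value can be pasted in either form.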

Running Tests

Basic Command

pytest tests/ -v

Recommended Command

For better output during voice agent tests (real-time logs, shorter tracebacks, no warnings):

pytest tests/ -v -s --tb=short --log-cli-level=INFO --disable-warnings

Command Options Explained

Option	Description
`-v`	Verbose output - shows pass/fail status for each test
`-s`	No capture - print statements and logs show in real-time
`--tb=short`	Short tracebacks - less noise on failures
`--log-cli-level=INFO`	Show INFO level logs as tests run
`--disable-warnings`	Suppress the warnings summary (including deprecation warnings)

Run a Specific Test

pytest tests/test_asterisk_agent.py::test_agent_greeting -v -s --tb=short --log-cli-level=INFO --disable-warnings

API Reference

TwilioCall

The main interface for making test calls.

async with TwilioCall(
    to="+15551234567",           # Phone number to call
    account_sid="ACxxxxx",        # Twilio Account SID
    auth_token="xxxxx",           # Twilio Auth Token
    from_number="+15559876543",   # Your Twilio phone number
    ngrok_url="abc123.ngrok.io",  # ngrok domain for webhooks
    tts=tts,                      # TTS provider instance
    stt=stt,                      # STT provider instance
    send_digits="www7",           # Optional: DTMF digits to send (w = 0.5s wait)
    audio_path="/tmp/debug.wav",  # Optional: Save call audio to WAV file for debugging
) as call:
    # Use call.listen() and call.say() inside the block
    greeting = await call.listen()
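
The send_digits comment above says that each `w` is a 0.5 second wait. When an IVR menu needs a longer pause it can help to check how long a given string will wait before the keys are sent. The two helpers below are a hypothetical sketch for that arithmetic, based only on the `w = 0.5s` rule stated above; they are not part of rehearse.

```python
def dtmf_wait_seconds(send_digits: str, wait_seconds: float = 0.5) -> float:
    """Total pause contributed by the 'w' characters in a send_digits string."""
    return send_digits.count("w") * wait_seconds


def dtmf_keys(send_digits: str) -> str:
    """The keypad characters that are actually sent, with the waits removed."""
    return send_digits.replace("w", "")
```

For the `"www7"` value shown above, this gives 1.5 seconds of waiting before the single key `7` is sent.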

Saving Audio for Debugging

Use audio_path to save the call's audio to a WAV file for debugging:

async with TwilioCall(
    to="+15551234567",
    audio_path="./recordings/test_greeting.wav",
    # ... other params
) as call:
    response = await call.listen()
    # Audio will be saved to ./recordings/test_greeting.wav when call ends
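
If several tests record audio, giving each one its own file avoids overwriting earlier recordings. The sketch below builds a per-test path and creates the directory up front. Whether TwilioCall creates missing directories itself is not stated in this README, so the sketch assumes it does not. `recording_path` is a hypothetical helper, not a rehearse function.

```python
import re
from pathlib import Path


def recording_path(test_name: str, root: str = "./recordings") -> str:
    """Return a WAV path for a test, creating the parent directory if needed."""
    # Replace characters that are awkward in file names (brackets, slashes, spaces).
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", test_name).strip("_") or "call"
    directory = Path(root)
    directory.mkdir(parents=True, exist_ok=True)
    return str(directory / f"{safe}.wav")
```

The result can be passed straight through, for example `audio_path=recording_path("test_greeting")`.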

call.listen()

Listen for the agent's response.

response = await call.listen(
    max_duration=20.0,      # Maximum recording duration in seconds
    silence_threshold=5.0,  # Stop after this many seconds of silence
    timeout=20.0,           # Maximum wait time for response
)
print(response.text)  # Transcribed text

call.say()

Speak to the agent.

await call.say("What are your business hours?")

expect()

Create assertions on responses. All assertions are chainable.

Text Assertions

# Check response contains text (case-insensitive)
expect(response).to_contain("hello")

# Check response contains any of the options
expect(response).to_contain_any(["hello", "hi", "hey"])

# Check response matches regex pattern
expect(response).to_match(r"order #\d+")

# Check exact equality
expect(response.text).to_equal("Hello, how can I help you?")

# Check empty/not empty
expect(response).to_not_be_empty()
expect(response).to_be_empty()
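
To make "chainable" and "case-insensitive" concrete, here is a toy stand-in for the object that expect() returns. It is an illustrative sketch, not rehearse's implementation: the class name `TextExpectation` is invented, and details such as using `re.search` for to_match are assumptions.

```python
import re


class TextExpectation:
    """Toy model of a chainable text assertion object; not rehearse's code."""

    def __init__(self, text: str):
        self.text = text

    def to_not_be_empty(self) -> "TextExpectation":
        assert self.text.strip(), "expected non-empty text"
        return self  # returning self is what makes the calls chainable

    def to_contain(self, needle: str) -> "TextExpectation":
        # Lower-case both sides so the comparison ignores case.
        assert needle.lower() in self.text.lower(), f"{needle!r} not in {self.text!r}"
        return self

    def to_contain_any(self, options) -> "TextExpectation":
        lowered = self.text.lower()
        assert any(o.lower() in lowered for o in options), f"none of {list(options)!r} found"
        return self

    def to_match(self, pattern: str) -> "TextExpectation":
        # Assumption: the pattern may match anywhere in the text.
        assert re.search(pattern, self.text), f"{pattern!r} did not match {self.text!r}"
        return self
```

Because each method returns the same object, several checks can be written as one expression, such as `TextExpectation(text).to_not_be_empty().to_contain("hello")`, and the first failing check raises `AssertionError` so pytest reports it like any other failed assertion.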

Semantic Assertions (LLM-Powered)

# Single intent check
await expect(response).to_satisfy("a friendly greeting", llm=judge)

# Multiple intents (all must pass)
await expect(response).to_satisfy(
    "acknowledges the customer's request",
    "provides clear next steps",
    "maintains professional tone",
    llm=judge
)

# Synchronous version (for non-async contexts)
expect(response).to_satisfy_sync("a friendly greeting", llm=judge)
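
The "all must pass" rule for multiple intents can be pictured as one judge question per intent, with the assertion failing if any answer is no. The sketch below models that with a pluggable judge callable and an offline keyword stub in place of a real LLM. The names `failed_intents` and `keyword_judge` are hypothetical, and how rehearse actually prompts the LLM is not described in this README.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# A judge takes (response_text, intent) and answers whether the intent is met.
Judge = Callable[[str, str], Awaitable[bool]]


async def failed_intents(text: str, intents: Sequence[str], judge: Judge) -> List[str]:
    """Ask the judge about every intent concurrently and return those that failed."""
    verdicts = await asyncio.gather(*(judge(text, intent) for intent in intents))
    return [intent for intent, ok in zip(intents, verdicts) if not ok]


async def keyword_judge(text: str, intent: str) -> bool:
    """Offline stand-in for an LLM: passes when the intent's last word appears in the text."""
    return intent.split()[-1].lower() in text.lower()


async def assert_satisfies(text: str, *intents: str, judge: Judge) -> None:
    """Raise AssertionError listing every intent the judge rejected."""
    failures = await failed_intents(text, intents, judge)
    assert not failures, f"unmet intents: {failures}"
```

Collecting all failures before raising, rather than stopping at the first one, makes the error message more useful when a response misses several intents at once.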

Numeric Assertions

# Check response latency
expect(response.latency).to_be_less_than(2.0)
expect(response.latency).to_be_greater_than(0.5)

Tool Call Assertions

# Check if agent made a tool call
expect(call.tool_calls).to_contain("transfer", department="sales")

# Check no tool calls were made
expect(call.tool_calls).to_be_empty()
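
The to_contain form above takes a tool name plus keyword arguments. One plausible reading is "some recorded call has this name and includes at least these arguments". The sketch below implements that reading. The `ToolCall` shape and the `contains_tool_call` helper are assumptions for illustration; this README does not describe how rehearse represents tool calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation made by the agent."""
    name: str
    arguments: dict = field(default_factory=dict)


def contains_tool_call(tool_calls, name: str, **expected_args) -> bool:
    """True if some call has this name and includes every expected argument."""
    return any(
        call.name == name
        and all(
            key in call.arguments and call.arguments[key] == value
            for key, value in expected_args.items()
        )
        for call in tool_calls
    )
```

Extra arguments on the recorded call are ignored under this reading, so a test can pin down only the arguments it cares about.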

LLMJudge

Configure the LLM used for semantic assertions. LLMJudge is built on LiteLLM, so it works with the providers LiteLLM supports, including OpenAI, Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral, and Cohere.

# OpenAI
judge = LLMJudge(model="gpt-4o-mini", api_key="sk-xxx")

# Azure OpenAI
judge = LLMJudge(
    model="azure/your-deployment-name",
    api_key="xxx",
    api_base="https://your-resource.openai.azure.com",
)

# Anthropic
judge = LLMJudge(model="claude-3-haiku-20240307", api_key="sk-ant-xxx")

# Google Gemini
judge = LLMJudge(model="gemini/gemini-pro", api_key="xxx")

# AWS Bedrock (no api_key here; LiteLLM reads AWS credentials from the environment)
judge = LLMJudge(model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0")

See the LiteLLM providers documentation for the full list of supported models.
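
If a suite needs to switch judge providers between runs (for example, a cheaper model locally and another in CI), the model strings shown above can live in one lookup table. This is a minimal sketch: `JUDGE_MODELS` and `judge_kwargs` are hypothetical names, and the table simply repeats the model identifiers from the examples above. Azure is left out because it also needs a deployment name and api_base.

```python
from typing import Optional

# Model identifiers copied from the LLMJudge examples above.
JUDGE_MODELS = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-3-haiku-20240307",
    "gemini": "gemini/gemini-pro",
    "bedrock": "bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
}


def judge_kwargs(provider: str, api_key: Optional[str] = None) -> dict:
    """Return keyword arguments for LLMJudge for a named provider."""
    if provider not in JUDGE_MODELS:
        raise ValueError(f"Unknown provider {provider!r}; expected one of {sorted(JUDGE_MODELS)}")
    kwargs = {"model": JUDGE_MODELS[provider]}
    # Bedrock is shown above without an api_key, so only include it when given.
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kwargs
```

A test module could then build its judge with `LLMJudge(**judge_kwargs("openai", api_key="sk-xxx"))`.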

Example Test Patterns

Test Agent Greeting

@pytest.mark.asyncio
async def test_agent_greeting():
    async with TwilioCall(...) as call:
        response = await call.listen()
        expect(response).to_not_be_empty()
        await expect(response).to_satisfy("a friendly greeting", llm=judge)

Test Question and Answer

@pytest.mark.asyncio
async def test_agent_answers_question():
    async with TwilioCall(...) as call:
        # Wait for greeting
        await call.listen()

        # Ask a question
        await call.say("What are your business hours?")

        # Validate response
        response = await call.listen()
        await expect(response).to_satisfy(
            "mentions business hours are Monday, Wednesday, and Friday from 10am to 6pm",
            llm=judge
        )

Test Multi-Turn Conversation

@pytest.mark.asyncio
async def test_multi_turn_conversation():
    async with TwilioCall(...) as call:
        await call.listen()  # Greeting

        await call.say("I need help with my account")
        response1 = await call.listen()

        await call.say("My account number is 12345")
        response2 = await call.listen()

        expect(response2).to_not_be_empty()

Logging

Enable logging to see what's happening during tests:

from rehearse import setup_logging

setup_logging("INFO")   # Standard logging
setup_logging("DEBUG")  # Verbose logging

Roadmap

Connectors

Twilio (available now as TwilioCall)
Direct WebSocket
Vapi
Retell
Bland.ai

Audio Providers

ElevenLabs TTS/STT (available now)
Deepgram
AssemblyAI
Google Cloud Speech
Azure Speech Services

Audio Simulation

Background noise injection (rain, traffic, crowd, office)
Different accents and speaking styles
Variable audio quality (simulate poor connections)

Assertions

Native audio assertions (volume, silence detection, audio quality)
Emotion detection assertions (angry, happy, frustrated)
Latency assertions (response time thresholds)
Interruption handling assertions

Advanced Testing

Voice agent vs voice agent testing
Load testing (concurrent calls)
Conversation replay and debugging

Vision: Full-Featured Test

Note: This example demonstrates planned capabilities that are not yet implemented. See the Roadmap above for current implementation status.

import pytest
from rehearse import VapiCall, BlandCall, WebSocketCall, LLMJudge, expect
from rehearse.audio.simulation import BackgroundNoise, AudioQuality, SpeakingStyle

judge = LLMJudge(model="gpt-4o-mini", api_key="your-openai-key")

@pytest.mark.asyncio
async def test_support_call_with_realistic_conditions():
    """Test support agent handles a frustrated customer in noisy environment."""
    async with VapiCall(
        assistant_id="your-assistant-id",  # Vapi assistant to test
        api_key="your-vapi-key",
        background_noise=BackgroundNoise.COFFEE_SHOP, # caller is in a busy coffee shop
        noise_level=0.3,  # 30% background noise
        audio_quality=AudioQuality.POOR_CONNECTION, # Simulate poor cell connection
        speaking_style=SpeakingStyle( # accent and style
            accent="british",
            speed=1.3,  # 30% faster than normal
        ),
    ) as call:
        # Agent greets the caller
        greeting = await call.listen()

        # Check response latency is acceptable
        expect(greeting.latency).to_be_less_than(2.0)

        # Check audio quality metrics
        expect(greeting.audio).to_have_volume_above(0.3)
        expect(greeting.audio).to_have_no_clipping()

        # Express frustration about a billing issue
        await call.say(
            "I've been charged twice for my subscription and nobody is helping me!",
            emotion="frustrated",  # TTS renders with frustrated tone
        )
        response = await call.listen()

        # Agent should detect frustration and respond empathetically
        await expect(response).to_satisfy(
            "acknowledges the customer's frustration",
            "apologizes for the inconvenience",
            "does not sound robotic or dismissive",
            llm=judge
        )

        # Check agent's emotional tone
        expect(response.audio).to_have_emotion("empathetic")
        expect(response.audio).to_not_have_emotion("dismissive")

        # Interrupt the agent mid-sentence to test barge-in handling
        await call.say("Just fix it!", interrupt=True)
        response = await call.listen()

        # Agent should handle interruption gracefully
        expect(response).to_handle_interruption_gracefully()
        await expect(response).to_satisfy(
            "does not restart from the beginning",
            "acknowledges the urgency",
            llm=judge
        )

        # Provide account details
        await call.say("My account number is 1-2-3-4-5-6")
        response = await call.listen()

        # Check agent made the right tool call
        expect(call.tool_calls).to_contain(
            "lookup_account",
            account_number="123456"
        )

        # Final resolution
        await call.say("Yes, please process the refund")
        confirmation = await call.listen()

        await expect(confirmation).to_satisfy(
            "confirms refund will be processed",
            "provides timeline or reference number",
            "asks if there's anything else",
            llm=judge
        )


@pytest.mark.asyncio
async def test_agent_handles_profanity():
    """Test agent remains professional when user uses inappropriate language."""
    async with BlandCall(
        phone_number="+15551234567",
        api_key="your-bland-key",
        background_noise=BackgroundNoise.TRAFFIC,
        noise_level=0.4,
        speaking_style=SpeakingStyle(
            accent="american",
            speed=1.4,  # Speaking fast when angry
        ),
    ) as call:
        await call.listen()  # Greeting

        await call.say(
            "This is bullshit, I want to speak to a manager!",
            emotion="angry",
        )
        response = await call.listen()

        # Agent should remain professional and de-escalate
        await expect(response).to_satisfy(
            "remains calm and professional",
            "does not mirror the profanity",
            "offers to escalate or resolve the issue",
            llm=judge
        )
        expect(response.audio).to_not_have_emotion("angry")
        expect(response.latency).to_be_less_than(2.5)


@pytest.mark.asyncio
async def test_agent_handles_topic_deviation():
    """Test agent redirects when user goes off-topic."""
    async with WebSocketCall(
        url="wss://your-agent.example.com/ws",
        background_noise=BackgroundNoise.TV,
        noise_level=0.25,
        audio_quality=AudioQuality.GOOD,
        speaking_style=SpeakingStyle(
            accent="australian",
            speed=0.9,
        ),
    ) as call:
        await call.listen()  # Greeting

        await call.say("I need help with my order")
        await call.listen()

        # User suddenly goes off-topic
        await call.say("By the way, what's the weather like in New York?")
        response = await call.listen()

        # Agent should gracefully redirect
        await expect(response).to_satisfy(
            "politely declines or briefly acknowledges the off-topic question",
            "redirects conversation back to the original issue",
            llm=judge
        )


@pytest.mark.asyncio
async def test_agent_handles_ambiguous_input():
    """Test agent asks for clarification on vague responses."""
    async with WebSocketCall(
        url="wss://your-agent.example.com/ws",
        background_noise=BackgroundNoise.QUIET_ROOM,
        noise_level=0.05,
        audio_quality=AudioQuality.EXCELLENT,
        speaking_style=SpeakingStyle(
            accent="indian",
            speed=0.7,  # Speaking slowly, uncertain
        ),
    ) as call:
        await call.listen()  # Greeting

        await call.say("I have a problem", emotion="uncertain")
        await call.listen()

        # User gives ambiguous response
        await call.say("Hmm, I don't know, maybe, I guess?", emotion="hesitant")
        response = await call.listen()

        # Agent should ask for clarification
        await expect(response).to_satisfy(
            "asks a clarifying question",
            "does not make assumptions about user intent",
            llm=judge
        )
        expect(response.audio).to_have_emotion("patient")

License

MIT

Release history

0.1.7: Jan 29, 2026 (this version)
0.1.6: Jan 29, 2026
0.1.5: Jan 29, 2026
0.1.4: Jan 29, 2026
0.1.3: Jan 29, 2026
0.1.2: Jan 29, 2026
0.1.1: Jan 29, 2026
0.1.0: Jan 29, 2026

Download files

Source Distribution

rehearse-0.1.6.tar.gz (20.3 kB)

Uploaded Jan 29, 2026 Source

Built Distribution

rehearse-0.1.6-py3-none-any.whl (27.5 kB)

Uploaded Jan 29, 2026 Python 3

File details

Details for the file rehearse-0.1.6.tar.gz.

File metadata

Download URL: rehearse-0.1.6.tar.gz
Upload date: Jan 29, 2026
Size: 20.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for rehearse-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`58f56d183733ad6a8393477e7aeca278d6c620aa7a2fb247ad384d23c3b64b2a`
MD5	`9d28d6ccffb530b90ade2170ae25860f`
BLAKE2b-256	`9740a269ebddcd498825d4cde7ce5b7ec0761f33a0eb6caa5a3d6c7029db723d`

File details

Details for the file rehearse-0.1.6-py3-none-any.whl.

File metadata

Download URL: rehearse-0.1.6-py3-none-any.whl
Upload date: Jan 29, 2026
Size: 27.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for rehearse-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`794a7bfd21dac4ec7d244394c23475cd95071398ebbe9246cfca269afd24d202`
MD5	`4bc7e8eb195bf4a05cb7867fdfbde66a`
BLAKE2b-256	`884c10408976f7ff77a7a739e30d875f990a9bb3e6cdaae839ad72b9d66ce118`
