Voiceground
Observability framework for Pipecat voice and multimodal conversational AI.
Features
- VoicegroundObserver: Track conversation events following Pipecat's Observer pattern
- Call Simulation: Test your bots with dynamic, LLM-powered simulated users
Installation
```shell
pip install voiceground
```

Or with UV:

```shell
uv add voiceground
```
Quick Start
```python
import uuid

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask

from voiceground import VoicegroundObserver, HTMLReporter

# Create an observer with an HTML reporter
conversation_id = str(uuid.uuid4())
reporter = HTMLReporter(output_dir="./reports")
observer = VoicegroundObserver(
    reporters=[reporter],
    conversation_id=conversation_id,
)

# Create a pipeline task with the observer
task = PipelineTask(
    pipeline=Pipeline([...]),
    observers=[observer],
)

# Run your pipeline
```
Tested With
Voiceground has been tested with the following Pipecat providers:
LLM Providers
- OpenAI (GPT)
STT Providers
- ElevenLabs
TTS Providers
- ElevenLabs
Event Categories
Voiceground tracks the following event categories:
| Category | Types | Description |
|---|---|---|
| `user_speak` | start, end | User speech events |
| `bot_speak` | start, end | Bot speech events |
| `stt` | start, end | Speech-to-text processing (includes transcription text) |
| `llm` | start, first_byte, end | LLM response generation (includes generated text) |
| `tts` | start, first_byte, end | Text-to-speech synthesis |
| `tool_call` | start, end | LLM function/tool calling |
| `system` | start, end | System events (e.g., context aggregation) |
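Events are identified by a `category:type` pair (for example, `llm:first_byte`). As a quick illustration in plain Python (the event names below are illustrative, not captured from a real run, and this is not Voiceground's API), a turn's event names can be grouped back into the categories above:

```python
from collections import defaultdict

# Illustrative event names for one conversation turn.
event_names = [
    "user_speak:start", "user_speak:end",
    "stt:start", "stt:end",
    "llm:start", "llm:first_byte", "llm:end",
    "tts:start", "tts:first_byte",
    "bot_speak:start", "bot_speak:end",
]

# Group event types under their category.
by_category = defaultdict(list)
for name in event_names:
    category, event_type = name.split(":")
    by_category[category].append(event_type)

# by_category["llm"] == ["start", "first_byte", "end"]
```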
Opinionated Metrics
Voiceground tracks 7 opinionated metrics per conversation turn, providing a comprehensive view of voice conversation performance:

- Turn Duration: Total time from the first event to the last event in the turn (milliseconds). Measures the complete duration of a conversation turn.
- Response Time: Time from `user_speak:end` to `bot_speak:start` (or from the first event to `bot_speak:start` if the conversation started with bot speech). This is the end-to-end time the user experiences waiting for a response.
- Transcription Overhead: Time from `user_speak:end` to `stt:end` (milliseconds). Measures the latency of speech-to-text processing.
- Voice Synthesis Overhead: Time from `tts:start` to `bot_speak:start` (milliseconds). Measures the latency of text-to-speech synthesis.
- LLM Response Time: Time from `llm:start` to `llm:first_byte` (milliseconds). Measures the time-to-first-byte of the LLM response, indicating how quickly the model starts generating content.
- System Overhead: Time from `stt:end` to `llm:start` (milliseconds). Measures context aggregation and other system processing between transcription and LLM invocation. Includes labels/metadata about the system operations.
- Tools Overhead: Sum of all individual `tool_call` durations (each `tool_call:end - tool_call:start`) that occur between `llm:start` and `llm:end` (milliseconds). Measures the total time spent executing function/tool calls during LLM processing.
Metric Relationships
The metrics are related as follows:
- Response Time ≈ Transcription Overhead + System Overhead + LLM Response Time + Tools Overhead + Voice Synthesis Overhead
- Turn Duration includes all events in the turn and may be longer than Response Time if there are additional events before or after the main response flow
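As a concrete illustration of how the metrics decompose the user-perceived latency, here is a small self-contained sketch. The timestamps are hypothetical (seconds since turn start), and the helper is ours, not Voiceground's internals; for simplicity, TTS is assumed to start as soon as the first LLM token arrives:

```python
# Hypothetical event timestamps (seconds) for a single turn.
t = {
    "user_speak:end": 1.20,
    "stt:end": 1.35,
    "llm:start": 1.36,
    "llm:first_byte": 1.80,
    "llm:end": 2.40,
    "tts:start": 1.80,   # TTS begins when the first LLM token arrives
    "bot_speak:start": 2.10,
}

def ms(seconds: float) -> int:
    """Convert a duration in seconds to whole milliseconds."""
    return round(seconds * 1000)

transcription_overhead = ms(t["stt:end"] - t["user_speak:end"])       # 150 ms
system_overhead = ms(t["llm:start"] - t["stt:end"])                   # 10 ms
llm_response_time = ms(t["llm:first_byte"] - t["llm:start"])          # 440 ms
tools_overhead = 0  # no tool_call events between llm:start and llm:end
voice_synthesis_overhead = ms(t["bot_speak:start"] - t["tts:start"])  # 300 ms

response_time = ms(t["bot_speak:start"] - t["user_speak:end"])        # 900 ms

# The components recompose the user-perceived latency.
assert response_time == (transcription_overhead + system_overhead
                         + llm_response_time + tools_overhead
                         + voice_synthesis_overhead)
```

In practice, streaming pipelines overlap LLM generation and speech synthesis, which is why the relationship is approximate rather than exact.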
Report Features
The generated HTML reports include:
- Timeline Visualization: Interactive timeline showing all events and their relationships
- Events Table: Detailed view of all tracked events with timestamps, sources, and data
- Turns Table: Conversation turns with all 7 opinionated performance metrics
- Metrics Summary: Average metrics across the conversation
- Event Highlighting: Hover over events or turns to see related events highlighted
Call Simulation
Voiceground includes a call simulation feature for testing your bots with dynamic, LLM-powered simulated users. Instead of manual testing, you can define user personas and goals, and let the simulator have realistic conversations with your bot.
Architecture
```
┌───────────────────────────┐            ┌───────────────────────────┐
│    Simulator Pipeline     │            │       Bot Pipeline        │
│    (The "Fake User")      │            │     (Your actual bot)     │
│                           │            │                           │
│  STT ◄────────────────────┼── audio ───┼─── TTS                    │
│   ↓                       │            │     ↑                     │
│  LLM (user persona)       │            │    LLM                    │
│   ↓                       │            │     ↑                     │
│  TTS ─────────────────────┼── audio ───┼──► STT                    │
│                           │            │                           │
└───────────────────────────┘            └───────────────────────────┘
```
Both pipelines are standard Pipecat pipelines connected via `VoicegroundBridgeTransport`. The simulator's LLM has a system prompt that tells it to act as a user with specific goals.
Quick Start
```python
# Service imports (exact module paths may vary by Pipecat version)
from pipecat.services.elevenlabs import ElevenLabsSTTService, ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService

from voiceground.simulation import VoicegroundSimulation, VoicegroundSimulatorConfig

# Configure the simulated user
config = VoicegroundSimulatorConfig(
    llm=OpenAILLMService(api_key=...),
    tts=ElevenLabsTTSService(api_key=...),
    stt=ElevenLabsSTTService(api_key=...),
    system_prompt="""
    You are a customer calling to book a restaurant table.
    Your goal: Book a table for 2 people tomorrow at 7pm.
    Be natural and conversational.
    """,
    initiate_conversation=True,  # Simulator speaks first
    max_turns=10,
)

# Run the simulation
async with VoicegroundSimulation(config) as simulation:
    await run_bot(transport=simulation.transport)

# Results are available after the context exits
print(simulation.results.transcript)
print(f"Turns: {simulation.results.turn_count}")
```
Your `run_bot` function only needs to accept a `transport` parameter; the bridge transport is a drop-in replacement for a real transport:
```python
async def run_bot(transport):
    # Use transport.input() and transport.output() - same as LocalAudioTransport!
    pipeline = Pipeline([
        transport.input(),
        stt, llm, tts,
        transport.output(),
    ])
    runner = PipelineRunner()
    await runner.run(PipelineTask(pipeline))
```
The simulation automatically handles turn limiting and timeouts - no extra configuration needed on the bot side.
Note: Simulations run faster than real-time because audio input/output is not buffered. This allows for rapid testing and iteration, but timing metrics may not reflect real-world performance characteristics.
VoicegroundSimulatorConfig Options
| Option | Type | Description |
|---|---|---|
| `llm` | `LLMService` | LLM for generating user responses |
| `tts` | `TTSService` | TTS for generating user voice |
| `stt` | `STTService` | STT for transcribing bot speech |
| `system_prompt` | `str` | Instructions for the simulated user persona |
| `initiate_conversation` | `bool` | If `True`, the simulator speaks first (default: `False`) |
| `max_turns` | `int` | Maximum conversation turns (default: 10) |
| `timeout_seconds` | `float` | Maximum simulation duration (default: 120) |
VoicegroundSimulationResults
After the simulation completes, simulation.results contains:
- transcript: List of `VoicegroundTranscriptEntry` objects with role, text, and timestamp
- events: All `VoicegroundEvent` objects captured during the simulation
- turn_count: Number of completed conversation turns
- duration_seconds: Total simulation duration
- termination_reason: Why the simulation ended (`max_turns`, `timeout`, or `unknown`)
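To make the result shape concrete, here is a minimal stand-in that mirrors the fields listed above (these are sketch dataclasses, not Voiceground's actual classes, which may differ in detail), together with the kind of check a test harness might run:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins mirroring the documented result fields.
@dataclass
class TranscriptEntry:
    role: str
    text: str
    timestamp: float

@dataclass
class SimulationResults:
    transcript: list
    turn_count: int
    duration_seconds: float
    termination_reason: str  # "max_turns", "timeout", or "unknown"
    events: list = field(default_factory=list)

results = SimulationResults(
    transcript=[TranscriptEntry("user", "Hi, I'd like to book a table.", 0.0)],
    turn_count=1,
    duration_seconds=4.2,
    termination_reason="max_turns",
)

# A typical post-simulation check: did the conversation end as expected?
assert results.termination_reason in {"max_turns", "timeout", "unknown"}
```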
Examples
See the examples/ directory for complete working examples:
- observer/basic_pipeline.py: Basic voice conversation with STT, LLM, and TTS
- observer/tool_calling_pipeline.py: Example with LLM function calling
- simulations/run_simulation.py: Call simulation with a restaurant booking scenario
To run an example:

```shell
# Install example dependencies
uv sync --all-extras

# Set required environment variables
export OPENAI_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
export VOICE_ID=your_voice_id

# Run the example
python examples/basic_pipeline.py
```

Note: On macOS, you'll need to install portaudio for audio support:

```shell
brew install portaudio
```
Development
```shell
# Clone the repository
git clone https://github.com/poseneror/voiceground.git
cd voiceground

# Install all dependencies (including dev and examples)
uv sync --all-extras

# Run tests
uv run pytest

# Run linting
uv run ruff check .

# Run type checking
uv run mypy src

# Build the client
python scripts/develop.py build

# Run an example (requires portaudio on macOS: brew install portaudio)
python scripts/develop.py example
```
License
BSD-2-Clause License - see LICENSE for details.