voxarena

An evaluation arena for realtime voice agents.

Maintainers

simkeyur

Project description

VoxArena

VoxArena is a reproducible benchmarking harness for realtime voice agents. Run the same scripted conversation across Gemini Live, OpenAI Realtime, and other Pipecat-supported providers, then compare them like-for-like on latency, tool-call accuracy, and hallucinations.

Drop it into your CI pipeline, your dev loop, or the bundled control panel.

🚀 CI & Pipeline Integration

VoxArena ships a voxarena CLI designed for headless use in your build pipeline. It returns a non-zero exit code when metrics fall below thresholds you define, and emits JUnit XML for native CI reporting.

pip install voxarena

voxarena run \
  --provider gemini \
  --script ./script/utterances.yaml \
  --min-tool-accuracy 0.9 \
  --max-hallucinations 0 \
  --max-avg-ttfa-ms 1500 \
  --output result.json \
  --junit voxarena.xml
# exit 0 if every threshold passes, 1 otherwise
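
If you drive the check from a Python build script rather than a shell step, the same invocation can be wrapped as below. This is a minimal sketch that uses only the flags shown above; the helper names `build_run_command` and `thresholds_passed` are illustrative and not part of VoxArena.

```python
import subprocess
from typing import Optional, Sequence


def build_run_command(
    provider: str,
    script: str,
    min_tool_accuracy: float,
    max_hallucinations: int,
    max_avg_ttfa_ms: int,
    output: str = "result.json",
    junit: Optional[str] = None,
) -> list[str]:
    """Assemble the argv for `voxarena run` from the flags shown above."""
    cmd = [
        "voxarena", "run",
        "--provider", provider,
        "--script", script,
        "--min-tool-accuracy", str(min_tool_accuracy),
        "--max-hallucinations", str(max_hallucinations),
        "--max-avg-ttfa-ms", str(max_avg_ttfa_ms),
        "--output", output,
    ]
    if junit:
        cmd += ["--junit", junit]
    return cmd


def thresholds_passed(cmd: Sequence[str]) -> bool:
    """Run the command and map its exit status to a boolean (0 means pass)."""
    completed = subprocess.run(list(cmd), check=False)
    return completed.returncode == 0
```

Passing the arguments as a list avoids shell quoting issues, and `check=False` keeps a failed threshold from raising, so the caller decides how to react to a non-zero exit.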

Compare two providers in one shot

voxarena compare \
  --gemini-model gemini-3.1-flash-live-preview \
  --openai-model gpt-realtime-2 \
  --num-turns 5 \
  --min-tool-accuracy 0.9 \
  --output compare.json

GitHub Actions

- name: Voice agent regression check
  env:
    GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
  run: |
    pip install voxarena
    voxarena run --provider gemini \
      --min-tool-accuracy 0.92 --max-hallucinations 0 \
      --junit voxarena.xml --quiet

- uses: mikepenz/action-junit-report@v4
  if: always()
  with:
    report_paths: voxarena.xml
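
Outside GitHub Actions, the JUnit file can be summarized with a few lines of Python. The sketch below assumes the conventional JUnit layout (`testsuite` / `testcase` elements with optional `failure` or `error` children); the exact structure and test case names VoxArena emits are not documented here, so the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Count test cases and failing cases in a JUnit-style XML document."""
    root = ET.fromstring(xml_text)
    # The root may be <testsuites> or a single <testsuite>; iter() covers both.
    cases = list(root.iter("testcase"))
    failed = [
        case for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"total": len(cases), "failed": len(failed)}


# Hypothetical report content; real case names will differ.
SAMPLE = """<testsuite name="voxarena" tests="2" failures="1">
  <testcase name="tool_accuracy"/>
  <testcase name="hallucinations"><failure message="1 &gt; 0"/></testcase>
</testsuite>"""

print(summarize_junit(SAMPLE))  # {'total': 2, 'failed': 1}
```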

Subcommands

Command	What it does
`voxarena run`	Single-provider scripted run; exits 0/1 against thresholds.
`voxarena compare`	Runs Gemini and OpenAI in parallel against the same script.
`voxarena report`	Generates a markdown comparison report from past runs.

Run voxarena <command> --help for the full flag set.

Features

🎙️ Provider-agnostic agent — one Pipecat pipeline drives every provider; swap models without re-implementing your agent
🔁 Scripted conversations — multi-turn YAML scripts with pre-recorded WAV inputs and expected tool calls / response content
📊 Automated scoring — tool-call correctness, response matching, hallucination counts, time-to-first-audio, interruption-stop latency
🆚 Side-by-side comparisons — run multiple providers in parallel against the same script
🗄️ Persistent run history — JSON manifests on disk, indexed in SQLite
🖥️ Web control panel — React UI to launch runs, watch live status, browse results, and edit scripts
🧩 Extensible — add a new provider by implementing one adapter class
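
To make the scoring metrics concrete, here is a rough sketch of how per-turn results could roll up into tool accuracy, a hallucination count, and average time-to-first-audio, and how those compare against thresholds. The `TurnResult` record and its field names are hypothetical; VoxArena's internal representation may look quite different.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TurnResult:
    # Hypothetical per-turn record; field names are illustrative only.
    tool_correct: bool
    hallucinated: bool
    ttfa_ms: float  # time to first audio, in milliseconds


def summarize(turns: list[TurnResult]) -> dict[str, float]:
    """Aggregate per-turn results into three headline metrics."""
    if not turns:
        raise ValueError("need at least one turn")
    return {
        "tool_accuracy": sum(t.tool_correct for t in turns) / len(turns),
        "hallucinations": sum(t.hallucinated for t in turns),
        "avg_ttfa_ms": mean(t.ttfa_ms for t in turns),
    }


def passes(
    summary: dict[str, float],
    min_tool_accuracy: float,
    max_hallucinations: int,
    max_avg_ttfa_ms: float,
) -> bool:
    """Return True only when every metric is within its threshold."""
    return (
        summary["tool_accuracy"] >= min_tool_accuracy
        and summary["hallucinations"] <= max_hallucinations
        and summary["avg_ttfa_ms"] <= max_avg_ttfa_ms
    )
```

Treating the thresholds as inclusive bounds is a choice made for this sketch, not a statement about how the CLI compares them.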

Architecture

flowchart TD
    A["Recorded WAVs<br/>script/audio/*.wav"] --> B["Injection Harness<br/>voxarena/harness.py"]
    B --> C

    subgraph C ["Pipecat Pipeline"]
        direction LR
        C1["Audio Injector"] --> C2["Provider Adapter"]
        C2 --> C3["Audio Capture"]
        C3 --> C4["Metrics Collector"]
    end

    C2 <--> D{{"Provider Backend"}}
    D --> D1["Gemini Live"]
    D --> D2["OpenAI Realtime"]
    D --> D3["...future providers"]

    C4 --> E["Run Manifest<br/>results/PROVIDER/RUN_ID/manifest.json"]
    E --> F[("SQLite Index<br/>runs.db")]

    F <--> G["voxarena CLI<br/>+ FastAPI Backend"]
    G <--> H["React Control Panel<br/>ui/"]

    style D1 fill:#4285F4,color:#fff,stroke:#333
    style D2 fill:#10A37F,color:#fff,stroke:#333
    style D3 fill:#999,color:#fff,stroke:#333
    style F fill:#f5f5f5,stroke:#333
    style H fill:#fff7da,stroke:#333
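
The last hop in the diagram, manifests on disk indexed in SQLite, can be pictured with the sketch below. It assumes only the `results/PROVIDER/RUN_ID/manifest.json` layout shown above; the `runs` table and its columns are invented for illustration and are not VoxArena's actual schema.

```python
import json
import sqlite3
from pathlib import Path


def index_manifests(results_dir: Path, db_path: Path) -> int:
    """Record every PROVIDER/RUN_ID/manifest.json under results_dir in SQLite."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical schema: one row per (provider, run_id) pair.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            "provider TEXT NOT NULL, run_id TEXT NOT NULL, "
            "manifest_path TEXT NOT NULL, PRIMARY KEY (provider, run_id))"
        )
        count = 0
        for manifest in sorted(results_dir.glob("*/*/manifest.json")):
            json.loads(manifest.read_text())  # fail loudly on a corrupt manifest
            run_id = manifest.parent.name
            provider = manifest.parent.parent.name
            conn.execute(
                "INSERT OR REPLACE INTO runs VALUES (?, ?, ?)",
                (provider, run_id, str(manifest)),
            )
            count += 1
        conn.commit()
        return count
    finally:
        conn.close()
```

Because the JSON files stay the source of truth, an index like this can be dropped and rebuilt at any time; `INSERT OR REPLACE` makes re-indexing idempotent.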

Local Dev (with UI)

git clone https://github.com/simkeyur/vox-arena.git
cd vox-arena
cp .env.example .env  # add GOOGLE_API_KEY / OPENAI_API_KEY

python3 -m venv .venv && source .venv/bin/activate
pip install -e .

uvicorn voxarena.main:app --reload --port 8000

Then in another terminal:

cd ui && npm install && npm run dev

Open the control panel at http://localhost:5173.

Bring Your Own Agent

The demo ships with the "Saffron Leaf" restaurant agent so you can run end-to-end on day one. To evaluate your own:

1. Replace the system prompt and tool schemas in voxarena/agent.py.
2. Implement (or stub) your tools in voxarena/tools.py.
3. Re-record script/audio/*.wav and update script/utterances.yaml to reflect your real workload.
4. Run the arena as normal; every provider is scored against your scripts.

Scripted Conversations

Conversations live in script/utterances.yaml. Each turn pairs an utterance id with an expect block describing the correct tool call and/or response content:

- id: u04
  text: "Are you open on Sundays?"
  expect:
    tool: get_hours
    args:
      day: sunday
    response_contains:
      - "closed"

The harness plays script/audio/{id}.wav into the pipeline and scores the agent's actual tool calls and transcript against expect.
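
As an illustration of what scoring against expect can mean, the sketch below checks one turn deterministically: a tool call with the expected name and arguments, and every `response_contains` string present in the transcript. The `score_turn` function and the shape of `tool_calls` (dicts with `name` and `args`) are assumptions for this example; the real harness may grade differently, for instance with the evaluation models listed under Configuration.

```python
from typing import Any


def score_turn(
    expect: dict[str, Any],
    tool_calls: list[dict[str, Any]],
    transcript: str,
) -> dict[str, bool]:
    """Compare one turn's observed behaviour with its expect block."""
    wanted_tool = expect.get("tool")
    wanted_args = expect.get("args", {})
    # Pass when some call has the right name and every expected argument
    # matches; extra arguments on the call are tolerated in this sketch.
    tool_ok = wanted_tool is None or any(
        call.get("name") == wanted_tool
        and all(call.get("args", {}).get(k) == v for k, v in wanted_args.items())
        for call in tool_calls
    )
    # Case-insensitive substring match, a simplification of response matching.
    text = transcript.lower()
    response_ok = all(
        s.lower() in text for s in expect.get("response_contains", [])
    )
    return {"tool_ok": tool_ok, "response_ok": response_ok}


# The u04 turn from the script above, with a made-up agent response.
expect_u04 = {
    "tool": "get_hours",
    "args": {"day": "sunday"},
    "response_contains": ["closed"],
}
calls = [{"name": "get_hours", "args": {"day": "sunday"}}]
print(score_turn(expect_u04, calls, "Sorry, we're closed on Sundays."))
```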

Configuration

Variable	Description
`GOOGLE_API_KEY` / `OPENAI_API_KEY`	Provider credentials
`GEMINI_MODEL` / `OPENAI_MODEL`	Realtime model under test
`GEMINI_EVAL_MODEL` / `OPENAI_EVAL_MODEL`	Cheaper text models for grading
`PORT`	FastAPI server port
`BASE_DIR`	Override workdir (CLI: `--workdir`)
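
For reference, this is one way the variables above could be read in Python. It is a sketch rather than VoxArena's actual config module: the `Settings` class is hypothetical, and the fallback values (port 8000, matching the uvicorn command in Local Dev, and the current directory for `BASE_DIR`) are assumptions, not documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; unset variables are kept as None.
    google_api_key: Optional[str]
    openai_api_key: Optional[str]
    gemini_model: Optional[str]
    openai_model: Optional[str]
    gemini_eval_model: Optional[str]
    openai_eval_model: Optional[str]
    port: int
    base_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables from a mapping (os.environ by default)."""
    return Settings(
        google_api_key=env.get("GOOGLE_API_KEY"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        gemini_model=env.get("GEMINI_MODEL"),
        openai_model=env.get("OPENAI_MODEL"),
        gemini_eval_model=env.get("GEMINI_EVAL_MODEL"),
        openai_eval_model=env.get("OPENAI_EVAL_MODEL"),
        port=int(env.get("PORT", "8000")),  # assumed fallback
        base_dir=env.get("BASE_DIR", "."),  # assumed fallback
    )
```

Taking the environment as a parameter keeps the loader easy to test without mutating the process environment.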

Contributing

To add a new provider: implement an adapter in voxarena/providers/ following the pattern in gemini.py / openai.py, wire it into voxarena/harness.py and voxarena/config.py, and open a PR.
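
The general shape of a pluggable provider layer is sketched below. Everything in it is hypothetical: `ProviderAdapter`, `REGISTRY`, `register`, and `EchoAdapter` are made-up names that show the adapter-plus-registry pattern, not VoxArena's real interface. Follow gemini.py / openai.py in the repository for the actual contract.

```python
from abc import ABC, abstractmethod


class ProviderAdapter(ABC):
    """Hypothetical base class; the real adapter interface may differ."""

    name: str

    @abstractmethod
    def default_model(self) -> str:
        """Return the model identifier to use when none is configured."""


# Hypothetical registry mapping a provider name to its adapter class.
REGISTRY: dict[str, type[ProviderAdapter]] = {}


def register(cls: type[ProviderAdapter]) -> type[ProviderAdapter]:
    """Class decorator that makes an adapter selectable by name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class EchoAdapter(ProviderAdapter):
    # Toy provider used only to exercise the pattern.
    name = "echo"

    def default_model(self) -> str:
        return "echo-1"


def make_adapter(provider: str) -> ProviderAdapter:
    """Instantiate the adapter registered under the given provider name."""
    try:
        return REGISTRY[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```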

For bugs and feature requests, please open an issue.

License

MIT.

Release history

0.1.12

Jun 14, 2026

0.1.11

Jun 14, 2026

0.1.10

Jun 14, 2026

0.1.9

Jun 14, 2026

0.1.8

Jun 14, 2026

0.1.7

Jun 14, 2026

0.1.6

Jun 14, 2026

0.1.5

Jun 13, 2026

0.1.4

Jun 13, 2026

0.1.3

Jun 13, 2026

0.1.2

Jun 13, 2026

0.1.1

Jun 13, 2026

This version

0.1.0

Jun 13, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voxarena-0.1.0.tar.gz (38.3 kB)

Uploaded Jun 13, 2026 Source

Built Distribution

voxarena-0.1.0-py3-none-any.whl (42.4 kB)

Uploaded Jun 13, 2026 Python 3

File details

Details for the file voxarena-0.1.0.tar.gz.

File metadata

Download URL: voxarena-0.1.0.tar.gz
Upload date: Jun 13, 2026
Size: 38.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for voxarena-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`67e25bb44a382acbc924310872395d6882aaa76eeb09c927dd597a63c0f18961`
MD5	`695b06c18cddbef03c5cc4bb75694ca3`
BLAKE2b-256	`a0ba14a8883447f7f95e91160e1c4e5f932649c4cb337e16d9a063e312fa2bb5`

Provenance

The following attestation bundles were made for voxarena-0.1.0.tar.gz:

Publisher: publish.yml on simkeyur/vox-arena

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxarena-0.1.0.tar.gz
- Subject digest: 67e25bb44a382acbc924310872395d6882aaa76eeb09c927dd597a63c0f18961
- Sigstore transparency entry: 1810165935
- Sigstore integration time: Jun 13, 2026
Source repository:
- Permalink: simkeyur/vox-arena@bdc7c903f2a4fc692dd544b4d7eac38796cca80f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/simkeyur
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bdc7c903f2a4fc692dd544b4d7eac38796cca80f
- Trigger Event: push

File details

Details for the file voxarena-0.1.0-py3-none-any.whl.

File metadata

Download URL: voxarena-0.1.0-py3-none-any.whl
Upload date: Jun 13, 2026
Size: 42.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for voxarena-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2c9b3991dc2c66995a60571c8abd851fce6abb8780ddab4c1537b394aed8ca07`
MD5	`5b252c8cdf076909e165591db7cc9cd1`
BLAKE2b-256	`a509de2ded5ce1dd31b219c0350a4a3b5f5ee0ab14737776acc3c8224836b684`

Provenance

The following attestation bundles were made for voxarena-0.1.0-py3-none-any.whl:

Publisher: publish.yml on simkeyur/vox-arena

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxarena-0.1.0-py3-none-any.whl
- Subject digest: 2c9b3991dc2c66995a60571c8abd851fce6abb8780ddab4c1537b394aed8ca07
- Sigstore transparency entry: 1810166084
- Sigstore integration time: Jun 13, 2026
Source repository:
- Permalink: simkeyur/vox-arena@bdc7c903f2a4fc692dd544b4d7eac38796cca80f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/simkeyur
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bdc7c903f2a4fc692dd544b4d7eac38796cca80f
- Trigger Event: push
