A framework for evaluating, monitoring, and benchmarking multi-agent systems

AgentUnit

AgentUnit is a framework for evaluating, monitoring, and benchmarking multi-agent systems. It standardises how teams define scenarios, run experiments, and report outcomes across adapters, model providers, and deployment targets.

Overview

Scenario-centric design – describe datasets, adapters, and policies once, then reuse them in local runs, CI jobs, and production monitors.
Extensible adapters – plug into LangGraph, CrewAI, PromptFlow, OpenAI Swarm, Anthropic Claude (including via Amazon Bedrock), Phidata, and custom agents through a consistent interface.
Comprehensive metrics – combine exact-match assertions, RAGAS quality scores, and operational metrics with optional OpenTelemetry traces.
Production-first tooling – export JSON, Markdown, and JUnit reports, gate releases with regression detection, and surface telemetry in existing observability stacks.
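As an illustration of what gating a release on regression detection can look like, here is a minimal sketch in plain Python. It does not use the AgentUnit API: `RunSummary`, `find_regressions`, and both thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical summary of one evaluation run (not an AgentUnit type)."""
    success_rate: float  # fraction of cases that passed, 0.0 to 1.0
    avg_latency: float   # mean seconds per case


def find_regressions(baseline: RunSummary, current: RunSummary,
                     max_success_drop: float = 0.02,
                     max_latency_ratio: float = 1.20) -> list[str]:
    """Return human-readable reasons the current run should block a release."""
    reasons = []
    drop = baseline.success_rate - current.success_rate
    if drop > max_success_drop:
        reasons.append(f"success rate fell by {drop:.1%}")
    if current.avg_latency > baseline.avg_latency * max_latency_ratio:
        reasons.append(
            f"average latency rose from {baseline.avg_latency:.2f}s "
            f"to {current.avg_latency:.2f}s"
        )
    return reasons


baseline = RunSummary(success_rate=0.95, avg_latency=1.00)
current = RunSummary(success_rate=0.90, avg_latency=1.10)
problems = find_regressions(baseline, current)
for reason in problems:
    print("REGRESSION:", reason)
```

A CI job would typically exit with a non-zero status when `problems` is non-empty, which fails the pipeline and blocks the release.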

Installation

AgentUnit requires Python 3.10 or later. The recommended workflow uses Poetry for dependency management.

git clone https://github.com/aviralgarg05/agentunit.git
cd agentunit
poetry install
poetry shell  # on Poetry 2.x this command requires the poetry-plugin-shell plugin

To use pip instead:

python -m venv .venv
source .venv/bin/activate
pip install -e .

Optional integrations are published as extras; install only what you need:

poetry install --extras "promptflow crewai langgraph"
# or with pip (quoted so the shell does not expand the brackets)
pip install "agentunit[promptflow,crewai,langgraph]"

Optional Extras

Extra	Includes	Use Case
`promptflow`	`promptflow>=1.0.0`	Azure PromptFlow integration
`crewai`	`crewai>=0.201.1`	CrewAI multi-agent orchestration
`langgraph`	`langgraph>=1.0.0a4`	LangGraph state machines
`openai`	`openai>=1.0.0`	OpenAI models and Swarm
`anthropic`	`anthropic>=0.18.0`	Claude/Bedrock integration
`phidata`	`phidata>=2.0.0`	Phidata agents
`all`	All above extras	Complete installation

Refer to the adapters guide for per-adapter requirements and feature support matrices.
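To see which optional integrations are present in the current environment, a small standard-library check can help. This is a sketch, not part of AgentUnit: the mapping from extra name to importable module is an assumption (for example, Phidata has historically been imported as `phi`), so adjust it to match your installed versions.

```python
from importlib.util import find_spec

# Extra name -> top-level module it is ASSUMED to provide.
EXTRA_MODULES = {
    "promptflow": "promptflow",
    "crewai": "crewai",
    "langgraph": "langgraph",
    "openai": "openai",
    "anthropic": "anthropic",
    "phidata": "phi",  # assumption: Phidata's import name
}


def available_extras(modules: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Report which extras look installed, without importing them."""
    return {extra: find_spec(module) is not None
            for extra, module in modules.items()}


for extra, installed in available_extras().items():
    print(f"{extra:<12} {'installed' if installed else 'missing'}")
```

`find_spec` only locates a top-level module rather than importing it, so the check stays fast and has no side effects from heavy frameworks.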

Quickstart

2-Minute Copy-Paste Example

Create a file example_suite.py:

from agentunit import Scenario, DatasetCase, Runner
from agentunit.adapters import MockAdapter
from agentunit.metrics import ExactMatch

# Define test cases
cases = [
    DatasetCase(
        id="math_1",
        query="What is 2 + 2?",
        expected_output="4"
    ),
    DatasetCase(
        id="capital_1",
        query="What is the capital of France?",
        expected_output="Paris"
    )
]

# Create scenario
scenario = Scenario(
    name="Basic Q&A Test",
    adapter=MockAdapter(),  # Replace with your adapter
    dataset=cases,
    metrics=[ExactMatch()]
)

# Run evaluation
runner = Runner()
results = runner.run(scenario)

# Print results
print(f"Success rate: {results.success_rate:.1%}")
print(f"Average latency: {results.avg_latency:.2f}s")

Run it:

python example_suite.py
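To make the two printed numbers concrete, the following sketch recomputes a success rate and an average latency by hand. It is plain Python with hypothetical names (`CaseResult`, `exact_match`, `summarise`); how AgentUnit's `ExactMatch` normalises text and how it measures latency are not specified here, so the whitespace and case handling below is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    """Hypothetical per-case record; AgentUnit's real result type may differ."""
    case_id: str
    expected: str
    actual: str
    latency_s: float


def exact_match(expected: str, actual: str) -> bool:
    # Assumed normalisation: ignore surrounding whitespace and letter case.
    return expected.strip().casefold() == actual.strip().casefold()


def summarise(results: list[CaseResult]) -> tuple[float, float]:
    """Return (success_rate, average_latency_seconds)."""
    passed = sum(exact_match(r.expected, r.actual) for r in results)
    return passed / len(results), mean(r.latency_s for r in results)


results = [
    CaseResult("math_1", "4", "4", 0.12),
    CaseResult("capital_1", "Paris", "The capital is Paris.", 0.20),
]
success_rate, avg_latency = summarise(results)
print(f"Success rate: {success_rate:.1%}")
print(f"Average latency: {avg_latency:.2f}s")
```

The second case fails even though the answer is correct, because exact matching rejects any extra wording. That is the usual reason to pair it with a semantic or quality metric when agents produce free-form text.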

YAML Configuration Example

Create example_suite.yaml:

name: "Customer Support Q&A"
description: "Evaluate customer support agent responses"

adapter:
  type: "openai"
  config:
    model: "gpt-4"
    temperature: 0.7
    max_tokens: 500

dataset:
  cases:
    - input: "How do I reset my password?"
      expected: "Use the 'Forgot Password' link on the login page"
      metadata:
        category: "account"
    
    - input: "What are your business hours?"
      expected: "Monday-Friday 9AM-5PM EST"
      metadata:
        category: "general"

metrics:
  - "exact_match"
  - "semantic_similarity"
  - "latency"

timeout: 30
retries: 2

Run it with the CLI:

agentunit example_suite.yaml \
  --json results.json \
  --markdown results.md \
  --junit results.xml
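The YAML cases use `input` and `expected`, while the Python example uses `query` and `expected_output`. The sketch below shows one plausible mapping between the two shapes. It is an assumption for illustration, not AgentUnit's loader: `Case` is a stand-in dataclass, the generated ids are hypothetical, and the `config` dictionary is the structure a YAML parser would produce for an abridged version of the file above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Case:
    """Stand-in for a dataset case; not the AgentUnit DatasetCase class."""
    id: str
    query: str
    expected_output: str
    metadata: dict[str, Any] = field(default_factory=dict)


def cases_from_config(config: dict[str, Any]) -> list[Case]:
    """Build cases from a parsed suite config, generating ids where absent."""
    cases = []
    for index, raw in enumerate(config["dataset"]["cases"], start=1):
        missing = {"input", "expected"} - raw.keys()
        if missing:
            raise ValueError(f"case {index} is missing {sorted(missing)}")
        cases.append(Case(
            id=raw.get("id", f"case_{index}"),
            query=raw["input"],
            expected_output=raw["expected"],
            metadata=dict(raw.get("metadata", {})),
        ))
    return cases


config = {
    "name": "Customer Support Q&A",
    "dataset": {"cases": [
        {"input": "How do I reset my password?",
         "expected": "Use the 'Forgot Password' link on the login page",
         "metadata": {"category": "account"}},
        {"input": "What are your business hours?",
         "expected": "Monday-Friday 9AM-5PM EST",
         "metadata": {"category": "general"}},
    ]},
}
for case in cases_from_config(config):
    print(case.id, case.metadata["category"], case.query)
```

Validating required keys up front gives a clear error that names the offending case, rather than a `KeyError` partway through a run.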

Getting started

Follow the Quickstart above for a 2-minute runnable example.
Review Writing Scenarios for dataset and adapter templates plus helper constructors for popular frameworks.
Consult the CLI reference to orchestrate suites from the command line and export results for CI, dashboards, or audits.
Explore the adapters guide for concrete adapter implementations and feature support.
Check the metrics catalog for all available evaluation metrics.

CLI Usage

AgentUnit exposes an agentunit CLI entry point once installed. Typical usage:

agentunit path.to.suite \
  --metrics faithfulness answer_correctness \
  --json reports/results.json \
  --markdown reports/results.md \
  --junit reports/results.xml

Programmatic runners are available through agentunit.core.Runner for notebook- or script-driven workflows.
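When a script or CI step needs to launch the CLI itself, building the argument list in Python avoids shell quoting problems. The helper below is a hypothetical convenience wrapper, not part of AgentUnit; it uses only the flags shown in the command above, and the `reports` directory name is taken from that example.

```python
import shlex
from collections.abc import Sequence
from pathlib import PurePosixPath


def build_agentunit_command(suite: str, metrics: Sequence[str] = (),
                            report_dir: str = "reports") -> list[str]:
    """Assemble an argv list using only the flags shown above."""
    reports = PurePosixPath(report_dir)
    command = ["agentunit", suite]
    if metrics:
        command += ["--metrics", *metrics]
    command += [
        "--json", str(reports / "results.json"),
        "--markdown", str(reports / "results.md"),
        "--junit", str(reports / "results.xml"),
    ]
    return command


command = build_agentunit_command(
    "path.to.suite", ["faithfulness", "answer_correctness"]
)
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would execute it and raise `CalledProcessError` on a non-zero exit status, which is usually what a CI job wants.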

Documentation map

Topic	Reference
Quick evaluation walkthrough	Quickstart
Scenario and adapter authoring	docs/writing-scenarios.md
Adapter implementations guide	docs/adapters.md
Metrics catalog and reference	docs/metrics-catalog.md
CLI options and examples	docs/cli.md
Architecture overview	docs/architecture.md
Framework-specific guides	docs/platform-guides.md
No-code builder guide	docs/nocode-quickstart.md
OpenTelemetry integration	docs/telemetry.md
Performance testing	docs/performance-testing.md
Comparison to other tools	docs/comparison.md
Templates	docs/templates/

Treat the table above as the canonical index; each document links back to its related topics.

Development workflow

Install dependencies (Poetry or pip).
Run the unit and integration suite:

poetry run python3 -m pytest tests -v

Execute targeted suites during active development, then run the full matrix before opening a pull request.

Latest verification (2025-10-24): 144 passed, 10 skipped, 32 warnings. Warnings originate from third-party dependencies (langchain pydantic shim deprecations and datetime.utcnow usage). Track upstream fixes or pin patched releases as needed.
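For context on the second warning source: `datetime.utcnow()` has been deprecated since Python 3.12 because it returns a naive datetime. The reported warnings come from dependencies, so the fix belongs upstream, but a short sketch of the timezone-aware replacement shows what a patched release would change. The variable name is illustrative.

```python
from datetime import datetime, timezone

# Deprecated since Python 3.12: returns a naive datetime with no tzinfo.
# naive = datetime.utcnow()

# Timezone-aware replacement that carries an explicit UTC offset.
aware = datetime.now(timezone.utc)
print(aware.isoformat())
```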

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

Development setup and workflow
Code style and linting guidelines
Testing requirements
Pull request process
Issue labels and tags for open source events

Report security issues and other sensitive topics through the responsible disclosure process outlined in SECURITY.md.

License

AgentUnit is released under the MIT License. See LICENSE for the full text.

Need an overview for stakeholders? Start with docs/architecture.md. Ready to extend the platform? Explore the templates under docs/templates/.

Release history

Version	Released
0.7.0 (this version)	Nov 23, 2025
0.6.0	Oct 24, 2025
0.5.0	Oct 7, 2025
0.4.0	Oct 1, 2025
0.3.0	Sep 29, 2025
0.1.0	Sep 29, 2025

Download files

Source Distribution

agentunit-0.7.0.tar.gz (157.7 kB), uploaded Nov 23, 2025

Built Distribution

agentunit-0.7.0-py3-none-any.whl (208.3 kB), uploaded Nov 23, 2025, Python 3

File details

Details for the file agentunit-0.7.0.tar.gz.

File metadata

Download URL: agentunit-0.7.0.tar.gz
Upload date: Nov 23, 2025
Size: 157.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentunit-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`f06130ff7e77acf82a4a4513569783d5a7099932b6f3f531ecbede2a05cbdaf5`
MD5	`03a9ca11beb584fde815f4f4d69125dd`
BLAKE2b-256	`fb3f7af2d238adfa532ffd4f61ed3ca26b249587fe613410d78a91c39c592066`


File details

Details for the file agentunit-0.7.0-py3-none-any.whl.

File metadata

Download URL: agentunit-0.7.0-py3-none-any.whl
Upload date: Nov 23, 2025
Size: 208.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentunit-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`166cb503372972e495a7603fb6ad8ce90ef1757026ac506ff094559f5bc5fe36`
MD5	`79c1d706f7f893127473e2f669028830`
BLAKE2b-256	`dc50b0ac2c2d013d0f01ace54c7df4a2ce7fdb858963539112bb2b4d098c3c2b`

