Framework for evaluating AI agents across multiple quality dimensions
agent-eval
Part of the AumOS open-source agent infrastructure portfolio.
Features
- `EvalRunner` orchestrates multi-run evaluation with configurable concurrency, per-case timeouts, retry logic, and fail-fast mode
- Six evaluation dimensions (`ACCURACY`, `LATENCY`, `COST`, `SAFETY`, `FORMAT`, and `CUSTOM`), each producing a normalized `[0.0, 1.0]` score with a pass/fail determination
- `BenchmarkSuite` YAML loader with per-case expected outputs, latency budgets, and cost caps; `SuiteBuilder` for programmatic construction
- LLM-judge evaluator alongside deterministic accuracy, latency, cost, and format evaluators, all implementing the `Evaluator` ABC
- Agent adapters for LangChain, CrewAI, AutoGen, OpenAI Agents, and plain callables so any agent can be wrapped without code changes
- Reporting in JSON, Markdown, HTML, and rich console output with per-dimension aggregate statistics
- Quality gates (`ThresholdGate`, `CompositeGate`) that turn evaluation results into CI pass/fail signals
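To make the idea of normalized per-dimension scores feeding a quality gate concrete, here is a minimal standalone sketch. It does not use the `agent_eval` API, which this page does not document; `Dimension`, `DimensionScore`, `latency_score`, and `threshold_gate` are hypothetical names, and the linear latency decay is an assumed normalization chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class Dimension(Enum):
    """Illustrative stand-in for the six dimensions listed above."""
    ACCURACY = "accuracy"
    LATENCY = "latency"
    COST = "cost"
    SAFETY = "safety"
    FORMAT = "format"
    CUSTOM = "custom"


@dataclass(frozen=True)
class DimensionScore:
    dimension: Dimension
    score: float  # expected to lie in [0.0, 1.0]

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")


def latency_score(elapsed_s: float, budget_s: float) -> float:
    """Assumed normalization: 1.0 within budget, linear decay to 0.0 at 2x budget."""
    if elapsed_s <= budget_s:
        return 1.0
    return max(0.0, 1.0 - (elapsed_s - budget_s) / budget_s)


def threshold_gate(
    scores: Iterable[DimensionScore],
    thresholds: Dict[Dimension, float],
) -> bool:
    """Pass only if the mean score of every gated dimension meets its minimum."""
    scores = list(scores)
    for dimension, minimum in thresholds.items():
        values = [s.score for s in scores if s.dimension is dimension]
        # A gated dimension with no scores at all is treated as a failure.
        if not values or sum(values) / len(values) < minimum:
            return False
    return True
```

In a CI job, the boolean result of such a gate would typically be mapped to the process exit code, for example `sys.exit(0 if passed else 1)`.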
Quick Start
Install from PyPI:
pip install agent-eval
Verify the installation:
agent-eval version
Basic usage:
import agent_eval
# See examples/01_quickstart.py for a working example
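The package's own API is not shown on this page, so the following is only a standalone sketch of what a concurrency-limited evaluation loop with per-case timeouts and retries involves when the agent is a plain async callable. It uses only the standard library; `CaseResult`, `run_case`, and `run_suite` are hypothetical names, and exact string match is an assumed accuracy check.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple

Agent = Callable[[str], Awaitable[str]]
Case = Tuple[str, str, str]  # (case_id, prompt, expected_output)


@dataclass
class CaseResult:
    case_id: str
    output: Optional[str]
    passed: bool
    attempts: int


async def run_case(
    agent: Agent,
    case_id: str,
    prompt: str,
    expected: str,
    timeout_s: float = 5.0,
    max_retries: int = 1,
) -> CaseResult:
    """Run one case, retrying only when the agent exceeds the timeout."""
    attempts = 0
    for attempts in range(1, max_retries + 2):
        try:
            output = await asyncio.wait_for(agent(prompt), timeout=timeout_s)
        except asyncio.TimeoutError:
            continue  # timed out: try again if attempts remain
        return CaseResult(case_id, output, output.strip() == expected, attempts)
    return CaseResult(case_id, None, False, attempts)


async def run_suite(
    agent: Agent, cases: List[Case], concurrency: int = 4
) -> List[CaseResult]:
    """Run all cases, with at most `concurrency` in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(case: Case) -> CaseResult:
        async with semaphore:
            return await run_case(agent, *case)

    # gather preserves the order of the input cases in its results.
    return await asyncio.gather(*(guarded(c) for c in cases))
```

A plain async function such as `async def echo(prompt): return prompt` can be passed as `agent`, and the suite is driven with `asyncio.run(run_suite(echo, cases))`.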
Documentation
Enterprise Upgrade
For production deployments requiring SLA-backed support and advanced integrations, contact the maintainers or see the commercial extensions documentation.
Contributing
Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.
License
Apache 2.0 — see LICENSE for full terms.
Part of AumOS — open-source agent infrastructure.
File details
Details for the file aumos_agent_eval-0.2.0.tar.gz.
File metadata
- Download URL: aumos_agent_eval-0.2.0.tar.gz
- Upload date:
- Size: 190.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8971940e366bede58bd69761bc01e5f4e641b26b331283470d997631d5d10ae |
| MD5 | 2b7e4827ae25c817e26c35f9e174302f |
| BLAKE2b-256 | e83f7aaa06874e2bbafe98d2381f815c7e3c74f5935d16f13761c81dac59d55f |
File details
Details for the file aumos_agent_eval-0.2.0-py3-none-any.whl.
File metadata
- Download URL: aumos_agent_eval-0.2.0-py3-none-any.whl
- Upload date:
- Size: 142.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e18a47f4afe2f0660bb36814e76e0cb491dffe7309d5ed49d9985679ead250f1 |
| MD5 | 89bcee3a369aaee3468140177b0dbfa8 |
| BLAKE2b-256 | 1fdd10eb3e672b44781d9c73cfd4e350e25df41d5a9ba8b0609c588a98bffbd2 |