Framework for evaluating AI agents across multiple quality dimensions

Project description

agent-eval

Part of the AumOS open-source agent infrastructure portfolio.

Features

EvalRunner orchestrates multi-run evaluation with configurable concurrency, per-case timeouts, retry logic, and fail-fast mode
Six evaluation dimensions — ACCURACY, LATENCY, COST, SAFETY, FORMAT, and CUSTOM — each producing a normalized [0.0, 1.0] score with a pass/fail determination
BenchmarkSuite YAML loader with per-case expected outputs, latency budgets, and cost caps; SuiteBuilder for programmatic construction
LLM-judge evaluator alongside deterministic accuracy, latency, cost, and format evaluators — all implementing the Evaluator ABC
Agent adapters for LangChain, CrewAI, AutoGen, OpenAI Agents, and plain callables, so agents built on these frameworks (or exposed as a plain function) can be wrapped without modifying their own code
Reporting in JSON, Markdown, HTML, and rich console output with per-dimension aggregate statistics
Quality gates (ThresholdGate, CompositeGate) that turn evaluation results into CI pass/fail signals
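The dimension scores, evaluators, and gates above compose roughly as follows. This is a stdlib-only sketch, not agent_eval's API: the names Dimension, DimensionScore, SketchEvaluator, ExactMatchAccuracy, BudgetLatency, threshold_gate, and composite_gate are hypothetical. The scoring rules (exact match for accuracy, a linear latency decay reaching 0.0 at twice the budget) are also assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Dimension(Enum):
    # The six dimensions named in the feature list.
    ACCURACY = "accuracy"
    LATENCY = "latency"
    COST = "cost"
    SAFETY = "safety"
    FORMAT = "format"
    CUSTOM = "custom"


@dataclass(frozen=True)
class DimensionScore:
    dimension: Dimension
    score: float  # normalized to [0.0, 1.0]
    passed: bool


class SketchEvaluator(ABC):
    """Hypothetical evaluator base class (not agent_eval's Evaluator ABC)."""

    dimension: Dimension

    def __init__(self, pass_threshold: float = 1.0) -> None:
        self.pass_threshold = pass_threshold

    @abstractmethod
    def raw_score(self, output: str, expected: str, latency_ms: float) -> float:
        """Return an unclamped score; evaluate() clamps it to [0.0, 1.0]."""

    def evaluate(self, output: str, expected: str, latency_ms: float) -> DimensionScore:
        score = min(1.0, max(0.0, self.raw_score(output, expected, latency_ms)))
        return DimensionScore(self.dimension, score, score >= self.pass_threshold)


class ExactMatchAccuracy(SketchEvaluator):
    dimension = Dimension.ACCURACY

    def raw_score(self, output: str, expected: str, latency_ms: float) -> float:
        # 1.0 when output matches expected after trimming whitespace, else 0.0.
        return 1.0 if output.strip() == expected.strip() else 0.0


class BudgetLatency(SketchEvaluator):
    dimension = Dimension.LATENCY

    def __init__(self, budget_ms: float, pass_threshold: float = 0.5) -> None:
        super().__init__(pass_threshold)
        self.budget_ms = budget_ms

    def raw_score(self, output: str, expected: str, latency_ms: float) -> float:
        # Assumed rule: 1.0 at 0 ms, falling linearly to 0.0 at twice the budget.
        return 1.0 - latency_ms / (2 * self.budget_ms)


def threshold_gate(scores: Iterable[DimensionScore], dimension: Dimension,
                   min_mean: float) -> bool:
    """Pass if the mean score on one dimension reaches min_mean (fail if no scores)."""
    values = [s.score for s in scores if s.dimension is dimension]
    return bool(values) and sum(values) / len(values) >= min_mean


def composite_gate(*gate_results: bool) -> bool:
    """Pass only if every component gate passed."""
    return all(gate_results)


if __name__ == "__main__":
    accuracy, latency = ExactMatchAccuracy(), BudgetLatency(budget_ms=500)
    runs = [("4", "4", 120.0), ("5", "4", 300.0), ("4", "4", 900.0)]  # (output, expected, ms)
    scores = [ev.evaluate(o, e, ms) for o, e, ms in runs for ev in (accuracy, latency)]
    ok = composite_gate(
        threshold_gate(scores, Dimension.ACCURACY, 0.6),
        threshold_gate(scores, Dimension.LATENCY, 0.5),
    )
    print("gate passed" if ok else "gate failed")
```

Because the latency score falls linearly to 0.0 at twice the budget, a pass threshold of 0.5 is equivalent to "latency within budget". In a CI job, the boolean returned by composite_gate would typically be mapped to the process exit code.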

Quick Start

Install from PyPI:

pip install aumos-agent-eval

Verify the installation:

agent-eval version

Basic usage:

import agent_eval

# See examples/01_quickstart.py for a working example
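Independently of the bundled example, the runner behaviour listed under Features (bounded concurrency, per-case timeouts, retries, fail-fast) can be illustrated with plain asyncio. This sketch does not call agent_eval. run_cases, CaseResult, toy_agent, and the retry policy (retry on timeouts and on any exception) are hypothetical stand-ins for EvalRunner and a plain-callable adapter.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional

# A "plain callable" agent for this sketch: an async function prompt -> answer.
Agent = Callable[[str], Awaitable[str]]


@dataclass
class CaseResult:
    case_id: str
    output: Optional[str]
    latency_ms: float  # duration of the last attempt
    attempts: int
    error: Optional[str] = None


async def _run_one(agent: Agent, case_id: str, prompt: str,
                   timeout_s: float, retries: int) -> CaseResult:
    error: Optional[str] = None
    elapsed_ms = 0.0
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        start = time.perf_counter()
        try:
            # Per-case timeout: wait_for cancels the agent call if it runs too long.
            output = await asyncio.wait_for(agent(prompt), timeout=timeout_s)
            elapsed_ms = (time.perf_counter() - start) * 1000
            return CaseResult(case_id, output, elapsed_ms, attempt)
        except asyncio.TimeoutError:
            error = "timeout"
        except Exception as exc:  # treat any agent exception as retryable
            error = repr(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
    return CaseResult(case_id, None, elapsed_ms, retries + 1, error)


async def run_cases(agent: Agent, cases: Dict[str, str], *, concurrency: int = 4,
                    timeout_s: float = 5.0, retries: int = 1,
                    fail_fast: bool = False) -> List[CaseResult]:
    semaphore = asyncio.Semaphore(concurrency)  # caps in-flight agent calls

    async def guarded(case_id: str, prompt: str) -> CaseResult:
        async with semaphore:
            return await _run_one(agent, case_id, prompt, timeout_s, retries)

    tasks = [asyncio.create_task(guarded(cid, p)) for cid, p in cases.items()]
    results: List[CaseResult] = []
    for finished in asyncio.as_completed(tasks):
        result = await finished
        results.append(result)
        if fail_fast and result.error is not None:
            for task in tasks:
                task.cancel()  # no-op for tasks that already finished
            break
    # Let cancelled tasks settle so none are left pending.
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


async def toy_agent(prompt: str) -> str:
    # Stand-in agent: upper-cases the prompt; prompts starting with "slow" take 200 ms.
    await asyncio.sleep(0.2 if prompt.startswith("slow") else 0.01)
    return prompt.upper()


if __name__ == "__main__":
    cases = {"a": "hi", "b": "slow"}
    for r in asyncio.run(run_cases(toy_agent, cases, timeout_s=0.1)):
        print(r.case_id, r.output, r.attempts, r.error)
```

A semaphore caps the number of in-flight agent calls, asyncio.wait_for enforces the per-attempt timeout, and fail-fast cancels outstanding tasks as soon as any case ends in an error.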

Documentation

Enterprise Upgrade

For production deployments requiring SLA-backed support and advanced integrations, contact the maintainers or see the commercial extensions documentation.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.

License

Apache 2.0 — see LICENSE for full terms.

Part of AumOS — open-source agent infrastructure.

Project details

Release history

0.2.0 (this version): Feb 27, 2026

0.1.0: Feb 27, 2026

Download files

Source Distribution

aumos_agent_eval-0.2.0.tar.gz (190.1 kB, uploaded Feb 27, 2026)

Built Distribution

aumos_agent_eval-0.2.0-py3-none-any.whl (142.6 kB, Python 3, uploaded Feb 27, 2026)

File details

Details for the file aumos_agent_eval-0.2.0.tar.gz.

File metadata

Download URL: aumos_agent_eval-0.2.0.tar.gz
Upload date: Feb 27, 2026
Size: 190.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for aumos_agent_eval-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`b8971940e366bede58bd69761bc01e5f4e641b26b331283470d997631d5d10ae`
MD5	`2b7e4827ae25c817e26c35f9e174302f`
BLAKE2b-256	`e83f7aaa06874e2bbafe98d2381f815c7e3c74f5935d16f13761c81dac59d55f`
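The SHA256 digest above can be checked locally with Python's standard hashlib module. A minimal sketch, assuming the sdist was saved to the current directory under its published file name:

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for aumos_agent_eval-0.2.0.tar.gz.
EXPECTED_SHA256 = "b8971940e366bede58bd69761bc01e5f4e641b26b331283470d997631d5d10ae"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("aumos_agent_eval-0.2.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "HASH MISMATCH")
```

pip can also enforce digests at install time in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).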

File details

Details for the file aumos_agent_eval-0.2.0-py3-none-any.whl.

File metadata

Download URL: aumos_agent_eval-0.2.0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 142.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for aumos_agent_eval-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e18a47f4afe2f0660bb36814e76e0cb491dffe7309d5ed49d9985679ead250f1`
MD5	`89bcee3a369aaee3468140177b0dbfa8`
BLAKE2b-256	`1fdd10eb3e672b44781d9c73cfd4e350e25df41d5a9ba8b0609c588a98bffbd2`
