

agent-eval

Framework for evaluating AI agents across multiple quality dimensions


Part of the AumOS open-source agent infrastructure portfolio.


Features

  • EvalRunner orchestrates multi-run evaluation with configurable concurrency, per-case timeouts, retry logic, and fail-fast mode
  • Six evaluation dimensions — ACCURACY, LATENCY, COST, SAFETY, FORMAT, and CUSTOM — each producing a normalized [0.0, 1.0] score with a pass/fail determination
  • BenchmarkSuite YAML loader with per-case expected outputs, latency budgets, and cost caps; SuiteBuilder for programmatic construction
  • LLM-judge evaluator alongside deterministic accuracy, latency, cost, and format evaluators — all implementing the Evaluator ABC
  • Agent adapters for LangChain, CrewAI, AutoGen, OpenAI Agents, and plain callables so any agent can be wrapped without code changes
  • Reporting in JSON, Markdown, HTML, and rich console output with per-dimension aggregate statistics
  • Quality gates (ThresholdGate, CompositeGate) that turn evaluation results into CI pass/fail signals
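To make the scoring and gating ideas above concrete, here is a minimal stand-alone sketch. It does not use agent-eval's API: `DimensionScore`, `latency_score`, and `threshold_gate` are hypothetical names, and the linear falloff rule for latency is an assumption chosen for illustration, not the package's documented formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScore:
    """Hypothetical result record: one dimension, one normalized score."""
    dimension: str
    score: float  # expected to lie in [0.0, 1.0]


def latency_score(elapsed_s: float, budget_s: float) -> float:
    """Assumed rule: 1.0 within budget, linear falloff to 0.0 at twice the budget."""
    if budget_s <= 0:
        raise ValueError("budget_s must be positive")
    if elapsed_s <= budget_s:
        return 1.0
    return max(0.0, 1.0 - (elapsed_s - budget_s) / budget_s)


def threshold_gate(scores, thresholds):
    """Pass only if every dimension named in `thresholds` meets its minimum."""
    by_dim = {s.dimension: s.score for s in scores}
    failures = [
        dim for dim, minimum in thresholds.items()
        if by_dim.get(dim, 0.0) < minimum  # a missing dimension counts as 0.0
    ]
    return (not failures, failures)


scores = [
    DimensionScore("ACCURACY", 0.9),
    DimensionScore("LATENCY", latency_score(elapsed_s=1.5, budget_s=1.0)),
]
passed, failures = threshold_gate(scores, {"ACCURACY": 0.8, "LATENCY": 0.75})
print(passed, failures)  # False ['LATENCY']
```

In a CI job, a failing gate of this kind would typically be translated into a non-zero exit code so the pipeline stops.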

Quick Start

Install from PyPI:

pip install agent-eval

Verify the installation:

agent-eval version

Basic usage:

import agent_eval

# See examples/01_quickstart.py for a working example
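The package's real entry points are shown in examples/01_quickstart.py. The sketch below does not call agent_eval at all; it only illustrates, in plain asyncio, the kind of loop the EvalRunner feature describes: bounded concurrency, a per-case timeout, and retries. All names here (`run_case`, `run_suite`, `echo_agent`) are hypothetical, and fail-fast mode is left out for brevity.

```python
import asyncio


async def run_case(agent, prompt, timeout_s, max_retries):
    """Call the agent once per attempt; retry on timeout or any other exception."""
    last_error = None
    for _attempt in range(max_retries + 1):
        try:
            output = await asyncio.wait_for(agent(prompt), timeout=timeout_s)
            return {"prompt": prompt, "output": output, "error": None}
        except Exception as exc:  # a timeout from wait_for is caught here too
            last_error = repr(exc)
    return {"prompt": prompt, "output": None, "error": last_error}


async def run_suite(agent, prompts, concurrency=2, timeout_s=1.0, max_retries=1):
    """Run all prompts with at most `concurrency` cases in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(prompt):
        async with semaphore:
            return await run_case(agent, prompt, timeout_s, max_retries)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def echo_agent(prompt):
    """Stand-in agent: a plain async callable that upper-cases its input."""
    await asyncio.sleep(0)
    return prompt.upper()


results = asyncio.run(run_suite(echo_agent, ["hello", "world"]))
print([r["output"] for r in results])  # ['HELLO', 'WORLD']
```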

Documentation

Enterprise Upgrade

For production deployments requiring SLA-backed support and advanced integrations, contact the maintainers or see the commercial extensions documentation.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a pull request.

License

Apache 2.0 — see LICENSE for full terms.


