
Standalone Agent Evaluation Framework (AEF)


AEF - Agent Evaluation Framework

AEF is a framework for generating tests, running and evaluating agent trajectories, collecting feedback, and evolving agent behavior based on that feedback.

The workflow is intentionally minimal and framework-agnostic:

  • aef generate calls the generation component/tool
  • aef evaluate calls the evaluation component/tool
  • aef feedback calls the feedback component/tool
  • aef evolve calls the evolution component/tool

Internally, these are routed through an A2A bus so the same flow works for sub-agents implemented with different frameworks.
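As a rough illustration of this routing idea, the sketch below shows a toy in-process bus that dispatches a (component, tool) pair to a registered handler. It is not AEF's implementation: the `Bus` class, its `register`/`call` methods, and the stand-in handler are all hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, Tuple

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class Bus:
    """Toy in-process bus (hypothetical): maps (component, tool) to a handler."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[str, str], Handler] = {}

    def register(self, component: str, tool: str, handler: Handler) -> None:
        self._handlers[(component, tool)] = handler

    def call(self, component: str, tool: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        try:
            handler = self._handlers[(component, tool)]
        except KeyError:
            raise LookupError(f"no handler for {component}.{tool}") from None
        return handler(payload)


bus = Bus()

# Stand-in handler for illustration; a real component would wrap a sub-agent
# built with whatever framework it happens to use.
bus.register("generation", "generate", lambda p: {"trajectories": p.get("n", 1)})

result = bus.call("generation", "generate", {"n": 2})
print(result)
```

Because callers only name a component and a tool, the handler behind that name can be swapped without changing the calling code, which is the property the bus is meant to provide.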


Installation

From PyPI (Coming Soon)

Once published, install via pip or uv:

pip install aef-framework

or with uv:

uv pip install aef-framework

Local Development Install with uv

AEF uses uv for fast, reliable Python package management.

1. Install uv (if not already installed)

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Create a virtual environment

cd AEF
uv venv --python=3.11

This creates a .venv directory using Python 3.11. To use a different interpreter, pass --python=3.10 or --python=3.12 instead.

3. Activate the virtual environment

source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate     # Windows

4. Install AEF in editable mode

uv pip install -e .

This installs AEF and all dependencies, making the aef command available.

5. Verify installation

aef --help

Traditional pip install (local)

If you prefer using pip:

python -m venv .venv
source .venv/bin/activate
pip install -e .

Core Principles

  • Universal sub-agent support via an adapter contract (python, cli, http)
  • Single essential loop: Generate → Evaluate → Feedback → Evolve
  • Composable A2A components instead of tightly-coupled command logic
  • Versioned evolution profiles with before/after evaluation comparison

Essential Workflow

1) Generate trajectories

aef generate --config configs/fleet_ccc_run.json --n 10

2) Evaluate against a golden run

aef evaluate --config configs/fleet_ccc_run.json --golden run_YYYYMMDD_xxxxxx

3) Submit feedback

aef feedback --agent fleet_ccc --text "Agent should ask confirmation before delete operations"

4) Evolve (auto-apply + compare)

aef evolve --config configs/fleet_ccc_run.json --n 10

aef evolve performs these steps in order:

  1. baseline evaluate
  2. classify feedback into amendments
  3. apply evolution profile
  4. re-evaluate and report before/after score delta
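The four steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the code behind aef evolve: `evolve_once`, `fake_evaluate`, `fake_classify`, and the `prompt_addenda` key are hypothetical, and merging amendments with a plain dict update is a simplification.

```python
from typing import Any, Callable, Dict, List


def evolve_once(
    evaluate: Callable[[Dict[str, Any]], float],
    classify: Callable[[List[str]], Dict[str, Any]],
    profile: Dict[str, Any],
    feedback: List[str],
) -> Dict[str, Any]:
    baseline = evaluate(profile)          # 1. baseline evaluate
    amendments = classify(feedback)       # 2. classify feedback into amendments
    evolved = {**profile, **amendments}   # 3. apply evolution profile (simplified merge)
    after = evaluate(evolved)             # 4. re-evaluate
    return {"before": baseline, "after": after, "delta": after - baseline, "profile": evolved}


# Stand-ins for illustration only: the score rises by 0.25 per prompt addendum.
def fake_evaluate(profile: Dict[str, Any]) -> float:
    return 0.5 + 0.25 * len(profile.get("prompt_addenda", []))


def fake_classify(feedback: List[str]) -> Dict[str, Any]:
    return {"prompt_addenda": list(feedback)}


report = evolve_once(
    fake_evaluate,
    fake_classify,
    {},
    ["Agent should ask confirmation before delete operations"],
)
print(report["before"], report["after"], report["delta"])
```

Evaluating before and after with the same function is what makes the reported delta meaningful: any change in score is attributable to the applied amendments.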

Use AEF With Any Sub-Agent

Set agent.adapter_type in your config:

  • python: ADK/Python agent entrypoint module_or_file.py:agent_var
  • cli: shell command template using {step} / {goal} placeholders
  • http: endpoint that accepts { goal, step, session_id? }
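To make the cli and http shapes above concrete, here is a small sketch of how a command template with {goal} / {step} placeholders could be rendered, and how a request body with an optional session_id could be built. How AEF performs the substitution internally is not documented here, so the helper names (`build_cli_argv`, `build_http_payload`) and the `my-agent` command are assumptions for illustration.

```python
import shlex
from typing import Any, Dict, List, Optional


def build_cli_argv(template: str, goal: str, step: str) -> List[str]:
    """Render a command template and split it into an argv list."""
    # Quote the values so spaces or shell metacharacters stay inside one argument.
    rendered = template.format(goal=shlex.quote(goal), step=shlex.quote(step))
    return shlex.split(rendered)


def build_http_payload(goal: str, step: str, session_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a request body of the shape { goal, step, session_id? }."""
    payload: Dict[str, Any] = {"goal": goal, "step": step}
    if session_id is not None:
        payload["session_id"] = session_id  # included only when provided
    return payload


# "my-agent" is a placeholder command, not a real tool.
argv = build_cli_argv("my-agent --goal {goal} --step {step}", "delete old logs", "list files")
print(argv)
print(build_http_payload("delete old logs", "list files", session_id="s-1"))
```

Returning an argv list rather than a single string lets the caller run the command without a shell, which avoids quoting surprises when goals contain spaces.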

See detailed usage in docs/USING_ANY_SUBAGENT.md.

Full prerequisites and onboarding checklist:


A2A Components

AEF components exposed through the internal bus:

  • generation.generate
  • evaluation.evaluate
  • feedback.submit_text
  • feedback.submit_annotations
  • evolution.evolve

See docs/A2A_COMPONENTS.md.
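The dotted names above appear to pair a component with a tool, matching the --component and --tool flags of aef a2a shown in the command reference. Assuming that mapping holds for every entry, the following sketch turns a dotted name into an argument list; `a2a_argv` is a hypothetical helper, not part of AEF.

```python
import json
from typing import Any, Dict, List


def a2a_argv(config_path: str, dotted_name: str, payload: Dict[str, Any]) -> List[str]:
    """Build an `aef a2a` argument list from a name like 'generation.generate'."""
    component, sep, tool = dotted_name.partition(".")
    if not sep or not component or not tool:
        raise ValueError(f"expected 'component.tool', got {dotted_name!r}")
    return [
        "aef", "a2a",
        "--config", config_path,
        "--component", component,
        "--tool", tool,
        "--payload", json.dumps(payload),  # payload is passed as a JSON string
    ]


cmd = a2a_argv("configs/fleet_ccc_run.json", "generation.generate", {"n": 2})
print(cmd)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once aef is installed and the config file exists.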


Evolution Outputs

Evolution applies and versions runtime amendments per agent under:

  • prompts/evolution_profiles/<agent>/latest.json
  • prompts/evolution_profiles/<agent>/profile_<timestamp>.json

These profiles contain:

  • prompt addenda
  • tool policies
  • generator hints
  • agent hints
  • rubric updates

See docs/SELF_EVOLUTION.md.
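One way to picture the versioned layout described in this section is a writer that saves each profile twice: once under a timestamped name and once as latest.json. The sketch below is an assumption-laden illustration: the JSON key names, the timestamp format, and the `save_profile` helper are all hypothetical, and the actual schema is described in docs/SELF_EVOLUTION.md.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict

# Hypothetical key names mirroring the profile contents listed above.
example_profile: Dict[str, Any] = {
    "prompt_addenda": ["Ask for confirmation before delete operations."],
    "tool_policies": {},
    "generator_hints": [],
    "agent_hints": [],
    "rubric_updates": [],
}


def save_profile(root: Path, agent: str, profile: Dict[str, Any]) -> Path:
    """Write profile_<timestamp>.json and refresh latest.json for one agent."""
    agent_dir = root / "prompts" / "evolution_profiles" / agent
    agent_dir.mkdir(parents=True, exist_ok=True)
    # Assumed timestamp format; the real one may differ.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    text = json.dumps(profile, indent=2)
    versioned = agent_dir / f"profile_{stamp}.json"
    versioned.write_text(text, encoding="utf-8")
    (agent_dir / "latest.json").write_text(text, encoding="utf-8")
    return versioned


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_profile(Path(tmp), "fleet_ccc", example_profile)
    latest = json.loads((path.parent / "latest.json").read_text(encoding="utf-8"))
    saved_names = sorted(p.name for p in path.parent.iterdir())

print(saved_names)
```

Keeping every timestamped file alongside latest.json means an earlier profile can be inspected or restored if a later evolution lowers the evaluation score.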


Minimal Command Reference

# Generate
aef generate --config <config.json> --n 10

# Direct A2A tool call
aef a2a --config <config.json> --component generation --tool generate --payload '{"n": 2}'

# Evaluate golden by run id
aef evaluate --config <config.json> --golden <run_id>

# Feedback
aef feedback --agent <agent_name> --text "..."

# Evolve
aef evolve --config <config.json> --n 10

# Compare two eval runs
aef compare --run <run_a> --vs <run_b>

# Query runs / memory
aef query runs --agent <agent_name>
aef query memory --agent <agent_name> --all-memory
aef query memory --agent <agent_name> --history

Documentation


Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup and guidelines.


License

AEF is released under the Apache License 2.0. See LICENSE for details.

