Standalone Agent Evaluation Framework (AEF)
AEF - Agent Evaluation Framework
AEF is a framework to generate tests, run/evaluate trajectories, collect feedback, and self-evolve agent behavior.
The workflow is intentionally minimal and framework-agnostic:
- aef generate calls the generation component/tool
- aef evaluate calls the evaluation component/tool
- aef feedback calls the feedback component/tool
- aef evolve calls the evolution component/tool
Internally, these are routed through an A2A bus so the same flow works for sub-agents implemented with different frameworks.
Installation
From PyPI (Coming Soon)
Once published, install via pip or uv:
pip install aef-framework
or with uv:
uv pip install aef-framework
Local Development Install with uv
AEF uses uv for fast, reliable Python package management.
1. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
2. Create a virtual environment
cd AEF
uv venv --python=3.11
This creates a .venv directory with Python 3.11. Pass --python=3.10 or --python=3.12 instead if you need a different version.
3. Activate the virtual environment
source .venv/bin/activate # Linux/macOS
# or
.venv\Scripts\activate # Windows
4. Install AEF in editable mode
uv pip install -e .
This installs AEF and all dependencies, making the aef command available.
5. Verify installation
aef --help
Traditional pip install (local)
If you prefer using pip:
python -m venv .venv
source .venv/bin/activate
pip install -e .
Core Principles
- Universal sub-agent support via adapter contract (python, cli, http)
- Single essential loop: Generate → Evaluate → Feedback → Evolve
- Composable A2A components instead of tightly-coupled command logic
- Versioned evolution profiles with before/after evaluation comparison
Essential Workflow
1) Generate trajectories
aef generate --config configs/fleet_ccc_run.json --n 10
2) Evaluate against a golden run
aef evaluate --config configs/fleet_ccc_run.json --golden run_YYYYMMDD_xxxxxx
3) Submit feedback
aef feedback --agent fleet_ccc --text "Agent should ask confirmation before delete operations"
4) Evolve (auto-apply + compare)
aef evolve --config configs/fleet_ccc_run.json --n 10
aef evolve performs the following steps:
- baseline evaluate
- classify feedback into amendments
- apply evolution profile
- re-evaluate and report before/after score delta
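The four steps above can be sketched as a single function. This is only an illustration of the control flow, not AEF's implementation: evolve_once, the evaluate and classify callables, and the profile keys are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


def evolve_once(
    evaluate: Callable[[Dict], float],
    classify: Callable[[List[str]], Dict],
    profile: Dict,
    feedback: List[str],
) -> Dict:
    """Hypothetical evolve step: baseline, classify, apply, re-evaluate."""
    before = evaluate(profile)             # baseline evaluate
    amendments = classify(feedback)        # classify feedback into amendments
    evolved = {**profile, **amendments}    # apply evolution profile
    after = evaluate(evolved)              # re-evaluate
    return {"before": before, "after": after, "delta": after - before, "profile": evolved}


# Toy stand-ins so the sketch runs end to end (assumptions, not AEF code).
def toy_evaluate(profile: Dict) -> float:
    return 1.0 if profile.get("confirm_before_delete") else 0.5


def toy_classify(feedback: List[str]) -> Dict:
    wants_confirm = any("confirmation" in text.lower() for text in feedback)
    return {"confirm_before_delete": True} if wants_confirm else {}


report = evolve_once(
    toy_evaluate,
    toy_classify,
    profile={},
    feedback=["Agent should ask confirmation before delete operations"],
)
print(report["before"], report["after"], report["delta"])
```

Keeping the baseline and the re-evaluation on the same evaluate callable is what makes the before/after delta comparable.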
Use AEF With Any Sub-Agent
Set agent.adapter_type in your config:
- python: ADK/Python agent entrypoint module_or_file.py:agent_var
- cli: shell command template using {step}/{goal} placeholders
- http: endpoint that accepts { goal, step, session_id? }
See detailed usage in docs/USING_ANY_SUBAGENT.md.
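To make the cli and http adapter inputs more concrete, here is a minimal sketch of how a command template and a request body could be filled in. How AEF actually substitutes the placeholders is not documented in this section, so the quoting behavior, the helper names render_cli_command and build_http_body, and the my-agent command are assumptions for illustration only.

```python
import shlex
from typing import Dict, Optional


def render_cli_command(template: str, goal: str, step: str) -> str:
    """Fill {goal}/{step} placeholders (hypothetical helper).

    Values are shell-quoted so spaces or metacharacters stay inside
    a single argument.
    """
    return template.format(goal=shlex.quote(goal), step=shlex.quote(step))


def build_http_body(goal: str, step: str, session_id: Optional[str] = None) -> Dict[str, str]:
    """Build the { goal, step, session_id? } body; session_id is optional."""
    body = {"goal": goal, "step": step}
    if session_id is not None:
        body["session_id"] = session_id
    return body


# "my-agent" is a made-up command used only for this example.
cmd = render_cli_command(
    "my-agent --goal {goal} --step {step}",
    goal="delete old logs",
    step="list files",
)
print(cmd)
print(build_http_body("delete old logs", "list files"))
```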
Full prerequisites and onboarding checklist:
A2A Components
AEF components exposed through the internal bus:
- generation.generate
- evaluation.evaluate
- feedback.submit_text
- feedback.submit_annotations
- evolution.evolve
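The component.tool naming suggests a simple dispatch table. The sketch below shows one way such routing could work in-process; the Bus class, its register and call methods, and the placeholder handler are hypothetical and are not taken from AEF's source.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class Bus:
    """Hypothetical in-process bus keyed by 'component.tool' names."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, component: str, tool: str, handler: Handler) -> None:
        self._handlers[f"{component}.{tool}"] = handler

    def call(self, component: str, tool: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        key = f"{component}.{tool}"
        if key not in self._handlers:
            raise KeyError(f"no handler registered for {key}")
        return self._handlers[key](payload)


bus = Bus()
# Placeholder handler; a real one would run the agent and record trajectories.
bus.register("generation", "generate", lambda payload: {"requested": payload.get("n", 1)})

result = bus.call("generation", "generate", {"n": 2})
print(result)
```

Because callers only know the component and tool names, the handler behind each key can wrap an agent built with any framework.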
Evolution Outputs
Evolution applies and versions runtime amendments per agent under:
- prompts/evolution_profiles/<agent>/latest.json
- prompts/evolution_profiles/<agent>/profile_<timestamp>.json
These profiles contain:
- prompt addenda
- tool policies
- generator hints
- agent hints
- rubric updates
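As an illustration of the versioning layout, the following sketch writes a timestamped profile and refreshes latest.json next to it. The directory layout follows the paths listed above; the timestamp format, the save_profile helper, and the prompt_addenda key name are assumptions, since the actual profile schema is not shown here.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def save_profile(root: Path, agent: str, profile: Dict[str, Any]) -> Path:
    """Write profile_<timestamp>.json and mirror it to latest.json (hypothetical helper)."""
    agent_dir = root / "prompts" / "evolution_profiles" / agent
    agent_dir.mkdir(parents=True, exist_ok=True)
    # Assumed timestamp format; the real one may differ.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    text = json.dumps(profile, indent=2)
    versioned = agent_dir / f"profile_{stamp}.json"
    versioned.write_text(text)
    (agent_dir / "latest.json").write_text(text)
    return versioned


# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_profile(
        Path(tmp),
        "fleet_ccc",
        {"prompt_addenda": ["Ask for confirmation before delete operations."]},
    )
    latest = json.loads((path.parent / "latest.json").read_text())

print(path.name, latest)
```

Keeping every timestamped file alongside latest.json is what allows an earlier profile to be inspected or restored later.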
Minimal Command Reference
# Generate
aef generate --config <config.json> --n 10
# Direct A2A tool call
aef a2a --config <config.json> --component generation --tool generate --payload '{"n": 2}'
# Evaluate golden by run id
aef evaluate --config <config.json> --golden <run_id>
# Feedback
aef feedback --agent <agent_name> --text "..."
# Evolve
aef evolve --config <config.json> --n 10
# Compare two eval runs
aef compare --run <run_a> --vs <run_b>
# Query runs / memory
aef query runs --agent <agent_name>
aef query memory --agent <agent_name> --all-memory
aef query memory --agent <agent_name> --history
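When calling aef a2a from a script, building the --payload JSON in Python avoids hand-quoting mistakes. The sketch below only assembles the argument list using the flags shown above; a2a_argv is a hypothetical helper, and the config path is the example one used earlier in this document.

```python
import json
import shlex
from typing import Any, Dict, List


def a2a_argv(config: str, component: str, tool: str, payload: Dict[str, Any]) -> List[str]:
    """Assemble an `aef a2a` argument list (hypothetical helper)."""
    return [
        "aef", "a2a",
        "--config", config,
        "--component", component,
        "--tool", tool,
        "--payload", json.dumps(payload),  # serialize once, no manual quoting
    ]


argv = a2a_argv("configs/fleet_ccc_run.json", "generation", "generate", {"n": 2})
# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(argv))
```

The list can be passed directly to subprocess.run(argv) once AEF is installed, which sidesteps shell quoting entirely.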
Documentation
- docs/AEF_WORKFLOW.md
- docs/A2A_COMPONENTS.md
- docs/USING_ANY_SUBAGENT.md
- docs/SELF_EVOLUTION.md
- docs/PUBLISHING.md - PyPI package publishing guide
Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup and guidelines.
License
AEF is released under the Apache License 2.0. See LICENSE for details.