Standalone Agent Evaluation Framework (AEF)

AEF - Agent Evaluation Framework

AEF is a framework to generate tests, run/evaluate trajectories, collect feedback, and self-evolve agent behavior.

The workflow is intentionally minimal and framework-agnostic:

aef generate calls the generation component/tool
aef evaluate calls the evaluation component/tool
aef feedback calls the feedback component/tool
aef evolve calls the evolution component/tool

Internally, these commands are routed through an A2A (agent-to-agent) bus, so the same flow works for sub-agents implemented with different frameworks.

Installation

From PyPI

Install with pip:

pip install aef-framework

or with uv:

uv pip install aef-framework

Local Development Install with uv

AEF uses uv for fast, reliable Python package management.

1. Install uv (if not already installed)

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Create a virtual environment

cd AEF
uv venv --python=3.11

This creates a .venv directory with Python 3.11. Substitute 3.10 or 3.12 if you need a different version.

3. Activate the virtual environment

source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate     # Windows

4. Install AEF in editable mode

uv pip install -e .

This installs AEF and all dependencies, making the aef command available.

5. Verify installation

aef --help

Traditional pip install (local)

If you prefer using pip:

python -m venv .venv
source .venv/bin/activate
pip install -e .

Core Principles

Universal sub-agent support via adapter contract (python, cli, http)
Single essential loop: Generate → Evaluate → Feedback → Evolve
Composable A2A components instead of tightly-coupled command logic
Versioned evolution profiles with before/after evaluation comparison

Basic Workflow

1) Generate trajectories

aef generate --config configs/fleet_ccc_run.json --n 10

2) Evaluate against a golden run

aef evaluate --config configs/fleet_ccc_run.json --golden run_YYYYMMDD_xxxxxx

3) Submit feedback

aef feedback --agent fleet_ccc --text "Agent should ask confirmation before delete operations"

4) Evolve (auto-apply + compare)

aef evolve --config configs/fleet_ccc_run.json --n 10

aef evolve performs the following steps in order:

baseline evaluate
classify feedback into amendments
apply evolution profile
re-evaluate and report before/after score delta

Use AEF With Any Sub-Agent

Set agent.adapter_type in your config:

python: ADK/Python agent entrypoint module_or_file.py:agent_var
cli: shell command template using {step} / {goal} placeholders
http: endpoint that accepts { goal, step, session_id? }

See detailed usage in docs/USING_ANY_SUBAGENT.md.
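
As a rough illustration of the {step} / {goal} placeholders used by the cli adapter, the sketch below fills a command template in a single pass. The helper name render_cli_command, the my_agent command, and the choice to shell-quote each value are assumptions made for this example; AEF's own substitution and quoting rules may differ, so treat docs/USING_ANY_SUBAGENT.md as the reference.

```python
import re
import shlex


def render_cli_command(template: str, goal: str, step: str) -> str:
    """Fill {goal} and {step} in a command template (hypothetical helper)."""
    values = {"goal": goal, "step": step}
    # Substitute in one pass so a value that itself contains "{step}" is not
    # expanded a second time, and quote each value so it stays one argument.
    return re.sub(
        r"\{(goal|step)\}",
        lambda match: shlex.quote(values[match.group(1)]),
        template,
    )


command = render_cli_command(
    "my_agent --goal {goal} --input {step}",  # hypothetical command template
    goal="List all servers",
    step="Show servers in rack 4",
)
print(command)
```

Quoting the values keeps multi-word goals and steps intact when the rendered string is handed to a shell.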

Runtime endpoint mode (--agent_endpoint)

In addition to config-defined adapters, you can override execution at runtime:

If --agent_endpoint is provided, AEF routes calls to the agent under test (AUT) through HTTP endpoint mode.
If --agent_endpoint is not provided, AEF keeps its existing behavior (for example, a local --sub_agent or the adapter defined in the config).

Endpoint-mode guarantee:

AEF uses the hosted endpoint runner path for AUT execution.
Local Python entrypoint loading is not required in this mode.
Local entrypoint import/path issues do not block endpoint-mode execution.
The endpoint runner follows the ADK server contract: it creates or reuses a session, then calls POST /run.

This is useful when the same agent can run locally in development and remotely in a hosted ADK/A2A service.

Examples:

# Generate through hosted endpoint
aef generate --config configs/fleet_ilo_run.json \
	--agent_endpoint http://localhost:8086/docs/ --n 2

# Evaluate through hosted endpoint
aef evaluate --config configs/fleet_ilo_run.json \
	--agent_endpoint http://localhost:8086/docs/ --golden run_YYYYMMDD_xxxxxx

# Evolve through hosted endpoint
aef evolve --config configs/fleet_ilo_run.json \
	--agent_endpoint http://localhost:8086/docs/ --n 5

Notes:

/docs/ URLs are supported (AEF resolves them to the API base URL).
Endpoint mode is available on generate, evaluate, run, and evolve.
Endpoint mode is intended for ADK/A2A-hosted AUTs where the AUT is reachable over API.

ADK endpoint contract used by --agent_endpoint:

Session bootstrap: POST /apps/{app_name}/users/{user_id}/sessions/{session_id}
Inference: POST /run
Request payload: {"appName","userId","sessionId","newMessage":{"role":"user","parts":[{"text":...}]},"streaming":false}
A 409 Conflict during session creation is treated as expected session reuse.
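
To make the contract above concrete, here is a minimal sketch that builds the session bootstrap path and the POST /run request body, and applies the 409 rule. The helper names (session_path, run_payload, session_is_usable) and the sample identifiers are hypothetical, and no request is actually sent; this is not AEF's endpoint runner, only an illustration of the shapes listed above.

```python
import json
from urllib.parse import quote


def session_path(app_name: str, user_id: str, session_id: str) -> str:
    """Path used for the session bootstrap POST."""
    # Percent-encode each segment so unusual IDs cannot break the path.
    parts = [quote(part, safe="") for part in (app_name, user_id, session_id)]
    return "/apps/{}/users/{}/sessions/{}".format(*parts)


def run_payload(app_name: str, user_id: str, session_id: str, text: str) -> dict:
    """Request body for POST /run, mirroring the payload shape listed above."""
    return {
        "appName": app_name,
        "userId": user_id,
        "sessionId": session_id,
        "newMessage": {"role": "user", "parts": [{"text": text}]},
        "streaming": False,
    }


def session_is_usable(status_code: int) -> bool:
    """A 2xx means the session was created; 409 Conflict means it already exists."""
    return 200 <= status_code < 300 or status_code == 409


# Sample identifiers below are placeholders chosen for this example.
path = session_path("fleet_ilo", "aef_user", "session-001")
body = json.dumps(run_payload("fleet_ilo", "aef_user", "session-001", "List all servers"))
print(path)
print(body)
```

Both values can be passed to any HTTP client: POST to the session path first, then POST the body to /run once session_is_usable returns True for the first response.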

Trajectory logging in endpoint mode:

steps[].content stores assistant text response.
steps[].tool_calls, steps[].tool_responses, and steps[].tools_used store tool trace data when present.
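
The sketch below shows one way to read those fields back from a logged trajectory. Only the field names (steps, content, tool_calls, tool_responses, tools_used) come from the list above; the surrounding dictionary layout, the list-valued tool fields, the sample data, and the summarize_steps helper are assumptions for illustration.

```python
from typing import Any


def summarize_steps(trajectory: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one summary row per step (hypothetical helper)."""
    rows = []
    for index, step in enumerate(trajectory.get("steps", [])):
        rows.append({
            "step": index,
            "content": step.get("content", ""),
            # Tool trace fields are only stored when present, so default to empty.
            "tools_used": list(step.get("tools_used") or []),
            "has_tool_trace": bool(step.get("tool_calls") or step.get("tool_responses")),
        })
    return rows


# Hypothetical trajectory: the first step used a tool, the second did not.
example = {
    "steps": [
        {
            "content": "Found 3 servers.",
            "tool_calls": [{"name": "list_servers", "args": {}}],
            "tool_responses": [{"name": "list_servers", "result": ["a", "b", "c"]}],
            "tools_used": ["list_servers"],
        },
        {"content": "Anything else?"},
    ]
}

for row in summarize_steps(example):
    print(row)
```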

Full prerequisites and onboarding checklist:

docs/ADOPTING_NEW_AGENT.md

A2A Components

AEF components exposed through the internal bus:

generation.generate
evaluation.evaluate
feedback.submit_text
feedback.submit_annotations
evolution.evolve

See docs/A2A_COMPONENTS.md.

Evolution Outputs

Evolution applies and versions runtime amendments per agent under:

prompts/evolution_profiles/<agent>/latest.json
prompts/evolution_profiles/<agent>/profile_<timestamp>.json

These profiles contain:

prompt addenda
tool policies
generator hints
agent hints
rubric updates

See docs/SELF_EVOLUTION.md.
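
For scripting around these files, a small sketch of locating an agent's profiles follows. The directory layout is taken from the paths above; the profile_paths helper is hypothetical, and sorting the versioned files by name assumes the timestamp in the file name sorts chronologically, which this README does not state.

```python
from pathlib import Path


def profile_paths(agent: str, root: Path = Path("prompts/evolution_profiles")) -> dict:
    """Locate the latest and versioned evolution profiles for one agent."""
    agent_dir = root / agent
    # Name-based sorting is an assumption about the timestamp format.
    versions = sorted(agent_dir.glob("profile_*.json")) if agent_dir.is_dir() else []
    return {"latest": agent_dir / "latest.json", "versions": versions}


paths = profile_paths("fleet_ccc")
print(paths["latest"])
print(len(paths["versions"]), "versioned profile(s) found")
```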

Web UI

AEF includes a Next.js web interface for managing agents, running benchmarks, reviewing trajectories, and tracking evolution.

Option A — Docker (recommended)

Run both the backend API and the web UI with a single command:

docker compose up -d

Frontend: http://localhost:3010
Backend API: http://localhost:8001

To rebuild after code changes:

docker compose up -d --build

To stop:

docker compose down

Your run database (aef_runs.db), configs, outputs, and annotated data are bind-mounted from the repo root and persist across restarts.

Option B — Local development

Start the backend:

uvicorn aef.api.main:app --reload --port 8001

Start the UI:

cd aef-ui
npm install
npm run dev

Open http://localhost:3010. See aef-ui/README.md for full details.

UI Pages

Page	Description
Dashboard	Score trends, model cost breakdown, runs-over-time — shows only COMPLETE runs
Agent Config	Register local Python agents or HTTP endpoints
Generate	Run trajectory generation with live progress, view past runs with expandable step-by-step conversations
Feedback	Review GENERATED trajectories with full multi-step detail, submit per-trajectory quality ratings
Evaluate	Select a golden trajectory run, re-execute and score against it with dimension radar charts
Evolve	Run self-improvement cycles on evaluated runs, manage memory deltas
Query	Browse runs, trajectories, evaluations and execute raw SQL

Pipeline Flow

Generate → (GENERATED) → Feedback → (COMPLETE) → Evaluate → (COMPLETE/REGRESSION) → Evolve

Each UI page filters to the appropriate pipeline stage so you only see runs relevant to that step.
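
One reading of that filtering, written as a sketch: each stage accepts runs in the status shown just before it in the flow. The mapping, the runs_for_stage helper, and the sample run records are assumptions based on the flow line and the page descriptions, not AEF's actual UI logic.

```python
# Statuses each stage accepts as input, read off the flow line above (assumed).
STAGE_INPUT_STATUSES = {
    "Feedback": {"GENERATED"},
    "Evaluate": {"COMPLETE"},
    "Evolve": {"COMPLETE", "REGRESSION"},
}


def runs_for_stage(runs: list[dict], stage: str) -> list[dict]:
    """Keep only the runs whose status matches what the stage expects."""
    wanted = STAGE_INPUT_STATUSES[stage]
    return [run for run in runs if run["status"] in wanted]


# Hypothetical run records; the real schema lives in the AEF run database.
runs = [
    {"run_id": "run_a", "status": "GENERATED"},
    {"run_id": "run_b", "status": "COMPLETE"},
    {"run_id": "run_c", "status": "REGRESSION"},
]

for stage in STAGE_INPUT_STATUSES:
    print(stage, [run["run_id"] for run in runs_for_stage(runs, stage)])
```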

Minimal Command Reference

# Generate
aef generate --config <config.json> --n 10

# Generate via hosted endpoint override
aef generate --config <config.json> --agent_endpoint <http://host:port/docs/> --n 10

# Direct A2A tool call
aef a2a --config <config.json> --component generation --tool generate --payload '{"n": 2}'

# Evaluate against a golden run, selected by run ID
aef evaluate --config <config.json> --golden <run_id>

# Evaluate via hosted endpoint override
aef evaluate --config <config.json> --agent_endpoint <http://host:port/docs/> --golden <run_id>

# Feedback
aef feedback --agent <agent_name> --text "..."

# Evolve
aef evolve --config <config.json> --n 10

# Evolve via hosted endpoint override
aef evolve --config <config.json> --agent_endpoint <http://host:port/docs/> --n 10

# Compare two eval runs
aef compare --run <run_a> --vs <run_b>

# Query runs / memory
aef query runs --agent <agent_name>
aef query memory --agent <agent_name> --all-memory
aef query memory --agent <agent_name> --history

Documentation

docs/AEF_WORKFLOW.md
docs/A2A_COMPONENTS.md
docs/USING_ANY_SUBAGENT.md
docs/SELF_EVOLUTION.md
docs/PUBLISHING.md - PyPI package publishing guide

Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup and guidelines.

License

AEF is released under the Apache License 2.0. See LICENSE for details.

Release history

0.1.5 (this version): May 6, 2026
0.1.2: Mar 20, 2026
0.1.1: Mar 20, 2026
