Agent Eval Contract
Pydantic contracts and JSON Schemas for portable agent evaluation records.
Use this package when you are experimenting with agents, harnesses, CI checks, or benchmark runners and need a stable record shape for tasks, runs, scores, failures, and normalized external results. It does not run evaluations, call model providers, store dashboards, or orchestrate agents. It gives those tools a shared contract.
Install
pip install agent-eval-contract
For local development from this repo:
uv sync --dev
Validate A Record
from agent_eval_contract import validate_eval_run

run = validate_eval_run(
    {
        "run_id": "run-login-flow-001",
        "task_id": "task-login-flow-001",
        "harness": "pytest",
        "model": "gpt-5",
        "mode": "autonomous",
        "context_profile": "repo_only",
        "final_status": "success",
        "checks": ["pytest tests/test_auth_redirect.py -q"],
    }
)
print(run.model_dump(mode="json"))
Validation returns typed Pydantic model instances. Invalid records raise pydantic.ValidationError with structured field errors.
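The structured field errors can be read from `ValidationError.errors()`. The sketch below uses a hypothetical stand-in model, `EvalRunSketch`, that copies a few fields from the example record so it runs without the package installed. The package's real run model may define different fields and constraints. With the package itself, the same `try`/`except` would wrap the call to `validate_eval_run(...)`.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class EvalRunSketch(BaseModel):
    """Hypothetical stand-in for the package's run model (not its real definition)."""

    run_id: str
    task_id: str
    final_status: Literal["success", "partial", "failed", "abandoned", "error"]


def field_errors(record: dict) -> list[tuple[str, str]]:
    """Return (field path, message) pairs, or an empty list if the record is valid."""
    try:
        EvalRunSketch.model_validate(record)
    except ValidationError as exc:
        # Each entry in exc.errors() is a dict with "loc", "msg", and "type" keys.
        return [
            (".".join(str(part) for part in err["loc"]), err["msg"])
            for err in exc.errors()
        ]
    return []


# "task_id" is missing and "passed" is not an allowed final_status value.
bad_record = {"run_id": "run-login-flow-001", "final_status": "passed"}
for path, message in field_errors(bad_record):
    print(f"{path}: {message}")
```

Returning path and message pairs keeps the calling code independent of Pydantic's error dictionaries, which may help when the errors are surfaced in CI logs.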
CLI
agent-eval-contract validate --kind run --file examples/eval_run.json
agent-eval-contract schemas --output-dir /tmp/agent-eval-contract-schemas
agent-eval-contract fixtures --output-dir /tmp/agent-eval-contract-fixtures
agent-eval-contract normalize --harness terminal-bench --file examples/terminal_bench_result.json --task-id task-login-flow-001 --model gpt-5
agent-eval-contract normalize --harness swe-bench --file examples/swe_bench_result.json
The legacy agent-eval-contract-fixtures command still writes fixture bundles for one release.
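For CI checks it can be convenient to call the `validate` command from Python. The following is a minimal sketch: the helper names `validate_command` and `run_validation` are hypothetical, the flags are the ones shown above, and it assumes the CLI exits with a non-zero status when validation fails, which the text above does not state.

```python
import subprocess


def validate_command(kind: str, path: str) -> list[str]:
    """Build the argv for `agent-eval-contract validate`, mirroring the command above."""
    return ["agent-eval-contract", "validate", "--kind", kind, "--file", path]


def run_validation(kind: str, path: str) -> bool:
    """Run the CLI and report success.

    Assumption: a non-zero exit code signals a failed validation.
    Requires the agent-eval-contract CLI to be on PATH.
    """
    result = subprocess.run(
        validate_command(kind, path), capture_output=True, text=True
    )
    if result.returncode != 0:
        print(result.stderr or result.stdout)
    return result.returncode == 0


# Show the command that would be executed, without running it.
print(" ".join(validate_command("run", "examples/eval_run.json")))
```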
What It Provides
- Pydantic models for eval tasks, runs, scores, failures, external results, normalized runs, and fixture manifests
- runtime validators that return typed model instances
- JSON Schema export for all public models
- bundled sample records and markdown templates
- normalization helpers for Terminal-Bench and SWE-bench results
- a small CLI for validation, schema export, fixture generation, and normalization
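Exported JSON Schemas can be inspected with the standard library alone. The sketch below assumes the `schemas` command writes one `.json` file per model into the output directory; the file layout is an assumption, and `summarize_schemas` is a hypothetical helper, not part of the package. It relies only on the standard JSON Schema `required` keyword.

```python
import json
from pathlib import Path


def summarize_schemas(schema_dir: str) -> dict[str, list[str]]:
    """Map each *.json file in schema_dir to the sorted list of its required fields."""
    summary: dict[str, list[str]] = {}
    for path in sorted(Path(schema_dir).glob("*.json")):
        schema = json.loads(path.read_text(encoding="utf-8"))
        # Schemas without a "required" keyword have no mandatory fields.
        summary[path.name] = sorted(schema.get("required", []))
    return summary
```

For example, `summarize_schemas("/tmp/agent-eval-contract-schemas")` could be called after exporting schemas to that directory.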
Contract Vocabulary
The public core uses generic vocabulary only. Project-specific concepts should live in metadata or a separate adapter package.
- context_profile: repo_only, provided_context, clean_room, tool_augmented, full_workspace
- source: manual, ci, benchmark, production_trace, synthetic
- mode: interactive, autonomous, shadow, replay, benchmark
- final_status: success, partial, failed, abandoned, error
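As an illustration, the vocabulary above can be written down as plain data and used for a quick pre-check before full validation. The `VOCABULARY` mapping and `unknown_values` helper are hypothetical and are not exported by the package, whose models remain the authoritative check.

```python
# Values copied from the vocabulary above; this mapping is illustrative only.
VOCABULARY = {
    "context_profile": (
        "repo_only", "provided_context", "clean_room", "tool_augmented", "full_workspace",
    ),
    "source": ("manual", "ci", "benchmark", "production_trace", "synthetic"),
    "mode": ("interactive", "autonomous", "shadow", "replay", "benchmark"),
    "final_status": ("success", "partial", "failed", "abandoned", "error"),
}


def unknown_values(record: dict) -> dict:
    """Return the vocabulary fields in `record` whose values are not allowed."""
    return {
        field: value
        for field, value in record.items()
        if field in VOCABULARY and value not in VOCABULARY[field]
    }


print(unknown_values({"mode": "autonomous", "final_status": "passed"}))
# → {'final_status': 'passed'}
```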
See docs/contract.md, docs/field-reference.md, and docs/adapters.md for the model contract and adapter guidance.
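One way an adapter might keep project-specific concepts out of the core fields is sketched below. It assumes that records accept a `metadata` mapping, as the guidance above suggests, and it takes the core field names from the example record earlier in this README. `CORE_FIELDS` and `to_core_record` are hypothetical names; docs/adapters.md remains the reference for the actual adapter guidance.

```python
# Field names taken from the example run record; the real model may define more.
CORE_FIELDS = (
    "run_id", "task_id", "harness", "model",
    "mode", "context_profile", "final_status", "checks",
)


def to_core_record(project_result: dict) -> dict:
    """Keep core fields at the top level and move everything else under "metadata"."""
    record = {k: v for k, v in project_result.items() if k in CORE_FIELDS}
    extras = {k: v for k, v in project_result.items() if k not in CORE_FIELDS}
    if extras:
        # Assumption: the contract accepts a free-form "metadata" mapping.
        record["metadata"] = extras
    return record


print(to_core_record({"run_id": "run-login-flow-001", "team": "auth"}))
# → {'run_id': 'run-login-flow-001', 'metadata': {'team': 'auth'}}
```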
Development
uv run ruff check agent_eval_contract tests
uv run ruff format --check agent_eval_contract tests
uv run basedpyright agent_eval_contract tests
uv run pytest -q
uv build --out-dir /tmp/agent-eval-contract-dist
File details
Details for the file agent_eval_contract-0.2.0.tar.gz.
File metadata
- Download URL: agent_eval_contract-0.2.0.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.12 {"installer":{"name":"uv","version":"0.10.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b345a6b67d08e8afa86e9bb2a27ace9d65446db72dcb7589482db163725f5f3 |
| MD5 | 3fc212fb53663ab9165704aa482549e7 |
| BLAKE2b-256 | 235e63d9c428598fe73c5d445933378ada8d6aae76a69e0d89211fc5693cd22c |
File details
Details for the file agent_eval_contract-0.2.0-py3-none-any.whl.
File metadata
- Download URL: agent_eval_contract-0.2.0-py3-none-any.whl
- Upload date:
- Size: 20.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.12 {"installer":{"name":"uv","version":"0.10.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a6e2994945b920985e566f3627184c8a584305027b3675f8e65d038afc86cbd |
| MD5 | 933a1ef8612dcadefa47308fd8d227ed |
| BLAKE2b-256 | 6b03e502f3561f1e991fd1d5b4ca4b9e916d0e18335719a048b43814cb92a628 |