Agent Eval Contract

Pydantic contracts and JSON Schemas for portable agent evaluation records.

Use this package when you are experimenting with agents, harnesses, CI checks, or benchmark runners and need a stable record shape for tasks, runs, scores, failures, and normalized external results. It does not run evaluations, call model providers, store dashboards, or orchestrate agents. It gives those tools a shared contract.

Install

pip install agent-eval-contract

For local development from this repo:

uv sync --dev

Validate A Record

from agent_eval_contract import validate_eval_run

# Validate a plain dict against the run contract; the result is a typed model instance.
run = validate_eval_run(
    {
        "run_id": "run-login-flow-001",
        "task_id": "task-login-flow-001",
        "harness": "pytest",
        "model": "gpt-5",
        "mode": "autonomous",
        "context_profile": "repo_only",
        "final_status": "success",
        "checks": ["pytest tests/test_auth_redirect.py -q"],
    }
)

# Dump the validated model back to JSON-compatible Python types.
print(run.model_dump(mode="json"))

Validation returns typed Pydantic model instances. Invalid records raise pydantic.ValidationError with structured field errors.
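
A minimal sketch of what handling those structured errors could look like. The package's own models are not reproduced here, so the example uses a hypothetical stand-in model, EvalRunSketch, built directly with Pydantic v2. Its field names and allowed values are copied from the record above and from the vocabulary in this README, but which fields the real model requires is an assumption, as is the field_errors helper.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class EvalRunSketch(BaseModel):
    """Hypothetical stand-in for the package's run model (not its real definition)."""

    run_id: str
    task_id: str
    harness: str
    model: str
    mode: Literal["interactive", "autonomous", "shadow", "replay", "benchmark"]
    context_profile: Literal[
        "repo_only", "provided_context", "clean_room", "tool_augmented", "full_workspace"
    ]
    final_status: Literal["success", "partial", "failed", "abandoned", "error"]
    checks: list[str] = []


def field_errors(record: dict) -> list[tuple[str, str]]:
    """Return (field path, error type) pairs, or an empty list if the record is valid."""
    try:
        EvalRunSketch.model_validate(record)
    except ValidationError as exc:
        # exc.errors() yields one dict per problem with "loc", "msg", and "type" keys.
        return [
            (".".join(str(part) for part in err["loc"]), err["type"])
            for err in exc.errors()
        ]
    return []


bad_record = {
    "run_id": "run-login-flow-001",
    "task_id": "task-login-flow-001",
    "harness": "pytest",
    "model": "gpt-5",
    "mode": "autonomous",
    "context_profile": "repo_only",
    "final_status": "passed",  # not in the final_status vocabulary
}

for path, kind in field_errors(bad_record):
    print(f"{path}: {kind}")
```

With the real package, the same try/except pattern would wrap validate_eval_run instead of the stand-in model.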

CLI

agent-eval-contract validate --kind run --file examples/eval_run.json
agent-eval-contract schemas --output-dir /tmp/agent-eval-contract-schemas
agent-eval-contract fixtures --output-dir /tmp/agent-eval-contract-fixtures
agent-eval-contract normalize --harness terminal-bench --file examples/terminal_bench_result.json --task-id task-login-flow-001 --model gpt-5
agent-eval-contract normalize --harness swe-bench --file examples/swe_bench_result.json

The legacy agent-eval-contract-fixtures command still writes fixture bundles, but it is kept for one more release only; new scripts should call agent-eval-contract fixtures instead.
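
In a CI script, the validate subcommand could be wrapped from Python along these lines. Only the command name and the --kind and --file flags come from the examples above. The helper names are hypothetical, and treating a nonzero exit code as "invalid record" is an assumption about the CLI's behavior that has not been checked.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def build_validate_command(record_path: Path, kind: str = "run") -> list[str]:
    """Assemble the argv for `agent-eval-contract validate`."""
    return ["agent-eval-contract", "validate", "--kind", kind, "--file", str(record_path)]


def validate_file(record_path: Path, kind: str = "run") -> Optional[bool]:
    """Run the CLI if it is installed; return None when it is not on PATH."""
    command = build_validate_command(record_path, kind)
    if shutil.which(command[0]) is None:
        return None
    # Assumption: the CLI exits nonzero when validation fails.
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return completed.returncode == 0


print(build_validate_command(Path("examples/eval_run.json")))
```

Returning None when the executable is missing keeps "tool not installed" distinct from "record invalid", which tends to matter in CI logs.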

What It Provides

Pydantic models for eval tasks, runs, scores, failures, external results, normalized runs, and fixture manifests
runtime validators that return typed model instances
JSON Schema export for all public models
bundled sample records and markdown templates
Terminal-Bench and SWE-bench oriented normalization helpers
a small CLI for validation, schema export, fixture generation, and normalization
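
To illustrate the JSON Schema export item above: Pydantic v2 models expose model_json_schema(), and a schema exporter could be built on it roughly as follows. The EvalScoreSketch model, its fields, the export_schemas helper, and the file naming are all hypothetical; the package's real models and export function may differ.

```python
import json
import tempfile
from pathlib import Path

from pydantic import BaseModel, Field


class EvalScoreSketch(BaseModel):
    """Hypothetical score record; not the package's real model."""

    run_id: str
    metric: str
    value: float = Field(ge=0.0, le=1.0)


def export_schemas(models: list[type[BaseModel]], output_dir: Path) -> list[Path]:
    """Write one <ModelName>.schema.json file per model and return the paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for model in models:
        path = output_dir / f"{model.__name__}.schema.json"
        path.write_text(json.dumps(model.model_json_schema(), indent=2), encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    (schema_path,) = export_schemas([EvalScoreSketch], Path(tmp))
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    print(sorted(schema["properties"]))
    print(schema["required"])
```

Exported schemas let non-Python tools check records against the same contract without importing Pydantic.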

Contract Vocabulary

The public core uses generic vocabulary only. Project-specific concepts should live in metadata or a separate adapter package.

context_profile: repo_only, provided_context, clean_room, tool_augmented, full_workspace
source: manual, ci, benchmark, production_trace, synthetic
mode: interactive, autonomous, shadow, replay, benchmark
final_status: success, partial, failed, abandoned, error

See docs/contract.md, docs/field-reference.md, and docs/adapters.md for the model contract and adapter guidance.
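
As a rough illustration of keeping project-specific concepts out of the core vocabulary, the sketch below checks the four enumerated fields against the values listed above and leaves everything else to a free-form metadata mapping. The metadata key, the VOCABULARY table, and vocabulary_problems are assumptions made for this example, not names taken from the package; in practice the package's validators perform this checking.

```python
# Allowed values copied from the vocabulary listed in this README.
VOCABULARY = {
    "context_profile": {
        "repo_only", "provided_context", "clean_room", "tool_augmented", "full_workspace",
    },
    "source": {"manual", "ci", "benchmark", "production_trace", "synthetic"},
    "mode": {"interactive", "autonomous", "shadow", "replay", "benchmark"},
    "final_status": {"success", "partial", "failed", "abandoned", "error"},
}


def vocabulary_problems(record: dict) -> list[str]:
    """Return a message for each vocabulary field whose value is not allowed."""
    problems = []
    for field, allowed in VOCABULARY.items():
        if field in record and record[field] not in allowed:
            problems.append(f"{field}: {record[field]!r} is not one of {sorted(allowed)}")
    return problems


record = {
    "mode": "autonomous",
    "context_profile": "repo_only",
    "final_status": "success",
    # Project-specific detail stays in metadata instead of becoming a new core value.
    "metadata": {"team": "auth", "pairing_style": "driver_navigator"},
}

print(vocabulary_problems(record))
print(vocabulary_problems({**record, "mode": "pair_programming"}))
```

The second call shows the alternative being avoided: a project-specific mode value that other tools sharing the contract would not recognize.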

Development

uv run ruff check agent_eval_contract tests
uv run ruff format --check agent_eval_contract tests
uv run basedpyright agent_eval_contract tests
uv run pytest -q
uv build --out-dir /tmp/agent-eval-contract-dist

Project details

This version

0.2.0

Jul 4, 2026

Download files

Source Distribution

agent_eval_contract-0.2.0.tar.gz (21.4 kB)

Uploaded Jul 4, 2026 Source

Built Distribution

agent_eval_contract-0.2.0-py3-none-any.whl (20.4 kB)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file agent_eval_contract-0.2.0.tar.gz.

File metadata

Download URL: agent_eval_contract-0.2.0.tar.gz
Upload date: Jul 4, 2026
Size: 21.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.12 (uv publish, macOS)

File hashes

Hashes for agent_eval_contract-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`5b345a6b67d08e8afa86e9bb2a27ace9d65446db72dcb7589482db163725f5f3`
MD5	`3fc212fb53663ab9165704aa482549e7`
BLAKE2b-256	`235e63d9c428598fe73c5d445933378ada8d6aae76a69e0d89211fc5693cd22c`

File details

Details for the file agent_eval_contract-0.2.0-py3-none-any.whl.

File metadata

Download URL: agent_eval_contract-0.2.0-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 20.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.12 (uv publish, macOS)

File hashes

Hashes for agent_eval_contract-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9a6e2994945b920985e566f3627184c8a584305027b3675f8e65d038afc86cbd`
MD5	`933a1ef8612dcadefa47308fd8d227ed`
BLAKE2b-256	`6b03e502f3561f1e991fd1d5b4ca4b9e916d0e18335719a048b43814cb92a628`
