Conformance test suite for the Agent Accessibility Event Protocol (AAEP)
AAEP Conformance Test Suite
The official test suite for verifying AAEP implementations. This package, aaep-conformance, lets producers and subscribers prove they conform to a claimed AAEP Conformance Level by running a comprehensive battery of automated tests.
If you ship a product claiming AAEP support, you should run this suite and publish the report.
Installation
pip install aaep-conformance
Or from source:
git clone https://github.com/Ramseyxlil/aaep.git
cd aaep/conformance
pip install -e .
Requires Python 3.10 or newer.
Quick start
Verify a producer at Conformance Level 1
aaep-conformance producer --endpoint http://localhost:8080/agent --level 1
The suite connects to your producer endpoint, exercises it with a battery of test scenarios, captures the events it emits, and verifies each event against the AAEP specification.
Verify a subscriber at Conformance Level 2
aaep-conformance subscriber --connect tcp://localhost:9999 --level 2
The suite acts as a synthetic producer, emits a battery of test events to your subscriber, and verifies your subscriber's behavior (including reply handling).
Validate a single AAEP event against the schema
aaep-conformance validate event.json
Pure schema validation without exercising any endpoint.
What each level tests
Level 1 (Notification)
Tests verify the producer:
- Emits well-formed AAEP events
- Includes all required envelope fields
- Uses correct event types from the core vocabulary
- Properly terminates sessions with session.completed, session.errored, or session.cancelled
- Emits events in valid sequences (no tool.completed without preceding tool.invoked)
- Includes summary_normal on user-facing events
- Sets urgency: critical on session.errored events
- Correctly formats identifiers (event_id, session_id, tool_call_id, reply_token)
~40 test cases.
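The sequencing and termination rules above can be illustrated with a minimal sketch. It assumes each event is a dict with a "type" key and that tool events carry a "tool_call_id" key. The "type" key name and the check_sequence function are hypothetical, chosen for illustration; they are not the suite's actual API and may not match the spec's envelope field names.

```python
TERMINAL_TYPES = {"session.completed", "session.errored", "session.cancelled"}


def check_sequence(events):
    """Return a list of human-readable violations for one session's events.

    Assumption: each event is a dict with a "type" key, and tool events
    carry a "tool_call_id" key. These key names are illustrative only.
    """
    violations = []
    invoked = set()  # tool_call_ids that have already seen tool.invoked

    for index, event in enumerate(events):
        kind = event.get("type")
        if kind == "tool.invoked":
            invoked.add(event.get("tool_call_id"))
        elif kind == "tool.completed":
            if event.get("tool_call_id") not in invoked:
                violations.append(
                    f"event {index}: tool.completed without preceding tool.invoked"
                )

    # The last event of a session must be one of the terminal event types.
    if not events or events[-1].get("type") not in TERMINAL_TYPES:
        violations.append("session did not end with a terminal event")

    return violations
```

An empty return value means the sketch found no violations. The real Level 1 tests cover more than these two rules.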
Level 2 (Interactive)
Everything in Level 1, plus tests verifying:
- Producer emits agent.awaiting.confirmation before irreversible actions
- Producer blocks until reply arrives or timeout elapses
- default_decision follows the safety rule (irreversible+high → reject)
- Producer correctly applies default_decision on timeout
- Subscriber sends valid confirmation.reply messages
- Subscriber sends valid clarification.reply messages
- Reply tokens are single-use
- Subscriber and producer handle multiple concurrent confirmations correctly
~80 test cases.
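Two of the Level 2 rules lend themselves to a short sketch: the safety rule for default_decision and single-use reply tokens. The parameter names reversibility and risk, the string values, and the ReplyTokenRegistry class are assumptions made for illustration; the spec defines the real field names and values.

```python
def violates_safety_rule(reversibility, risk, default_decision):
    """True if an irreversible, high-risk action defaults to anything but reject.

    Assumption: the parameter names and the string values "irreversible",
    "high", and "reject" are illustrative; consult the spec for real names.
    """
    if reversibility == "irreversible" and risk == "high":
        return default_decision != "reject"
    return False  # the rule as stated constrains only irreversible+high


class ReplyTokenRegistry:
    """Tracks issued reply tokens so each one can be redeemed only once."""

    def __init__(self):
        self._outstanding = set()

    def issue(self, token):
        """Record a token as issued and not yet used."""
        self._outstanding.add(token)

    def redeem(self, token):
        """Return True on first use of an issued token, False otherwise."""
        if token in self._outstanding:
            self._outstanding.remove(token)
            return True
        return False
```

A second redeem call with the same token returns False, which is the behavior a single-use test would look for.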
Level 3 (Negotiated)
Everything in Levels 1 and 2, plus tests verifying:
- Producer responds to subscription.request with subscription.accepted or subscription.rejected
- honored_capabilities is not more permissive than the request
- Producer honors max_events_per_second (backpressure)
- Producer respects event_filters (include/exclude patterns)
- Producer honors coalesce_boundaries
- Critical events bypass rate limits and filters
- Signed manifests (when required) validate cryptographically
- Subscription renegotiation works
- Multiple concurrent subscriptions are handled correctly
~120 test cases.
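The interaction between event_filters and the critical-event bypass can be sketched as follows. This assumes events are dicts with "type" and "urgency" keys and that filter patterns are glob-style strings such as "tool.*". The pattern syntax, the precedence of exclude over include, and the should_deliver function are all assumptions for illustration; the spec is authoritative.

```python
from fnmatch import fnmatchcase


def should_deliver(event, include, exclude):
    """Decide whether an event passes the subscription's filters.

    Assumptions: events are dicts with "type" and "urgency" keys, and
    filters are glob-style patterns such as "tool.*". The real
    event_filters syntax is defined by the spec, not by this sketch.
    """
    # Critical events bypass filters entirely.
    if event.get("urgency") == "critical":
        return True

    kind = event.get("type", "")

    # In this sketch, an exclude match wins over any include match.
    if any(fnmatchcase(kind, pattern) for pattern in exclude):
        return False

    # An empty include list is treated here as "include everything".
    return not include or any(fnmatchcase(kind, pattern) for pattern in include)
```

A test for the bypass would then subscribe with an exclude pattern that matches session.errored and assert that the critical event still arrives.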
Reports
The suite produces two output formats:
JSON report (conformance-report.json)
Machine-readable, suitable for CI pipelines:
{
  "aaep_version": "1.0.0",
  "conformance_level": 2,
  "endpoint": "http://localhost:8080/agent",
  "started_at": "2026-06-30T14:22:00Z",
  "completed_at": "2026-06-30T14:24:18Z",
  "result": "PASS",
  "pass_rate": 0.98,
  "tests_run": 120,
  "tests_passed": 118,
  "tests_failed": 2,
  "tests_skipped": 0,
  "failures": [
    {
      "test_id": "L2-CONF-007",
      "description": "Producer blocks on confirmation timeout with default_decision=reject",
      "expected": "default_decision applied at 300.0s",
      "actual": "default_decision applied at 298.7s (1.3s early)",
      "severity": "warning"
    }
  ]
}
HTML report (conformance-report.html)
Human-readable, suitable for product documentation. Shows pass/fail by category with drill-down to individual test cases.
Publishing the report
Once your implementation passes, publish the report alongside your product's accessibility documentation. Recommended format:
Product Name X.Y.Z supports AAEP v1.0.0 at Conformance Level 2. Verified 2026-09-15 against aaep-conformance 1.0.0. Full report: [link]
Running in CI
Add a workflow file like this to your .github/workflows/ directory:
name: AAEP Conformance
on: [push, pull_request]
jobs:
  conformance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install aaep-conformance
      - run: |
          # Start your producer in background
          python my_agent.py --serve --port 8080 &
          sleep 5
          aaep-conformance producer --endpoint http://localhost:8080/agent --level 2 --fail-on-warning
The --fail-on-warning flag causes the suite to exit nonzero on any warning, blocking PR merges that introduce regressions.
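The fixed sleep 5 in the workflow above can race with a slow producer startup. One option is to poll the port until it accepts connections before starting the suite. The helper below is a minimal sketch using only the standard library; wait_for_port is a hypothetical name and is not part of aaep-conformance.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.25):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connect timed out); wait and retry.
            time.sleep(interval)
    return False
```

In a CI step, a small script could call wait_for_port("localhost", 8080) and exit nonzero when it returns False, so the job fails quickly if the producer never comes up. An open port only shows that the process is listening, not that it is ready to serve AAEP traffic.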
Interpreting failures
| Failure type | Meaning | Action |
|---|---|---|
| error | Hard spec violation | Fix before claiming conformance |
| warning | Spec ambiguity or recommended-not-required violation | Investigate; may indicate a real bug |
| info | Behavior worth noting but not a violation | No action required |
A run with zero error failures and pass_rate ≥ 0.95 conforms at the claimed level.
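The rule above can be applied to the JSON report in a few lines. This sketch relies only on the failures, severity, and pass_rate fields shown in the sample report; the conforms and load_report functions are hypothetical helpers, not part of the package.

```python
import json


def conforms(report):
    """Apply the stated rule: zero error-severity failures and pass_rate >= 0.95."""
    errors = [
        failure
        for failure in report.get("failures", [])
        if failure.get("severity") == "error"
    ]
    return not errors and report.get("pass_rate", 0.0) >= 0.95


def load_report(path="conformance-report.json"):
    """Read a JSON report from disk and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, conforms(load_report()) returns True for the sample report shown earlier, since its only listed failure has severity warning and its pass_rate is 0.98.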
Architecture
conformance/
├── README.md                 ← this file
├── pyproject.toml            ← package config
├── setup.py                  ← legacy entry
├── aaep_conformance/
│   ├── __init__.py
│   ├── cli.py                ← main CLI entry point
│   ├── runner.py             ← test orchestration
│   ├── reporter.py           ← report generation
│   ├── level_1.py            ← Level 1 test suite
│   ├── level_2.py            ← Level 2 test suite
│   ├── level_3.py            ← Level 3 test suite
│   ├── checks/
│   │   ├── __init__.py
│   │   ├── envelope.py       ← envelope structure checks
│   │   ├── lifecycle.py      ← session lifecycle checks
│   │   ├── tools.py          ← tool invocation checks
│   │   ├── streaming.py      ← output streaming checks
│   │   ├── confirmation.py   ← confirmation flow checks
│   │   ├── handshake.py      ← subscription handshake checks
│   │   └── safety.py         ← safety rule enforcement
│   └── fixtures/
│       ├── valid/            ← known-valid AAEP events
│       └── invalid/          ← known-invalid AAEP events
└── tests/                    ← tests for the suite itself
The suite is itself tested (it would be ironic if the conformance suite had bugs). Run pytest in this directory to verify the suite's own correctness.
Contributing new test cases
If you find AAEP behavior that ought to be tested but isn't, contribute a test case:
- Identify what level the test belongs to (1, 2, or 3)
- Open aaep_conformance/level_N.py and add your test function
- Use the helpers in aaep_conformance/checks/
- Add a fixture in aaep_conformance/fixtures/ if needed
- Update this README with the new test count
- Submit a PR
Test cases are reviewed by the AAEP maintainers and other contributors. The bar: a test must verify a specific normative requirement in the spec and must be precise enough that "pass" and "fail" are unambiguous.
See also
- ../spec/09-conformance.md: normative conformance criteria
- ../guides/IMPLEMENTERS_GUIDE.md: what implementations should do
- ../guides/SUBSCRIBERS_GUIDE.md: what subscribers should do
- ../tools/aaep-validate/: lightweight standalone validator