Conformance testing CLI for the ARP Standard (v1).
ARP Conformance Toolkit (arp-conformance)
arp-conformance is the official conformance checker for ARP Standard (v1) HTTP services including Runtime, Tool Registry, and Daemon.
It runs black-box HTTP checks against a base URL and validates:
- Required routes exist and are reachable
- Success and error responses match the ARP envelopes, including ErrorEnvelope
- Response bodies are valid against the normative JSON Schemas from the ARP Standard snapshot embedded in this package
What it does not validate:
- “Agent quality” / correctness of model outputs
- Performance, scalability, security posture, or multi-tenancy
- Internal implementation details, since it is wire-level only
This package is SDK-independent: it does not depend on generated SDK packages like arp-standard-model, arp-standard-client, or arp-standard-server. It validates directly from the spec snapshot.
[!IMPORTANT] Version pinning
This toolkit embeds a spec snapshot. Pin arp-conformance==X.Y.Z to validate services built against the same ARP spec / SDK version X.Y.Z. View the embedded snapshot:
arp-conformance --version
python -c "import arp_conformance; print(arp_conformance.SPEC_REF)"
Install
python3 -m pip install arp-conformance
Quick start
Smoke test
The most basic level of testing, and the safest, because it only sends GET requests.
arp-conformance check runtime --url http://localhost:8081 --tier smoke
Surface conformance
The second-safest tier: it checks routes and response shapes without creating resources.
arp-conformance check runtime --url http://localhost:8081 --tier surface
arp-conformance check tool-registry --url http://localhost:8082 --tier surface
arp-conformance check daemon --url http://localhost:8083 --tier surface
Run conformance on multiple services
arp-conformance check all \
--runtime-url http://localhost:8081 \
--tool-registry-url http://localhost:8082 \
--daemon-url http://localhost:8083 \
--tier surface
Tiers at a glance
| Tier | What it tests | Creates state? | Safe for prod? | Typical use |
|---|---|---|---|---|
| smoke | Service is reachable and speaking ARP (/v1/health, /v1/version) | No | Yes | Fast local sanity check; PR gating |
| surface | Required routes exist and success/error envelopes are schema-valid | No | Usually | Early implementation; contract regression |
| core | Minimal success-path workflow works end to end | Yes | No | Staging; nightly CI |
| deep | Optional endpoints and stronger behavioral guarantees | Yes | No | Pre-release / “full” validation |
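For scripts that pick a tier, the table can be encoded as plain data. This is an illustrative sketch only: the Tier class, the TIERS dict, and the tiers_without_side_effects helper are hypothetical names, not part of the arp_conformance API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of the tier table (illustrative data, not the tool's internals)."""

    name: str
    creates_state: bool
    safe_for_prod: str  # "Yes", "Usually", or "No", copied from the table


# Ordered from least to most invasive, matching the table rows.
TIERS = {
    tier.name: tier
    for tier in (
        Tier("smoke", creates_state=False, safe_for_prod="Yes"),
        Tier("surface", creates_state=False, safe_for_prod="Usually"),
        Tier("core", creates_state=True, safe_for_prod="No"),
        Tier("deep", creates_state=True, safe_for_prod="No"),
    )
}


def tiers_without_side_effects() -> list[str]:
    """Return the tiers the table marks as not creating state."""
    return [name for name, tier in TIERS.items() if not tier.creates_state]
```

A CI script could, for example, add --allow-mutations only when TIERS[tier].creates_state is true.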
Conformance definition:
A service “passes ARP conformance (tier X)” if arp-conformance produces no FAIL results at that tier (and, optionally, no WARN/SKIP when using --strict).
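The pass rule reduces to a small predicate over per-check statuses. This sketch restates the definition above; it is not arp-conformance's code, and passes_conformance is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def passes_conformance(statuses: Iterable[str], strict: bool = False) -> bool:
    """Apply the conformance definition to per-check statuses.

    statuses: values such as "PASS", "FAIL", "WARN", "SKIP".
    strict: mirrors --strict, where WARN and SKIP also count as failures.
    """
    counts = Counter(statuses)  # missing keys count as 0
    if counts["FAIL"] > 0:
        return False
    if strict and (counts["WARN"] > 0 or counts["SKIP"] > 0):
        return False
    return True
```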
Safety (before you run core / deep)
core and deep require --allow-mutations and will send real state-changing requests, depending on service type:
- Runtime: creates a run (POST /v1/runs) and polls status/result.
- Tool Registry: invokes a tool (POST /v1/tool-invocations).
- Daemon: may create a runtime profile (if none exists), creates an instance, submits a run, polls status/result, and cleans up by default.
Use staging/dev URLs unless you are confident about side effects. If you want to keep resources for debugging, use --no-cleanup.
Core conformance (creates real state; staging/dev recommended)
arp-conformance check runtime --url http://localhost:8081 --tier core --allow-mutations
arp-conformance check tool-registry --url http://localhost:8082 --tier core --allow-mutations
arp-conformance check daemon --url http://localhost:8083 --tier core --allow-mutations
Output and reports
Example output (text)
service=runtime tier=surface spec=spec/v1@v0.2.6
counts={'PASS': 10, 'FAIL': 0, 'WARN': 0, 'SKIP': 0} ok=True
- PASS smoke.health: OK
- PASS smoke.version: OK
Export JSON / JUnit
arp-conformance check runtime --url http://localhost:8081 --tier surface --format json --out arp-conformance.json
arp-conformance check runtime --url http://localhost:8081 --tier core --allow-mutations --format junit --out arp-conformance.xml
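The JUnit report can be summarized with the standard library, for example to post counts on a pull request. This sketch assumes the common JUnit XML layout (testcase elements with failure, error, or skipped children); how arp-conformance maps WARN into JUnit is not described here, so treat the counts as approximate. summarize_junit is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Count test cases by outcome in a JUnit XML document."""
    root = ET.fromstring(xml_text)
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            counts["failed"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    return counts
```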
CI gating
- By default, the CLI exits non-zero when there is at least one FAIL.
- Use --strict to also fail on WARN and SKIP (useful when you want a hard guarantee).
- In GitHub Actions, a non-zero exit code fails the step/job.
Compatibility / pinning
Rule of thumb: pin arp-conformance==X.Y.Z to validate services targeting the ARP spec / SDK release X.Y.Z.
pipx install "arp-conformance==0.2.6"
arp-conformance --version
python -c "import arp_conformance; print(arp_conformance.SPEC_REF)"
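To fail fast when the installed toolkit does not match the release you meant to pin, a script can compare SPEC_REF against the expected X.Y.Z. This sketch assumes SPEC_REF always has the spec/v1@v0.2.6 shape shown in this README; parse_spec_ref and matches_pin are hypothetical helpers.

```python
import re


def parse_spec_ref(spec_ref: str) -> tuple[str, str]:
    """Split a SPEC_REF string such as "spec/v1@v0.2.6" into (spec, release)."""
    match = re.fullmatch(r"spec/(v\d+)@v(\d+\.\d+\.\d+)", spec_ref)
    if match is None:
        raise ValueError(f"unrecognized SPEC_REF: {spec_ref!r}")
    return match.group(1), match.group(2)


def matches_pin(spec_ref: str, pinned_version: str) -> bool:
    """True if the embedded spec release equals the X.Y.Z you pinned."""
    _, release = parse_spec_ref(spec_ref)
    return release == pinned_version
```

For example, after import arp_conformance, a CI step could assert matches_pin(arp_conformance.SPEC_REF, "0.2.6").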
Interpreting failures (fast debug loop)
- 401/403: pass auth headers via --headers or --headers-file.
- Timeouts / polling failures: increase --timeout and --poll-timeout, and/or adjust --poll-interval.
- Schema mismatch: inspect the response body (use --format json --out ...) and confirm you pinned the toolkit version you intended (arp_conformance.SPEC_REF).
- WARN/SKIP: decide whether you want --strict in your environment.
Recommended usage
Local development:
- Use smoke first to confirm the service is reachable.
- Use surface during early implementation to validate routes and response envelopes, even before end-to-end behavior works.
- Use core when the service is fully wired and you can safely allow test mutations (--allow-mutations).
CI:
- On pull requests: run smoke or surface against a locally started service container.
- On nightly / integration pipelines: run core (and optionally deep) with --allow-mutations.
Authentication and headers
If your service requires auth, pass headers:
arp-conformance check runtime \
--url https://example.com \
--tier surface \
--headers "Authorization=Bearer ..."
For CI, prefer a headers file:
cat > headers.env <<'EOF'
Authorization=Bearer ...
EOF
arp-conformance check runtime --url https://example.com --tier surface --headers-file headers.env
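If a debugging script needs the same headers (for example, to replay a failing request by hand), the KEY=VALUE file can be parsed as below. This is a sketch under stated assumptions: it splits on the first "=" so token values ending in "=" survive, and it skips blank lines and lines starting with "#", which the CLI itself may or may not do. parse_headers_file and load_headers_file are hypothetical names.

```python
from pathlib import Path


def parse_headers_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into an HTTP header dict."""
    headers: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and "#" comments are ignored.
        if not line or line.startswith("#"):
            continue
        # partition() splits on the first "=" only, keeping "abc==" intact.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {raw!r}")
        headers[key.strip()] = value.strip()
    return headers


def load_headers_file(path: str) -> dict[str, str]:
    """Read and parse a headers file such as headers.env."""
    return parse_headers_file(Path(path).read_text(encoding="utf-8"))
```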
CI recipes (GitHub Actions)
This repo provides a composite action that installs arp-conformance from PyPI and runs it:
AgentRuntimeProtocol/ARP_Standard/.github/actions/arp-conformance
By default, when you reference the action as .../arp-conformance@vX.Y.Z, it installs arp-conformance==X.Y.Z. You can override with package_version.
Surface gate on PR (no resource creation)
name: arp-conformance
on: [pull_request]
jobs:
  surface:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Start your service under test (docker compose, process, etc.) before running conformance.
      - uses: AgentRuntimeProtocol/ARP_Standard/.github/actions/arp-conformance@v0.2.6
        with:
          service: runtime
          url: http://localhost:8081
          tier: surface
          report_format: json
          report_path: arp-conformance.json
Core gate on nightly (mutations enabled; JUnit output)
name: arp-conformance-nightly
on:
  schedule:
    - cron: "0 3 * * *"
jobs:
  core:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Start your service under test (docker compose, process, etc.) before running conformance.
      - uses: AgentRuntimeProtocol/ARP_Standard/.github/actions/arp-conformance@v0.2.6
        with:
          service: runtime
          url: http://localhost:8081
          tier: core
          allow_mutations: "true"
          report_format: junit
          report_path: arp-conformance.xml
          upload_artifact: "true"
Tiers (detailed definitions)
All tiers produce per-check results with PASS, FAIL, WARN, or SKIP.
- PASS: required behavior is present and schema-valid.
- FAIL: required behavior is missing or schema-invalid.
- WARN: behavior is present but not ideal for conformance (for example: 401/403 without headers, or a workflow could not be completed due to missing prerequisites).
- SKIP: a check is not applicable (for example: optional endpoints in the deep tier when the service does not implement them).
You can treat WARN and SKIP as failures using --strict.
Tier smoke: connectivity + universal endpoints
Purpose: validate the service is reachable and speaking ARP.
Allowed side effects: none (GET only).
Checks:
- GET /v1/health returns 200 and matches the Health schema.
- GET /v1/version returns 200 and matches the VersionInfo schema.
- supported_api_versions contains v1.
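The version rule of the smoke tier can be sketched as a check on the status code and decoded body of GET /v1/version. This sketch assumes supported_api_versions is a JSON array of strings and leaves out full VersionInfo schema validation; smoke_version_check is a hypothetical name.

```python
def smoke_version_check(status: int, body: object) -> str:
    """Return "PASS" or "FAIL" for a GET /v1/version response."""
    if status != 200 or not isinstance(body, dict):
        return "FAIL"
    versions = body.get("supported_api_versions")
    # Assumption: the field is a list such as ["v1"].
    if isinstance(versions, list) and "v1" in versions:
        return "PASS"
    return "FAIL"
```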
Tier surface: URL + envelope conformance
Purpose: validate that required routes exist and responses are shaped correctly on both success and error paths, without requiring a fully working backend.
Allowed side effects:
- For mutation endpoints (POST, PUT, DELETE), surface sends intentionally invalid requests or uses clearly non-existent IDs to avoid creating resources.
What “surface conformance” means:
- For each required endpoint, the service must respond in a way that demonstrates the route is implemented:
  - either a schema-valid success response (when applicable), or
  - a schema-valid ErrorEnvelope response (the default error shape) for failures.
In addition, for mutation endpoints, surface includes a basic request schema enforcement probe:
- When given an obviously invalid JSON body (missing required fields), the service should return a non-2xx error with an ErrorEnvelope.
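The invalid-body probe amounts to a two-part rule: reject the body with a non-2xx status, and return an ErrorEnvelope. In this sketch, is_error_envelope is a hypothetical hook standing in for real JSON Schema validation, and grading an accepted invalid body as FAIL is an assumption, since the text above says only that the service "should" reject it.

```python
from typing import Callable


def classify_invalid_body_probe(
    status: int,
    body: object,
    is_error_envelope: Callable[[object], bool],
) -> str:
    """Judge the response to an intentionally invalid JSON body."""
    if 200 <= status < 300:
        # Accepting an invalid body suggests request validation is missing.
        return "FAIL"
    # Non-2xx is expected; the body must also have the ErrorEnvelope shape.
    return "PASS" if is_error_envelope(body) else "FAIL"
```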
Tier core: minimal success path (the end-to-end protocol works)
Purpose: prove at least ONE real, end-to-end workflow works for the service type, using the smallest spec-valid sequence.
Requires: --allow-mutations because this tier creates real runs and/or invocations.
What “minimal success-path” means:
- A shortest sequence of spec-valid requests that should succeed on a correctly configured service.
- Request bodies include only required fields (plus a stable run_id/invocation_id when helpful).
- Every success response body is validated against the normative JSON Schemas.
The --allow-mutations flag and service-specific minimal success paths
What does --allow-mutations do?
This is a safety guard: without --allow-mutations, arp-conformance will not run tiers that perform real state-changing requests like core and deep.
When enabled:
- Runtime core creates a real run via POST /v1/runs and polls status/result.
- Tool Registry core performs a real tool invocation via POST /v1/tool-invocations for a selected tool.
- Daemon core may create a runtime profile if none exists, creates an instance, submits a run, polls status/result, and by default deletes anything it created. You can disable cleanup with --no-cleanup.
surface does not require --allow-mutations; it may still send intentionally invalid mutation requests to verify that routes exist and that error responses match ErrorEnvelope, but it is designed to avoid creating resources.
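The --allow-mutations guard is essentially an opt-in check that runs before any state-changing tier. A minimal sketch of that idea follows; MUTATING_TIERS, MutationsNotAllowed, and check_mutation_guard are hypothetical names, not arp-conformance internals.

```python
MUTATING_TIERS = frozenset({"core", "deep"})


class MutationsNotAllowed(Exception):
    """Raised when a state-changing tier is requested without opt-in."""


def check_mutation_guard(tier: str, allow_mutations: bool) -> None:
    """Refuse core/deep unless the caller opted in, like --allow-mutations."""
    if tier in MUTATING_TIERS and not allow_mutations:
        raise MutationsNotAllowed(
            f"tier {tier!r} sends state-changing requests; pass --allow-mutations"
        )
```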
Runtime (core)
- POST /v1/runs with a minimal valid RunRequest (input.goal required).
- Expect 200 RunStatus (schema-valid).
- Poll GET /v1/runs/{run_id} until terminal state or timeout.
- GET /v1/runs/{run_id}/result returns 200 RunResult (schema-valid).
Notes:
- RunResult.ok may be true or false; conformance validates the envelope and schema, not “agent quality”.
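Polling a run until it reaches a terminal state is where --poll-timeout and --poll-interval come in. The sketch below is a generic poll loop with a deadline; the terminal-state test is passed in because the set of terminal RunStatus states comes from the ARP spec and is not listed here. poll_until is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_terminal: Callable[[T], bool],
    timeout: float,
    interval: float,
) -> T:
    """Call fetch() until is_terminal(result) is true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_terminal(result):
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no terminal state within {timeout} seconds")
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Using a monotonic clock keeps the deadline correct even if the system wall clock changes during a long poll.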
Tool Registry (core)
- GET /v1/tools returns 200 and a schema-valid list.
- Choose a tool:
  - Prefer --tool-id/--tool-name if provided.
  - Otherwise choose the first tool in the list.
- GET /v1/tools/{tool_id} returns 200 ToolDefinition.
- POST /v1/tool-invocations with a schema-valid ToolInvocationRequest returns 200 ToolInvocationResult.
Notes:
- If there are zero tools and you did not provide --tool-id/--tool-name, invocation checks are SKIP and the run is WARN (a failure under --strict).
- If the invocation returns ok=false, the response is still schema-valid, but it is reported as WARN (tool execution may not be configured) unless --strict is set.
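The tool-selection order above can be written as a small function. Assumptions: each entry in the GET /v1/tools list is a dict with "tool_id" and "name" keys, and an explicit --tool-id is used as-is even when it is not in the list (consistent with invocation checks being SKIP only when there are zero tools and no id/name was given). choose_tool_id is a hypothetical name; a None result corresponds to the SKIP case.

```python
from typing import Optional


def choose_tool_id(
    tools: list[dict],
    tool_id: Optional[str] = None,
    tool_name: Optional[str] = None,
) -> Optional[str]:
    """Pick the tool to invoke: explicit id/name first, else the first listed."""
    if tool_id is not None:
        return tool_id  # assumption: an explicit id is trusted as-is
    if tool_name is not None:
        match = next((t for t in tools if t.get("name") == tool_name), None)
        return match.get("tool_id") if match else None
    return tools[0].get("tool_id") if tools else None
```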
Daemon (core)
- GET /v1/admin/runtime-profiles returns 200 RuntimeProfileListResponse.
- Choose a runtime profile:
  - Use --runtime-profile if provided,
  - Else pick the first returned profile,
  - Else create a temporary profile via PUT /v1/admin/runtime-profiles/{runtime_profile} (requires --allow-mutations).
- POST /v1/instances creates 1 instance for the selected profile.
- POST /v1/runs submits an async run (202 RunStatus).
- Poll GET /v1/runs/{run_id} until terminal.
- GET /v1/runs/{run_id}/result returns 200 RunResult.
- Cleanup (default): delete the created instance and temporary runtime profile (can be disabled with --no-cleanup).
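The runtime-profile selection order can likewise be sketched. It follows the list above plus the --runtime-profile flag description (a named profile is created if missing). choose_runtime_profile and its default temporary name are hypothetical; the name arp-conformance actually gives its temporary profile is not documented here.

```python
from typing import Optional


def choose_runtime_profile(
    requested: Optional[str],
    existing: list[str],
    temp_name: str = "arp-conformance-temp",  # hypothetical placeholder name
) -> tuple[str, bool]:
    """Return (profile_name, must_create) following the daemon core order."""
    if requested is not None:
        # 1. An explicit --runtime-profile, created if it does not exist yet.
        return requested, requested not in existing
    if existing:
        # 2. Otherwise the first profile the daemon returned.
        return existing[0], False
    # 3. Otherwise a temporary profile that has to be created via PUT.
    return temp_name, True
```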
Tier: deep (optional endpoints + stronger checks)
Purpose: validate optional endpoints and stronger behavioral guarantees.
Requires: --allow-mutations (builds on core).
Checks include:
- Runtime optional endpoints (if implemented):
  - POST /v1/runs/{run_id}:cancel
  - GET /v1/runs/{run_id}/events (text/event-stream)
- Daemon optional endpoint (if implemented):
  - GET /v1/runs/{run_id}/trace
If an optional endpoint is not implemented (404/405), it is SKIP (or FAIL with --strict).
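The deep-tier rule for optional endpoints depends only on the probe's status code and --strict. In this sketch (not_implemented_outcome is a hypothetical name), None means the endpoint answered with something other than 404/405, so it counts as implemented and its response should be validated like any other.

```python
from typing import Optional


def not_implemented_outcome(status: int, strict: bool = False) -> Optional[str]:
    """Return SKIP (or FAIL under --strict) when an optional endpoint is absent."""
    if status in (404, 405):
        return "FAIL" if strict else "SKIP"
    # Any other status: the route exists; continue with normal validation.
    return None
```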
CLI reference
Commands
- arp-conformance check runtime --url <base-url> [flags]
- arp-conformance check tool-registry --url <base-url> [flags]
- arp-conformance check daemon --url <base-url> [flags]
- arp-conformance check all --runtime-url ... --tool-registry-url ... --daemon-url ... [flags]
Common flags
- --tier {smoke,surface,core,deep}
- --headers KEY=VALUE (repeatable)
- --headers-file <path> (one KEY=VALUE per line)
- --timeout <seconds> (request timeout)
- --retries <n> (transport retries)
- --poll-timeout <seconds> and --poll-interval <seconds> (run polling)
- --allow-mutations (required for core and deep)
- --no-cleanup (don’t delete instances/profiles created by the checker)
- --strict (treat WARN/SKIP as failures)
- --format {text,json,junit} and --out <path>
- --spec v1 (default) and --spec-path <dir> (use a local spec checkout; accepts either a repo root containing spec/v1/ or a spec/ directory containing v1/)
Tool Registry flags
--tool-id <id> / --tool-name <name>: select which tool to invoke for core/deep.
Daemon flags
--runtime-profile <name>: choose which runtime profile to use (or to create if missing).
Library API (minimal)
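from arp_conformance.api import run

# Run the smoke tier against a Runtime service on localhost.
report = run(
    service="runtime",
    base_url="http://localhost:8081",
    tier="smoke",
)
print(report.ok)  # overall pass/fail flag
print(report.to_json())  # full report serialized as JSON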
Spec reference
arp_conformance.SPEC_REF exposes the spec tag used by the package (for example, spec/v1@v0.2.6).
The embedded spec snapshot lives under arp_conformance/_spec/ inside the wheel.
Development (maintainers)
When the spec changes, sync the embedded snapshot:
python3 tools/conformance/sync_spec.py --version v1
Then rebuild the package.
Exit codes
- 0: no FAIL results (and no WARN/SKIP when --strict is set)
- 1: at least one FAIL (or WARN/SKIP with --strict)
- 2: invalid CLI usage (bad arguments)
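A wrapper script can translate these exit codes into messages, for example when arp-conformance runs inside a larger Python-based harness. describe_exit_code and run_check are hypothetical helpers; run_check simply shells out to the CLI with subprocess.

```python
import subprocess
from typing import Sequence

# Meanings copied from the exit code list above.
EXIT_MEANINGS = {
    0: "no FAIL results (and no WARN/SKIP under --strict)",
    1: "at least one FAIL (or WARN/SKIP under --strict)",
    2: "invalid CLI usage (bad arguments)",
}


def describe_exit_code(code: int) -> str:
    """Map an arp-conformance exit code to a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_check(argv: Sequence[str]) -> int:
    """Run an arp-conformance command and print what its exit code means."""
    completed = subprocess.run(list(argv), check=False)
    print(describe_exit_code(completed.returncode))
    return completed.returncode
```

Usage: run_check(["arp-conformance", "check", "runtime", "--url", "http://localhost:8081", "--tier", "smoke"]).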
File details
Details for the file arp_conformance-0.2.6.tar.gz.
File metadata
- Download URL: arp_conformance-0.2.6.tar.gz
- Upload date:
- Size: 41.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 572d5aa3418360b9a35e31c788f0855def8c45467cf753a3c4d950d43f2b9451 |
| MD5 | 97e39f481e1c72d7f3be369e4e84b563 |
| BLAKE2b-256 | 5ea693c1ffc3b4201aba1a1c86b668ec18f2079fec858958a7544d5744d50d88 |
Provenance
The following attestation bundles were made for arp_conformance-0.2.6.tar.gz:
Publisher: release.yml on AgentRuntimeProtocol/ARP_Standard
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arp_conformance-0.2.6.tar.gz
- Subject digest: 572d5aa3418360b9a35e31c788f0855def8c45467cf753a3c4d950d43f2b9451
- Sigstore transparency entry: 778914720
- Sigstore integration time:
- Permalink: AgentRuntimeProtocol/ARP_Standard@a877e41c8b377c21e2bc7e7abefa21e218793d34
- Branch / Tag: refs/tags/v0.2.6
- Owner: https://github.com/AgentRuntimeProtocol
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a877e41c8b377c21e2bc7e7abefa21e218793d34
- Trigger Event: push
File details
Details for the file arp_conformance-0.2.6-py3-none-any.whl.
File metadata
- Download URL: arp_conformance-0.2.6-py3-none-any.whl
- Upload date:
- Size: 72.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0cc360e23e322f5869c1dd57d34c6e7298924e9661d3fd68064f83d5ec6694fb |
| MD5 | 5d611815186ce2cc34d27e299fd36db4 |
| BLAKE2b-256 | 8c9d72902f2d2aeef989ce0fb53f93c86a53dbfce5a20e4aca6aa07127c11d65 |
Provenance
The following attestation bundles were made for arp_conformance-0.2.6-py3-none-any.whl:
Publisher: release.yml on AgentRuntimeProtocol/ARP_Standard
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arp_conformance-0.2.6-py3-none-any.whl
- Subject digest: 0cc360e23e322f5869c1dd57d34c6e7298924e9661d3fd68064f83d5ec6694fb
- Sigstore transparency entry: 778914722
- Sigstore integration time:
- Permalink: AgentRuntimeProtocol/ARP_Standard@a877e41c8b377c21e2bc7e7abefa21e218793d34
- Branch / Tag: refs/tags/v0.2.6
- Owner: https://github.com/AgentRuntimeProtocol
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a877e41c8b377c21e2bc7e7abefa21e218793d34
- Trigger Event: push