a2a-spec

The open specification for testing, validating, and guaranteeing agent-to-agent interactions.

Maintainers

padobrik

Python 3.11+

The Problem

Multi-agent AI systems are hard to test reliably. When Agent A changes its output format, Agent B can break silently. LLM outputs are non-deterministic, so CI pipelines either skip agent tests or flake constantly. Existing tools focus on prompt evaluation or observability; none provide contract testing between agents.

The Solution

a2a-spec is a specification, testing, and validation layer for multi-agent systems. Define what one agent expects from another as a YAML spec. Record LLM outputs as snapshots. Replay them deterministically in CI with zero LLM calls. Detect structural and semantic regressions before they reach production.

Agent A ──[spec]──> Agent B ──[spec]──> Agent C
    │                   │                   │
    └── snapshot ──> replay ──> validate ──> ✓ CI passes

What a2a-spec is NOT

a2a-spec is not	Examples	What a2a-spec is
An agent framework	LangChain, CrewAI, AutoGen	A testing layer that sits alongside any framework
An observability tool	LangSmith, Arize, Langfuse	A validation engine that runs in CI, not production
A prompt evaluation tool	Promptfoo, DeepEval	A contract testing system between agents
An agent runtime	n/a	A specification framework for agent boundaries

Quick Start

Install

pip install a2a-spec

With optional features:

# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "a2a-spec[semantic]"    # Embedding-based semantic comparison
pip install "a2a-spec[langchain]"   # LangChain adapter
pip install "a2a-spec[dev]"         # Testing and linting tools
pip install "a2a-spec[all]"         # Everything

Initialize a project

a2aspec init --name my-project

This creates:

my-project/
├── a2a-spec.yaml              # Project configuration
└── a2a_spec/
    ├── specs/                  # Agent-to-agent contracts
    │   └── example-spec.yaml
    ├── snapshots/              # Recorded outputs (committed to git!)
    ├── scenarios/              # Test input scenarios
    └── adapters/               # Agent wrappers

Define a spec

A spec is a YAML contract between a producer agent and a consumer agent. It defines structural, semantic, and policy requirements:

# a2a_spec/specs/triage-to-resolution.yaml
spec:
  name: triage-to-resolution
  version: "1.0"
  producer: triage-agent
  consumer: resolution-agent
  description: "What the resolution agent expects from triage"

  structural:
    type: object
    required: [category, summary, confidence]
    properties:
      category:
        type: string
        enum: [billing, shipping, product, general]
      summary:
        type: string
        minLength: 10
        maxLength: 500
      confidence:
        type: number
        minimum: 0.0
        maximum: 1.0

  semantic:
    - rule: summary_reflects_input
      description: "Summary must faithfully reflect the customer message"
      method: embedding_similarity
      threshold: 0.8

  policy:
    - rule: no_pii
      description: "Output must not contain PII"
      method: regex
      patterns:
        - '\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'  # Credit card
        - '\b\d{3}-\d{2}-\d{4}\b'                       # SSN
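
To make the `no_pii` rule concrete: a regex policy amounts to running each pattern over the serialized output. The following is a minimal sketch using only the standard library. `find_policy_violations` and the pattern names are hypothetical and written for illustration; they are not part of the a2a-spec API.

```python
import json
import re

# Patterns copied from the no_pii rule in the spec above.
PII_PATTERNS = {
    "credit_card": r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}


def find_policy_violations(output: dict) -> list[str]:
    """Return the name of every pattern that matches the serialized output."""
    text = json.dumps(output)
    return [name for name, pattern in PII_PATTERNS.items() if re.search(pattern, text)]


clean = {"category": "billing", "summary": "Customer reports duplicate charge", "confidence": 0.95}
leaky = {"category": "billing", "summary": "Card 4111 1111 1111 1111 was charged twice", "confidence": 0.9}

print(find_policy_violations(clean))
print(find_policy_violations(leaky))
```

Serializing the whole output first means nested fields are scanned too, at the cost of not knowing which field contained the match.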

Record snapshots

a2aspec record  # Calls live agents via adapters, saves outputs to disk

Snapshots are JSON files committed to git — they become your deterministic test baselines.
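
One way to picture a snapshot store is a directory of JSON files keyed by a fingerprint of the input. The sketch below assumes a SHA-256 hash over canonical JSON and a `<agent_id>/<fingerprint>.json` layout. Both are assumptions made for illustration, and the real on-disk format of a2a-spec may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(input_data: dict) -> str:
    """Hash the input with sorted keys so equal inputs always get the same ID."""
    canonical = json.dumps(input_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_snapshot(root: Path, agent_id: str, input_data: dict, output: dict) -> Path:
    """Write one recorded output to <root>/<agent_id>/<fingerprint>.json."""
    path = root / agent_id / f"{fingerprint(input_data)}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"input": input_data, "output": output}, indent=2, sort_keys=True))
    return path


def load_snapshot(root: Path, agent_id: str, input_data: dict) -> dict:
    """Replay: return the recorded output for this input without calling the agent."""
    path = root / agent_id / f"{fingerprint(input_data)}.json"
    return json.loads(path.read_text())["output"]


root = Path(tempfile.mkdtemp())
recorded = {"category": "billing", "summary": "Customer reports duplicate charge", "confidence": 0.95}
save_snapshot(root, "triage-agent", {"message": "I was charged twice"}, recorded)
replayed = load_snapshot(root, "triage-agent", {"message": "I was charged twice"})
```

Writing the files with sorted keys and fixed indentation keeps git diffs small when a snapshot is re-recorded.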

Test in CI (zero LLM calls)

a2aspec test --replay  # Validates saved snapshots against specs

No API keys needed. No LLM costs. Fully deterministic. Runs in milliseconds.
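
Replay validation comes down to loading a saved output and checking it against the spec. As a rough illustration, the sketch below hand-rolls the handful of JSON Schema keywords used in the example spec (`required`, `type`, `enum`, `minLength`, `maxLength`, `minimum`, `maximum`). `check_structure` is a hypothetical helper, and a full JSON Schema validator covers far more than this.

```python
# Structural section of the triage-to-resolution spec, written as a Python dict.
STRUCTURAL = {
    "type": "object",
    "required": ["category", "summary", "confidence"],
    "properties": {
        "category": {"type": "string", "enum": ["billing", "shipping", "product", "general"]},
        "summary": {"type": "string", "minLength": 10, "maxLength": 500},
        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
}

TYPES = {"string": str, "number": (int, float)}


def check_structure(output: dict, schema: dict) -> list[str]:
    """Return human-readable violations; an empty list means the output passes."""
    errors = [f"missing field: {name}" for name in schema["required"] if name not in output]
    for name, rules in schema["properties"].items():
        if name not in output:
            continue
        value = output[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, TYPES[rules["type"]]) or isinstance(value, bool):
            errors.append(f"{name}: expected {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: {value!r} not in enum")
        if "minLength" in rules and len(value) < rules["minLength"]:
            errors.append(f"{name}: shorter than {rules['minLength']}")
        if "maxLength" in rules and len(value) > rules["maxLength"]:
            errors.append(f"{name}: longer than {rules['maxLength']}")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: below {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{name}: above {rules['maximum']}")
    return errors


good = {"category": "billing", "summary": "Customer charged twice", "confidence": 0.95}
bad = {"category": "refund", "summary": "Too short", "confidence": 1.5}

print(check_structure(good, STRUCTURAL))
print(check_structure(bad, STRUCTURAL))
```

Because the check touches only data already on disk, it needs no network access or API key.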

Detect semantic drift

After changing a prompt or upgrading a model:

a2aspec record   # Re-record with the new configuration
a2aspec diff     # Compare new vs. baseline outputs

The diff engine reports structural changes (fields added/removed/type-changed) and semantic drift (meaning shifted beyond threshold), with severity levels from LOW to CRITICAL.
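
The structural half of such a diff can be illustrated by comparing the top-level fields of two outputs. In this sketch, `structural_diff` is a hypothetical function, and the severity assigned to each kind of change (including the `HIGH` level) is an assumption; the text above only states that levels range from LOW to CRITICAL.

```python
def structural_diff(old: dict, new: dict) -> list[tuple[str, str, str]]:
    """Compare top-level fields and return (field, change, severity) tuples."""
    changes = []
    for field in sorted(old.keys() | new.keys()):
        if field not in new:
            # A consumer that relies on this field would break outright.
            changes.append((field, "removed", "CRITICAL"))
        elif field not in old:
            changes.append((field, "added", "LOW"))
        elif type(old[field]) is not type(new[field]):
            changes.append((field, "type-changed", "HIGH"))
    return changes


baseline = {"category": "billing", "summary": "Customer reports duplicate charge", "confidence": 0.95}
candidate = {"category": "billing", "confidence": "high", "priority": 2}

for field, change, severity in structural_diff(baseline, candidate):
    print(f"{field}: {change} ({severity})")
```

Semantic drift is a separate check, since two outputs can share a structure and still differ in meaning.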

Core Concepts

Concept	Description
Spec	A YAML file defining what one agent expects from another — structure, semantics, and policy rules
Snapshot	A recorded LLM output for a given input, stored as JSON and committed to git
Replay	Running validation against saved snapshots with zero LLM calls — fast, free, deterministic
Diff	Structural + semantic comparison between old and new agent outputs, with severity levels
Pipeline	A DAG of agents with routing conditions, tested end-to-end with spec validation at each step
Adapter	A wrapper around your agent (function, HTTP, LangChain) so a2a-spec can call it

→ See docs/concepts.md for detailed explanations.

Adapters — Wrap Any Agent

a2a-spec is framework-agnostic. Adapters wrap your agents so the framework can call them during recording and testing.

Plain async functions

from a2a_spec import FunctionAdapter

async def my_triage_agent(input_data: dict) -> dict:
    # Your agent logic (calls OpenAI, Anthropic, local model, etc.)
    return {"category": "billing", "summary": "Customer reports duplicate charge", "confidence": 0.95}

adapter = FunctionAdapter(
    fn=my_triage_agent,
    agent_id="triage-agent",
    version="1.0.0",
    model="gpt-4",
)
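
Conceptually, a function adapter pairs an async callable with identifying metadata so that any agent can be called the same way. The sketch below shows that pattern in isolation. `SketchFunctionAdapter` is a stand-in written for illustration and makes no claim about how the library's `FunctionAdapter` works internally.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class SketchFunctionAdapter:
    """Illustrative stand-in: wraps an async agent function with metadata."""

    fn: Callable[[dict], Awaitable[dict]]
    agent_id: str
    version: str

    async def call(self, input_data: dict) -> dict:
        # Delegate to the wrapped agent and tag the result with its origin.
        output = await self.fn(input_data)
        return {"agent_id": self.agent_id, "version": self.version, "output": output}


async def echo_triage_agent(input_data: dict) -> dict:
    # Placeholder logic; a real agent would call a model here.
    return {"category": "billing", "summary": input_data["message"], "confidence": 0.9}


sketch_adapter = SketchFunctionAdapter(fn=echo_triage_agent, agent_id="triage-agent", version="1.0.0")
response = asyncio.run(sketch_adapter.call({"message": "I was charged twice this month"}))
print(response["agent_id"], response["output"]["category"])
```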

HTTP endpoints

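import os

from a2a_spec import HTTPAdapter

adapter = HTTPAdapter(
    url="http://localhost:8000/triage",
    agent_id="triage-agent",
    version="1.0.0",
    # Python does not expand "$TOKEN" inside a string; read it from the environment.
    headers={"Authorization": f"Bearer {os.environ['TOKEN']}"},
    timeout=30.0,
)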

Custom adapters

from a2a_spec import AgentAdapter, AgentMetadata, AgentResponse

class MyCrewAIAdapter(AgentAdapter):
    def get_metadata(self) -> AgentMetadata:
        return AgentMetadata(agent_id="my-crew-agent", version="1.0")

    async def call(self, input_data: dict) -> AgentResponse:
        # `my_crew` is assumed to be a CrewAI Crew defined elsewhere.
        # Crew.kickoff() is synchronous, so await kickoff_async() instead.
        result = await my_crew.kickoff_async(inputs=input_data)
        return AgentResponse(output=result.to_dict())

→ See docs/writing-adapters.md for the full guide.

Pipeline Testing

Test entire multi-agent pipelines as a DAG. a2a-spec validates each agent's output against its spec and checks routing conditions:

pipeline:
  name: customer-support
  agents:
    triage-agent: {}
    billing-agent: {}
    shipping-agent: {}
    resolution-agent: {}
  edges:
    - from: triage-agent
      to: billing-agent
      condition: "output.category == 'billing'"
    - from: triage-agent
      to: shipping-agent
      condition: "output.category == 'shipping'"
    - from: [billing-agent, shipping-agent]
      to: resolution-agent
  test_cases:
    - name: billing_flow
      input: { message: "I was charged twice" }

a2aspec pipeline test pipeline.yaml --mode replay

→ See docs/architecture.md for the pipeline execution model.
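
The routing behaviour described above can be sketched with the standard-library `graphlib` module: sort the agents topologically, then follow only the edges whose condition holds. Every function name here is hypothetical, and the condition parser handles only the `output.<field> == '<value>'` form seen in the example, which is an assumption about the expression syntax rather than a description of a2a-spec's evaluator.

```python
import re
from graphlib import TopologicalSorter

EDGES = [
    {"from": "triage-agent", "to": "billing-agent", "condition": "output.category == 'billing'"},
    {"from": "triage-agent", "to": "shipping-agent", "condition": "output.category == 'shipping'"},
    {"from": ["billing-agent", "shipping-agent"], "to": "resolution-agent"},
]

CONDITION = re.compile(r"^output\.(\w+) == '([^']*)'$")


def sources(edge: dict) -> list[str]:
    """Normalize `from`, which may be a single agent or a list of agents."""
    return [edge["from"]] if isinstance(edge["from"], str) else list(edge["from"])


def execution_order(edges: list[dict]) -> list[str]:
    """Topologically sort agents so each runs after its upstream agents."""
    sorter = TopologicalSorter()
    for edge in edges:
        sorter.add(edge["to"], *sources(edge))
    return list(sorter.static_order())


def edge_is_taken(edge: dict, output: dict) -> bool:
    """Unconditional edges always fire; conditional ones compare one field."""
    if "condition" not in edge:
        return True
    match = CONDITION.match(edge["condition"])
    if match is None:
        raise ValueError(f"unsupported condition: {edge['condition']}")
    field, expected = match.groups()
    return output.get(field) == expected


def route(edges: list[dict], outputs: dict[str, dict], start: str) -> list[str]:
    """Return the agents that actually run for the given recorded outputs."""
    visited = {start}
    order = execution_order(edges)
    for agent in order:
        for edge in edges:
            if edge["to"] == agent and any(
                src in visited and edge_is_taken(edge, outputs.get(src, {})) for src in sources(edge)
            ):
                visited.add(agent)
    return [agent for agent in order if agent in visited]


outputs = {"triage-agent": {"category": "billing"}, "billing-agent": {"status": "refund_issued"}}
path = route(EDGES, outputs, start="triage-agent")
print(path)
```

Parsing conditions with a narrow regex instead of `eval` keeps untrusted pipeline files from executing arbitrary code, at the cost of supporting only one comparison form.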

Configuration

Project configuration lives in a2a-spec.yaml:

project_name: "my-project"
version: "1.0"

specs_dir: "./a2a_spec/specs"
scenarios_dir: "./a2a_spec/scenarios"

semantic:
  provider: sentence-transformers
  model: all-MiniLM-L6-v2     # Lazy-loaded, only when needed
  enabled: true

storage:
  backend: local
  path: ./a2a_spec/snapshots

ci:
  fail_on_semantic_drift: true
  drift_threshold: 0.15
  replay_mode: exact
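
One plausible reading of `drift_threshold` is a cap on `1 - cosine_similarity` between the baseline and the new embedding. That interpretation is an assumption, since the configuration above does not define the formula. The sketch uses tiny hand-written vectors in place of real sentence-transformers embeddings, and both function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def exceeds_drift(baseline: list[float], current: list[float], drift_threshold: float = 0.15) -> bool:
    """Assumed rule: drift is 1 - cosine similarity; fail when it passes the threshold."""
    return (1.0 - cosine_similarity(baseline, current)) > drift_threshold


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
baseline_vec = [1.0, 0.0, 0.0]
close_vec = [0.9, 0.1, 0.0]
far_vec = [0.0, 1.0, 0.0]

print(exceeds_drift(baseline_vec, close_vec))
print(exceeds_drift(baseline_vec, far_vec))
```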

Python API

Use a2a-spec programmatically in your existing test suite:

from a2a_spec import DiffEngine, ReplayEngine, SnapshotStore, load_spec, validate_output
from a2a_spec.policy.builtin import no_pii_in_output
from a2a_spec.policy.engine import PolicyEngine

# Load and validate
spec = load_spec("a2a_spec/specs/triage-to-resolution.yaml")
result = validate_output(
    {"category": "billing", "summary": "Customer charged twice", "confidence": 0.95},
    spec,
)
assert result.passed

# Replay snapshots
store = SnapshotStore("./a2a_spec/snapshots")
replay_engine = ReplayEngine(store)
output = replay_engine.replay("triage-agent", "billing_overcharge")

# Diff two outputs (old_output and new_output are dicts you supply,
# for example a baseline snapshot and a freshly recorded output)
diff = DiffEngine()
results = diff.diff(old_output, new_output, semantic_threshold=0.85)
for r in results:
    print(f"{r.field}: {r.severity} - {r.explanation}")

# Policy enforcement (named separately so it does not shadow the replay engine)
policy_engine = PolicyEngine()
policy_engine.register_validator("no_pii", no_pii_in_output)

CLI Reference

Command	Description
`a2aspec init [DIR]`	Scaffold a new a2a-spec project with examples
`a2aspec record`	Record live agent outputs as snapshots
`a2aspec test --replay`	Validate snapshots against specs (deterministic, zero LLM calls)
`a2aspec test --live`	Validate live agent outputs against specs
`a2aspec diff`	Compare current outputs against baselines
`a2aspec diff --agent NAME`	Diff a specific agent only
`a2aspec pipeline test FILE`	Test a multi-agent pipeline DAG
`a2aspec --version`	Show version

→ See docs/cli-reference.md for full options and flags.

CI Integration

a2a-spec is designed for CI-first workflows:

# .github/workflows/a2a-spec.yml
name: Agent Contract Tests
on: [push, pull_request]

jobs:
  spec-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install a2a-spec
      - run: a2aspec test --replay

Key principle: Record locally (with API keys), test in CI (with snapshots). Snapshots are committed to git — they are your test baselines.

Output Format	Flag	Use Case
Console (Rich)	`--format console`	Local development
Markdown	`--format markdown`	PR comments
JUnit XML	`--format junit`	CI test reporters

→ See docs/ci-integration.md for GitHub Actions, Jenkins, and more.
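
For readers unfamiliar with JUnit XML, the sketch below builds a minimal report with `xml.etree.ElementTree`. It illustrates the general shape that CI reporters parse and is not the tool's actual output. The `to_junit` function, the result dictionary keys, and the `shipping_delay` scenario name are all hypothetical.

```python
import xml.etree.ElementTree as ET


def to_junit(results: list[dict], suite_name: str = "a2a-spec") -> str:
    """Render validation results as a minimal JUnit XML document."""
    failures = [r for r in results if not r["passed"]]
    suite = ET.Element(
        "testsuite", name=suite_name, tests=str(len(results)), failures=str(len(failures))
    )
    for result in results:
        case = ET.SubElement(suite, "testcase", classname=result["spec"], name=result["scenario"])
        if not result["passed"]:
            # A <failure> child marks the test case as failed for CI reporters.
            failure = ET.SubElement(case, "failure", message=result["message"])
            failure.text = result["message"]
    return ET.tostring(suite, encoding="unicode")


report = to_junit(
    [
        {"spec": "triage-to-resolution", "scenario": "billing_overcharge", "passed": True},
        {
            "spec": "triage-to-resolution",
            "scenario": "shipping_delay",
            "passed": False,
            "message": "category: 'refund' not in enum",
        },
    ]
)
print(report)
```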

Comparison

Feature	a2a-spec	Pact	DeepEval	Promptfoo	LangSmith
Agent-to-agent contracts	✅	✅	❌	❌	❌
LLM output snapshots	✅	❌	❌	❌	❌
Deterministic CI replay	✅	✅	❌	❌	❌
Semantic drift detection	✅	❌	✅	✅	✅
Policy enforcement (PII, etc.)	✅	❌	✅	✅	❌
Pipeline DAG testing	✅	❌	❌	❌	❌
Framework agnostic	✅	✅	❌	❌	❌
Zero LLM calls in CI	✅	N/A	❌	❌	❌
Typed Python API (PEP 561)	✅	N/A	✅	N/A	✅

Architecture

src/a2a_spec/
├── cli/          # Typer CLI (init, record, test, diff, pipeline)
├── spec/         # Spec schema (Pydantic), YAML loader, JSON Schema validator
├── snapshot/     # Record, store, fingerprint, and replay engine
├── diff/         # Structural (JSON) + semantic (embedding) comparison
├── pipeline/     # DAG builder, topological executor, execution traces
├── adapters/     # Agent wrappers: function, HTTP, LangChain
├── policy/       # Policy engine with regex and custom validators
├── semantic/     # Embedding model interface (sentence-transformers)
├── reporting/    # Console (Rich), Markdown, JUnit XML, GitHub annotations
├── config/       # YAML config loader with Pydantic validation
├── _internal/    # SHA256 hashing, safe expression evaluator, type aliases
└── exceptions.py # Hierarchical error types with actionable messages

→ See docs/architecture.md for the full design.

Examples

The examples/customer_support/ directory contains a complete walkthrough:

- Two agents (triage + resolution) with an a2a-spec contract
- A YAML spec with structural, semantic, and policy rules
- A pre-recorded snapshot for deterministic replay
- Test scenarios and pytest integration
- A step-by-step README

Documentation

Guide	Description
Getting Started	Installation and first test in 2 minutes
Core Concepts	Specs, snapshots, replay, diff explained
CLI Reference	Every command with all options
Writing Specs	Structural, semantic, and policy rules
Writing Adapters	Wrap any agent for a2a-spec
CI Integration	GitHub Actions, JUnit, exit codes
Architecture	Module design and extension points

Contributing

Contributions are welcome. See CONTRIBUTING.md for the development setup, check commands, and PR process.

License

Apache 2.0 — see LICENSE for details.

This version

0.1.0

Feb 19, 2026

Download files

Source Distribution

a2a_spec-0.1.0.tar.gz (61.6 kB)

Uploaded Feb 19, 2026 Source

Built Distribution

a2a_spec-0.1.0-py3-none-any.whl (57.9 kB)

Uploaded Feb 19, 2026 Python 3

File details

Details for the file a2a_spec-0.1.0.tar.gz.

File metadata

Download URL: a2a_spec-0.1.0.tar.gz
Upload date: Feb 19, 2026
Size: 61.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for a2a_spec-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`65747c3b3d66e4ee46e23519d8ec964fed79c9f60e1e669bc4cb3990364f9e16`
MD5	`4e535122e37cfde4bf42ee2ec67f6020`
BLAKE2b-256	`48ca95b11d0442e24fb9fcc5883b8a67fd3a54c28642ca13e7474e5459bcad25`

Provenance

The following attestation bundles were made for a2a_spec-0.1.0.tar.gz:

Publisher: publish.yml on padobrik/a2a-spec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: a2a_spec-0.1.0.tar.gz
- Subject digest: 65747c3b3d66e4ee46e23519d8ec964fed79c9f60e1e669bc4cb3990364f9e16
- Sigstore transparency entry: 969339665
- Sigstore integration time: Feb 19, 2026
Source repository:
- Permalink: padobrik/a2a-spec@da0a916baaadfd48750188ec8b160500f3a141fd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/padobrik
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@da0a916baaadfd48750188ec8b160500f3a141fd
- Trigger Event: release

File details

Details for the file a2a_spec-0.1.0-py3-none-any.whl.

File metadata

Download URL: a2a_spec-0.1.0-py3-none-any.whl
Upload date: Feb 19, 2026
Size: 57.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for a2a_spec-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`af2858fb35f3265df8d6ead8261118d6f8981e4168b76fbbcc8ce5a3a12a6f5f`
MD5	`a404f19aba9dc7f65684dc313745f784`
BLAKE2b-256	`fe10ec68b76b0404eaafab5029be7c5f597f7aff667d287c3156f112d1fee6f7`

Provenance

The following attestation bundles were made for a2a_spec-0.1.0-py3-none-any.whl:

Publisher: publish.yml on padobrik/a2a-spec

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: a2a_spec-0.1.0-py3-none-any.whl
- Subject digest: af2858fb35f3265df8d6ead8261118d6f8981e4168b76fbbcc8ce5a3a12a6f5f
- Sigstore transparency entry: 969339667
- Sigstore integration time: Feb 19, 2026
Source repository:
- Permalink: padobrik/a2a-spec@da0a916baaadfd48750188ec8b160500f3a141fd
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/padobrik
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@da0a916baaadfd48750188ec8b160500f3a141fd
- Trigger Event: release
