Agent Alignment Protocol - The missing alignment layer for the agent protocol stack

Agent Alignment Protocol (AAP)

A transparency protocol for autonomous agents.

AAP lets agents declare their alignment posture, produce auditable decision traces, and verify value coherence before coordinating with other agents. It extends existing protocols (A2A, MCP) with an alignment layer that makes agent behavior observable.

AAP is a transparency protocol, not a trust protocol. It makes agent behavior more observable, not more guaranteed.

Quick Start

# Install
pip install agent-alignment-protocol

# Generate an Alignment Card
aap init --values "principal_benefit,transparency,harm_prevention"
# ✓ Created alignment-card.json

# Instrument your agent
from aap import trace_decision

@trace_decision(card_path="alignment-card.json")
def recommend_product(user_preferences):
    # Your agent logic here
    # Decisions are automatically traced
    ...

# Verify behavior matches declaration
aap verify --card alignment-card.json --trace logs/trace.json
# ✓ Verified [similarity: 0.82]
# Checks: autonomy, escalation, values, forbidden, behavioral_similarity

Why AAP?

The agent protocol stack provides capability discovery (A2A), tool integration (MCP), and payment authorization (AP2). None address a fundamental question: Is this agent serving its principal's interests?

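| Protocol | Function                        | Gap                    |
|----------|---------------------------------|------------------------|
| MCP      | Agent-to-tool connectivity      | No alignment semantics |
| A2A      | Task negotiation between agents | No value verification  |
| AP2      | Payment authorization           | No behavioral audit    |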

As agent capabilities become symmetric—equal access to information, equal reasoning power—alignment becomes the primary differentiator. AAP provides the infrastructure to make alignment claims verifiable.

Three Components

┌─────────────────┬─────────────────┬─────────────────┐
│ Alignment Card  │    AP-Trace     │ Value Coherence │
│                 │                 │    Handshake    │
├─────────────────┼─────────────────┼─────────────────┤
│ "What I claim   │ "What I         │ "Can we work    │
│  to be"         │  actually did"  │  together?"     │
└─────────────────┴─────────────────┴─────────────────┘
     Declaration        Audit          Coordination

Alignment Card

A structured declaration of an agent's alignment posture:

{
  "aap_version": "0.1.0",
  "agent_id": "did:web:my-agent.example.com",
  "principal": {
    "type": "human",
    "relationship": "delegated_authority"
  },
  "values": {
    "declared": ["principal_benefit", "transparency", "minimal_data"],
    "conflicts_with": ["deceptive_marketing", "hidden_fees"]
  },
  "autonomy_envelope": {
    "bounded_actions": ["search", "compare", "recommend"],
    "escalation_triggers": [
      {
        "condition": "purchase_value > 100",
        "action": "escalate",
        "reason": "Exceeds autonomous spending limit"
      }
    ],
    "forbidden_actions": ["share_credentials", "subscribe_to_services"]
  },
  "audit_commitment": {
    "trace_format": "ap-trace-v1",
    "retention_days": 90,
    "queryable": true
  }
}
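
One way to read the autonomy envelope can be sketched with the standard library alone. The helper below classifies an action name against the card's `bounded_actions` and `forbidden_actions` lists. The name `classify_action`, the three return labels, and the rule that a forbidden entry takes precedence are assumptions of this sketch, not part of the AAP SDK.

```python
import json

# A trimmed copy of the card above; only the fields this sketch reads.
CARD_JSON = """
{
  "autonomy_envelope": {
    "bounded_actions": ["search", "compare", "recommend"],
    "forbidden_actions": ["share_credentials", "subscribe_to_services"]
  }
}
"""


def classify_action(card: dict, action_name: str) -> str:
    """Return 'forbidden', 'bounded', or 'unlisted' for an action name.

    Hypothetical helper: the labels and the precedence (forbidden wins)
    are assumptions of this sketch, not behavior confirmed by the spec.
    """
    envelope = card.get("autonomy_envelope", {})
    if action_name in envelope.get("forbidden_actions", []):
        return "forbidden"
    if action_name in envelope.get("bounded_actions", []):
        return "bounded"
    return "unlisted"


card = json.loads(CARD_JSON)
for name in ("recommend", "share_credentials", "purchase"):
    print(name, "->", classify_action(card, name))
```

An action that appears in neither list is reported as `unlisted` here; how a real verifier treats such actions is left to the specification.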

AP-Trace

An audit log entry recording each decision:

{
  "trace_id": "tr-f47ac10b-58cc-4372",
  "card_id": "ac-f47ac10b-58cc-4372",
  "timestamp": "2026-01-31T12:30:00Z",
  "action": {
    "type": "recommend",
    "name": "product_recommendation",
    "category": "bounded"
  },
  "decision": {
    "alternatives_considered": [
      {"option_id": "A", "score": 0.85, "flags": []},
      {"option_id": "B", "score": 0.72, "flags": ["sponsored_content"]}
    ],
    "selected": "A",
    "selection_reasoning": "Highest score. Option B flagged as sponsored and deprioritized per principal_benefit value.",
    "values_applied": ["principal_benefit", "transparency"]
  },
  "escalation": {
    "evaluated": true,
    "required": false,
    "reason": "Recommendation only, no purchase action"
  }
}
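
To make the relationship between a trace and its card concrete, here is a minimal sketch of two consistency checks: every value in `values_applied` should be declared on the card, and the `selected` option should be one of the alternatives considered. The function `check_trace_values` is hypothetical and works on plain dictionaries; it is not the SDK's `verify_trace`.

```python
def check_trace_values(card: dict, trace: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    declared = set(card["values"]["declared"])
    decision = trace["decision"]

    # Values the agent says it applied must be declared on the card.
    undeclared = [v for v in decision["values_applied"] if v not in declared]
    if undeclared:
        problems.append(f"undeclared values applied: {undeclared}")

    # The chosen option must be one of the alternatives that were recorded.
    option_ids = {alt["option_id"] for alt in decision["alternatives_considered"]}
    if decision["selected"] not in option_ids:
        problems.append(f"selected option {decision['selected']!r} not among alternatives")
    return problems


# Trimmed copies of the card and trace shown above.
card = {"values": {"declared": ["principal_benefit", "transparency", "minimal_data"]}}
good_trace = {
    "decision": {
        "alternatives_considered": [{"option_id": "A"}, {"option_id": "B"}],
        "selected": "A",
        "values_applied": ["principal_benefit", "transparency"],
    }
}
# A deliberately inconsistent trace, invented for this example.
bad_trace = {
    "decision": {
        "alternatives_considered": [{"option_id": "A"}],
        "selected": "C",
        "values_applied": ["engagement"],
    }
}

print(check_trace_values(card, good_trace))
print(check_trace_values(card, bad_trace))
```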

Value Coherence Handshake

Pre-coordination compatibility check between agents:

from aap import check_coherence

result = check_coherence(my_card, their_card, task_context)

if result.compatible:
    # Proceed with coordination
    proceed_with_task()
else:
    # Handle conflict
    print(f"Value conflict: {result.conflicts}")
    # Escalate to principals or negotiate scope
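
How `check_coherence` scores compatibility is not described here. As a rough illustration of what a declared-value comparison could look like, the sketch below flags any value that one card declares and the other lists under `conflicts_with`. The function `declared_value_conflicts` and the second card are hypothetical, and the real handshake may weigh task context or use a different rule.

```python
def declared_value_conflicts(my_card: dict, their_card: dict) -> list[str]:
    """List values declared by one agent that the other agent conflicts with."""
    mine, theirs = my_card["values"], their_card["values"]
    clashes = set(mine["declared"]) & set(theirs.get("conflicts_with", []))
    clashes |= set(theirs["declared"]) & set(mine.get("conflicts_with", []))
    return sorted(clashes)


my_card = {
    "values": {
        "declared": ["principal_benefit", "transparency", "minimal_data"],
        "conflicts_with": ["deceptive_marketing", "hidden_fees"],
    }
}
# Invented counterpart card for illustration only.
their_card = {
    "values": {
        "declared": ["revenue_growth", "hidden_fees"],
        "conflicts_with": [],
    }
}

conflicts = declared_value_conflicts(my_card, their_card)
print("compatible" if not conflicts else f"conflicts: {conflicts}")
```

As with the handshake itself, this only compares declarations; it says nothing about whether either agent acts on them.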

What AAP Does Not Do

This matters. Read it.

  1. AAP does NOT ensure alignment—it provides visibility. An agent can produce perfect traces while acting against its principal's interests.

  2. Verified does NOT equal safe. A verified trace means the recorded behavior is consistent with the declared alignment. It doesn't mean the alignment is good or the outcome was beneficial.

  3. AP-Trace is sampled, not complete. Traces capture decision points, not every computation. Significant reasoning may occur between traces.

  4. Value coherence is relative to declared values. The handshake checks whether declared values are compatible. It doesn't verify that agents actually hold these values or will act on them.

  5. Tested on transformer-based agents. Other architectures may exhibit behaviors AAP doesn't capture.

For the complete limitations disclosure, see Section 10 of the Specification.

Installation

# Python
pip install agent-alignment-protocol

# TypeScript
npm install @mnemom/agent-alignment-protocol

Integration

With A2A Agents

AAP extends the A2A Agent Card with an alignment block:

{
  "name": "Shopping Assistant",
  "description": "Helps users find products",
  "url": "https://shopping.example.com",
  "alignment": {
    "$ref": "./alignment-card.json"
  }
}

See A2A Migration Guide.
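
A consumer of such an Agent Card has to follow the `$ref` to reach the alignment card. The sketch below assumes the reference is a file path relative to the Agent Card, which is only one possible reading; remote URLs and JSON Pointer fragments are not handled. `load_alignment_block` is a hypothetical helper, not an SDK function.

```python
import json
import tempfile
from pathlib import Path


def load_alignment_block(agent_card_path: Path) -> dict:
    """Return the alignment card for an Agent Card, following a local "$ref".

    Assumption of this sketch: "$ref" is a file path relative to the
    Agent Card file.
    """
    agent_card = json.loads(agent_card_path.read_text(encoding="utf-8"))
    alignment = agent_card.get("alignment", {})
    ref = alignment.get("$ref")
    if ref is None:
        return alignment  # inline alignment block, or empty if absent
    target = (agent_card_path.parent / ref).resolve()
    return json.loads(target.read_text(encoding="utf-8"))


# Demo with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "alignment-card.json").write_text(
        json.dumps({"aap_version": "0.1.0"}), encoding="utf-8"
    )
    (root / "agent-card.json").write_text(
        json.dumps(
            {
                "name": "Shopping Assistant",
                "alignment": {"$ref": "./alignment-card.json"},
            }
        ),
        encoding="utf-8",
    )
    loaded = load_alignment_block(root / "agent-card.json")

print(loaded["aap_version"])
```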

With MCP Tools

AAP can trace tool invocations and verify they match declared alignment:

from aap import mcp_traced

@mcp_traced(card_path="alignment-card.json")
def my_tool(params):
    ...

See MCP Migration Guide.

Verification

The verification engine checks whether observed behavior matches declared alignment:

from aap import verify_trace, detect_drift

# Single trace verification
result = verify_trace(trace, card)
print(f"Verified: {result.verified}, Similarity: {result.similarity_score:.2f}")
if not result.verified:
    print(f"Violations: {result.violations}")
if result.warnings:
    print(f"Warnings: {result.warnings}")

# Drift detection over time
alerts = detect_drift(card, recent_traces)
for alert in alerts:
    print(f"Drift detected: {alert.analysis.drift_direction}")

Verification checks:

  • Autonomy compliance (actions within declared bounds)
  • Escalation compliance (required escalations were performed)
  • Value consistency (applied values match declared values)
  • Forbidden action compliance (no forbidden actions taken)
  • Behavioral similarity (semantic alignment using SSM analysis)

Action Name Matching

Bounded and forbidden action lists support flexible matching (v0.1.7+):

{
  "bounded_actions": [
    "search",
    "exec: execute shell commands",
    "cron: manage scheduled patrol jobs"
  ]
}

  • Exact match: "search" matches action name "search"
  • Colon-prefix match: "exec: execute shell commands" matches action name "exec"
  • Compound actions: action name "exec, read" matches when each component is in the bounded list

This allows descriptive entries in alignment cards while keeping verification flexible.
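
The three matching rules can be sketched in a few lines. This is an illustration of the rules as described above, not the SDK's implementation: the helper names are hypothetical, and details such as whitespace trimming and case sensitivity are assumptions.

```python
def entry_key(entry: str) -> str:
    """Return the part of a list entry before the first colon.

    "exec: execute shell commands" -> "exec"; "search" -> "search".
    """
    return entry.split(":", 1)[0].strip()


def action_allowed(action_name: str, bounded_actions: list[str]) -> bool:
    """True when every comma-separated component of the action name
    matches the key of some entry in the bounded list."""
    keys = {entry_key(entry) for entry in bounded_actions}
    components = [part.strip() for part in action_name.split(",") if part.strip()]
    return bool(components) and all(part in keys for part in components)


bounded = [
    "search",
    "exec: execute shell commands",
    "cron: manage scheduled patrol jobs",
]

print(action_allowed("search", bounded))        # exact match
print(action_allowed("exec", bounded))          # colon-prefix match
print(action_allowed("exec, search", bounded))  # compound, both components listed
print(action_allowed("exec, read", bounded))    # compound, "read" is not listed
```

With this list, the first three calls return `True` and the last returns `False`, because `read` has no corresponding entry.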

Similarity scoring: Each verification returns a similarity_score (0.0-1.0) measuring semantic similarity between the trace and declared alignment. If a trace passes structural checks but has similarity_score < 0.50, a low_behavioral_similarity warning is generated.
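
The warning rule reduces to a single comparison. In this sketch the 0.50 threshold comes from the paragraph above, while the function name, the range check, and the choice to emit no similarity warning when structural checks fail are assumptions.

```python
LOW_SIMILARITY_THRESHOLD = 0.50  # threshold stated in the text above


def similarity_warnings(structural_checks_passed: bool, similarity_score: float) -> list[str]:
    """Return the warnings implied by a similarity score.

    Hypothetical helper: a warning is emitted only when structural checks
    passed and the score falls strictly below the threshold.
    """
    if not 0.0 <= similarity_score <= 1.0:
        raise ValueError("similarity_score must be within [0.0, 1.0]")
    if structural_checks_passed and similarity_score < LOW_SIMILARITY_THRESHOLD:
        return ["low_behavioral_similarity"]
    return []


print(similarity_warnings(True, 0.82))
print(similarity_warnings(True, 0.49))
```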

Try It

Interactive Playground — Verify traces in your browser with SSM visualization.

  • Paste your Alignment Card and AP-Trace
  • See verification results with similarity scoring
  • Visualize behavioral patterns with SSM heatmaps
  • Adjust thresholds in real-time

No server required — runs entirely client-side via WebAssembly.

Documentation

| Document       | Description                                |
|----------------|--------------------------------------------|
| SPEC.md        | Full protocol specification (IETF-style)   |
| QUICKSTART.md  | Zero to compliant in 5 minutes             |
| LIMITS.md      | What AAP guarantees and doesn't            |
| SECURITY.md    | Threat model and security considerations   |
| CALIBRATION.md | How verification thresholds were derived   |

Examples

| Example            | Description                    |
|--------------------|--------------------------------|
| simple-agent/      | Minimal AAP implementation     |
| a2a-integration/   | A2A agent with AAP             |
| mcp-integration/   | MCP tools with alignment       |
| alignment-failure/ | Deliberate failure for testing |

Status

Current Version: 0.1.8

| Component              | Status                                |
|------------------------|---------------------------------------|
| Specification          | ✅ Complete                           |
| JSON Schemas           | ✅ Complete                           |
| Python SDK             | ✅ Complete                           |
| TypeScript SDK         | ✅ Complete                           |
| Verification Engine    | ✅ Complete (with similarity scoring) |
| SSM Visualization      | ✅ Complete                           |
| Interactive Playground | ✅ Complete                           |

API Reference

# Core API
from aap import (
    verify_trace,      # Verify single trace against card → VerificationResult
    check_coherence,   # Check value compatibility between agents → CoherenceResult
    detect_drift,      # Detect behavioral drift over time → list[DriftAlert]
    trace_decision,    # Decorator for automatic AP-Trace generation
    mcp_traced,        # Decorator for MCP tool tracing
)

# Models
from aap import (
    AlignmentCard,
    APTrace,
    VerificationResult,  # .verified, .similarity_score, .violations, .warnings
    CoherenceResult,     # .compatible, .score, .value_alignment
    DriftAlert,          # .analysis.similarity_score, .analysis.drift_direction
)

# CLI
# aap init [--values VALUES] [--output FILE]
# aap verify --card CARD --trace TRACE        → Shows [similarity: X.XX]
# aap check-coherence --my-card MINE --their-card THEIRS
# aap drift --card CARD --traces TRACES_DIR   → Uses SSM analysis

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

Key areas where we need help:

  • SDK implementations in other languages
  • Integration examples with popular agent frameworks
  • Test vectors for edge cases
  • Documentation improvements

License

Apache 2.0. See LICENSE for details.


Agent Alignment Protocol — Making agent alignment observable.
