
The Pydantic-based Firewall for MCP Servers. Stops hallucinated tool calls, validates schemas, and sandboxes dangerous operations.

Project description


The Open-Source Firewall for AI Agents

One decorator. Zero trust. Full control.

PyPI version Downloads CI codecov

Python 3.10+ License: MIT GitHub stars PRs Welcome


Get Started in 30 Seconds · Why Airlock? · All Frameworks · Docs



┌────────────────────────────────────────────────────────────────┐
│  🤖 AI Agent: "Let me help clean up disk space..."            │
│                           ↓                                    │
│               rm -rf / --no-preserve-root                      │
│                           ↓                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  🛡️ AIRLOCK: BLOCKED                                     │  │
│  │                                                          │  │
│  │  Reason: Matches denied pattern 'rm_*'                   │  │
│  │  Policy: STRICT_POLICY                                   │  │
│  │  Fix: Use approved cleanup tools only                    │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

🎯 30-Second Quickstart

pip install agent-airlock

from agent_airlock import Airlock

@Airlock()
def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}

# LLM sends amount="500" (string) → BLOCKED with fix_hint
# LLM sends force=True (invented arg) → STRIPPED silently
# LLM sends amount=500 (correct) → EXECUTED safely

That's it. Your function now has ghost argument stripping, strict type validation, and self-healing errors.
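Conceptually, the validation layer behaves like the stdlib-only sketch below: unknown keyword arguments are dropped, and wrong types are rejected with a hint instead of being silently coerced. This is a simplified illustration (the `validate_call` name and the hint wording are ours), not the library's actual implementation.

```python
import inspect

def validate_call(func):
    """Sketch of ghost-arg stripping + strict typing (not Airlock's real code)."""
    sig = inspect.signature(func)
    hints = func.__annotations__

    def wrapper(**kwargs):
        # 1. Strip "ghost" arguments the function never declared.
        known = {k: v for k, v in kwargs.items() if k in sig.parameters}
        # 2. Reject (rather than coerce) wrong types, e.g. "500" for an int.
        for name, value in known.items():
            expected = hints.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{name}: expected {expected.__name__}, got {type(value).__name__}. "
                    f"Fix hint: resend {name} as {expected.__name__}."
                )
        return func(**known)

    return wrapper

@validate_call
def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}

# force=True is an invented arg → stripped; amount=500 is correct → executed.
result = transfer_funds(account="acct-1", amount=500, force=True)
```

The real decorator also returns structured, LLM-readable error payloads so the model can retry with corrected arguments.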


🧠 The Problem No One Talks About

The Hype

"MCP has 16,000+ servers on GitHub!" "OpenAI adopted it!" "Linux Foundation hosts it!"

The Reality

LLMs hallucinate tool calls. Every. Single. Day.

  • Claude invents arguments that don't exist
  • GPT-4 sends "100" when you need 100
  • Agents chain 47 calls before one deletes prod data

Enterprise solutions exist: Prompt Security ($50K/year), Pangea (proxy your data), Cisco ("coming soon").

We built the open-source alternative. One decorator. No vendor lock-in. Your data never leaves your infrastructure.


✨ What You Get

• Ghost Args: strip LLM-invented params
• Strict Types: no silent coercion
• Self-Healing: LLM-friendly errors
• E2B Sandbox: isolated execution
• RBAC: role-based access
• PII Mask: auto-redact secrets
• Network Guard: block data exfiltration
• Path Validation: CVE-resistant traversal
• Circuit Breaker: fault tolerance
• OpenTelemetry: enterprise observability
• Cost Tracking: budget limits
• Vaccination: auto-secure frameworks


🔥 Core Features

🔒 E2B Sandbox Execution

from agent_airlock import Airlock, STRICT_POLICY

@Airlock(sandbox=True, sandbox_required=True, policy=STRICT_POLICY)
def execute_code(code: str) -> str:
    """Runs in an E2B Firecracker MicroVM. Not on your machine."""
    exec(code)
    return "executed"

• Boot time: ~125ms cold, <200ms warm
• Isolation: Firecracker MicroVM
• Fallback: sandbox_required=True blocks local execution

📜 Security Policies

from agent_airlock import (
    PERMISSIVE_POLICY,      # Dev - no restrictions
    STRICT_POLICY,          # Prod - rate limited, agent ID required
    READ_ONLY_POLICY,       # Analytics - query only
    BUSINESS_HOURS_POLICY,  # Dangerous ops 9-5 only
)

# Or build your own:
from agent_airlock import SecurityPolicy

MY_POLICY = SecurityPolicy(
    allowed_tools=["read_*", "query_*"],
    denied_tools=["delete_*", "drop_*", "rm_*"],
    rate_limits={"*": "1000/hour", "write_*": "100/hour"},
    time_restrictions={"deploy_*": "09:00-17:00"},
)
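The wildcard patterns above follow shell-style matching, which the stdlib `fnmatch` module implements directly. The sketch below assumes deny patterns are checked before allow patterns (a common deny-wins convention; confirm the evaluation order in the library docs):

```python
from fnmatch import fnmatch

def is_allowed(tool_name: str, allowed: list[str], denied: list[str]) -> bool:
    """Hypothetical helper: deny patterns win, then the tool must match an allow pattern."""
    if any(fnmatch(tool_name, pat) for pat in denied):
        return False
    return any(fnmatch(tool_name, pat) for pat in allowed)

ALLOWED = ["read_*", "query_*"]
DENIED = ["delete_*", "drop_*", "rm_*"]

is_allowed("read_file", ALLOWED, DENIED)   # matches read_* → allowed
is_allowed("rm_tmp", ALLOWED, DENIED)      # matches rm_* → denied
is_allowed("write_db", ALLOWED, DENIED)    # matches neither list → denied
```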

💰 Cost Control

A runaway agent can burn $500 in API costs before you notice.

from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    max_output_chars=5000,    # Truncate before token explosion
    max_output_tokens=2000,   # Hard limit on response size
)

@Airlock(config=config)
def query_logs(query: str) -> str:
    return massive_log_query(query)  # 10MB → 5KB

ROI: 10MB logs = ~2.5M tokens = $25/response. Truncated = ~1.25K tokens = $0.01. 99.96% savings.
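Character-based truncation is simple to reason about: at roughly 4 characters per token, a 5,000-character cap bounds the response near 1,250 tokens. A minimal sketch of the idea (the function name and truncation marker are illustrative, not the library's):

```python
def truncate_output(text: str, max_chars: int = 5000) -> str:
    """Cap tool output at max_chars, leaving room for a truncation marker."""
    marker = "\n...[truncated]"
    if len(text) <= max_chars:
        return text
    return text[: max_chars - len(marker)] + marker

big_log = "x" * 10_000_000          # a 10 MB log dump
small = truncate_output(big_log)    # exactly 5,000 chars, marker included
```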


🔐 PII & Secret Masking

config = AirlockConfig(
    mask_pii=True,      # SSN, credit cards, phones, emails
    mask_secrets=True,  # API keys, passwords, JWTs
)

@Airlock(config=config)
def get_user(user_id: str) -> dict:
    return db.users.find_one({"id": user_id})

# LLM sees: {"name": "John", "ssn": "[REDACTED]", "api_key": "sk-...XXXX"}

12 PII types detected · 4 masking strategies · Zero data leakage


🌐 Network Airgap (V0.3.0)

Block data exfiltration during tool execution:

from agent_airlock import network_airgap, NO_NETWORK_POLICY

# Block ALL network access
with network_airgap(NO_NETWORK_POLICY):
    result = untrusted_tool()  # Any socket call → NetworkBlockedError

# Or allow specific hosts only
from agent_airlock import NetworkPolicy

INTERNAL_ONLY = NetworkPolicy(
    allow_egress=True,
    allowed_hosts=["api.internal.com", "*.company.local"],
    allowed_ports=[443],
)
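One way to picture the airgap: intercept socket creation for the duration of the call. The context manager below is a crude stdlib-only sketch that only stops well-behaved Python code (robust airgapping needs lower-level controls); the `NetworkBlockedError` name comes from the library, everything else here is ours:

```python
import socket
from contextlib import contextmanager

class NetworkBlockedError(RuntimeError):
    pass

@contextmanager
def no_network():
    """Toy airgap: any attempt to open a socket raises NetworkBlockedError."""
    original = socket.socket

    def blocked(*args, **kwargs):
        raise NetworkBlockedError("network access blocked inside airgap")

    socket.socket = blocked
    try:
        yield
    finally:
        socket.socket = original  # always restore, even on error
```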

💉 Framework Vaccination (V0.3.0)

Secure existing code without changing a single line:

from agent_airlock import vaccinate, STRICT_POLICY

# Before: Your existing LangChain tools are unprotected
vaccinate("langchain", policy=STRICT_POLICY)

# After: ALL @tool decorators now include Airlock security
# No code changes required!

Supported: LangChain, OpenAI Agents SDK, PydanticAI, CrewAI
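Mechanically, vaccination amounts to replacing a framework's decorator with one that injects a security wrapper first, so existing `@tool` definitions pick up protection at registration time. The toy below uses a stand-in module and a minimal ghost-arg guard (all names here are hypothetical; the real `vaccinate()` targets the named frameworks and applies full Airlock policies):

```python
import inspect
import types

# Stand-in "framework" module with a @tool decorator, for illustration only.
framework = types.ModuleType("framework")

def _tool(func):
    func.is_tool = True  # pretend registration
    return func

framework.tool = _tool

def strip_ghost_kwargs(func):
    """Minimal guard: drop keyword args the function doesn't declare."""
    params = set(inspect.signature(func).parameters)

    def wrapper(**kwargs):
        return func(**{k: v for k, v in kwargs.items() if k in params})

    return wrapper

def vaccinate_module(module, guard):
    """Swap the module's decorator for one that wraps every tool in a guard first."""
    original = module.tool
    module.tool = lambda func: original(guard(func))

vaccinate_module(framework, strip_ghost_kwargs)

@framework.tool           # unchanged call site — now secured transparently
def greet(name: str) -> str:
    return f"hi {name}"
```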


⚡ Circuit Breaker (V0.4.0)

Prevent cascading failures with fault tolerance:

from agent_airlock import CircuitBreaker, AGGRESSIVE_BREAKER

breaker = CircuitBreaker("external_api", config=AGGRESSIVE_BREAKER)

@breaker
def call_external_api(query: str) -> dict:
    return external_service.query(query)

# After 5 failures → circuit OPENS → fast-fails for 30s
# Then HALF_OPEN → allows 1 test request → recovers or reopens
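The CLOSED → OPEN → HALF_OPEN cycle described in the comments above fits in a short stdlib sketch. `SimpleBreaker` and its defaults are ours, mirroring the behavior the README describes rather than the library's code:

```python
import time

class SimpleBreaker:
    """Sketch of a circuit breaker: trip after N failures, fast-fail while open,
    then let one test request through after the reset timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit tripped

    def __call__(self, func):
        def wrapper(*args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_timeout:
                    raise RuntimeError("circuit OPEN: fast-failing")
                self.opened_at = None  # HALF_OPEN: allow one test request
            try:
                result = func(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()  # trip the breaker
                raise
            self.failures = 0  # a success closes the circuit again
            return result

        return wrapper
```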

📈 OpenTelemetry Observability (V0.4.0)

Enterprise-grade monitoring:

from agent_airlock import configure_observability, observe

configure_observability(
    service_name="my-agent",
    otlp_endpoint="http://otel-collector:4317",
)

@observe(name="critical_operation")
def process_data(data: dict) -> dict:
    # Automatic span creation, metrics, and audit logging
    return transform(data)

🔌 Framework Compatibility

The Golden Rule: @Airlock must be closest to the function definition.

@framework_decorator    # ← Framework sees the secured function
@Airlock()              # ← Security layer (innermost)
def my_function():      # ← Your code
    ...

LangChain / LangGraph

from langchain_core.tools import tool
from agent_airlock import Airlock

@tool
@Airlock()
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"

OpenAI Agents SDK

from agents import function_tool
from agent_airlock import Airlock

@function_tool
@Airlock()
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: 22°C"

PydanticAI

from pydantic_ai import Agent
from agent_airlock import Airlock

@Airlock()
def get_stock(symbol: str) -> str:
    return f"Stock {symbol}: $150"

agent = Agent("openai:gpt-4o", tools=[get_stock])

CrewAI

from crewai.tools import tool
from agent_airlock import Airlock

@tool
@Airlock()
def search_docs(query: str) -> str:
    """Search internal docs."""
    return f"Found 5 docs for: {query}"
More frameworks: LlamaIndex, AutoGen, smolagents, Anthropic

LlamaIndex

from llama_index.core.tools import FunctionTool
from agent_airlock import Airlock

@Airlock()
def calculate(expression: str) -> int:
    return eval(expression, {"__builtins__": {}})

calc_tool = FunctionTool.from_defaults(fn=calculate)

AutoGen

from autogen import ConversableAgent
from agent_airlock import Airlock

@Airlock()
def analyze_data(dataset: str) -> str:
    return f"Analysis of {dataset}: mean=42.5"

assistant = ConversableAgent(name="analyst", llm_config={"model": "gpt-4o"})
assistant.register_for_llm()(analyze_data)

smolagents

from smolagents import tool
from agent_airlock import Airlock

@tool
@Airlock(sandbox=True)
def run_code(code: str) -> str:
    """Execute in E2B sandbox."""
    exec(code)
    return "Executed"

Anthropic (Direct API)

from agent_airlock import Airlock

@Airlock()
def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C"

# Use in tool handler
def handle_tool_call(name, inputs):
    if name == "get_weather":
        return get_weather(**inputs)  # Airlock validates

Complete Examples

• LangChain: langchain_integration.py (@tool, AgentExecutor)
• LangGraph: langgraph_integration.py (StateGraph, ToolNode)
• OpenAI Agents: openai_agents_sdk_integration.py (handoffs, manager pattern)
• PydanticAI: pydanticai_integration.py (dependencies, structured output)
• LlamaIndex: llamaindex_integration.py (ReActAgent)
• CrewAI: crewai_integration.py (crews, roles)
• AutoGen: autogen_integration.py (ConversableAgent)
• smolagents: smolagents_integration.py (CodeAgent, E2B)
• Anthropic: anthropic_integration.py (direct API)

⚡ FastMCP Integration

from fastmcp import FastMCP
from agent_airlock.mcp import secure_tool, STRICT_POLICY

mcp = FastMCP("production-server")

@secure_tool(mcp, policy=STRICT_POLICY)
def delete_user(user_id: str) -> dict:
    """One decorator: MCP registration + Airlock protection."""
    return db.users.delete(user_id)

🏆 Why Not Enterprise Vendors?

Prompt Security vs. Pangea vs. Agent-Airlock:

• Pricing: $50K+/year · Enterprise · Free forever
• Integration: Proxy gateway · Proxy gateway · One decorator
• Self-Healing: ✅ Agent-Airlock only
• E2B Sandboxing: ✅ Native (Agent-Airlock only)
• Your Data: Their servers · Their servers · Never leaves you
• Source Code: Closed · Closed · MIT Licensed

We're not anti-enterprise. We're anti-gatekeeping. Security for AI agents shouldn't require a procurement process.


📦 Installation

# Core (validation + policies + sanitization)
pip install agent-airlock

# With E2B sandbox support
pip install agent-airlock[sandbox]

# With FastMCP integration
pip install agent-airlock[mcp]

# Everything
pip install agent-airlock[all]

# E2B key for sandbox execution
export E2B_API_KEY="your-key-here"

🛡️ OWASP Compliance

Agent-Airlock mitigates the OWASP Top 10 for LLMs (2025):

• LLM01 (Prompt Injection): strict type validation blocks injected payloads
• LLM02 (Sensitive Data Disclosure): network airgap prevents data exfiltration
• LLM05 (Improper Output Handling): PII/secret masking sanitizes outputs
• LLM06 (Excessive Agency): rate limits + RBAC + capability gating prevent runaway agents
• LLM07 (System Prompt Leakage): honeypot returns fake data instead of errors
• LLM09 (Misinformation): ghost-argument rejection blocks hallucinated params

🏢 Used By

Agent-Airlock secures AI agent systems in production:

• Attri.ai: multi-agent orchestration platform — governance & security layer
• FerrumDeck: AgentOps control plane — deny-by-default tool execution
• Mnemo: MCP-native memory database — secure tool call validation

Using Agent-Airlock in production? Open a PR to add your project!


📊 Performance

• Tests: 1,157 passing
• Coverage: 79%+ (enforced in CI)
• Lines of code: ~25,900
• Validation overhead: <50ms
• Sandbox cold start: ~125ms
• Sandbox warm pool: <200ms
• Framework integrations: 9
• Core dependencies: Pydantic only

📖 Documentation

• Examples: 9 framework integrations with copy-paste code
• Security Guide: production deployment checklist
• API Reference: every function, every parameter

👤 About

Built by Sattyam Jain — AI infrastructure engineer.

This started as an internal tool after watching an agent hallucinate its way through a production database. Now it's yours.


🤝 Contributing

We review every PR within 48 hours.

git clone https://github.com/sattyamjjain/agent-airlock
cd agent-airlock
pip install -e ".[dev]"
pytest tests/ -v

💖 Support

If Agent-Airlock saved your production database:

  • Star this repo — Helps others discover it
  • 🐛 Report bugs — Open an issue
  • 📣 Spread the word — Tweet, blog, share



Built with 🛡️ by Sattyam Jain

Making AI agents safe, one decorator at a time.

GitHub Twitter


Sources: This README follows best practices from awesome-readme, Best-README-Template, and the GitHub Blog.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agent_airlock-0.4.1.tar.gz (115.8 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agent_airlock-0.4.1-py3-none-any.whl (138.1 kB)

Uploaded Python 3

File details

Details for the file agent_airlock-0.4.1.tar.gz.

File metadata

  • Download URL: agent_airlock-0.4.1.tar.gz
  • Upload date:
  • Size: 115.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_airlock-0.4.1.tar.gz
Algorithm Hash digest
SHA256 a268610e4797362993cc2de96f32d0fd3d48a00b7627f3ac00452f954b8dff5d
MD5 e0c03cb0208e0718b447c2105b74e1fe
BLAKE2b-256 6b474451e51511d6a337fe802a5ebabc4f2951b3e68f1b2f9663f995fca95f00

See more details on using hashes here.

File details

Details for the file agent_airlock-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: agent_airlock-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 138.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_airlock-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a4482532f56a6123ae3e60a798a2ecb09d7fe6bb787d7b69cb0d43893f57da56
MD5 3793a901e01b021496ebcc5b0db49788
BLAKE2b-256 d208b63824b05885eb326f5c9372b69039f75545899ee6968f20c1eb20c48d65

See more details on using hashes here.
