agent-airlock

A type-checker for AI tool calls — strict argument validation, ghost-argument stripping, and self-healing retries for MCP servers and agent frameworks.

A type-checker for AI tool calls

Strict validation, ghost-argument stripping, and self-healing retries — one decorator, any agent or MCP server.

Test suite: 2,510 tests · Coverage: 83.42% · v0.8.5

┌────────────────────────────────────────────────────────────────┐
│  🤖 AI Agent: "Let me help clean up disk space..."            │
│                           ↓                                    │
│               rm -rf / --no-preserve-root                      │
│                           ↓                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  🛡️ AIRLOCK: BLOCKED                                     │  │
│  │                                                          │  │
│  │  Reason: Matches denied pattern 'rm_*'                   │  │
│  │  Policy: STRICT_POLICY                                   │  │
│  │  Fix: Use approved cleanup tools only                    │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

🎯 30-Second Quickstart

pip install agent-airlock

from agent_airlock import Airlock

@Airlock()
def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}

# LLM sends amount="500" (string) → BLOCKED with fix_hint
# LLM sends force=True (invented arg) → STRIPPED silently
# LLM sends amount=500 (correct) → EXECUTED safely

That's it. Your function now has ghost argument stripping, strict type validation, and self-healing errors.
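
To make the three outcomes above concrete, here is a minimal standard-library sketch of the same idea: drop keyword arguments the function does not declare, then reject values whose type does not match the annotation instead of coercing them. It is an illustration only, not Airlock's implementation; `strict_call` and the shape of the returned error dict are hypothetical.

```python
import inspect
from functools import wraps


def strict_call(func):
    """Strip undeclared kwargs, then type-check the rest without coercion."""
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(**kwargs):
        # Ghost-argument stripping: keep only parameters the function declares.
        known = {k: v for k, v in kwargs.items() if k in sig.parameters}
        for name, value in known.items():
            expected = sig.parameters[name].annotation
            if expected is inspect.Parameter.empty:
                continue
            # Strict check: "500" is rejected where int is declared.
            # bool is a subclass of int, so it is rejected for int fields too.
            wrong_type = not isinstance(value, expected)
            bool_as_int = expected is int and isinstance(value, bool)
            if wrong_type or bool_as_int:
                return {
                    "status": "blocked",
                    "fix_hint": f"'{name}' must be {expected.__name__}, "
                                f"got {type(value).__name__}",
                }
        return func(**known)

    return wrapper


@strict_call
def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}
```

Calling `transfer_funds(account="acc-1", amount="500")` returns the blocked dict with a hint, while an extra `force=True` is dropped before the function body runs.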

🧠 The Problem No One Talks About

The Hype

"MCP has 16,000+ servers on GitHub!" "OpenAI adopted it!" "Linux Foundation hosts it!"

The Reality

LLMs hallucinate tool calls. Every. Single. Day.

Claude invents arguments that don't exist
GPT-4 sends "100" when you need 100
Agents chain 47 calls before one deletes prod data

Enterprise solutions exist: Prompt Security ($50K/year), Pangea (proxy your data), Cisco ("coming soon").

We built the open-source alternative. One decorator. No vendor lock-in. Your data never leaves your infrastructure.

✨ What You Get

Feature	What it does
Ghost Args	Strip LLM-invented params
Strict Types	No silent coercion
Self-Healing	LLM-friendly errors
E2B Sandbox	Isolated execution
RBAC	Role-based access
PII Mask	Auto-redact secrets
Network Guard	Block data exfiltration
Path Validation	CVE-resistant traversal
Circuit Breaker	Fault tolerance
OpenTelemetry	Enterprise observability
Cost Tracking	Budget limits
Vaccination	Auto-secure frameworks

📋 Table of Contents

30-Second Quickstart
The Problem
What You Get
Core Features
Framework Compatibility
FastMCP Integration
Comparison
Installation
OWASP Compliance
Performance
Documentation
Contributing
Support

🔥 Core Features

🔒 E2B Sandbox Execution

from agent_airlock import Airlock, STRICT_POLICY

@Airlock(sandbox=True, sandbox_required=True, policy=STRICT_POLICY)
def execute_code(code: str) -> str:
    """Runs in an E2B Firecracker MicroVM. Not on your machine."""
    exec(code)
    return "executed"

Feature	Value
Boot time	~125ms cold, <200ms warm
Isolation	Firecracker MicroVM
Fallback	`sandbox_required=True` blocks local execution

Air-gapped / on-prem? DockerBackend is the supported alternative — cap_drop=["ALL"], no-new-privileges, network_mode="none", timeout enforced, opt-in pytest -m docker integration tests. See docs/sandbox/docker.md.

ModalBackend — Modal-hosted sandbox (v0.8.11+, issue #30)

Already running the rest of your agent on Modal? ModalBackend lets you keep airlocked tool execution on the same substrate instead of mixing E2B and Modal billing / observability.

pip install "agent-airlock[modal]"

from agent_airlock import Airlock, STRICT_POLICY, AirlockConfig
from agent_airlock.sandbox_backend import ModalBackend

backend = ModalBackend(
    app_name="my-airlock-sandbox",
    image_ref="python:3.11-slim",
    cpu=0.5,
    memory_mb=512,
    timeout_s=30,
    # network_policy=None  → block_network=True (fail-closed default)
)

@Airlock(sandbox=True, sandbox_required=True, policy=STRICT_POLICY,
         config=AirlockConfig(sandbox_backend=backend))
def execute_code(code: str) -> str:
    exec(code)
    return "executed"

Isolation model — read before you reach for cap_drop. Modal sandboxes run under gVisor (kernel-syscall filtering), not under Docker-style capability dropping. The Modal Python SDK does not expose cap_drop / cap_add / seccomp / no-new-privileges — there is no equivalent knob to map. If your threat model needs Linux-capability dropping at the container layer, keep using DockerBackend. The network posture is configurable: ModalBackend defaults to block_network=True (deny-by-default), and a supplied NetworkPolicy maps to Modal's block_network flag (allow_egress=False → blocked, True → allowed). Hostname allowlists in NetworkPolicy.allowed_hosts do not forward to Modal (their API is CIDR-only); the backend logs a structlog warning and the operator is expected to re-state hostname constraints at the Airlock policy layer.

ModalBackend is opt-in only — it is NOT added to the get_default_backend() priority chain (E2B → Docker → Local stays the default flow). Existing callers see no behavior change.

📜 Security Policies

Preset	Use case	Key posture
`PERMISSIVE_POLICY`	Dev / sandbox	No restrictions
`STRICT_POLICY`	Prod	Rate-limited, requires agent identity, denies dangerous capabilities
`READ_ONLY_POLICY`	Analytics / RAG	`read_` / `get_` / `list_` / `search_` only
`BUSINESS_HOURS_POLICY`	Compliance windows	`delete_` / `drop_` / `*_production` only 09:00–17:00
`CAMOUFLAGE_RESISTANT_POLICY` (v0.8.6)	Detector-independent defense vs. domain-camouflaged injection	Deny-by-default allowlist, ghost-arg BLOCK, output cap, per-call reauthorization

from agent_airlock import (
    PERMISSIVE_POLICY,
    STRICT_POLICY,
    READ_ONLY_POLICY,
    BUSINESS_HOURS_POLICY,
    CAMOUFLAGE_RESISTANT_POLICY,  # v0.8.6
)

# Or build your own:
from agent_airlock import SecurityPolicy

MY_POLICY = SecurityPolicy(
    allowed_tools=["read_*", "query_*"],
    denied_tools=["delete_*", "drop_*", "rm_*"],
    rate_limits={"*": "1000/hour", "write_*": "100/hour"},
    time_restrictions={"deploy_*": "09:00-17:00"},
)
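
The glob patterns in `allowed_tools` and `denied_tools` can be reasoned about with the standard `fnmatch` module. The sketch below is a hypothetical helper, not the library's `SecurityPolicy` code; it assumes deny patterns win over allow patterns and that a tool matching no allow pattern is rejected.

```python
from fnmatch import fnmatchcase


def is_tool_allowed(tool_name, allowed_tools, denied_tools):
    """Return True if the tool matches an allow pattern and no deny pattern."""
    # Assumption: deny patterns take precedence over allow patterns.
    if any(fnmatchcase(tool_name, pattern) for pattern in denied_tools):
        return False
    # Assumption: anything not explicitly allowed is rejected.
    return any(fnmatchcase(tool_name, pattern) for pattern in allowed_tools)


ALLOWED = ["read_*", "query_*"]
DENIED = ["delete_*", "drop_*", "rm_*"]
```

With these lists, `read_file` is admitted, `rm_rf` is denied by pattern, and `deploy_app` is denied because nothing allows it.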

CAMOUFLAGE_RESISTANT — detector-independent injection defense (v0.8.6)

arXiv:2605.22001 ("Blind Spots in the Guard", Pai, May 2026) shows that production injection detectors — Llama Guard 3 included — drop to IDR = 0.000 on payloads that mimic the target document's domain vocabulary and authority structure. Per the paper, detection rates collapse from 93.8% to 9.7% on Llama 3.1 8B and from 100% to 55.6% on Gemini 2.0 Flash.

CAMOUFLAGE_RESISTANT_POLICY does not rely on payload-content signatures at all. It blocks at four structural seams an attacker has to ride regardless of phrasing:

Deny-by-default tool allowlist. Empty allowed_tools means nothing is callable; deployments opt every tool in by name. A camouflaged directive targeting an unlisted tool is blocked on allowlist grounds without ever invoking a detector.
Ghost-argument BLOCK. A camouflaged directive cannot smuggle undeclared parameters past validation.
Hard output cap + sanitization. Tool output that re-enters the model context is truncated and PII/secret-masked so a camouflaged directive embedded in tool output can't carry into a downstream agent at full length.
Per-call reauthorization (debate-amplification guard). Once a tool's output has flowed back into the model, any reinvocation requires an explicit context.authorize_once(tool) grant from the harness — breaking the multi-agent fan-out path the paper identifies.

from agent_airlock import Airlock, apply_camouflage_resistant

bundle = apply_camouflage_resistant(allowed_tools=["read_file", "search"])

@Airlock(config=bundle.config, policy=bundle.policy)
def read_file(path: str) -> str:
    ...

apply_camouflage_resistant() composes the matching AirlockConfig (unknown-args BLOCK, sanitization on, output cap 4000 chars) with a SecurityPolicy carrying your explicit allowlist. The preset is deliberately incomplete on its own — the config-level knobs and the policy-level knobs span two seams, so the factory returns both as a CamouflageResistantBundle.

Running an MCP server with STDIO transport? Also wire the Ox MCP STDIO sanitizer via stdio_guard_ox_defaults() — it blocks the entire CVE-2026-30616 class (shell metacharacter injection, non-allowlisted binaries, Trojan-Source RTL overrides, and inline-code flags) before subprocess.Popen.

🪪 MCP server attestation (v0.8.10)

arXiv:2605.24248 ("Attested Tool-Server Admission", Metere, May 2026) calls out a gap MCP itself does not close: the protocol standardises message exchange between LLM agents and tool servers but says nothing about trust. Anybody who can answer on the wire can declare themselves a tool server.

mcp_attested_admission_defaults() is a deny-by-default opt-in preset that closes the gap host-side, mirroring the paper's three additive mechanisms:

Offline-signed clearance assertion. Before any tool from an MCP server is dispatched, the host fetches a JWS-compact clearance from {server_url}/.well-known/mcp-clearance (path is configurable) and verifies its signature against an operator-pinned trust root. The trust root is supplied to AttestedAdmissionConfig at process startup — never network-fetched on the hot path.
Deny-by-default per-server tool allowlist. Admitting a server is not the same as trusting its every tool. The verified clearance carries an explicit list of tool names the host will permit; everything else is denied. The sub claim is matched against the server identity the host is about to dispatch to (so a stolen clearance from server A can't admit a tool call to server B).
Flavor-gated enforcement. ENFORCE (default) hard-denies on missing / invalid / expired clearance; WARN logs and admits — the staged turn-up an operator wants when introducing the gate against real traffic.

Every admission decision emits a ReceiptVerdict on the guard="mcp_attested_admission" channel, so the existing airlock attest DSSE pipeline picks decisions up unchanged — this preset does not invent a new log.

from agent_airlock.mcp_proxy_guard import MCPProxyConfig, MCPProxyGuard
from agent_airlock.mcp_spec.attested_admission import TrustRoot
from agent_airlock.policy_presets import mcp_attested_admission_defaults

# Operator pins the trust root at startup. Never fetched at runtime.
with open("/etc/airlock/mcp-clearance-root.pem", "rb") as fh:
    pinned_pem = fh.read()

cfg = mcp_attested_admission_defaults(
    trust_root=TrustRoot(key_id="ops-2026Q2", ed25519_pem=pinned_pem),
    enforcement_mode="ENFORCE",       # deny-by-default
    max_clearance_age_days=30,
)
guard = MCPProxyGuard(MCPProxyConfig(attested_admission=cfg))

decision = guard.audit_tool_admission(
    server_url="https://mcp.example.com",
    server_id="srv-alpha",            # expected `sub` claim
    tool_name="read",
)
if not decision.admitted:
    raise RuntimeError(decision.reason)

Signature verification needs the [attested] extra (pulls in cryptography for offline Ed25519 / RSA-PSS / JWKS verification); the base install stays zero-runtime-dep.

Install with pip install "agent-airlock[attested]". Opt-in only — existing callers that don't set attested_admission get exactly v0.8.9 behavior.
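
Once the signature on a clearance has been verified, the remaining checks described in this section are plain claim comparisons. The sketch below shows that step in isolation, under stated assumptions: the `sub` claim comes from the text above, while the `iat` and `tools` claim names and the function itself are hypothetical and may not match the library's clearance format.

```python
from datetime import datetime, timedelta, timezone


def check_clearance_claims(claims, server_id, tool_name, now, max_age_days=30):
    """Return (admitted, reason) for an already signature-verified clearance."""
    # Bind the clearance to the server we are about to dispatch to.
    if claims.get("sub") != server_id:
        return False, "sub claim does not match server identity"
    # Reject stale clearances (hypothetical `iat` claim, seconds since epoch).
    issued = datetime.fromtimestamp(claims["iat"], tz=timezone.utc)
    if now - issued > timedelta(days=max_age_days):
        return False, "clearance expired"
    # Deny-by-default: only tools named in the clearance are admitted.
    if tool_name not in claims.get("tools", []):
        return False, f"tool '{tool_name}' not in clearance allowlist"
    return True, "admitted"
```

The `sub` comparison is what stops a clearance issued for one server from admitting a call to another.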

🧭 Behavioral sequence guard (v0.8.12)

Watches the ordered stream of tool calls in a session and flags divergence from a declared expected order — not the model's stated reasoning trace.

arXiv:2605.27901 ("The Fragility of Chain-of-Thought Monitoring", Onyame, Zhou, Thopalli, Kailkhura & Agarwal, May 2026) reports an average 95.9% CoT unfaithfulness across 8B–120B-parameter models — including answer-switching, post-hoc rationalisation, and procedural exploitation of hints. Trusting the model's stated reasoning to detect misbehavior is therefore not viable. Trusting its behavior — the sequence of tools it actually invokes — is.

SequenceGuard is an opt-in field on SecurityPolicy that runs in the @Airlock seam right after the standard policy check, in two modes:

DECLARED mode — operator supplies a permitted-transition DAG. Any transition not in the DAG is a SequenceViolation. Deny-by-default.

from agent_airlock import Airlock, SecurityPolicy
from agent_airlock.sequence_guard import SequenceGuard, ENTRY_SENTINEL

policy = SecurityPolicy(
    sequence_guard=SequenceGuard(
        mode="declared",
        action="block",                       # or "warn"
        dag={
            ENTRY_SENTINEL: {"read"},         # only `read` may start a session
            "read": {"read", "summarize"},    # after read, either re-read or summarize
            "summarize": {"send"},            # after summarize, only send
            "send": set(),                    # send is terminal
        },
    ),
)
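
The DECLARED check reduces to a walk over the permitted-transition map. A minimal sketch, assuming a session is just an ordered list of tool names; `ENTRY` is a hypothetical stand-in for the library's `ENTRY_SENTINEL`, and `first_violation` is not part of the package.

```python
ENTRY = "__entry__"  # hypothetical stand-in for ENTRY_SENTINEL


def first_violation(dag, calls):
    """Return the first (prev, curr) transition the map does not permit, or None."""
    prev = ENTRY
    for curr in calls:
        # Deny-by-default: a tool missing from the map permits no successors.
        if curr not in dag.get(prev, set()):
            return (prev, curr)
        prev = curr
    return None


DAG = {
    ENTRY: {"read"},
    "read": {"read", "summarize"},
    "summarize": {"send"},
    "send": set(),
}
```

`first_violation(DAG, ["read", "send"])` returns `("read", "send")`, because `send` is only reachable from `summarize`.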

BASELINE mode — guard maintains a per-session-key Markov transition profile in a local JSON file (no cloud, no PII — only tool names and SHA-256 shape hashes of (arg types, kwarg names+types), never argument values) and flags transitions with observed P(curr | prev) < threshold once the sample size from prev reaches min_baseline_samples.

from pathlib import Path
from agent_airlock import SecurityPolicy
from agent_airlock.sequence_guard import SequenceGuard

policy = SecurityPolicy(
    sequence_guard=SequenceGuard(
        mode="baseline",
        baseline_path=Path("/var/lib/airlock/sequence-baseline.json"),
        low_probability_threshold=0.05,   # flag transitions with P(curr | prev) < 0.05
        min_baseline_samples=50,          # don't flag until 50 observations from `prev`
    ),
)
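
The BASELINE rule is a first-order transition frequency check. The following sketch keeps the counts in memory and applies the same two parameters; it is an illustration under that simplification (no JSON persistence, no shape hashes), and `TransitionProfile` is a hypothetical name.

```python
from collections import Counter, defaultdict


class TransitionProfile:
    """Counts observed tool-to-tool transitions and flags rare ones."""

    def __init__(self, threshold=0.05, min_samples=50):
        self.threshold = threshold
        self.min_samples = min_samples
        self.counts = defaultdict(Counter)  # prev -> Counter of curr

    def observe(self, prev, curr):
        self.counts[prev][curr] += 1

    def is_flagged(self, prev, curr):
        total = sum(self.counts[prev].values())
        # Too little evidence from `prev`: never flag.
        if total < self.min_samples:
            return False
        # Observed P(curr | prev) compared against the threshold.
        return self.counts[prev][curr] / total < self.threshold
```

After 99 `read -> summarize` observations and one `read -> delete`, the latter has an observed probability of 0.01 and is flagged at the 0.05 threshold.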

Every flagged transition emits OTel span attributes on the current span (airlock.sequence_guard.mode, .from_tool, .to_tool, .session_key, .observed_probability) via the existing observability provider — telemetry failures are swallowed so they cannot break enforcement.

This is not AnomalyDetector, which tracks rate, endpoint diversity, error rate, and consecutive blocked calls over sliding windows. SequenceGuard is a per-transition ORDER signal. Run both for layered coverage. It is also not a chain-of-thought monitor, by construction.

Strictly opt-in. The new SecurityPolicy.sequence_guard field defaults to None; callers that don't set it get exactly v0.8.11 behavior. Zero new runtime deps — Pydantic-only core stays intact.

🛑 Action-time contradiction gate (v0.8.15)

arXiv:2605.27157 ("Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs", Yu et al., 2026) shows that LLMs readily acknowledge contradictory evidence in their reasoning trace yet "this awareness fails to constrain their final recommendations". The deficit is at action selection — single-turn diagnostics overestimate RAG safety, and detection alone is not a control.

ActionContradictionGate is an opt-in policy hook that wraps three pluggable detectors (any one trips) and a privileged-sink glob set. When a detector trips AND the dispatched tool matches a privileged sink AND the harness has not issued an explicit allow, the gate blocks the call (or warns, depending on action=).

The explicit-allow primitive is not new — the gate reuses the existing AirlockContext.authorize_once(tool_name) (introduced for the v0.8.6 reauth flow). Same one-shot grant, same semantics. After a one-shot is consumed the gate re-locks — the harness must mint a fresh authorize_once for each privileged action.

import re
from agent_airlock import Airlock, SecurityPolicy
from agent_airlock.action_contradiction_gate import ActionContradictionGate

policy = SecurityPolicy(
    action_contradiction_gate=ActionContradictionGate(
        # Detector 1: a boolean flag the RAG pipeline flips on after
        # it sees an evidence-vs-claim conflict the agent discussed.
        signal_field_key="evidence_contradiction",
        # Detector 2: pluggable regex against the SAME key when its
        # value is a string (operator-controlled marker — never the
        # model's full reasoning trace).
        marker_regex=re.compile(r"contradict|conflict|disagree", re.I),
        # Detector 3: fully pluggable callable; receives the context.
        # predicate=lambda ctx: ctx.metadata.get("conflict_count", 0) > 1,
        # Default privileged sinks: send_* / export_* / commit_* /
        # transfer_* / delete_* + the v0.8.14 outbound-integration set.
        # Operators can narrow via `privileged_sinks=(...)`.
        action="block",  # or "warn" for staged turn-up
    ),
)
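
The decision rule (detector tripped AND privileged sink AND no explicit allow) together with the one-shot grant can be sketched in a few lines. This is a simplified model with a single boolean detector; `OneShotGate` and its default sink list are hypothetical and only mirror the behaviour described above.

```python
from fnmatch import fnmatchcase


class OneShotGate:
    """Blocks privileged tools while a contradiction signal is set, unless granted."""

    def __init__(self, privileged_sinks=("send_*", "delete_*", "transfer_*")):
        self.privileged_sinks = privileged_sinks
        self._grants = set()

    def authorize_once(self, tool_name):
        self._grants.add(tool_name)

    def admit(self, tool_name, metadata):
        tripped = bool(metadata.get("evidence_contradiction"))
        privileged = any(fnmatchcase(tool_name, p) for p in self.privileged_sinks)
        if not (tripped and privileged):
            return True
        if tool_name in self._grants:
            self._grants.discard(tool_name)  # consume the grant, then re-lock
            return True
        return False
```

A second privileged call after a consumed grant is blocked again, which is the re-lock behaviour the text describes.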

Off-by-default invariant. SecurityPolicy.action_contradiction_gate defaults to None; non-RAG flows pay zero false-positive tax (no detector runs, no log lines, no metadata reads). Even when wired, the gate is inert until at least one detector slot is configured — so a partial roll-out (gate attached but detectors flipped off) admits everything.

Not a chain-of-thought monitor. The gate reads operator-controlled signals only (a metadata field, an operator regex, an operator predicate). It never reads the model's own claim that it has or has not noticed a contradiction; the paper's whole point is that those claims do not gate behavior.

Not sequence_guard (v0.8.12) — that flags unusual call ORDER. Not reauth_on_untrusted_reinvocation (v0.8.6) — that's count-driven on a per-tool counter. This gate is signal-driven and targets a specific privileged-sink glob set. They compose; run all three for layered coverage.

Strictly opt-in. Zero new runtime deps — Pydantic-only core stays intact. The new SecurityPolicy.action_contradiction_gate field defaults to None; callers that don't set it get exactly v0.8.14 behavior.

📊 Adversarial-negotiation regression harness (v0.8.17)

A deterministic harness that measures what the deny-by-default governance layer does to a fixed set of adversarial buyer-seller negotiation actions — and reports two metrics named to line up with an external published baseline so the numbers can sit side by side.

python -m agent_airlock.cli.negotiation_bench --report markdown

Each scenario carries a concrete, checkable unsafe action and runs twice — baseline (no airlock, the unsafe event lands) and governed (the same action through the real @Airlock intercept-before-execute path, no policy-layer mocking). Three unsafe-action classes each exercise a different real interception mechanism: price-below-floor → Pydantic strict-validation, secret-leak → the output sanitizer, transfer-outside-policy → deny-by-default SecurityPolicy. Benign deals are included to confirm governance does not over-block.

source	unsafe_execution_rate (base → governed)	valid_task_success_rate (base → governed)
agent-airlock (this harness)	100% → 0%	43% → 100%
OCL (external, live LLMs, arXiv:2606.04306)	88% → ~0%	12% → 96%

The OCL row is an external result, not agent-airlock's. It was measured on live frontier LLM agents in AgenticPay-adapted negotiation (OCL, arXiv:2606.04306; AgenticPay, arXiv:2602.06008) and is reproduced here only for directional comparison — both put governance at the execution boundary. It is not the same experiment: agent-airlock is a deterministic execution-boundary validator, not an LLM, and this harness does not call a model. The agent-airlock rows are a property of the policy layer under a worst-case scripted adversary, exercised through the real @Airlock path.

The harness doubles as a regression gate: --fail-if-governed-unsafe exits non-zero if the governed unsafe_execution_rate ever rises above zero, so a future change that weakens the policy layer fails CI. Zero new runtime deps; fully deterministic (no randomness, no network, no model call).

🔎 Privilege right-sizing — `airlock-explain --unused-scopes` (v0.8.13)

A read-only CLI that surfaces over-permissioning: it diffs the SecurityPolicy's granted tool scopes against the tools the agent actually called (from an OTLP export OR a native audit JSONL), per AgentIdentity, and prints the dead-weight set plus a suggested tightened allow-list.

# Install the v0.8.13 wheel; airlock-explain becomes available
pip install "agent-airlock>=0.8.13"

# Diff granted vs used; print a table
airlock-explain --unused-scopes \
    --policy ./security-policy.toml \
    --trace  ./agent.audit.jsonl

# Same, machine-readable, plus a proposed tightened policy preview
airlock-explain --unused-scopes \
    --policy ./security-policy.toml \
    --trace  ./otel-export.json \
    --format json \
    --suggest-policy

Observability-only. This command never mutates the SecurityPolicy, never writes the policy file, and never auto-applies the suggestion. The deny-by-default posture is unchanged — the right-size CLI is a review aid, not an enforcement primitive. The --suggest-policy output is intentionally a stdout preview so a human reviews the tightened allow-list before adopting it by hand.

Trace formats (auto-detected by inspecting the file head):

Audit JSONL: the format AuditLogger already emits. One JSON object per line, with tool_name / agent_id / blocked. Blocked calls are excluded from the "actually called" set, because a blocked call is not an exercise of a granted scope.
OTLP JSON: the format opentelemetry-exporter-otlp writes. Span name is the tool name; attributes.agent_id keys the per-agent diff. If a span carries airlock.blocked=true it is skipped, same as JSONL.

Diff semantics. The matcher is fnmatch — the same glob semantics SecurityPolicy.check_tool_allowed uses internally, so the suggested tightened allow-list admits exactly the tools the agent was observed calling (no surprises at adoption time). Denied-list patterns are forwarded unchanged to the suggestion: denials are intent, not usage data.
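
The granted-versus-used diff can be pictured with `fnmatch` directly. A rough sketch, assuming the granted scopes are a flat list of glob patterns and the trace has already been reduced to a list of non-blocked tool names; both helper functions are hypothetical.

```python
from fnmatch import fnmatchcase


def unused_scopes(granted_patterns, called_tools):
    """Return granted glob patterns that no observed tool call matched."""
    return [
        pattern
        for pattern in granted_patterns
        if not any(fnmatchcase(tool, pattern) for tool in called_tools)
    ]


def tightened_allowlist(granted_patterns, called_tools):
    """Suggest an allow-list naming only the observed tools that were granted."""
    return sorted(
        tool
        for tool in set(called_tools)
        if any(fnmatchcase(tool, pattern) for pattern in granted_patterns)
    )
```

For grants `["read_*", "query_*", "export_*"]` and observed calls to `read_file` and `query_db`, the dead-weight set is `["export_*"]`.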

Strictly observability. No new runtime deps. The new console-script entry airlock-explain is the project's first installable CLI; existing python -m agent_airlock.cli.<name> invocations are unaffected.

🩹 Skill-resistant trace redaction + watermark (v0.8.24)

Why traces are an extraction surface: an agent's emitted trace/receipt is a distillation target, not just an audit artifact. A trace that records the tuned thresholds a policy fired on, the exact tool-call arguments, and the recovered intermediate formulas/strategies hands a competitor the recipe — enough to clone the behaviour without paying for the search that found it. The verifier, by contrast, needs only the evidence (the gate ran / the policy fired / pass-fail), never the recipe. This is the RedAct-style threat model — a composition of published behavioural-watermarking work (Agent Guide, arXiv:2504.05871; CoTGuard, arXiv:2505.19405; Distilling the Thought, arXiv:2601.05144). agent-airlock does not reproduce any paper's benchmark.

TraceRedactionPolicy (opt-in, OFF by default for backward compat, ON under STRICT_POLICY) runs at the non-local sink (e.g. the OTel exporter): it (a) localizes protected fields with a configurable field-classifier (tuned thresholds, tool-call args, recovered formulas/strategies), (b) rewrites them to keep verifier-critical evidence while dropping the recipe, and (c) embeds a per-tenant behavioural watermark so a leaked trace is provably yours. Detect it with airlock trace verify-watermark <trace.json> (cryptographic keyed-HMAC match → high true-detection, low false-alarm); add --redaction-report to see what was localized / rewritten / preserved. Stdlib-only — no new runtime dependency.

from agent_airlock import TraceRedactionPolicy, trace_redact, verify_watermark

pol = TraceRedactionPolicy(enabled=True, tenant_id="acme-co", watermark_secret="...")
redacted, report = trace_redact(trace, pol)   # tuned_threshold → evidence stub; recipe dropped
assert verify_watermark(redacted, pol).detected   # provably yours
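
The keyed-HMAC match mentioned above can be illustrated with the standard library. Note the simplification: this sketch attaches an explicit tag field, whereas the text describes a behavioural watermark, so treat it as a demonstration of the keyed-HMAC check only. The `_watermark` field name, the JSON canonicalization, and both function names are assumptions.

```python
import hashlib
import hmac
import json


def embed_watermark(trace, tenant_id, secret):
    """Return a copy of the trace with a keyed-HMAC tag over its canonical JSON."""
    body = {k: v for k, v in trace.items() if k != "_watermark"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    message = tenant_id.encode() + b"|" + canonical
    tag = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {**body, "_watermark": tag}


def verify_tag(trace, tenant_id, secret):
    """Recompute the tag and compare it in constant time."""
    expected = embed_watermark(trace, tenant_id, secret)["_watermark"]
    return hmac.compare_digest(trace.get("_watermark", ""), expected)
```

Without the secret, a third party can neither forge a matching tag nor re-tag an edited trace, which is where the low false-alarm property comes from.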

✅ Fail-closed terminal-claim guard — `no_false_success` (v0.8.25)

An honest stall is recoverable; a confident wrong done is not. The dominant failure mode of unattended long-horizon agents isn't crashing — it's confidently reporting success they never verified (Goal-Autopilot, arXiv:2606.11688). The no_false_success preset enforces that paper's No-False-Success floor: a terminal/done claim is admitted only if a named, falsifiable check actually executed and passed THIS run. No receipt, a failed check, or a forged/replayed receipt → the guard fails closed to a recoverable honest stall (run the named check and retry), never a fabricated success.

Forgery resistance is structural: the guard mints a per-run token and only trusts a receipt it stamped by executing the check this run — a receipt that's merely present (hand-built, or replayed from a prior run) is rejected. Opt in per-agent with AirlockConfig(require_done_receipt=True) (or require_done_receipt = true under [airlock] in airlock.toml); OFF by default. Stdlib-only — no new runtime dependency.

from agent_airlock import no_false_success_defaults, NoFalseSuccessStall

preset = no_false_success_defaults({"tests_green": run_pytest})  # falsifiable check
preset["guard"].run("tests_green")        # actually execute the check this run
preset["check"]("tests_green")            # raises NoFalseSuccessStall unless it passed
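
The structural forgery resistance can be sketched as a guard that only trusts receipts it stamped itself. This is a minimal model, not the package's implementation: `DoneGuard` is hypothetical, and it returns a boolean where the library raises `NoFalseSuccessStall`.

```python
import secrets


class DoneGuard:
    """Admits a done claim only for a passing receipt minted during this run."""

    def __init__(self, checks):
        self._checks = checks                    # name -> zero-arg callable returning bool
        self._run_token = secrets.token_hex(16)  # fresh for every guard instance (one per run)
        self._receipts = {}

    def run(self, name):
        """Execute the named check and stamp a receipt with this run's token."""
        passed = bool(self._checks[name]())
        receipt = {"check": name, "token": self._run_token, "passed": passed}
        self._receipts[name] = receipt
        return dict(receipt)

    def admits_done(self, name, receipt):
        """True only if the receipt equals the one stamped this run and it passed."""
        stamped = self._receipts.get(name)
        return stamped is not None and receipt == stamped and stamped["passed"]
```

A hand-built receipt fails because it cannot carry the run token, and a receipt replayed from an earlier run fails because that run had a different token.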

💰 Cost Control

A runaway agent can burn $500 in API costs before you notice.

from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    max_output_chars=5000,    # Truncate before token explosion
    max_output_tokens=2000,   # Hard limit on response size
)

@Airlock(config=config)
def query_logs(query: str) -> str:
    return massive_log_query(query)  # 10MB → 5KB

ROI: 10 MB of logs ≈ 2.5M tokens ≈ $25 per response. Truncated to 5,000 characters ≈ 1.25K tokens ≈ $0.0125. That is a saving of roughly 99.95%.
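
The ROI figures can be reproduced with a few lines of arithmetic. The two constants below are not stated in this README; they are inferred from the quoted numbers (10 MB mapping to about 2.5M tokens, and 2.5M tokens costing $25), so treat them as assumptions.

```python
CHARS_PER_TOKEN = 4            # assumption inferred from 10 MB -> ~2.5M tokens
USD_PER_MILLION_TOKENS = 10.0  # assumption inferred from 2.5M tokens -> $25


def response_cost_usd(num_chars):
    """Estimate the cost of returning `num_chars` characters to the model."""
    tokens = num_chars / CHARS_PER_TOKEN
    return tokens / 1_000_000 * USD_PER_MILLION_TOKENS


full = response_cost_usd(10_000_000)  # untruncated 10 MB payload
capped = response_cost_usd(5_000)     # after max_output_chars=5000
savings = 1 - capped / full
print(f"full=${full:.2f} capped=${capped:.4f} savings={savings:.2%}")
```

Under these assumptions the capped response costs a little over one cent, and the saving is about 99.95%.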

Per-model-tier budgets (v0.8.7)

The flat max_output_* caps above apply uniformly to every call. ModelTierBudget caps per-call cost and output tokens per model tier label (e.g. "frontier" / "mid" / "small"), evaluated before the tool runs. Untagged calls fall back to a configurable strict_tier (deny-by-default — the cheapest tier).

from agent_airlock import (
    Airlock, ModelTierBudget, SecurityPolicy, TierBudget,
)

policy = SecurityPolicy(
    model_tier_budget=ModelTierBudget(
        tiers={
            "frontier": TierBudget(max_cost_cents=50, max_output_tokens=4000),
            "mid":      TierBudget(max_cost_cents=10, max_output_tokens=2000),
            "small":    TierBudget(max_cost_cents=2,  max_output_tokens=1000),
        },
        strict_tier="small",  # untagged → cheapest tier (deny-by-default)
    ),
)

@Airlock(policy=policy, return_dict=True)
def call_model(prompt: str, **_extra):
    return run_my_router(prompt)

# The router tags each call. Airlock blocks before the model fires.
call_model("Draft a tweet",  _airlock_tier="small",    _airlock_input_tokens=50)
call_model("Deep analysis", _airlock_tier="frontier", _airlock_input_tokens=200_000)
# →  AIRLOCK_BLOCK: Tier 'frontier' budget exceeded (worst-case 66¢ > cap 50¢)

Routing logic stays in the user's router. Three tagging routes are supported:

_airlock_tier kwarg — stripped before the tool sees it.
context.metadata["airlock_tier"] — set on a contextvar-stored AirlockContext by the router's session middleware.
tier_resolver callback — ModelTierBudget(tier_resolver=fn) where fn(model_id: str) -> tier_label lives in the caller's code. Airlock invokes the callback when context.metadata["model_id"] is set; it carries no vendor-specific model→tier table.

After execution, actual vs estimated cost is reconciled into the global CostTracker (observability — never blocks). See examples/model_tier_budget.py for all four patterns including composition with allow/deny lists.

A ready-to-use strict_tier_budget_policy() preset returns a SecurityPolicy seeded with the table above.
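
A pre-call budget check of this kind has to bound the cost before any tokens are produced. One way to do that, sketched below, is to price the known input tokens plus the full output-token cap. The per-1k prices are hypothetical values chosen so that the frontier example lands on the 66¢ figure quoted above; whether the library uses these rates or this formula is not confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_cost_cents: float
    max_output_tokens: int
    input_cents_per_1k: float   # hypothetical price, not from the library
    output_cents_per_1k: float  # hypothetical price, not from the library


def worst_case_cents(tier, input_tokens):
    """Input cost plus the cost of emitting the full output-token cap."""
    input_cost = input_tokens * tier.input_cents_per_1k
    output_cost = tier.max_output_tokens * tier.output_cents_per_1k
    return (input_cost + output_cost) / 1000


def admit(tiers, strict_tier, tier_label, input_tokens):
    """Return True if the worst-case cost fits the tier's cap."""
    # Untagged (or, by assumption, unknown) labels fall back to the strict tier.
    tier = tiers.get(tier_label, tiers[strict_tier])
    return worst_case_cents(tier, input_tokens) <= tier.max_cost_cents


TIERS = {
    "frontier": Tier(50, 4000, 0.3, 1.5),
    "small": Tier(2, 1000, 0.01, 0.05),
}
```

With these numbers, `admit(TIERS, "small", "frontier", 200_000)` is `False`, because the worst case exceeds the 50¢ cap before the model is ever called.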

🔐 PII & Secret Masking

from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    mask_pii=True,      # SSN, credit cards, phones, emails
    mask_secrets=True,  # API keys, passwords, JWTs
)

@Airlock(config=config)
def get_user(user_id: str) -> dict:
    return db.users.find_one({"id": user_id})

# LLM sees: {"name": "John", "ssn": "[REDACTED]", "api_key": "sk-...XXXX"}

12 PII types detected · 4 masking strategies · Zero data leakage
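
For intuition, output masking of this kind is a regex pass with a per-type strategy. The sketch below shows two strategies (full redaction and partial masking) on two pattern types. The regexes are deliberately simple illustrations and are not the library's detectors; `partial` and `mask_text` are hypothetical names.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def partial(value, keep=2):
    """Keep the first and last `keep` characters and star out the middle."""
    if len(value) <= 2 * keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]


def mask_text(text):
    """Fully redact SSN-shaped values and partially mask email addresses."""
    text = SSN.sub("[REDACTED]", text)
    return EMAIL.sub(lambda match: partial(match.group(0)), text)
```

Partial masking keeps just enough of the value for a human to recognise which record it was, without exposing the whole identifier.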

Opt-in regional PII (`pii_locales`)

Aadhaar / PAN / UPI / IFSC have always shipped as SensitiveDataType members, but are not added to the default mask_pii=True set — to keep the surface zero-dep and US-shaped by default. v0.8.9 adds a pii_locales opt-in that pulls them in and tightens detection:

config = AirlockConfig(
    mask_pii=True,
    pii_locales=["in"],   # opt in to India-locale detection
)

@Airlock(config=config)
def lookup(query: str) -> str:
    return (
        "User: राम कुमार, "
        "Aadhaar: 234567890124, "       # → "23********24" (PARTIAL)
        "PAN: ABCDE1234F, "             # → "AB******4F" (PARTIAL)
        "phone: 555-123-4567"           # still masked by existing PHONE regex
    )

Two things activate when "in" in pii_locales:

Aadhaar Verhoeff checksum gate — the existing Aadhaar regex is permissive (any 12-digit number starting 2-9 matches). With the opt-in, each match must also pass the UIDAI Verhoeff checksum, cutting the FP rate ~10x on random IDs / phone numbers.
Devanagari personal-name detection — PERSONAL_NAME_DEVANAGARI runs against the Unicode block U+0900–U+097F, with a small allowlist of common Hindi greetings / pronouns / interrogatives to keep ordinary prose from being masked. Conservative heuristic — production callers who need precise extraction should layer NER on top.

The flag is additive and reversible — pii_locales=[] (the default) preserves the prior behavior bit-for-bit.
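
The Verhoeff gate mentioned above is a standard checksum and fits in a short function. The sketch below implements the published algorithm with the usual dihedral-group tables; `looks_like_aadhaar` is a hypothetical wrapper that combines it with the 12-digit shape described in the text, and is not the library's detector.

```python
import re

# Multiplication table of the dihedral group D5.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Position permutations: row i is the base permutation applied i times.
BASE = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4]
P = [list(range(10))]
for _ in range(7):
    P.append([BASE[x] for x in P[-1]])
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_valid(number: str) -> bool:
    """True if the digit string (check digit last) passes the Verhoeff checksum."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(payload: str) -> str:
    """Compute the check digit to append to `payload`."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def looks_like_aadhaar(candidate: str) -> bool:
    """12 digits, first digit 2-9, and a valid Verhoeff checksum."""
    return bool(re.fullmatch(r"[2-9]\d{11}", candidate)) and verhoeff_valid(candidate)
```

Because the checksum catches every single-digit error, a random 12-digit string passes only about one time in ten, which is consistent with the roughly 10x false-positive reduction quoted above.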

🌐 Network Airgap (v0.3.0)

Block data exfiltration during tool execution:

from agent_airlock import network_airgap, NO_NETWORK_POLICY

# Block ALL network access
with network_airgap(NO_NETWORK_POLICY):
    result = untrusted_tool()  # Any socket call → NetworkBlockedError

# Or allow specific hosts only
from agent_airlock import NetworkPolicy

INTERNAL_ONLY = NetworkPolicy(
    allow_egress=True,
    allowed_hosts=["api.internal.com", "*.company.local"],
    allowed_ports=[443],
)
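
# A simplified picture of how an egress guard like this can work: wrap
# socket.socket.connect for the duration of a context manager. This is a
# sketch, not the library's mechanism. EgressBlocked, host_allowed and
# egress_guard are hypothetical names. It only covers connect() on Python
# sockets, and helpers such as socket.create_connection resolve hostnames
# to IP addresses before connecting, so a real guard would also have to
# hook name resolution for hostname patterns to apply.

```python
import socket
from contextlib import contextmanager
from fnmatch import fnmatchcase
from unittest import mock


class EgressBlocked(ConnectionError):
    """Raised instead of connecting to a host outside the allowlist."""


def host_allowed(host, allowed_hosts):
    return any(fnmatchcase(host, pattern) for pattern in allowed_hosts)


@contextmanager
def egress_guard(allowed_hosts=()):
    real_connect = socket.socket.connect

    def guarded_connect(self, address):
        # For AF_INET / AF_INET6 the address is a tuple whose first item is the host.
        host = address[0] if isinstance(address, tuple) else str(address)
        if not host_allowed(host, allowed_hosts):
            raise EgressBlocked(f"connection to {host!r} blocked")
        return real_connect(self, address)

    # patch.object restores the original attribute when the block exits.
    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield
```

# With an empty allowlist every connect() inside the block raises
# EgressBlocked before any packet is sent.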

💉 Framework Vaccination (v0.3.0)

Secure existing code without changing a single line:

from agent_airlock import vaccinate, STRICT_POLICY

# Before: Your existing LangChain tools are unprotected
vaccinate("langchain", policy=STRICT_POLICY)

# After: ALL @tool decorators now include Airlock security
# No code changes required!

Supported: LangChain, OpenAI Agents SDK, PydanticAI, CrewAI

⚡ Circuit Breaker (v0.4.0)

Prevent cascading failures with fault tolerance:

from agent_airlock import CircuitBreaker, AGGRESSIVE_BREAKER

breaker = CircuitBreaker("external_api", config=AGGRESSIVE_BREAKER)

@breaker
def call_external_api(query: str) -> dict:
    return external_service.query(query)

# After 5 failures → circuit OPENS → fast-fails for 30s
# Then HALF_OPEN → allows 1 test request → recovers or reopens
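
# The CLOSED -> OPEN -> HALF_OPEN cycle in the comments above can be modelled
# as a small state machine. This is a single-threaded sketch with an
# injectable clock; SimpleBreaker is hypothetical and is not the library's
# CircuitBreaker. It does not limit HALF_OPEN to exactly one concurrent trial.

```python
import time


class SimpleBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means CLOSED

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed HALF_OPEN trial, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

# Passing a fake clock makes the timeout transitions easy to exercise in tests.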

📈 OpenTelemetry Observability (v0.4.0)

Enterprise-grade monitoring:

from agent_airlock import configure_observability, observe

configure_observability(
    service_name="my-agent",
    otlp_endpoint="http://otel-collector:4317",
)

@observe(name="critical_operation")
def process_data(data: dict) -> dict:
    # Automatic span creation, metrics, and audit logging
    return transform(data)

🔌 Framework Compatibility

The Golden Rule: @Airlock must be closest to the function definition.

@framework_decorator    # ← Framework sees the secured function
@Airlock()              # ← Security layer (innermost)
def my_function():      # ← Your code
    ...
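
Why the order matters follows from how Python applies stacked decorators: the one closest to `def` runs first and wraps the raw function, and the outer one wraps that result. The sketch below uses two stand-in decorators (both hypothetical, neither is Airlock nor a real framework) to make the order visible.

```python
from functools import wraps

calls = []


def security_layer(func):
    """Stands in for @Airlock(): runs its check before the function body."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append("security")
        return func(*args, **kwargs)
    return wrapper


def framework_decorator(func):
    """Stands in for a framework's @tool: registers whatever callable it receives."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append("framework")
        return func(*args, **kwargs)
    return wrapper


@framework_decorator   # applied second: wraps the already-secured function
@security_layer        # applied first: closest to the definition
def my_function():
    calls.append("body")
    return "done"
```

Calling `my_function()` records `framework`, then `security`, then `body`, so the framework can only ever reach the function through the security layer.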

LangChain / LangGraph

from langchain_core.tools import tool
from agent_airlock import Airlock

@tool
@Airlock()
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"

OpenAI Agents SDK

from agents import function_tool
from agent_airlock import Airlock

@function_tool
@Airlock()
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: 22°C"

PydanticAI

from pydantic_ai import Agent
from agent_airlock import Airlock

@Airlock()
def get_stock(symbol: str) -> str:
    return f"Stock {symbol}: $150"

agent = Agent("openai:gpt-4o", tools=[get_stock])

CrewAI

from crewai.tools import tool
from agent_airlock import Airlock

@tool
@Airlock()
def search_docs(query: str) -> str:
    """Search internal docs."""
    return f"Found 5 docs for: {query}"

More frameworks: LlamaIndex, AutoGen, smolagents, Anthropic

LlamaIndex

from llama_index.core.tools import FunctionTool
from agent_airlock import Airlock

@Airlock()
def calculate(expression: str) -> int:
    # Illustrative only: emptying __builtins__ does not make eval() safe on untrusted input.
    return eval(expression, {"__builtins__": {}})

calc_tool = FunctionTool.from_defaults(fn=calculate)

AutoGen

from autogen import ConversableAgent
from agent_airlock import Airlock

@Airlock()
def analyze_data(dataset: str) -> str:
    return f"Analysis of {dataset}: mean=42.5"

assistant = ConversableAgent(name="analyst", llm_config={"model": "gpt-4o"})
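# AutoGen requires a tool description when none has been attached to the function yet
assistant.register_for_llm(description="Analyze a dataset.")(analyze_data)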

smolagents

from smolagents import tool
from agent_airlock import Airlock

@tool
@Airlock(sandbox=True)
def run_code(code: str) -> str:
    """Execute in E2B sandbox."""
    exec(code)
    return "Executed"

Anthropic (Direct API)

from agent_airlock import Airlock

@Airlock()
def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C"

# Use in tool handler
def handle_tool_call(name, inputs):
    if name == "get_weather":
        return get_weather(**inputs)  # Airlock validates
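For intuition about what happens to the raw `inputs` dict before the function body runs, here is a rough standalone sketch of ghost-argument stripping followed by a strict type check, built on `inspect.signature`. It is a simplification under stated assumptions and not the library's validator (the README describes that as Pydantic-based); `strip_and_validate` is a hypothetical helper and handles only plain class annotations such as `str` or `int`.

```python
import inspect

def strip_and_validate(func, inputs: dict) -> dict:
    """Hypothetical helper: drop undeclared keys, then require exact annotated types."""
    params = inspect.signature(func).parameters
    kept = {name: value for name, value in inputs.items() if name in params}
    for name, value in kept.items():
        expected = params[name].annotation
        if expected is inspect.Parameter.empty:
            continue  # no annotation, nothing to check
        # Strict: no coercion, so "500" is not an int and True is not an int either.
        if isinstance(expected, type) and type(value) is not expected:
            raise TypeError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return kept

def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C"

# "force" is an invented argument, so it is stripped before the call.
clean = strip_and_validate(get_weather, {"city": "Paris", "force": True})
print(clean)                  # {'city': 'Paris'}
print(get_weather(**clean))   # Weather in Paris: 22°C

try:
    strip_and_validate(get_weather, {"city": 42})
except TypeError as exc:
    print(exc)                # city: expected str, got int
```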

Adapter-shipped vs example-only (honest split)

Both paths use the same @Airlock() decorator placement. "Adapter-shipped" means there's a dedicated src/agent_airlock/integrations/<framework>.py module with framework-specific glue (signature preservation, tool registry rewrites, request-shape adapters). "Example-only" means the decorator is compatible out of the box — no extra adapter required.

Adapter-shipped (11): LangChain (integrations/langchain.py), LangGraph (integrations/langgraph_toolnode_compat.py), OpenAI Agents SDK (integrations/openai_guardrails.py), Anthropic Messages API (integrations/anthropic.py), Anthropic Claude Agent SDK (integrations/anthropic_claude_agent_sdk.py, v0.6.1+), smolagents (integrations/smolagents_wrapper.py), Gemini 3 Agent Mode (integrations/gemini3_tool_shape_adapter.py), GPT-5.5 (integrations/gpt5_5_tool_shape_adapter.py), PydanticAI (integrations/pydantic_ai.py, v0.7.1+), CrewAI (integrations/crewai.py, v0.7.2+), FastMCP (mcp.py).

Example-only (2): AutoGen, LlamaIndex — decorator-compatible without an adapter; see examples/.

Complete Examples

Framework	Path	Surface
LangChain	adapter · example	@tool, AgentExecutor
LangGraph	adapter · example	StateGraph, ToolNode
OpenAI Agents	adapter · example	Handoffs, manager pattern
Anthropic API	adapter · example	Direct Messages API
Claude Agent SDK	adapter · doc	`wrap_agent(agent, policy=...)`
smolagents	adapter · example	CodeAgent, E2B
Gemini 3	adapter	`function_call` carrier + `thought_signature` redaction
GPT-5.5	adapter	`gpt_5_5_agent_defaults` preset
FastMCP	adapter · example	`@secure_tool` decorator
PydanticAI	adapter · doc · example	`wrap_agent(agent, policy=...)` + output_validate hook
CrewAI	adapter · doc · example	`wrap_crew(crew, policy=...)` + task-level tool overrides
LlamaIndex	example only	ReActAgent
AutoGen	example only	ConversableAgent

⚡ FastMCP Integration

from fastmcp import FastMCP
from agent_airlock.mcp import secure_tool, STRICT_POLICY

mcp = FastMCP("production-server")

@secure_tool(mcp, policy=STRICT_POLICY)
def delete_user(user_id: str) -> dict:
    """One decorator: MCP registration + Airlock protection."""
    return db.users.delete(user_id)

🏆 Why Not Enterprise Vendors?

	Prompt Security	Pangea	Agent-Airlock
Pricing	$50K+/year	Enterprise	Free forever
Integration	Proxy gateway	Proxy gateway	One decorator
Self-Healing	❌	❌	✅
E2B Sandboxing	❌	❌	✅ Native
Your Data	Their servers	Their servers	Never leaves you
Source Code	Closed	Closed	MIT Licensed

We're not anti-enterprise. We're anti-gatekeeping. Security for AI agents shouldn't require a procurement process.

📦 Installation

# Core (validation + policies + sanitization)
pip install agent-airlock

# With E2B sandbox support
pip install agent-airlock[sandbox]

# With FastMCP integration
pip install agent-airlock[mcp]

# Everything
pip install agent-airlock[all]

# E2B key for sandbox execution
export E2B_API_KEY="your-key-here"

🛡️ OWASP Compliance

Agent-Airlock maps to the OWASP Top 10 for Agentic Applications (2026) — the agentic-era successor to the old LLM Top 10. Coverage is reported honestly: Full means the primitive ships and blocks the class in tests; Partial means agent-airlock covers the runtime leg but something upstream (client UI, IAM, training data) is out of scope; Monitor-only means we surface the signal but do not actually prevent the risk.

Risk	Implemented in agent-airlock	Module / preset	Coverage
ASI01 Agent Goal Hijack	Pydantic strict validation + ghost-arg rejection + `UnknownArgsMode.BLOCK`	`validator`, `unknown_args`, `core`	Partial
ASI02 Tool Misuse and Exploitation	Deny-by-default `SecurityPolicy`, RBAC, rate limits, `SafePath` / `SafeURL`, Flowise `Function()`/`eval` token ban (CVE-2025-59528), MCPwn destructive-auth check (CVE-2026-33032), Mobile MCP intent-URL guard (CVE-2026-35394)	`policy`, `safe_types`, `filesystem`, `network`, `policy_presets.flowise_cve_2025_59528_defaults`, `policy_presets.mcpwn_cve_2026_33032_defaults`, `policy_presets.mobile_mcp_intent_guard_2026_05`	Full
ASI03 Identity and Privilege Abuse	`AgentIdentity`, `MCPProxyGuard` token-passthrough prevention, `CredentialScope`, OAuth-app audit (Vercel 2026-04-19), MCP Attested Tool-Server Admission (arXiv:2605.24248)	`policy`, `mcp_proxy_guard`, `mcp_spec.oauth_audit`, `mcp_spec.attested_admission`, `policy_presets.oauth_audit_vercel_2026_defaults`, `policy_presets.mcp_attested_admission_defaults`	Partial
ASI04 Agentic Supply Chain Vulnerabilities	Ox MCP STDIO sanitizer + CVE regression suite (11+ CVEs tracked) + session-snapshot integrity guard + spawn-time MCP config pin (CVE-2026-30615, `policy_presets.mcp_config_pin`)	`mcp_spec.stdio_guard`, `mcp_spec.session_guard`, `mcp_spec.zero_click_config_guard`, `policy_presets.stdio_guard_ox_defaults`, `policy_presets.mcp_config_pin`, `tests/cves/`	Partial
ASI05 Unexpected Code Execution (RCE)	E2B Firecracker sandbox, pluggable `SandboxBackend`, capability gating for `PROCESS_SHELL`, Flowise eval-token ban (CVE-2025-59528)	`sandbox`, `sandbox_backend`, `capabilities`, `policy_presets.flowise_cve_2025_59528_defaults`	Full
ASI06 Memory & Context Poisoning	`AirlockContext` `contextvars` isolation, `ConversationConstraints` budget caps, audit logging	`context`, `conversation`, `sanitizer`	Partial
ASI07 Insecure Inter-Agent Communication	A2A middleware Pydantic strict validation, method allow-lists	`a2a`	Partial
ASI08 Cascading Failures	`CircuitBreaker`, `RetryPolicy`, token-bucket rate limits	`circuit_breaker`, `retry`, `policy`	Full
ASI09 Human-Agent Trust Exploitation	Honeypot deception, audit-log attribution, structured `fix_hints`	`honeypot`, `audit_otel`	Partial
ASI10 Rogue Agents	Audit telemetry + anomaly detector; no quarantine primitive	`observability`, `anomaly`	Monitor-only

MCP-specific mapping

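The OWASP_MCP_TOP_10_2026 policy preset maps agent-airlock controls to the OWASP MCP Top 10 (2026 beta). The table lists the seven risks with a shipped control; MCP06, MCP08, and MCP09 have no row here: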

MCP risk	Ships in agent-airlock
MCP01 Token Mismanagement	`MCPProxyGuard` rejects passthrough headers, enforces audience
MCP02 Excessive Permissions	`SecurityPolicy` + `CredentialScope`
MCP03 Tool Poisoning	ghost-arg rejection + `SafePath`/`SafeURL`
MCP04 Supply Chain	`stdio_guard_ox_defaults()` (Ox 2026-04-16 advisory)
MCP05 Command Injection	`stdio_guard` shell-metachar + deny-pattern rules
MCP07 Insufficient Authentication	OAuth 2.1 + PKCE S256 helpers in `mcp_spec.oauth`
MCP10 Context Oversharing	PII/secret sanitizer + workspace-scoped config

Use it directly:

from agent_airlock import Airlock
from agent_airlock.policy_presets import owasp_mcp_top_10_2026_policy

@Airlock(policy=owasp_mcp_top_10_2026_policy())
def my_mcp_tool(query: str) -> str:  # placeholder signature; substitute your tool's parameters
    ...

Ox Security STDIO advisory (2026-04-16, CVE-2026-30616): see docs/cves/index.md#cve-2026-30616 and the stdio_guard_ox_defaults() preset above. agent-airlock blocks 3 of 4 Ox attack classes at the runtime seam.

🏢 Used By

Agent-Airlock secures AI agent systems in production:

Project	Use Case
FerrumDeck	AgentOps control plane — deny-by-default tool execution
Mnemo	MCP-native memory database — secure tool call validation

Using Agent-Airlock in production? Open a PR to add your project!

📊 Performance

Test count and coverage are published by the TEST-BADGE block at the top of this file, regenerated from pytest on every release via python scripts/update_test_badge.py. That block is the source of truth; this table tracks latency and surface area only.

Metric	Value
Validation overhead	<50ms
Sandbox cold start	~125ms
Sandbox warm pool	<200ms
Framework integrations	13
Core dependencies	Pydantic only

📖 Documentation

Resource	Description
AGENTS.md	v0.6.1 — repo-root entrypoint for agentic IDEs (Cursor, Claude Code, Windsurf, Mintlify)
Anthropic Claude Agent SDK adapter	v0.6.1 — `AnthropicClaudeAgentSDKAdapter.wrap_agent(agent, policy=...)`; canonical-list trio
`airlock manifest enforce`	v0.6.1 — fail-closed CLI runtime allowlist gate against signed manifests; CI exits 0/2/3
Managed Agents Outcomes-rubric guard	v0.7.4 — fail-closed gate on the Anthropic Managed Agents 2026-05-06 Outcomes rubric ID; `ManagedAgentsOutcomesGuard.evaluate(provenance)` + `managed_agents_outcomes_2026_05_06_defaults` factory; no SDK dep
Filter-Eval RCE guard (CVE-2026-25592 + CVE-2026-26030)	v0.7.5 — regex detector for the Semantic-Kernel-class lambda-filter / template-expression eval RCE primitive (MSRC 2026-05-07); `FilterEvalRCEGuard.evaluate(args)` + `semantic_kernel_filter_eval_rce_2026_25592_26030_defaults` factory; framework-agnostic
OIDC publish-window guard (TanStack 2026-05-11)	v0.7.6 — known-bad blast-list guard for the TanStack/Mini-Shai-Hulud npm OIDC trusted-publisher class (postmortem 2026-05-11; 42 pkgs × 84 versions); `OIDCPublishWindowGuard.evaluate(args)` + `npm_oidc_publish_window_guard_defaults` factory; pure-data preset, no runtime npm calls
MCP STDIO command-injection guard	v0.7.6 — shell metachar + opt-in path-traversal denier for MCP STDIO argv vectors (HelpNetSecurity 2026-05-05); `StdioCommandInjectionGuard.evaluate(args)` + `mcp_stdio_command_injection_preset_defaults` factory; no `mcp` SDK dep
Eval-RCE guard (CVE-2026-44717)	v0.8.0 — bare-`eval()`/`parse_expr()`/`exec()` invocation detector for the MCP Calculate Server class (NVD 2026-05-15); `EvalRCEGuard.evaluate(args)` + curated vulnerable-package denylist + `parse_expr` safe-form exemption + `stdio_guard_eval_defaults_2026_05_15` factory
MCP Inspector exposure guard (CVE-2026-23744 runtime)	v0.8.0 — Linux runtime listener-scan via stdlib `/proc/net/tcp` for the MCPJam Inspector public-bind class; complements v0.5.x config-time `bind_address_guard`; `MCP_INSPECTOR_REQUIRE_AUTH=1` operator bypass
Agent SDK Credit pool budget	v0.8.0 — per-month USD pool tracker for Anthropic's 2026-06-15 billing split (Zed blog 2026-05-14); `AgentSDKCreditBudget.register_call(model, input_tokens, output_tokens)` with 90% near-limit + 100% exhausted thresholds; packaged 2026-06 pricing fixture
OpenAPI Drift Guard (Hermes 2026-05-13)	v0.8.1 — payload-shape drift detector against an operator-supplied OpenAPI 3.x spec (arXiv:2605.14312); `OpenAPIDriftGuard.evaluate(operation_id, args)` detects `missing_required` / `unknown_field` / `type_mismatch`; three modes (`strict` / `warn` / `shadow`); `vaccinate_openapi(spec)` decorator + `openapi_doc_drift_guard_defaults` factory; caller supplies spec dict, no PyYAML dep
MCP Calc-Server bundle preset	v0.8.1 — composition factory `mcp_calc_server_bundle_defaults_2026_05_15()` wires v0.8.0 `EvalRCEGuard` + v0.7.6 `StdioCommandInjectionGuard` under a single preset_id (CVE-2026-44717 anchor) scoped to calc/calculate/evaluate/sympy_eval/math_eval tool-name patterns; pure config composition, no new detector module
Metis-inspired corpus block-rate regression	v0.8.2 — release-gate primitive `MetisInspiredCorpusBlockRateGuard` runs a deterministic 25-entry exploit-shape corpus (CVE-2026-44717 + 2026-05-05 STDIO injection) through `EvalRCEGuard + StdioCommandInjectionGuard`; one-sided gate fires when block rate drops below baseline − 5%; NOT a reproduction of the Metis paper's POMDP attacker (arXiv:2605.10067 cited as motivation, not as prompt source); `airlock corpus-bench` CLI ships text/json/md reports
Corpus per-category coverage	v0.8.3 — extends the v0.8.2 corpus-bench with HarnessAudit-Bench (arXiv:2605.14271) two-category taxonomy (`resource_access`, `info_transfer`); `CorpusEntry.violation_category` field + `CategoryCount` decision field; `airlock corpus-bench` reports per-category coverage in text/json/md; NOT a reproduction of HarnessAudit-Bench (artifacts not yet public — taxonomy adopted as schema, scoring is not)
Stainless SDK provenance classifier	v0.8.3 — pure-function `classify_sdk_lineage(user_agent, response_body_head)` building block flags MCP servers generated by the deprecated Stainless SDK toolchain (Anthropic acquired Stainless 2026-05-13, hosted generator winding down); operator-callable from own audit hooks — NOT an automatic HTTP probe (decorator-in-process architecture, see ROADMAP §1); `stainless_provenance_probe_defaults()` preset is `default_action=tag_only`, visibility not enforcement
Human-oversight decorator	v0.8.4 — `@requires_human_oversight(approver=...)` gates a tool function on an operator-supplied approval callable (Code-as-Harness arXiv:2605.18747 anchor); `GRANT` → call wrapped fn, `DENY` → `OversightDeniedError`, `TIMEOUT` → `OversightTimeoutError`; composes with `@Airlock(...)`; protocol shapes + `InProcessRecordedApprover` testing helper; NOT a bidirectional audit-emitter RPC channel — operator owns the transport (Slack/PagerDuty/CLI), agent-airlock owns the gate + the protocol
Layer-contract receipt block	v0.8.5 — opt-in `LayerContract` (assume/guarantee) block on signed `airlock attest receipt` payloads (arXiv:2605.18672 anchor); `--contract` derives per-guard `pass_rate` from the verdicts list, `--assumes id1,id2` declares upstream-layer dependencies; receipt schema v1 unchanged (additive field); `pass_rate` is a measured statistic over the sample (not a proof) — every Guarantee carries `sample_size` so verifiers can weight low-N appropriately; NOT backed by a window-counter store (that infrastructure doesn't exist yet — derived from the operator-supplied verdicts list, no new abstraction)
MCP Attested Tool-Server Admission (arXiv:2605.24248)	v0.8.10 — opt-in admission gate for MCP tool servers per Metere (May 2026). Host fetches a JWS-compact clearance from `{server_url}/.well-known/mcp-clearance`, verifies its signature against an operator-pinned trust root (Ed25519 / RSA-PSS / JWKS — never network-fetched on the hot path), and enforces a deny-by-default per-server tool allowlist parsed from the verified clearance. Flavor-gated `ENFORCE` (hard-deny) / `WARN` (log only) modes. Every decision emits a `ReceiptVerdict` on the `guard="mcp_attested_admission"` channel — reuses the existing `airlock attest` DSSE path, does not invent a new log. `mcp_attested_admission_defaults()` factory + `MCPProxyGuard.audit_tool_admission()` integration; signature verification gated behind `pip install agent-airlock[attested]`.
Mobile MCP intent-URL guard (CVE-2026-35394)	v0.8.8 — defensive bundle for the Mobilenexthq Mobile MCP `mobile_open_url` intent-injection RCE class (< 0.0.50). `mobile_mcp_intent_guard_2026_05()` returns a pre-configured `SafeURLValidator(allowed_schemes=["http", "https"])` (blocks `intent:`, `content:`, `file:`, `app:`, `data:`, `javascript:`, `vbscript:`), an `AirlockConfig(unknown_args=UnknownArgsMode.BLOCK)`, and the canonical Mobile MCP tool-name corpus (`mobile_open_url`, `open_url`, `mobile_launch_url`). DIFF-COMPATIBLE with the existing `SafeURL` type — no new validator invented. Also fixes a pre-existing `block_private_ips=True` no-op in `SafeURLValidator` (RFC1918 ranges were not actually blocked because the validator's own `SafeURLValidationError` raise was caught by `except ValueError`).
Capsule ShareLeak / PipeLeak (CVE-2026-21520)	v0.8.14 — defensive bundle for the Capsule Security-disclosed indirect-prompt-injection class hitting Microsoft Copilot Studio (ShareLeak, CVE-2026-21520, CVSS 7.5 HIGH, CWE-77, patched 2026-01-15) and Salesforce Agentforce (PipeLeak, parallel pattern). Both vectors share the same architecture: untrusted form input (SharePoint form / Web-to-Lead form) is concatenated into the agent's context with no boundary, while the agent simultaneously holds outbound exfil tools (Outlook send / Salesforce email-case). `capsule_indirect_injection_cve_2026_21520_defaults()` composes existing primitives — `default_deny=True` + canonical exfil-sink `denied_tools` (`send_email`, `outlook_`, `create_case`, `share_`, `export_`, `post_to_`, `webhook_*`, ...) + `reauth_on_untrusted_reinvocation=True` (v0.8.6 debate-amplification guard at `threshold=1`) + `AirlockConfig(unknown_args=UnknownArgsMode.BLOCK)`. Opt-in only — no new validator invented, no default-priority-chain entry. Pairs with `airlock-explain --unused-scopes` (v0.8.13) so operators populate the read-side allow-list from a real trace before deploying.
Flowise MCP-stdio adapter RCE (CVE-2026-40933)	v0.8.16 — defensive control for the Flowise authenticated-RCE-via-MCP-stdio-adapter class (CVSS 9.9, fixed upstream in Flowise 3.1.0). Flowise ≤ 3.0.x serialises a user-defined CustomMCP `command`+`args` straight into a child-process spawn with no sandbox or argv sanitisation — importing a crafted chatflow is a one-click path to OS-level RCE. `flowise_mcp_stdio_guard_2026_defaults()` is a per-tool-class projection of the v0.7.6 `StdioCommandInjectionGuard` (no new detector invented), scoped to the Flowise CustomMCP stdio surface. Fail-closed on shell metachars (`;`, `&&`, `\|\|`, `\|`, newline, backtick, `$(`) in the `command`/`args` path + opt-in path-traversal outside a `cwd_allowlist`; `check(args)` raises `FlowiseMcpStdioInjectionError`. OWASP MCP05 Command Injection. Wired into `ox_mcp_supply_chain_2026_04_defaults()` — corrects a prior mis-attribution where CVE-2026-40933 was recorded as a "Semantic Kernel auth-header leak".
MCP description-vs-manifest guard (`mcp_description_manifest_guard`)	v0.8.18 — runtime consistency gate that asserts a tool's model-facing description (declared input schema + advertised capability/security boundary) matches its registered manifest before the tool is admitted, failing closed per the deny-by-default posture. Anchored on the DCIChecker study (arXiv:2606.04769), which measured Description-Code Inconsistency at 9.93% of 19,200 tool pairs across 2,214 MCP servers. `DescriptionManifestGuard.evaluate(description)` detects `described_arg_not_in_manifest` (description claims a ghost argument), `undisclosed_side_effect` (manifest has a side effect the description hides — the tool-poisoning direction), and `overclaimed_capability` (description advertises a capability absent from the manifest); three modes (`strict` / `warn` / `shadow`); `vaccinate_description_manifest(manifests)` decorator + `mcp_description_manifest_guard_defaults()` factory. Composes above ghost-arg stripping + Pydantic type-validation (which govern the call payload) — it does not replace them. OWASP MCP03 Tool Poisoning. Pydantic-only core, no new runtime deps.
LeRobot pickle-deserialization RCE (CVE-2026-25874)	v0.8.19 — deny-by-default posture for the HuggingFace LeRobot unauthenticated-RCE class (CVSS 9.3). LeRobot's async-inference PolicyServer / robot-client `pickle.loads()` payloads received over an unauthenticated, non-TLS gRPC channel (`SendObservations` / `SendPolicyInstructions` / `GetActions`) — an unauthenticated, network-reachable attacker reaches arbitrary OS command execution. Ships a reusable `UnsafeDeserializationGuard` (in `safe_types`, next to `SafePath`/`SafeURL`) that fails closed on pickle magic bytes (`0x80` PROTO), base64-encoded pickle, and `pickle`/`marshal`/`shelve`/`dill`/`jsonpickle` marker tokens in string args — plus an airgap pairing that refuses serialized-object (`bytes`) args unless the call declares an authenticated and TLS transport. Wired into `SecurityPolicy.deserialization_guard` and run at the `@Airlock` seam (Step 2.7) before the tool body; the block carries a `fix_hint` naming CVE-2026-25874. `lerobot_cve_2026_25874_defaults()` is the per-CVE projection (deny-by-name globs for `deserialize`/`pickle.loads`/`torch_load`/the gRPC methods + the wired content guard). Composes above ghost-arg stripping + Pydantic type-validation. Pydantic-only core, no new runtime deps.
MCP server-URL env-interpolation secret leak (CVE-2026-32625)	v0.8.20 — deny-by-default guard for the LibreChat MCP-server-URL credential-disclosure class (CVSS 9.6, CWE-200, OWASP MCP01). A user-supplied MCP server connection template (URL / header / arg) carrying an env-interpolation token (`${VAR}`, bare `$VAR`, or `%VAR%`) is expanded server-side against the host `process.env` and leaks a secret (`${JWT_SECRET}` / `${CREDS_KEY}` / `${MONGO_URI}`) into the outbound request. `MCPServerEnvInterpolationGuard.evaluate(config)` (in `mcp_spec/env_interpolation_guard.py`) scans the URL/headers/args recursively and refuses any interpolation token unless its variable is on an operator-declared `allowed_vars` allowlist of explicitly non-secret vars (empty default = deny all). It never reads `os.environ` or expands anything — token-match only, so it cannot itself leak. `mcp_server_env_interpolation_guard_defaults()` factory + `check(config)` raising `MCPServerEnvInterpolationError`; escaped `\$`/`$$` are not flagged. Pydantic-only core, no new runtime deps.
Codegen triple-quote / delimiter break-out RCE (CVE-2026-11393)	v0.8.21 — deny-by-default guard for the AWS AgentCore CLI code-injection class (CVSS 9, CWE-94, OWASP ASI05). The CLI splices a model-/user-controlled `collaborationInstruction` into generated Python without neutralising triple-quote characters, so a crafted `"""` closes the generated string literal and injects statements that execute on agent import — RCE on the AgentCore Runtime + the importer's machine. `CodegenDelimiterInjectionGuard.evaluate(args)` (in `mcp_spec/codegen_delimiter_guard.py`) recursively scans args bound for a codegen / template / `exec`/`eval` sink and fails closed on triple-quote tokens (`"""` / `'''`), quote break-out tokens (`");` / `')` / `" +` / `']`), and raw newlines — unless the field is on an operator-declared `allowed_literal_fields` allowlist of safe literal contexts. It never generates or executes code — token-match only. `codegen_delimiter_injection_guard_defaults()` factory + `check(args)` raising `CodegenDelimiterInjectionError`; composes one layer above the v0.8.0 `EvalRCEGuard` (which gates the sink itself). Pydantic-only core, no new runtime deps.
MCP-bridge subprocess command/args/env RCE (CVE-2026-42271, CISA KEV)	v0.8.22 — deny-by-default guard for the LiteLLM MCP-preview-endpoint command-injection class (CVSS 8.7, CWE-78, OWASP ASI05; on the CISA KEV catalog as of 2026-06-09, actively exploited). LiteLLM's `POST /mcp-rest/test/connection` + `/mcp-rest/test/tools/list` accepted a full MCP server config (`command` / `args` / `env`) in the request body and spawned it as a subprocess with no validation — any low-privilege API key reached host command execution (unauthenticated RCE when chained with the Starlette Host-header bypass CVE-2026-48710). `McpSubprocessArgInjectionGuard.evaluate(config)` (in `mcp_spec/subprocess_arg_guard.py`) treats spawn-shaped MCP-bridge args (`command`/`cmd`/`args`/`argv`/`env`) as untrusted and refuses them unless the resolved program is on an operator-declared `allowed_commands` allowlist of safe static commands (empty default = deny all); an `env` carrying a code-loading var (`LD_PRELOAD`/`PATH`/`PYTHONPATH`/…) is refused regardless, and a config with no spawn-shaped fields passes. Never spawns anything — config inspection only. `mcp_subprocess_arg_injection_guard_defaults()` factory + `check(config)` raising `McpSubprocessArgInjectionError`; composes one layer above the v0.7.6 `StdioCommandInjectionGuard` (which scans an allowed argv for shell metachars). Pydantic-only core, no new runtime deps.
Cline cross-origin WebSocket hijack (CVE-2026-44211)	v0.8.27 — deny-by-default guard for the Cline Kanban cross-origin WebSocket-hijack class (npm `kanban` < 2.13.0, CVSS 9.7, CWE-1385 Missing Origin Validation in WebSockets + CWE-306 Missing Authentication, OWASP ASI05). Cline runs a control WebSocket server on `127.0.0.1:3484` that accepts every upgrade without validating the `Origin` header; because browsers do not apply same-origin/CORS to `ws://`, any website the developer visits can drive the agent — leak workspace data, inject prompts into the agent terminal (RCE), or kill tasks. Binding to loopback is not a mitigation. `WebSocketOriginGuard` (in `mcp_spec/ws_origin_guard.py`) has two surfaces: `audit_endpoint(host=…, origin_allowlist_enforced=…)` flags a control endpoint that enforces no `Origin` allow-list (the misconfiguration), and `check_upgrade(origin)` / `enforce_upgrade(origin)` / `wrap_handler(handler)` form a runtime gate that rejects a WebSocket upgrade whose `Origin` is missing or outside an explicit allow-list (empty allow-list = deny all). The guard never opens a socket — descriptor / single-`Origin` inspection only. `cline_cve_2026_44211_defaults(allowed_origins=[…])` factory + `check(origin)` raising `WebSocketOriginHijackError`. Pydantic-only core, no new runtime deps.
SSRF egress guard — alternate-encoding loopback / rebinding (CVE-2026-47390)	v0.8.29 — deny-by-default egress guard for the SSRF-protection-bypass class (CWE-918, OWASP ASI02). An egress filter that checks the literal hostname string instead of the resolved IP is bypassed by encoding loopback / link-local / cloud-metadata in a form `ipaddress` rejects but the HTTP client connects to: `127.1`, decimal `2130706433`, octal `0177.0.0.1`, hex `0x7f000001`, `::ffff:127.0.0.1`, or a public hostname whose DNS record points at `169.254.169.254` (rebinding). `SSRFEgressGuard` (in `ssrf_egress_guard.py`) reduces every target to its canonical IP(s) — decoding the alternate encodings via `socket.inet_aton` and resolving hostnames at check time (so a rebind to loopback is caught at connect time, not just parse time) — and fails closed on any loopback / link-local / metadata / unspecified address or an RFC1918 range not on `allow_internal_hosts`, with a 3-line `explain` audit trace (rule / resolved IP / encoding) on every denial. Composes with the v0.5.5 `is_blocked_ipv6_range` set (IPv4-mapped / NAT64 / 6to4 / ULA). `ssrf_egress_guard_defaults(allow_internal_hosts=…)` factory + `check(url)` raising `SSRFEgressBlocked`. Pydantic-only core, no new runtime deps.
MCP Origin/Host DNS-rebinding guard (CVE-2026-11624)	v0.8.30 — deny-by-default Origin/Host validation for MCP HTTP/SSE/streamable transports (CWE-346 Origin Validation Error, CVSS 9.4, OWASP-MCP MCP07). Google MCP Toolbox for Databases < 0.25.0 served a local HTTP transport that did not validate the `Origin` or `Host` header, so a website the developer visits can DNS-rebind to `127.0.0.1` and script MCP tool calls at the local server (file reads, command execution, DB access). Fixed upstream in 0.25.0 with a new `--allowed-hosts` flag alongside `--allowed-origins`, warning on the `*` wildcard. `McpOriginHostGuard` (in `mcp_spec/mcp_origin_host_guard.py`) validates the inbound `Host` (always) and `Origin` (when present) against explicit `allowed_origins` / `allowed_hosts` allow-lists; with none configured it falls back to loopback-only and records a startup warning, and a `*` wildcard allows all but also warns — mirroring the upstream fix (stdio transports have no Origin and are out of scope). `check_headers(headers)` / `validate(headers)` + a `startup_warnings` list; `mcp_origin_host_guard_defaults(allowed_origins=…, allowed_hosts=…)` factory + `check(headers)` raising `McpOriginHostRebindingError`. Pydantic-only core, no new runtime deps.
Examples	13 framework integrations (11 adapter-shipped + 2 example-only) with copy-paste code
Security Guide	Production deployment checklist
API Reference	Every function, every parameter
Egress Bench	CVE fixture walker — every payload previously blocked stays blocked
OX MCP Supply-Chain preset	Umbrella for the 2026-04-20 OX dossier (10 CVEs)
Elicitation guard (`mcp_elicitation_guard_2026_04`)	v0.6.0 — runtime mitigation for the MCP `tool/elicitation` round-trip (spec PR #1487, draft 2026-04-r1); blocks credential-request and policy-override classes
Config-path guard (CVE-2026-31402)	v0.6.0 — Claude Desktop MCP-server-registration path-traversal mitigation (CVSS 8.8)
Gemini 3 Agent Mode adapter	v0.6.0 — `function_call` carrier normalisation + `thought_signature` redaction; pinned `SUPPORTED_VERSIONS` set
OAuth `state` entropy guard	v0.6.0 — base64/hex/JSON decode + prompt-injection scan on the OAuth `state` parameter (BlackHat Asia 2026 vector)
`airlock console`	v0.6.0 — three-pane Textual TUI with live verdict stream + replay-on-edit; gated behind `airlock[console]` extra
`airlock attest receipt`	v0.6.0 — Sigstore-compatible signed agent-run receipts; `emit` + `verify` subcommands
`policy_bundle.lock`	v0.6.0 — hash-pinned preset bundles with `Cargo.lock` semantics; `airlock pack lock` + `airlock replay --bundle-lock`
`airlock studio`	v0.6.0 — local stdlib HTTP rehearsal sandbox; paste-a-transcript verdicts + diff between runs
smolagents wrapper	v0.6.0 — `wrap_agent(agent, policy_bundle)` for HuggingFace smolagents 1.18+ (4th first-class framework)
STDIO meta-guard (`mcp_stdio_meta_cve_2026_04`)	v0.5.9 — bundles every airlock STDIO defence into one chain; recommended default for any MCP server registered after 2026-04-26
LangGraph 1.0.11 ToolNode compat shim	v0.5.9 — silent unwrap survives the prebuilt 1.0.11 list-vs-dict shape break
GPT-5.5 ("Spud") agent defaults + tool-shape adapter	v0.5.9 — caps fan-out at 8 / context at 900k / per-call egress at 512 KB
Capability caps (`agent_capability_default_caps`)	v0.5.9 — programmatic caps for SIGN_CONTRACT / DELEGATE_TO_AGENT / INVOKE_TOOL / WRITE_FILE / NETWORK_EGRESS
OWASP Agentic 2026-Q1 coverage matrix	v0.5.9 — 10/10 mapping risk_id → guard + preset + test, CI gate fails on stale entries
Short-form-video corpus (`wild-2026-04/short_form_video`)	v0.5.9 — 5 transcript / on-screen / RTL PoCs; `airlock replay --namespace short_form_video`
`airlock graph serve`	v0.5.9 — local web UI of the live agent → tool → MCP-server topology with verdict overlay
`airlock policy compile / explain`	v0.5.9 — natural-language policy authoring with hash-pinned prompt + deterministic cache
`airlock kill-switch`	v0.5.9 — HMAC-signed cluster-wide freeze with 2-of-3 quorum reset
Comment-and-Control PR-metadata guard	v0.5.8 — neutralises CVSS 9.4 cross-vendor PR-title prompt injection
`airlock pack`	v0.5.8 — signed policy bundles; `airlock pack install claude-code-ci@2026.04`
`airlock baseline`	v0.5.8 — per-agent 7-day rolling profile + drift score
`airlock attest`	v0.5.8 — DSSE provenance per verdict
Cloudflare Mesh compat	v0.5.8 — runs alongside Mesh; de-duplicates overlapping policies
Manifest-only STDIO mode	v0.5.7 — signed-manifest registry; argv never originates from runtime input
STDIO-taint CI gate	v0.5.7 — AST taint analyzer; flags remote→Popen flows at PR time
Declarative preset YAML	v0.5.7 — composite presets via stdlib-only YAML parser
CVE-2026-30615 Windsurf zero-click	v0.5.7 — diff-on-demand mcp.json auto-load guard; v0.8.23 adds `mcp_config_pin` — a spawn-time `{name, command, args, env-keys}` fingerprint pin (`McpConfigPinSet.check()`) that fails closed (raises, never warns) on an injected (unpinned) or mutated STDIO server even when the injection never touched a watched config file; emits on the structlog + JSON-Lines audit channels
CVE-2026-6980 GitPilot-MCP	v0.5.7 — repo_path injection (vendor unresponsive)
DockerBackend	v0.5.1 hardening + known gaps
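
The `mcp_config_pin` row above describes a spawn-time fingerprint over `{name, command, args, env-keys}` that fails closed. The sketch below illustrates that idea only; it is not agent-airlock's implementation. `fingerprint`, `PinSet`, and `UnpinnedServerError` are hypothetical names, and hashing canonical JSON with SHA-256 is an assumption made for illustration.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


class UnpinnedServerError(RuntimeError):
    """Raised when a server's spawn fingerprint is missing or has changed."""


def fingerprint(name, command, args, env):
    # Hypothetical encoding: hash only the env *keys*, so secret values
    # never enter the digest and rotating a token does not break the pin.
    payload = {
        "name": name,
        "command": command,
        "args": list(args),
        "env_keys": sorted(env),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class PinSet:
    pins: dict = field(default_factory=dict)  # server name -> fingerprint

    def pin(self, name, command, args, env):
        self.pins[name] = fingerprint(name, command, args, env)

    def check(self, name, command, args, env):
        # Fail closed: raise on an unknown or mutated server, never warn.
        expected = self.pins.get(name)
        if expected is None:
            raise UnpinnedServerError(f"server {name!r} is not pinned")
        actual = fingerprint(name, command, args, env)
        if not hmac.compare_digest(expected, actual):
            raise UnpinnedServerError(f"server {name!r} changed since it was pinned")
```

Because the check runs against the values used to spawn the process, it does not depend on which config file the values came from.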

Regulatory engagement

Public comment draft — NIST AI RMF v2.0 Agentic-AI Security (window: 2026-04-18 → mid-June)

👤 About

Built by Sattyam Jain — AI infrastructure engineer.

This started as an internal tool after watching an agent hallucinate its way through a production database. Now it's yours.

🤝 Contributing

We review every PR within 48 hours.

git clone https://github.com/sattyamjjain/agent-airlock
cd agent-airlock
pip install -e ".[dev]"
pytest tests/ -v

Bug? Open an issue
Feature idea? Start a discussion
Want to contribute? See open issues

💖 Support

If Agent-Airlock saved your production database:

⭐ Star this repo — Helps others discover it
🐛 Report bugs — Open an issue
📣 Spread the word — Tweet, blog, share

Built with 🛡️ by Sattyam Jain

Making AI agents safe, one decorator at a time.

Sources: This README follows best practices from awesome-readme, Best-README-Template, and the GitHub Blog.

Release history

Version	Released
0.8.36	Jun 23, 2026
0.8.35	Jun 22, 2026
0.8.34	Jun 21, 2026
0.8.33	Jun 21, 2026
0.8.32	Jun 21, 2026
0.8.31	Jun 21, 2026
0.8.30 (this version)	Jun 21, 2026
0.8.29	Jun 21, 2026
0.8.28	Jun 21, 2026
0.8.27	Jun 21, 2026
0.8.26	Jun 13, 2026
0.8.25	Jun 13, 2026
0.8.24	Jun 13, 2026
0.8.23	Jun 11, 2026
0.8.22	Jun 11, 2026
0.8.21	Jun 9, 2026
0.8.20	Jun 8, 2026
0.8.19	Jun 7, 2026
0.8.18	Jun 6, 2026
0.8.17	Jun 6, 2026
0.8.16	Jun 6, 2026
0.8.15	Jun 6, 2026
0.8.14	Jun 6, 2026
0.8.13	Jun 6, 2026
0.8.12	Jun 6, 2026
0.8.11	Jun 6, 2026
0.8.10	Jun 6, 2026
0.8.9	May 26, 2026
0.8.8	May 25, 2026
0.8.7	May 24, 2026
0.8.6	Jun 6, 2026
0.8.5	May 21, 2026
0.8.4	May 20, 2026
0.8.3	May 19, 2026
0.8.2	May 18, 2026
0.8.1	May 17, 2026
0.8.0	May 17, 2026
0.7.6	May 16, 2026
0.7.5	May 10, 2026
0.7.4	May 9, 2026
0.7.3	May 6, 2026
0.7.2	May 5, 2026
0.7.1	May 4, 2026
0.7.0	May 3, 2026
0.6.1	May 3, 2026
0.6.0	Apr 29, 2026
0.5.9	Apr 29, 2026
0.5.8	Apr 27, 2026
0.5.7.1	Apr 26, 2026
0.5.7	Apr 26, 2026
0.5.6.1	Apr 25, 2026
0.5.6	Apr 25, 2026
0.5.5	Apr 24, 2026
0.5.4	Apr 24, 2026
0.5.3	Apr 21, 2026
0.5.2	Apr 20, 2026
0.5.1	Apr 19, 2026
0.5.0	Apr 18, 2026
0.4.1	Mar 15, 2026
0.4.0	Feb 1, 2026
0.2.0	Jan 31, 2026
0.1.5	Jan 31, 2026
0.1.3	Jan 31, 2026
0.1.2	Jan 31, 2026
0.1.0	Jan 31, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agent_airlock-0.8.30.tar.gz (541.7 kB)

Uploaded Jun 21, 2026 Source

Built Distribution

agent_airlock-0.8.30-py3-none-any.whl (620.9 kB)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file agent_airlock-0.8.30.tar.gz.

File metadata

Download URL: agent_airlock-0.8.30.tar.gz
Upload date: Jun 21, 2026
Size: 541.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_airlock-0.8.30.tar.gz
Algorithm	Hash digest
SHA256	`6e4f57af76ca4cc69310e27368f1e19a5ac48d14b34b1541c3632226047e8f8c`
MD5	`e41bf99e151556ab4f71174f327a0596`
BLAKE2b-256	`2fe811d4654beae7ab2508c0b2a372be78fca8b1a3009e3e670c17a554da87c0`

File details

Details for the file agent_airlock-0.8.30-py3-none-any.whl.

File metadata

Download URL: agent_airlock-0.8.30-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 620.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agent_airlock-0.8.30-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a36677a4f874156453602cc39bed7dd546f67861c97b046ce9a73fd7e327e9f0`
MD5	`7bf579c8f65417f23e6a0606d0eee793`
BLAKE2b-256	`465974f63238110faf5bfe3f23ab709f7770ef00fce4098e271feaea607b206f`
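
The SHA256 digests listed above can be checked against a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; `sha256_of` and `verify_download` are hypothetical helper names, not part of agent-airlock or pip.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True only if the file's digest matches the published one."""
    expected = expected_sha256.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

For example, `verify_download("agent_airlock-0.8.30-py3-none-any.whl", "<SHA256 from the table above>")` should return `True` for an unmodified wheel. pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) performs a comparable check at install time.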
