A type-checker for AI tool calls — strict argument validation, ghost-argument stripping, and self-healing retries for MCP servers and agent frameworks.
A type-checker for AI tool calls
Strict validation, ghost-argument stripping, and self-healing retries — one decorator, any agent or MCP server.
Test suite: 2,510 tests · Coverage: 83.42% · v0.8.5
```text
┌────────────────────────────────────────────────────────────────┐
│ 🤖 AI Agent: "Let me help clean up disk space..."              │
│     ↓                                                          │
│ rm -rf / --no-preserve-root                                    │
│     ↓                                                          │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ 🛡️ AIRLOCK: BLOCKED                                        │ │
│ │                                                            │ │
│ │ Reason: Matches denied pattern 'rm_*'                      │ │
│ │ Policy: STRICT_POLICY                                      │ │
│ │ Fix: Use approved cleanup tools only                       │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
```
🎯 30-Second Quickstart
```bash
pip install agent-airlock
```

```python
from agent_airlock import Airlock

@Airlock()
def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}

# LLM sends amount="500" (string) → BLOCKED with fix_hint
# LLM sends force=True (invented arg) → STRIPPED silently
# LLM sends amount=500 (correct) → EXECUTED safely
```
That's it. Your function now has ghost argument stripping, strict type validation, and self-healing errors.
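A minimal, hypothetical sketch of the two checks behind those comments, assuming they are driven by the function's signature: undeclared keywords are dropped, and declared ones must already have the annotated type. call_strict and ToolArgError are illustrative names, not agent-airlock's API; the library itself relies on Pydantic strict validation, and this sketch only handles plain class annotations such as str and int.

```python
import inspect
from typing import Any, Callable


class ToolArgError(TypeError):
    """Raised with an LLM-readable fix hint instead of silently coercing."""


def call_strict(fn: Callable[..., Any], **llm_args: Any) -> Any:
    sig = inspect.signature(fn)
    # Ghost-argument stripping: drop any keyword the function never declared.
    declared = {k: v for k, v in llm_args.items() if k in sig.parameters}
    for name, value in declared.items():
        expected = sig.parameters[name].annotation
        if expected is inspect.Parameter.empty or not isinstance(expected, type):
            continue  # this sketch only checks plain class annotations
        # Strict typing: no "500" -> 500 coercion. bool is rejected for int
        # because isinstance(True, int) is True in Python.
        wrong_type = not isinstance(value, expected)
        bool_as_int = expected is int and isinstance(value, bool)
        if wrong_type or bool_as_int:
            raise ToolArgError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__} "
                f"{value!r}. Fix: resend {name} as a {expected.__name__}."
            )
    return fn(**declared)


def transfer_funds(account: str, amount: int) -> dict:
    return {"status": "transferred", "amount": amount}
```

Calling call_strict(transfer_funds, account="acc-1", amount=500, force=True) drops force and executes; amount="500" raises ToolArgError with a message the model can act on.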
🧠 The Problem No One Talks About
Enterprise solutions exist: Prompt Security ($50K/year), Pangea (proxy your data), Cisco ("coming soon").
We built the open-source alternative. One decorator. No vendor lock-in. Your data never leaves your infrastructure.
✨ What You Get
| Feature | What it does |
|---|---|
| Ghost Args | Strip LLM-invented params |
| Strict Types | No silent coercion |
| Self-Healing | LLM-friendly errors |
| E2B Sandbox | Isolated execution |
| RBAC | Role-based access |
| PII Mask | Auto-redact secrets |
| Network Guard | Block data exfiltration |
| Path Validation | Resists path-traversal attacks |
| Circuit Breaker | Fault tolerance |
| OpenTelemetry | Enterprise observability |
| Cost Tracking | Budget limits |
| Vaccination | Auto-secure frameworks |
🔥 Core Features
🔒 E2B Sandbox Execution
```python
from agent_airlock import Airlock, STRICT_POLICY

@Airlock(sandbox=True, sandbox_required=True, policy=STRICT_POLICY)
def execute_code(code: str) -> str:
    """Runs in an E2B Firecracker MicroVM. Not on your machine."""
    exec(code)
    return "executed"
```
| Feature | Value |
|---|---|
| Boot time | ~125ms cold, <200ms warm |
| Isolation | Firecracker MicroVM |
| Fallback | sandbox_required=True blocks local execution |
Air-gapped or on-prem? DockerBackend is the supported alternative. It runs
containers with cap_drop=["ALL"], no-new-privileges, network_mode="none",
and an enforced timeout, and it ships opt-in integration tests
(pytest -m docker). See docs/sandbox/docker.md.
ModalBackend — Modal-hosted sandbox (v0.8.11+, issue #30)
Already running the rest of your agent on Modal?
ModalBackend lets you keep airlocked tool execution on the same
substrate instead of mixing E2B and Modal billing / observability.
```bash
pip install "agent-airlock[modal]"
```

```python
from agent_airlock import Airlock, STRICT_POLICY, AirlockConfig
from agent_airlock.sandbox_backend import ModalBackend

backend = ModalBackend(
    app_name="my-airlock-sandbox",
    image_ref="python:3.11-slim",
    cpu=0.5,
    memory_mb=512,
    timeout_s=30,
    # network_policy=None → block_network=True (fail-closed default)
)

@Airlock(sandbox=True, sandbox_required=True, policy=STRICT_POLICY,
         config=AirlockConfig(sandbox_backend=backend))
def execute_code(code: str) -> str:
    exec(code)
    return "executed"
```
Isolation model — read before you reach for cap_drop. Modal
sandboxes run under gVisor (kernel-syscall filtering), not under
Docker-style capability dropping. The Modal Python SDK does not
expose cap_drop / cap_add / seccomp / no-new-privileges —
there is no equivalent knob to map. If your threat model needs
Linux-capability dropping at the container layer, keep using
DockerBackend. The network posture is configurable: ModalBackend
defaults to block_network=True (deny-by-default), and a supplied
NetworkPolicy maps to Modal's block_network flag (allow_egress=False
→ blocked, True → allowed). Hostname allowlists in NetworkPolicy.allowed_hosts
do not forward to Modal (their API is CIDR-only); the backend logs
a structlog warning and the operator is expected to re-state hostname
constraints at the Airlock policy layer.
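The mapping described above fits in a few lines. This is a sketch under stated assumptions: NetPolicy is a hypothetical stand-in for NetworkPolicy carrying only the two fields discussed, and it logs through the standard logging module where ModalBackend uses structlog.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("airlock.sketch")


@dataclass
class NetPolicy:  # hypothetical stand-in for agent_airlock's NetworkPolicy
    allow_egress: bool = False
    allowed_hosts: list[str] = field(default_factory=list)


def to_block_network(policy: Optional[NetPolicy]) -> bool:
    """Map a network policy onto Modal's single block_network flag."""
    if policy is None:
        return True  # fail-closed default: no policy means no network
    if policy.allowed_hosts:
        # A hostname allowlist cannot be expressed as one boolean, so it is
        # dropped here and must be re-stated at the Airlock policy layer.
        log.warning("allowed_hosts %s not forwarded to Modal", policy.allowed_hosts)
    return not policy.allow_egress
```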
ModalBackend is opt-in only — it is NOT added to the
get_default_backend() priority chain (E2B → Docker → Local stays the
default flow). Existing callers see no behavior change.
📜 Security Policies
| Preset | Use case | Key posture |
|---|---|---|
| PERMISSIVE_POLICY | Dev / sandbox | No restrictions |
| STRICT_POLICY | Prod | Rate-limited, requires agent identity, denies dangerous capabilities |
| READ_ONLY_POLICY | Analytics / RAG | `read_*` / `get_*` / `list_*` / `search_*` only |
| BUSINESS_HOURS_POLICY | Compliance windows | `delete_*` / `drop_*` / `*_production` only 09:00–17:00 |
| CAMOUFLAGE_RESISTANT_POLICY (v0.8.6) | Detector-independent defense vs. domain-camouflaged injection | Deny-by-default allowlist, ghost-arg BLOCK, output cap, per-call reauthorization |
```python
from agent_airlock import (
    PERMISSIVE_POLICY,
    STRICT_POLICY,
    READ_ONLY_POLICY,
    BUSINESS_HOURS_POLICY,
    CAMOUFLAGE_RESISTANT_POLICY,  # v0.8.6
)

# Or build your own:
from agent_airlock import SecurityPolicy

MY_POLICY = SecurityPolicy(
    allowed_tools=["read_*", "query_*"],
    denied_tools=["delete_*", "drop_*", "rm_*"],
    rate_limits={"*": "1000/hour", "write_*": "100/hour"},
    time_restrictions={"deploy_*": "09:00-17:00"},
)
```
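How the glob patterns in MY_POLICY could be evaluated, as a minimal sketch. It assumes an explicit deny wins over an allow, that anything unmatched is denied, and that time windows are same-day "HH:MM-HH:MM" ranges. fnmatchcase is the case-sensitive matcher from Python's fnmatch module; tool_allowed and in_window are hypothetical helpers, not SecurityPolicy methods, and rate limits are left out.

```python
from datetime import time
from fnmatch import fnmatchcase


def tool_allowed(name: str, allowed: list[str], denied: list[str]) -> bool:
    # Assumed precedence: an explicit deny always wins over an allow.
    if any(fnmatchcase(name, pat) for pat in denied):
        return False
    # Deny-by-default: a tool must match at least one allow pattern.
    return any(fnmatchcase(name, pat) for pat in allowed)


def in_window(window: str, now: time) -> bool:
    """Check a same-day "HH:MM-HH:MM" restriction such as "09:00-17:00"."""
    start_s, end_s = window.split("-")
    start, end = time.fromisoformat(start_s), time.fromisoformat(end_s)
    return start <= now <= end
```

With the patterns above, read_file is admitted, rm_tmp is denied explicitly, and write_log is denied only because nothing allows it, even though it has a rate limit.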
CAMOUFLAGE_RESISTANT — detector-independent injection defense (v0.8.6)
arXiv:2605.22001 ("Blind Spots in the Guard", Pai, May 2026) shows that production injection detectors — Llama Guard 3 included — drop to IDR = 0.000 on payloads that mimic the target document's domain vocabulary and authority structure. Per the paper, detection rates collapse from 93.8% to 9.7% on Llama 3.1 8B and from 100% to 55.6% on Gemini 2.0 Flash.
CAMOUFLAGE_RESISTANT_POLICY does not rely on payload-content
signatures at all. It blocks at four structural seams an attacker has
to ride regardless of phrasing:
- Deny-by-default tool allowlist. An empty allowed_tools means nothing is callable; deployments opt every tool in by name. A camouflaged directive targeting an unlisted tool is blocked on allowlist grounds without ever invoking a detector.
- Ghost-argument BLOCK. A camouflaged directive cannot smuggle undeclared parameters past validation.
- Hard output cap + sanitization. Tool output that re-enters the model context is truncated and PII/secret-masked, so a camouflaged directive embedded in tool output can't carry into a downstream agent at full length.
- Per-call reauthorization (debate-amplification guard). Once a tool's output has flowed back into the model, any reinvocation requires an explicit context.authorize_once(tool) grant from the harness, breaking the multi-agent fan-out path the paper identifies.
```python
from agent_airlock import Airlock, apply_camouflage_resistant

bundle = apply_camouflage_resistant(allowed_tools=["read_file", "search"])

@Airlock(config=bundle.config, policy=bundle.policy)
def read_file(path: str) -> str:
    ...
```
apply_camouflage_resistant() composes the matching AirlockConfig
(unknown-args BLOCK, sanitization on, output cap 4000 chars) with a
SecurityPolicy carrying your explicit allowlist. The preset is
deliberately incomplete on its own — the config-level knobs and the
policy-level knobs span two seams, so the factory returns both as a
CamouflageResistantBundle.
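The per-call reauthorization rule from the list above can be sketched as a small per-session tracker. ReauthTracker and ReauthRequired are hypothetical names; only the authorize_once idea comes from the text, and the sketch assumes each grant is consumed by exactly one reinvocation.

```python
class ReauthRequired(PermissionError):
    """Reinvocation after the tool's output reached the model needs a grant."""


class ReauthTracker:
    """Hypothetical per-session tracker for the debate-amplification guard."""

    def __init__(self) -> None:
        self._output_seen: set[str] = set()  # tools whose output reached the model
        self._grants: set[str] = set()       # pending one-shot grants

    def authorize_once(self, tool: str) -> None:
        self._grants.add(tool)

    def before_call(self, tool: str) -> None:
        if tool in self._output_seen:
            if tool not in self._grants:
                raise ReauthRequired(f"{tool}: reinvocation needs authorize_once()")
            self._grants.discard(tool)  # grant consumed; the next call re-locks

    def after_call(self, tool: str) -> None:
        self._output_seen.add(tool)
```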
Running an MCP server with STDIO transport? Also wire the Ox MCP STDIO sanitizer via
stdio_guard_ox_defaults(). It blocks the entire CVE-2026-30616 class (shell metacharacter injection, non-allowlisted binaries, Trojan-Source RTL overrides, and inline-code flags) before subprocess.Popen.
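A rough illustration of what a pre-Popen check for that class has to inspect, assuming the launch command arrives as an argv list. The binary allowlist, metacharacter set, and flag list below are hypothetical examples, not the contents of stdio_guard_ox_defaults().

```python
import re

# Hypothetical allowlist; a real deployment pins its own binaries.
ALLOWED_BINARIES = {"node", "python3", "uvx"}
SHELL_META = re.compile(r"[;&|`$<>\n]")
# Bidirectional override/isolate controls used in Trojan-Source attacks.
BIDI_CONTROLS = re.compile("[\u202a-\u202e\u2066-\u2069]")
INLINE_CODE_FLAGS = {"-c", "-e", "--eval", "--exec"}


def check_stdio_argv(argv: list[str]) -> list[str]:
    """Reject a server launch command before it reaches subprocess.Popen."""
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise ValueError(f"binary not allowlisted: {argv[:1]}")
    for arg in argv:
        if SHELL_META.search(arg):
            raise ValueError(f"shell metacharacter in {arg!r}")
        if BIDI_CONTROLS.search(arg):
            raise ValueError("bidirectional override character in argument")
        if arg in INLINE_CODE_FLAGS:
            raise ValueError(f"inline-code flag {arg!r} not permitted")
    return argv
```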
🪪 MCP server attestation (v0.8.10)
arXiv:2605.24248 ("Attested Tool-Server Admission", Metere, May 2026) calls out a gap MCP itself does not close: the protocol standardises message exchange between LLM agents and tool servers but says nothing about trust. Anybody who can answer on the wire can declare themselves a tool server.
mcp_attested_admission_defaults() is a deny-by-default opt-in preset
that closes the gap host-side, mirroring the paper's three additive
mechanisms:
- Offline-signed clearance assertion. Before any tool from an MCP server is dispatched, the host fetches a JWS-compact clearance from {server_url}/.well-known/mcp-clearance (path is configurable) and verifies its signature against an operator-pinned trust root. The trust root is supplied to AttestedAdmissionConfig at process startup, never network-fetched on the hot path.
- Deny-by-default per-server tool allowlist. Admitting a server is not the same as trusting its every tool. The verified clearance carries an explicit list of tool names the host will permit; everything else is denied. The sub claim is matched against the server identity the host is about to dispatch to (so a stolen clearance from server A can't admit a tool call to server B).
- Flavor-gated enforcement. ENFORCE (default) hard-denies on missing / invalid / expired clearance; WARN logs and admits, which is the staged turn-up an operator wants when introducing the gate against real traffic.
Every admission decision emits a
ReceiptVerdict on the
guard="mcp_attested_admission" channel, so the existing airlock attest
DSSE pipeline picks decisions up unchanged — this preset does not
invent a new log.
```python
from agent_airlock.mcp_proxy_guard import MCPProxyConfig, MCPProxyGuard
from agent_airlock.mcp_spec.attested_admission import TrustRoot
from agent_airlock.policy_presets import mcp_attested_admission_defaults

# Operator pins the trust root at startup. Never fetched at runtime.
with open("/etc/airlock/mcp-clearance-root.pem", "rb") as fh:
    pinned_pem = fh.read()

cfg = mcp_attested_admission_defaults(
    trust_root=TrustRoot(key_id="ops-2026Q2", ed25519_pem=pinned_pem),
    enforcement_mode="ENFORCE",  # deny-by-default
    max_clearance_age_days=30,
)
guard = MCPProxyGuard(MCPProxyConfig(attested_admission=cfg))

decision = guard.audit_tool_admission(
    server_url="https://mcp.example.com",
    server_id="srv-alpha",  # expected `sub` claim
    tool_name="read",
)
if not decision.admitted:
    raise RuntimeError(decision.reason)
```
Signature verification needs the [attested] extra (pulls in
cryptography for offline Ed25519 / RSA-PSS / JWKS verification); the
base install stays zero-runtime-dep.
Install with pip install "agent-airlock[attested]". Opt-in only: existing callers that don't set attested_admission get exactly v0.8.9 behavior.
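Once the clearance signature has been verified, the remaining admission logic is a handful of comparisons. A minimal sketch, assuming the verified payload carries the standard JWT sub and iat claims plus a hypothetical tools list. Decision and admit are illustrative names rather than the MCPProxyGuard API, signature verification is deliberately outside the sketch, and here WARN downgrades every kind of denial to an admit.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    admitted: bool
    reason: str


def admit(
    claims: Optional[dict],  # clearance payload AFTER signature verification
    server_id: str,
    tool_name: str,
    mode: str = "ENFORCE",
    max_age_days: int = 30,
    now: Optional[float] = None,
) -> Decision:
    now = time.time() if now is None else now
    if claims is None:
        problem = "missing or unverifiable clearance"
    elif claims.get("sub") != server_id:
        # A clearance minted for server A must not admit calls to server B.
        problem = f"sub {claims.get('sub')!r} != {server_id!r}"
    elif now - claims.get("iat", 0) > max_age_days * 86400:
        problem = "clearance expired"
    elif tool_name not in claims.get("tools", []):
        problem = f"tool {tool_name!r} not in clearance allowlist"
    else:
        return Decision(True, "admitted")
    if mode == "WARN":
        return Decision(True, f"WARN (admitted anyway): {problem}")
    return Decision(False, problem)
```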
🧭 Behavioral sequence guard (v0.8.12)
Watches the ordered stream of tool calls in a session and flags divergence from a declared expected order — not the model's stated reasoning trace.
arXiv:2605.27901 ("The Fragility of Chain-of-Thought Monitoring", Onyame, Zhou, Thopalli, Kailkhura & Agarwal, May 2026) reports an average 95.9% CoT unfaithfulness across 8B–120B-parameter models — including answer-switching, post-hoc rationalisation, and procedural exploitation of hints. Trusting the model's stated reasoning to detect misbehavior is therefore not viable. Trusting its behavior — the sequence of tools it actually invokes — is.
SequenceGuard is an opt-in field on SecurityPolicy that runs in
the @Airlock seam right after the standard policy check, in two
modes:
DECLARED mode — operator supplies a permitted-transition DAG.
Any transition not in the DAG is a SequenceViolation. Deny-by-default.
```python
from agent_airlock import Airlock, SecurityPolicy
from agent_airlock.sequence_guard import SequenceGuard, ENTRY_SENTINEL

policy = SecurityPolicy(
    sequence_guard=SequenceGuard(
        mode="declared",
        action="block",  # or "warn"
        dag={
            ENTRY_SENTINEL: {"read"},       # only `read` may start a session
            "read": {"read", "summarize"},  # after read, either re-read or summarize
            "summarize": {"send"},          # after summarize, only send
            "send": set(),                  # send is terminal
        },
    ),
)
```
BASELINE mode — guard maintains a per-session-key Markov transition
profile in a local JSON file (no cloud, no PII — only tool names and
SHA-256 shape hashes of (arg types, kwarg names+types), never
argument values) and flags transitions with observed
P(curr | prev) < threshold once the sample size from prev reaches
min_baseline_samples.
```python
from pathlib import Path

from agent_airlock import SecurityPolicy
from agent_airlock.sequence_guard import SequenceGuard

policy = SecurityPolicy(
    sequence_guard=SequenceGuard(
        mode="baseline",
        baseline_path=Path("/var/lib/airlock/sequence-baseline.json"),
        low_probability_threshold=0.05,  # flag transitions with P(curr | prev) < 0.05
        min_baseline_samples=50,         # don't flag until 50 obs from `prev`
    ),
)
```
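Both modes reduce to simple bookkeeping over (prev, curr) pairs. The sketch below is an illustration under assumptions, not SequenceGuard's code: ENTRY stands in for ENTRY_SENTINEL, each transition is scored against counts observed before it, and save writes the counts table to a local JSON file.

```python
import hashlib
import json
from collections import defaultdict
from typing import Optional

ENTRY = "<entry>"  # stand-in for agent_airlock.sequence_guard.ENTRY_SENTINEL


def shape_hash(args: tuple, kwargs: dict) -> str:
    """Hash argument TYPES and kwarg names only, never the values."""
    shape = {
        "args": [type(a).__name__ for a in args],
        "kwargs": sorted((k, type(v).__name__) for k, v in kwargs.items()),
    }
    return hashlib.sha256(json.dumps(shape).encode()).hexdigest()


def declared_violation(dag: dict, prev: str, curr: str) -> bool:
    return curr not in dag.get(prev, set())  # deny-by-default


class MarkovBaseline:
    def __init__(self, threshold: float = 0.05, min_samples: int = 50) -> None:
        self.counts: dict = defaultdict(lambda: defaultdict(int))
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, prev: str, curr: str) -> Optional[float]:
        """Record prev -> curr; return P(curr | prev) if it should be flagged."""
        row = self.counts[prev]
        total = sum(row.values())
        # Score against history BEFORE this observation is counted.
        p = row[curr] / total if total else 0.0
        row[curr] += 1
        if total >= self.min_samples and p < self.threshold:
            return p
        return None

    def save(self, path: str) -> None:
        with open(path, "w") as fh:
            json.dump(self.counts, fh)  # tool names and counts only
```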
Every flagged transition emits OTel span attributes on the current
span (airlock.sequence_guard.mode, .from_tool, .to_tool,
.session_key, .observed_probability) via the existing
observability provider — telemetry failures are swallowed so they
cannot break enforcement.
Not AnomalyDetector: that one watches rate, endpoint diversity,
error rate, and consecutive blocked calls over sliding windows, while
SequenceGuard is a per-transition ORDER signal. Run both for layered
coverage. Not a chain-of-thought monitor, by construction.
Strictly opt-in. The new
SecurityPolicy.sequence_guard field defaults to None; callers that don't set it get exactly v0.8.11 behavior. Zero new runtime deps: the Pydantic-only core stays intact.
🛑 Action-time contradiction gate (v0.8.15)
arXiv:2605.27157 ("Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs", Yu et al., 2026) shows that LLMs readily acknowledge contradictory evidence in their reasoning trace yet "this awareness fails to constrain their final recommendations". The deficit is at action selection — single-turn diagnostics overestimate RAG safety, and detection alone is not a control.
ActionContradictionGate is an opt-in policy hook that wraps three
pluggable detectors (any one trips) and a privileged-sink glob
set. When a detector trips AND the dispatched tool matches a
privileged sink AND the harness has not issued an explicit allow,
the gate blocks the call (or warns, depending on action=).
The explicit-allow primitive is not new — the gate reuses the
existing AirlockContext.authorize_once(tool_name) (introduced for
the v0.8.6 reauth flow). Same one-shot grant, same semantics. After
a one-shot is consumed the gate re-locks — the harness must mint
a fresh authorize_once for each privileged action.
```python
import re
from agent_airlock import Airlock, SecurityPolicy
from agent_airlock.action_contradiction_gate import ActionContradictionGate

policy = SecurityPolicy(
    action_contradiction_gate=ActionContradictionGate(
        # Detector 1: a boolean flag the RAG pipeline flips on after
        # it sees an evidence-vs-claim conflict the agent discussed.
        signal_field_key="evidence_contradiction",
        # Detector 2: pluggable regex against the SAME key when its
        # value is a string (operator-controlled marker, never the
        # model's full reasoning trace).
        marker_regex=re.compile(r"contradict|conflict|disagree", re.I),
        # Detector 3: fully pluggable callable; receives the context.
        # predicate=lambda ctx: ctx.metadata.get("conflict_count", 0) > 1,
        # Default privileged sinks: send_* / export_* / commit_* /
        # transfer_* / delete_* + the v0.8.14 outbound-integration set.
        # Operators can narrow via `privileged_sinks=(...)`.
        action="block",  # or "warn" for staged turn-up
    ),
)
```
Off-by-default invariant. SecurityPolicy.action_contradiction_gate
defaults to None; non-RAG flows pay zero false-positive tax
(no detector runs, no log lines, no metadata reads). Even when wired,
the gate is inert until at least one detector slot is configured
— so a partial roll-out (gate attached but detectors flipped off)
admits everything.
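The decision order described above (a detector trips, the tool matches a privileged sink, and no unconsumed one-shot grant exists) can be sketched as follows. GateSketch is hypothetical: unlike the real gate it hands the predicate the metadata dict rather than the full context, and its default sink tuple omits the v0.8.14 outbound-integration set.

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Any, Callable, Optional


@dataclass
class GateSketch:  # hypothetical, not ActionContradictionGate itself
    signal_field_key: Optional[str] = None
    marker_regex: Optional[re.Pattern] = None
    predicate: Optional[Callable[[dict], bool]] = None
    privileged_sinks: tuple = ("send_*", "export_*", "commit_*", "transfer_*", "delete_*")
    grants: set = field(default_factory=set)  # one-shot authorize_once grants

    def tripped(self, metadata: dict) -> bool:
        value: Any = metadata.get(self.signal_field_key) if self.signal_field_key else None
        if value is True:
            return True
        if self.marker_regex and isinstance(value, str) and self.marker_regex.search(value):
            return True
        return bool(self.predicate and self.predicate(metadata))

    def allows(self, tool: str, metadata: dict) -> bool:
        if not (self.signal_field_key or self.marker_regex or self.predicate):
            return True  # inert until at least one detector slot is configured
        if not any(fnmatchcase(tool, p) for p in self.privileged_sinks):
            return True
        if not self.tripped(metadata):
            return True
        if tool in self.grants:
            self.grants.discard(tool)  # consume the grant; the gate re-locks
            return True
        return False
```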
Not a chain-of-thought monitor. The gate reads operator-controlled signals only (a metadata field, an operator regex, an operator predicate). It never reads the model's own claim that it has or has not noticed a contradiction; the paper's whole point is that those claims do not gate behavior.
Not sequence_guard (v0.8.12) — that flags unusual call ORDER.
Not reauth_on_untrusted_reinvocation (v0.8.6) — that's
count-driven on a per-tool counter. This gate is signal-driven and
targets a specific privileged-sink glob set. They compose; run all
three for layered coverage.
Strictly opt-in. Zero new runtime deps: the Pydantic-only core stays intact. The new
SecurityPolicy.action_contradiction_gate field defaults to None; callers that don't set it get exactly v0.8.14 behavior.
📊 Adversarial-negotiation regression harness (v0.8.17)
A deterministic harness that measures what the deny-by-default governance layer does to a fixed set of adversarial buyer-seller negotiation actions — and reports two metrics named to line up with an external published baseline so the numbers can sit side by side.
```bash
python -m agent_airlock.cli.negotiation_bench --report markdown
```
Each scenario carries a concrete, checkable unsafe action and runs
twice — baseline (no airlock, the unsafe event lands) and
governed (the same action through the real @Airlock
intercept-before-execute path, no policy-layer mocking). Three
unsafe-action classes each exercise a different real interception
mechanism: price-below-floor → Pydantic strict-validation,
secret-leak → the output sanitizer, transfer-outside-policy →
deny-by-default SecurityPolicy. Benign deals are included to confirm
governance does not over-block.
| source | unsafe_execution_rate (base → governed) | valid_task_success_rate (base → governed) |
|---|---|---|
| agent-airlock (this harness) | 100% → 0% | 43% → 100% |
| OCL (external, live LLMs, arXiv:2606.04306) | 88% → ~0% | 12% → 96% |
The OCL row is an external result, not agent-airlock's. It was measured on live frontier LLM agents in AgenticPay-adapted negotiation (OCL, arXiv:2606.04306; AgenticPay, arXiv:2602.06008) and is reproduced here only for directional comparison: both put governance at the execution boundary. It is not the same experiment. agent-airlock is a deterministic execution-boundary validator, not an LLM, and this harness does not call a model. The agent-airlock rows are a property of the policy layer under a worst-case scripted adversary, exercised through the real @Airlock path.
The harness doubles as a regression gate: --fail-if-governed-unsafe
exits non-zero if the governed unsafe_execution_rate ever rises above
zero, so a future change that weakens the policy layer fails CI. Zero
new runtime deps; fully deterministic (no randomness, no network, no
model call).
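The regression gate amounts to one rate and one exit code. A sketch under assumptions: ScenarioResult is a hypothetical record, and unsafe_execution_rate is taken here to mean the fraction of unsafe scenarios whose action actually executed, which may differ from the harness's exact definition.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:  # hypothetical record of one run of one scenario
    name: str
    unsafe: bool           # does the scenario carry an unsafe action?
    unsafe_executed: bool  # did that action actually land?


def unsafe_execution_rate(results: list[ScenarioResult]) -> float:
    unsafe = [r for r in results if r.unsafe]
    if not unsafe:
        return 0.0
    return sum(r.unsafe_executed for r in unsafe) / len(unsafe)


def gate_exit_code(governed: list[ScenarioResult]) -> int:
    # Any unsafe action landing under governance fails CI.
    return 1 if unsafe_execution_rate(governed) > 0 else 0
```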
🔎 Privilege right-sizing — airlock-explain --unused-scopes (v0.8.13)
A read-only CLI that surfaces over-permissioning: it diffs the
SecurityPolicy's granted tool scopes against the tools the agent
actually called (from an OTLP export OR a native audit JSONL),
per AgentIdentity, and prints the dead-weight set plus a suggested
tightened allow-list.
```bash
# Install the v0.8.13 wheel; airlock-explain becomes available
pip install "agent-airlock>=0.8.13"

# Diff granted vs used; print a table
airlock-explain --unused-scopes \
  --policy ./security-policy.toml \
  --trace ./agent.audit.jsonl

# Same, machine-readable, plus a proposed tightened policy preview
airlock-explain --unused-scopes \
  --policy ./security-policy.toml \
  --trace ./otel-export.json \
  --format json \
  --suggest-policy
```
Observability-only. This command never mutates the
SecurityPolicy, never writes the policy file, and never
auto-applies the suggestion. The deny-by-default posture is
unchanged — the right-size CLI is a review aid, not an enforcement
primitive. The --suggest-policy output is intentionally a stdout
preview so a human reviews the tightened allow-list before adopting
it by hand.
Trace formats (auto-detected by inspecting the file head):
- Audit JSONL: the format AuditLogger already emits. One JSON object per line, with tool_name / agent_id / blocked. Blocked calls are excluded from the "actually called" set, because a blocked call is not an exercise of a granted scope.
- OTLP JSON: the format opentelemetry-exporter-otlp writes. The span name is the tool name; attributes.agent_id keys the per-agent diff. A span carrying airlock.blocked=true is skipped, same as JSONL.
Diff semantics. The matcher is fnmatch — the same glob semantics
SecurityPolicy.check_tool_allowed uses internally, so the suggested
tightened allow-list admits exactly the tools the agent was observed
calling (no surprises at adoption time). Denied-list patterns are
forwarded unchanged to the suggestion: denials are intent, not
usage data.
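The diff itself is small. A minimal sketch over the audit JSONL shape described above (tool_name, agent_id, blocked); used_tools, unused_scopes, and suggest_allow_list are hypothetical helpers, OTLP parsing is left out, and denied patterns would simply be copied through.

```python
import json
from fnmatch import fnmatchcase


def used_tools(jsonl_lines: list[str]) -> dict[str, set[str]]:
    """Per-agent set of tools actually called; blocked calls are not usage."""
    used: dict[str, set[str]] = {}
    for line in jsonl_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("blocked"):
            continue
        used.setdefault(rec["agent_id"], set()).add(rec["tool_name"])
    return used


def unused_scopes(granted: list[str], called: set[str]) -> list[str]:
    """Granted patterns that no observed call ever exercised."""
    return [p for p in granted if not any(fnmatchcase(t, p) for t in called)]


def suggest_allow_list(granted: list[str], called: set[str]) -> list[str]:
    """Exact names of observed tools that some granted pattern admits."""
    return sorted(t for t in called if any(fnmatchcase(t, p) for p in granted))
```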
Strictly observability. No new runtime deps. The new console-script entry
airlock-explain is the project's first installable CLI; existing python -m agent_airlock.cli.<name> invocations are unaffected.
🩹 Skill-resistant trace redaction + watermark (v0.8.24)
Why traces are an extraction surface: an agent's emitted trace/receipt is a distillation target, not just an audit artifact. A trace that records the tuned thresholds a policy fired on, the exact tool-call arguments, and the recovered intermediate formulas/strategies hands a competitor the recipe — enough to clone the behaviour without paying for the search that found it. The verifier, by contrast, needs only the evidence (the gate ran / the policy fired / pass-fail), never the recipe. This is the RedAct-style threat model — a composition of published behavioural-watermarking work (Agent Guide, arXiv:2504.05871; CoTGuard, arXiv:2505.19405; Distilling the Thought, arXiv:2601.05144). agent-airlock does not reproduce any paper's benchmark.
TraceRedactionPolicy (opt-in, OFF by default for backward compat, ON
under STRICT_POLICY) runs at the non-local sink (e.g. the OTel exporter):
it (a) localizes protected fields with a configurable field-classifier
(tuned thresholds, tool-call args, recovered formulas/strategies), (b)
rewrites them to keep verifier-critical evidence while dropping the recipe,
and (c) embeds a per-tenant behavioural watermark so a leaked trace is
provably yours. Detect it with airlock trace verify-watermark <trace.json>
(cryptographic keyed-HMAC match → high true-detection, low false-alarm); add
--redaction-report to see what was localized / rewritten / preserved.
Stdlib-only — no new runtime dependency.
```python
from agent_airlock import TraceRedactionPolicy, trace_redact, verify_watermark

# `trace` is an emitted trace/receipt from your agent run (not defined here).
pol = TraceRedactionPolicy(enabled=True, tenant_id="acme-co", watermark_secret="...")
redacted, report = trace_redact(trace, pol)  # tuned_threshold → evidence stub; recipe dropped
assert verify_watermark(redacted, pol).detected  # provably yours
```
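To show what a keyed-HMAC watermark check involves, here is a minimal stdlib sketch. It assumes a flat dict trace and a hypothetical fixed set of protected keys; redact_and_mark and verify_mark are illustrative, whereas TraceRedactionPolicy uses a configurable field-classifier.

```python
import copy
import hashlib
import hmac
import json

PROTECTED_KEYS = {"tuned_threshold", "tool_args", "strategy"}  # hypothetical classifier


def _canonical(trace: dict) -> bytes:
    body = {k: v for k, v in trace.items() if k != "watermark"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def _tag(trace: dict, tenant_id: str, secret: bytes) -> str:
    msg = tenant_id.encode() + b"|" + _canonical(trace)
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def redact_and_mark(trace: dict, tenant_id: str, secret: bytes) -> dict:
    out = copy.deepcopy(trace)
    for key in out.keys() & PROTECTED_KEYS:
        out[key] = "[redacted: evidence retained]"  # keep that it ran, drop the recipe
    out["watermark"] = _tag(out, tenant_id, secret)
    return out


def verify_mark(trace: dict, tenant_id: str, secret: bytes) -> bool:
    expected = _tag(trace, tenant_id, secret)
    return hmac.compare_digest(expected, str(trace.get("watermark", "")))
```

Any edit to a marked trace, or a check under another tenant's id or secret, fails verification.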
✅ Fail-closed terminal-claim guard — no_false_success (v0.8.25)
An honest stall is recoverable; a confident wrong done is not. The
dominant failure mode of unattended long-horizon agents isn't crashing — it's
confidently reporting success they never verified
(Goal-Autopilot, arXiv:2606.11688). The
no_false_success preset enforces that paper's No-False-Success floor: a
terminal/done claim is admitted only if a named, falsifiable check actually
executed and passed THIS run. No receipt, a failed check, or a forged/replayed
receipt → the guard fails closed to a recoverable honest stall (run the
named check and retry), never a fabricated success.
Forgery resistance is structural: the guard mints a per-run token and only
trusts a receipt it stamped by executing the check this run — a receipt that's
merely present (hand-built, or replayed from a prior run) is rejected. Opt in
per-agent with AirlockConfig(require_done_receipt=True) (or
require_done_receipt = true under [airlock] in airlock.toml); OFF by
default. Stdlib-only — no new runtime dependency.
```python
from agent_airlock import no_false_success_defaults, NoFalseSuccessStall

# run_pytest is your own callable that runs the test suite (not defined here).
preset = no_false_success_defaults({"tests_green": run_pytest})  # falsifiable check
preset["guard"].run("tests_green")  # actually execute the check this run
preset["check"]("tests_green")      # raises NoFalseSuccessStall unless it passed
```
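The per-run token idea can be sketched with the standard library alone. DoneGuard and HonestStall are hypothetical names: the guard accepts a receipt only if its HMAC stamp was produced with this run's secret token, so a hand-built or replayed receipt fails.

```python
import hashlib
import hmac
import secrets
from typing import Callable


class HonestStall(RuntimeError):
    """Recoverable: run the named check and retry."""


class DoneGuard:  # hypothetical, not agent_airlock's guard
    def __init__(self, checks: dict[str, Callable[[], bool]]) -> None:
        self.checks = checks
        self.run_token = secrets.token_bytes(32)  # fresh secret per run
        self.receipts: dict[str, tuple[bool, str]] = {}

    def _stamp(self, name: str, passed: bool) -> str:
        msg = f"{name}:{passed}".encode()
        return hmac.new(self.run_token, msg, hashlib.sha256).hexdigest()

    def run(self, name: str) -> bool:
        passed = bool(self.checks[name]())
        self.receipts[name] = (passed, self._stamp(name, passed))
        return passed

    def admit_done(self, name: str) -> None:
        receipt = self.receipts.get(name)
        if receipt is None:
            raise HonestStall(f"no receipt for {name!r} this run")
        passed, stamp = receipt
        if not hmac.compare_digest(stamp, self._stamp(name, passed)):
            raise HonestStall(f"receipt for {name!r} was not minted this run")
        if not passed:
            raise HonestStall(f"check {name!r} failed")
```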
💰 Cost Control
A runaway agent can burn $500 in API costs before you notice.
```python
from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    max_output_chars=5000,   # Truncate before token explosion
    max_output_tokens=2000,  # Hard limit on response size
)

@Airlock(config=config)
def query_logs(query: str) -> str:
    return massive_log_query(query)  # 10MB → 5KB
```
ROI: 10MB of logs ≈ 2.5M tokens (about 4 characters per token) ≈ $25 per response at $10 per 1M tokens. Truncated to 5,000 characters ≈ 1.25K tokens ≈ $0.0125, roughly 99.95% savings.
Per-model-tier budgets (v0.8.7)
The flat max_output_* caps above apply uniformly to every call. ModelTierBudget caps per-call cost and output tokens per model tier label (e.g. "frontier" / "mid" / "small"), evaluated before the tool runs. Untagged calls fall back to a configurable strict_tier (deny-by-default — the cheapest tier).
```python
from agent_airlock import (
    Airlock, ModelTierBudget, SecurityPolicy, TierBudget,
)

policy = SecurityPolicy(
    model_tier_budget=ModelTierBudget(
        tiers={
            "frontier": TierBudget(max_cost_cents=50, max_output_tokens=4000),
            "mid": TierBudget(max_cost_cents=10, max_output_tokens=2000),
            "small": TierBudget(max_cost_cents=2, max_output_tokens=1000),
        },
        strict_tier="small",  # untagged → cheapest tier (deny-by-default)
    ),
)

@Airlock(policy=policy, return_dict=True)
def call_model(prompt: str, **_extra):
    return run_my_router(prompt)

# The router tags each call. Airlock blocks before the model fires.
call_model("Draft a tweet", _airlock_tier="small", _airlock_input_tokens=50)
call_model("Deep analysis", _airlock_tier="frontier", _airlock_input_tokens=200_000)
# → AIRLOCK_BLOCK: Tier 'frontier' budget exceeded (worst-case 66¢ > cap 50¢)
```
Routing logic stays in the user's router. Three tagging routes are supported:
- _airlock_tier kwarg: stripped before the tool sees it.
- context.metadata["airlock_tier"]: set on a contextvar-stored AirlockContext by the router's session middleware.
- tier_resolver callback: ModelTierBudget(tier_resolver=fn), where fn(model_id: str) -> tier_label lives in the caller's code. Airlock invokes the callback when context.metadata["model_id"] is set; it carries no vendor-specific model→tier table.
After execution, actual vs estimated cost is reconciled into the global
CostTracker (observability — never blocks). See
examples/model_tier_budget.py for
all four patterns including composition with allow/deny lists.
A ready-to-use strict_tier_budget_policy() preset returns a
SecurityPolicy seeded with the table above.
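A sketch of the pre-call check, assuming the worst case is the input tokens plus the tier's full output allowance. Tier, resolve_tier, worst_case_cents, and over_budget are hypothetical; the per-1K-token prices are illustrative inputs an operator would supply, and the precedence among the three tagging routes is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tier:
    max_cost_cents: float
    max_output_tokens: int
    in_cents_per_1k: float   # hypothetical pricing, supplied by the operator
    out_cents_per_1k: float


def resolve_tier(kwarg_tier: Optional[str], metadata: dict,
                 resolver: Optional[Callable[[str], str]], strict_tier: str) -> str:
    # Assumed precedence: explicit kwarg, then context metadata, then resolver.
    if kwarg_tier:
        return kwarg_tier
    if "airlock_tier" in metadata:
        return metadata["airlock_tier"]
    if resolver and "model_id" in metadata:
        return resolver(metadata["model_id"])
    return strict_tier  # untagged: fall back to the cheapest tier


def worst_case_cents(t: Tier, input_tokens: int) -> float:
    # Worst case assumes the model emits its full output-token allowance.
    return (input_tokens / 1000 * t.in_cents_per_1k
            + t.max_output_tokens / 1000 * t.out_cents_per_1k)


def over_budget(t: Tier, input_tokens: int) -> bool:
    return worst_case_cents(t, input_tokens) > t.max_cost_cents
```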
🔐 PII & Secret Masking
```python
from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    mask_pii=True,      # SSN, credit cards, phones, emails
    mask_secrets=True,  # API keys, passwords, JWTs
)

@Airlock(config=config)
def get_user(user_id: str) -> dict:
    return db.users.find_one({"id": user_id})

# LLM sees: {"name": "John", "ssn": "[REDACTED]", "api_key": "sk-...XXXX"}
```
12 PII types detected · 4 masking strategies · Zero data leakage
Opt-in regional PII (pii_locales)
Aadhaar / PAN / UPI / IFSC have always shipped as SensitiveDataType members,
but are not added to the default mask_pii=True set — to keep the surface
zero-dep and US-shaped by default. v0.8.9 adds a pii_locales opt-in
that pulls them in and tightens detection:
```python
from agent_airlock import Airlock, AirlockConfig

config = AirlockConfig(
    mask_pii=True,
    pii_locales=["in"],  # opt in to India-locale detection
)

@Airlock(config=config)
def lookup(query: str) -> str:
    return (
        "User: राम कुमार, "
        "Aadhaar: 234567890124, "  # → "23********24" (PARTIAL)
        "PAN: ABCDE1234F, "        # → "AB******4F" (PARTIAL)
        "phone: 555-123-4567"      # still masked by existing PHONE regex
    )
```
Two things activate when "in" in pii_locales:
- Aadhaar Verhoeff checksum gate — the existing Aadhaar regex is permissive (any 12-digit number starting 2-9 matches). With the opt-in, each match must also pass the UIDAI Verhoeff checksum, cutting the FP rate ~10x on random IDs / phone numbers.
- Devanagari personal-name detection: PERSONAL_NAME_DEVANAGARI runs against the Unicode block U+0900–U+097F, with a small allowlist of common Hindi greetings / pronouns / interrogatives to keep ordinary prose from being masked. Conservative heuristic: production callers who need precise extraction should layer NER on top.
The flag is additive and reversible — pii_locales=[] (the default)
preserves the prior behavior bit-for-bit.
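The Verhoeff gate is a standard checksum, so it can be shown in full. A sketch of the opt-in behavior described above, assuming the permissive 12-digit candidate regex and the PARTIAL masking from the example; verhoeff_valid and mask_aadhaar are hypothetical helpers, not the library's masker.

```python
import re

# Verhoeff tables: dihedral group D5 multiplication (D) and position permutation (P).
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 8, 7, 6, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

# Permissive candidate: any 12-digit number starting 2-9.
AADHAAR_CANDIDATE = re.compile(r"\b[2-9]\d{11}\b")


def verhoeff_valid(number: str) -> bool:
    if not number.isdigit():
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def mask_aadhaar(text: str) -> str:
    def _mask(m: re.Match) -> str:
        digits = m.group()
        if not verhoeff_valid(digits):
            return digits  # checksum fails: probably not an Aadhaar, leave it
        return digits[:2] + "*" * 8 + digits[-2:]  # PARTIAL strategy
    return AADHAAR_CANDIDATE.sub(_mask, text)
```

Because Verhoeff catches every single-digit error, only about one in ten random 12-digit strings passes, which is where the ~10x false-positive reduction comes from.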
🌐 Network Airgap (v0.3.0)
Block data exfiltration during tool execution:
```python
from agent_airlock import network_airgap, NO_NETWORK_POLICY

# Block ALL network access
with network_airgap(NO_NETWORK_POLICY):
    result = untrusted_tool()  # Any socket call → NetworkBlockedError

# Or allow specific hosts only
from agent_airlock import NetworkPolicy

INTERNAL_ONLY = NetworkPolicy(
    allow_egress=True,
    allowed_hosts=["api.internal.com", "*.company.local"],
    allowed_ports=[443],
)
```
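One way to enforce such a policy in-process is to patch socket.socket.connect for the duration of a with block. This is a hypothetical sketch (EgressPolicy, airgap, and NetworkBlocked are illustrative names; an empty host or port list is assumed to mean "any"), and it has real limits: the patch is process-wide rather than per-thread, connect_ex is not covered, and host patterns are matched against whatever address reaches connect, which is often an already-resolved IP.

```python
import contextlib
import socket
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


class NetworkBlocked(ConnectionError):
    pass


@dataclass
class EgressPolicy:  # hypothetical stand-in for NetworkPolicy
    allow_egress: bool = False
    allowed_hosts: list[str] = field(default_factory=list)
    allowed_ports: list[int] = field(default_factory=list)

    def permits(self, host: str, port: int) -> bool:
        if not self.allow_egress:
            return False
        host_ok = not self.allowed_hosts or any(fnmatchcase(host, p) for p in self.allowed_hosts)
        port_ok = not self.allowed_ports or port in self.allowed_ports
        return host_ok and port_ok


@contextlib.contextmanager
def airgap(policy: EgressPolicy):
    original = socket.socket.connect

    def guarded_connect(sock, address):
        # Only (host, port, ...) tuples are network egress; AF_UNIX paths pass through.
        if isinstance(address, tuple) and not policy.permits(str(address[0]), int(address[1])):
            raise NetworkBlocked(f"egress to {address[0]}:{address[1]} blocked")
        return original(sock, address)

    socket.socket.connect = guarded_connect  # process-wide patch
    try:
        yield
    finally:
        socket.socket.connect = original
```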
💉 Framework Vaccination (v0.3.0)
Secure existing tools without editing their definitions:
```python
from agent_airlock import vaccinate, STRICT_POLICY

# Before: Your existing LangChain tools are unprotected
vaccinate("langchain", policy=STRICT_POLICY)
# After: ALL @tool decorators now include Airlock security
# No changes to the tool definitions themselves
```
Supported: LangChain, OpenAI Agents SDK, PydanticAI, CrewAI
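A plausible mechanism for vaccination is re-pointing a framework's decorator attribute so every tool it builds is wrapped first. This is an assumption, not a description of vaccinate(); vaccinate_module is hypothetical. It only affects code that looks the decorator up on the module after patching, so a from ... import tool executed earlier keeps the original.

```python
import functools
import types
from typing import Any, Callable

Decorator = Callable[[Callable[..., Any]], Any]


def vaccinate_module(module: types.ModuleType, attr: str, guard: Decorator) -> None:
    """Re-point module.<attr> (a tool decorator) so each tool is guarded first."""
    original_decorator = getattr(module, attr)

    @functools.wraps(original_decorator)
    def vaccinated(fn: Callable[..., Any]) -> Any:
        # The guard wraps the raw function, so it stays innermost.
        return original_decorator(guard(fn))

    setattr(module, attr, vaccinated)
```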
⚡ Circuit Breaker (v0.4.0)
Prevent cascading failures with fault tolerance:
```python
from agent_airlock import CircuitBreaker, AGGRESSIVE_BREAKER

breaker = CircuitBreaker("external_api", config=AGGRESSIVE_BREAKER)

@breaker
def call_external_api(query: str) -> dict:
    return external_service.query(query)

# After 5 failures → circuit OPENS → fast-fails for 30s
# Then HALF_OPEN → allows 1 test request → recovers or reopens
```
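The state transitions in those comments can be sketched as a small single-threaded class. Breaker and CircuitOpen are hypothetical; the defaults of 5 failures and a 30-second cool-down come from the comments above, not from AGGRESSIVE_BREAKER's actual settings, and the injectable clock is there to make the behavior testable.

```python
import time
from typing import Any, Callable


class CircuitOpen(RuntimeError):
    pass


class Breaker:
    """Minimal CLOSED -> OPEN -> HALF_OPEN state machine (single-threaded)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpen("fast-fail: circuit is open")
            self.state = "HALF_OPEN"  # let one trial request through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "OPEN", self.clock()
            raise
        self.state, self.failures = "CLOSED", 0  # success closes and resets
        return result
```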
📈 OpenTelemetry Observability (v0.4.0)
Enterprise-grade monitoring:
```python
from agent_airlock import configure_observability, observe

configure_observability(
    service_name="my-agent",
    otlp_endpoint="http://otel-collector:4317",
)

@observe(name="critical_operation")
def process_data(data: dict) -> dict:
    # Automatic span creation, metrics, and audit logging
    return transform(data)
```
🔌 Framework Compatibility
The Golden Rule: @Airlock must be closest to the function definition.
```python
@framework_decorator  # ← Framework sees secured function
@Airlock()            # ← Security layer (innermost)
def my_function():    # ← Your code
    ...
```
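Why the order matters follows from plain Python: stacked decorators apply bottom-up, so the framework decorator registers whatever the decorator beneath it returned. In this hypothetical sketch, framework_tool and validating stand in for a framework decorator and @Airlock().

```python
import functools
from typing import Any, Callable

REGISTRY: dict[str, Callable[..., Any]] = {}


def framework_tool(fn):  # hypothetical framework decorator: registers what it is given
    REGISTRY[fn.__name__] = fn
    return fn


def validating(fn):  # hypothetical stand-in for @Airlock(): rejects non-str input
    @functools.wraps(fn)
    def wrapper(city):
        if not isinstance(city, str):
            raise TypeError("city must be str")
        return fn(city)
    return wrapper


@framework_tool  # applied second: registers the already-validated wrapper
@validating      # applied first: wraps the raw function
def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C"


@validating      # wrong order: the framework registered the raw function
@framework_tool
def get_time(city: str) -> str:
    return f"Time in {city}: 12:00"
```

The framework dispatches through REGISTRY, so get_weather is protected while get_time's registered entry bypasses validation entirely.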
More frameworks: LlamaIndex, AutoGen, smolagents, Anthropic
LlamaIndex
```python
from llama_index.core.tools import FunctionTool
from agent_airlock import Airlock

@Airlock()
def calculate(expression: str) -> int:
    return eval(expression, {"__builtins__": {}})

calc_tool = FunctionTool.from_defaults(fn=calculate)
```
AutoGen
```python
from autogen import ConversableAgent
from agent_airlock import Airlock

@Airlock()
def analyze_data(dataset: str) -> str:
    return f"Analysis of {dataset}: mean=42.5"

assistant = ConversableAgent(name="analyst", llm_config={"model": "gpt-4o"})
assistant.register_for_llm()(analyze_data)
```
smolagents
```python
from smolagents import tool
from agent_airlock import Airlock

@tool
@Airlock(sandbox=True)
def run_code(code: str) -> str:
    """Execute in E2B sandbox.

    Args:
        code: Python source to execute inside the sandbox.
    """
    exec(code)
    return "Executed"
```
Anthropic (Direct API)
```python
from agent_airlock import Airlock

@Airlock()
def get_weather(city: str) -> str:
    return f"Weather in {city}: 22°C"

# Use in tool handler
def handle_tool_call(name, inputs):
    if name == "get_weather":
        return get_weather(**inputs)  # Airlock validates
```
Adapter-shipped vs example-only (honest split)
Both paths use the same @Airlock() decorator placement. "Adapter-shipped" means there's a dedicated src/agent_airlock/integrations/<framework>.py module with framework-specific glue (signature preservation, tool registry rewrites, request-shape adapters). "Example-only" means the decorator is compatible out of the box; no extra adapter is required.
Adapter-shipped (11): LangChain (integrations/langchain.py),
LangGraph (integrations/langgraph_toolnode_compat.py),
OpenAI Agents SDK (integrations/openai_guardrails.py),
Anthropic Messages API (integrations/anthropic.py),
Anthropic Claude Agent SDK (integrations/anthropic_claude_agent_sdk.py, v0.6.1+),
smolagents (integrations/smolagents_wrapper.py),
Gemini 3 Agent Mode (integrations/gemini3_tool_shape_adapter.py),
GPT-5.5 (integrations/gpt5_5_tool_shape_adapter.py),
PydanticAI (integrations/pydantic_ai.py, v0.7.1+),
CrewAI (integrations/crewai.py, v0.7.2+),
FastMCP (mcp.py).
Example-only (2): AutoGen, LlamaIndex —
decorator-compatible without an adapter; see examples/.
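Frameworks such as LlamaIndex, AutoGen and smolagents typically build a tool's schema by introspecting the function they are handed (name, signature, docstring), which is why signature preservation matters for any wrapping decorator. The minimal sketch below uses a hypothetical pass-through decorator, audited, to show that functools.wraps keeps that metadata visible through inspect. It illustrates the general mechanism only, not how Airlock itself is implemented.

```python
import functools
import inspect
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def audited(fn: F) -> F:
    """Hypothetical pass-through decorator standing in for a guard."""

    @functools.wraps(fn)  # copies __name__ and __doc__, and sets __wrapped__
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # A real guard would validate args/kwargs here before calling fn.
        return fn(*args, **kwargs)

    return wrapper  # type: ignore[return-value]


@audited
def analyze_data(dataset: str) -> str:
    """Summarise a dataset by name."""
    return f"Analysis of {dataset}: mean=42.5"


# inspect.signature follows __wrapped__, so introspecting code sees the real parameters.
print(analyze_data.__name__, inspect.signature(analyze_data), analyze_data.__doc__)
```

Without functools.wraps, the same introspection would report wrapper and (*args, **kwargs), leaving a framework nothing useful to build a schema from.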
Complete Examples
| Framework | Path | Surface |
|---|---|---|
| LangChain | adapter · example | @tool, AgentExecutor |
| LangGraph | adapter · example | StateGraph, ToolNode |
| OpenAI Agents | adapter · example | Handoffs, manager pattern |
| Anthropic API | adapter · example | Direct Messages API |
| Claude Agent SDK | adapter · doc | wrap_agent(agent, policy=...) |
| smolagents | adapter · example | CodeAgent, E2B |
| Gemini 3 | adapter | function_call carrier + thought_signature redaction |
| GPT-5.5 | adapter | gpt_5_5_agent_defaults preset |
| FastMCP | adapter · example | @secure_tool decorator |
| PydanticAI | adapter · doc · example | wrap_agent(agent, policy=...) + output_validate hook |
| CrewAI | adapter · doc · example | wrap_crew(crew, policy=...) + task-level tool overrides |
| LlamaIndex | example only | ReActAgent |
| AutoGen | example only | ConversableAgent |
⚡ FastMCP Integration
```python
from fastmcp import FastMCP
from agent_airlock.mcp import secure_tool, STRICT_POLICY

mcp = FastMCP("production-server")

@secure_tool(mcp, policy=STRICT_POLICY)
def delete_user(user_id: str) -> dict:
    """One decorator: MCP registration + Airlock protection."""
    # `db` stands in for your application's own data-access layer (not defined here).
    return db.users.delete(user_id)
```
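This README does not show what STRICT_POLICY contains. As a rough illustration of the deny-by-default, glob-pattern style that the policy rows describe (for example denied_tools entries such as outlook_* and export_*), here is a minimal sketch. ToolPolicy and its fields are hypothetical and are not agent-airlock's SecurityPolicy API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical deny-by-default tool policy using shell-style globs."""

    allowed_tools: tuple[str, ...] = ()
    denied_tools: tuple[str, ...] = ()

    def decide(self, tool_name: str) -> tuple[bool, str]:
        # Deny patterns are checked first, so they win over allow patterns.
        for pattern in self.denied_tools:
            if fnmatchcase(tool_name, pattern):
                return False, f"matches denied pattern {pattern!r}"
        for pattern in self.allowed_tools:
            if fnmatchcase(tool_name, pattern):
                return True, f"matches allowed pattern {pattern!r}"
        # Nothing matched: deny by default.
        return False, "not on the allow-list (deny by default)"


policy = ToolPolicy(allowed_tools=("get_*", "delete_user"), denied_tools=("rm_*", "export_*"))
```

Checking denials before allowances means a broad allow glob can never re-enable a tool that an explicit deny pattern covers.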
🏆 Why Not Enterprise Vendors?
| | Prompt Security | Pangea | Agent-Airlock |
|---|---|---|---|
| Pricing | $50K+/year | Enterprise | Free forever |
| Integration | Proxy gateway | Proxy gateway | One decorator |
| Self-Healing | ❌ | ❌ | ✅ |
| E2B Sandboxing | ❌ | ❌ | ✅ Native |
| Your Data | Their servers | Their servers | Never leaves you |
| Source Code | Closed | Closed | MIT Licensed |
We're not anti-enterprise. We're anti-gatekeeping. Security for AI agents shouldn't require a procurement process.
📦 Installation
```bash
# Core (validation + policies + sanitization)
pip install agent-airlock

# With E2B sandbox support (quotes keep zsh from globbing the brackets)
pip install "agent-airlock[sandbox]"

# With FastMCP integration
pip install "agent-airlock[mcp]"

# Everything
pip install "agent-airlock[all]"

# E2B key for sandbox execution
export E2B_API_KEY="your-key-here"
```
🛡️ OWASP Compliance
Agent-Airlock maps to the OWASP Top 10 for Agentic Applications (2026) — the agentic-era successor to the old LLM Top 10. Coverage is reported honestly: Full means the primitive ships and blocks the class in tests; Partial means agent-airlock covers the runtime leg but something upstream (client UI, IAM, training data) is out of scope; Monitor-only means we surface the signal but do not actually prevent the risk.
| Risk | Implemented in agent-airlock | Module / preset | Coverage |
|---|---|---|---|
| ASI01 Agent Goal Hijack | Pydantic strict validation + ghost-arg rejection + UnknownArgsMode.BLOCK | validator, unknown_args, core | Partial |
| ASI02 Tool Misuse and Exploitation | Deny-by-default SecurityPolicy, RBAC, rate limits, SafePath / SafeURL, Flowise Function()/eval token ban (CVE-2025-59528), MCPwn destructive-auth check (CVE-2026-33032), Mobile MCP intent-URL guard (CVE-2026-35394) | policy, safe_types, filesystem, network, policy_presets.flowise_cve_2025_59528_defaults, policy_presets.mcpwn_cve_2026_33032_defaults, policy_presets.mobile_mcp_intent_guard_2026_05 | Full |
| ASI03 Identity and Privilege Abuse | AgentIdentity, MCPProxyGuard token-passthrough prevention, CredentialScope, OAuth-app audit (Vercel 2026-04-19), MCP Attested Tool-Server Admission (arXiv:2605.24248) | policy, mcp_proxy_guard, mcp_spec.oauth_audit, mcp_spec.attested_admission, policy_presets.oauth_audit_vercel_2026_defaults, policy_presets.mcp_attested_admission_defaults | Partial |
| ASI04 Agentic Supply Chain Vulnerabilities | Ox MCP STDIO sanitizer + CVE regression suite (11+ CVEs tracked) + session-snapshot integrity guard + spawn-time MCP config pin (CVE-2026-30615, policy_presets.mcp_config_pin) | mcp_spec.stdio_guard, mcp_spec.session_guard, mcp_spec.zero_click_config_guard, policy_presets.stdio_guard_ox_defaults, policy_presets.mcp_config_pin, tests/cves/ | Partial |
| ASI05 Unexpected Code Execution (RCE) | E2B Firecracker sandbox, pluggable SandboxBackend, capability gating for PROCESS_SHELL, Flowise eval-token ban (CVE-2025-59528) | sandbox, sandbox_backend, capabilities, policy_presets.flowise_cve_2025_59528_defaults | Full |
| ASI06 Memory & Context Poisoning | AirlockContext contextvars isolation, ConversationConstraints budget caps, audit logging | context, conversation, sanitizer | Partial |
| ASI07 Insecure Inter-Agent Communication | A2A middleware Pydantic strict validation, method allow-lists | a2a | Partial |
| ASI08 Cascading Failures | CircuitBreaker, RetryPolicy, token-bucket rate limits | circuit_breaker, retry, policy | Full |
| ASI09 Human-Agent Trust Exploitation | Honeypot deception, audit-log attribution, structured fix_hints | honeypot, audit_otel | Partial |
| ASI10 Rogue Agents | Audit telemetry + anomaly detector; no quarantine primitive | observability, anomaly | Monitor-only |
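ASI08 lists CircuitBreaker among the controls, but its API is not documented here. The minimal sketch below shows the pattern itself: after failure_threshold consecutive failures the breaker opens and rejects calls until reset_timeout seconds have passed, then lets a trial call through (half-open). A trial success closes it; a trial failure re-opens it. SimpleCircuitBreaker, CircuitOpenError and their parameters are hypothetical names, not agent-airlock's circuit_breaker module.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Hypothetical: raised when a call is rejected because the breaker is open."""


class SimpleCircuitBreaker:
    """Hypothetical single-threaded breaker; not thread-safe."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # the next call is treated as a trial
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open once the threshold is hit, or re-open after a failed trial.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```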
MCP-specific mapping
The OWASP MCP Top 10 (2026 beta) is addressed by the OWASP_MCP_TOP_10_2026 policy preset. The table lists the seven risks with a shipped control; MCP06, MCP08 and MCP09 have no entry:
| MCP risk | Ships in agent-airlock |
|---|---|
| MCP01 Token Mismanagement | MCPProxyGuard rejects passthrough headers, enforces audience |
| MCP02 Excessive Permissions | SecurityPolicy + CredentialScope |
| MCP03 Tool Poisoning | ghost-arg rejection + SafePath/SafeURL |
| MCP04 Supply Chain | stdio_guard_ox_defaults() (Ox 2026-04-16 advisory) |
| MCP05 Command Injection | stdio_guard shell-metachar + deny-pattern rules |
| MCP07 Insufficient Authentication | OAuth 2.1 + PKCE S256 helpers in mcp_spec.oauth |
| MCP10 Context Oversharing | PII/secret sanitizer + workspace-scoped config |
Use it directly:
```python
from agent_airlock import Airlock
from agent_airlock.policy_presets import owasp_mcp_top_10_2026_policy

@Airlock(policy=owasp_mcp_top_10_2026_policy())
def my_mcp_tool(query: str) -> str:  # placeholder signature; use your tool's own
    ...
```
Ox Security STDIO advisory (2026-04-16, CVE-2026-30616): see docs/cves/index.md#cve-2026-30616 and the stdio_guard_ox_defaults() preset above. agent-airlock blocks 3 of 4 Ox attack classes at the runtime seam.
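The MCP07 row refers to OAuth 2.1 + PKCE S256 helpers in mcp_spec.oauth without showing them. The S256 transform itself is fixed by RFC 7636: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))) with the "=" padding removed. Here is a standard-library sketch; make_code_verifier and s256_code_challenge are illustrative names, not the mcp_spec.oauth API.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Hypothetical helper: random verifier (32 bytes -> 43 URL-safe chars; RFC 7636 allows 43-128)."""
    return base64.urlsafe_b64encode(secrets.token_bytes(num_bytes)).rstrip(b"=").decode("ascii")


def s256_code_challenge(code_verifier: str) -> str:
    """Hypothetical helper: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), unpadded."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

RFC 7636 Appendix B gives a worked pair you can check against: the verifier dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk must yield the challenge E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM.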
🏢 Used By
Agent-Airlock secures AI agent systems in production:
| Project | Use Case |
|---|---|
| FerrumDeck | AgentOps control plane — deny-by-default tool execution |
| Mnemo | MCP-native memory database — secure tool call validation |
Using Agent-Airlock in production? Open a PR to add your project!
📊 Performance
Test count and coverage are published by the TEST-BADGE block at the top of this file, regenerated from pytest on every release via python scripts/update_test_badge.py. That block is the source of truth; this table tracks latency and surface area only.
| Metric | Value |
|---|---|
| Validation overhead | <50ms |
| Sandbox cold start | ~125ms |
| Sandbox warm pool | <200ms |
| Framework integrations | 13 |
| Core dependencies | Pydantic only |
📖 Documentation
| Resource | Description |
|---|---|
| AGENTS.md | v0.6.1 — repo-root entrypoint for agentic IDEs (Cursor, Claude Code, Windsurf, Mintlify) |
| Anthropic Claude Agent SDK adapter | v0.6.1 — AnthropicClaudeAgentSDKAdapter.wrap_agent(agent, policy=...); canonical-list trio |
| airlock manifest enforce | v0.6.1 — fail-closed CLI runtime allowlist gate against signed manifests; CI exits 0/2/3 |
| Managed Agents Outcomes-rubric guard | v0.7.4 — fail-closed gate on the Anthropic Managed Agents 2026-05-06 Outcomes rubric ID; ManagedAgentsOutcomesGuard.evaluate(provenance) + managed_agents_outcomes_2026_05_06_defaults factory; no SDK dep |
| Filter-Eval RCE guard (CVE-2026-25592 + CVE-2026-26030) | v0.7.5 — regex detector for the Semantic-Kernel-class lambda-filter / template-expression eval RCE primitive (MSRC 2026-05-07); FilterEvalRCEGuard.evaluate(args) + semantic_kernel_filter_eval_rce_2026_25592_26030_defaults factory; framework-agnostic |
| OIDC publish-window guard (TanStack 2026-05-11) | v0.7.6 — known-bad blast-list guard for the TanStack/Mini-Shai-Hulud npm OIDC trusted-publisher class (postmortem 2026-05-11; 42 pkgs × 84 versions); OIDCPublishWindowGuard.evaluate(args) + npm_oidc_publish_window_guard_defaults factory; pure-data preset, no runtime npm calls |
| MCP STDIO command-injection guard | v0.7.6 — shell metachar + opt-in path-traversal denier for MCP STDIO argv vectors (HelpNetSecurity 2026-05-05); StdioCommandInjectionGuard.evaluate(args) + mcp_stdio_command_injection_preset_defaults factory; no mcp SDK dep |
| Eval-RCE guard (CVE-2026-44717) | v0.8.0 — bare-eval()/parse_expr()/exec() invocation detector for the MCP Calculate Server class (NVD 2026-05-15); EvalRCEGuard.evaluate(args) + curated vulnerable-package denylist + parse_expr safe-form exemption + stdio_guard_eval_defaults_2026_05_15 factory |
| MCP Inspector exposure guard (CVE-2026-23744 runtime) | v0.8.0 — Linux runtime listener-scan via stdlib /proc/net/tcp for the MCPJam Inspector public-bind class; complements v0.5.x config-time bind_address_guard; MCP_INSPECTOR_REQUIRE_AUTH=1 operator bypass |
| Agent SDK Credit pool budget | v0.8.0 — per-month USD pool tracker for Anthropic's 2026-06-15 billing split (Zed blog 2026-05-14); AgentSDKCreditBudget.register_call(model, input_tokens, output_tokens) with 90% near-limit + 100% exhausted thresholds; packaged 2026-06 pricing fixture |
| OpenAPI Drift Guard (Hermes 2026-05-13) | v0.8.1 — payload-shape drift detector against an operator-supplied OpenAPI 3.x spec (arXiv:2605.14312); OpenAPIDriftGuard.evaluate(operation_id, args) detects missing_required / unknown_field / type_mismatch; three modes (strict / warn / shadow); vaccinate_openapi(spec) decorator + openapi_doc_drift_guard_defaults factory; caller supplies spec dict, no PyYAML dep |
| MCP Calc-Server bundle preset | v0.8.1 — composition factory mcp_calc_server_bundle_defaults_2026_05_15() wires v0.8.0 EvalRCEGuard + v0.7.6 StdioCommandInjectionGuard under a single preset_id (CVE-2026-44717 anchor) scoped to calc/calculate/evaluate/sympy_eval/math_eval tool-name patterns; pure config composition, no new detector module |
| Metis-inspired corpus block-rate regression | v0.8.2 — release-gate primitive MetisInspiredCorpusBlockRateGuard runs a deterministic 25-entry exploit-shape corpus (CVE-2026-44717 + 2026-05-05 STDIO injection) through EvalRCEGuard + StdioCommandInjectionGuard; one-sided gate fires when block rate drops below baseline − 5%; NOT a reproduction of the Metis paper's POMDP attacker (arXiv:2605.10067 cited as motivation, not as prompt source); airlock corpus-bench CLI ships text/json/md reports |
| Corpus per-category coverage | v0.8.3 — extends the v0.8.2 corpus-bench with HarnessAudit-Bench (arXiv:2605.14271) two-category taxonomy (resource_access, info_transfer); CorpusEntry.violation_category field + CategoryCount decision field; airlock corpus-bench reports per-category coverage in text/json/md; NOT a reproduction of HarnessAudit-Bench (artifacts not yet public — taxonomy adopted as schema, scoring is not) |
| Stainless SDK provenance classifier | v0.8.3 — pure-function classify_sdk_lineage(user_agent, response_body_head) building block flags MCP servers generated by the deprecated Stainless SDK toolchain (Anthropic acquired Stainless 2026-05-13, hosted generator winding down); operator-callable from own audit hooks — NOT an automatic HTTP probe (decorator-in-process architecture, see ROADMAP §1); stainless_provenance_probe_defaults() preset is default_action=tag_only, visibility not enforcement |
| Human-oversight decorator | v0.8.4 — @requires_human_oversight(approver=...) gates a tool function on an operator-supplied approval callable (Code-as-Harness arXiv:2605.18747 anchor); GRANT → call wrapped fn, DENY → OversightDeniedError, TIMEOUT → OversightTimeoutError; composes with @Airlock(...); protocol shapes + InProcessRecordedApprover testing helper; NOT a bidirectional audit-emitter RPC channel — operator owns the transport (Slack/PagerDuty/CLI), agent-airlock owns the gate + the protocol |
| Layer-contract receipt block | v0.8.5 — opt-in LayerContract (assume/guarantee) block on signed airlock attest receipt payloads (arXiv:2605.18672 anchor); --contract derives per-guard pass_rate from the verdicts list, --assumes id1,id2 declares upstream-layer dependencies; receipt schema v1 unchanged (additive field); pass_rate is a measured statistic over the sample (not a proof) — every Guarantee carries sample_size so verifiers can weight low-N appropriately; NOT backed by a window-counter store (that infrastructure doesn't exist yet — derived from the operator-supplied verdicts list, no new abstraction) |
| MCP Attested Tool-Server Admission (arXiv:2605.24248) | v0.8.10 — opt-in admission gate for MCP tool servers per Metere (May 2026). Host fetches a JWS-compact clearance from {server_url}/.well-known/mcp-clearance, verifies its signature against an operator-pinned trust root (Ed25519 / RSA-PSS / JWKS — never network-fetched on the hot path), and enforces a deny-by-default per-server tool allowlist parsed from the verified clearance. Flavor-gated ENFORCE (hard-deny) / WARN (log only) modes. Every decision emits a ReceiptVerdict on the guard="mcp_attested_admission" channel — reuses the existing airlock attest DSSE path, does not invent a new log. mcp_attested_admission_defaults() factory + MCPProxyGuard.audit_tool_admission() integration; signature verification gated behind pip install agent-airlock[attested]. |
| Mobile MCP intent-URL guard (CVE-2026-35394) | v0.8.8 — defensive bundle for the Mobilenexthq Mobile MCP mobile_open_url intent-injection RCE class (< 0.0.50). mobile_mcp_intent_guard_2026_05() returns a pre-configured SafeURLValidator(allowed_schemes=["http", "https"]) (blocks intent:, content:, file:, app:, data:, javascript:, vbscript:), an AirlockConfig(unknown_args=UnknownArgsMode.BLOCK), and the canonical Mobile MCP tool-name corpus (mobile_open_url, open_url, mobile_launch_url). DIFF-COMPATIBLE with the existing SafeURL type — no new validator invented. Also fixes a pre-existing block_private_ips=True no-op in SafeURLValidator (RFC1918 ranges were not actually blocked because the validator's own SafeURLValidationError raise was caught by except ValueError). |
| Capsule ShareLeak / PipeLeak (CVE-2026-21520) | v0.8.14 — defensive bundle for the Capsule Security-disclosed indirect-prompt-injection class hitting Microsoft Copilot Studio (ShareLeak, CVE-2026-21520, CVSS 7.5 HIGH, CWE-77, patched 2026-01-15) and Salesforce Agentforce (PipeLeak, parallel pattern). Both vectors share the same architecture: untrusted form input (SharePoint form / Web-to-Lead form) is concatenated into the agent's context with no boundary, while the agent simultaneously holds outbound exfil tools (Outlook send / Salesforce email-case). capsule_indirect_injection_cve_2026_21520_defaults() composes existing primitives — default_deny=True + canonical exfil-sink denied_tools (send_email, outlook_*, create_case, share_*, export_*, post_to_*, webhook_*, ...) + reauth_on_untrusted_reinvocation=True (v0.8.6 debate-amplification guard at threshold=1) + AirlockConfig(unknown_args=UnknownArgsMode.BLOCK). Opt-in only — no new validator invented, no default-priority-chain entry. Pairs with airlock-explain --unused-scopes (v0.8.13) so operators populate the read-side allow-list from a real trace before deploying. |
| Flowise MCP-stdio adapter RCE (CVE-2026-40933) | v0.8.16 — defensive control for the Flowise authenticated-RCE-via-MCP-stdio-adapter class (CVSS 9.9, fixed upstream in Flowise 3.1.0). Flowise ≤ 3.0.x serialises a user-defined CustomMCP command+args straight into a child-process spawn with no sandbox or argv sanitisation — importing a crafted chatflow is a one-click path to OS-level RCE. flowise_mcp_stdio_guard_2026_defaults() is a per-tool-class projection of the v0.7.6 StdioCommandInjectionGuard (no new detector invented), scoped to the Flowise CustomMCP stdio surface. Fail-closed on shell metachars (;, &&, \|\|, \|, newline, backtick, $()) in the command/args path + opt-in path-traversal outside a cwd_allowlist; check(args) raises FlowiseMcpStdioInjectionError. OWASP MCP05 Command Injection. Wired into ox_mcp_supply_chain_2026_04_defaults() — corrects a prior mis-attribution where CVE-2026-40933 was recorded as a "Semantic Kernel auth-header leak". |
| MCP description-vs-manifest guard (mcp_description_manifest_guard) | v0.8.18 — runtime consistency gate that asserts a tool's model-facing description (declared input schema + advertised capability/security boundary) matches its registered manifest before the tool is admitted, failing closed per the deny-by-default posture. Anchored on the DCIChecker study (arXiv:2606.04769), which measured Description-Code Inconsistency at 9.93% of 19,200 tool pairs across 2,214 MCP servers. DescriptionManifestGuard.evaluate(description) detects described_arg_not_in_manifest (description claims a ghost argument), undisclosed_side_effect (manifest has a side effect the description hides — the tool-poisoning direction), and overclaimed_capability (description advertises a capability absent from the manifest); three modes (strict / warn / shadow); vaccinate_description_manifest(manifests) decorator + mcp_description_manifest_guard_defaults() factory. Composes above ghost-arg stripping + Pydantic type-validation (which govern the call payload) — it does not replace them. OWASP MCP03 Tool Poisoning. Pydantic-only core, no new runtime deps. |
| LeRobot pickle-deserialization RCE (CVE-2026-25874) | v0.8.19 — deny-by-default posture for the HuggingFace LeRobot unauthenticated-RCE class (CVSS 9.3). LeRobot's async-inference PolicyServer / robot-client pickle.loads() payloads received over an unauthenticated, non-TLS gRPC channel (SendObservations / SendPolicyInstructions / GetActions) — an unauthenticated, network-reachable attacker reaches arbitrary OS command execution. Ships a reusable UnsafeDeserializationGuard (in safe_types, next to SafePath/SafeURL) that fails closed on pickle magic bytes (0x80 PROTO), base64-encoded pickle, and pickle/marshal/shelve/dill/jsonpickle marker tokens in string args — plus an airgap pairing that refuses serialized-object (bytes) args unless the call declares an authenticated and TLS transport. Wired into SecurityPolicy.deserialization_guard and run at the @Airlock seam (Step 2.7) before the tool body; the block carries a fix_hint naming CVE-2026-25874. lerobot_cve_2026_25874_defaults() is the per-CVE projection (deny-by-name globs for *deserialize*/*pickle.loads*/torch_load/the gRPC methods + the wired content guard). Composes above ghost-arg stripping + Pydantic type-validation. Pydantic-only core, no new runtime deps. |
| MCP server-URL env-interpolation secret leak (CVE-2026-32625) | v0.8.20 — deny-by-default guard for the LibreChat MCP-server-URL credential-disclosure class (CVSS 9.6, CWE-200, OWASP MCP01). A user-supplied MCP server connection template (URL / header / arg) carrying an env-interpolation token (${VAR}, bare $VAR, or %VAR%) is expanded server-side against the host process.env and leaks a secret (${JWT_SECRET} / ${CREDS_KEY} / ${MONGO_URI}) into the outbound request. MCPServerEnvInterpolationGuard.evaluate(config) (in mcp_spec/env_interpolation_guard.py) scans the URL/headers/args recursively and refuses any interpolation token unless its variable is on an operator-declared allowed_vars allowlist of explicitly non-secret vars (empty default = deny all). It never reads os.environ or expands anything — token-match only, so it cannot itself leak. mcp_server_env_interpolation_guard_defaults() factory + check(config) raising MCPServerEnvInterpolationError; escaped \$/$$ are not flagged. Pydantic-only core, no new runtime deps. |
| Codegen triple-quote / delimiter break-out RCE (CVE-2026-11393) | v0.8.21 — deny-by-default guard for the AWS AgentCore CLI code-injection class (CVSS 9, CWE-94, OWASP ASI05). The CLI splices a model-/user-controlled collaborationInstruction into generated Python without neutralising triple-quote characters, so a crafted """ closes the generated string literal and injects statements that execute on agent import — RCE on the AgentCore Runtime + the importer's machine. CodegenDelimiterInjectionGuard.evaluate(args) (in mcp_spec/codegen_delimiter_guard.py) recursively scans args bound for a codegen / template / exec/eval sink and fails closed on triple-quote tokens (""" / '''), quote break-out tokens ("); / ') / " + / ']), and raw newlines — unless the field is on an operator-declared allowed_literal_fields allowlist of safe literal contexts. It never generates or executes code — token-match only. codegen_delimiter_injection_guard_defaults() factory + check(args) raising CodegenDelimiterInjectionError; composes one layer above the v0.8.0 EvalRCEGuard (which gates the sink itself). Pydantic-only core, no new runtime deps. |
| MCP-bridge subprocess command/args/env RCE (CVE-2026-42271, CISA KEV) | v0.8.22 — deny-by-default guard for the LiteLLM MCP-preview-endpoint command-injection class (CVSS 8.7, CWE-78, OWASP ASI05; on the CISA KEV catalog as of 2026-06-09, actively exploited). LiteLLM's POST /mcp-rest/test/connection + /mcp-rest/test/tools/list accepted a full MCP server config (command / args / env) in the request body and spawned it as a subprocess with no validation — any low-privilege API key reached host command execution (unauthenticated RCE when chained with the Starlette Host-header bypass CVE-2026-48710). McpSubprocessArgInjectionGuard.evaluate(config) (in mcp_spec/subprocess_arg_guard.py) treats spawn-shaped MCP-bridge args (command/cmd/args/argv/env) as untrusted and refuses them unless the resolved program is on an operator-declared allowed_commands allowlist of safe static commands (empty default = deny all); an env carrying a code-loading var (LD_PRELOAD/PATH/PYTHONPATH/…) is refused regardless, and a config with no spawn-shaped fields passes. Never spawns anything — config inspection only. mcp_subprocess_arg_injection_guard_defaults() factory + check(config) raising McpSubprocessArgInjectionError; composes one layer above the v0.7.6 StdioCommandInjectionGuard (which scans an allowed argv for shell metachars). Pydantic-only core, no new runtime deps. |
| Cline cross-origin WebSocket hijack (CVE-2026-44211) | v0.8.27 — deny-by-default guard for the Cline Kanban cross-origin WebSocket-hijack class (npm kanban < 2.13.0, CVSS 9.7, CWE-1385 Missing Origin Validation in WebSockets + CWE-306 Missing Authentication, OWASP ASI05). Cline runs a control WebSocket server on 127.0.0.1:3484 that accepts every upgrade without validating the Origin header; because browsers do not apply same-origin/CORS to ws://, any website the developer visits can drive the agent — leak workspace data, inject prompts into the agent terminal (RCE), or kill tasks. Binding to loopback is not a mitigation. WebSocketOriginGuard (in mcp_spec/ws_origin_guard.py) has two surfaces: audit_endpoint(host=…, origin_allowlist_enforced=…) flags a control endpoint that enforces no Origin allow-list (the misconfiguration), and check_upgrade(origin) / enforce_upgrade(origin) / wrap_handler(handler) form a runtime gate that rejects a WebSocket upgrade whose Origin is missing or outside an explicit allow-list (empty allow-list = deny all). The guard never opens a socket — descriptor / single-Origin inspection only. cline_cve_2026_44211_defaults(allowed_origins=[…]) factory + check(origin) raising WebSocketOriginHijackError. Pydantic-only core, no new runtime deps. |
| SSRF egress guard — alternate-encoding loopback / rebinding (CVE-2026-47390) | v0.8.29 — deny-by-default egress guard for the SSRF-protection-bypass class (CWE-918, OWASP ASI02). An egress filter that checks the literal hostname string instead of the resolved IP is bypassed by encoding loopback / link-local / cloud-metadata in a form ipaddress rejects but the HTTP client connects to: 127.1, decimal 2130706433, octal 0177.0.0.1, hex 0x7f000001, ::ffff:127.0.0.1, or a public hostname whose DNS record points at 169.254.169.254 (rebinding). SSRFEgressGuard (in ssrf_egress_guard.py) reduces every target to its canonical IP(s) — decoding the alternate encodings via socket.inet_aton and resolving hostnames at check time (so a rebind to loopback is caught at connect time, not just parse time) — and fails closed on any loopback / link-local / metadata / unspecified address or an RFC1918 range not on allow_internal_hosts, with a 3-line explain audit trace (rule / resolved IP / encoding) on every denial. Composes with the v0.5.5 is_blocked_ipv6_range set (IPv4-mapped / NAT64 / 6to4 / ULA). ssrf_egress_guard_defaults(allow_internal_hosts=…) factory + check(url) raising SSRFEgressBlocked. Pydantic-only core, no new runtime deps. |
| MCP Origin/Host DNS-rebinding guard (CVE-2026-11624) | v0.8.30 — deny-by-default Origin/Host validation for MCP HTTP/SSE/streamable transports (CWE-346 Origin Validation Error, CVSS 9.4, OWASP-MCP MCP07). Google MCP Toolbox for Databases < 0.25.0 served a local HTTP transport that did not validate the Origin or Host header, so a browser the developer visits can DNS-rebind to 127.0.0.1 and script MCP tool calls at the local server (file reads, command execution, DB access). Fixed upstream in 0.25.0 with a new --allowed-hosts flag alongside --allowed-origins, warning on the * wildcard. McpOriginHostGuard (in mcp_spec/mcp_origin_host_guard.py) validates the inbound Host (always) and Origin (when present) against explicit allowed_origins / allowed_hosts allow-lists; with none configured it falls back to loopback-only and records a startup warning, and a * wildcard allows all but also warns — mirroring the upstream fix (stdio transports have no Origin and are out of scope). check_headers(headers) / validate(headers) + a startup_warnings list; mcp_origin_host_guard_defaults(allowed_origins=…, allowed_hosts=…) factory + check(headers) raising McpOriginHostRebindingError. Pydantic-only core, no new runtime deps. |
| Examples | 13 framework integrations (11 adapter-shipped + 2 example-only) with copy-paste code |
| Security Guide | Production deployment checklist |
| API Reference | Every function, every parameter |
| Egress Bench | CVE fixture walker — every payload previously blocked stays blocked |
| OX MCP Supply-Chain preset | Umbrella for the 2026-04-20 OX dossier (10 CVEs) |
| Elicitation guard (mcp_elicitation_guard_2026_04) | v0.6.0 — runtime mitigation for the MCP tool/elicitation round-trip (spec PR #1487, draft 2026-04-r1); blocks credential-request and policy-override classes |
| Config-path guard (CVE-2026-31402) | v0.6.0 — Claude Desktop MCP-server-registration path-traversal mitigation (CVSS 8.8) |
| Gemini 3 Agent Mode adapter | v0.6.0 — function_call carrier normalisation + thought_signature redaction; pinned SUPPORTED_VERSIONS set |
| OAuth state entropy guard | v0.6.0 — base64/hex/JSON decode + prompt-injection scan on the OAuth state parameter (BlackHat Asia 2026 vector) |
| airlock console | v0.6.0 — three-pane Textual TUI with live verdict stream + replay-on-edit; gated behind airlock[console] extra |
| airlock attest receipt | v0.6.0 — Sigstore-compatible signed agent-run receipts; emit + verify subcommands |
| policy_bundle.lock | v0.6.0 — hash-pinned preset bundles with Cargo.lock semantics; airlock pack lock + airlock replay --bundle-lock |
| airlock studio | v0.6.0 — local stdlib HTTP rehearsal sandbox; paste-a-transcript verdicts + diff between runs |
| smolagents wrapper | v0.6.0 — wrap_agent(agent, policy_bundle) for HuggingFace smolagents 1.18+ (4th first-class framework) |
| STDIO meta-guard (mcp_stdio_meta_cve_2026_04) | v0.5.9 — bundles every airlock STDIO defence into one chain; recommended default for any MCP server registered after 2026-04-26 |
| LangGraph 1.0.11 ToolNode compat shim | v0.5.9 — silent unwrap survives the prebuilt 1.0.11 list-vs-dict shape break |
| GPT-5.5 ("Spud") agent defaults + tool-shape adapter | v0.5.9 — caps fan-out at 8 / context at 900k / per-call egress at 512 KB |
| Capability caps (agent_capability_default_caps) | v0.5.9 — programmatic caps for SIGN_CONTRACT / DELEGATE_TO_AGENT / INVOKE_TOOL / WRITE_FILE / NETWORK_EGRESS |
| OWASP Agentic 2026-Q1 coverage matrix | v0.5.9 — 10/10 mapping risk_id → guard + preset + test, CI gate fails on stale entries |
| Short-form-video corpus (wild-2026-04/short_form_video) | v0.5.9 — 5 transcript / on-screen / RTL PoCs; airlock replay --namespace short_form_video |
| airlock graph serve | v0.5.9 — local web UI of the live agent → tool → MCP-server topology with verdict overlay |
| airlock policy compile / explain | v0.5.9 — natural-language policy authoring with hash-pinned prompt + deterministic cache |
| airlock kill-switch | v0.5.9 — HMAC-signed cluster-wide freeze with 2-of-3 quorum reset |
| Comment-and-Control PR-metadata guard | v0.5.8 — neutralises CVSS 9.4 cross-vendor PR-title prompt injection |
| airlock pack | v0.5.8 — signed policy bundles; airlock pack install claude-code-ci@2026.04 |
| airlock baseline | v0.5.8 — per-agent 7-day rolling profile + drift score |
| airlock attest | v0.5.8 — DSSE provenance per verdict |
| Cloudflare Mesh compat | v0.5.8 — runs alongside Mesh; de-duplicates overlapping policies |
| Manifest-only STDIO mode | v0.5.7 — signed-manifest registry; argv never originates from runtime input |
| STDIO-taint CI gate | v0.5.7 — AST taint analyzer; flags remote→Popen flows at PR time |
| Declarative preset YAML | v0.5.7 — composite presets via stdlib-only YAML parser |
| CVE-2026-30615 Windsurf zero-click | v0.5.7 — diff-on-demand mcp.json auto-load guard; v0.8.23 adds mcp_config_pin — a spawn-time {name, command, args, env-keys} fingerprint pin (McpConfigPinSet.check()) that fails closed (raises, never warns) on an injected (unpinned) or mutated STDIO server even when the injection never touched a watched config file; emits on the structlog + JSON-Lines audit channels |
| CVE-2026-6980 GitPilot-MCP | v0.5.7 — repo_path injection (vendor unresponsive) |
| DockerBackend | v0.5.1 hardening + known gaps |
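The SSRF egress guard row (CVE-2026-47390) describes reducing every target to its canonical IP before deciding, rather than matching the hostname string. Below is a minimal standard-library sketch of that idea, assuming a simple "allow only globally routable addresses" rule with no allow_internal_hosts exception. canonical_ips and is_blocked_target are hypothetical names; this is not SSRFEgressGuard's code. The legacy forms (127.1, 2130706433, 0x7f000001, 0177.0.0.1) are decoded by the platform's inet_aton, so results may differ where the C library is stricter.

```python
import ipaddress
import socket
from typing import Union
from urllib.parse import urlsplit

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def canonical_ips(host: str) -> list[IPAddress]:
    """Hypothetical helper: reduce a hostname or IP literal (incl. legacy encodings) to IPs."""
    try:
        # Strict forms: dotted-quad IPv4 and any IPv6 literal.
        return [ipaddress.ip_address(host)]
    except ValueError:
        pass
    try:
        # Legacy IPv4 forms that inet_aton accepts: shorthand, decimal, hex, octal.
        return [ipaddress.IPv4Address(socket.inet_aton(host))]
    except OSError:
        pass
    # A real hostname: resolve now so a record pointing at an internal address is caught.
    # To defeat rebinding, the HTTP client must then connect to these same IPs.
    return [ipaddress.ip_address(info[4][0]) for info in socket.getaddrinfo(host, None)]


def is_blocked_target(url: str) -> bool:
    """Hypothetical check that fails closed: True unless every address is globally routable."""
    host = urlsplit(url).hostname
    if not host:
        return True
    try:
        ips = canonical_ips(host)
    except OSError:  # includes socket.gaierror (resolution failure)
        return True
    if not ips:
        return True
    for ip in ips:
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped  # ::ffff:127.0.0.1 -> 127.0.0.1
        if not ip.is_global:
            return True  # loopback, link-local (incl. 169.254.169.254), RFC 1918, unspecified, ...
    return False
```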
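The CVE-2026-30615 row describes mcp_config_pin as a spawn-time fingerprint over {name, command, args, env-keys} that fails closed on an unpinned or mutated STDIO server. A minimal sketch of that fingerprinting idea using hashlib and canonical JSON follows. PinError, fingerprint and check_pinned are hypothetical names, not the McpConfigPinSet API. Only env variable names are hashed, matching the row's "env-keys", so rotating a secret value does not break the pin.

```python
import hashlib
import json
from typing import Any, Mapping


class PinError(RuntimeError):
    """Hypothetical: raised when a server config is unpinned or differs from its pin."""


def fingerprint(server: Mapping[str, Any]) -> str:
    """Hypothetical helper: SHA-256 over canonical JSON of name, command, args and env key names."""
    canonical = {
        "name": server["name"],
        "command": server["command"],
        "args": list(server.get("args", [])),
        "env_keys": sorted(server.get("env", {})),  # names only, never values
    }
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def check_pinned(server: Mapping[str, Any], pins: Mapping[str, str]) -> None:
    """Hypothetical spawn-time check that fails closed: raise, never warn."""
    expected = pins.get(server["name"])
    if expected is None:
        raise PinError(f"unpinned MCP server: {server['name']!r}")
    if fingerprint(server) != expected:
        raise PinError(f"MCP server config changed since pinning: {server['name']!r}")
```

Pins would be recorded once from a reviewed config and checked immediately before each spawn, so an injected or edited server entry is refused even if no watched file changed.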
Regulatory engagement
- Public comment draft — NIST AI RMF v2.0 Agentic-AI Security (window: 2026-04-18 → mid-June)
👤 About
Built by Sattyam Jain — AI infrastructure engineer.
This started as an internal tool after watching an agent hallucinate its way through a production database. Now it's yours.
🤝 Contributing
We review every PR within 48 hours.
```bash
git clone https://github.com/sattyamjjain/agent-airlock
cd agent-airlock
pip install -e ".[dev]"
pytest tests/ -v
```
- Bug? Open an issue
- Feature idea? Start a discussion
- Want to contribute? See open issues
💖 Support
If Agent-Airlock saved your production database:
- ⭐ Star this repo — Helps others discover it
- 🐛 Report bugs — Open an issue
- 📣 Spread the word — Tweet, blog, share
⭐ Star History
Sources: This README follows best practices from awesome-readme, Best-README-Template, and the GitHub Blog.
File details
Details for the file agent_airlock-0.8.30.tar.gz.
File metadata
- Download URL: agent_airlock-0.8.30.tar.gz
- Upload date:
- Size: 541.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e4f57af76ca4cc69310e27368f1e19a5ac48d14b34b1541c3632226047e8f8c |
| MD5 | e41bf99e151556ab4f71174f327a0596 |
| BLAKE2b-256 | 2fe811d4654beae7ab2508c0b2a372be78fca8b1a3009e3e670c17a554da87c0 |
File details
Details for the file agent_airlock-0.8.30-py3-none-any.whl.
File metadata
- Download URL: agent_airlock-0.8.30-py3-none-any.whl
- Upload date:
- Size: 620.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a36677a4f874156453602cc39bed7dd546f67861c97b046ce9a73fd7e327e9f0 |
| MD5 | 7bf579c8f65417f23e6a0606d0eee793 |
| BLAKE2b-256 | 465974f63238110faf5bfe3f23ab709f7770ef00fce4098e271feaea607b206f |