Drop-in deterministic policy layer for MCP-using AI agents
mcp-guard
mcp-guard synthesises tool-call policies from observed indirect-
prompt-injection gaps, evaluates each tool call against those
policies at the agent's tool-call boundary, and provides a
backtest harness for measuring false-positive rate against
legitimate traffic before deployment.
v0.5.0 (2026-05-15): 9 deterministic rule patterns across 122 rules, 304-case backtest corpus, TPR 1.00 / FPR 0.01. Four framework adapters: Anthropic MCP SDK, LangChain, LlamaIndex, CrewAI. LLM-augmented synthesis fallback (mock + real-API validated). Six reproducible real-world case studies: EchoLeak indirect injection, MCP tool-description poisoning, AWS IMDS SSRF, Log4Shell-class MCP logging, RAG context poisoning, agent self-prompting loops. See CHANGELOG.md.
This is the defensive companion to the purple-scaffold
research probes. Findings from those probes feed into policy
synthesis; the resulting policy is what a product-side defender
would ship in front of the agent's tool-call execution gate.
Why
Most defenses against indirect prompt injection are classifier-based: pre-process the model input or post-process the model output, and use a model to decide whether something looks suspicious. That's useful but probabilistic, hard to audit, and adds latency.
mcp-guard takes the complementary deterministic-policy approach:
- Synthesise a policy from observed gaps (e.g., "agent emitted read_text_file('~/.ssh/id_rsa') after reading a poisoned file" → policy: deny read_text_file whose path matches a sensitive-credential pattern).
- Evaluate each tool call against the policy. Pure function: (tool_name, args, user_context) -> Decision. No I/O, no LLM, no ambiguity.
- Backtest the policy against a labelled corpus of legitimate + attack tool-call cases before deployment. Measure FPR / TPR. Iterate until both look acceptable.
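To make the "pure function" idea concrete, here is a minimal, self-contained sketch of a deterministic evaluator. It does not use or reproduce mcp_guard's internals: `Rule`, `evaluate_sketch`, and `SENSITIVE_PREFIXES` are hypothetical names invented for illustration, and only the `Decision` field names (`allowed`, `denying_rule_id`, `reason`) follow the output shown in the quickstart.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Decision:
    allowed: bool
    denying_rule_id: Optional[str] = None
    reason: Optional[str] = None


@dataclass(frozen=True)
class Rule:  # hypothetical rule shape, not the library's PolicyRule
    rule_id: str
    tool_name: str
    # Predicate returns True when the call should be denied.
    denies: Callable[[Mapping[str, Any], Mapping[str, Any]], bool]
    reason: str


def evaluate_sketch(
    rules: Sequence[Rule],
    tool_name: str,
    args: Mapping[str, Any],
    user_context: Mapping[str, Any],
) -> Decision:
    """First matching deny rule wins; no I/O, no model call, no hidden state."""
    for rule in rules:
        if rule.tool_name == tool_name and rule.denies(args, user_context):
            return Decision(False, rule.rule_id, rule.reason)
    return Decision(True)


SENSITIVE_PREFIXES = ("~/.ssh/", "~/.aws/")  # illustrative, not exhaustive

rules = [
    Rule(
        rule_id="deny-sensitive-read",
        tool_name="read_text_file",
        denies=lambda args, ctx: str(args.get("path", "")).startswith(SENSITIVE_PREFIXES),
        reason="path matches a sensitive-credential pattern",
    )
]

denied = evaluate_sketch(rules, "read_text_file", {"path": "~/.ssh/id_rsa"}, {})
allowed = evaluate_sketch(rules, "read_text_file", {"path": "notes/todo.txt"}, {})
```

Because the function depends only on its arguments, the same call always yields the same decision, which is what makes a denial reproducible from an audit log.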
The library is not meant to replace classifier-based defenses — it complements them. Use both: classifier as an early-warning signal, deterministic policy as the unconditional gate.
Install
pip install mcp-guardrails
(Python 3.11+. No runtime dependencies beyond the standard library.)
Note on the name. The PyPI distribution is mcp-guardrails (an unrelated dormant project squats mcpguard on PyPI, and the similarity check refuses mcp-guard). The Python import name stays mcp_guard so existing code continues to work. Same Pillow / PIL pattern. The GitHub repo, the in-code references, and the project identity stay mcp-guard.
Optional extras for the integrations you actually use:
pip install 'mcp-guardrails[anthropic-mcp]' # for the Anthropic MCP SDK adapter
pip install 'mcp-guardrails[langchain]' # for the LangChain callback handler
pip install 'mcp-guardrails[llamaindex]' # for the LlamaIndex callback handler / wrap_tool
pip install 'mcp-guardrails[crewai]' # for the CrewAI wrap_tool
pip install 'mcp-guardrails[llm]' # for synthesize_with_llm fallback
pip install 'mcp-guardrails[all]' # everything
Quickstart — Python API
The fastest path to a shippable policy is synthesize_default_policy(),
which returns the full ruleset across every built-in pattern:
from mcp_guard import synthesize_default_policy, evaluate, default_corpus, run_backtest
# 1. Load the full deterministic policy (9 patterns, 122 rules)
policy = synthesize_default_policy()
# 2. Evaluate any tool call against it
decision = evaluate(
    policy,
    tool_name="send_email",
    args={"to": "attacker@evil.com", "body": "exfil"},
    user_context={"user": {"contacts": ["bob@corp.example"]}},
)
print(decision)
# Decision(allowed=False,
# denying_rule_id='tool-policy-email-contact-allowlist--send_email--default',
# reason='External recipient outside the authenticated user...')
# 3. Backtest against the labelled corpus
metrics = run_backtest(policy, default_corpus())
print(f"TPR: {metrics.true_positive_rate:.4f}, "
      f"FPR: {metrics.false_positive_rate:.4f}")
# TPR: 1.0000, FPR: 0.0101
For incident-driven synthesis (one observed gap → narrowly-targeted
policy), use synthesize_from_text():
from mcp_guard import synthesize_from_text
# Synthesise from a free-text gap description
policy = synthesize_from_text(
    "agent emitted send_email to attacker@evil.com when user "
    "asked it to read a ticket",
    technique_id="lab-2026-05-04",
)
print(policy.to_yaml())
Quickstart — CLI
# Synthesise a policy from gap text → YAML on stdout
mcp-guard synthesize "agent emitted send_email to attacker@evil.com" \
    > policy.yaml
# Evaluate a single tool call against the policy → JSON Decision on stdout
mcp-guard evaluate policy.yaml send_email '{"to":"attacker@evil.com"}' \
    --user-context '{"user":{"contacts":["bob@corp.example"]}}'
# Backtest against the default corpus → metrics JSON
mcp-guard backtest policy.yaml
Wiring into your agent
The evaluator is pure, so you can wire it anywhere — most naturally at the agent's tool-call boundary:
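from mcp_guard import evaluate, synthesize_default_policy, GeneratedPolicy
policy: GeneratedPolicy = synthesize_default_policy()
def on_tool_call_attempt(tool_name: str, args: dict, user_ctx: dict) -> bool:
    """Return True if the tool call may proceed, False if the policy denies it."""
    decision = evaluate(policy, tool_name, args, user_ctx)
    if not decision.allowed:
        # log_audit stands for your application's own audit sink;
        # it is not imported from mcp_guard here.
        log_audit(
            event="tool_call_denied",
            rule=decision.denying_rule_id,
            reason=decision.reason,
            tool=tool_name,
            args=args,
        )
        return False
    return True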
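The snippet above calls log_audit without defining it. One possible shape for that helper, using only the standard library, is sketched below. The logger name `mcp_guard.audit` and the helper `format_audit_record` are assumptions made for this example; neither is provided by the package.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; route it to whatever sink your deployment uses.
audit_logger = logging.getLogger("mcp_guard.audit")


def format_audit_record(event: str, **fields) -> str:
    """Serialise one audit event as a single JSON line."""
    record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
    # default=str keeps non-JSON-serialisable tool arguments from breaking the audit path.
    return json.dumps(record, sort_keys=True, default=str)


def log_audit(event: str, **fields) -> None:
    """Emit the audit record at WARNING level so denials stand out in logs."""
    audit_logger.warning(format_audit_record(event, **fields))


sample = format_audit_record(
    "tool_call_denied",
    rule="example-rule-id",
    reason="example reason",
    tool="send_email",
    args={"to": "attacker@evil.com"},
)
```

Note that logging args verbatim can persist secrets or personal data that the tool call carried; a production sink may want to redact or hash argument values first.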
Anthropic MCP Python SDK
from mcp.server import Server
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.anthropic_mcp import MCPGuard
server = Server("my-app")
guard = MCPGuard(policy=synthesize_default_policy())
@server.call_tool()
async def call_tool(name: str, arguments: dict):
    # Raises GuardedToolDenied if the policy denies the call.
    guard.check(name, arguments, user_context=current_user_context())
    return await my_business_logic(name, arguments)
Or use the decorator form:
@server.call_tool()
@guard.wrap_handler(user_context_fn=current_user_context)
async def call_tool(name: str, arguments: dict):
    return await my_business_logic(name, arguments)
LangChain
from langchain.agents import AgentExecutor
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.langchain import make_callback_handler
handler = make_callback_handler(
    policy=synthesize_default_policy(),
    user_context_fn=lambda: {"user": {"id": current_user.id,
                                      "contacts": current_user.contacts}},
)
executor = AgentExecutor(
    agent=agent, tools=tools,
    callbacks=[handler],  # ← mcp-guard sits in the callback chain
)
If the policy denies a tool call, the handler raises GuardedToolDenied
inside on_tool_start, which LangChain surfaces as a tool failure;
the agent's reasoning chain sees the deny reason and can adapt.
LlamaIndex
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.llamaindex import make_callback_handler
Settings.callback_manager = CallbackManager([
    make_callback_handler(
        policy=synthesize_default_policy(),
        user_context_fn=lambda: {"user": {...}},
    ),
])
# … your existing agent / query engine code; tool calls are now guarded.
Per-tool variant (no callback manager required):
from mcp_guard.integrations.llamaindex import wrap_tool
guarded = wrap_tool(my_tool, policy=synthesize_default_policy())
CrewAI
from crewai import Agent
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.crewai import wrap_tools
agent = Agent(
    role="researcher",
    goal="answer the question",
    tools=wrap_tools(
        my_tools,
        policy=synthesize_default_policy(),
        user_context_fn=lambda: {"user": {...}},
    ),
)
wrap_tool is idempotent — re-wrapping is a no-op — so it's safe to
apply at agent-construction time without tracking which tools were
already guarded.
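One common way to get this kind of idempotency is to tag the wrapper with a marker attribute and return already-tagged callables unchanged. The sketch below illustrates that technique on plain Python functions. It is not the library's wrap_tool: `wrap_callable`, `_GUARD_MARKER`, `ToolDenied`, and the `is_allowed` callback are all hypothetical stand-ins.

```python
import functools
from typing import Any, Callable, Mapping

_GUARD_MARKER = "__guarded__"  # hypothetical attribute name


class ToolDenied(Exception):
    """Stand-in for the library's GuardedToolDenied."""


def wrap_callable(
    fn: Callable[..., Any],
    is_allowed: Callable[[str, Mapping[str, Any]], bool],
) -> Callable[..., Any]:
    """Guard fn with a policy check; wrapping an already-guarded callable is a no-op."""
    if getattr(fn, _GUARD_MARKER, False):
        return fn  # already guarded: hand back the same object

    @functools.wraps(fn)
    def guarded(**kwargs: Any) -> Any:
        if not is_allowed(fn.__name__, kwargs):
            raise ToolDenied(f"{fn.__name__} denied by policy")
        return fn(**kwargs)

    setattr(guarded, _GUARD_MARKER, True)
    return guarded


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


def allow_internal(tool_name: str, kwargs: Mapping[str, Any]) -> bool:
    # Toy policy for the example: only corp.example recipients are allowed.
    return str(kwargs.get("to", "")).endswith("@corp.example")


once = wrap_callable(send_email, allow_internal)
twice = wrap_callable(once, allow_internal)  # returns the same object as `once`
```

With this design the policy check runs exactly once per call no matter how many times the wrapping code path executes, which is the property that makes wrapping at agent-construction time safe.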
LLM-augmented synthesis for novel gaps
The deterministic synthesiser covers 9 attack-class patterns. For
gap shapes none of them recognise, synthesize_with_llm adds an
LLM fallback path that calls Anthropic Claude with a schema-pinned
prompt and validates the response against the full PolicyRule
schema before emitting the rule:
from mcp_guard import synthesize_with_llm
# Deterministic patterns handle this → no LLM call.
p1 = synthesize_with_llm("send_email to attacker@evil.com")
# Novel gap → falls back to Claude (requires [llm] extra)
p2 = synthesize_with_llm(
    "agent invoked custom_tool_xyz with arg target_id pointing to a "
    "privileged service account ID outside the user's tenant",
    fallback=True,
)
The validator rejects any response that doesn't match the PolicyRule schema (invalid operator, missing fields, etc.) and returns an empty policy on failure — better to miss a rule than ship a malformed one.
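The fail-closed behaviour described here can be sketched in a few lines. The example below assumes the model returns a JSON array of rule objects; the field names in `REQUIRED_FIELDS`, the operator names in `ALLOWED_OPERATORS`, and the function `parse_llm_rules` are invented for illustration and do not reflect the actual PolicyRule schema.

```python
import json
from typing import Any

# Hypothetical schema, for illustration only.
REQUIRED_FIELDS = {"rule_id", "tool_name", "arg", "operator", "value"}
ALLOWED_OPERATORS = {"equals", "matches_regex", "not_in_context_list"}


def parse_llm_rules(raw_response: str) -> list[dict[str, Any]]:
    """Return the validated rules, or an empty list if anything is malformed."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, list):
        return []
    for rule in payload:
        if not isinstance(rule, dict) or set(rule) != REQUIRED_FIELDS:
            return []  # missing or unexpected fields
        operator = rule["operator"]
        if not isinstance(operator, str) or operator not in ALLOWED_OPERATORS:
            return []  # invalid operator
    return payload


good = json.dumps([{
    "rule_id": "llm-rule-1",
    "tool_name": "custom_tool_xyz",
    "arg": "target_id",
    "operator": "matches_regex",
    "value": "^svc-",
}])
bad_operator = good.replace("matches_regex", "eval_python")

valid_rules = parse_llm_rules(good)
rejected = parse_llm_rules(bad_operator)
```

Rejecting the whole response when any rule is malformed, rather than keeping the valid subset, is a deliberately conservative choice in this sketch; it mirrors the "empty policy on failure" behaviour described above.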
What kinds of gaps does the synthesiser cover?
The deterministic synthesiser is intentionally pattern-based and auditable. As of v0.5.0, 9 attack classes map onto 122 rules in the default policy:
| # | Attack class | What it denies | Tool families covered |
|---|---|---|---|
| 1 | Email contact exfil | send_email whose to arg is outside the user's context.user.contacts | 5 email tool names |
| 2 | Sensitive file read | read_file whose path matches ~/.ssh/, ~/.aws/, /etc/shadow, kubeconfig, etc. | 6 read tool names |
| 3 | Sensitive file write | write_file whose path matches ~/.bashrc, ~/.ssh/authorized_keys, /etc/, /usr/bin/, cron, .git/config, .env, etc. | 5 write tool names |
| 4 | Path traversal | Any path arg containing ../, ..\, URL-encoded variants (%2e%2e, %2F/%5C), double-encoded, Unicode division-slash | 17 file-path tool names |
| 5 | SSRF (private host) | fetch_url / http_get whose url targets RFC1918, loopback, link-local, AWS/GCP metadata, IPv6 unique-local | 6 HTTP tool names |
| 6 | Shell command danger | shell_exec / bash / run_command containing chaining (;, &&), pipe-to-shell, command substitution ($(), backticks), rm -rf /, `curl \| sh`, fork bombs | |
| 7 | SQL danger | db_query / execute_sql containing DROP TABLE, TRUNCATE, unbounded DELETE/UPDATE, UNION SELECT, information_schema probes, stacked queries, xp_/sp_ exec, LOAD_FILE, INTO OUTFILE | 6 SQL tool names × 3 arg names |
| 8 | Network egress private | tcp_connect / socket_connect whose host is private/internal | 5 network tool names |
| 9 | Email body PII / secret exfil | send_email whose body/subject contains AWS keys, OpenAI/Anthropic keys, GitHub PATs, Slack tokens, private-key headers, SSN, JWT, credit-card numbers | 5 email tool names × 4 arg names |
For gap shapes not yet covered, the synthesiser returns an empty policy (deliberate — we surface "no rule generated" rather than fabricate a wrong rule). Adding a new gap shape is one constructor + one test.
LLM-augmented synthesis (synthesize_with_llm, described above) layers on top for novel cases the patterns don't cover; the deterministic path stays as a backstop because it's auditable from logs alone (no model required at synthesis time).
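As an illustration of what one pattern-based check can look like, the sketch below approximates the SSRF row of the table using only the standard library. It is not the library's rule implementation: `targets_private_host` and `METADATA_HOSTNAMES` are hypothetical names, and the hostname list is deliberately incomplete.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative, not exhaustive.
METADATA_HOSTNAMES = {"metadata.google.internal", "localhost"}


def targets_private_host(url: str) -> bool:
    """Return True if the URL's host is a private, loopback, or link-local address."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return True  # malformed URL: fail closed
    if host is None:
        return False
    if host in METADATA_HOSTNAMES:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does no resolution, to stay free of I/O.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local


checks = {
    "http://169.254.169.254/latest/meta-data/": targets_private_host(
        "http://169.254.169.254/latest/meta-data/"),
    "http://10.0.0.5/admin": targets_private_host("http://10.0.0.5/admin"),
    "http://[fd00::1]:8080/": targets_private_host("http://[fd00::1]:8080/"),
    "https://example.com/docs": targets_private_host("https://example.com/docs"),
}
```

Because it never resolves DNS, this sketch cannot catch a public hostname that resolves to an internal address; closing that gap requires a resolution step outside the pure check.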
Backtest corpus
default_corpus() returns a 304-case fixture corpus of (tool_name,
args, user_context, expected_verdict) tuples covering every built-in
pattern. v0.4.0 expanded coverage to: post-RCE env recon (env dump,
printenv, secret-keyword grep, secret-extension find), Windows
sensitive paths (Credential Manager, DPAPI keys, hosts file,
scheduled tasks, registry Run keys), Postgres COPY / pg_read_file
RCE, MySQL INTO DUMPFILE, MSSQL xp_cmdshell, jar:// / ftp:// / dict://
SSRF schemes, RSA/OpenSSH PEM headers, GitHub PATs, Slack tokens.
v0.5.0 default-policy metrics:
Corpus size: 304
TP (caught): 106 / 106 attacks → TPR 1.0000
FP (over-blocks): 2 / 198 legit → FPR 0.0101
The FPR drops as the legit denominator grows, but the 2 remaining FPs
are the same architectural floor: legitimate first-time recipients,
which contact-allowlist policies block by definition. They are kept
in the corpus on purpose so the FPR is a real number rather than a
vanity zero. Tune by adding allow-list conditions to user_context per
recipient class (e.g. distinguish "vendor onboarding" or "interview
candidate" tiers from generic external).
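The arithmetic behind the v0.5.0 figures can be checked directly from the four confusion-matrix counts. In the sketch below, `BacktestCounts` is a hypothetical container written for this example; only the property names `true_positive_rate` and `false_positive_rate` mirror the metrics object used in the quickstart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacktestCounts:
    true_positives: int   # attacks the policy denied
    false_negatives: int  # attacks the policy allowed
    false_positives: int  # legitimate calls the policy denied
    true_negatives: int   # legitimate calls the policy allowed

    @property
    def true_positive_rate(self) -> float:
        attacks = self.true_positives + self.false_negatives
        return self.true_positives / attacks if attacks else 0.0

    @property
    def false_positive_rate(self) -> float:
        legit = self.false_positives + self.true_negatives
        return self.false_positives / legit if legit else 0.0


# Counts taken from the v0.5.0 metrics above: 106 attacks, 198 legit, 304 total.
v050 = BacktestCounts(true_positives=106, false_negatives=0,
                      false_positives=2, true_negatives=196)
print(f"TPR: {v050.true_positive_rate:.4f}, FPR: {v050.false_positive_rate:.4f}")
# TPR: 1.0000, FPR: 0.0101
```

The same two false positives measured against a smaller legitimate set give a much higher rate (2 / 26 ≈ 0.0769), which is why the FPR should always be read alongside its denominator.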
| Category | Legit cases | Attack cases |
|---|---|---|
| Email contact allowlist | 6 (4 in-contacts + 2 FP-risk) | 3 |
| Sensitive file read | 1 | 3 |
| Sensitive file write | 2 | 4 |
| Path traversal | 2 | 3 |
| SSRF | 3 | 4 |
| Shell danger | 3 | 5 |
| SQL danger | 3 | 5 |
| Network egress private | 2 | 3 |
| Email PII exfil | 2 | 5 |
| Misc legit (read_ticket / search_users) | 2 | — |
Real production deployments should replace default_corpus() with
a load from a labelled traffic store. The rest of the backtest
pipeline stays the same.
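A labelled traffic store can be as simple as a JSON Lines export. The loader below is a sketch under stated assumptions: the record keys follow the (tool_name, args, user_context, expected_verdict) tuple described above, but the "allow" / "deny" label values, the `CorpusCase` type, and `load_corpus` itself are hypothetical, and whether run_backtest accepts this exact shape should be checked against the package before relying on it.

```python
import io
import json
from typing import Any, Iterable, Iterator, NamedTuple


class CorpusCase(NamedTuple):
    tool_name: str
    args: dict[str, Any]
    user_context: dict[str, Any]
    expected_verdict: str  # assumed labels: "allow" or "deny"


def load_corpus(lines: Iterable[str]) -> Iterator[CorpusCase]:
    """Parse one JSON object per line into labelled tool-call cases."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        verdict = record["expected_verdict"]
        if verdict not in ("allow", "deny"):
            raise ValueError(f"line {line_number}: unknown verdict {verdict!r}")
        yield CorpusCase(
            tool_name=record["tool_name"],
            args=record["args"],
            user_context=record.get("user_context", {}),
            expected_verdict=verdict,
        )


sample_export = io.StringIO(
    '{"tool_name": "send_email", "args": {"to": "bob@corp.example"}, '
    '"user_context": {"user": {"contacts": ["bob@corp.example"]}}, '
    '"expected_verdict": "allow"}\n'
    '\n'
    '{"tool_name": "read_file", "args": {"path": "~/.ssh/id_rsa"}, '
    '"expected_verdict": "deny"}\n'
)
cases = list(load_corpus(sample_export))
```

Validating labels at load time keeps a mislabelled record from silently skewing the TPR / FPR figures later in the pipeline.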
Relationship to purple-scaffold
purple-scaffold
is the offensive / measurement side: probes that test how
indirect-prompt-injection compliance varies across MCP server
vectors, models, and product wrappers. mcp-guard is the
defensive side: deterministic policies that catch the
attack patterns the probes find.
Both repos share the same evaluator core; mcp-guard is the
graduation of the policy modules from purple-scaffold/purple/
into a standalone package.
License
MIT. See LICENSE.