mcp-guard

Drop-in deterministic policy layer for MCP-using AI agents.

mcp-guard synthesises tool-call policies from observed indirect-prompt-injection gaps, evaluates each tool call against those policies at the agent's tool-call boundary, and provides a backtest harness for measuring the false-positive rate against legitimate traffic before deployment.

v0.5.0 (2026-05-15): 9 deterministic rule patterns across 122 rules, 304-case backtest corpus, TPR 1.00 / FPR 0.01. Four framework adapters: Anthropic MCP SDK, LangChain, LlamaIndex, CrewAI. LLM-augmented synthesis fallback (mock + real-API validated). Six reproducible real-world case studies: EchoLeak indirect injection, MCP tool-description poisoning, AWS IMDS SSRF, Log4Shell-class MCP logging, RAG context poisoning, agent self-prompting loops. See CHANGELOG.md.

This is the defensive companion to the purple-scaffold research probes. Findings from those probes feed into policy synthesis; the resulting policy is what a product-side defender would ship in front of the agent's tool-call execution gate.

Why

Most defenses against indirect prompt injection are classifier-based: pre-process the model input or post-process the model output, and use a model to decide whether something looks suspicious. That's useful but probabilistic, hard to audit, and adds latency.

mcp-guard takes the complementary deterministic-policy approach:

Synthesise a policy from observed gaps (e.g., "agent emitted read_text_file('~/.ssh/id_rsa') after reading a poisoned file" → policy: deny read_text_file whose path matches a sensitive-credential pattern).
Evaluate each tool call against the policy. Pure function: (tool_name, args, user_context) -> Decision. No I/O, no LLM, no ambiguity.
Backtest the policy against a labelled corpus of legitimate and attack tool-call cases before deployment. Measure FPR / TPR, and iterate until both are acceptable.
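The evaluate step is described as a pure function, (tool_name, args, user_context) -> Decision. The following is a minimal sketch of that shape. It assumes a rule is nothing more than a tool name, an argument name, and a deny regex. Rule, Decision, and evaluate_sketch are illustrative stand-ins, not mcp_guard's actual types.

```python
import re
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: deny calls to `tool` whose `arg` matches `pattern`.
    rule_id: str
    tool: str
    arg: str
    pattern: str
    reason: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    denying_rule_id: Optional[str] = None
    reason: Optional[str] = None


def evaluate_sketch(
    rules: Sequence[Rule],
    tool_name: str,
    args: Mapping[str, Any],
    user_context: Mapping[str, Any],
) -> Decision:
    """Pure function: no I/O and no model calls, so equal inputs always give an equal Decision."""
    # user_context is unused by this regex rule type; a contact-allowlist rule would read it.
    for rule in rules:
        if rule.tool != tool_name:
            continue
        value = args.get(rule.arg)
        # Only string arguments are matched in this sketch.
        if isinstance(value, str) and re.search(rule.pattern, value):
            return Decision(False, rule.rule_id, rule.reason)
    return Decision(True)  # Default-allow: no rule matched.


SSH_KEY_READ = Rule(
    rule_id="sketch-sensitive-read--read_text_file",
    tool="read_text_file",
    arg="path",
    pattern=r"(^|/)\.ssh/",
    reason="Read of an SSH credential path",
)

print(evaluate_sketch([SSH_KEY_READ], "read_text_file", {"path": "~/.ssh/id_rsa"}, {}))
```

Because the function has no hidden state, a denial can be replayed exactly from the logged inputs. This property is what makes a deterministic policy auditable.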

The library is not meant to replace classifier-based defenses — it complements them. Use both: classifier as an early-warning signal, deterministic policy as the unconditional gate.

Install

pip install mcp-guardrails

(Python 3.11+. No runtime dependencies beyond the standard library.)

Note on the name. The PyPI distribution is mcp-guardrails because an unrelated dormant project squats mcpguard on PyPI, and PyPI's name-similarity check rejects mcp-guard. The Python import name stays mcp_guard, so existing code continues to work. This is the same split as Pillow, which is installed as pillow and imported as PIL. The GitHub repo, the in-code references, and the project identity stay mcp-guard.

Optional extras for the integrations you actually use:

pip install 'mcp-guardrails[anthropic-mcp]'   # for the Anthropic MCP SDK adapter
pip install 'mcp-guardrails[langchain]'       # for the LangChain callback handler
pip install 'mcp-guardrails[llamaindex]'      # for the LlamaIndex callback handler / wrap_tool
pip install 'mcp-guardrails[crewai]'          # for the CrewAI wrap_tool
pip install 'mcp-guardrails[llm]'             # for synthesize_with_llm fallback
pip install 'mcp-guardrails[all]'             # everything

Quickstart — Python API

The fastest path to a shippable policy is synthesize_default_policy(), which returns the full ruleset across every built-in pattern:

from mcp_guard import synthesize_default_policy, evaluate, default_corpus, run_backtest

# 1. Load the full deterministic policy (9 patterns, 122 rules)
policy = synthesize_default_policy()

# 2. Evaluate any tool call against it
decision = evaluate(
    policy,
    tool_name="send_email",
    args={"to": "attacker@evil.com", "body": "exfil"},
    user_context={"user": {"contacts": ["bob@corp.example"]}},
)
print(decision)
# Decision(allowed=False,
#          denying_rule_id='tool-policy-email-contact-allowlist--send_email--default',
#          reason='External recipient outside the authenticated user...')

# 3. Backtest against the labelled corpus
metrics = run_backtest(policy, default_corpus())
print(f"TPR: {metrics.true_positive_rate:.4f}, "
      f"FPR: {metrics.false_positive_rate:.4f}")
# TPR: 1.0000, FPR: 0.0101

For incident-driven synthesis (one observed gap → one narrowly targeted policy), use synthesize_from_text():

from mcp_guard import synthesize_from_text

# Synthesise from a free-text gap description
policy = synthesize_from_text(
    "agent emitted send_email to attacker@evil.com when user "
    "asked it to read a ticket",
    technique_id="lab-2026-05-04",
)
print(policy.to_yaml())
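Pattern-based synthesis from free text can be as simple as recognising a tool name and an indicator in the gap description. The sketch below shows one hypothetical way to do that for the email-exfil shape. The regexes, the rule dict keys, and synthesize_email_rule are assumptions for illustration, not mcp_guard internals. Unrecognised text yields no rule, mirroring the stated behaviour of returning an empty policy rather than guessing.

```python
import re
from typing import Optional

# Hypothetical list of email-sending tool names this pattern recognises.
EMAIL_TOOLS = ("send_email", "send_mail", "gmail_send")
EMAIL_TOOL_RE = re.compile(r"\b(" + "|".join(EMAIL_TOOLS) + r")\b")
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def synthesize_email_rule(gap_text: str, technique_id: str) -> Optional[dict]:
    """Return a contact-allowlist rule if the gap text names an email tool and an address."""
    tool = EMAIL_TOOL_RE.search(gap_text)
    address = ADDRESS_RE.search(gap_text)
    if tool is None or address is None:
        return None  # Unrecognised gap shape: emit nothing rather than a guessed rule.
    return {
        "id": f"{technique_id}--{tool.group(1)}",
        "tool": tool.group(1),
        "arg": "to",
        # Deny when the recipient is not in user_context["user"]["contacts"].
        "condition": "not_in_user_contacts",
        "observed_recipient": address.group(0),
    }


print(synthesize_email_rule(
    "agent emitted send_email to attacker@evil.com when user asked it to read a ticket",
    technique_id="lab-2026-05-04",
))
```

The rule generalises from the one observed address to "any recipient outside the user's contacts", which is what makes it useful beyond the single incident.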

Quickstart — CLI

# Synthesise a policy from gap text → YAML on stdout
mcp-guard synthesize "agent emitted send_email to attacker@evil.com" \
  > policy.yaml

# Evaluate a single tool call against the policy → JSON Decision on stdout
mcp-guard evaluate policy.yaml send_email '{"to":"attacker@evil.com"}' \
  --user-context '{"user":{"contacts":["bob@corp.example"]}}'

# Backtest against the default corpus → metrics JSON
mcp-guard backtest policy.yaml

Wiring into your agent

The evaluator is pure, so you can wire it anywhere — most naturally at the agent's tool-call boundary:

from mcp_guard import evaluate, synthesize_default_policy, GeneratedPolicy

policy: GeneratedPolicy = synthesize_default_policy()

def on_tool_call_attempt(tool_name: str, args: dict, user_ctx: dict) -> bool:
    """Return True if the call may proceed, False if the policy denies it."""
    decision = evaluate(policy, tool_name, args, user_ctx)
    if not decision.allowed:
        # log_audit stands in for your application's own audit logger.
        log_audit(
            event="tool_call_denied",
            rule=decision.denying_rule_id,
            reason=decision.reason,
            tool=tool_name,
            args=args,
        )
        return False
    return True
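A boundary check like on_tool_call_attempt should fail closed. If the evaluator itself raises (for example on malformed arguments or a missing user context), the call should be denied rather than let through. Below is a minimal, framework-agnostic sketch. It assumes an evaluate-style callable that returns an object with allowed and reason attributes. guard_tool, ToolDenied, and stub_evaluate are hypothetical names, not part of mcp_guard.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when the policy, or a failure while evaluating it, blocks a tool call."""


def guard_tool(tool_name: str, evaluate_fn: Callable[..., Any], user_context_fn: Callable[[], dict]):
    """Wrap a tool so every call is checked first; evaluator errors deny the call."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(**args: Any) -> Any:  # MCP-style tools receive named arguments.
            try:
                decision = evaluate_fn(tool_name, args, user_context_fn())
            except Exception as exc:  # Fail closed: an error is never an implicit allow.
                raise ToolDenied(f"policy evaluation failed: {exc!r}") from exc
            if not decision.allowed:
                raise ToolDenied(decision.reason or "denied by policy")
            return fn(**args)
        return wrapper
    return decorator


# --- Example wiring with a stub evaluator (stand-in for mcp_guard.evaluate) ---
@dataclass
class _Verdict:
    allowed: bool
    reason: str = ""


def stub_evaluate(tool_name: str, args: dict, user_context: dict) -> _Verdict:
    ok = args.get("to") in user_context["user"]["contacts"]
    return _Verdict(ok, "" if ok else "recipient not in contacts")


@guard_tool("send_email", stub_evaluate, lambda: {"user": {"contacts": ["bob@corp.example"]}})
def send_email(to: str, body: str) -> str:
    return f"sent to {to}"
```

Raising, rather than returning False, keeps the tool body unreachable on denial even if a caller forgets to check a return value.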

Anthropic MCP Python SDK

from mcp.server import Server
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.anthropic_mcp import MCPGuard

server = Server("my-app")
guard = MCPGuard(policy=synthesize_default_policy())

@server.call_tool()
async def call_tool(name: str, arguments: dict):
    # Raises GuardedToolDenied if the policy denies the call.
    guard.check(name, arguments, user_context=current_user_context())
    return await my_business_logic(name, arguments)

Or use the decorator form:

@server.call_tool()
@guard.wrap_handler(user_context_fn=current_user_context)
async def call_tool(name: str, arguments: dict):
    return await my_business_logic(name, arguments)

LangChain

from langchain.agents import AgentExecutor
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.langchain import make_callback_handler

handler = make_callback_handler(
    policy=synthesize_default_policy(),
    user_context_fn=lambda: {"user": {"id": current_user.id,
                                       "contacts": current_user.contacts}},
)

executor = AgentExecutor(
    agent=agent, tools=tools,
    callbacks=[handler],   # ← mcp-guard sits in the callback chain
)

If the policy denies a tool call, the handler raises GuardedToolDenied inside on_tool_start, which LangChain surfaces as a tool failure; the agent's reasoning chain sees the deny reason and can adapt.

LlamaIndex

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.llamaindex import make_callback_handler

Settings.callback_manager = CallbackManager([
    make_callback_handler(
        policy=synthesize_default_policy(),
        user_context_fn=lambda: {"user": {...}},
    ),
])

# … your existing agent / query engine code; tool calls are now guarded.

Per-tool variant (no callback manager required):

from mcp_guard.integrations.llamaindex import wrap_tool

guarded = wrap_tool(my_tool, policy=synthesize_default_policy())

CrewAI

from crewai import Agent
from mcp_guard import synthesize_default_policy
from mcp_guard.integrations.crewai import wrap_tools

agent = Agent(
    role="researcher",
    goal="answer the question",
    tools=wrap_tools(
        my_tools,
        policy=synthesize_default_policy(),
        user_context_fn=lambda: {"user": {...}},
    ),
)

wrap_tool is idempotent — re-wrapping is a no-op — so it's safe to apply at agent-construction time without tracking which tools were already guarded.
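Idempotent wrapping is usually done by tagging the wrapper so that a second wrap can detect it. Here is a sketch of that technique. It assumes tools are plain callables taking keyword arguments. The __guarded__ attribute, wrap_tool_sketch, and wrap_tools_sketch are illustrative, not necessarily how mcp_guard.integrations.crewai implements it.

```python
import functools
from typing import Any, Callable, Iterable, List

_GUARD_MARKER = "__guarded__"  # Hypothetical attribute used to tag wrapped tools.


def wrap_tool_sketch(tool: Callable[..., Any], check: Callable[[str, dict], None]) -> Callable[..., Any]:
    """Return a guarded version of `tool`; re-wrapping an already-guarded tool is a no-op."""
    if getattr(tool, _GUARD_MARKER, False):
        return tool  # Already guarded: hand back the same object.

    @functools.wraps(tool)
    def guarded(**kwargs: Any) -> Any:
        check(tool.__name__, kwargs)  # Expected to raise if the policy denies the call.
        return tool(**kwargs)

    setattr(guarded, _GUARD_MARKER, True)
    return guarded


def wrap_tools_sketch(tools: Iterable[Callable[..., Any]], check: Callable[[str, dict], None]) -> List[Callable[..., Any]]:
    """Guard every tool in a list; tools that are already guarded pass through unchanged."""
    return [wrap_tool_sketch(t, check) for t in tools]
```

Because the marker check returns the same object, double-wrapping never stacks checks, so each call is evaluated exactly once.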

LLM-augmented synthesis for novel gaps

The deterministic synthesiser covers 9 attack-class patterns. For gap shapes none of them recognise, synthesize_with_llm adds an LLM fallback path that calls Anthropic Claude with a schema-pinned prompt and validates the response against the full PolicyRule schema before emitting the rule:

from mcp_guard import synthesize_with_llm

# Deterministic patterns handle this → no LLM call.
p1 = synthesize_with_llm("send_email to attacker@evil.com")

# Novel gap → falls back to Claude (requires [llm] extra)
p2 = synthesize_with_llm(
    "agent invoked custom_tool_xyz with arg target_id pointing to a "
    "privileged service account ID outside the user's tenant",
    fallback=True,
)

The validator rejects any response that doesn't match the PolicyRule schema (invalid operator, missing fields, etc.) and returns an empty policy on failure — better to miss a rule than ship a malformed one.
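The fail-empty validation step can be sketched as a strict check over the parsed JSON. Every required field must be present with the right type and the operator must come from a closed set; otherwise no rules are returned at all. The field names and operator set below are assumptions for illustration. mcp_guard's real PolicyRule schema may differ.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical schema: required fields and their expected types.
REQUIRED_FIELDS = {"id": str, "tool": str, "arg": str, "operator": str, "value": str, "reason": str}
ALLOWED_OPERATORS = {"equals", "regex_match", "not_in_user_contacts"}


def _valid_rule(rule: Any) -> bool:
    if not isinstance(rule, dict):
        return False
    for field, typ in REQUIRED_FIELDS.items():
        if not isinstance(rule.get(field), typ):
            return False  # Missing field or wrong type.
    if rule["operator"] not in ALLOWED_OPERATORS:
        return False  # Invalid operator.
    if rule["operator"] == "regex_match":
        try:
            re.compile(rule["value"])  # Reject patterns that would fail at evaluation time.
        except re.error:
            return False
    return True


def parse_llm_rules(raw: str) -> List[Dict[str, Any]]:
    """Return the parsed rules only if every one is valid; otherwise return [] (fail empty)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    rules = data.get("rules") if isinstance(data, dict) else None
    if not isinstance(rules, list) or not rules:
        return []
    # All-or-nothing: one malformed rule discards the whole response.
    return rules if all(_valid_rule(r) for r in rules) else []
```

Discarding the whole response on one bad rule is deliberately conservative: a partially valid LLM answer is a sign the prompt or model drifted, and that is worth surfacing rather than papering over.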

What kinds of gaps does the synthesiser cover?

The deterministic synthesiser is intentionally pattern-based and auditable. As of v0.5.0, 9 attack classes map onto 122 rules in the default policy:

#	Attack class	What it denies	Tool families covered
1	Email contact exfil	`send_email` whose `to` arg is outside the user's `context.user.contacts`	5 email tool names
2	Sensitive file read	`read_file` whose `path` matches `~/.ssh/`, `~/.aws/`, `/etc/shadow`, `kubeconfig`, etc.	6 read tool names
3	Sensitive file write	`write_file` whose `path` matches `~/.bashrc`, `~/.ssh/authorized_keys`, `/etc/`, `/usr/bin/`, cron, `.git/config`, `.env`, etc.	5 write tool names
4	Path traversal	Any path arg containing `../`, `..\`, URL-encoded variants (`%2e%2e`, `%2F`/`%5C`), double-encoded, Unicode division-slash	17 file-path tool names
5	SSRF (private host)	`fetch_url` / `http_get` whose `url` targets RFC1918, loopback, link-local, AWS/GCP metadata, IPv6 unique-local	6 HTTP tool names
6	Shell command danger	`shell_exec` / `bash` / `run_command` containing chaining (`;`, `&&`), pipe-to-shell, command substitution (`$()`, backticks), `rm -rf /`, `curl | sh`, fork bombs	
7	SQL danger	`db_query` / `execute_sql` containing `DROP TABLE`, `TRUNCATE`, unbounded `DELETE`/`UPDATE`, `UNION SELECT`, `information_schema` probes, stacked queries, `xp_/sp_` exec, `LOAD_FILE`, `INTO OUTFILE`	6 SQL tool names × 3 arg names
8	Network egress private	`tcp_connect` / `socket_connect` whose `host` is private/internal	5 network tool names
9	Email body PII / secret exfil	`send_email` whose `body`/`subject` contains AWS keys, OpenAI/Anthropic keys, GitHub PATs, Slack tokens, private-key headers, SSN, JWT, credit-card numbers	5 email tool names × 4 arg names
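Row 4 depends on normalising encodings before matching. The sketch below illustrates that idea. It assumes up to two rounds of URL decoding and treats backslash and U+2215 DIVISION SLASH as path separators. looks_like_traversal is a hypothetical helper, and the shipped rules may match differently.

```python
from urllib.parse import unquote

# Characters treated as path separators after decoding (U+2215 is DIVISION SLASH).
_SEPARATORS = ("\\", "\u2215")


def looks_like_traversal(path: str, max_decode_rounds: int = 2) -> bool:
    """Return True if any decoding layer of `path` contains a `..` path segment."""
    candidate = path
    for _ in range(max_decode_rounds + 1):
        normalised = candidate
        for sep in _SEPARATORS:
            normalised = normalised.replace(sep, "/")
        if ".." in normalised.split("/"):
            return True
        decoded = unquote(candidate)  # %2e%2e%2f -> ../ ; %252e -> %2e on the next round
        if decoded == candidate:
            break  # Nothing left to decode.
        candidate = decoded
    return False
```

Matching whole `..` segments, rather than any `../` substring, keeps names like `report..final.txt` from being flagged. Checking every decoding layer catches both single- and double-encoded variants.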

For gap shapes not yet covered, the synthesiser returns an empty policy (deliberate — we surface "no rule generated" rather than fabricate a wrong rule). Adding a new gap shape is one constructor + one test.
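To make "one constructor + one test" concrete, here is a hypothetical constructor for the SSRF shape in row 5, built on Python's ipaddress module, with its test alongside. The function names, the rule dict shape, and the hostname deny-list are assumptions. The check covers IP literals only and does not resolve hostnames, so a production rule would also need to handle DNS-based bypasses.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical hostname deny-list for names that point at internal services.
_BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal"}


def is_private_target(url: str) -> bool:
    """True if the URL host is loopback, private (RFC 1918 / IPv6 ULA), link-local, or a blocked name."""
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    if host in _BLOCKED_HOSTNAMES:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # Not an IP literal; this sketch does not resolve DNS.
    return ip.is_private or ip.is_loopback or ip.is_link_local


def make_ssrf_rule(tool_name: str, arg: str = "url") -> dict:
    """Constructor: one deny rule per HTTP tool name (illustrative rule shape)."""
    return {"id": f"sketch-ssrf--{tool_name}", "tool": tool_name, "arg": arg, "predicate": is_private_target}


def test_make_ssrf_rule() -> None:
    deny = make_ssrf_rule("fetch_url")["predicate"]
    assert deny("http://169.254.169.254/latest/meta-data/")  # AWS IMDS (link-local)
    assert deny("http://10.0.0.5/admin")                      # RFC 1918
    assert deny("http://[fd00::1]/")                          # IPv6 unique-local
    assert deny("http://localhost:8080/")
    assert not deny("https://example.com/")


test_make_ssrf_rule()
```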

synthesize_with_llm layers LLM-driven synthesis on top for novel cases the patterns don't cover. The deterministic path stays as the backstop because it is auditable from logs alone (no model is required at synthesis time).

Backtest corpus

default_corpus() returns a 304-case fixture corpus of (tool_name, args, user_context, expected_verdict) tuples covering every built-in pattern. v0.4.0 expanded coverage to: post-RCE environment recon (env dump, printenv, secret-keyword grep, secret-extension find), Windows sensitive paths (Credential Manager, DPAPI keys, hosts file, scheduled tasks, registry Run keys), Postgres COPY / pg_read_file RCE, MySQL INTO DUMPFILE, MSSQL xp_cmdshell, jar://, ftp://, and dict:// SSRF schemes, RSA/OpenSSH PEM headers, GitHub PATs, and Slack tokens.

v0.5.0 default-policy metrics:

Corpus size:      304
TP (caught):      106 / 106 attacks   →  TPR 1.0000
FP (over-blocks):   2 / 198 legit     →  FPR 0.0101
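The TPR and FPR above are plain ratios over confusion counts: TPR = TP / (TP + FN) over attack cases, and FPR = FP / (FP + TN) over legitimate cases. Here is a small sketch of the backtest loop. It assumes each case is a (tool_name, args, user_context, expected_verdict) tuple whose verdict is the string "deny" or "allow", and that the policy is any callable returning True to allow. backtest_sketch is not mcp_guard's run_backtest.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Case = Tuple[str, dict, dict, str]  # (tool_name, args, user_context, expected_verdict)


@dataclass
class Metrics:
    tp: int = 0  # attack cases the policy denied
    fn: int = 0  # attack cases the policy allowed
    fp: int = 0  # legitimate cases the policy denied
    tn: int = 0  # legitimate cases the policy allowed

    @property
    def true_positive_rate(self) -> float:
        attacks = self.tp + self.fn
        return self.tp / attacks if attacks else 0.0

    @property
    def false_positive_rate(self) -> float:
        legit = self.fp + self.tn
        return self.fp / legit if legit else 0.0


def backtest_sketch(allows: Callable[[str, dict, dict], bool], cases: Iterable[Case]) -> Metrics:
    """Run every labelled case through `allows` and tally the confusion counts."""
    m = Metrics()
    for tool_name, args, user_context, expected in cases:
        denied = not allows(tool_name, args, user_context)
        if expected == "deny":  # attack case
            if denied:
                m.tp += 1
            else:
                m.fn += 1
        elif denied:  # legitimate case that was blocked
            m.fp += 1
        else:
            m.tn += 1
    return m


# The v0.5.0 counts quoted above: 106/106 attacks caught, 2 of 198 legitimate calls blocked.
v050 = Metrics(tp=106, fn=0, fp=2, tn=196)
print(f"TPR: {v050.true_positive_rate:.4f}, FPR: {v050.false_positive_rate:.4f}")
# TPR: 1.0000, FPR: 0.0101
```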

The FPR falls as the legitimate denominator grows, while the 2 false positives stay at the same architectural floor: contact-allowlist policies block legitimate first-time recipients by definition. They are kept in the corpus on purpose so the FPR is a real number rather than a vanity zero. To tune them, add allow-list conditions to user_context per recipient class (e.g. distinguish "vendor onboarding" or "interview candidate" tiers from generic external recipients).
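One way to apply that tuning is to let user_context carry per-tier domain allow-lists alongside the contacts list. The sketch below assumes a hypothetical recipient_tiers key mapping tier names to allowed domains. This is not a documented mcp_guard user_context field. Tier membership should come from your own authenticated systems, never from model output.

```python
from typing import Any, Mapping, Optional


def recipient_tier(to: str, user_context: Mapping[str, Any]) -> Optional[str]:
    """Return "contact", a tier name, or None if the recipient is outside every allow-list."""
    user = user_context.get("user", {})
    address = to.strip().lower()
    if address in {c.lower() for c in user.get("contacts", [])}:
        return "contact"
    domain = address.rpartition("@")[2]
    # Hypothetical key: {"recipient_tiers": {"vendor_onboarding": ["vendor.example"], ...}}
    for tier, domains in user.get("recipient_tiers", {}).items():
        if domain in {d.lower() for d in domains}:
            return tier
    return None


def allow_recipient(to: str, user_context: Mapping[str, Any]) -> bool:
    """Allow only recipients that are contacts or fall inside an explicitly configured tier."""
    return recipient_tier(to, user_context) is not None
```

Returning the tier name rather than a bare boolean lets the audit log record why a first-time recipient was allowed.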

Category	Legit cases	Attack cases
Email contact allowlist	6 (4 in-contacts + 2 FP-risk)	3
Sensitive file read	1	3
Sensitive file write	2	4
Path traversal	2	3
SSRF	3	4
Shell danger	3	5
SQL danger	3	5
Network egress private	2	3
Email PII exfil	2	5
Misc legit (read_ticket / search_users)	2	—

Real production deployments should replace default_corpus() with a load from a labelled traffic store. The rest of the backtest pipeline stays the same.
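A replacement loader could read labelled cases from a JSON Lines export. This sketch assumes one object per line with tool_name, args, user_context, and expected_verdict keys, which is a hypothetical export format. It yields the same four-field tuple shape that default_corpus() is described as returning. Check what run_backtest actually accepts before feeding these cases to it.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple

Case = Tuple[str, dict, dict, str]
_VERDICTS = {"allow", "deny"}  # Assumed label vocabulary.


def load_labelled_cases(path: Path) -> Iterator[Case]:
    """Yield (tool_name, args, user_context, expected_verdict) tuples from a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # Skip blank lines.
            record = json.loads(line)
            verdict = record["expected_verdict"]
            if verdict not in _VERDICTS:
                # Fail loudly on bad labels: a mislabelled case silently skews TPR/FPR.
                raise ValueError(f"{path}:{lineno}: unknown verdict {verdict!r}")
            yield (record["tool_name"], record.get("args", {}), record.get("user_context", {}), verdict)
```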

Relationship to `purple-scaffold`

purple-scaffold is the offensive / measurement side: probes that test how indirect-prompt-injection compliance varies across MCP server vectors, models, and product wrappers. mcp-guard is the defensive side: deterministic policies that catch the attack patterns the probes find.

Both repos share the same evaluator core; mcp-guard is the graduation of the policy modules from purple-scaffold/purple/ into a standalone package.

License

MIT. See LICENSE.

Project details


This version: 0.5.2 (May 15, 2026)

Download files


Source Distribution

mcp_guardrails-0.5.2.tar.gz (60.2 kB)

Uploaded May 15, 2026 Source

Built Distribution


mcp_guardrails-0.5.2-py3-none-any.whl (50.5 kB)

Uploaded May 15, 2026 Python 3

File details

Details for the file mcp_guardrails-0.5.2.tar.gz.

File metadata

Download URL: mcp_guardrails-0.5.2.tar.gz
Upload date: May 15, 2026
Size: 60.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for mcp_guardrails-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`845b0e9013461ec4c2486821f59b8f4a4f2374c882e211ea8e5ba9581661b08f`
MD5	`93f681a401e7d833fffc6063a020b595`
BLAKE2b-256	`7a5314ef6ffaad10749dbd9eff8386c3f3670ea3ad2d0e721d66a15a33319618`


File details

Details for the file mcp_guardrails-0.5.2-py3-none-any.whl.

File metadata

Download URL: mcp_guardrails-0.5.2-py3-none-any.whl
Upload date: May 15, 2026
Size: 50.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for mcp_guardrails-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`387a82246d7c403564e8a523b650722269a0431238be39e1cb8deae7dfecfd65`
MD5	`e3301cfe5b555a7b391c8bb250254df1`
BLAKE2b-256	`14ead5908237d2a8f757d3c4473243e19ceb268895bc22e4213e4a09934b12e0`
