Whitelist NLP intent enforcement for MCP agents — pre-execution tool call validation
MCP Guardian
Agent intent enforcement for MCP tool calls — pre-execution security for AI agents.
MCP Guardian is not a firewall for MCP servers. It's a declarative intent guardrail for agent behavior. It validates every tool call against declared intent policies before execution. If the call doesn't match the policy, the MCP server never sees it.
Install
```bash
pip install mcp-guardian-ai
```
This pulls in all dependencies: openai-agents, pydantic, pyyaml.
Or install everything explicitly:
```bash
pip install mcp-guardian-ai openai-agents pydantic pyyaml
```
Set your OpenAI API key. The LLM intent evaluator needs it; the fast-check tier runs without it:
```bash
export OPENAI_API_KEY=sk-...
```
Note: the PyPI package is mcp-guardian-ai; the Python import is mcp_guardian.
For development from source:
```bash
git clone https://github.com/mcp-guardian/mcp-guardian.git
cd mcp-guardian
pip install -e ".[dev]"
```
Three Ways to Use It
Path 1: Pure Python (no files needed)
```python
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp
from mcp_guardian import GuardianToolGuardrail, IntentPolicy

policy = IntentPolicy(
    name="read-only",
    description="Read files only; no writes, no shell",
    expected_workflow="Read and list files to answer user questions",
    forbidden_tools=["write_*", "execute_*", "delete_*"],
)

guardrail = GuardianToolGuardrail(policy=policy)


async def main():
    async with MCPServerStreamableHttp(
        name="my-server",
        params={"url": "https://my-mcp-server.example.com/mcp"},
    ) as server:
        # Discover the server's tools and wrap each one with the guardrail
        tools = await guardrail.wrap_mcp_tools([server])
        agent = Agent(name="Worker", model="gpt-4o", tools=tools)
        result = await Runner.run(agent, "List all files")
        print(result.final_output)

    # Print audit log
    for entry in guardrail.audit_log:
        verdict = str(entry.verdict)
        icon = "✓" if verdict == "allow" else "✗"
        print(f"  {icon} {entry.tool_name} → {verdict.upper()} "
              f"(conf={entry.confidence:.2f}, {entry.method}, {entry.elapsed_ms:.0f}ms)")
        if verdict != "allow":
            print(f"    Reason: {entry.reason}")


asyncio.run(main())
```
Path 2: YAML policy file (recommended)
Define a policy.yaml:
```yaml
name: read-only
description: Read-only file access
expected_workflow: Read and list files to answer user questions
allowed_tools: ["read_*", "list_*"]
forbidden_tools: ["write_*", "execute_*", "delete_*"]
allowed_transitions:
  list_directory: [read_file, list_directory]
  read_file: [read_file, list_directory]
constraints:
  - Do not access files outside the working directory
escalation_threshold: 0.7
```
Load it:
```python
from mcp_guardian import GuardianToolGuardrail, IntentPolicy

policy = IntentPolicy.from_file("policy.yaml")
guardrail = GuardianToolGuardrail(policy=policy)
```
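Because allowed_transitions names concrete tools while allowed_tools and forbidden_tools hold glob patterns, it is possible to write a transition edge that the tool lists will always block. The sketch below is a hypothetical lint, not part of mcp-guardian. It assumes the policy has been parsed into a plain dict (for example with yaml.safe_load) and that patterns follow Python's fnmatch glob semantics, which the library may not use exactly.

```python
from fnmatch import fnmatchcase

# The policy.yaml above as a plain dict, e.g. the result of yaml.safe_load().
policy = {
    "allowed_tools": ["read_*", "list_*"],
    "forbidden_tools": ["write_*", "execute_*", "delete_*"],
    "allowed_transitions": {
        "list_directory": ["read_file", "list_directory"],
        "read_file": ["read_file", "list_directory"],
    },
}


def permitted(tool: str, policy: dict) -> bool:
    """True if the tool passes both the forbidden and the allowed lists."""
    if any(fnmatchcase(tool, p) for p in policy.get("forbidden_tools", [])):
        return False
    allowed = policy.get("allowed_tools")
    # Assumption: a missing allowed_tools list means no whitelist restriction.
    return allowed is None or any(fnmatchcase(tool, p) for p in allowed)


def unreachable_tools(policy: dict) -> set:
    """Hypothetical lint: tools named in allowed_transitions that can never run."""
    graph = policy.get("allowed_transitions", {})
    named = set(graph) | {t for targets in graph.values() for t in targets}
    return {t for t in named if not permitted(t, policy)}


print(unreachable_tools(policy))  # set()
```

With write_file added as a successor of read_file, the lint would report {'write_file'}, since forbidden_tools always blocks it regardless of the graph.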
Path 3: guardian.yaml + policy files (multi-server / production)
A single guardian.yaml ties together multiple servers, per-server policies, auth headers, and model settings:
```yaml
model: gpt-4o
guardian_model: gpt-4o
default_policy: policies/default.yaml
servers:
  - name: filesystem
    url: https://fs-server.example.com/mcp
    policy: policies/read-only.yaml
  - name: database
    url: https://db-server.example.com/mcp
    policy: policies/db-read-only.yaml
    headers:
      Authorization: "Bearer ${DB_TOKEN}"
```
Load it:
```python
from mcp_guardian import GuardianConfig  # assumed top-level export

config = GuardianConfig.from_file("guardian.yaml")
```
See the Quick Start for complete examples of all three paths.
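The ${DB_TOKEN} placeholder keeps secrets out of guardian.yaml, but the text above does not say how mcp-guardian expands it. A minimal sketch of one common approach, assuming placeholders are resolved from environment variables after YAML parsing: string.Template.substitute raises KeyError for an unset variable, so a missing token fails loudly instead of sending a literal "${DB_TOKEN}" header. expand_placeholders is a hypothetical helper.

```python
import os
from collections.abc import Mapping
from string import Template
from typing import Any


def expand_placeholders(value: Any, env: Mapping = os.environ) -> Any:
    """Recursively replace ${VAR} in strings; raise KeyError if VAR is unset."""
    if isinstance(value, str):
        return Template(value).substitute(env)
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# A server entry as it might look after yaml.safe_load() on guardian.yaml.
server = {
    "name": "database",
    "url": "https://db-server.example.com/mcp",
    "headers": {"Authorization": "Bearer ${DB_TOKEN}"},
}

resolved = expand_placeholders(server, {"DB_TOKEN": "example-token"})
print(resolved["headers"]["Authorization"])  # Bearer example-token
```

string.Template also raises ValueError on a stray $ that isn't a valid placeholder; write $$ for a literal dollar sign.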
Tool Sources: MCP, Local Functions, or Both
The three policy paths above cover how you configure the policy; this section covers which tools you attach it to. mcp-guardian is tool-source agnostic: it operates on the SDK's general FunctionTool type, and MCP-discovered tools are only one source of those. Three entry points cover the spectrum:
| Entry point | Use when |
|---|---|
| guardrail.wrap_mcp_tools(servers) | Connecting to one or more MCP servers. Handles discovery, schema sanitization for OpenAI strict mode, and emits a WARNING on non-canonical tool names (defense against case-perturbation evasion). |
| guardrail.attach_to_tools(tools) | You already hold a list of FunctionTool objects (locally decorated with @function_tool, constructed from JSON schemas, or obtained from somewhere other than MCP). Attaches the guardrail's ToolInputGuardrail to each tool's tool_input_guardrails list. |
| guardrail.make_input_guardrail() | Full control. Returns the SDK's ToolInputGuardrail directly; wire it onto whichever subset of tools you choose. |
Local @function_tool example
```python
from agents import Agent, Runner, function_tool
from mcp_guardian import GuardianToolGuardrail, IntentPolicy


@function_tool
def read_file(path: str) -> str:
    """Read a file from disk."""
    with open(path) as f:
        return f.read()


@function_tool
def write_file(path: str, content: str) -> None:
    """Write content to a file."""
    with open(path, "w") as f:
        f.write(content)


policy = IntentPolicy(
    name="read-only",
    description="Read-only access",
    expected_workflow="Answer the user by reading files; never modify.",
    allowed_tools=["read_*"],
    forbidden_tools=["write_*"],
    fast_path_allow=True,
)

guardrail = GuardianToolGuardrail(policy=policy)
# Both tools stay visible to the agent; the guardrail blocks write_file at call time
tools = guardrail.attach_to_tools([read_file, write_file])
agent = Agent(name="Assistant", model="gpt-4o", tools=tools)
```
A complete runnable version is in example_local.py. It uses the same policy shape and the same defenses (fast-path block, transition graph, case_insensitive_patterns, audit log) as the MCP-based example.py.
What is MCP-specific
Three behaviours only apply on the wrap_mcp_tools path because they exist at the trust boundary with an untrusted server:
- Schema sanitization strips JSON Schema constructs that MCP servers emit but OpenAI's strict mode rejects (format, default, $ref, allOf, etc.). Locally decorated tools usually produce clean schemas from their Python signatures and don't need this.
- A discovery-time WARNING fires on tool names outside `^[a-z0-9_\-./]+$`. Local tools have names the developer chose, so a non-canonical name there is self-inflicted, not an evasion attempt.
- MCPUtil.to_function_tool conversion translates MCP tool descriptors into SDK FunctionTool instances.
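The first two behaviours are small, deterministic transforms. A minimal sketch, assuming sanitization simply deletes the listed keywords wherever they appear as schema keywords and that the name check uses the regular expression quoted above; sanitize_schema and check_tool_name are hypothetical helpers, not mcp-guardian's implementation. Note that dropping $ref or allOf can loosen a schema, so a production sanitizer might inline referenced definitions instead.

```python
import logging
import re
from typing import Any

logger = logging.getLogger("guardian_sketch")

# Keywords named above; the source says "etc.", so the real list is longer.
UNSUPPORTED_KEYWORDS = {"format", "default", "$ref", "allOf"}
CANONICAL_NAME = re.compile(r"^[a-z0-9_\-./]+$")


def sanitize_schema(schema: Any) -> Any:
    """Return a copy of a JSON Schema with unsupported keywords removed."""
    if isinstance(schema, list):
        return [sanitize_schema(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {}
    for key, value in schema.items():
        if key in UNSUPPORTED_KEYWORDS:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys here are property names, not keywords: keep every one of them.
            cleaned[key] = {name: sanitize_schema(sub) for name, sub in value.items()}
        else:
            cleaned[key] = sanitize_schema(value)
    return cleaned


def check_tool_name(name: str) -> bool:
    """Warn (but do not block) when a discovered tool name is non-canonical."""
    # fullmatch also rejects a trailing newline, which `$` alone would accept.
    if CANONICAL_NAME.fullmatch(name):
        return True
    logger.warning("non-canonical MCP tool name %r; glob patterns may miss it", name)
    return False


schema = {
    "type": "object",
    "properties": {
        "path": {"type": "string", "format": "uri", "default": "/tmp"},
        "format": {"type": "string"},  # a property that happens to be named "format"
    },
    "required": ["path"],
}
clean = sanitize_schema(schema)
print(clean["properties"]["path"])  # {'type': 'string'}
print(check_tool_name("read_file"), check_tool_name("Read_File"))  # True False
```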
Out of scope
OpenAI's hosted tools (WebSearchTool, FileSearchTool, ComputerTool, CodeInterpreterTool) are not FunctionTool instances and don't expose the tool_input_guardrails hook, so mcp-guardian can't gate them as currently designed. Use the SDK's own approval or filter mechanisms for those, and treat them as outside the trust boundary mcp-guardian enforces.
How It Works
Every tool call passes through a three-tier enforcement pipeline:
- Fast check (0ms): forbidden tools, whitelists, glob patterns, and the transition graph. It is deterministic and uses no LLM, so prompt injection cannot influence its decisions.
- LLM intent evaluation (1–5s): analyzes the call against the policy constraints and workflow context.
- Escalation: low-confidence decisions are flagged for human review.
The transition graph (allowed_transitions) is a state machine over tool calls. It resembles a LangGraph graph, but it is enforced externally on the agent rather than built into the agent's own execution graph: after tool A, only tools B and C are allowed, and everything else is blocked deterministically at 0ms.
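The fast-check tier can be pictured as a small stateful object. A minimal sketch under stated assumptions, not mcp-guardian's code: patterns use fnmatch glob semantics, a tool with no entry in the graph does not restrict its successor, only allowed calls advance the state, and the hypothetical case_insensitive flag stands in for the library's case_insensitive_patterns option, whose exact semantics aren't documented here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class FastCheck:
    """Hypothetical Tier 1 check: deterministic, no LLM involved."""

    allowed_tools: list[str] | None = None
    forbidden_tools: list[str] = field(default_factory=list)
    allowed_transitions: dict[str, list[str]] = field(default_factory=dict)
    case_insensitive: bool = True
    last_tool: str | None = None  # state: the previous allowed call

    def _norm(self, name: str) -> str:
        return name.lower() if self.case_insensitive else name

    def _match(self, name: str, patterns: list[str]) -> bool:
        return any(fnmatchcase(self._norm(name), self._norm(p)) for p in patterns)

    def check(self, tool: str) -> tuple[bool, str]:
        if self._match(tool, self.forbidden_tools):
            return False, "forbidden tool"
        if self.allowed_tools is not None and not self._match(tool, self.allowed_tools):
            return False, "not in allowed_tools"
        if self.last_tool is not None:
            graph = {self._norm(k): v for k, v in self.allowed_transitions.items()}
            successors = graph.get(self._norm(self.last_tool))
            if successors is not None and not self._match(tool, successors):
                return False, f"transition {self.last_tool} -> {tool} not allowed"
        self.last_tool = tool  # only allowed calls advance the state machine
        return True, "ok"


check = FastCheck(
    allowed_tools=["read_*", "list_*"],
    forbidden_tools=["write_*", "execute_*", "delete_*"],
    allowed_transitions={
        "list_directory": ["read_file", "list_directory"],
        "read_file": ["read_file", "list_directory"],
    },
)
for tool in ["list_directory", "Write_File", "read_file", "read_secret"]:
    print(tool, check.check(tool))
```

Write_File is caught only because patterns are compared case-insensitively; with case_insensitive=False it would slip past write_*. read_secret matches read_* and is still blocked, because read_file's successors don't include it.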
This makes MCP Guardian a reasoning guardrail, not just a tool filter. Allow/block lists are easy to build; the LLM intent evaluation layer goes further and supervises the agent's reasoning, catching an allowed tool called with suspicious arguments or a permitted call that doesn't fit the declared intent. In effect, a second LLM evaluates the first LLM's decisions.
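Because the guardian is itself an LLM, its output has to be parsed defensively. A minimal sketch, assuming the guardian is asked to reply with a JSON object containing verdict, confidence, and reason; the prompt and response format mcp-guardian actually uses are not documented here, and parse_guardian_reply is hypothetical. Any malformed, unknown, or out-of-range reply is treated as a block, so a truncated or injected answer can never turn into an allow.

```python
import json
from dataclasses import dataclass


@dataclass
class GuardianVerdict:
    verdict: str        # "allow" or "block"
    confidence: float   # 0.0 to 1.0
    reason: str


def parse_guardian_reply(raw: str) -> GuardianVerdict:
    """Parse the guardian LLM's JSON reply, failing closed on anything unexpected."""
    try:
        data = json.loads(raw)
        verdict = str(data["verdict"]).lower()
        confidence = float(data["confidence"])
        reason = str(data.get("reason", ""))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError, AttributeError) as exc:
        return GuardianVerdict("block", 0.0, f"unparseable guardian reply: {exc!r}")
    # NaN fails this range check too, so it also falls through to a block.
    if verdict not in {"allow", "block"} or not 0.0 <= confidence <= 1.0:
        return GuardianVerdict("block", 0.0, "guardian reply out of range")
    return GuardianVerdict(verdict, confidence, reason)


print(parse_guardian_reply('{"verdict": "allow", "confidence": 0.92, "reason": "fits workflow"}'))
print(parse_guardian_reply("Sure, this call looks fine to me!").verdict)  # block
```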
The guardian LLM defaults to gpt-4o-mini (fast, cheap) but can point at any OpenAI-compatible endpoint, such as Ollama, vLLM, Azure OpenAI, or a fine-tuned model:
```python
# Use a local Ollama model for the guardian
guardrail = GuardianToolGuardrail(
    policy=policy,
    guardian_model="llama3.2",
    guardian_base_url="http://localhost:11434/v1",
)
```
Or in guardian.yaml:
```yaml
guardian_model: llama3.2
guardian_base_url: http://localhost:11434/v1
```
Every evaluation is logged with verdict, confidence, timing, and reasoning.
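The Path 1 example reads tool_name, verdict, confidence, method, elapsed_ms, and reason from each audit entry. If you want to ship those entries to a log pipeline, one option is JSON Lines. A minimal sketch using a hypothetical stand-in dataclass rather than mcp-guardian's own entry type (whose serialization helpers, if any, aren't documented here); the sample values are illustrative only.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable, TextIO


@dataclass
class AuditEntry:
    """Hypothetical stand-in with the attributes the Path 1 example reads."""

    tool_name: str
    verdict: str
    confidence: float
    method: str
    elapsed_ms: float
    reason: str


def write_jsonl(entries: Iterable[AuditEntry], out: TextIO) -> int:
    """Write one JSON object per line (JSON Lines); return the number written."""
    count = 0
    for entry in entries:
        out.write(json.dumps(asdict(entry), ensure_ascii=False) + "\n")
        count += 1
    return count


buf = io.StringIO()
written = write_jsonl(
    [
        AuditEntry("read_file", "allow", 0.95, "llm", 1500.0, "illustrative: fits workflow"),
        AuditEntry("send_data", "block", 1.0, "fast", 0.0, "illustrative: bad transition"),
    ],
    buf,
)
print(buf.getvalue(), end="")
```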
Policy Fields
| Field | Purpose |
|---|---|
| allowed_tools | Whitelist with glob patterns (`read_*`, `list_*`) |
| forbidden_tools | Blacklist; always blocked (`write_*`, `execute_*`) |
| allowed_transitions | State machine: tool A → [tool B, C] |
| constraints | Free-text rules for the LLM evaluator |
| expected_workflow | What the agent should be doing (context for the LLM) |
| escalation_threshold | Confidence below this value escalates to a human |
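How the fields combine into a final decision can be sketched as a pure function. This is a hypothetical illustration, not mcp-guardian's logic: it assumes a fast-check block is final, that a low-confidence LLM result escalates whether it leaned toward allow or block, and it uses 0.7 only because that is the value in the policy.yaml example, not because it is the library default.

```python
from enum import Enum


class Decision(str, Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"


def decide(fast_ok: bool, llm_verdict: str, llm_confidence: float,
           escalation_threshold: float = 0.7) -> Decision:
    """Hypothetical combination of the three tiers for a single tool call."""
    if not fast_ok:
        return Decision.BLOCK  # Tier 1 is final; the LLM is never consulted
    if llm_confidence < escalation_threshold:
        return Decision.ESCALATE  # Tier 3: too uncertain to act on, ask a human
    return Decision.ALLOW if llm_verdict == "allow" else Decision.BLOCK


print(decide(True, "allow", 0.55).value)  # escalate
```

Escalating uncertain blocks as well as uncertain allows is a design choice: it trades more human reviews for fewer silently wrong outcomes in either direction.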
Demo: Exfiltration Prevention
A working demo blocks a data exfiltration attack across two MCP servers. The agent reads a secret (allowed), then an adversarial prompt tries to send it to an attacker URL. That call is blocked at Tier 1 by the transition graph (0ms) and, independently, at Tier 2 by the LLM constraints.
See demos/exfiltration/ for details.
Documentation
The full docs are built with MkDocs Material. Run them locally with Docker:
```bash
docker build -f Dockerfile.docs -t mcp-guardian-docs .
docker run -p 8000:8000 -v $(pwd)/docs:/docs/docs mcp-guardian-docs
```
Then open http://localhost:8000. The -v mount gives you live reload as you edit.
Or without Docker:
```bash
pip install mkdocs-material
mkdocs serve
```
Roadmap
The core engine (policies, fast check, transition graphs, LLM evaluation, audit log) is SDK-agnostic. Currently we ship an adapter for the OpenAI Agents SDK. Future adapters under consideration:
| Runtime | Hook point | Status |
|---|---|---|
| OpenAI Agents SDK | ToolInputGuardrail | Shipped |
| Anthropic Claude | PreToolUse hook | Planned |
| Microsoft Agent Framework | FunctionInvocationFilter | Planned |
Same YAML policies, same pip install, any runtime. Feedback welcome — open an issue if your framework isn't listed.
Built On
- OpenAI Agents SDK: ToolInputGuardrail, AgentHooksBase
- Model Context Protocol (MCP): tool server standard
License
Apache 2.0
Download files
Source distribution: mcp_guardian_ai-0.2.0.tar.gz
- Size: 62.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | 73cf3f618c61db9d54d1ae23e19178e170f417ed5978e72536849d8eb3f266c7 |
| MD5 | 25984ca69e8c6809ce959b928a0e21f7 |
| BLAKE2b-256 | b76cb26e2e9578cc1862282de6f85dde41435ca8cefb291c147cb3967ca53434 |

Built distribution: mcp_guardian_ai-0.2.0-py3-none-any.whl
- Size: 44.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4292a0a674d951eb9408cd388009f1894322817fb3367611d0b8a1c6a162c5d4 |
| MD5 | 495adcd6708bf8e9c219c9ba62f979d2 |
| BLAKE2b-256 | 6741b318dbda7e9cb9fdaa9dc6861ef6dc5d91887b174ea09606ef1219d7b22a |