# IntentGuard

MCP tool-call guardrails with static and semantic policy checks.

IntentGuard is a Python guardrail layer for MCP tool calls. It runs as a proxy between an agent client and an MCP server, enforcing both static policy checks and optional semantic intent checks before a tool call is forwarded.
## What is implemented (MVP)

The current implementation covers all four roadmap phases from `agent.md`:

- **CLI Interceptor (Phase 1)**: `intent_guard/proxy.py` + `intent_guard/sdk/mcp_proxy.py` implement a stdio proxy that intercepts `tools/call` JSON-RPC requests and can block/allow calls.
- **Static Engine (Phase 2)**: `intent_guard/sdk/engine.py` loads the YAML policy and enforces `forbidden_tools`, `protected_paths` (glob/fnmatch style), `max_tokens_per_call`, and `custom_policies` (tool-specific required/forbidden arguments).
- **Semantic Guardrail Providers (Phase 3)**: `intent_guard/sdk/providers.py` supports `OllamaProvider` (POST `/api/generate`) and `LiteLLMProvider` (`litellm.completion`), using `LLM_MODEL` and `OPENAI_API_KEY`/`ANTHROPIC_API_KEY` from the environment. Both providers include retries (exponential backoff + jitter) and a circuit breaker.
- **Pause & Resume Feedback Loop (Phase 4)**: `terminal_approval_prompt` provides interactive approval for flagged calls (`Allow? [y/N]`).
## Repository layout

```
intent_guard/
├── __init__.py
├── proxy.py
└── sdk/
    ├── __init__.py
    ├── engine.py
    ├── mcp_proxy.py
    └── providers.py
schema/
└── policy.yaml
tests/
├── conftest.py
└── test_integration_phases.py
```
## Installation

```shell
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
```
## Run tests

```shell
.venv/bin/pytest -q
```

Run the live Ollama semantic tests only (requires a local Ollama with llama3.1:8b available):

```shell
.venv/bin/pytest -q -m runOllamaProvider
```

If local model responses are slow, increase the timeout (seconds):

```shell
OLLAMA_TIMEOUT_SECONDS=120 .venv/bin/pytest -q -m runOllamaProvider
```

The live semantic suite defaults to `OLLAMA_RAW=false` and bounded generation tuned for llama3.1:8b. You can tune:

```shell
OLLAMA_TIMEOUT_SECONDS=60 OLLAMA_NUM_PREDICT=256 OLLAMA_RAW=false \
  .venv/bin/pytest -q -m runOllamaProvider
```

Integration tests cover all phases:

- Phase 1: interception and logging behavior
- Phase 2: static policy blocking
- Phase 3: semantic provider flow (mocked Ollama HTTP call)
- Phase 4: approval allow/deny behavior
## Policy file

Use `schema/policy.yaml` as a starting point:

```yaml
static_rules:
  forbidden_tools: ["delete_database", "purge_all"]
  protected_paths: ["/etc/*", ".env", "src/auth/*"]
  max_tokens_per_call: 4000
  rate_limits:
    enabled: 1   # required to turn rate limiting on; 0/false bypasses checks
    default:
      max_calls: 60
      window_seconds: 60
    by_tool:
      write_file:
        max_calls: 10
        window_seconds: 60
  custom_policies:
    - tool_name: write_file
      args:
        all_present: ["path", "content"]
        should_not_present: ["sudo"]
semantic_rules:
  provider: ollama   # or litellm
  mode: enforce      # off | enforce | advisory
  prompt_version: "v2"
  guardrail_model: llama3.1:8b
  critical_intent_threshold: 0.85
  retry_attempts: 2
  retry_base_delay_seconds: 0.25
  retry_max_delay_seconds: 2.0
  retry_jitter_ratio: 0.2
  circuit_breaker_failures: 3
  circuit_breaker_reset_seconds: 30
  provider_fail_mode:
    default: advisory   # fail-open
    by_tool:
      delete_database: enforce   # fail-closed
  constraints:
    - intent: modify_source_code
      allowed_scope: Actions must only affect UI components or styles.
      forbidden_scope: Should not modify database schemas or auth logic.
```
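As an illustration of the `protected_paths` semantics, here is a minimal glob check built on Python's stdlib `fnmatch` (the `path_is_protected` helper is hypothetical, not the engine's actual code):

```python
from fnmatch import fnmatch

# Patterns taken from the policy example above.
PROTECTED = ["/etc/*", ".env", "src/auth/*"]

def path_is_protected(path: str) -> bool:
    # A path is treated as protected if it matches any configured glob pattern.
    return any(fnmatch(path, pattern) for pattern in PROTECTED)

print(path_is_protected("src/auth/config.py"))  # True
print(path_is_protected("src/ui/button.py"))    # False
```

Note that `fnmatch`'s `*` also matches path separators, so `src/auth/*` covers nested files under `src/auth/`.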
## Rubric scoring (v2)

Set `prompt_version: "v2"` to switch from opaque LLM-assigned scores to multi-signal rubric scoring. Instead of asking the LLM for a single confidence number, the engine asks concrete yes/no questions across multiple dimensions and computes the score deterministically from the answers.

```yaml
semantic_rules:
  prompt_version: "v2"
  critical_intent_threshold: 0.85
  scoring:
    dimensions:
      tool_task_alignment:
        weight: 0.25
      argument_scope_compliance:
        weight: 0.30
      no_forbidden_scope_violation:
        weight: 0.30
      no_side_effect_risk:
        weight: 0.15
```
Default dimensions (used when `scoring` is omitted):

| Dimension | Question | Default weight |
|---|---|---|
| `tool_task_alignment` | Is this tool appropriate for the stated task? | 0.25 |
| `argument_scope_compliance` | Are arguments within the allowed scope? | 0.30 |
| `no_forbidden_scope_violation` | Do arguments avoid the forbidden scope? | 0.30 |
| `no_side_effect_risk` | Is the call free of destructive/exfil risk? | 0.15 |
Score formula: Σ(weight × pass) / Σ(weight). With all four dimensions passing, the score is 1.0; any single failure drops the score to 0.85 or lower (exactly 0.85 when the lightest-weight dimension fails, 0.70–0.75 otherwise). Decisions include `dimension_scores` with per-dimension `passed` and `evidence` for full auditability.
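The score formula can be checked in a few lines (illustrative sketch using the default weights; `rubric_score` is a hypothetical helper, not the engine's API):

```python
# Default rubric weights from the table above.
WEIGHTS = {
    "tool_task_alignment": 0.25,
    "argument_scope_compliance": 0.30,
    "no_forbidden_scope_violation": 0.30,
    "no_side_effect_risk": 0.15,
}

def rubric_score(passed: dict) -> float:
    # Score = sum(weight x pass) / sum(weight), with pass being 1 or 0.
    total = sum(WEIGHTS.values())
    return sum(w for d, w in WEIGHTS.items() if passed[d]) / total

all_pass = {d: True for d in WEIGHTS}
print(round(rubric_score(all_pass), 2))  # 1.0

one_fail = dict(all_pass, argument_scope_compliance=False)
print(round(rubric_score(one_fail), 2))  # 0.7
```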
## CLI usage

```shell
INTENT_GUARD_TASK="Only update frontend styles" \
python -m intent_guard.proxy \
  --policy schema/policy.yaml \
  --target "npx @modelcontextprotocol/server-filesystem /path/to/repo" \
  --model llama3.1:8b \
  --approval-webhook "https://approval.internal/intent-guard" \
  --approval-timeout 10 \
  --approval-default-action deny
```
### Flags

- `--policy`: YAML policy path
- `--target`: target MCP server command
- `--model`: optional Ollama model name for semantic checks
- `--task`: optional task context (or set `INTENT_GUARD_TASK`)
- `--ask-approval`: prompt the user before allowing flagged calls
- `--approval-webhook`: call this webhook for non-interactive approval decisions
- `--approval-timeout`: timeout (seconds) for webhook approvals
- `--approval-default-action`: `allow` or `deny` when webhook approval times out or fails
## Native hook integration

IntentGuard can run as the policy engine behind native hooks in Claude Code, Copilot, and Cursor.
### Evaluate command

Use the unified command:

```shell
intent-guard evaluate --policy schema/policy.yaml
```

Input:

- Reads a hook payload JSON object from stdin
- Supports generic keys like `tool_name`, `arguments`, `task_context`
- Also supports nested payloads (`params.name`, `params.arguments`) and common aliases (`tool_input`, `args`, `prompt`)

Output:

- Prints a `GuardDecision` JSON object to stdout
- Exit code `0` for allow, `1` for block, `2` for invalid input
### Hook config templates

Template files are shipped under `hooks/`:

- `hooks/claude-code/settings.json`
- `hooks/copilot/hooks.json`
- `hooks/cursor/hooks.json`

Each template invokes:

```shell
cat | intent-guard evaluate --policy schema/policy.yaml
```

This lets platform-native hooks call IntentGuard directly instead of wrapping only MCP servers.
## Encoded payload detection

Static checks can decode and normalize argument payloads before matching:

- URL decoding
- Unicode normalization (NFKC)
- Base64 decoding (when valid)

Enable or disable via:

```yaml
static_rules:
  decode_arguments: true
```

When enabled, injection, sensitive-data, and protected-path checks run against the decoded variants to catch obfuscated bypasses.
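A minimal sketch of what such decoded variants can look like, using only the stdlib (the `decoded_variants` helper is illustrative; the engine's exact normalization order may differ):

```python
import base64
import binascii
import unicodedata
from urllib.parse import unquote

def decoded_variants(value: str) -> list[str]:
    # Candidate normalizations a static check might match against:
    # the raw value, its URL-decoded form, and its NFKC-normalized form.
    variants = [value, unquote(value), unicodedata.normalize("NFKC", value)]
    try:
        # Base64 is only added when the value decodes cleanly to UTF-8.
        variants.append(base64.b64decode(value, validate=True).decode("utf-8"))
    except (binascii.Error, ValueError, UnicodeDecodeError):
        pass  # not valid base64 / not valid UTF-8
    return variants

print(decoded_variants("%2Fetc%2Fpasswd"))  # includes "/etc/passwd"
print(decoded_variants("LmVudg=="))         # includes ".env"
```

A URL-encoded `/etc/passwd` or a base64-encoded `.env` would otherwise slip past a literal `protected_paths` match.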
## Response-side inspection

IntentGuard can inspect MCP server responses before forwarding them to the client.

Configure `response_rules` in the policy:

```yaml
response_rules:
  action: block   # block | warn | redact
  detect_base64: true
  patterns:
    - name: "GitHub Token"
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
```

Behavior:

- `block`: return a JSON-RPC error and suppress the original response
- `warn`: forward the response and log a warning decision
- `redact`: redact matched text and forward the sanitized response
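For instance, `redact` mode against the GitHub-token pattern from the policy above can be sketched as follows (illustrative helper, not the shipped implementation):

```python
import re

# Pattern from the response_rules example above.
GH_TOKEN = re.compile(r"gh[ps]_[A-Za-z0-9_]{36,}")

def redact(text: str) -> str:
    # `redact` mode: replace matched secrets before forwarding the response.
    return GH_TOKEN.sub("[REDACTED]", text)

sample = "token: ghp_" + "a" * 36
print(redact(sample))  # token: [REDACTED]
```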
## Tool description change detection (rug-pull protection)

IntentGuard can snapshot MCP `tools/list` metadata and detect changes over time.

Configure:

```yaml
tool_change_rules:
  enabled: true
  action: warn   # warn | block
```

Behavior:

- On the first `tools/list`, stores a snapshot in `.intent-guard/tool-snapshots/<server-hash>.json`
- On subsequent `tools/list` calls, compares `name`, `description`, and `inputSchema`
- `warn`: log a warning and continue
- `block`: block the response when drift is detected
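One way to detect such drift is to fingerprint the compared fields; a sketch (the `tool_fingerprint` helper is hypothetical, and the on-disk snapshot format may differ):

```python
import hashlib
import json

def tool_fingerprint(tool: dict) -> str:
    # Hash only the fields compared for drift: name, description, inputSchema.
    subset = {k: tool.get(k) for k in ("name", "description", "inputSchema")}
    return hashlib.sha256(json.dumps(subset, sort_keys=True).encode()).hexdigest()

v1 = {"name": "write_file", "description": "Write a file", "inputSchema": {}}
v2 = dict(v1, description="Write a file (now also emails it)")
print(tool_fingerprint(v1) == tool_fingerprint(v2))  # False
```

A changed description, the classic "rug-pull" vector, produces a different fingerprint even when the tool name is unchanged.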
## Semantic mode and provider failure behavior

`semantic_rules.mode` controls normal semantic enforcement:

- `off`: semantic check disabled
- `enforce`: semantic failures block tool calls
- `advisory`: semantic failures are logged as warnings but calls are allowed

`semantic_rules.provider_fail_mode` controls behavior when the semantic provider is unavailable:

- Supports `default` and a per-tool `by_tool` override
- Values use the same mode set: `off` | `enforce` | `advisory`

Behavior matrix for tool criticality tiers (example mapping):

| Tool tier | provider_fail_mode | Outcome on provider outage |
|---|---|---|
| Critical tools | `enforce` | Fail-closed (block + approval required) |
| Standard tools | `advisory` | Fail-open with warning decision |
| Low-risk tools | `off` | Fail-open without warning severity |

Define tiers by assigning tools in `provider_fail_mode.by_tool`.

`semantic_rules.prompt_version` is copied into every semantic decision and log entry as `semantic_prompt_version`, so prompt changes are auditable.
## Semantic decision caching

To reduce repeated provider calls for identical semantic evaluations:

```yaml
semantic_rules:
  decision_cache:
    enabled: true
    max_size: 256
    ttl_seconds: 300
```

The cache key uses `(tool_name, arguments, task_context)`. Static checks always run; only semantic verdicts are cached.
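A minimal sketch of such a cache, assuming a TTL check on read and simple oldest-entry eviction (illustrative only; the shipped cache may use a different eviction strategy):

```python
import json
import time

class DecisionCache:
    """Sketch of a TTL-bounded cache keyed on (tool_name, arguments, task_context)."""

    def __init__(self, max_size: int = 256, ttl_seconds: float = 300.0):
        self.max_size, self.ttl = max_size, ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    @staticmethod
    def key(tool_name: str, arguments: dict, task_context: str) -> str:
        # Sort keys so argument ordering does not change the cache key.
        return json.dumps([tool_name, arguments, task_context], sort_keys=True)

    def get(self, k: str):
        entry = self._store.get(k)
        if entry is None or time.monotonic() - entry[0] > self.ttl:
            return None  # miss or expired
        return entry[1]

    def put(self, k: str, verdict) -> None:
        if len(self._store) >= self.max_size:
            self._store.pop(next(iter(self._store)))  # evict oldest insert
        self._store[k] = (time.monotonic(), verdict)
```

Sorting the JSON keys makes `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hit the same entry.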
## LiteLLM provider

To use the API provider, set in `.env` (or the process environment):

```shell
LLM_MODEL=claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=...
# or OPENAI_API_KEY=...
```

Then set `semantic_rules.provider: litellm` (or just set `LLM_MODEL` and omit the explicit provider).
## CI break-glass options

- `INTENT_GUARD_BREAK_GLASS_TOKEN`: if set, flagged calls are auto-approved with override metadata.
- `INTENT_GUARD_BREAK_GLASS_SIGNED_TOKEN` + `INTENT_GUARD_BREAK_GLASS_SIGNING_KEY`: optional HMAC-signed break-glass token for CI. The token format is `<base64url(json payload)>.<base64url(signature)>`, where the signature is `HMAC-SHA256(payload_part, signing_key)` and the payload contains a future `exp` (Unix timestamp), for example `{"exp": 4102444800}`.
- `INTENT_GUARD_APPROVAL_AUTH_TOKEN`: bearer token added to webhook approval requests.
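The signed-token format can be generated and verified with the stdlib; a sketch under the format described above (the helper names are hypothetical):

```python
import base64
import hashlib
import hmac
import json
import time

def make_token(payload: dict, signing_key: bytes) -> str:
    # <base64url(json payload)>.<base64url(HMAC-SHA256(payload_part, key))>
    part = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(signing_key, part.encode(), hashlib.sha256).digest()
    return f"{part}.{base64.urlsafe_b64encode(sig).decode()}"

def verify_token(token: str, signing_key: bytes) -> bool:
    part, _, sig_b64 = token.partition(".")
    expected = hmac.new(signing_key, part.encode(), hashlib.sha256).digest()
    # Constant-time comparison to avoid timing side channels.
    if not hmac.compare_digest(base64.urlsafe_b64decode(sig_b64), expected):
        return False
    payload = json.loads(base64.urlsafe_b64decode(part))
    return payload.get("exp", 0) > time.time()  # must not be expired

token = make_token({"exp": 4102444800}, b"secret")
print(verify_token(token, b"secret"))     # True
print(verify_token(token, b"wrong-key"))  # False
```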
## SDK usage (Python)

```python
from intent_guard import IntentGuardSDK

guard = IntentGuardSDK(
    policy_path="schema/policy.yaml",
    local_model="llama3.1:8b",
    task_context="Only modify UI components",
)

decision = guard.evaluate("write_file", {"path": "src/auth/config.py"})
print(decision.allowed, decision.reason)
```
## GuardDecision contract (stable)

`GuardDecision` now includes machine-readable metadata for enforcement and analytics:

- `decision_id` (UUID)
- `code`
- `severity`
- `policy_name`
- `policy_version`
- `rule_id`
- `timestamp` (UTC ISO-8601)
- `override` (who/why/ttl, when manually approved)
- `semantic_prompt_version` (when semantic checks are applied)

Backward compatibility:

- Existing fields `allowed`, `reason`, `requires_approval`, and `semantic_score` are unchanged.
- New fields are always present with safe defaults, so existing consumers can ignore them.
## Semantic eval harness

IntentGuard ships a lightweight semantic eval harness, used in tests to measure model behavior on known-safe and known-unsafe tool calls.

- Dataset fixtures: `tests/fixtures/semantic_eval_dataset.json`
- Replay verdicts: `tests/fixtures/semantic_eval_verdicts.json`
- Metrics computed: precision, recall, accuracy

This enables reproducible regression checks for semantic policy quality.
Versioning/migration strategy:

- Keep parsing logic tolerant of unknown fields.
- Use `policy_version` + `code` + `rule_id` for downstream contract evolution and dashboards.
- Prefer adding new fields over changing/removing existing field semantics.
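A tolerant consumer-side pattern for the first point (illustrative; `GuardDecisionView` is not part of the SDK):

```python
from dataclasses import dataclass, field

@dataclass
class GuardDecisionView:
    # Minimal tolerant view: read the fields you need, keep the rest in `extra`
    # so new server-side fields never break parsing.
    allowed: bool
    reason: str = ""
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, data: dict) -> "GuardDecisionView":
        known = {k: data[k] for k in ("allowed", "reason") if k in data}
        extra = {k: v for k, v in data.items() if k not in ("allowed", "reason")}
        return cls(**known, extra=extra)

d = GuardDecisionView.from_json(
    {"allowed": False, "reason": "blocked", "code": "POLICY_001"}
)
print(d.allowed, d.extra["code"])  # False POLICY_001
```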
## Usage examples with popular tools

### 1) Claude Code (MCP server proxy)

Configure the MCP server command to run through IntentGuard:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "python",
      "args": [
        "-m",
        "intent_guard.proxy",
        "--policy",
        "schema/policy.yaml",
        "--target",
        "npx @modelcontextprotocol/server-filesystem /path/to/repo",
        "--ask-approval"
      ],
      "env": {
        "INTENT_GUARD_TASK": "Refactor UI only; do not touch auth or database"
      }
    }
  }
}
```
### 2) Codex (MCP command wrapping)

For Codex setups that support MCP server command configuration, point the server command to IntentGuard first, with your real MCP server as `--target`:

```shell
python -m intent_guard.proxy \
  --policy schema/policy.yaml \
  --target "npx @modelcontextprotocol/server-filesystem /path/to/repo" \
  --ask-approval
```

Use that command as the configured MCP server entry in your Codex environment.
### 3) LangSmith / LangChain workflows

Use IntentGuard before each tool execution and keep normal LangSmith tracing:

```python
from langsmith import traceable
from intent_guard import IntentGuardSDK

guard = IntentGuardSDK(
    policy_path="schema/policy.yaml",
    task_context="Only update docs and UI text",
)

@traceable(name="guarded_tool_call")
def guarded_call(tool_name: str, args: dict, tool_callable):
    decision = guard.evaluate(tool_name, args)
    if not decision.allowed:
        raise PermissionError(f"IntentGuard blocked: {decision.reason}")
    return tool_callable(**args)
```

This keeps execution decisions visible in traces while enforcing IntentGuard policy at runtime.
## Build and publish (pip / Artifactory)

Build source and wheel distributions:

```shell
python3 -m venv .venv
.venv/bin/pip install -U pip build twine
.venv/bin/python -m build
```

Publish to your Artifactory PyPI repository:

```shell
export TWINE_USERNAME="<artifactory-username>"
export TWINE_PASSWORD="<artifactory-password-or-token>"
.venv/bin/python -m twine upload \
  --repository-url "https://<artifactory-host>/artifactory/api/pypi/<pypi-repo>/local" \
  dist/*
```
## Integration testing and Docker

The current integration tests are in-process (`tests/test_integration_phases.py`) and do not require a database or cache service. If a future change adds external DB/cache dependencies, run those services in Docker for tests (same pattern as temp-noob/rule-engine) so test setup remains reproducible.