MCP tool-call guardrails with static and semantic policy checks

IntentGuard

IntentGuard is a Python guardrail layer for MCP tool calls. It runs as a proxy between an agent client and an MCP server, enforcing both static policy checks and optional semantic intent checks before a tool call is forwarded.

What is implemented (MVP)

The current implementation covers all 4 roadmap phases from agent.md:

  1. CLI Interceptor (Phase 1)
    intent_guard/proxy.py + intent_guard/sdk/mcp_proxy.py implement a stdio proxy that intercepts tools/call JSON-RPC requests and can block/allow calls.
  2. Static Engine (Phase 2)
    intent_guard/sdk/engine.py loads YAML policy and enforces:
    • forbidden_tools
    • protected_paths (glob/fnmatch style)
    • max_tokens_per_call
    • custom_policies (tool-specific argument requirements/forbidden arguments)
  3. Semantic Guardrail Providers (Phase 3)
    intent_guard/sdk/providers.py supports:
    • OllamaProvider (POST /api/generate)
    • LiteLLMProvider (litellm.completion) using LLM_MODEL and OPENAI_API_KEY / ANTHROPIC_API_KEY from env
    Both providers include retries (exponential backoff + jitter) and a circuit breaker.
  4. Pause & Resume Feedback Loop (Phase 4)
    terminal_approval_prompt provides interactive approval for flagged calls (Allow? [y/N]).
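
As an illustration of the retry and circuit-breaker behavior mentioned under Phase 3, here is a minimal sketch. It is not the code in intent_guard/sdk/providers.py: the names backoff_delay and CircuitBreaker are hypothetical, the doubling schedule is an assumption, and the default values simply mirror the retry_* and circuit_breaker_* keys in the sample policy.

```python
import random
import time


def backoff_delay(attempt, base=0.25, max_delay=2.0, jitter_ratio=0.2, rng=random.random):
    """Delay before retry number `attempt` (0-based): capped exponential, plus or minus jitter."""
    delay = min(max_delay, base * (2 ** attempt))
    # Symmetric jitter in [-jitter_ratio * delay, +jitter_ratio * delay).
    jitter = delay * jitter_ratio * (2 * rng() - 1)
    return max(0.0, delay + jitter)


class CircuitBreaker:
    """Opens after `failures` consecutive errors; closes again after `reset_seconds`."""

    def __init__(self, failures=3, reset_seconds=30.0, clock=time.monotonic):
        self.failures = failures
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.consecutive = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        """Return True if a provider call may be attempted right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_seconds:
            # Reset window elapsed: close the breaker and start counting again.
            self.opened_at = None
            self.consecutive = 0
            return True
        return False

    def record(self, ok):
        """Record the outcome of one provider call."""
        if ok:
            self.consecutive = 0
            return
        self.consecutive += 1
        if self.consecutive >= self.failures:
            self.opened_at = self.clock()
```

A caller would check allow() before each provider request, sleep for backoff_delay(attempt) between retries, and pass the result to record().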

Repository layout

intent_guard/
├── __init__.py
├── proxy.py
└── sdk/
    ├── __init__.py
    ├── engine.py
    ├── mcp_proxy.py
    └── providers.py
schema/
└── policy.yaml
tests/
├── conftest.py
└── test_integration_phases.py

Installation

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

Run tests

.venv/bin/pytest -q

Run live Ollama semantic tests only (requires local Ollama + llama3.1:8b available):

.venv/bin/pytest -q -m runOllamaProvider

If local model responses are slow, increase timeout (seconds):

OLLAMA_TIMEOUT_SECONDS=120 .venv/bin/pytest -q -m runOllamaProvider

The live semantic suite defaults to OLLAMA_RAW=false and bounded generation tuned for llama3.1:8b. To override these defaults, set the environment variables explicitly:

OLLAMA_TIMEOUT_SECONDS=60 OLLAMA_NUM_PREDICT=256 OLLAMA_RAW=false \
  .venv/bin/pytest -q -m runOllamaProvider

Integration tests cover all phases:

  • phase 1: interception and logging behavior
  • phase 2: static policy blocking
  • phase 3: semantic provider flow (mocked Ollama HTTP call)
  • phase 4: approval allow/deny behavior

Policy file

Use schema/policy.yaml as a starting point:

static_rules:
  forbidden_tools: ["delete_database", "purge_all"]
  protected_paths: ["/etc/*", ".env", "src/auth/*"]
  max_tokens_per_call: 4000
  rate_limits:
    enabled: 1 # required to turn rate limiting on; 0/false bypasses checks
    default:
      max_calls: 60
      window_seconds: 60
    by_tool:
      write_file:
        max_calls: 10
        window_seconds: 60

custom_policies:
  - tool_name: write_file
    args:
      all_present: ["path", "content"]
      should_not_present: ["sudo"]

semantic_rules:
  provider: ollama # or litellm
  mode: enforce # off | enforce | advisory
  prompt_version: "v2"
  guardrail_model: llama3.1:8b
  critical_intent_threshold: 0.85
  retry_attempts: 2
  retry_base_delay_seconds: 0.25
  retry_max_delay_seconds: 2.0
  retry_jitter_ratio: 0.2
  circuit_breaker_failures: 3
  circuit_breaker_reset_seconds: 30
  provider_fail_mode:
    default: advisory # fail-open
    by_tool:
      delete_database: enforce # fail-closed
  constraints:
    - intent: modify_source_code
      allowed_scope: Actions must only affect UI components or styles.
      forbidden_scope: Should not modify database schemas or auth logic.
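
To make the static part of this policy concrete, the sketch below applies forbidden_tools, protected_paths, and custom_policies to a single call. It is an illustration only: static_check is a hypothetical helper rather than the engine's API, the policy is written as a Python dict instead of being loaded from YAML, and treating every string argument as a candidate path is an assumption.

```python
from fnmatch import fnmatch

# The relevant part of the sample policy, written as a dict for self-containment.
POLICY = {
    "static_rules": {
        "forbidden_tools": ["delete_database", "purge_all"],
        "protected_paths": ["/etc/*", ".env", "src/auth/*"],
    },
    "custom_policies": [
        {
            "tool_name": "write_file",
            "args": {"all_present": ["path", "content"], "should_not_present": ["sudo"]},
        },
    ],
}


def static_check(policy, tool_name, arguments):
    """Return (allowed, reason) for one tool call. Hypothetical helper."""
    rules = policy.get("static_rules", {})
    if tool_name in rules.get("forbidden_tools", []):
        return False, f"tool '{tool_name}' is forbidden"

    # Assumption: any string argument value may be a path worth checking.
    for value in arguments.values():
        if not isinstance(value, str):
            continue
        for pattern in rules.get("protected_paths", []):
            if fnmatch(value, pattern):
                return False, f"'{value}' matches protected path '{pattern}'"

    # Tool-specific argument requirements.
    for custom in policy.get("custom_policies", []):
        if custom.get("tool_name") != tool_name:
            continue
        args = custom.get("args", {})
        missing = [k for k in args.get("all_present", []) if k not in arguments]
        if missing:
            return False, f"missing required arguments: {missing}"
        banned = [k for k in args.get("should_not_present", []) if k in arguments]
        if banned:
            return False, f"forbidden arguments present: {banned}"
    return True, "ok"


print(static_check(POLICY, "write_file", {"path": "src/auth/config.py", "content": "x"}))
```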

Rubric scoring (v2)

Set prompt_version: "v2" to switch from opaque LLM-assigned scores to multi-signal rubric scoring. Instead of asking the LLM for a single confidence number, the engine asks concrete yes/no questions across multiple dimensions and computes the score deterministically from the answers.

semantic_rules:
  prompt_version: "v2"
  critical_intent_threshold: 0.85
  scoring:
    dimensions:
      tool_task_alignment:
        weight: 0.25
      argument_scope_compliance:
        weight: 0.30
      no_forbidden_scope_violation:
        weight: 0.30
      no_side_effect_risk:
        weight: 0.15

Default dimensions (used when scoring is omitted):

Dimension | Question | Default weight
--- | --- | ---
tool_task_alignment | Is this tool appropriate for the stated task? | 0.25
argument_scope_compliance | Are arguments within the allowed scope? | 0.30
no_forbidden_scope_violation | Do arguments avoid the forbidden scope? | 0.30
no_side_effect_risk | Is the call free of destructive/exfil risk? | 0.15

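Score formula: Σ(weight × pass) / Σ(weight), where pass is 1 if the dimension passed and 0 otherwise. When all four dimensions pass, the score is 1.0. With the default weights, a single failure gives at most 0.85: failing only no_side_effect_risk yields exactly 0.85, and any other single failure yields 0.75 or less. Whether that boundary case is flagged depends on how the score is compared with the 0.85 threshold.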

Decisions include dimension_scores with per-dimension passed and evidence for full auditability.
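
The score formula can be worked through in a few lines. This is a sketch under stated assumptions, not the engine's code: rubric_score is a hypothetical name, and counting a missing answer as a failure is an assumption.

```python
# Default weights from the table above.
DEFAULT_WEIGHTS = {
    "tool_task_alignment": 0.25,
    "argument_scope_compliance": 0.30,
    "no_forbidden_scope_violation": 0.30,
    "no_side_effect_risk": 0.15,
}


def rubric_score(passed, weights=DEFAULT_WEIGHTS):
    """Compute sum(weight * pass) / sum(weight) from yes/no answers.

    `passed` maps dimension name -> bool. A dimension that is missing
    from `passed` counts as a failure (an assumption of this sketch).
    """
    total = sum(weights.values())
    earned = sum(w for name, w in weights.items() if passed.get(name, False))
    return earned / total


all_pass = {name: True for name in DEFAULT_WEIGHTS}
one_fail = dict(all_pass, no_side_effect_risk=False)
print(rubric_score(all_pass))
print(round(rubric_score(one_fail), 2))
```

Because the score is derived from the per-dimension answers alone, the same answers always produce the same score, which is what makes the decision auditable.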

CLI usage

INTENT_GUARD_TASK="Only update frontend styles" \
python -m intent_guard.proxy \
  --policy schema/policy.yaml \
  --target "npx @modelcontextprotocol/server-filesystem /path/to/repo" \
  --model llama3.1:8b \
  --approval-webhook "https://approval.internal/intent-guard" \
  --approval-timeout 10 \
  --approval-default-action deny

Flags

  • --policy: YAML policy path
  • --target: target MCP server command
  • --model: optional Ollama model name for semantic checks
  • --task: optional task context (or set INTENT_GUARD_TASK)
  • --ask-approval: prompt user before allowing flagged calls
  • --approval-webhook: call this webhook for non-interactive approval decisions
  • --approval-timeout: timeout (seconds) for webhook approvals
  • --approval-default-action: allow or deny when webhook approval times out/fails

Native hook integration

IntentGuard can run as the policy engine behind native hooks in Claude Code, Copilot, and Cursor.

Evaluate command

Use the unified command:

intent-guard evaluate --policy schema/policy.yaml

Input:

  • Reads a hook payload JSON object from stdin
  • Supports generic keys like tool_name, arguments, task_context
  • Also supports nested payloads (params.name, params.arguments) and common aliases (tool_input, args, prompt)

Output:

  • Prints a GuardDecision JSON object to stdout
  • Exit code 0 for allow, 1 for block, 2 for invalid input

Hook config templates

Template files are shipped under hooks/:

  • hooks/claude-code/settings.json
  • hooks/copilot/hooks.json
  • hooks/cursor/hooks.json

Each template invokes:

cat | intent-guard evaluate --policy schema/policy.yaml

This lets platform-native hooks call IntentGuard directly instead of wrapping only MCP servers.

Encoded payload detection

Static checks can decode and normalize argument payloads before matching:

  • URL decoding
  • Unicode normalization (NFKC)
  • Base64 decoding (when valid)

Enable or disable via:

static_rules:
  decode_arguments: true

When enabled, injection, sensitive-data, and protected-path checks run against decoded variants to catch obfuscated bypasses.
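
A rough sketch of what producing decoded variants might involve is shown below. It is not the engine's implementation: decoded_variants is a hypothetical helper, and the choice to keep only Base64 results that decode to UTF-8 text is an assumption.

```python
import base64
import binascii
import unicodedata
from urllib.parse import unquote


def decoded_variants(value):
    """Return the original string plus any distinct decoded forms of it."""
    variants = [value]

    # URL decoding, then Unicode NFKC normalization of the result.
    url_decoded = unquote(value)
    nfkc = unicodedata.normalize("NFKC", url_decoded)
    for candidate in (url_decoded, nfkc):
        if candidate not in variants:
            variants.append(candidate)

    # Base64 decoding, kept only when the input is valid Base64
    # and the decoded bytes are valid UTF-8 text.
    try:
        text = base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, ValueError):
        text = ""
    if text and text not in variants:
        variants.append(text)
    return variants


for variant in decoded_variants("%2Fetc%2Fpasswd"):
    print(variant)
```

Each variant would then be passed through the same path and pattern checks as the original value, so an encoded reference to a protected path is matched after decoding.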

Response-side inspection

IntentGuard can inspect MCP server responses before forwarding them to the client.

Configure response_rules in policy:

response_rules:
  action: block # block | warn | redact
  detect_base64: true
  patterns:
    - name: "GitHub Token"
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"

Behavior:

  • block: return JSON-RPC error and suppress original response
  • warn: forward response and log warning decision
  • redact: redact matched text and forward sanitized response
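
The three actions can be illustrated with a short sketch that operates on response text. This is a simplified picture, not the proxy's code: inspect_response and the placeholder format are hypothetical, detect_base64 is left out, and falling back to block when action is missing is an assumption.

```python
import re

# Mirrors the response_rules example above, with action set to redact.
RULES = {
    "action": "redact",
    "patterns": [{"name": "GitHub Token", "pattern": r"gh[ps]_[A-Za-z0-9_]{36,}"}],
}


def inspect_response(text, rules):
    """Return (action_taken, text_to_forward, matched_pattern_names)."""
    matched = []
    sanitized = text
    for rule in rules.get("patterns", []):
        regex = re.compile(rule["pattern"])
        if regex.search(text):
            matched.append(rule["name"])
            placeholder = f"[REDACTED:{rule['name']}]"
            # A callable replacement avoids backslash handling in the placeholder.
            sanitized = regex.sub(lambda _m, p=placeholder: p, sanitized)

    if not matched:
        return "allow", text, matched
    action = rules.get("action", "block")  # assumed default
    if action == "block":
        return "block", None, matched      # caller returns a JSON-RPC error instead
    if action == "redact":
        return "redact", sanitized, matched
    return "warn", text, matched           # forward unchanged, log a warning


fake_token = "ghp_" + "a" * 36
print(inspect_response(f"token={fake_token}", RULES))
```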

Tool description change detection (rug-pull protection)

IntentGuard can snapshot MCP tools/list metadata and detect changes over time.

Configure:

tool_change_rules:
  enabled: true
  action: warn # warn | block

Behavior:

  • On first tools/list, stores snapshot in .intent-guard/tool-snapshots/<server-hash>.json
  • On subsequent tools/list, compares name, description, and inputSchema
  • warn: log warning and continue
  • block: block response when drift is detected
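
One plausible way to implement the comparison is to fingerprint each tool's name, description, and inputSchema and diff the fingerprints between snapshots. The sketch below assumes that approach; fingerprint, snapshot, and detect_drift are hypothetical names, and the on-disk snapshot format is not shown.

```python
import hashlib
import json


def fingerprint(tool):
    """Stable SHA-256 over the fields that are compared between snapshots."""
    relevant = {key: tool.get(key) for key in ("name", "description", "inputSchema")}
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(tools):
    """Map tool name -> fingerprint for one tools/list result."""
    return {tool["name"]: fingerprint(tool) for tool in tools}


def detect_drift(old, new):
    """Compare two snapshots and report added, removed, and changed tools."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(n for n in set(old) & set(new) if old[n] != new[n]),
    }


before = snapshot([{"name": "read_file", "description": "Read a file", "inputSchema": {}}])
after = snapshot([{"name": "read_file", "description": "Read a file and upload it", "inputSchema": {}}])
print(detect_drift(before, after))
```

Sorting keys before hashing keeps the fingerprint independent of the order in which a server serializes its schema, so only real changes are reported.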

Semantic mode and provider failure behavior

semantic_rules.mode controls normal semantic enforcement:

  • off: semantic check disabled
  • enforce: semantic failures block tool calls
  • advisory: semantic failures are logged as warnings but calls are allowed

semantic_rules.provider_fail_mode controls behavior when the semantic provider is unavailable:

  • supports default and per-tool by_tool override
  • values use the same mode set: off|enforce|advisory

Behavior matrix for tool criticality tiers (example mapping):

Tool tier | provider_fail_mode | Outcome on provider outage
--- | --- | ---
Critical tools | enforce | Fail-closed (block + approval required)
Standard tools | advisory | Fail-open with warning decision
Low-risk tools | off | Fail-open without warning severity

Define tiers by assigning tools in provider_fail_mode.by_tool.
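
The lookup and the behavior matrix can be sketched as two small functions. This is illustrative only: resolve_fail_mode and outcome_on_outage are hypothetical names, the "warning" key is made up for the example, and falling back to advisory when no default is configured is an assumption.

```python
VALID_MODES = ("off", "enforce", "advisory")


def resolve_fail_mode(semantic_rules, tool_name):
    """Pick the per-tool override if present, otherwise the default mode."""
    cfg = semantic_rules.get("provider_fail_mode", {})
    mode = cfg.get("by_tool", {}).get(tool_name, cfg.get("default", "advisory"))
    if mode not in VALID_MODES:
        raise ValueError(f"unknown provider_fail_mode: {mode!r}")
    return mode


def outcome_on_outage(mode):
    """Translate a fail mode into the outcome listed in the behavior matrix."""
    if mode == "enforce":
        # Fail-closed: block the call and require approval.
        return {"allowed": False, "requires_approval": True, "warning": False}
    if mode == "advisory":
        # Fail-open, but record a warning decision.
        return {"allowed": True, "requires_approval": False, "warning": True}
    # off: fail-open without a warning.
    return {"allowed": True, "requires_approval": False, "warning": False}


rules = {"provider_fail_mode": {"default": "advisory", "by_tool": {"delete_database": "enforce"}}}
for tool in ("delete_database", "read_file"):
    print(tool, outcome_on_outage(resolve_fail_mode(rules, tool)))
```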

semantic_rules.prompt_version is copied into every semantic decision and log entry as semantic_prompt_version so prompt changes are auditable.

Semantic decision caching

To reduce repeated provider calls for identical semantic evaluations:

semantic_rules:
  decision_cache:
    enabled: true
    max_size: 256
    ttl_seconds: 300

The cache key is built from (tool_name, arguments, task_context). Static checks always run; only semantic verdicts are cached.
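
A cache with these three settings could look roughly like the following. It is a sketch, not the shipped implementation: DecisionCache is a hypothetical class, serializing the key as sorted JSON is an assumption, and so is evicting the least recently used entry once max_size is exceeded.

```python
import json
import time
from collections import OrderedDict


class DecisionCache:
    """Bounded cache for semantic verdicts with per-entry expiry."""

    def __init__(self, max_size=256, ttl_seconds=300, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, verdict)

    @staticmethod
    def make_key(tool_name, arguments, task_context):
        # Sorted keys so that argument order does not change the cache key.
        return json.dumps([tool_name, arguments, task_context], sort_keys=True, default=str)

    def get(self, tool_name, arguments, task_context):
        key = self.make_key(tool_name, arguments, task_context)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, verdict = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # expired
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return verdict

    def put(self, tool_name, arguments, task_context, verdict):
        key = self.make_key(tool_name, arguments, task_context)
        self._entries[key] = (self.clock(), verdict)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop least recently used
```

Including task_context in the key matters: the same tool call can be acceptable under one task description and out of scope under another.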

LiteLLM provider

To use the API provider, set in .env (or process env):

LLM_MODEL=claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=...
# or OPENAI_API_KEY=...

Then set semantic_rules.provider: litellm (or just set LLM_MODEL and omit explicit provider).

CI break-glass options

  • INTENT_GUARD_BREAK_GLASS_TOKEN: if set, flagged calls are auto-approved with override metadata.
  • INTENT_GUARD_BREAK_GLASS_SIGNED_TOKEN + INTENT_GUARD_BREAK_GLASS_SIGNING_KEY: optional HMAC-signed break-glass token for CI. Token format is <base64url(json payload)>.<base64url(signature)> where signature is HMAC-SHA256(payload_part, signing_key) and payload contains future exp (unix timestamp), for example {"exp": 4102444800}.
  • INTENT_GUARD_APPROVAL_AUTH_TOKEN: bearer token added to webhook approval requests.
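
For the signed variant, the token format described above can be produced and checked with the standard library. The sketch below makes several assumptions that should be verified against the actual implementation: the signature is computed over the base64url-encoded payload part, padding characters are stripped, and the signing key is used as UTF-8 bytes. make_token and verify_token are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    # Restore any padding that was stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload_part, signing_key):
    return hmac.new(signing_key.encode("utf-8"), payload_part.encode("ascii"), hashlib.sha256).digest()


def make_token(payload, signing_key):
    """Build <base64url(json payload)>.<base64url(signature)>."""
    payload_part = _b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{payload_part}.{_b64url(_sign(payload_part, signing_key))}"


def verify_token(token, signing_key, now=None):
    """Return True only if the signature matches and exp is in the future."""
    try:
        payload_part, signature_part = token.split(".")
        expected = _sign(payload_part, signing_key)
        if not hmac.compare_digest(expected, _b64url_decode(signature_part)):
            return False
        exp = json.loads(_b64url_decode(payload_part))["exp"]
    except (ValueError, KeyError, TypeError):
        return False
    current = time.time() if now is None else now
    return isinstance(exp, (int, float)) and exp > current


token = make_token({"exp": 4102444800}, "example-signing-key")
print(verify_token(token, "example-signing-key"))
```

hmac.compare_digest is used for the signature comparison so that the check takes the same time regardless of where the signatures first differ.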

SDK usage (Python)

from intent_guard import IntentGuardSDK

guard = IntentGuardSDK(
    policy_path="schema/policy.yaml",
    local_model="llama3.1:8b",
    task_context="Only modify UI components"
)

decision = guard.evaluate("write_file", {"path": "src/auth/config.py"})
print(decision.allowed, decision.reason)

GuardDecision contract (stable)

GuardDecision now includes machine-readable metadata for enforcement and analytics:

  • decision_id (UUID)
  • code
  • severity
  • policy_name
  • policy_version
  • rule_id
  • timestamp (UTC ISO-8601)
  • override (who/why/ttl, when manually approved)
  • semantic_prompt_version (when semantic checks are applied)

Backward compatibility:

  • Existing fields allowed, reason, requires_approval, semantic_score are unchanged.
  • New fields are always present with safe defaults, so existing consumers can ignore them.

Versioning/migration strategy:

  • Keep parsing logic tolerant of unknown fields.
  • Use policy_version + code + rule_id for downstream contract evolution and dashboards.
  • Prefer adding new fields over changing/removing existing field semantics.

Semantic eval harness

IntentGuard ships a lightweight semantic eval harness used in tests to measure model behavior on known-safe and known-unsafe tool calls.

  • Dataset fixtures: tests/fixtures/semantic_eval_dataset.json
  • Replay verdicts: tests/fixtures/semantic_eval_verdicts.json
  • Metrics computed: precision, recall, accuracy

This enables reproducible regression checks for semantic policy quality.

Usage examples with popular tools

1) Claude Code (MCP server proxy)

Configure the MCP server command to run through IntentGuard:

{
  "mcpServers": {
    "filesystem": {
      "command": "python",
      "args": [
        "-m",
        "intent_guard.proxy",
        "--policy",
        "schema/policy.yaml",
        "--target",
        "npx @modelcontextprotocol/server-filesystem /path/to/repo",
        "--ask-approval"
      ],
      "env": {
        "INTENT_GUARD_TASK": "Refactor UI only; do not touch auth or database"
      }
    }
  }
}

2) Codex (MCP command wrapping)

For Codex setups that support MCP server command configuration, point the server command to IntentGuard first, then to your real MCP server as --target:

python -m intent_guard.proxy \
  --policy schema/policy.yaml \
  --target "npx @modelcontextprotocol/server-filesystem /path/to/repo" \
  --ask-approval

Use that command as the configured MCP server entry in your Codex environment.

3) LangSmith / LangChain workflows

Use IntentGuard before each tool execution and keep normal LangSmith tracing:

from langsmith import traceable
from intent_guard import IntentGuardSDK

guard = IntentGuardSDK(
    policy_path="schema/policy.yaml",
    task_context="Only update docs and UI text"
)

@traceable(name="guarded_tool_call")
def guarded_call(tool_name: str, args: dict, tool_callable):
    decision = guard.evaluate(tool_name, args)
    if not decision.allowed:
        raise PermissionError(f"IntentGuard blocked: {decision.reason}")
    return tool_callable(**args)

This keeps execution decisions visible in traces while enforcing IntentGuard policy at runtime.

Build and publish (pip / Artifactory)

Build source and wheel distributions:

python3 -m venv .venv
.venv/bin/pip install -U pip build twine
.venv/bin/python -m build

Publish to your Artifactory PyPI repository:

export TWINE_USERNAME="<artifactory-username>"
export TWINE_PASSWORD="<artifactory-password-or-token>"
.venv/bin/python -m twine upload \
  --repository-url "https://<artifactory-host>/artifactory/api/pypi/<pypi-repo>/local" \
  dist/*

Integration testing and Docker

Current integration tests are in-process (tests/test_integration_phases.py) and do not require a database or cache service. If a future change adds external DB/cache dependencies, run those services in Docker for tests (same pattern as temp-noob/rule-engine) so test setup remains reproducible.
