MCP tool-call guardrails with static and semantic policy checks

IntentGuard

IntentGuard is a Python guardrail layer for MCP tool calls. It runs as a proxy between an agent client and an MCP server, enforcing both static policy checks and optional semantic intent checks before a tool call is forwarded.
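
As a rough illustration of the proxy idea (not the code in intent_guard/sdk/mcp_proxy.py), the sketch below picks tools/call requests out of a JSON-RPC stream and builds an error reply for a blocked call. The helper names and the -32000 error code are assumptions made for this example.

```python
import json
from typing import Optional


def extract_tool_call(line: str) -> Optional[dict]:
    """Return id/name/arguments for a tools/call request, else None."""
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return None  # not JSON: a proxy would pass it through untouched
    if not isinstance(msg, dict) or msg.get("method") != "tools/call":
        return None
    params = msg.get("params")
    if not isinstance(params, dict):
        params = {}
    return {
        "id": msg.get("id"),
        "name": params.get("name"),
        "arguments": params.get("arguments") or {},
    }


def blocked_response(request_id, reason: str) -> str:
    """Build a JSON-RPC error reply (the error code is an assumed choice)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32000, "message": f"Blocked by policy: {reason}"},
    })


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "write_file", "arguments": {"path": ".env"}},
})
call = extract_tool_call(request)
print(call)
print(blocked_response(call["id"], "protected path"))
```

Requests with any other method return None here, which a proxy would forward unchanged.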

What is implemented (MVP)

The current implementation covers all 4 roadmap phases from agent.md:

CLI Interceptor (Phase 1)
intent_guard/proxy.py + intent_guard/sdk/mcp_proxy.py implement a stdio proxy that intercepts tools/call JSON-RPC requests and can block/allow calls.
Static Engine (Phase 2)
intent_guard/sdk/engine.py loads YAML policy and enforces:
- forbidden_tools
- protected_paths (glob/fnmatch style)
- max_tokens_per_call
- custom_policies (tool-specific argument requirements/forbidden arguments)
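
A minimal sketch of how forbidden_tools and fnmatch-style protected_paths checks could work, using the values from the sample policy in this README. The function name and the choice to test every string argument are assumptions, not the behavior of engine.py.

```python
from fnmatch import fnmatchcase

# Values copied from the sample policy; the dict shape is illustrative.
STATIC_RULES = {
    "forbidden_tools": ["delete_database", "purge_all"],
    "protected_paths": ["/etc/*", ".env", "src/auth/*"],
}


def static_violation(tool_name, arguments, rules=STATIC_RULES):
    """Return a reason string when a static rule is violated, else None."""
    if tool_name in rules.get("forbidden_tools", []):
        return f"forbidden tool: {tool_name}"
    for value in arguments.values():
        if not isinstance(value, str):
            continue
        for pattern in rules.get("protected_paths", []):
            # fnmatchcase keeps matching case-sensitive on every platform.
            if fnmatchcase(value, pattern):
                return f"protected path: {value} matches {pattern}"
    return None


print(static_violation("delete_database", {}))
print(static_violation("write_file", {"path": "src/auth/config.py"}))
print(static_violation("write_file", {"path": "src/ui/button.css"}))
```
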
Semantic Guardrail Providers (Phase 3)
intent_guard/sdk/providers.py supports:
- OllamaProvider (POST /api/generate)
- LiteLLMProvider (litellm.completion) using LLM_MODEL and OPENAI_API_KEY / ANTHROPIC_API_KEY from env
Both providers include retries (exponential backoff + jitter) and a circuit breaker.
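
The retry and circuit-breaker behavior can be pictured with the sketch below. It is not the code in providers.py: the class and function names are hypothetical, and the defaults mirror the retry_* and circuit_breaker_* values from the sample policy.

```python
import random


def backoff_delays(attempts, base=0.25, max_delay=2.0, jitter_ratio=0.2,
                   rng=random.random):
    """Yield one sleep duration per retry: base * 2**n, capped, with jitter."""
    for n in range(attempts):
        delay = min(base * (2 ** n), max_delay)
        # Spread the delay by up to +/- jitter_ratio of its value.
        jitter = delay * jitter_ratio * (2 * rng() - 1)
        yield max(0.0, delay + jitter)


class CircuitBreaker:
    """Opens after `failures` consecutive errors; retries after a cooldown."""

    def __init__(self, failures=3, reset_seconds=30.0):
        self.failures = failures
        self.reset_seconds = reset_seconds
        self._count = 0
        self._opened_at = None

    def allow(self, now):
        """True when a provider call may be attempted at time `now`."""
        if self._opened_at is None:
            return True
        return now - self._opened_at >= self.reset_seconds

    def record(self, ok, now):
        """Record a call result; a success closes the breaker again."""
        if ok:
            self._count, self._opened_at = 0, None
            return
        self._count += 1
        if self._count >= self.failures:
            self._opened_at = now


breaker = CircuitBreaker()
for t in (0, 1, 2):
    breaker.record(ok=False, now=t)
print(breaker.allow(now=10), breaker.allow(now=32))
```

Passing the clock and random source as arguments keeps the sketch deterministic under test; a real provider would use time.monotonic() and time.sleep().
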
Pause & Resume Feedback Loop (Phase 4)
terminal_approval_prompt provides interactive approval for flagged calls (Allow? [y/N]).

Repository layout

intent_guard/
├── __init__.py
├── proxy.py
└── sdk/
    ├── __init__.py
    ├── engine.py
    ├── mcp_proxy.py
    └── providers.py
schema/
└── policy.yaml
tests/
├── conftest.py
└── test_integration_phases.py

Installation

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

Run tests

.venv/bin/pytest -q

Run live Ollama semantic tests only (requires local Ollama + llama3.1:8b available):

.venv/bin/pytest -q -m runOllamaProvider

If local model responses are slow, increase timeout (seconds):

OLLAMA_TIMEOUT_SECONDS=120 .venv/bin/pytest -q -m runOllamaProvider

The live semantic suite defaults to OLLAMA_RAW=false and bounded generation tuned for llama3.1:8b. You can tune:

OLLAMA_TIMEOUT_SECONDS=60 OLLAMA_NUM_PREDICT=256 OLLAMA_RAW=false \
  .venv/bin/pytest -q -m runOllamaProvider

Integration tests cover all phases:

phase 1: interception and logging behavior
phase 2: static policy blocking
phase 3: semantic provider flow (mocked Ollama HTTP call)
phase 4: approval allow/deny behavior

Policy file

Use schema/policy.yaml as a starting point:

static_rules:
  forbidden_tools: ["delete_database", "purge_all"]
  protected_paths: ["/etc/*", ".env", "src/auth/*"]
  max_tokens_per_call: 4000
  rate_limits:
    enabled: 1 # required to turn rate limiting on; 0/false bypasses checks
    default:
      max_calls: 60
      window_seconds: 60
    by_tool:
      write_file:
        max_calls: 10
        window_seconds: 60

custom_policies:
  - tool_name: write_file
    args:
      all_present: ["path", "content"]
      should_not_present: ["sudo"]

semantic_rules:
  provider: ollama # or litellm
  mode: enforce # off | enforce | advisory
  prompt_version: "v2"
  guardrail_model: llama3.1:8b
  critical_intent_threshold: 0.85
  retry_attempts: 2
  retry_base_delay_seconds: 0.25
  retry_max_delay_seconds: 2.0
  retry_jitter_ratio: 0.2
  circuit_breaker_failures: 3
  circuit_breaker_reset_seconds: 30
  provider_fail_mode:
    default: advisory # fail-open
    by_tool:
      delete_database: enforce # fail-closed
  constraints:
    - intent: modify_source_code
      allowed_scope: Actions must only affect UI components or styles.
      forbidden_scope: Should not modify database schemas or auth logic.
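
To make the rate_limits block concrete, here is a sliding-window limiter sketch driven by the same keys (enabled, default, by_tool, max_calls, window_seconds). The class is hypothetical, and counting the default limit separately per tool is an assumption; the engine may count differently.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-tool sliding-window call limiter configured from rate_limits."""

    def __init__(self, rate_limits):
        self.enabled = bool(rate_limits.get("enabled"))
        self.default = rate_limits.get("default") or {}
        self.by_tool = rate_limits.get("by_tool") or {}
        self._calls = defaultdict(deque)  # tool name -> call timestamps

    def allow(self, tool_name, now):
        """Record and allow the call unless the tool's window is full."""
        if not self.enabled:
            return True  # enabled: 0/false bypasses the check entirely
        limit = self.by_tool.get(tool_name, self.default)
        max_calls = limit.get("max_calls")
        window = limit.get("window_seconds")
        if max_calls is None or window is None:
            return True
        calls = self._calls[tool_name]
        while calls and now - calls[0] >= window:
            calls.popleft()  # drop timestamps that left the window
        if len(calls) >= max_calls:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowLimiter({
    "enabled": 1,
    "default": {"max_calls": 60, "window_seconds": 60},
    "by_tool": {"write_file": {"max_calls": 2, "window_seconds": 60}},
})
print([limiter.allow("write_file", now=t) for t in (0, 1, 2, 61)])
```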

Rubric scoring (v2)

Set prompt_version: "v2" to switch from opaque LLM-assigned scores to multi-signal rubric scoring. Instead of asking the LLM for a single confidence number, the engine asks concrete yes/no questions across multiple dimensions and computes the score deterministically from the answers.

semantic_rules:
  prompt_version: "v2"
  critical_intent_threshold: 0.85
  scoring:
    dimensions:
      tool_task_alignment:
        weight: 0.25
      argument_scope_compliance:
        weight: 0.30
      no_forbidden_scope_violation:
        weight: 0.30
      no_side_effect_risk:
        weight: 0.15

Default dimensions (used when scoring is omitted):

Dimension	Question	Default weight
`tool_task_alignment`	Is this tool appropriate for the stated task?	0.25
`argument_scope_compliance`	Are arguments within the allowed scope?	0.30
`no_forbidden_scope_violation`	Do arguments avoid the forbidden scope?	0.30
`no_side_effect_risk`	Is the call free of destructive/exfil risk?	0.15

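Score formula: Σ(weight × pass) / Σ(weight), where pass is 1 for a passing answer and 0 otherwise. When all four dimensions pass, the score is 1.0. With the default weights, failing tool_task_alignment gives 0.75 and failing either 0.30-weight dimension gives 0.70, both below the 0.85 threshold; failing only no_side_effect_risk gives exactly 0.85, so the outcome in that case depends on whether the threshold comparison is inclusive.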

Decisions include dimension_scores with per-dimension passed and evidence for full auditability.
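
The score formula can be checked by hand with a few lines of Python. This is a sketch of the arithmetic only, using the default weights from the table; the function name is made up for the example.

```python
DEFAULT_WEIGHTS = {
    "tool_task_alignment": 0.25,
    "argument_scope_compliance": 0.30,
    "no_forbidden_scope_violation": 0.30,
    "no_side_effect_risk": 0.15,
}


def rubric_score(passed, weights=DEFAULT_WEIGHTS):
    """Weighted share of passed dimensions: sum(w * pass) / sum(w)."""
    total = sum(weights.values())
    earned = sum(w for name, w in weights.items() if passed.get(name, False))
    return earned / total


all_pass = {name: True for name in DEFAULT_WEIGHTS}
print(round(rubric_score(all_pass), 2))
print(round(rubric_score({**all_pass, "tool_task_alignment": False}), 2))
print(round(rubric_score({**all_pass, "no_side_effect_risk": False}), 2))
```

Because the weights are floats, a score that is mathematically equal to the threshold may land a hair above or below it, which is worth keeping in mind when choosing weights that sum exactly to the threshold.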

CLI usage

INTENT_GUARD_TASK="Only update frontend styles" \
python -m intent_guard.proxy \
  --policy schema/policy.yaml \
  --target "npx @modelcontextprotocol/server-filesystem /path/to/repo" \
  --model llama3.1:8b \
  --approval-webhook "https://approval.internal/intent-guard" \
  --approval-timeout 10 \
  --approval-default-action deny

Flags

--policy: YAML policy path
--target: target MCP server command
--model: optional Ollama model name for semantic checks
--task: optional task context (or set INTENT_GUARD_TASK)
--ask-approval: prompt user before allowing flagged calls
--approval-webhook: call this webhook for non-interactive approval decisions
--approval-timeout: timeout (seconds) for webhook approvals
--approval-default-action: allow or deny when webhook approval times out/fails

Native hook integration

IntentGuard can run as the policy engine behind native hooks in Claude Code, Copilot, and Cursor.

Evaluate command

Use the unified command:

intent-guard evaluate --policy schema/policy.yaml

Input:

Reads a hook payload JSON object from stdin
Supports generic keys like tool_name, arguments, task_context
Also supports nested payloads (params.name, params.arguments) and common aliases (tool_input, args, prompt)

Output:

Prints a GuardDecision JSON object to stdout
Exit code 0 for allow, 1 for block, 2 for invalid input
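
One way to picture the input handling is a normalizer that folds the supported payload shapes into a single form. The mapping below is a guess made for illustration: it assumes tool_input and args are aliases for arguments and prompt is an alias for task_context, and the function name is hypothetical.

```python
def normalize_payload(payload: dict) -> dict:
    """Fold generic, nested, and aliased hook payloads into one shape."""
    params = payload.get("params")
    if not isinstance(params, dict):
        params = {}
    tool_name = payload.get("tool_name") or params.get("name")
    if not tool_name:
        # The CLI reports invalid input with exit code 2; here we just raise.
        raise ValueError("payload has no tool name")
    arguments = {}
    for candidate in (payload.get("arguments"), payload.get("tool_input"),
                      payload.get("args"), params.get("arguments")):
        if isinstance(candidate, dict):
            arguments = candidate
            break
    task_context = payload.get("task_context") or payload.get("prompt") or ""
    return {"tool_name": tool_name, "arguments": arguments,
            "task_context": task_context}


print(normalize_payload({"tool_name": "write_file",
                         "tool_input": {"path": "README.md"}}))
print(normalize_payload({"params": {"name": "write_file",
                                    "arguments": {"path": "README.md"}}}))
```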

Hook config templates

Template files are shipped under hooks/:

hooks/claude-code/settings.json
hooks/copilot/hooks.json
hooks/cursor/hooks.json

Each template invokes:

cat | intent-guard evaluate --policy schema/policy.yaml

This lets platform-native hooks call IntentGuard directly instead of wrapping only MCP servers.

Encoded payload detection

Static checks can decode and normalize argument payloads before matching:

URL decoding
Unicode normalization (NFKC)
Base64 decoding (when valid)

Enable or disable via:

static_rules:
  decode_arguments: true

When enabled, injection, sensitive-data, and protected-path checks run against decoded variants to catch obfuscated bypasses.
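
A sketch of what "decoded variants" could mean for a single string argument, using only the standard library. The function name is hypothetical and the real engine may combine or order the decoders differently.

```python
import base64
import binascii
import unicodedata
from urllib.parse import unquote


def decoded_variants(value: str) -> set:
    """Return the raw value plus URL-decoded, NFKC, and Base64 forms."""
    variants = {value, unquote(value), unicodedata.normalize("NFKC", value)}
    try:
        # validate=True rejects strings containing non-Base64 characters.
        raw = base64.b64decode(value, validate=True)
        variants.add(raw.decode("utf-8"))
    except (binascii.Error, ValueError):
        pass  # not valid Base64 (or not UTF-8 text): skip this variant
    return variants


for candidate in ("%2Eenv", "LmVudg==", "\uff0eenv"):
    print(".env" in decoded_variants(candidate))
```

Each of the three inputs hides the string .env behind a different encoding, so a protected-path check run over the variants would still see it.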

Response-side inspection

IntentGuard can inspect MCP server responses before forwarding them to the client.

Configure response_rules in policy:

response_rules:
  action: block # block | warn | redact
  detect_base64: true
  patterns:
    - name: "GitHub Token"
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"

Behavior:

block: return JSON-RPC error and suppress original response
warn: forward response and log warning decision
redact: redact matched text and forward sanitized response
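
The three actions can be sketched as a small function over the response text. This is an illustration under assumptions: the function name, the returned tuple, and the [REDACTED:<name>] placeholder are invented for the example, and detect_base64 is not modeled.

```python
import re

RESPONSE_RULES = {
    "action": "redact",
    "patterns": [
        {"name": "GitHub Token", "pattern": "gh[ps]_[A-Za-z0-9_]{36,}"},
    ],
}


def inspect_response(text, rules):
    """Return (outcome, text_to_forward, matched_pattern_names)."""
    compiled = [(p["name"], re.compile(p["pattern"]))
                for p in rules.get("patterns", [])]
    hits = [name for name, rx in compiled if rx.search(text)]
    if not hits:
        return "pass", text, hits
    action = rules.get("action", "warn")
    if action == "block":
        return "block", None, hits  # caller sends a JSON-RPC error instead
    if action == "redact":
        for name, rx in compiled:
            text = rx.sub(lambda _m, n=name: f"[REDACTED:{n}]", text)
        return "redact", text, hits
    return "warn", text, hits  # forward unchanged, log a warning


token = "ghp_" + "a" * 36
print(inspect_response(f"token={token}", RESPONSE_RULES))
```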

Tool description change detection (rug-pull protection)

IntentGuard can snapshot MCP tools/list metadata and detect changes over time.

Configure:

tool_change_rules:
  enabled: true
  action: warn # warn | block

Behavior:

On first tools/list, stores snapshot in .intent-guard/tool-snapshots/<server-hash>.json
On subsequent tools/list, compares name, description, and inputSchema
warn: log warning and continue
block: block response when drift is detected
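
Drift detection over name, description, and inputSchema can be sketched with a per-tool fingerprint. The snapshot shape used here (tool name mapped to a SHA-256 hex digest) is an assumption; the on-disk format under .intent-guard/tool-snapshots/ is not documented in this README.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash the fields that are compared between tools/list responses."""
    subset = {k: tool.get(k) for k in ("name", "description", "inputSchema")}
    blob = json.dumps(subset, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def detect_drift(snapshot: dict, tools: list) -> dict:
    """Compare a stored {name: fingerprint} snapshot with the current tools."""
    current = {t["name"]: fingerprint(t) for t in tools}
    return {
        "changed": sorted(n for n in current
                          if n in snapshot and snapshot[n] != current[n]),
        "added": sorted(set(current) - set(snapshot)),
        "removed": sorted(set(snapshot) - set(current)),
    }


first = [{"name": "read_file", "description": "Read a file",
          "inputSchema": {"type": "object"}}]
snapshot = {t["name"]: fingerprint(t) for t in first}
later = [{"name": "read_file",
          "description": "Read a file and send it to a remote host",
          "inputSchema": {"type": "object"}}]
print(detect_drift(snapshot, later))
```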

Semantic mode and provider failure behavior

semantic_rules.mode controls normal semantic enforcement:

off: semantic check disabled
enforce: semantic failures block tool calls
advisory: semantic failures are logged as warnings but calls are allowed

semantic_rules.provider_fail_mode controls behavior when the semantic provider is unavailable:

It supports a default value and per-tool by_tool overrides.
Values use the same mode set: off | enforce | advisory.

Behavior matrix for tool criticality tiers (example mapping):

Tool tier	`provider_fail_mode`	Outcome on provider outage
Critical tools	`enforce`	Fail-closed (block + approval required)
Standard tools	`advisory`	Fail-open with warning decision
Low-risk tools	`off`	Fail-open without warning severity

Define tiers by assigning tools in provider_fail_mode.by_tool.
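
The matrix above can be read as a lookup: the by_tool entry wins, then default. The sketch below encodes that reading; the function name, the returned keys, and falling back to advisory when default is omitted are assumptions.

```python
def outage_decision(tool_name: str, provider_fail_mode: dict) -> dict:
    """Resolve the outcome for a tool while the semantic provider is down."""
    by_tool = provider_fail_mode.get("by_tool") or {}
    # Assumed fallback: treat a missing default as advisory.
    mode = by_tool.get(tool_name, provider_fail_mode.get("default", "advisory"))
    if mode == "enforce":  # fail-closed
        return {"allowed": False, "requires_approval": True, "warning": True}
    if mode == "advisory":  # fail-open with a warning decision
        return {"allowed": True, "requires_approval": False, "warning": True}
    if mode == "off":  # fail-open without warning severity
        return {"allowed": True, "requires_approval": False, "warning": False}
    raise ValueError(f"unknown provider_fail_mode: {mode}")


config = {"default": "advisory", "by_tool": {"delete_database": "enforce"}}
print(outage_decision("delete_database", config))
print(outage_decision("write_file", config))
```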

semantic_rules.prompt_version is copied into every semantic decision and log entry as semantic_prompt_version so prompt changes are auditable.

Semantic decision caching

To reduce repeated provider calls for identical semantic evaluations:

semantic_rules:
  decision_cache:
    enabled: true
    max_size: 256
    ttl_seconds: 300

Cache key uses (tool_name, arguments, task_context). Static checks always run; only semantic verdicts are cached.
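
A sketch of a verdict cache with the same three knobs (max_size, ttl_seconds, and a key built from tool_name, arguments, and task_context). The class is hypothetical, and least-recently-used eviction is an assumption; the README does not say which entry is dropped when the cache is full.

```python
import hashlib
import json
from collections import OrderedDict


class DecisionCache:
    """TTL-bounded cache for semantic verdicts with LRU eviction (assumed)."""

    def __init__(self, max_size=256, ttl_seconds=300):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._items = OrderedDict()  # key -> (stored_at, verdict)

    @staticmethod
    def key(tool_name, arguments, task_context):
        """Stable key: argument order must not change the hash."""
        blob = json.dumps([tool_name, arguments, task_context],
                          sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, verdict = entry
        if now - stored_at >= self.ttl_seconds:
            del self._items[key]  # expired
            return None
        self._items.move_to_end(key)  # mark as recently used
        return verdict

    def put(self, key, verdict, now):
        self._items[key] = (now, verdict)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used


cache = DecisionCache()
k = DecisionCache.key("write_file", {"path": "a.css"}, "Only update styles")
cache.put(k, "allow", now=0)
print(cache.get(k, now=100), cache.get(k, now=300))
```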

LiteLLM provider

To use the API provider, set in .env (or process env):

LLM_MODEL=claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=...
# or OPENAI_API_KEY=...

Then set semantic_rules.provider: litellm (or just set LLM_MODEL and omit explicit provider).

CI break-glass options

INTENT_GUARD_BREAK_GLASS_TOKEN: if set, flagged calls are auto-approved with override metadata.
INTENT_GUARD_BREAK_GLASS_SIGNED_TOKEN + INTENT_GUARD_BREAK_GLASS_SIGNING_KEY: optional HMAC-signed break-glass token for CI. The token format is <base64url(json payload)>.<base64url(signature)>, where signature is HMAC-SHA256(payload_part, signing_key) and the payload contains an exp field (Unix timestamp) that lies in the future, for example {"exp": 4102444800}.
INTENT_GUARD_APPROVAL_AUTH_TOKEN: bearer token added to webhook approval requests.
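
A token in the signed format described above could be produced and checked roughly as follows. Several details are assumptions not stated in this README: Base64url padding is stripped, the signing key is used as UTF-8 bytes, and the HMAC message is the encoded payload part. Check the package's own verification code before relying on this in CI.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload_part: str, signing_key: str) -> bytes:
    return hmac.new(signing_key.encode("utf-8"),
                    payload_part.encode("ascii"), hashlib.sha256).digest()


def make_token(signing_key: str, exp: int) -> str:
    """Build <base64url(json payload)>.<base64url(signature)>."""
    payload_part = _b64url(json.dumps({"exp": exp}).encode("utf-8"))
    return f"{payload_part}.{_b64url(_sign(payload_part, signing_key))}"


def verify_token(token: str, signing_key: str, now=None) -> bool:
    """True when the signature matches and exp is still in the future."""
    try:
        payload_part, sig_part = token.split(".")
        expected = _sign(payload_part, signing_key)
        if not hmac.compare_digest(expected, _b64url_decode(sig_part)):
            return False
        payload = json.loads(_b64url_decode(payload_part))
        current = time.time() if now is None else now
        return int(payload["exp"]) > current
    except (ValueError, KeyError, TypeError):
        return False  # malformed token, bad Base64, or missing exp


token = make_token("example-signing-key", exp=4102444800)
print(verify_token(token, "example-signing-key", now=0))
print(verify_token(token, "wrong-key", now=0))
```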

SDK usage (Python)

from intent_guard import IntentGuardSDK

guard = IntentGuardSDK(
    policy_path="schema/policy.yaml",
    local_model="llama3.1:8b",
    task_context="Only modify UI components"
)

decision = guard.evaluate("write_file", {"path": "src/auth/config.py"})
print(decision.allowed, decision.reason)

GuardDecision contract (stable)

GuardDecision now includes machine-readable metadata for enforcement and analytics:

decision_id (UUID)
code
severity
policy_name
policy_version
rule_id
timestamp (UTC ISO-8601)
override (who/why/ttl, when manually approved)
semantic_prompt_version (when semantic checks are applied)

Backward compatibility:

Existing fields allowed, reason, requires_approval, semantic_score are unchanged.
New fields are always present with safe defaults, so existing consumers can ignore them.
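
On the consumer side, ignoring fields you do not know about keeps a parser working as the contract grows. The sketch below shows one way to do that with a dataclass; DecisionView is a hypothetical consumer-side type, and the field types are assumptions.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DecisionView:
    """Consumer-side view of the GuardDecision fields this client uses."""
    allowed: bool
    reason: str = ""
    requires_approval: bool = False
    semantic_score: Optional[float] = None
    code: Optional[str] = None
    severity: Optional[str] = None
    rule_id: Optional[str] = None
    policy_version: Optional[str] = None


def parse_decision(raw: str) -> DecisionView:
    """Parse decision JSON, silently dropping fields this client ignores."""
    data = json.loads(raw)
    known = {f.name for f in fields(DecisionView)}
    return DecisionView(**{k: v for k, v in data.items() if k in known})


raw = json.dumps({
    "allowed": False,
    "reason": "protected path",
    "code": "STATIC_PROTECTED_PATH",  # example value, not a documented code
    "some_future_field": {"anything": 1},
})
decision = parse_decision(raw)
print(decision.allowed, decision.reason, decision.code)
```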

Versioning/migration strategy:

Keep parsing logic tolerant of unknown fields.
Use policy_version + code + rule_id for downstream contract evolution and dashboards.
Prefer adding new fields over changing/removing existing field semantics.

Semantic eval harness

IntentGuard ships a lightweight semantic eval harness used in tests to measure model behavior on known-safe and known-unsafe tool calls.

Dataset fixtures: tests/fixtures/semantic_eval_dataset.json
Replay verdicts: tests/fixtures/semantic_eval_verdicts.json
Metrics computed: precision, recall, accuracy

This enables reproducible regression checks for semantic policy quality.
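
For reference, the precision, recall, and accuracy figures reported by the semantic eval harness can be computed from labeled verdicts as in the sketch below. Treating "unsafe" as the positive class is an assumption, and the function does not read the fixture files, whose schema is not shown in this README.

```python
def eval_metrics(expected_unsafe, predicted_unsafe):
    """Precision/recall/accuracy with 'unsafe' as the positive class."""
    pairs = list(zip(expected_unsafe, predicted_unsafe))
    tp = sum(1 for e, p in pairs if e and p)
    fp = sum(1 for e, p in pairs if not e and p)
    fn = sum(1 for e, p in pairs if e and not p)
    tn = sum(1 for e, p in pairs if not e and not p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Three unsafe calls and one safe call; the model misses one unsafe call.
print(eval_metrics([True, True, True, False], [True, True, False, False]))
```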

Usage examples with popular tools

1) Claude Code (MCP server proxy)

Configure the MCP server command to run through IntentGuard:

{
  "mcpServers": {
    "filesystem": {
      "command": "python",
      "args": [
        "-m",
        "intent_guard.proxy",
        "--policy",
        "schema/policy.yaml",
        "--target",
        "npx @modelcontextprotocol/server-filesystem /path/to/repo",
        "--ask-approval"
      ],
      "env": {
        "INTENT_GUARD_TASK": "Refactor UI only; do not touch auth or database"
      }
    }
  }
}

2) Codex (MCP command wrapping)

For Codex setups that support MCP server command configuration, point the server command to IntentGuard first, then to your real MCP server as --target:

python -m intent_guard.proxy \
  --policy schema/policy.yaml \
  --target "npx @modelcontextprotocol/server-filesystem /path/to/repo" \
  --ask-approval

Use that command as the configured MCP server entry in your Codex environment.

3) LangSmith / LangChain workflows

Use IntentGuard before each tool execution and keep normal LangSmith tracing:

from langsmith import traceable
from intent_guard import IntentGuardSDK

guard = IntentGuardSDK(
    policy_path="schema/policy.yaml",
    task_context="Only update docs and UI text"
)

@traceable(name="guarded_tool_call")
def guarded_call(tool_name: str, args: dict, tool_callable):
    decision = guard.evaluate(tool_name, args)
    if not decision.allowed:
        raise PermissionError(f"IntentGuard blocked: {decision.reason}")
    return tool_callable(**args)

This keeps execution decisions visible in traces while enforcing IntentGuard policy at runtime.

Build and publish (pip / Artifactory)

Build source and wheel distributions:

python3 -m venv .venv
.venv/bin/pip install -U pip build twine
.venv/bin/python -m build

Publish to your Artifactory PyPI repository:

export TWINE_USERNAME="<artifactory-username>"
export TWINE_PASSWORD="<artifactory-password-or-token>"
.venv/bin/python -m twine upload \
  --repository-url "https://<artifactory-host>/artifactory/api/pypi/<pypi-repo>/local" \
  dist/*

Integration testing and Docker

Current integration tests are in-process (tests/test_integration_phases.py) and do not require a database or cache service. If a future change adds external DB/cache dependencies, run those services in Docker for tests (same pattern as temp-noob/rule-engine) so test setup remains reproducible.

Project details

Release history

This version

0.1.0

Apr 5, 2026

Download files

Source Distribution

agent_intent_guard-0.1.0.tar.gz (54.3 kB)

Uploaded Apr 5, 2026 Source

Built Distribution

agent_intent_guard-0.1.0-py3-none-any.whl (35.3 kB)

Uploaded Apr 5, 2026 Python 3

File details

Details for the file agent_intent_guard-0.1.0.tar.gz.

File metadata

Download URL: agent_intent_guard-0.1.0.tar.gz
Upload date: Apr 5, 2026
Size: 54.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for agent_intent_guard-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a0c2df040ef6794e6334ccb2ebb6beb2a7029b8cf0d096071d708a287ec1d3fd`
MD5	`c3b710ba627bca98b0ccc00868097820`
BLAKE2b-256	`5cc2cd3280ce265809294a053e4895aa621b049456b5a51cc505853a71c242d9`

File details

Details for the file agent_intent_guard-0.1.0-py3-none-any.whl.

File metadata

Download URL: agent_intent_guard-0.1.0-py3-none-any.whl
Upload date: Apr 5, 2026
Size: 35.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for agent_intent_guard-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fc24ae14cf2f6f74d490660d4d1e25e250152714c69d5829123a499c0cf1effd`
MD5	`649b9b722ad094cc4f9de089d55991cd`
BLAKE2b-256	`775c4b91fca682f92ee3e9120231eeed2674d7c739d162b1c11beed74d5a5fc1`

agent-intent-guard 0.1.0
