
MCP Safety Warden

MCP Safety Warden is a proxy server that wraps any MCP server and adds behavioral profiling, security scanning, risk gating, and safe execution to its tools.

Overview

Most MCP servers expose tools with no information about what those tools actually do at runtime: whether they write data, call external services, delete things, or produce outputs that contain adversarial content.

Instead of calling a wrapped server's tools directly, you route calls through this wrapper. It classifies each tool, builds a behavior profile from observed runs, checks for injection attacks, and blocks or gates risky tools before they execute.

Behavioral profiling

  • Static classification of effect class (read_only, additive_write, mutating_write, external_action, destructive), retry safety, and destructiveness.
  • LLM-assisted classification via Anthropic, OpenAI, Gemini, or Ollama - LLM and rule-based signals are combined via weighted voting, producing higher confidence across all tools.
  • Observed stats updated after every proxied call: p50/p95 latency, failure rate, output size, schema stability.
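The latency side of those observed stats can be sketched as a simple percentile computation over recorded run latencies. The helper name and the nearest-rank method below are illustrative, not the package's actual implementation:

```python
import math

def latency_percentiles(latencies_ms: list[float]) -> dict:
    """Nearest-rank p50/p95 over recorded call latencies (illustrative helper)."""
    if not latencies_ms:
        return {"p50_ms": None, "p95_ms": None}
    ordered = sorted(latencies_ms)

    def pct(p: float) -> float:
        # nearest-rank: smallest value with at least p% of samples at or below it
        k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
        return ordered[k]

    return {"p50_ms": pct(50), "p95_ms": pct(95)}

profile = latency_percentiles([120, 95, 110, 310, 105, 98, 102])
# p95 surfaces the slow outlier (310 ms) that the median alone would hide
```

This is why the profile tracks both values: a tool can look fast at p50 while its p95 reveals intermittent stalls.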

Security scanning

  • mcpsafety+ five-stage pipeline: Recon, Planner, Hacker (live probing), Auditor (CVE/arXiv research), Supervisor (final report). Enhanced over mcpsafetyscanner (Radosevich & Halloran, arXiv:2504.03767).
  • LLM provider choice for mcpsafety+: Anthropic, OpenAI, Gemini, or Ollama (local, no API key).
  • Multi-server scan: run the full pipeline against every registered server in one call via scan_all_servers.
  • Cisco AI Defense: AST and taint analysis, YARA rules, optional cloud ML engine.
  • Snyk: prompt injection, tool shadowing, toxic data flows, hardcoded secrets.
  • Kali MCP integration: if a Kali Linux MCP server is registered, quick_scan, vulnerability_scan, and traceroute run against the target host at the start of the pipeline. The results are embedded in the Recon output so the Planner can ground its attack hypotheses in real port and service data rather than guessing from tool schemas alone.
  • Burp Suite MCP integration: if a Burp Suite MCP server is registered, the Hacker stage sends raw HTTP/1.1 probes directly to the MCP endpoint (malformed JSON, missing headers, oversized payloads), triggers Collaborator out-of-band payloads to detect blind SSRF (Pro edition), and pulls automated scanner findings (Pro edition). Proxy history feeds the Auditor as raw evidence. Community edition tools run automatically; Pro-only tools are tried and silently skipped if unavailable.
  • All findings stored and surfaced automatically in subsequent preflight assessments.

Safe execution

  • Argument scanning on every tool call: 20+ attack categories (SSRF, SQL/NoSQL/LDAP/XPath injection, command injection, path traversal, XXE, template injection, prompt injection, deserialization payloads, base64-encoded variants, Windows-specific paths). When an LLM key is set, flagged args get a second-pass LLM verification to clear false positives.
  • Two-layer injection scanning on every tool output: 40+ regex patterns then LLM deep scan.
  • Injection-flagged output is quarantined and never returned to the caller.
  • Risk gating with per-tool permanent policies (allow/block) or per-call approval flow.
  • Alternatives suggestion: when a tool is blocked, the LLM ranks safer substitutes by risk reduction and functional coverage.
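The regex layer of the argument scan can be sketched with a few deliberately simplified patterns; the real scanner covers 20+ categories with far more exhaustive rules:

```python
import re

# Simplified stand-ins for a handful of the 20+ categories the scanner covers.
PATTERNS = {
    "path_traversal": re.compile(r"\.\./|\.\.\\"),
    "sql_injection": re.compile(r"(?i)\b(union\s+select|drop\s+table)\b|or\s+1\s*=\s*1"),
    "ssrf": re.compile(r"(?i)169\.254\.169\.254|127\.0\.0\.1|localhost"),
    "prompt_injection": re.compile(r"(?i)ignore\s+(all\s+)?previous\s+instructions"),
}

def scan_args(args: dict) -> list[tuple[str, str]]:
    """Return (argument_name, category) for every flagged value."""
    findings = []
    for name, value in args.items():
        for category, pattern in PATTERNS.items():
            if pattern.search(str(value)):
                findings.append((name, category))
    return findings
```

A benign query like `{"sql": "SELECT id FROM users"}` passes cleanly, while `{"path": "../../etc/passwd"}` is flagged as path traversal; the LLM second pass then decides whether a flag was a false positive.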

CLI

  • 16 subcommands covering all 17 MCP tools (list covers both list_servers and list_server_tools).
  • Interactive risk menu for call: pick an alternative, approve the original, or abort.
  • scan-all runs the full pentest pipeline across all registered servers in one command.
  • --json flag on every command for scripting and pipelines.
  • --yes / -y flag on confirmation prompts for CI use.

Transport

  • stdio (default), SSE, and streamable_http.
  • Bearer token auth middleware for HTTP transports.

Use it when you need to audit what third-party or internal MCP tools actually do before trusting them in an agent workflow.


Architecture

MCP Client (Claude Desktop, agent, mcpsafetywarden CLI)
        |
        v
  mcpsafetywarden/server.py  (FastMCP, 17 tools, rate limiting, bearer auth)
        |
        +---> mcpsafetywarden/client_manager.py  (connects to wrapped servers, records telemetry, injection scan)
        |
        +---> mcpsafetywarden/database.py        (SQLite: servers, tools, runs, profiles, scans, policies)
        |
        +---> mcpsafetywarden/classifier.py      (rule-based + LLM tool classification)
        |
        +---> mcpsafetywarden/profiler.py        (computes behavior profiles from run history)
        |
        +---> mcpsafetywarden/scanner.py         (LLM, Cisco, Snyk scan orchestration)
        |
        +---> mcpsafetywarden/mcpsafety_scanner.py (five-stage pentest pipeline)
        |
        +---> mcpsafetywarden/security_utils.py  (redaction, normalisation, injection detection helpers)

mcpsafetywarden/cli.py imports from mcpsafetywarden/server.py and mcpsafetywarden/database.py directly. It does not use the MCP protocol; it calls the same Python functions that the MCP tools call, which means no network hop for CLI usage.

Request flow for safe_tool_call:

  1. Lookup tool record and behavior profile in SQLite.
  2. Check permanent policy (allow/block).
  3. Run _preflight_assessment: compute risk level from profile and latest security scan findings.
  4. If low or medium-low risk: scan args for threats -> forward call to wrapped server via client_manager -> scan output -> record telemetry -> return result.
  5. If medium/high risk and not approved: fetch LLM-ranked alternatives, return blocked response with numbered menu.
  6. If approved or alternative selected: scan args for threats -> execute -> scan output -> record telemetry -> return result.
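The six steps above can be condensed into a runnable sketch. The in-memory dictionaries stand in for the SQLite lookups, profiler, and scanners; every name here is illustrative, not the actual implementation:

```python
# In-memory stand-ins for the SQLite lookups and scanners (illustrative only).
POLICIES = {("srv", "drop_table"): "block"}             # step 2: permanent policies
PROFILES = {"read_file": "low", "delete_file": "high"}  # step 3: risk from profile

def scan_args(args: dict) -> list[str]:                 # steps 4/6: arg threat scan
    return [k for k, v in args.items() if "../" in str(v)]

def safe_tool_call(server_id: str, tool: str, args: dict,
                   approved: bool = False) -> dict:
    if tool not in PROFILES and (server_id, tool) not in POLICIES:
        return {"status": "not_found"}                  # step 1: lookup
    if POLICIES.get((server_id, tool)) == "block":
        return {"status": "policy_blocked"}             # step 2: policy check
    risk = PROFILES.get(tool, "medium")                 # step 3: preflight
    if risk in ("medium", "high") and not approved:
        return {"status": "blocked", "risk": risk}      # step 5: gate + alternatives
    flagged = scan_args(args)
    if flagged:
        return {"status": "args_flagged", "args": flagged}
    # forward to wrapped server, scan output, record telemetry (elided)
    return {"status": "ok"}
```

Low-risk calls fall straight through to execution; high-risk calls return a blocked response until `approved=True`, and even approved calls still pass through argument scanning.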

Prerequisites

  • Python 3.10 or later.
  • pip for dependency installation.
  • At least one wrapped MCP server to proxy (stdio subprocess, SSE endpoint, or streamable_http endpoint).
  • Recommended: an API key for at least one LLM provider (Anthropic, OpenAI, Gemini, or a local Ollama instance).

Why an LLM key matters:

The wrapper has two operating modes depending on whether an LLM is available:

  • Tool classification - without an LLM key: rule-based heuristics only, with low confidence on ambiguous tool names. With a key: the LLM resolves ambiguous cases, giving higher confidence across the board.
  • Injection scanning - without: regex patterns only (40+ rules). With: regex plus an LLM deep scan that catches obfuscated and novel injections.
  • Risk gate alternatives - without: none; the gate shows "More options" only. With: the LLM ranks safer substitute tools by risk reduction and functional coverage.
  • Security scanning - without: Snyk and Cisco only (metadata/static analysis, no LLM needed). With: the full 5-stage pentest - Recon, Planner, Hacker, Auditor, Supervisor.

Set at least one of ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY before starting the server. For a fully local setup with no API keys, run Ollama and set OLLAMA_MODEL - then pass --provider ollama (or scan_provider="ollama") explicitly on every command, as Ollama is not auto-detected from environment variables.


Installation

git clone <YOUR_REPO_URL>
cd mcpsafetywarden
pip install .

With all optional LLM providers and scanners:

pip install ".[all]"

Or pick specific extras:

pip install ".[anthropic,snyk]"

Verify the install:

mcpsafetywarden --help
mcpsafetywarden-server --help

The SQLite database is created automatically on first run in the platform user data directory (e.g. ~/.local/share/mcpsafetywarden/ on Linux, ~/Library/Application Support/mcpsafetywarden/ on macOS, %APPDATA%\mcpsafetywarden\ on Windows). Set MCP_DB_PATH to override the location.
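The resolution can be approximated as follows. The package may well use a helper library such as platformdirs internally; this sketch just mirrors the paths listed above:

```python
import os
import sys
from pathlib import Path

def default_db_dir(app: str = "mcpsafetywarden") -> Path:
    """Approximate per-platform data directory resolution (illustrative)."""
    override = os.environ.get("MCP_DB_PATH")
    if override:
        return Path(override).parent              # explicit file path wins
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / app  # %APPDATA%\mcpsafetywarden
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / app
    return Path.home() / ".local" / "share" / app # Linux / other POSIX
```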

Optional: at-rest encryption for stored credentials

The wrapper stores server env vars and HTTP headers in the database. To encrypt them at rest:

pip install cryptography
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

Set the printed key as MCP_DB_ENCRYPTION_KEY before starting the server. Keep this key safe; losing it makes stored credentials unrecoverable.


Configuration

All configuration is via environment variables. No config file is required.

  • MCP_TRANSPORT (default stdio) - transport mode: stdio, sse, or streamable_http
  • MCP_HOST (default 127.0.0.1) - bind address for HTTP transports
  • MCP_PORT (default 8000) - bind port for HTTP transports
  • MCP_AUTH_TOKEN (unset) - bearer token for HTTP transport auth; unset means no auth (a log warning is emitted)
  • MCP_DB_ENCRYPTION_KEY (unset) - Fernet key to encrypt env_json and headers_json at rest
  • ANTHROPIC_API_KEY (unset) - enables Anthropic as LLM provider for classification and scanning
  • OPENAI_API_KEY (unset) - enables OpenAI as LLM provider
  • GEMINI_API_KEY (unset) - enables Gemini as LLM provider
  • GOOGLE_API_KEY (unset) - legacy alias for GEMINI_API_KEY
  • OLLAMA_MODEL (unset) - model name for the Ollama provider (e.g. llama3.1, mistral)
  • OLLAMA_BASE_URL (default http://localhost:11434/v1) - Ollama API base URL (OpenAI-compatible)
  • SNYK_TOKEN (unset) - enables Snyk E001 prompt-injection detection
  • MCP_SCANNER_API_KEY (unset) - Cisco AI Defense API key for the cloud ML engine
  • MCP_SCANNER_LLM_API_KEY (unset) - LLM key for Cisco internal AST analysis (falls back to OPENAI_API_KEY)
  • MCP_DB_PATH (unset) - override the SQLite database file path

Example .env for local development:

MCP_TRANSPORT=stdio
ANTHROPIC_API_KEY=sk-ant-...
MCP_DB_ENCRYPTION_KEY=<generated_fernet_key>

Security note: Never commit API keys or the encryption key to version control. Pass them via environment variables or a secrets manager. The wrapper strips its own secrets (MCP_AUTH_TOKEN, MCP_DB_ENCRYPTION_KEY, and all LLM/scanner API keys) from the child process environment before spawning stdio servers. Other variables present in the parent environment are passed through.


Auxiliary Security Tool Integrations

The wrapper detects Kali and Burp by looking for registered servers whose server_id contains "kali" or "burp" (case-insensitive). Registration is the only setup step - once registered, the tools activate automatically on every scan, ping, and replay test.
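The detection rule is a plain case-insensitive substring match, which can be sketched as:

```python
def detect_aux_servers(server_ids: list[str]) -> dict:
    """First registered server whose id contains 'kali' or 'burp' (any case)."""
    return {
        "kali": next((s for s in server_ids if "kali" in s.lower()), None),
        "burp": next((s for s in server_ids if "burp" in s.lower()), None),
    }
```

This is why the setup steps below insist the server_id contain "kali" or "burp" - an id like `recon-tools` would never activate the integration.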

Kali Linux MCP (ccq1/awsome_kali_MCPServers)

Docker-based, Apache 2.0, no auth. Adds real network reconnaissance to the Recon stage and network data to ping_server.

What it contributes:

  • Recon (before the Planner) - calls quick_scan(target), vulnerability_scan(target), and traceroute(target). Adds open ports, running services, OS fingerprint, and network path; the Planner uses this to craft targeted hypotheses.
  • ping_server - calls quick_scan(target) and traceroute(target). Adds network reachability detail beyond the MCP protocol ping (sse/streamable_http only - no network target for stdio).

Setup:

# 1. Install Docker Desktop (if not already installed)
#    Windows: winget install Docker.DockerDesktop
#    macOS:   brew install --cask docker
#    Linux:   https://docs.docker.com/engine/install/

# 2. Clone and build the image
git clone https://github.com/ccq1/awsome_kali_MCPServers
cd awsome_kali_MCPServers
docker build -t kali-mcps:latest .

# 3. Register with the wrapper (server_id must contain "kali")
mcpsafetywarden register kali-mcp \
  --transport stdio \
  --command docker \
  --args '["run", "-i", "kali-mcps:latest"]'

Note: vulnerability_scan runs nmap vuln scripts which can take 60-90 seconds per target. On scan-all across many servers this adds up. Register only when you want network recon in your scans.

Burp Suite MCP (PortSwigger/mcp-server)

Kotlin, GPL-3.0, no auth, runs as an SSE server on port 9876. Community edition tools always run; Pro-only tools (Collaborator, scanner) are tried and silently skipped on failure.

What it contributes:

  • Hacker (after LLM probing) - SendHttp1Request x3 (Community): raw HTTP probes - malformed JSON body, missing Content-Type, oversized method field.
  • Hacker - GenerateCollaboratorPayload, GetCollaboratorInteractions (Pro): out-of-band DNS/HTTP callbacks; detects blind SSRF and blind injection.
  • Hacker - GetScannerIssues (Pro): automated active scanner findings against the MCP endpoint.
  • Auditor - GetProxyHttpHistoryRegex (Community): raw HTTP traffic evidence for every finding the Auditor validates.
  • run_replay_test - GetProxyHttpHistoryRegex (Community): HTTP traffic captured during both tool calls, appended to the replay result.

Setup:

# 1. Install Burp Suite (Community or Professional)
#    Download from https://portswigger.net/burp/releases

# 2. Build the MCP extension JAR
git clone https://github.com/PortSwigger/mcp-server.git
cd mcp-server
./gradlew embedProxyJar
# produces build/libs/burp-mcp-all.jar

# 3. Load into Burp
#    Burp -> Extensions -> Add -> Java type -> select burp-mcp-all.jar
#    Then go to the "MCP" tab in Burp and enable the server.
#    SSE endpoint starts at http://127.0.0.1:9876/sse

# 4. Register with the wrapper (server_id must contain "burp")
mcpsafetywarden register burp-mcp \
  --transport sse \
  --url http://127.0.0.1:9876/sse

Snyk (snyk-agent-scan)

Python, Apache 2.0, requires a free Snyk account token. Connects to the target MCP server, lists its tools, and runs static analysis on the tool metadata (names, descriptions, schemas). It does not call any tools - it only reads what the server advertises.

What it checks:

Code Severity Check
E001 HIGH Prompt injection strings in tool descriptions or schemas
E002 HIGH Tool shadowing (a tool impersonates another)
E004 HIGH Prompt injection embedded in skill definitions
E005 HIGH Suspicious download URLs in tool metadata
E006 HIGH Malicious code patterns in descriptions
W007 HIGH Insecure credential handling patterns
W008 HIGH Hardcoded secrets in tool metadata
W009 MEDIUM Direct financial execution capabilities
W011 MEDIUM Untrusted third-party content references
W012 HIGH Unverifiable external dependencies
W013 MEDIUM System service modification capabilities
W015 MEDIUM Untrusted content flows
W017 MEDIUM Sensitive data exposure patterns
W019 MEDIUM Destructive capabilities
W001 LOW Suspicious words
W014 LOW Missing skill documentation
W016 LOW Potential untrusted content
W018 LOW Workspace data exposure
W020 LOW Local destructive capabilities

E001 (prompt injection) requires a Snyk token for Snyk's AI-based detection. All other checks are structural or pattern-based, run fully offline, and degrade gracefully if the token is missing or invalid.

How it runs:

Snyk is invoked as a subprocess (snyk-agent-scan) with a temporary config JSON pointing at the target server. The binary opens its own live MCP connection, fetches the tool list, analyzes the metadata, and returns JSON findings. The wrapper normalizes these into its common findings format and stores them in the database, where they are automatically included in future preflight_tool_call responses.
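The normalization step can be sketched as a small mapping function. The raw input keys used here ("code", "severity", "check", "tool") are assumptions for illustration, not snyk-agent-scan's documented output schema; only the common output shape matters for later preflight lookups:

```python
def normalize_finding(raw: dict) -> dict:
    """Map a raw scanner finding into a hypothetical common findings shape.

    Input field names are illustrative assumptions, not the real schema.
    """
    return {
        "source": "snyk",
        "code": raw.get("code"),                       # e.g. "E001", "W008"
        "severity": (raw.get("severity") or "").upper(),
        "title": raw.get("check", ""),
        "tool_name": raw.get("tool"),                  # None for server-level findings
    }
```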

Setup:

pip install snyk-agent-scan

Get a free token at app.snyk.io/account. Set it as an environment variable:

export SNYK_TOKEN=snyk_uat.<your_token>

Or pass it directly on the scan command:

mcpsafetywarden scan my-server --provider snyk --api-key snyk_uat.<your_token> --yes

Unlike Kali and Burp, Snyk is not auto-activated on every scan - it only runs when explicitly chosen as the provider via --provider snyk or provider="snyk".


CLI Reference

Global flags

All commands support --json for machine-readable output. Commands with confirmation prompts support --yes / -y to skip them.

Typical workflow

# Register, inspect, and scan a local stdio server in one step
mcpsafetywarden onboard my-server \
  --transport stdio \
  --command python \
  --args '["my_mcp_server.py"]' \
  --scan-provider anthropic

# Check what tools were discovered
mcpsafetywarden list my-server

# Execute a tool safely
mcpsafetywarden call my-server read_file --args '{"path": "/tmp/data.txt"}'

# Execute a risky tool (interactive menu appears if blocked)
mcpsafetywarden call my-server delete_file --args '{"path": "/tmp/old.txt"}'

call interactive flow when a tool is blocked:

⚠ Blocked  risk: HIGH
  1.  list_files  -- reduction: HIGH  coverage: partial
  2.  More options

Pick: 2

  B.  Proceed with original tool despite risk
  C.  Abort

Pick [B/b/C/c]: B

✓  142ms  [explicit_approval]

To bypass the menu in scripts, pass --approved:

mcpsafetywarden call my-server delete_file \
  --args '{"path": "/tmp/old.txt"}' \
  --approved

Commands

list [server_id] List all registered servers. Pass server_id to list tools on a specific server.

mcpsafetywarden list
mcpsafetywarden list my-server
mcpsafetywarden list my-server --json

onboard <server_id> Register + inspect + security scan in one call. Prompts for authorization before scanning unless --yes is passed.

mcpsafetywarden onboard my-server --transport stdio --command python --args '["server.py"]'
mcpsafetywarden onboard my-server --transport streamable_http --url https://mcp.example.com/mcp \
  --headers '{"Authorization": "Bearer TOKEN"}' \
  --scan-provider anthropic --scan-model claude-opus-4-7 --scan-api-key sk-ant-... --yes

register <server_id> Register only, without scanning.

mcpsafetywarden register my-server --transport stdio --command python --args '["server.py"]'
mcpsafetywarden register my-server --transport stdio --command python --no-inspect
mcpsafetywarden register my-server --transport stdio --command python --args '["server.py"]' --provider anthropic

inspect <server_id> Reconnect to a registered server, refresh tools, re-classify.

mcpsafetywarden inspect my-server --provider anthropic
mcpsafetywarden inspect my-server --provider anthropic --model claude-opus-4-7 --api-key sk-ant-...

scan <server_id> Run a security scan against a single server. Prompts for authorization before probing. Supported --provider values:

  • anthropic, openai, gemini, ollama - mcpsafety+ 5-stage pipeline (Recon -> Planner -> Hacker -> Auditor -> Supervisor)
  • cisco - Cisco AI Defense: AST taint analysis, YARA rules, optional cloud ML engine
  • snyk - Snyk: prompt injection, tool shadowing, toxic data flows, hardcoded secrets

For Ollama set OLLAMA_MODEL before running. Web research (DuckDuckGo/HackerNews/arXiv CVE lookup in the Auditor stage) is skipped by default to avoid leaking findings externally; pass --web-research to enable it.

If a Kali MCP server is registered, nmap and traceroute results are shown after the findings table and included in --json output under network_scan. If a Burp Suite MCP server is registered, the number of HTTP-layer findings Burp contributed is shown as a summary line; use --json for the full evidence.

mcpsafetywarden scan my-server --provider anthropic
mcpsafetywarden scan my-server --provider anthropic --model claude-opus-4-7 --api-key sk-ant-...
mcpsafetywarden scan my-server --provider ollama              # local model, no API key
mcpsafetywarden scan my-server --provider cisco
mcpsafetywarden scan my-server --provider anthropic --web-research --destructive --timeout 600 --yes

scan-all Run the full 5-stage mcpsafety+ pipeline against every registered server (or a comma-separated subset via --servers). Results are stored per server and displayed as a combined risk table. Only mcpsafety+ providers are supported (not cisco or snyk). Web research is skipped by default; pass --web-research to enable.

mcpsafetywarden scan-all --provider anthropic
mcpsafetywarden scan-all --provider anthropic --model claude-opus-4-7 --api-key sk-ant-...
mcpsafetywarden scan-all --provider ollama --servers my-server,other-server --yes
mcpsafetywarden scan-all --provider openai --web-research --timeout 600 --json

call <server_id> <tool_name> Execute a tool through the risk gate. Interactive menu appears if the tool is blocked.

Every argument value is scanned for 20+ attack categories (SSRF, SQL/NoSQL/LDAP/XPath injection, command injection, path traversal, XXE, prompt injection, deserialization payloads, base64-encoded variants, and more) before the call is forwarded. If an LLM key is set, a second-pass LLM verification runs on flagged args to clear false positives. Without an LLM key, the CLI prompts you to confirm before proceeding.

mcpsafetywarden call my-server search_web --args '{"query": "site:example.com"}'
mcpsafetywarden call my-server delete_file --args '{"path": "/tmp/x"}' --approved
mcpsafetywarden call my-server run_query --args '{"sql": "SELECT id FROM users"}' --args-scan-override

  • --approved - bypass the risk gate for a high-risk tool you have reviewed
  • --args-scan-override - skip argument safety scanning (use only when you trust the args)
  • --provider - LLM provider for alternatives and arg verification (anthropic|openai|gemini|ollama)

preflight <server_id> <tool_name> Assess risk without executing.

mcpsafetywarden preflight my-server delete_file
mcpsafetywarden preflight my-server delete_file --provider anthropic --model claude-opus-4-7 --api-key sk-ant-...

profile <server_id> <tool_name> Print the full behavior profile.

mcpsafetywarden profile my-server read_file --json

retry-policy <server_id> <tool_name> Print retry and timeout recommendations.

mcpsafetywarden retry-policy my-server call_api
mcpsafetywarden retry-policy my-server call_api --provider anthropic --model claude-opus-4-7 --api-key sk-ant-...

alternatives <server_id> <tool_name> List safer alternatives to a tool.

mcpsafetywarden alternatives my-server delete_file --provider anthropic

replay <server_id> <tool_name> Run the tool twice and compare outputs. Prompts for confirmation. If a Burp Suite MCP server is registered, Burp proxy traffic captured during both calls is appended to the result - useful for spotting network-level differences even when output text is identical.

mcpsafetywarden replay my-server get_status --args '{"id": "123"}' --yes

policy <server_id> <tool_name> Read or set a permanent execution policy. Without --set, prints the current policy.

By default no policy is set and safe_tool_call decides at runtime based on the behavior profile: low or medium-low risk tools run immediately, medium/high-risk tools trigger the approval gate. Setting a policy overrides that completely - allow bypasses the risk gate (argument scanning still runs unless --args-scan-override is also passed), block rejects unconditionally.

mcpsafetywarden policy my-server read_file             # read current policy
mcpsafetywarden policy my-server read_file --set allow  # always execute without preflight
mcpsafetywarden policy my-server drop_table --set block # never execute
mcpsafetywarden policy my-server read_file --set clear  # remove policy, resume normal flow

history <server_id> <tool_name> Show recent execution history.

mcpsafetywarden history my-server delete_file --limit 50

ping <server_id> Check if a server is reachable. If a Kali MCP server is registered and the pinged server uses the sse or streamable_http transport, also runs quick_scan and traceroute against the target host and displays the output in labeled panels. Stdio servers have no network address to scan so Kali recon is skipped.

mcpsafetywarden ping my-server

get-scan <server_id> Print the latest stored security scan report.

mcpsafetywarden get-scan my-server --json

Exit codes:

  • 0: success
  • 1: error (tool not found, blocked by policy, unreachable server, invalid input)

MCP Integration

Connecting with Claude Desktop

Add the wrapper to claude_desktop_config.json:

{
  "mcpServers": {
    "mcpsafetywarden": {
      "command": "mcpsafetywarden-server",
      "args": [],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "MCP_DB_ENCRYPTION_KEY": "<generated_fernet_key>"
      }
    },

    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/yourname/Documents"]
    },

    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."
      }
    }
  }
}

The wrapper and the servers it proxies are registered separately in Claude Desktop. Claude sees all of them - but you route calls through mcpsafetywarden (using safe_tool_call, preflight_tool_call, etc.) instead of calling filesystem or github directly. First register each server with the wrapper:

mcpsafetywarden register filesystem --transport stdio \
  --command npx \
  --args '["-y", "@modelcontextprotocol/server-filesystem", "/Users/yourname/Documents"]'

mcpsafetywarden register github --transport stdio \
  --command npx \
  --args '["-y", "@modelcontextprotocol/server-github"]' \
  --env '{"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."}'

Using the wrapper as a mandatory gateway for all tool calls

Instead of adding every MCP server to claude_desktop_config.json, you can add only the wrapper and register all other servers inside it. Claude then has no direct path to any underlying server - every tool call must go through safe_tool_call, making the wrapper a mandatory enforcement point for risk gating, arg scanning, and output inspection across your entire MCP setup.

claude_desktop_config.json - wrapper only:

{
  "mcpServers": {
    "mcpsafetywarden": {
      "command": "mcpsafetywarden-server",
      "args": [],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Register your servers once via CLI before starting Claude Desktop:

mcpsafetywarden register github --transport stdio \
  --command npx \
  --args '["-y", "@modelcontextprotocol/server-github"]' \
  --env '{"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..."}'

mcpsafetywarden register slack --transport stdio \
  --command npx \
  --args '["-y", "@modelcontextprotocol/server-slack"]' \
  --env '{"SLACK_BOT_TOKEN": "xoxb-..."}'

Claude sees only the wrapper's 17 tools. To use github or slack it must call safe_tool_call(server_id="github", ...) - there is no other route. Registration is enforced because safe_tool_call rejects any server_id that is not registered.

Field notes:

  • command (required) - mcpsafetywarden-server after pip install.
  • ANTHROPIC_API_KEY (strongly recommended) - enables LLM classification, deep injection scanning, risk gate alternatives, and the full mcpsafety+ pentest pipeline. Use OPENAI_API_KEY or GEMINI_API_KEY instead if preferred. Without any key the wrapper operates in rule-based-only mode - see Prerequisites.
  • MCP_DB_ENCRYPTION_KEY (recommended) - encrypts stored server credentials (env vars, headers) at rest. Generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
  • MCP_TRANSPORT (optional) - defaults to stdio. Leave as-is for Claude Desktop.
  • MCP_AUTH_TOKEN (optional) - not needed for stdio; only relevant for HTTP deployments. Omit or leave empty.

Restart Claude Desktop. All 17 wrapper tools appear in Claude's tool list.

Connecting with an HTTP client

MCP_TRANSPORT=streamable_http MCP_AUTH_TOKEN=mytoken mcpsafetywarden-server

Configure your MCP client to connect to http://127.0.0.1:8000/mcp with header Authorization: Bearer mytoken.

Available MCP tools

Tool What it does
onboard_server Register + inspect + security scan in one call
register_server Register a server; optionally auto-inspect
inspect_server Refresh tool list and profiles
list_servers List all registered servers
list_server_tools List tools on a server with summary profiles
preflight_tool_call Risk assessment without execution
safe_tool_call Execute with risk gating and interactive alternatives
get_tool_profile Full behavior profile with observed stats
get_retry_policy Retry and timeout recommendations
suggest_safer_alternative LLM-ranked safer substitutes
run_replay_test Idempotency test (runs tool twice); appends Burp proxy traffic if Burp is registered
security_scan_server Live security audit (mcpsafety+, Cisco, Snyk); Kali nmap enriches Recon, Burp adds HTTP-layer probes to Hacker and evidence to Auditor
scan_all_servers Run mcpsafety+ pipeline across all registered servers
get_security_scan Latest stored scan report
set_tool_policy Permanent allow/block policy for a tool
get_run_history Recent execution history
ping_server Reachability check with latency; adds Kali nmap + traceroute if Kali is registered

Project Structure

mcpsafetywarden/
├── mcpsafetywarden/
│   ├── server.py               # FastMCP server, all MCP tools, rate limiting, bearer auth
│   ├── cli.py                  # CLI entry point (typer + rich)
│   ├── client_manager.py       # Connects to wrapped servers, injection scanning, telemetry
│   ├── database.py             # SQLite persistence (servers, tools, runs, profiles, scans, policies)
│   ├── classifier.py           # Static rule-based + LLM tool classification
│   ├── profiler.py             # Builds behavior profiles from run history
│   ├── scanner.py              # LLM, Cisco AI Defense, Snyk scan orchestration
│   ├── mcpsafety_scanner.py    # Five-stage pentest pipeline (Recon, Planner, Hacker, Auditor, Supervisor)
│   └── security_utils.py       # Text normalisation, redaction, credential detection
├── tests/
│   └── test_suite.py
├── docs/
│   └── COMPARISON.md
├── assets/
│   └── logo.png
└── pyproject.toml

The database (behavior_profiles.db) is stored in the platform user data directory, not in the project root. Override with MCP_DB_PATH.


Development

Install in editable mode with all extras:

pip install -e ".[all]"

Run the server in stdio mode and observe logs:

mcpsafetywarden-server 2>server.log

Run the CLI against a test server:

mcpsafetywarden onboard test-server --transport stdio --command python --args '["<YOUR_TEST_SERVER>.py"]'
mcpsafetywarden list test-server
mcpsafetywarden call test-server <tool_name>

Adding a new MCP tool:

  1. Define an async (or sync) function in mcpsafetywarden/server.py decorated with @mcp.tool().
  2. Use db.* for persistence, cm.call_tool_with_telemetry for proxied execution.
  3. Add a corresponding CLI command in mcpsafetywarden/cli.py with @app.command().
  4. Follow the existing pattern: validate input, check rate limit if it is a management operation, return json.dumps(...).

Logging:

Every module uses logging.getLogger(__name__). The server does not call logging.basicConfig itself - configure logging in your entry point or launcher script before importing the server. Example: logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(name)s %(levelname)s %(message)s").


Testing

A test suite is available at tests/test_suite.py. Run it with:

python tests/test_suite.py

Set ANTHROPIC_API_KEY (or another provider key) before running if you want LLM-assisted classification and scanning tests to execute. To validate behavior manually:

Verify tool classification:

mcpsafetywarden onboard test-server --transport stdio --command python --args '["<YOUR_MCP_SERVER>.py"]'
mcpsafetywarden list test-server --json

Check that effect_class values match what you expect for each tool.

Verify injection scanning:

Call a tool that returns text content. Inject a test pattern such as "Ignore all previous instructions" into the tool output (by modifying the wrapped server temporarily) and confirm the wrapper returns a quarantined response.

Verify risk gating:

mcpsafetywarden preflight test-server <high_risk_tool>
mcpsafetywarden call test-server <high_risk_tool>
# Should block and show alternatives menu
mcpsafetywarden call test-server <high_risk_tool> --approved
# Should execute

Verify policy enforcement:

mcpsafetywarden policy test-server <tool_name> --set block
mcpsafetywarden call test-server <tool_name>
# Should return policy_blocked immediately
mcpsafetywarden policy test-server <tool_name> --set clear

Deployment

Starting the server

stdio (default):

mcpsafetywarden-server

The server reads from stdin and writes to stdout. This is the mode used by Claude Desktop and other MCP clients that manage the subprocess.

HTTP (streamable_http):

MCP_TRANSPORT=streamable_http MCP_PORT=8000 mcpsafetywarden-server

Set MCP_AUTH_TOKEN to require bearer auth on all requests:

MCP_TRANSPORT=streamable_http MCP_AUTH_TOKEN=mysecrettoken mcpsafetywarden-server

SSE:

MCP_TRANSPORT=sse MCP_PORT=8000 mcpsafetywarden-server

Local (stdio with Claude Desktop)

Set up claude_desktop_config.json as shown in the MCP Integration section. No additional setup is needed.

Local HTTP server

MCP_TRANSPORT=streamable_http \
MCP_HOST=127.0.0.1 \
MCP_PORT=8000 \
MCP_AUTH_TOKEN=<your_secret_token> \
ANTHROPIC_API_KEY=<your_key> \
mcpsafetywarden-server

Container

A Dockerfile is not included. A minimal setup:

FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir .
ENV MCP_TRANSPORT=streamable_http
ENV MCP_HOST=0.0.0.0
ENV MCP_PORT=8000
EXPOSE 8000
CMD ["mcpsafetywarden-server"]

Pass MCP_AUTH_TOKEN, MCP_DB_ENCRYPTION_KEY, and API keys as container environment variables. Do not bake them into the image.

Production considerations

  • Rate limiting is in-process and resets on restart. For multi-replica deployments, replace the deque-based limiter with a shared store such as Redis.
  • Database is a local SQLite file. For shared deployments, consider replacing with a networked database.
  • Bearer auth covers the HTTP transport layer. For multi-tenant deployments, place an API gateway (nginx, Caddy, AWS API Gateway) in front and leave MCP_AUTH_TOKEN unset.
  • Logging goes to stderr by default via Python's logging module. Redirect and aggregate as needed for your observability stack.
  • Database permissions are set to owner-only (0o600) on POSIX systems. On Windows this is a no-op; use filesystem ACLs.
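The in-process deque-based limiter mentioned in the first point works roughly like this (a sketch of the technique, not the wrapper's exact code):

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most max_calls per window_seconds; state is in-process only,
    so it resets on restart and is not shared across replicas."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: deque = deque()  # timestamps of recent allowed calls

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

A Redis-backed replacement would keep the same interface but store the timestamps in a shared sorted set instead of a local deque.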

Troubleshooting

Tool '<name>' not found on server '<id>'. Run mcpsafetywarden inspect <server_id> to refresh the tool list from the live server.

Server '<id>' not registered. Run mcpsafetywarden register or mcpsafetywarden onboard first.

Rate limit exceeded. There are two separate rate limits:

  • Management operations (register, inspect, scan, replay, etc.): 10 calls per 60 seconds per server and 100 globally. Limits are in mcpsafetywarden/server.py (_MGMT_RATE_LIMIT_MAX, _GLOBAL_RATE_LIMIT_MAX).
  • Tool calls via safe_tool_call: 20 calls per 60 seconds per tool. Limit is in mcpsafetywarden/client_manager.py (_RATE_LIMIT_MAX_CALLS).

Wait for the window to expire. For heavy automation, batch operations or increase the relevant limit constants.

URL targets a private or restricted address. The SSRF filter blocked a private IP, localhost, or cloud metadata endpoint. This is intentional. If you are proxying a legitimate internal server, use the stdio transport instead.
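A sketch of the kind of check such a filter performs, using only the standard library (the hostname list here is illustrative, and a real filter would also resolve DNS names before checking):

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative blocklist; the wrapper's actual list may differ.
BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal"}

def is_restricted_url(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    if host in BLOCKED_HOSTNAMES:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP; a full filter would resolve it first
    # Private, loopback, and link-local ranges (the AWS/GCP metadata
    # endpoint lives at link-local 169.254.169.254).
    return ip.is_private or ip.is_loopback or ip.is_link_local
```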

Registering a shell interpreter with an eval flag is not permitted. You tried to register bash -c or similar. Use a dedicated MCP server script as the command instead of a shell one-liner.

LLM classification shows confidence: 0% for all tools. No LLM API key was found. Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY. Classification falls back to rule-based when no key is available, which gives lower confidence on ambiguous tool names.

Scan fails immediately with confirm_authorized must be True. The mcpsafety+ scanner requires explicit authorization before sending live probes. Pass --yes on the CLI or confirm_authorized=True on the MCP tool.

snyk-agent-scan not available. Install with pip install snyk-agent-scan. If the binary is installed but not on PATH, the wrapper falls back to the Python module invocation automatically. If both fail, check that the install completed without errors and that pip show snyk-agent-scan shows the package.

SNYK_TOKEN is required for snyk-agent-scan. Set SNYK_TOKEN=snyk_uat.<your_token> in your environment or pass --api-key snyk_uat.<your_token> on the CLI. Get a free token at app.snyk.io/account.

Snyk scan returns 0 findings on a server that has obvious issues. Snyk analyzes tool metadata only - it does not call tools or inspect server-side logic. If the malicious content is not present in tool names, descriptions, or schemas as advertised by the server, Snyk will not detect it. Use --provider anthropic (or another LLM) with --yes for active probing.

MCP_DB_ENCRYPTION_KEY is set but Fernet init failed. The key is malformed. Regenerate it with:

python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
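A valid Fernet key decodes to exactly 32 bytes of URL-safe base64 (44 characters with padding). A quick standard-library shape check for diagnosing a malformed key (a diagnostic sketch, not part of the wrapper):

```python
import base64

def looks_like_fernet_key(key: str) -> bool:
    """True if the string has the shape of a Fernet key:
    URL-safe base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode())
    except ValueError:
        return False
    return len(raw) == 32
```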

Decryption failure logged at ERROR level. The encryption key changed after data was written (key rotation). The affected server's env and headers fields will read as empty until you re-register the server, which re-writes the data under the new key.


Security

Secrets in arguments The wrapper redacts credential-shaped values (JWTs, API keys, PEM blocks, long hex and base64 blobs) from tool arguments before storing them. If a secret is detected in an argument, a warning is included in the telemetry response. Prefer setting secrets as environment variables on the wrapped server rather than passing them as tool arguments.
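A sketch of how credential-shaped redaction can work (the patterns and the redaction marker here are illustrative; the wrapper's real rule set is broader):

```python
import re

# Illustrative credential-shaped patterns; not the wrapper's actual set.
SECRET_PATTERNS = [
    (re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "JWT"),
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), "PEM"),
    (re.compile(r"\b[0-9a-fA-F]{40,}\b"), "hex-blob"),
]

def redact(value: str) -> str:
    """Replace credential-shaped substrings before the value is stored."""
    for pattern, label in SECRET_PATTERNS:
        value = pattern.sub(f"<redacted:{label}>", value)
    return value
```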

Child process isolation When spawning stdio servers, the wrapper strips its own secrets (MCP_AUTH_TOKEN, MCP_DB_ENCRYPTION_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, GOOGLE_API_KEY, SNYK_TOKEN, MCP_SCANNER_API_KEY, MCP_SCANNER_LLM_API_KEY) from the child process environment. Supply needed env vars explicitly via the env parameter in register_server.
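The stripping step can be sketched like this (the key list comes from the text above; the function itself is illustrative):

```python
import os
from typing import Dict, Optional

# The wrapper's own secrets, stripped before spawning a stdio child.
SENSITIVE_KEYS = {
    "MCP_AUTH_TOKEN", "MCP_DB_ENCRYPTION_KEY", "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY", "GEMINI_API_KEY", "GOOGLE_API_KEY", "SNYK_TOKEN",
    "MCP_SCANNER_API_KEY", "MCP_SCANNER_LLM_API_KEY",
}

def child_environment(extra_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Parent env minus the wrapper's secrets, plus explicitly supplied vars."""
    env = {k: v for k, v in os.environ.items() if k not in SENSITIVE_KEYS}
    env.update(extra_env or {})
    return env
```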

Input validation All server IDs, URLs, commands, and argument values are length-checked before storage. URLs are checked against the SSRF blocklist. Shell interpreters with eval flags are rejected at registration time.

HTTP auth Set MCP_AUTH_TOKEN for any HTTP deployment. The token is compared with hmac.compare_digest to prevent timing attacks. Without a token, the server logs a warning and accepts all connections.
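The comparison reduces to a one-liner; a plain `==` short-circuits on the first differing byte, which can leak how much of the token an attacker has guessed:

```python
import hmac

def token_ok(presented: str, expected: str) -> bool:
    # Constant-time comparison: runtime does not depend on where
    # the two values first differ.
    return hmac.compare_digest(presented.encode(), expected.encode())
```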

Database Enable at-rest encryption with MCP_DB_ENCRYPTION_KEY to protect stored server credentials. The database file is set to 0o600 on POSIX systems.

Argument scanning Every tool call argument is scanned for 20+ attack categories before the call is forwarded to the wrapped server. If an LLM key is available, flagged values are sent for a second-pass LLM verification to clear false positives. Blocked calls return a structured response showing exactly which argument triggered which category. Pass args_scan_override=True (or --args-scan-override on the CLI) to bypass after manual review.

Injection quarantine Tool output flagged as a prompt injection attempt is stored in the database under the run ID but is never returned to the calling agent. The response contains a quarantine notice and the run ID for forensic review.


Contributing

  1. Fork the repository and create a branch from main.
  2. Make your changes. Keep functions focused. Follow the existing pattern: validation first, then logic, then return json.dumps(...) for MCP tools.
  3. Test manually using the CLI against a real or mock MCP server.
  4. Open a pull request with a clear description of what changed and why.

Code standards:

  • No inline comments unless the reason is non-obvious.
  • No docstring blocks beyond the existing MCP tool docstrings (which are user-facing).
  • Match the surrounding code style: Optional[str] type hints, _log.warning/error for operator-visible events, _log.debug for internal traces.

License

Apache License 2.0. See LICENSE for details.


Roadmap

  • Automated test suite (unit tests for classifier, profiler, and security_utils; integration tests with a mock MCP server).
  • Redis-backed rate limiting for multi-replica deployments.
  • Schema drift detection: alert when a wrapped tool's input or output schema changes between runs.
  • Web dashboard for server health, tool risk overview, and run history.
