Intercepting gateway proxy for MCP clients/servers: real-time PII redaction with regex, NLP, and optional subinterpreter concurrency

mcp-shield-pii

🛡️ Real-time PII redaction proxy for MCP clients and servers: zero-latency privacy for Python 3.12+, with optional Python 3.14 subinterpreter acceleration.

mcp-shield-pii is an intercepting gateway proxy that sits between your MCP client (e.g., Claude Desktop) and any downstream MCP server. It detects and masks Personally Identifiable Information in real time, before it reaches the LLM's context window, supporting GDPR/HIPAA compliance with a single pip install.

Why mcp-shield-pii?

When an AI agent requests data from an MCP server, the raw payload (potentially containing SSNs, medical records, or credit card numbers) flows directly into the LLM. Organizations that leak such data face potential GDPR/HIPAA fines that can run into hundreds of millions of dollars. mcp-shield-pii addresses this risk at the protocol layer by masking PII before the payload reaches the model.

┌──────────────┐     ┌─────────────────┐     ┌──────────────────┐
│ Claude       │────▶│ mcp-shield-pii  │────▶│ Downstream MCP   │
│ Desktop      │◀────│ (PII Redaction) │◀────│ Server           │
└──────────────┘     └─────────────────┘     └──────────────────┘
                         ▲
                    PII masked before
                    reaching the LLM

Installation

pip install mcp-shield-pii

For NLP-based detection (names, organizations, addresses):

pip install mcp-shield-pii[nlp]
python -m spacy download en_core_web_sm

Quick Start

1. Scan text for PII

# Simple scan
mcp-shield-pii scan "Contact john@example.com, SSN 123-45-6789"

# JSON output
mcp-shield-pii scan --json "Patient MRN-123456 at 192.168.1.1"

# Different masking strategies
mcp-shield-pii scan --strategy partial "Card: 4111-1111-1111-1111"
mcp-shield-pii scan --strategy hash "Email: secret@corp.com"
mcp-shield-pii scan --strategy pseudo "Call 555-123-4567"

2. Start the proxy

# Basic proxy (stdio transport)
mcp-shield-pii proxy --downstream "npx -y @modelcontextprotocol/server-postgres postgresql://localhost/mydb"

# With config file
mcp-shield-pii proxy --downstream "python my_server.py" --config shield.toml

# Dry-run mode (log detections, don't modify payloads)
mcp-shield-pii proxy --downstream "npx my-mcp-server" --dry-run
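
The interception step can be pictured roughly as follows. This is a minimal sketch, not the package's actual code: it assumes newline-delimited JSON-RPC messages (as in MCP's stdio transport) and tool results whose `content` items have type `text`. The names `mask_text` and `filter_line` are hypothetical, and a single email regex stands in for the real detection engines.

```python
import json
import re

# Stand-in detector: one email pattern only (the real engines cover far more).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_text(text: str) -> str:
    """Hypothetical masker: replace every email with a fixed placeholder."""
    return EMAIL_RE.sub("<REDACTED>", text)


def filter_line(line: str, dry_run: bool = False) -> str:
    """Mask text content in a tool-call result; pass everything else through."""
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        return line  # not JSON: forward untouched
    result = message.get("result") if isinstance(message, dict) else None
    content = result.get("content") if isinstance(result, dict) else None
    if not isinstance(content, list) or dry_run:
        return line  # requests, notifications, and dry-run traffic are unchanged
    for item in content:
        if (
            isinstance(item, dict)
            and item.get("type") == "text"
            and isinstance(item.get("text"), str)
        ):
            item["text"] = mask_text(item["text"])
    return json.dumps(message)
```

Requests such as `tools/list` carry no `result.content`, so they are forwarded byte-for-byte; only responses with text content are rewritten.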

3. Claude Desktop integration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "my-server-shielded": {
      "command": "mcp-shield-pii",
      "args": [
        "proxy",
        "--downstream", "npx -y @modelcontextprotocol/server-postgres postgresql://localhost/mydb",
        "--config", "/path/to/shield.toml"
      ]
    }
  }
}

4. Generate a config file

mcp-shield-pii generate-config --output shield.toml

5. Generate a compliance report

mcp-shield-pii report --format markdown --output compliance_report.md

6. Launch the dashboard

mcp-shield-pii dashboard --port 8765
# Open http://127.0.0.1:8765

Features

v1.0: Core

| Feature | Description |
| --- | --- |
| Stdio Proxy | Intercepts MCP stdio transport between client and downstream server |
| Regex Engine (18 types) | Detects SSNs, credit cards, emails, phones, IBANs, API keys, JWTs, and more |
| NLP Engine | Optional spaCy NER for person names, organizations, locations, addresses |
| Masking Strategies | redact (<REDACTED>), partial (***-**-6789), hash (SHA256:a1b2...), pseudo (consistent fakes) |
| TOML Configuration | Per-entity rules, per-tool allow/deny lists, confidence thresholds |
| CallToolResult Interception | Targets JSON-RPC responses while passing non-sensitive RPCs through |
| Audit Trail | JSONL audit log with timestamps, entity types, confidence scores |
| CLI | proxy, scan, report, dashboard, generate-config, version |
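
To make the masking strategies concrete, here is a minimal sketch of three of them (redact, partial, hash). The function names and the `keep`/`length` parameters are assumptions for illustration, not the package's API, and the package's own partial masking of emails may differ from this generic rule.

```python
import hashlib


def redact(value: str) -> str:
    """Replace the whole value with a fixed placeholder."""
    return "<REDACTED>"


def partial(value: str, keep: int = 4) -> str:
    """Mask every letter and digit except the last `keep`; separators stay."""
    total = sum(ch.isalnum() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isalnum():
            seen += 1
            out.append(ch if seen > total - keep else "*")
        else:
            out.append(ch)
    return "".join(out)


def hashed(value: str, length: int = 8) -> str:
    """Replace the value with a truncated SHA-256 digest."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"SHA256:{digest[:length]}"


print(partial("123-45-6789"))          # ***-**-6789
print(partial("4111-1111-1111-1111"))  # ****-****-****-1111
```

An unkeyed hash of a low-entropy value such as an SSN can be brute-forced, so a real deployment would likely want a keyed or salted digest.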

v1.1: Hardening

| Feature | Description |
| --- | --- |
| Context-Aware Scoring | Reduces false positives by analyzing surrounding text |
| Confidence Thresholds | Per-entity-type configurable minimum confidence |
| Tool Allow/Deny Lists | Skip trusted tools, enforce strict mode on sensitive ones |
| Dry-Run Mode | Log what would be redacted without modifying payloads |
| Hot-Reload Config | Change rules without restarting the proxy |
| Prometheus Metrics | /metrics endpoint with latency percentiles and entity counters |
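
Context-aware scoring can be sketched as a confidence adjustment driven by nearby words. The cue-word lists, the window size, and the adjustment amounts below are all hypothetical; the package's scorer may use entirely different signals.

```python
import re

# Hypothetical cue words; a real scorer would keep separate lists per entity type.
BOOST_WORDS = {"ssn", "social", "security"}
DAMPEN_WORDS = {"order", "invoice", "tracking", "ticket"}


def adjust_confidence(text: str, start: int, end: int, base: float, window: int = 30) -> float:
    """Raise or lower a detection's confidence using words around the match."""
    context = text[max(0, start - window):start] + " " + text[end:end + window]
    words = set(re.findall(r"[a-z]+", context.lower()))
    score = base
    if words & BOOST_WORDS:
        score += 0.2  # supporting context nearby
    if words & DAMPEN_WORDS:
        score -= 0.3  # looks like an order or ticket number instead
    return min(1.0, max(0.0, score))
```

Combined with a per-entity threshold, a pattern like `123-45-6789` next to "order" would fall below the cutoff, while the same digits next to "SSN" would pass it.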

v2.0: Enterprise

| Feature | Description |
| --- | --- |
| Pseudo-Anonymization | Consistent fake-data mapping preserving semantic meaning |
| Reversible Redaction | AES-256 encrypted mapping; authorized key-holders can restore originals |
| Compliance Dashboard | Dark-mode web UI with real-time event table and severity badges |
| GDPR/HIPAA Reports | Auto-generated compliance reports (text, JSON, markdown) |
| Webhook Alerts | Notify Slack/Teams when high-severity PII is detected |
| Subinterpreter Pool | Parallel detection without a shared GIL via concurrent.interpreters (3.14+), falling back to ProcessPoolExecutor (3.12+) |
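
Consistent pseudo-anonymization means the same input always maps to the same fake value. One way to get that property without storing a lookup table is a keyed hash, sketched below. The `pseudo_email` function, the name pool, and the `example.com` domain are illustrative assumptions, not the package's implementation.

```python
import hashlib
import hmac

# Illustrative pool of fake local parts.
FAKE_NAMES = ["alex", "blake", "casey", "devon", "emery", "finley"]


def pseudo_email(value: str, secret: bytes) -> str:
    """Map an email to a stable fake address derived from a keyed hash."""
    digest = hmac.new(secret, value.lower().encode("utf-8"), hashlib.sha256).digest()
    name = FAKE_NAMES[digest[0] % len(FAKE_NAMES)]
    suffix = digest[1:3].hex()  # four hex characters to reduce collisions
    return f"{name}.{suffix}@example.com"


key = b"demo-secret"  # assumption: a per-deployment secret
first = pseudo_email("alice@corp.com", key)
second = pseudo_email("alice@corp.com", key)
print(first == second)  # True: same input, same pseudonym
```

Unlike reversible redaction, this mapping is one-way: the original cannot be restored from the pseudonym, only re-derived by someone holding both the key and the original value.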

Detected Entity Types

Regex-Based (18 types)

| Entity | Example | Validation |
| --- | --- | --- |
| Email | user@example.com | Regex |
| Phone | +1-555-123-4567 | Regex |
| SSN | 123-45-6789 | Regex + format validation |
| Credit Card | 4111-1111-1111-1111 | Regex + Luhn checksum |
| IBAN | DE89370400440532013000 | Regex + country-code length |
| IPv4 | 192.168.1.1 | Regex |
| IPv6 | 2001:0db8::1 | Regex |
| MAC Address | 00:1A:2B:3C:4D:5E | Regex |
| AWS API Key | AKIA... | Regex (prefix) |
| OpenAI Key | sk-... | Regex (prefix) |
| Stripe Key | sk_live_... | Regex (prefix) |
| GitHub Token | ghp_... | Regex (prefix) |
| Passport | A12345678 | Regex |
| Date of Birth | 1990-01-15 | Regex |
| Medical ID | MRN-123456 | Regex |
| Driver's License | D123-4567-8901 | Regex |
| URL with Auth | https://user:pass@host | Regex |
| JWT Token | eyJhbG... | Regex (prefix) |
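
The two secondary validations in the table can be sketched as follows: the standard Luhn checksum for card numbers, and a per-country length check for IBANs. The helper names and the minimum digit count are assumptions, and the length table is a small excerpt, not the full registry.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 12:  # assumed lower bound to skip short digit runs
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Excerpt of national IBAN lengths; a full table covers many more countries.
IBAN_LENGTHS = {"DE": 22, "GB": 22, "FR": 27, "ES": 24, "NL": 18}


def iban_length_valid(iban: str) -> bool:
    """Check that the IBAN's length matches its country code."""
    compact = iban.replace(" ", "").upper()
    expected = IBAN_LENGTHS.get(compact[:2])
    return expected is not None and len(compact) == expected


print(luhn_valid("4111-1111-1111-1111"))            # True
print(iban_length_valid("DE89370400440532013000"))  # True
```

Checks like these cut false positives: a random 16-digit number passes the regex but fails Luhn about nine times out of ten.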

NLP-Based (5 types, requires [nlp] extra)

| Entity | Example |
| --- | --- |
| Person Name | John Smith |
| Organization | Acme Corp |
| Address | 123 Main St, Springfield |
| Location | New York City |
| Medical Condition | Type 2 diabetes |

Configuration (shield.toml)

[shield]
default_masking_strategy = "redact"
default_confidence_threshold = 0.7
dry_run = false

[detection]
enable_regex = true
enable_nlp = false
enable_context_scoring = true

[entities.SSN]
masking_strategy = "redact"
confidence_threshold = 0.8

[entities.EMAIL]
masking_strategy = "pseudo"
confidence_threshold = 0.7

[tools.trusted_internal_tool]
action = "skip"

[tools.patient_records_api]
action = "strict"
masking_strategy = "redact"

[[webhooks]]
url = "https://hooks.slack.com/services/YOUR/WEBHOOK"
events = ["high_severity"]

[dashboard]
enabled = true
port = 8765

[metrics]
enabled = true
port = 9090

Programmatic API

from mcp_shield_pii.detection.regex_engine import RegexDetectionEngine
from mcp_shield_pii.masking.strategies import get_strategy
from mcp_shield_pii.pipeline import ShieldPipeline
from mcp_shield_pii.config.loader import ShieldConfig

# Simple detection
engine = RegexDetectionEngine()
results = engine.detect("Email john@corp.com, SSN 123-45-6789")
for r in results:
    print(f"{r.entity_type.value}: '{r.text}' (confidence: {r.confidence:.0%})")

# Full pipeline
config = ShieldConfig(default_masking_strategy="partial")
pipeline = ShieldPipeline(config)
masked, summary = pipeline.process_text("Contact admin@secret.org, card 4111-1111-1111-1111")
print(masked)  # "Contact a***@***.org, card ****-****-****-1111"
pipeline.close()

# Pseudo-anonymization
config = ShieldConfig(default_masking_strategy="pseudo")
pipeline = ShieldPipeline(config)
masked, _ = pipeline.process_text("Email alice@corp.com then alice@corp.com again")
print(masked)  # Same fake email both times (consistent mapping)
pipeline.close()

Architecture

src/mcp_shield_pii/
├── __init__.py          # Public API exports
├── cli.py               # Typer CLI (6 commands)
├── pipeline.py          # Orchestration: detect → score → filter → mask → audit
├── compliance.py        # GDPR/HIPAA report generator
├── webhooks.py          # Async webhook alerts
├── detection/
│   ├── base.py          # EntityType enum, DetectionResult, protocols
│   ├── regex_engine.py  # 18 regex patterns + Luhn/IBAN validation
│   ├── nlp_engine.py    # spaCy NER detection (optional)
│   └── context_scorer.py # Context-aware confidence adjustment
├── masking/
│   ├── strategies.py    # Redact, partial, hash, pseudo-anonymization
│   └── reversible.py    # AES-256 Fernet reversible redaction
├── config/
│   ├── loader.py        # TOML config parser
│   └── watcher.py       # Hot-reload file watcher
├── proxy/
│   ├── __init__.py      # MCP JSON-RPC interceptor
│   └── stdio_proxy.py   # Bidirectional stdio transport
├── concurrency/
│   └── __init__.py      # Subinterpreter pool + ProcessPool fallback
├── metrics/
│   └── __init__.py      # Prometheus metrics + HTTP server
├── audit/
│   └── __init__.py      # JSONL audit logger
└── dashboard/
    └── __init__.py      # Web UI + REST API

Contributing

See CONTRIBUTING.md

License

MIT. See LICENSE for details.
