Intercepting gateway proxy for MCP clients/servers: real-time PII redaction with regex, NLP, and optional subinterpreter concurrency
mcp-shield-pii
🛡️ Real-time PII redaction proxy for MCP clients and servers: low-latency privacy for Python 3.12+, with optional Python 3.14 subinterpreter acceleration.
mcp-shield-pii is an intercepting gateway proxy that sits between your MCP client (e.g., Claude Desktop) and any downstream MCP server. It detects and masks Personally Identifiable Information (PII) in real time before it reaches the LLM's context window, supporting GDPR/HIPAA compliance efforts with a single pip install.
Why mcp-shield-pii?
When an AI agent requests data from an MCP server, the raw payload (potentially containing SSNs, medical records, or credit card numbers) flows directly into the LLM. Organizations that leak such data risk GDPR/HIPAA penalties, which in the largest GDPR cases have reached hundreds of millions of dollars. mcp-shield-pii reduces this risk at the protocol layer.
┌──────────────┐     ┌─────────────────┐     ┌──────────────────┐
│   Claude     │────▶│ mcp-shield-pii  │────▶│  Downstream MCP  │
│   Desktop    │◀────│ (PII Redaction) │◀────│  Server          │
└──────────────┘     └─────────────────┘     └──────────────────┘
                              ▲
                       PII masked before
                       reaching the LLM
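As a rough illustration of what interception at the protocol layer involves, the sketch below masks the text items of one JSON-RPC tool-call response and passes every other message through unchanged. It is not the package's implementation: `mask_tool_result` and the email-only pattern are hypothetical, and the message shape assumes the `result.content` list of `{"type": "text", ...}` items that MCP tool results carry.

```python
import json
import re

# Illustrative pattern only; the real proxy applies its full detection pipeline.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_tool_result(raw_line: str) -> str:
    """Mask text content in one JSON-RPC response line (hypothetical helper)."""
    message = json.loads(raw_line)
    result = message.get("result")
    # Requests, notifications, and errors carry no tool content: pass through.
    if not isinstance(result, dict) or not isinstance(result.get("content"), list):
        return raw_line
    for item in result["content"]:
        if isinstance(item, dict) and item.get("type") == "text":
            item["text"] = EMAIL.sub("<REDACTED>", item.get("text", ""))
    return json.dumps(message)


response = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "result": {"content": [{"type": "text", "text": "Owner: john@example.com"}]},
})
masked = json.loads(mask_tool_result(response))
print(masked["result"]["content"][0]["text"])  # Owner: <REDACTED>
```

Because the message `id` and envelope are left untouched, the client can still correlate the masked response with its original request.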
Installation
pip install mcp-shield-pii
For NLP-based detection (names, organizations, addresses):
pip install mcp-shield-pii[nlp]
python -m spacy download en_core_web_sm
Quick Start
1. Scan text for PII
# Simple scan
mcp-shield-pii scan "Contact john@example.com, SSN 123-45-6789"
# JSON output
mcp-shield-pii scan --json "Patient MRN-123456 at 192.168.1.1"
# Different masking strategies
mcp-shield-pii scan --strategy partial "Card: 4111-1111-1111-1111"
mcp-shield-pii scan --strategy hash "Email: secret@corp.com"
mcp-shield-pii scan --strategy pseudo "Call 555-123-4567"
2. Start the proxy
# Basic proxy (stdio transport)
mcp-shield-pii proxy --downstream "npx -y @modelcontextprotocol/server-postgres postgresql://localhost/mydb"
# With config file
mcp-shield-pii proxy --downstream "python my_server.py" --config shield.toml
# Dry-run mode (log detections, don't modify payloads)
mcp-shield-pii proxy --downstream "npx my-mcp-server" --dry-run
3. Claude Desktop integration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "my-server-shielded": {
      "command": "mcp-shield-pii",
      "args": [
        "proxy",
        "--downstream", "npx -y @modelcontextprotocol/server-postgres postgresql://localhost/mydb",
        "--config", "/path/to/shield.toml"
      ]
    }
  }
}
4. Generate a config file
mcp-shield-pii generate-config --output shield.toml
5. Generate a compliance report
mcp-shield-pii report --format markdown --output compliance_report.md
6. Launch the dashboard
mcp-shield-pii dashboard --port 8765
# Open http://127.0.0.1:8765
Features
v1.0: Core
| Feature | Description |
|---|---|
| Stdio Proxy | Intercepts MCP stdio transport between client and downstream server |
| Regex Engine (18 types) | Detects SSNs, credit cards, emails, phones, IBANs, API keys, JWTs, and more |
| NLP Engine | Optional spaCy NER for person names, organizations, locations, addresses |
| Masking Strategies | redact (<REDACTED>), partial (***-**-6789), hash (SHA256:a1b2...), pseudo (consistent fakes) |
| TOML Configuration | Per-entity rules, per-tool allow/deny lists, confidence thresholds |
| CallToolResult Interception | Targets JSON-RPC responses while passing non-sensitive RPCs through |
| Audit Trail | JSONL audit log with timestamps, entity types, confidence scores |
| CLI | proxy, scan, report, dashboard, generate-config, version |
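To make the masking strategies in the table concrete, here is a minimal sketch of three of them. The function names and the `keep` and `length` parameters are assumptions for illustration and do not reflect the internals of `mcp_shield_pii.masking.strategies`; only the output shapes are taken from the table.

```python
import hashlib


def redact(value: str) -> str:
    """Replace the whole value with a fixed placeholder."""
    return "<REDACTED>"


def partial(value: str, keep: int = 4) -> str:
    """Star out letters and digits except the last `keep` characters; separators stay."""
    if keep <= 0:
        head, tail = value, ""
    else:
        head, tail = value[:-keep], value[-keep:]
    return "".join("*" if ch.isalnum() else ch for ch in head) + tail


def hashed(value: str, length: int = 8) -> str:
    """Replace the value with a truncated SHA-256 digest."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"SHA256:{digest[:length]}..."


ssn = "123-45-6789"
print(redact(ssn))   # <REDACTED>
print(partial(ssn))  # ***-**-6789
print(hashed(ssn))   # SHA256: followed by 8 hex characters and "..."
```

Note that an unsalted hash of a low-entropy value such as an SSN can be reversed by brute force, so a hash strategy of this kind is better suited to correlation than to strong anonymization.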
v1.1: Hardening
| Feature | Description |
|---|---|
| Context-Aware Scoring | Reduces false positives by analyzing surrounding text |
| Confidence Thresholds | Per-entity-type configurable minimum confidence |
| Tool Allow/Deny Lists | Skip trusted tools, enforce strict mode on sensitive ones |
| Dry-Run Mode | Log what would be redacted without modifying payloads |
| Hot-Reload Config | Change rules without restarting the proxy |
| Prometheus Metrics | /metrics endpoint with latency percentiles and entity counters |
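Context-aware scoring and confidence thresholds work together: a match's base confidence is adjusted by nearby words, then compared with the threshold for its entity type. The sketch below shows one way this could look. The keyword lists, the adjustment sizes, and the `Detection` class are all hypothetical and are not taken from the package.

```python
from dataclasses import dataclass

# Hypothetical keyword lists; a real scorer would be tuned per entity type.
BOOST_WORDS = {"SSN": ("ssn", "social security")}
PENALTY_WORDS = {"SSN": ("order", "invoice", "tracking")}


@dataclass
class Detection:
    entity_type: str
    start: int
    end: int
    confidence: float


def adjust_confidence(text: str, det: Detection, window: int = 30) -> float:
    """Nudge confidence up or down based on words near the match."""
    context = text[max(0, det.start - window): det.end + window].lower()
    score = det.confidence
    if any(word in context for word in BOOST_WORDS.get(det.entity_type, ())):
        score += 0.2
    if any(word in context for word in PENALTY_WORDS.get(det.entity_type, ())):
        score -= 0.3
    return min(1.0, max(0.0, score))


def passes_threshold(score: float, threshold: float = 0.8) -> bool:
    return score >= threshold


text_a = "Patient SSN 123-45-6789 on file"
text_b = "Order no. 123-45-6789 shipped"
det_a = Detection("SSN", text_a.index("123"), text_a.index("123") + 11, 0.7)
det_b = Detection("SSN", text_b.index("123"), text_b.index("123") + 11, 0.7)
print(passes_threshold(adjust_confidence(text_a, det_a)))  # True
print(passes_threshold(adjust_confidence(text_b, det_b)))  # False
```

The same digit pattern is kept in the first sentence and dropped in the second, which is the false-positive reduction the feature describes.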
v2.0: Enterprise
| Feature | Description |
|---|---|
| Pseudo-Anonymization | Consistent fake-data mapping preserving semantic meaning |
| Reversible Redaction | AES-256 encrypted mapping; authorized key-holders can restore originals |
| Compliance Dashboard | Dark-mode web UI with real-time event table and severity badges |
| GDPR/HIPAA Reports | Auto-generated compliance reports (text, JSON, markdown) |
| Webhook Alerts | Notify Slack/Teams when high-severity PII is detected |
| Subinterpreter Pool | GIL-free parallel detection via concurrent.interpreters (3.14+) or ProcessPoolExecutor (3.12+) |
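The version-dependent pool in the last row can be approximated with the standard library alone. The sketch below assumes Python 3.14's `concurrent.futures.InterpreterPoolExecutor` as the subinterpreter backend and falls back to `ProcessPoolExecutor` otherwise; `make_executor` and `count_ssns` are hypothetical names, and the package's own pool in `concurrency/` may be built differently.

```python
import re
import sys
from concurrent.futures import Executor, ProcessPoolExecutor

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_ssns(text: str) -> int:
    """CPU-bound stand-in for a detection pass; module-level so it can be pickled."""
    return len(SSN.findall(text))


def make_executor(max_workers: int = 2) -> Executor:
    """Prefer subinterpreters on Python 3.14+, otherwise fall back to processes."""
    if sys.version_info >= (3, 14):
        try:
            from concurrent.futures import InterpreterPoolExecutor
            return InterpreterPoolExecutor(max_workers=max_workers)
        except ImportError:
            pass  # This build has no subinterpreter support.
    return ProcessPoolExecutor(max_workers=max_workers)


if __name__ == "__main__":
    payloads = ["SSN 123-45-6789", "no PII here", "111-22-3333 and 444-55-6666"]
    with make_executor() as pool:
        print(list(pool.map(count_ssns, payloads)))
```

Both executors share the `Executor` interface, so calling code does not need to know which backend was chosen. How worker interpreters resolve functions defined in the main script may vary between Python releases, so in practice the worker function is best kept in an importable module.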
Detected Entity Types
Regex-Based (18 types)
| Entity | Example | Validation |
|---|---|---|
| Email | user@example.com | Regex |
| Phone | +1-555-123-4567 | Regex |
| SSN | 123-45-6789 | Regex + format validation |
| Credit Card | 4111-1111-1111-1111 | Regex + Luhn checksum |
| IBAN | DE89370400440532013000 | Regex + country-code length |
| IPv4 | 192.168.1.1 | Regex |
| IPv6 | 2001:0db8::1 | Regex |
| MAC Address | 00:1A:2B:3C:4D:5E | Regex |
| AWS API Key | AKIA... | Regex (prefix) |
| OpenAI Key | sk-... | Regex (prefix) |
| Stripe Key | sk_live_... | Regex (prefix) |
| GitHub Token | ghp_... | Regex (prefix) |
| Passport | A12345678 | Regex |
| Date of Birth | 1990-01-15 | Regex |
| Medical ID | MRN-123456 | Regex |
| Driver's License | D123-4567-8901 | Regex |
| URL with Auth | https://user:pass@host | Regex |
| JWT Token | eyJhbG... | Regex (prefix) |
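The two secondary validations named in the table are simple to state in code. The sketch below shows the standard Luhn checksum and a country-code length check for IBANs. The function names are hypothetical, and the length table is a small illustrative subset rather than the package's data.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Illustrative subset of national IBAN lengths.
IBAN_LENGTHS = {"DE": 22, "GB": 22, "FR": 27}


def iban_length_ok(iban: str) -> bool:
    """Check that the IBAN's length matches its two-letter country prefix."""
    compact = iban.replace(" ", "").upper()
    return IBAN_LENGTHS.get(compact[:2]) == len(compact)


print(luhn_valid("4111-1111-1111-1111"))         # True
print(luhn_valid("4111-1111-1111-1112"))         # False
print(iban_length_ok("DE89370400440532013000"))  # True
```

Checks like these let a detector discard digit runs that merely look like card numbers or IBANs, which a regex alone cannot do.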
NLP-Based (5 types, requires [nlp] extra)
| Entity | Example |
|---|---|
| Person Name | John Smith |
| Organization | Acme Corp |
| Address | 123 Main St, Springfield |
| Location | New York City |
| Medical Condition | Type 2 diabetes |
Configuration (shield.toml)
[shield]
default_masking_strategy = "redact"
default_confidence_threshold = 0.7
dry_run = false
[detection]
enable_regex = true
enable_nlp = false
enable_context_scoring = true
[entities.SSN]
masking_strategy = "redact"
confidence_threshold = 0.8
[entities.EMAIL]
masking_strategy = "pseudo"
confidence_threshold = 0.7
[tools.trusted_internal_tool]
action = "skip"
[tools.patient_records_api]
action = "strict"
masking_strategy = "redact"
[[webhooks]]
url = "https://hooks.slack.com/services/YOUR/WEBHOOK"
events = ["high_severity"]
[dashboard]
enabled = true
port = 8765
[metrics]
enabled = true
port = 9090
Programmatic API
from mcp_shield_pii.detection.regex_engine import RegexDetectionEngine
from mcp_shield_pii.masking.strategies import get_strategy
from mcp_shield_pii.pipeline import ShieldPipeline
from mcp_shield_pii.config.loader import ShieldConfig
# Simple detection
engine = RegexDetectionEngine()
results = engine.detect("Email john@corp.com, SSN 123-45-6789")
for r in results:
    print(f"{r.entity_type.value}: '{r.text}' (confidence: {r.confidence:.0%})")
# Full pipeline
config = ShieldConfig(default_masking_strategy="partial")
pipeline = ShieldPipeline(config)
masked, summary = pipeline.process_text("Contact admin@secret.org, card 4111-1111-1111-1111")
print(masked) # "Contact a***@***.org, card ****-****-****-1111"
pipeline.close()
# Pseudo-anonymization
config = ShieldConfig(default_masking_strategy="pseudo")
pipeline = ShieldPipeline(config)
masked, _ = pipeline.process_text("Email alice@corp.com then alice@corp.com again")
print(masked) # Same fake email both times (consistent mapping)
pipeline.close()
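The consistent mapping in the last example can be understood as a deterministic function of the original value. The sketch below derives a fake email from a salted hash, so the same input always yields the same replacement. This is only one possible design: the package may instead keep a lookup table, and `pseudo_email`, `pseudonymize`, and the salt are hypothetical.

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudo_email(original: str, salt: str = "demo-salt") -> str:
    """Derive a stable fake address from the original value."""
    digest = hashlib.sha256((salt + original.lower()).encode("utf-8")).hexdigest()
    return f"user_{digest[:8]}@example.invalid"


def pseudonymize(text: str) -> str:
    """Replace every email in `text` with its stable fake counterpart."""
    return EMAIL.sub(lambda match: pseudo_email(match.group(0)), text)


masked = pseudonymize("Email alice@corp.com then alice@corp.com again, cc bob@corp.com")
fakes = EMAIL.findall(masked)
print(fakes[0] == fakes[1])  # True: same input, same fake
print(fakes[0] != fakes[2])  # True: different input, different fake
```

Keeping the replacement stable lets the LLM reason about "the same person" across a conversation without ever seeing the real address.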
Architecture
src/mcp_shield_pii/
├── __init__.py              # Public API exports
├── cli.py                   # Typer CLI (6 commands)
├── pipeline.py              # Orchestration: detect → score → filter → mask → audit
├── compliance.py            # GDPR/HIPAA report generator
├── webhooks.py              # Async webhook alerts
├── detection/
│   ├── base.py              # EntityType enum, DetectionResult, protocols
│   ├── regex_engine.py      # 18 regex patterns + Luhn/IBAN validation
│   ├── nlp_engine.py        # spaCy NER detection (optional)
│   └── context_scorer.py    # Context-aware confidence adjustment
├── masking/
│   ├── strategies.py        # Redact, partial, hash, pseudo-anonymization
│   └── reversible.py        # AES-256 Fernet reversible redaction
├── config/
│   ├── loader.py            # TOML config parser
│   └── watcher.py           # Hot-reload file watcher
├── proxy/
│   ├── __init__.py          # MCP JSON-RPC interceptor
│   └── stdio_proxy.py       # Bidirectional stdio transport
├── concurrency/
│   └── __init__.py          # Subinterpreter pool + ProcessPool fallback
├── metrics/
│   └── __init__.py          # Prometheus metrics + HTTP server
├── audit/
│   └── __init__.py          # JSONL audit logger
└── dashboard/
    └── __init__.py          # Web UI + REST API
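The final audit stage of the pipeline writes one JSON object per line. A minimal sketch of such a JSONL logger follows. The field names are hypothetical (the source only says the log carries timestamps, entity types, and confidence scores), and the sketch deliberately never records the raw PII value.

```python
import io
import json
from datetime import datetime, timezone


def write_audit_event(stream, entity_type: str, confidence: float, strategy: str) -> None:
    """Append one detection as a single JSON line; the raw value is never logged."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "entity_type": entity_type,
        "confidence": round(confidence, 3),
        "strategy": strategy,
    }
    stream.write(json.dumps(event) + "\n")


log = io.StringIO()  # stands in for a file opened in append mode
write_audit_event(log, "SSN", 0.95, "redact")
write_audit_event(log, "EMAIL", 0.8, "pseudo")
events = [json.loads(line) for line in log.getvalue().splitlines()]
print([event["entity_type"] for event in events])  # ['SSN', 'EMAIL']
```

One record per line keeps the log appendable and easy to process with line-oriented tools.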
Contributing
See CONTRIBUTING.md
License
MIT. See LICENSE for details.