MCP server that rewrites SQL to prevent PII/PHI exposure in AI agent queries
Project description
Sanitized DB MCP Server
An MCP server that rewrites SQL queries at the AST level to prevent PII/PHI exposure.
Why This Exists
AI agents write SQL. They also hallucinate column names, ignore access controls, and cheerfully SELECT * from tables full of personal data. This server sits between the agent and your PostgreSQL database, rewriting every query so hidden columns return type-preserving placeholders instead of real values. The agent gets useful results; your users keep their privacy.
Quick Start
1. Create an allowlist
Generate a scaffold from your database schema:
# Install the CLI (skip if using uvx or Docker below)
pip install sanitized-db-mcp
sanitized-db-mcp generate-allowlist --database-url postgresql://user:pass@host:5432/mydb > allowlist.yaml
Edit the YAML to expose only the columns agents should see (see Generating an Allowlist below).
2. Run the server
Pick one of the four methods below and add the config to your .mcp.json.
Method 1: uvx (recommended — zero install)
uvx runs the package in an isolated environment with no permanent installation and no dependency conflicts. Unlike pip, there is nothing to install or manage. Unlike Docker, there are no volume mounts or path mappings to configure.
{
"sanitized-db": {
"type": "stdio",
"command": "uvx",
"args": ["sanitized-db-mcp"],
"env": {
"ALLOWLIST_PATH": "./allowlist.yaml",
"DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
}
}
}
No install needed. uvx downloads and runs the package in an isolated environment.
Method 2: pip install
pip install sanitized-db-mcp
{
"sanitized-db": {
"type": "stdio",
"command": "python3",
"args": ["-m", "sanitized_db_mcp.server"],
"env": {
"ALLOWLIST_PATH": "./allowlist.yaml",
"DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
}
}
}
Method 3: Docker (pre-built image)
{
"sanitized-db": {
"type": "stdio",
"command": "docker",
"args": [
"run", "-i", "--rm",
"-e", "ALLOWLIST_PATH=/app/allowlist.yaml",
"-e", "DATABASE_URL",
"-v", "./allowlist.yaml:/app/allowlist.yaml:ro",
"ghcr.io/ruminaider/sanitized-db-mcp:latest"
],
"env": {
"DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
}
}
}
Method 4: Docker (build from source)
docker build -t sanitized-db-mcp:local .
{
"sanitized-db": {
"type": "stdio",
"command": "docker",
"args": [
"run", "-i", "--rm",
"-e", "ALLOWLIST_PATH=/app/allowlist.yaml",
"-e", "DATABASE_URL",
"-v", "./allowlist.yaml:/app/allowlist.yaml:ro",
"sanitized-db-mcp:local"
],
"env": {
"DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
}
}
}
3. Query through the agent
The server exposes a single MCP tool (query) that accepts raw SQL and returns sanitized results.
Generating an Allowlist
The CLI tool connects to your database, reads the schema, and produces a YAML file where nothing is visible by default. You opt columns in by uncommenting them.
sanitized-db-mcp generate-allowlist --database-url postgresql://user:pass@host:5432/mydb > allowlist.yaml
Add --deny-pii to flag columns that look like PII (email, name, phone, address, etc.):
sanitized-db-mcp generate-allowlist --database-url postgresql://... --deny-pii > allowlist.yaml
Example output:
# Generated by: sanitized-db-mcp generate-allowlist --deny-pii
#
# HOW TO USE:
# - Columns under "columns:" are VISIBLE to agents (currently empty)
# - Commented lines show available columns — uncomment to make visible
# - Lines marked "# PII" were flagged as likely PII/PHI — review carefully
# - After editing, restart the MCP server to apply changes
tables:
users:
columns: {}
# Available columns (uncomment to make visible):
# id: {type: integer, placeholder: 0}
# is_active: {type: boolean, placeholder: false}
# email: {type: varchar, placeholder: '[REDACTED]'} # PII
# password: {type: varchar, placeholder: '[REDACTED]'} # PII
Uncomment the columns you want agents to see. Leave everything else hidden.
Allowlist YAML Format
tables:
<table_name>:
columns:
<column_name>: {type: <pg_type>, placeholder: <sql_literal>}
allowed_functions:
- FUNCTION_NAME
- Listed columns are VISIBLE. Unlisted columns are hidden and replaced with type-preserving placeholders.
type: base PostgreSQL type (integer,varchar,boolean,timestamp,uuid,jsonb, etc.).placeholder: SQL literal that replaces hidden column values. Must be type-compatible.allowed_functions: SQL functions agents can call. All others are rejected.
Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
ALLOWLIST_PATH |
Yes | -- | Path to allowlist.yaml |
MCP_SERVER_NAME |
No | sanitized-db |
MCP server name (affects tool name: mcp__<name>__query) |
DATABASE_URL |
If not using Render | -- | PostgreSQL connection string |
RENDER_POSTGRES_ID |
If using Render | -- | Render Postgres instance ID |
RENDER_API_KEY |
If using Render | -- | Render API bearer token |
The server prefers Render API credentials when both are set. For local development, DATABASE_URL is sufficient.
Framework Integration
- Django: Use a
visible()field decorator to mark safe fields, then generate the allowlist from model metadata. - Rails / Other: Use the CLI tool to scaffold, then curate manually. Or build your own generator that outputs the same YAML format.
- Custom generators: The YAML format is the contract. Any tool that produces conformant YAML works.
SSE Transport (Remote Deployment)
For shared deployments (e.g., Render.com), the server supports SSE transport over HTTP.
Installation
pip install 'sanitized-db-mcp[sse]'
Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
MCP_TRANSPORT |
No | stdio |
Transport mode: stdio or sse |
PORT |
No | 8000 |
HTTP port (Render sets this automatically) |
MCP_API_KEY |
Recommended | — | Bearer token for HTTP authentication |
MCP_MAX_CONNECTIONS |
No | unlimited | Max concurrent connections (uvicorn limit_concurrency) |
MCP_SESSION_TIMEOUT |
No | unlimited | Max SSE session duration in seconds |
Running
export MCP_TRANSPORT=sse
export MCP_API_KEY=your-secret-key
export ALLOWLIST_PATH=./allowlist.yaml
export DATABASE_URL=postgresql://...
python -m sanitized_db_mcp.server
Docker
docker run -e MCP_TRANSPORT=sse -e MCP_API_KEY=secret -e ALLOWLIST_PATH=/app/allowlist.yaml \
-e DATABASE_URL=postgresql://... -p 8000:8000 sanitized-db-mcp
Claude Code Client Configuration
In your MCP client config, point to the SSE endpoint:
{
"mcpServers": {
"sanitized-db": {
"type": "sse",
"url": "https://your-service.onrender.com/sse",
"headers": {
"Authorization": "Bearer your-secret-key"
}
}
}
}
Endpoints
GET /sse— SSE connection (MCP session)POST /messages/— Client-to-server messagesGET /health— Health check (no auth required)
Deployment Guidance
Connection limits: Set MCP_MAX_CONNECTIONS to prevent resource exhaustion under attack or misconfiguration. A reasonable value is 2-5x your expected concurrent clients (e.g., 100 when expecting 10-20 clients). New connections beyond the limit receive HTTP 503. Note: each SSE session AND each /messages/ POST from that session count as separate concurrent connections, so do not set this too low.
Session timeout: Set MCP_SESSION_TIMEOUT (seconds) to close abandoned SSE sessions. 28800 (8 hours) is a reasonable starting point. Clients reconnect automatically after timeout. Without this, abandoned sessions consume resources indefinitely since uvicorn's timeout_keep_alive does not apply to active SSE streams.
CORS: CORS headers are intentionally omitted. The server's clients (Claude Code, MCP CLI tools) are non-browser applications that ignore CORS. The native browser EventSource API cannot send Authorization headers, so browsers cannot authenticate even if they attempt cross-origin connections. If browser-based MCP clients become a supported consumer, add Starlette's CORSMiddleware.
Audit logging: Every query is logged as structured JSON with client IP, request ID, session ID, and user agent for HIPAA compliance. Behind a reverse proxy (Render, nginx), client IP is extracted from X-Forwarded-For. Configure your log aggregator for 6-year retention per HIPAA requirements.
How the Sanitizer Works
The server exposes a single MCP tool (query) that accepts raw SQL and returns sanitized results. Every query passes through an 11-step pipeline:
Agent sends SQL
|
v
1. Parse (pglast) ───── syntax error? → QuerySyntaxError
|
v
2. Statement type ───── not SELECT? → StatementTypeError
| SELECT INTO? → StatementTypeError
| FOR UPDATE/SHARE? → StatementTypeError
v
3. Table validation ─── system catalog? → SystemCatalogError
| not in allowlist? → RestrictedColumnError
| TABLESAMPLE? → unwrap, validate inner table
v
4. Function check ───── always-blocked? → DisallowedFunctionError
| not in allowlist? → DisallowedFunctionError
| FILTER (WHERE ...)? → walk with WHERE rules
| inline OVER clause? → walk with WHERE rules
v
5. WHERE/JOIN check ─── hidden column? → RestrictedColumnError
v
6. Clause check ──────── ORDER BY / GROUP BY / DISTINCT ON / WINDOW
| hidden column? → RestrictedColumnError
v
7. Subquery check ───── hidden column in subquery/CTE SELECT? → RestrictedColumnError
v
8. Rewrite SELECT ───── hidden columns → type-preserving placeholders
| SELECT * → visible columns + redaction marker
v
9. Serialize AST ────── rewritten SQL string
v
10. Execute ───────────── read-only, 5s timeout, SSL
v
11. Audit log ─────────── structured JSON (original, rewritten, outcome)
Security Model
All 200 tests pass with 0 xfails. The pen test suite covers 21 attack categories:
| Category | Tests | Defense |
|---|---|---|
| ORDER BY / GROUP BY / DISTINCT ON | 13 | sortClause, groupClause, distinctClause walked with WHERE rules |
| Window function attacks | 6 | Named WINDOW and inline OVER walked with WHERE rules |
| Aggregate FILTER clause | 4 | agg_filter walked with WHERE rules |
| CTE attacks | 6 | CTE SELECT targets validated; unused CTEs with hidden columns rejected |
| LATERAL join attacks | 3 | Correlated subquery WHERE clauses validated |
| Schema/identifier tricks | 8 | Quoted identifiers, unicode escapes, pg_temp schema handled |
| Composite type / row attacks | 3 | Field selection rejected; ROW() with hidden columns redacted |
| ARRAY attacks | 3 | ARRAY subquery, ARRAY_AGG, ARRAY[] with hidden columns caught |
| JSONB operator attacks | 5 | ->, ->>, #>, @>, ? operators on hidden columns caught |
| Type cast attacks | 4 | Chained casts on hidden columns redacted; casts in WHERE rejected |
| FROM clause variants | 3 | TABLESAMPLE unwrapped, VALUES and generate_series handled |
| Locking clauses | 3 | FOR UPDATE, FOR SHARE rejected at sanitizer level |
| Subquery nesting | 5 | Triple-nested, correlated, NOT EXISTS, ANY all validated |
| Statement type / multi-statement | 5 | SELECT INTO rejected; null byte, comment, dollar quoting tested |
| Ambiguous column resolution | 3 | Unqualified columns conservatively resolved |
| Encoding edge cases | 4 | Cyrillic homoglyphs, unicode escapes, semicolons in aliases |
| Timing / side-channel | 4 | pg_sleep, amplification, cross-join, recursive CTE blocked |
| Error-based extraction | 4 | Generic error messages; no column/table names leaked |
| Connection security | 5 | SSL, timeout, read-only, autocommit, no connection strings in errors |
| Allowlist integrity | 4 | Case-insensitive lookup, unknown columns default to hidden |
| Audit logging | 4 | All outcomes covered, HIPAA fields present, finally-block guarantee |
Running Tests
cd sanitized-db-mcp
# Full test suite (200 tests across 3 suites)
python -m pytest sanitized_db_mcp/tests/ -v
# Individual suites
python -m pytest sanitized_db_mcp/tests/test_sanitizer.py -v # Core rewriting (37 tests)
python -m pytest sanitized_db_mcp/tests/test_bypass.py -v # Bypass resistance (64 tests)
python -m pytest sanitized_db_mcp/tests/test_pentest.py -v # Pen test (99 tests)
python -m pytest sanitized_db_mcp/tests/test_allowlist.py -v # Allowlist loader
Key Files
| File | Purpose |
|---|---|
server.py |
MCP server entry point, query(sql) tool |
sanitizer.py |
AST-level SQL rewriting engine |
allowlist.py |
In-memory allowlist representation |
connection.py |
Render API + static connection management |
errors.py |
Sanitized error classes (no schema leakage) |
audit.py |
HIPAA-compliant query audit logging |
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sanitized_db_mcp-0.2.0.tar.gz.
File metadata
- Download URL: sanitized_db_mcp-0.2.0.tar.gz
- Upload date:
- Size: 49.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b2ac30ceca2b172cfe82cf0ca4dd517fade4b878ee60b53b39f3832253ac129e
|
|
| MD5 |
e41c5eb104296972040e29e760e39ad6
|
|
| BLAKE2b-256 |
f91f08ed5a2ee9130c6b8e842ff917f19bfc1bdec2dc5495672d1233149a4285
|
Provenance
The following attestation bundles were made for sanitized_db_mcp-0.2.0.tar.gz:
Publisher:
release.yml on ruminaider/sanitized-db-mcp
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sanitized_db_mcp-0.2.0.tar.gz -
Subject digest:
b2ac30ceca2b172cfe82cf0ca4dd517fade4b878ee60b53b39f3832253ac129e - Sigstore transparency entry: 1186473058
- Sigstore integration time:
-
Permalink:
ruminaider/sanitized-db-mcp@876c931b19108aa9632e0b79c26a0612152d5937 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/ruminaider
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@876c931b19108aa9632e0b79c26a0612152d5937 -
Trigger Event:
push
-
Statement type:
File details
Details for the file sanitized_db_mcp-0.2.0-py3-none-any.whl.
File metadata
- Download URL: sanitized_db_mcp-0.2.0-py3-none-any.whl
- Upload date:
- Size: 55.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9685c5407ee61139a1ef645c91a1a4135a17272804d5d11585d67310ac3f0617
|
|
| MD5 |
7d245a674858f83ac0d3a3917e540c97
|
|
| BLAKE2b-256 |
a2563c5ec70940974a6c0a9eaea049a18c4b4d559ffce5ba2b27c69f18e4404a
|
Provenance
The following attestation bundles were made for sanitized_db_mcp-0.2.0-py3-none-any.whl:
Publisher:
release.yml on ruminaider/sanitized-db-mcp
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sanitized_db_mcp-0.2.0-py3-none-any.whl -
Subject digest:
9685c5407ee61139a1ef645c91a1a4135a17272804d5d11585d67310ac3f0617 - Sigstore transparency entry: 1186473083
- Sigstore integration time:
-
Permalink:
ruminaider/sanitized-db-mcp@876c931b19108aa9632e0b79c26a0612152d5937 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/ruminaider
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@876c931b19108aa9632e0b79c26a0612152d5937 -
Trigger Event:
push
-
Statement type: