Skip to main content

MCP server that rewrites SQL to prevent PII/PHI exposure in AI agent queries

Project description

Sanitized DB MCP Server

PyPI Docker Tests

An MCP server that rewrites SQL queries at the AST level to prevent PII/PHI exposure.

Why This Exists

AI agents write SQL. They also hallucinate column names, ignore access controls, and cheerfully SELECT * from tables full of personal data. This server sits between the agent and your PostgreSQL database, rewriting every query so hidden columns return type-preserving placeholders instead of real values. The agent gets useful results; your users keep their privacy.

Quick Start

1. Create an allowlist

Generate a scaffold from your database schema:

# Install the CLI (skip if using uvx or Docker below)
pip install sanitized-db-mcp

sanitized-db-mcp generate-allowlist --database-url postgresql://user:pass@host:5432/mydb > allowlist.yaml

Edit the YAML to expose only the columns agents should see (see Generating an Allowlist below).

2. Run the server

Pick one of the four methods below and add the config to your .mcp.json.

Method 1: uvx (recommended — zero install)

{
  "sanitized-db": {
    "type": "stdio",
    "command": "uvx",
    "args": ["sanitized-db-mcp"],
    "env": {
      "ALLOWLIST_PATH": "./allowlist.yaml",
      "DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
    }
  }
}

No install needed. uvx downloads and runs the package in an isolated environment.

Method 2: pip install

pip install sanitized-db-mcp
{
  "sanitized-db": {
    "type": "stdio",
    "command": "python3",
    "args": ["-m", "sanitized_db_mcp.server"],
    "env": {
      "ALLOWLIST_PATH": "./allowlist.yaml",
      "DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
    }
  }
}

Method 3: Docker (pre-built image)

{
  "sanitized-db": {
    "type": "stdio",
    "command": "docker",
    "args": [
      "run", "-i", "--rm",
      "-e", "ALLOWLIST_PATH=/app/allowlist.yaml",
      "-e", "DATABASE_URL",
      "-v", "./allowlist.yaml:/app/allowlist.yaml:ro",
      "ghcr.io/ruminaider/sanitized-db-mcp:latest"
    ],
    "env": {
      "DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
    }
  }
}

Method 4: Docker (build from source)

docker build -t sanitized-db-mcp:local .
{
  "sanitized-db": {
    "type": "stdio",
    "command": "docker",
    "args": [
      "run", "-i", "--rm",
      "-e", "ALLOWLIST_PATH=/app/allowlist.yaml",
      "-e", "DATABASE_URL",
      "-v", "./allowlist.yaml:/app/allowlist.yaml:ro",
      "sanitized-db-mcp:local"
    ],
    "env": {
      "DATABASE_URL": "postgresql://user:pass@host:5432/mydb"
    }
  }
}

3. Query through the agent

The server exposes a single MCP tool (query) that accepts raw SQL and returns sanitized results.

Generating an Allowlist

The CLI tool connects to your database, reads the schema, and produces a YAML file where nothing is visible by default. You opt columns in by uncommenting them.

sanitized-db-mcp generate-allowlist --database-url postgresql://user:pass@host:5432/mydb > allowlist.yaml

Add --deny-pii to flag columns that look like PII (email, name, phone, address, etc.):

sanitized-db-mcp generate-allowlist --database-url postgresql://... --deny-pii > allowlist.yaml

Example output:

# Generated by: sanitized-db-mcp generate-allowlist --deny-pii
#
# HOW TO USE:
# - Columns under "columns:" are VISIBLE to agents (currently empty)
# - Commented lines show available columns — uncomment to make visible
# - Lines marked "# PII" were flagged as likely PII/PHI — review carefully
# - After editing, restart the MCP server to apply changes

tables:
  users:
    columns: {}
    # Available columns (uncomment to make visible):
    #   id: {type: integer, placeholder: 0}
    #   is_active: {type: boolean, placeholder: false}
    #   email: {type: varchar, placeholder: '[REDACTED]'}  # PII
    #   password: {type: varchar, placeholder: '[REDACTED]'}  # PII

Uncomment the columns you want agents to see. Leave everything else hidden.

Allowlist YAML Format

tables:
  <table_name>:
    columns:
      <column_name>: {type: <pg_type>, placeholder: <sql_literal>}
allowed_functions:
  - FUNCTION_NAME
  • Listed columns are VISIBLE. Unlisted columns are hidden and replaced with type-preserving placeholders.
  • type: base PostgreSQL type (integer, varchar, boolean, timestamp, uuid, jsonb, etc.).
  • placeholder: SQL literal that replaces hidden column values. Must be type-compatible.
  • allowed_functions: SQL functions agents can call. All others are rejected.

Configuration

Variable Required Default Description
ALLOWLIST_PATH Yes -- Path to allowlist.yaml
MCP_SERVER_NAME No sanitized-db MCP server name (affects tool name: mcp__<name>__query)
DATABASE_URL If not using Render -- PostgreSQL connection string
RENDER_POSTGRES_ID If using Render -- Render Postgres instance ID
RENDER_API_KEY If using Render -- Render API bearer token

The server prefers Render API credentials when both are set. For local development, DATABASE_URL is sufficient.

Framework Integration

  • Django: Use a visible() field decorator to mark safe fields, then generate the allowlist from model metadata.
  • Rails / Other: Use the CLI tool to scaffold, then curate manually. Or build your own generator that outputs the same YAML format.
  • Custom generators: The YAML format is the contract. Any tool that produces conformant YAML works.

How the Sanitizer Works

The server exposes a single MCP tool (query) that accepts raw SQL and returns sanitized results. Every query passes through an 11-step pipeline:

Agent sends SQL
      |
      v
1.  Parse (pglast) ───── syntax error? → QuerySyntaxError
      |
      v
2.  Statement type ───── not SELECT? → StatementTypeError
      |                   SELECT INTO? → StatementTypeError
      |                   FOR UPDATE/SHARE? → StatementTypeError
      v
3.  Table validation ─── system catalog? → SystemCatalogError
      |                   not in allowlist? → RestrictedColumnError
      |                   TABLESAMPLE? → unwrap, validate inner table
      v
4.  Function check ───── always-blocked? → DisallowedFunctionError
      |                   not in allowlist? → DisallowedFunctionError
      |                   FILTER (WHERE ...)? → walk with WHERE rules
      |                   inline OVER clause? → walk with WHERE rules
      v
5.  WHERE/JOIN check ─── hidden column? → RestrictedColumnError
      v
6.  Clause check ──────── ORDER BY / GROUP BY / DISTINCT ON / WINDOW
      |                    hidden column? → RestrictedColumnError
      v
7.  Subquery check ───── hidden column in subquery/CTE SELECT? → RestrictedColumnError
      v
8.  Rewrite SELECT ───── hidden columns → type-preserving placeholders
      |                   SELECT * → visible columns + redaction marker
      v
9.  Serialize AST ────── rewritten SQL string
      v
10. Execute ───────────── read-only, 5s timeout, SSL
      v
11. Audit log ─────────── structured JSON (original, rewritten, outcome)

Security Model

All 200 tests pass with 0 xfails. The pen test suite covers 21 attack categories:

Category Tests Defense
ORDER BY / GROUP BY / DISTINCT ON 13 sortClause, groupClause, distinctClause walked with WHERE rules
Window function attacks 6 Named WINDOW and inline OVER walked with WHERE rules
Aggregate FILTER clause 4 agg_filter walked with WHERE rules
CTE attacks 6 CTE SELECT targets validated; unused CTEs with hidden columns rejected
LATERAL join attacks 3 Correlated subquery WHERE clauses validated
Schema/identifier tricks 8 Quoted identifiers, unicode escapes, pg_temp schema handled
Composite type / row attacks 3 Field selection rejected; ROW() with hidden columns redacted
ARRAY attacks 3 ARRAY subquery, ARRAY_AGG, ARRAY[] with hidden columns caught
JSONB operator attacks 5 ->, ->>, #>, @>, ? operators on hidden columns caught
Type cast attacks 4 Chained casts on hidden columns redacted; casts in WHERE rejected
FROM clause variants 3 TABLESAMPLE unwrapped, VALUES and generate_series handled
Locking clauses 3 FOR UPDATE, FOR SHARE rejected at sanitizer level
Subquery nesting 5 Triple-nested, correlated, NOT EXISTS, ANY all validated
Statement type / multi-statement 5 SELECT INTO rejected; null byte, comment, dollar quoting tested
Ambiguous column resolution 3 Unqualified columns conservatively resolved
Encoding edge cases 4 Cyrillic homoglyphs, unicode escapes, semicolons in aliases
Timing / side-channel 4 pg_sleep, amplification, cross-join, recursive CTE blocked
Error-based extraction 4 Generic error messages; no column/table names leaked
Connection security 5 SSL, timeout, read-only, autocommit, no connection strings in errors
Allowlist integrity 4 Case-insensitive lookup, unknown columns default to hidden
Audit logging 4 All outcomes covered, HIPAA fields present, finally-block guarantee

Running Tests

cd sanitized-db-mcp

# Full test suite (200 tests across 3 suites)
python -m pytest sanitized_db_mcp/tests/ -v

# Individual suites
python -m pytest sanitized_db_mcp/tests/test_sanitizer.py -v    # Core rewriting (37 tests)
python -m pytest sanitized_db_mcp/tests/test_bypass.py -v       # Bypass resistance (64 tests)
python -m pytest sanitized_db_mcp/tests/test_pentest.py -v      # Pen test (99 tests)
python -m pytest sanitized_db_mcp/tests/test_allowlist.py -v    # Allowlist loader

Key Files

File Purpose
server.py MCP server entry point, query(sql) tool
sanitizer.py AST-level SQL rewriting engine
allowlist.py In-memory allowlist representation
connection.py Render API + static connection management
errors.py Sanitized error classes (no schema leakage)
audit.py HIPAA-compliant query audit logging

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sanitized_db_mcp-0.1.0.tar.gz (40.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sanitized_db_mcp-0.1.0-py3-none-any.whl (46.0 kB view details)

Uploaded Python 3

File details

Details for the file sanitized_db_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: sanitized_db_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 40.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sanitized_db_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fe3481d2ab00c2e90c324c25ec4ff140a252b56b8872db1b55670fd174608737
MD5 0a47086fde61c311ced09fc03056dfca
BLAKE2b-256 31135493ef47c8a069e8999b92367021ca2a4b6f43d247177b1c55470b2dd3d4

See more details on using hashes here.

Provenance

The following attestation bundles were made for sanitized_db_mcp-0.1.0.tar.gz:

Publisher: release.yml on ruminaider/sanitized-db-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sanitized_db_mcp-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sanitized_db_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 251190a937cd03ce20d0aa916e6017f0a7c4a955e939b2110da717ce86c09131
MD5 2c220e8bb9e7b34be178ba2391cefb9f
BLAKE2b-256 f1917281eae97cf436c99d3b260d5e791161c94c06c907f72ac1434d343a11ac

See more details on using hashes here.

Provenance

The following attestation bundles were made for sanitized_db_mcp-0.1.0-py3-none-any.whl:

Publisher: release.yml on ruminaider/sanitized-db-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page