Privacy Vault Protocol for MCP: tokenize sensitive data before the LLM sees it

mcp-pvp — Privacy Vault Protocol for MCP

Tokenize sensitive data before the LLM sees it.
mcp-pvp is a lightweight security/runtime layer for MCP-based agents and workflows that prevents accidental leakage of PII and secrets by design.

Agents operate on references, not raw values.

Maintained by the team behind Hidet (hidet.io), and usable standalone.


Why mcp-pvp

MCP makes tool calling easy. The hard part is handling real user data safely:

  • emails, phone numbers, addresses
  • API keys and tokens
  • IDs, payment-like strings
  • anything you should not put in an LLM prompt, logs, or telemetry

Most systems either:

  • send raw values to an LLM (high risk), or
  • do brittle redaction that breaks workflows (hard to restore safely)

This is not a vulnerability scanner/fuzzer for MCP servers.
It's a privacy vault + policy enforcement runtime for sensitive data in MCP workflows.


Installation

Using uv (recommended)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/Hidet-io/mcp-pvp.git
cd mcp-pvp

# Create virtual environment and install
uv venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
uv pip install -e ".[all]"

# Or use the Makefile
make install-all

Using pip

# Basic installation (regex detector)
pip install mcp-pvp

# With Presidio detector (recommended for production)
# Quote the extras so shells such as zsh do not expand the brackets
pip install "mcp-pvp[presidio]"

# All extras
pip install "mcp-pvp[all]"

Quick Start with Makefile

make help          # Show all available commands
make test          # Run tests
make lint          # Run linter
make format        # Format code
make check         # Run all checks (lint, format, typecheck, test)
make version       # Show current version
make bump-minor    # Bump version (e.g., 0.6.0 -> 0.7.0)

Compatibility & Requirements

Python Versions

  • Required: Python 3.11+
  • Tested: Python 3.11, 3.12

MCP SDK

  • Required: MCP SDK 1.26.0+

Optional Dependencies (Extras)

pip install "mcp-pvp[presidio]"   # Microsoft Presidio for production-grade PII detection
pip install "mcp-pvp[sentry]"     # Sentry error tracking with PII protection
pip install "mcp-pvp[all]"        # Everything above + docs tooling
pip install "mcp-pvp[dev]"        # Development tools (pytest, ruff, mypy, etc.)

Platform Support

  • Linux: Fully supported ✅
  • macOS: Fully supported ✅
  • Windows: Supported ⚠️ (Presidio may require WSL for some languages)

Quick Start

Library Usage (standalone vault)

import asyncio

from mcp_pvp import Vault, TokenizeRequest, TokenFormat, DeliverRequest, ToolCall, Policy
from mcp_pvp.models import PolicyAllow, PIIType, SinkPolicy

# Default-deny policy: only send_email may receive an EMAIL, and only in "to"
vault = Vault(policy=Policy(
    sinks={
        "tool:send_email": SinkPolicy(allow=[PolicyAllow(type=PIIType.EMAIL, arg_paths=["to"])])
    }
))


async def main():
    # Tokenize sensitive content
    response = vault.tokenize(TokenizeRequest(content="Email me at alice@example.com", token_format=TokenFormat.TEXT))

    # The redacted text is what the LLM is allowed to see
    print(response.redacted)  # "Email me at [[PII:EMAIL:tkn_xyz]]"

    # Deliver: inject PII locally into a tool call without returning raw values
    deliver_resp = await vault.deliver(
        DeliverRequest(
            vault_session=response.vault_session,
            tool_call=ToolCall(name="send_email", args={"to": response.tokens[0].to_text()})
        )
    )
    # deliver_resp.tool_result is the tool's return value with PII re-tokenized
    return deliver_resp


# vault.deliver() is awaited, so it has to run inside an event loop
asyncio.run(main())

MCP Server Integration (FastPvpMCP)

FastPvpMCP is a drop-in subclass of FastMCP that adds automatic PII protection to every tool call. It uses MCP's native lifespan and resource primitives — no hidden arguments, no protocol changes.

from mcp_pvp.bindings.mcp.server import FastPvpMCP
from mcp_pvp.models import Policy, PolicyAllow, PIIType, SinkPolicy
from mcp_pvp.vault import Vault

policy = Policy(
    sinks={
        "tool:send_email": SinkPolicy(
            allow=[PolicyAllow(type=PIIType.EMAIL, arg_paths=["to"])]
        )
    }
)

mcp = FastPvpMCP(name="my-server", vault=Vault(policy=policy))

@mcp.tool()
def send_email(to: str, subject: str, body: str) -> dict:
    """The 'to' argument arrives already resolved — no token handling needed."""
    return {"status": "sent", "to": to}

# Run as a standard FastMCP server
if __name__ == "__main__":
    mcp.run()

How it works end-to-end

Client connects
  └─ Server creates a vault session automatically (lifespan hook)

Client reads  pvp://session  resource
  └─ Receives vault_session_id

Client tokenizes PII via the built-in pvp_tokenize tool:
  session.call_tool("pvp_tokenize", {"content": "alice@example.com", "vault_session": vault_session_id})
  └─ Returns token: [[PII:EMAIL:tkn_abc123]]

Client calls tool:
  session.call_tool("send_email", {"to": "[[PII:EMAIL:tkn_abc123]]", "subject": "Hi"})
  └─ No _vault_session argument — server reads it from the lifespan context

Server, transparently:
  1. Resolves token → "alice@example.com" (policy-checked)
  2. Invokes the real tool with resolved args
  3. Scans result for PII → re-tokenizes it
  4. Returns clean result to client

Client disconnects → vault session ends
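
The server-side steps above (resolve, invoke, re-tokenize) can be pictured with a small in-memory stand-in. This is only a sketch under stated assumptions: ToyVault, call_tool_protected, and the email regex are hypothetical names made up for illustration, not part of the mcp-pvp API, and the policy check from step 1 is left out for brevity.

```python
import re
import secrets

# Assumed token shape, taken from the text-token example in this README.
TOKEN_RE = re.compile(r"\[\[PII:([A-Z_]+):(tkn_[A-Za-z0-9]+)\]\]")
# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ToyVault:
    """In-memory stand-in for one vault session (hypothetical, not the real Vault)."""

    def __init__(self):
        self._values = {}  # token id -> raw value

    def tokenize(self, text):
        """Replace each email in `text` with a fresh text token."""
        def _swap(match):
            token_id = "tkn_" + secrets.token_hex(4)
            self._values[token_id] = match.group(0)
            return f"[[PII:EMAIL:{token_id}]]"
        return EMAIL_RE.sub(_swap, text)

    def resolve(self, text):
        """Replace each token in `text` with its raw value (KeyError if unknown)."""
        return TOKEN_RE.sub(lambda m: self._values[m.group(2)], text)


def call_tool_protected(vault, tool, args):
    """Resolve tokens in args, run the tool, then re-tokenize the result."""
    resolved = {name: vault.resolve(value) for name, value in args.items()}
    result = tool(**resolved)
    return {key: vault.tokenize(value) for key, value in result.items()}


def send_email(to, subject):
    # The tool itself only ever sees the resolved value.
    return {"status": "sent", "to": to}


vault = ToyVault()
token = vault.tokenize("alice@example.com")
result = call_tool_protected(vault, send_email, {"to": token, "subject": "Hi"})
```

Here result["to"] holds a new token rather than the address, which mirrors step 4 of the flow: the caller gets a clean result and the raw value stays inside the vault.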

Client-side usage with the MCP SDK

No local Vault instance is needed. The server exposes a built-in pvp_tokenize tool so clients can tokenize PII server-side over the MCP protocol itself.

import json
from mcp import ClientSession
from mcp.types import AnyUrl

# ... set up memory streams or stdio transport ...

async with ClientSession(read_stream, write_stream) as session:
    await session.initialize()

    # Step 1: Discover the vault session created for this connection
    resource = await session.read_resource(AnyUrl("pvp://session"))
    vault_session_id = resource.contents[0].text

    # Step 2: Tokenize PII via the built-in server tool. The raw value goes
    #         only to the PVP server, which stores it in its vault; it is
    #         never placed in an LLM prompt.
    tok_result = await session.call_tool(
        "pvp_tokenize",
        {"content": "alice@example.com", "vault_session": vault_session_id},
    )
    tok_data = json.loads(tok_result.content[0].text)
    token = tok_data["tokens"][0]  # "[[PII:EMAIL:tkn_abc123]]"

    # Step 3: Call the real tool with the token — no raw PII on the wire.
    #         The server resolves the token, executes the tool, and
    #         re-tokenizes any PII in the result before returning it.
    result = await session.call_tool("send_email", {"to": token, "subject": "Hi"})
    result_data = json.loads(result.content[0].text)
    # result_data["to"] is a fresh token, not the real email address

Core Concepts

Tokens (references, not values)

Text token (LLM-safe, passes through prompts):

[[PII:EMAIL:tkn_a1b2c3]]

JSON token (preferred for structured tool args):

{ "$pii_ref": "tkn_a1b2c3", "type": "EMAIL", "cap": "cap_..." }
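
To make the two token shapes concrete, here is a minimal sketch of how a consumer might recognize each one. The helper names and the exact character set allowed in a token id are assumptions inferred from the examples above, not something the library documents here.

```python
import json
import re

# Assumed grammar: [[PII:<TYPE>:<ref>]] with an uppercase type and a tkn_ ref.
TEXT_TOKEN_RE = re.compile(
    r"\[\[PII:(?P<type>[A-Z0-9_]+):(?P<ref>tkn_[A-Za-z0-9]+)\]\]"
)


def parse_text_token(text):
    """Return {'type', 'ref'} if `text` is exactly one text token, else None."""
    match = TEXT_TOKEN_RE.fullmatch(text)
    if match is None:
        return None
    return {"type": match.group("type"), "ref": match.group("ref")}


def parse_json_token(raw):
    """Return {'type', 'ref'} if `raw` is a JSON object carrying $pii_ref, else None."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or "$pii_ref" not in obj:
        return None
    return {"type": obj.get("type"), "ref": obj["$pii_ref"]}


text_form = parse_text_token("[[PII:EMAIL:tkn_a1b2c3]]")
json_form = parse_json_token('{"$pii_ref": "tkn_a1b2c3", "type": "EMAIL", "cap": "cap_x"}')
```

Both calls yield the same type and reference, which is the point of the two formats: one travels safely through free text, the other through structured tool arguments.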

Vault sessions

Tokens are scoped to a short-lived vault session (vs_...) with a TTL.
A token is only valid within its session.

In FastPvpMCP, one vault session is created per MCP connection and lives for the duration of that connection.
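
Session scoping and expiry can be sketched as follows. ToySession is a hypothetical class written for this illustration; the real store, its id format beyond the vs_ prefix, and its error types are not shown in this README and may differ.

```python
import secrets
import time


class ToySession:
    """Hypothetical TTL-scoped token store; not the mcp-pvp implementation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.session_id = "vs_" + secrets.token_hex(8)
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        self._values = {}  # token id -> raw value

    def put(self, raw_value):
        """Store a raw value and return the token id that refers to it."""
        token_id = "tkn_" + secrets.token_hex(6)
        self._values[token_id] = raw_value
        return token_id

    def get(self, token_id):
        """Return the raw value, refusing expired sessions and foreign tokens."""
        if self._clock() >= self._expires_at:
            raise LookupError("vault session expired")
        if token_id not in self._values:
            raise LookupError("token not issued by this session")
        return self._values[token_id]


# A fake clock makes the expiry behaviour easy to see.
now = [0.0]
session = ToySession(ttl_seconds=60, clock=lambda: now[0])
token_id = session.put("alice@example.com")
print(session.get(token_id))  # alice@example.com
```

A token id that was never issued by the session, for example one an LLM made up, fails the same lookup as a token from another session.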

Capabilities (caps)

Even if an LLM is tricked into requesting disclosure, the vault requires a signed capability authorizing exactly: which token, for which sink/tool, at which argument path, within which time window.
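
As a rough illustration of what a signed capability binds together, the sketch below signs a payload with HMAC-SHA256 from the Python standard library. The payload fields, the key handling, and the function names are assumptions for this example; the library's actual capability format is not specified in this README.

```python
import hashlib
import hmac
import json
import time

SECRET_KEY = b"demo-only-key"  # hypothetical; a real key would come from secure config


def _sign(payload):
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_capability(token_ref, sink, arg_path, ttl_seconds, now=None):
    """Bind one token to one sink, one argument path, and an expiry time."""
    issued = time.time() if now is None else now
    payload = {
        "token": token_ref,
        "sink": sink,
        "arg_path": arg_path,
        "exp": issued + ttl_seconds,
    }
    return {"payload": payload, "sig": _sign(payload)}


def verify_capability(cap, token_ref, sink, arg_path, now=None):
    """True only if the signature is valid and every bound field matches."""
    current = time.time() if now is None else now
    payload = cap["payload"]
    if not hmac.compare_digest(cap["sig"], _sign(payload)):
        return False
    return (
        payload["token"] == token_ref
        and payload["sink"] == sink
        and payload["arg_path"] == arg_path
        and current < payload["exp"]
    )


cap = issue_capability("tkn_a1b2c3", "tool:send_email", "to", ttl_seconds=60, now=1000.0)
```

Changing any bound field after signing invalidates the signature, so a capability issued for tool:send_email cannot be replayed against a different tool or argument.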

Sinks + policies

Policies are enforced inside the vault, default-deny:

  • allow specific PII types
  • only for specific tools (sinks), identified as "tool:<name>"
  • optionally restricted to argument paths (e.g. to, email)
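
The default-deny rule can be sketched in a few lines. AllowRule and ToyPolicy are hypothetical stand-ins for the library's PolicyAllow and Policy models, and treating an empty arg_paths as "any argument" is an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical allow rule: one PII type, optionally limited to argument paths."""
    pii_type: str
    arg_paths: tuple = ()  # assumption: empty means "any argument path"


@dataclass
class ToyPolicy:
    """Hypothetical policy: maps "tool:<name>" sinks to their allow rules."""
    sinks: dict = field(default_factory=dict)

    def is_allowed(self, sink, pii_type, arg_path):
        # Default-deny: an unknown sink has no rules, so the loop never returns True.
        for rule in self.sinks.get(sink, []):
            if rule.pii_type != pii_type:
                continue
            if not rule.arg_paths or arg_path in rule.arg_paths:
                return True
        return False


policy = ToyPolicy(sinks={"tool:send_email": [AllowRule("EMAIL", ("to",))]})
print(policy.is_allowed("tool:send_email", "EMAIL", "to"))    # True
print(policy.is_allowed("tool:send_email", "EMAIL", "body"))  # False
```

Nothing is disclosed unless a rule names the sink, the PII type, and (when restricted) the argument path, so adding a new tool never widens exposure by accident.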

Deliver mode (standalone) vs. FastPvpMCP (server mode)

Mode             | When to use
vault.deliver()  | Standalone Python workflows, custom executors, non-FastMCP servers
FastPvpMCP       | FastMCP-based servers; wraps every registered tool transparently

In FastPvpMCP, deliver-mode semantics (resolve → execute → re-tokenize) happen automatically inside call_tool() using the connection-scoped vault session from the lifespan context.
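
The re-tokenize step has to cope with structured tool results, not just flat strings. A minimal sketch of a recursive scrubber is shown below; the scrub function, its replace callback, and the email-only pattern are assumptions for illustration and do not reflect how the library walks results internally.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(value, replace):
    """Walk a tool result and pass every detected email through `replace`."""
    if isinstance(value, str):
        return EMAIL_RE.sub(lambda m: replace(m.group(0)), value)
    if isinstance(value, dict):
        return {key: scrub(item, replace) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type (plain lists and tuples only).
        return type(value)(scrub(item, replace) for item in value)
    if isinstance(value, BaseException):
        # Exceptions are reduced to their scrubbed message.
        return scrub(str(value), replace)
    return value  # numbers, None, and other leaves pass through unchanged


def to_demo_token(raw):
    # Hypothetical callback; a real one would store `raw` in the vault session.
    return "[[PII:EMAIL:tkn_demo]]"


tool_result = {
    "to": ["alice@example.com"],
    "error": ValueError("could not reach bob@example.org"),
    "attempts": 3,
}
clean = scrub(tool_result, to_demo_token)
```

After the call, clean contains no raw addresses, including the one that was buried in the exception message.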


Policy Example

from mcp_pvp.models import Policy, PolicyAllow, PIIType, SinkPolicy, PolicyLimits

policy = Policy(
    sinks={
        "tool:send_email": SinkPolicy(
            allow=[
                PolicyAllow(type=PIIType.EMAIL, arg_paths=["to", "cc", "bcc"]),
            ]
        ),
        "tool:crm_upsert_contact": SinkPolicy(
            allow=[
                PolicyAllow(type=PIIType.EMAIL, arg_paths=["email"]),
                PolicyAllow(type=PIIType.PHONE, arg_paths=["phone"]),
            ]
        ),
    },
    limits=PolicyLimits(
        max_disclosures_per_step=50,
        max_total_disclosed_bytes_per_step=8192,
    ),
)
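
One way to read the two limits is as a per-step budget that every disclosure has to fit into. The sketch below is an assumption about the semantics (UTF-8 byte counting, reset at step boundaries) and DisclosureBudget is a hypothetical name, not a class from mcp-pvp.

```python
class DisclosureBudget:
    """Hypothetical per-step budget for count and total bytes of disclosed values."""

    def __init__(self, max_disclosures, max_total_bytes):
        self.max_disclosures = max_disclosures
        self.max_total_bytes = max_total_bytes
        self.count = 0
        self.total_bytes = 0

    def try_disclose(self, raw_value):
        """Record the disclosure and return True, or return False if it would exceed a limit."""
        size = len(raw_value.encode("utf-8"))  # assumption: limits count UTF-8 bytes
        if self.count + 1 > self.max_disclosures:
            return False
        if self.total_bytes + size > self.max_total_bytes:
            return False
        self.count += 1
        self.total_bytes += size
        return True

    def reset(self):
        """Start a new step with a fresh budget."""
        self.count = 0
        self.total_bytes = 0


budget = DisclosureBudget(max_disclosures=2, max_total_bytes=20)
first = budget.try_disclose("alice@example.com")  # 17 bytes, fits
second = budget.try_disclose("abcd")              # would bring the total to 21 bytes
```

With these numbers the first disclosure is accepted and the second is refused, because it would push the step over the byte limit even though the count limit still has room.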

PII Types

Type     | Default mode | Notes
EMAIL    | Tokenize     |
PHONE    | Tokenize     | Sanity-checked
IPV4     | Tokenize     |
CC       | Mask         | Optional tokenize with Luhn
API_KEY  | Mask         | Optional tokenize

Names/addresses are intentionally excluded from the regex detector (too error-prone). Use the Presidio extra for those.
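
For readers unfamiliar with the Luhn check mentioned for the CC type, here is a short standalone sketch of the checksum together with one possible masking style. The function names, the 12-digit minimum, and the keep-last-four mask are illustrative assumptions, not the detector's actual behaviour.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 12:  # assumption: shorter digit runs are not treated as cards
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask_card(number):
    """Keep the last four digits and replace the rest with asterisks."""
    digits = [ch for ch in number if ch.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
print(mask_card("4111 1111 1111 1111"))   # ************1111
```

Running the checksum before treating a digit run as a card number filters out most order ids and phone-like strings that merely look similar.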


What's Included

Core Features

  • ✅ PII detection (regex built-in; Presidio optional)
  • ✅ Tokenization with typed opaque refs, structured tokens, and session TTLs
  • ✅ Policy enforcement (sink allow-lists + limits) with capability checks
  • ✅ HMAC-signed capabilities paired with audit events (no raw values in logs)
  • ✅ Deliver mode: injects PII locally, re-tokenizes tool results, returns result_tokens
  • ✅ FastPvpMCP: drop-in FastMCP subclass with connection-scoped vault sessions
  • ✅ pvp://session MCP resource: standard resource protocol for session discovery
  • ✅ Observability (structlog, Prometheus metrics, optional Sentry)

Vault Hardening Features

  • Session Integrity Validation — prevents cross-session token theft
  • Result Tokenization in Same Session — session consistency for result tokens
  • Scanner-Based TEXT Token Parser — O(n) state machine, 10–100× faster than regex
  • Recursive Output Scrubbing — PII detection in exceptions, nested dicts, custom types
  • Audit Coherence — parent-child event tracking for full request/response traceability
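
To give a feel for what a scanner-based token parser looks like, the sketch below walks the text once and recognizes text tokens with explicit states instead of a regular expression. It is a simplified illustration under an assumed token grammar, not the library's parser, and it makes no claim about the speed-up quoted above.

```python
PREFIX = "[[PII:"
SUFFIX = "]]"


def scan_tokens(text):
    """Return (start, end, pii_type, ref) for each well-formed text token."""
    found = []
    n = len(text)
    i = 0
    while i < n:
        if not text.startswith(PREFIX, i):
            i += 1
            continue
        start = i
        j = i + len(PREFIX)
        # State 1: read the type (uppercase letters, digits, underscore).
        type_start = j
        while j < n and (text[j].isupper() or text[j].isdigit() or text[j] == "_"):
            j += 1
        if j == type_start or j >= n or text[j] != ":":
            i = start + 1  # malformed, resume just after the opening bracket
            continue
        pii_type = text[type_start:j]
        j += 1
        # State 2: read the reference (letters, digits, underscore).
        ref_start = j
        while j < n and (text[j].isalnum() or text[j] == "_"):
            j += 1
        if j == ref_start or not text.startswith(SUFFIX, j):
            i = start + 1
            continue
        found.append((start, j + len(SUFFIX), pii_type, text[ref_start:j]))
        i = j + len(SUFFIX)
    return found


sample = "Mail [[PII:EMAIL:tkn_a1b2c3]] or call [[PII:PHONE:tkn_9f]]; skip [[PII:bad]]"
tokens = scan_tokens(sample)
```

The characters consumed by a failed match can never begin another token, so the scan stays linear in the length of the input even when it contains many malformed fragments.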

Test Coverage: 239 tests, 83% code coverage


Running as MCP Server

# Start with stdio transport (works with Claude Desktop, MCP Inspector, etc.)
mcp-pvp-mcp

Integrating with Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pvp": {
      "command": "mcp-pvp-mcp",
      "env": {}
    }
  }
}

Testing with MCP Inspector

npx @modelcontextprotocol/inspector
# Connect using command: mcp-pvp-mcp

Threat Model (What This Helps With)

  • Prompt injection: "print the user's email"
  • Accidental logging/telemetry leaks
  • Token spoofing (LLM hallucinates tkn_...)
  • Over-broad restoration ("give me the full mapping")
  • Unsafe tool exfiltration (policy + deliver reduces exposure)

No library can fully protect a compromised device.
mcp-pvp minimizes common leakage paths and enforces least-privilege disclosure.


Feature Matrix

Capability mcp-pvp Presidio LangChain PII Guardrails HashiCorp Vault MCP
High-quality PII detection ✅ (Presidio optional)
Redaction / anonymization
Local vault session (raw PII stays local) ⚠️ ✅ (secrets, not PII)
Typed opaque tokens in prompts/plans ⚠️ ⚠️ ⚠️
Capability-based selective disclosure ✅ (secrets API)
Per-tool / per-arg-path policy enforcement ⚠️
Deliver mode (PII injected locally, never returned to agent)
MCP-native integration (lifespan, resources)
Audit trail (no raw values in logs) ⚠️ ⚠️

Observability & Monitoring

  • Structured Logging: built on structlog with JSON output
  • Audit Trail: complete trail of all PII operations — raw values never logged
  • Metrics: Prometheus-compatible metrics for requests, latency, and disclosures
  • Error Tracking: optional Sentry integration with PII scrubbing
  • Health Checks: ready-to-use health and readiness endpoints

See docs/OBSERVABILITY.md for the full guide and examples/observability/ for production configurations.


Roadmap

v0.6 ✅ (Current)

  • ✅ PVP core: tokenize / resolve / deliver
  • ✅ TTL store with session management
  • ✅ Policy allow-lists + limits
  • ✅ HMAC capabilities for sink-bound tokens
  • ✅ FastPvpMCP: FastMCP subclass with transparent PII protection
  • ✅ Connection-scoped vault sessions via MCP lifespan
  • ✅ pvp://session MCP resource for standard session discovery
  • ✅ Recursive result scrubbing (dicts, lists, exceptions, Pydantic models)
  • ✅ Vault hardening: session integrity, audit coherence, scanner-based parser
  • ✅ Comprehensive observability (logging, metrics, Sentry)

Future

  • Encrypted local persistence (SQLite)
  • Expanded detectors (IBAN, secrets, configurable patterns)
  • Richer audit queries and compliance reporting
  • Optional proxy mode (protect existing agents without refactoring)
  • Enhanced policy primitives (time-based, context-aware)

Documentation

Full API and architecture docs are generated with MkDocs (Material theme + mkdocstrings).

make docs          # Preview locally
make docs-build    # Build static site
make docs-deploy   # Publish to GitHub Pages (requires GH_TOKEN)

Install docs dependencies: uv pip install -e ".[docs]"


Contributing

We welcome:

  • Detector modules (high precision, low false positives)
  • Policy primitives and safe defaults
  • Examples (email, CRM, ticketing, file access)
  • Interoperability tests with MCP clients/servers
  • Threat model improvements

See CONTRIBUTING.md.


License

Apache-2.0
