phi-redactor

HIPAA-native PHI redaction proxy for AI/LLM interactions

Requires Python 3.11+.


phi-redactor is an open-source, drop-in PHI redaction proxy that sits between your healthcare AI applications and LLM providers (OpenAI, Anthropic). It detects all 18 HIPAA Safe Harbor identifiers in real time, replaces them with synthetic values before the request is forwarded, and restores the original values locally in the response -- so PHI never leaves your infrastructure.

Your App  -->  phi-redactor (localhost:8080)  -->  OpenAI / Anthropic
                    |                                      |
              [detect PHI]                          [masked request]
              [mask with fakes]                     [LLM processes]
              [vault mapping]                       [response back]
                    |                                      |
              [rehydrate response]  <--  [masked response]
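
The round trip in the diagram can be illustrated with a toy, in-memory version. This is only a sketch under stated assumptions, not phi-redactor's implementation: `ToyVault`, `mask`, and the single SSN regex are hypothetical stand-ins for the real detection engine, masking engine, and encrypted vault.

```python
import re
from itertools import count

# Hypothetical single-pattern detector; the real proxy covers all 18 categories.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class ToyVault:
    """In-memory stand-in for the vault: maps originals to synthetic values and back."""

    def __init__(self):
        self._to_fake = {}
        self._to_real = {}
        self._counter = count(1)

    def fake_for(self, original):
        # Reuse the same synthetic value for a repeated original,
        # so references stay consistent across conversation turns.
        if original not in self._to_fake:
            fake = f"900-00-{next(self._counter):04d}"
            self._to_fake[original] = fake
            self._to_real[fake] = original
        return self._to_fake[original]

    def rehydrate(self, text):
        # Swap every known synthetic value back to its original.
        for fake, real in self._to_real.items():
            text = text.replace(fake, real)
        return text


def mask(text, vault):
    """Replace each detected SSN with its synthetic counterpart."""
    return SSN_RE.sub(lambda m: vault.fake_for(m.group(0)), text)


vault = ToyVault()
outbound = mask("Patient SSN 123-45-6789 has Type 2 Diabetes.", vault)
print(outbound)  # Patient SSN 900-00-0001 has Type 2 Diabetes.

# Pretend the LLM echoed the synthetic value back in its answer.
llm_reply = f"Noted: the patient with SSN {vault.fake_for('123-45-6789')} has diabetes."
print(vault.rehydrate(llm_reply))  # Noted: the patient with SSN 123-45-6789 has diabetes.
```

Only `outbound` would cross the network in this sketch; the mapping needed to undo the masking stays in the local vault object.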

Why phi-redactor?

| Problem | Solution |
|---|---|
| PHI leaks to cloud LLMs | Transparent proxy masks all 18 HIPAA identifiers |
| Inconsistent fake data | Semantic masking generates clinically coherent replacements |
| No audit trail | Tamper-evident hash-chain audit log for every redaction |
| Complex integration | Zero code changes -- just change your base URL |
| Multi-turn context loss | Encrypted vault preserves mappings across conversation turns |

Quick Start

Install

pip install phi-redactor
python -m spacy download en_core_web_lg

Start the proxy

phi-redactor serve --port 8080

Use with OpenAI (zero code changes)

from openai import OpenAI

# Just change the base_url -- everything else stays the same
client = OpenAI(
    api_key="your-openai-key",
    base_url="http://localhost:8080/v1",  # <-- only change needed
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Patient John Smith (SSN: 123-45-6789) has Type 2 Diabetes."
    }]
)
print(response.choices[0].message.content)
# PHI is automatically redacted before reaching OpenAI,
# and restored in the response you receive

Use with Anthropic

import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="http://localhost:8080/anthropic",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Dr. Maria Garcia (NPI: 1234567890) prescribed metformin."
    }]
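)
print(message.content[0].text)
# As with OpenAI, PHI is masked on the way out and restored in the response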

Library API (no LLM needed)

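import httpx

# These endpoints are served by the running proxy (phi-redactor serve);
# no LLM provider is contacted.

# Redact text directly
resp = httpx.post("http://localhost:8080/api/v1/redact", json={
    "text": "Patient Jane Doe SSN 987-65-4321 seen on 01/15/2026."
})
resp.raise_for_status()  # fail loudly on a non-2xx response
result = resp.json()
print(result["redacted_text"])  # PHI replaced with synthetic values
session_id = result["session_id"]

# Rehydrate later, using the same session so the vault mappings can be found
resp = httpx.post("http://localhost:8080/api/v1/rehydrate", json={
    "text": result["redacted_text"],
    "session_id": session_id,
})
resp.raise_for_status()
print(resp.json()["text"])  # Original PHI restored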

All 18 HIPAA Safe Harbor Identifiers

phi-redactor detects and masks all 18 identifier types required by the HIPAA Safe Harbor method:

| # | Category | Detection Method | Example |
|---|---|---|---|
| 1 | Person Names | NER + Pattern | John Smith -> James Wilson |
| 2 | Geographic Data | NER + Pattern | Springfield, IL -> Portland, OR |
| 3 | Dates | Pattern + NER | 03/15/1956 -> 07/22/1955 |
| 4 | Phone Numbers | Pattern | (555) 123-4567 -> (555) 987-6543 |
| 5 | Fax Numbers | Pattern | Fax: 555-0100 -> Fax: 555-0299 |
| 6 | Email Addresses | Pattern | john@test.com -> james@example.net |
| 7 | SSN | Pattern | 123-45-6789 -> 987-65-4321 |
| 8 | Medical Record Numbers | Pattern | MRN: 00456789 -> MRN: 00891234 |
| 9 | Health Plan IDs | Pattern | BCBS-987654321 -> AETNA-123456789 |
| 10 | Account Numbers | Pattern | ACC-00112233 -> ACC-99887766 |
| 11 | License/DEA/NPI | Pattern | NPI: 1234567890 -> NPI: 9876543210 |
| 12 | Vehicle IDs | Pattern | VIN: 1HGBH41... -> VIN: 2FGCD52... |
| 13 | Device IDs (UDI) | Pattern | UDI: (01)12345... -> UDI: (01)98765... |
| 14 | URLs | Pattern | https://patient-portal.com -> https://example.com |
| 15 | IP Addresses | Pattern | 192.168.1.100 -> 10.0.0.42 |
| 16 | Biometric IDs | Pattern | Fingerprint hash -> BIO-a1b2c3d4 |
| 17 | Photos | Detection | [REDACTED_PHOTO] |
| 18 | Other Unique IDs | Pattern | ID-12345678 -> ID-87654321 |
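
To give a feel for what "Pattern" detection means in the table above, here is a minimal regex-based sketch. The patterns, the `Finding` type, and `find_phi` are illustrative assumptions; phi-redactor's actual recognizers (built on Presidio and spaCy) are not shown in this README and are likely stricter.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    category: str
    start: int
    end: int
    text: str


# Hypothetical, deliberately simple patterns for four of the 18 categories.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # The capture group isolates the number so the "MRN:" label is left intact.
    "MRN": re.compile(r"\bMRN:\s*(\d{6,10})\b"),
}


def find_phi(text):
    """Return all pattern matches, ordered by position in the text."""
    findings = []
    for category, pattern in PATTERNS.items():
        group = 1 if pattern.groups else 0  # prefer the capture group if present
        for m in pattern.finditer(text):
            findings.append(
                Finding(category, m.start(group), m.end(group), m.group(group))
            )
    return sorted(findings, key=lambda f: f.start)


note = "MRN: 00456789, SSN 123-45-6789, contact john@test.com from 192.168.1.100."
for finding in find_phi(note):
    print(finding.category, repr(finding.text))
```

Names and geographic data cannot be caught this way, which is why the table lists NER alongside patterns for those categories.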

Architecture

+------------------+     +-------------------+     +------------------+
|   Your App       | --> |   phi-redactor    | --> |  LLM Provider    |
|   (OpenAI SDK)   |     |   (localhost)     |     |  (OpenAI/Claude) |
+------------------+     +-------------------+     +------------------+
                          |                 |
                    +-----+-----+     +-----+-----+
                    | Detection |     |  Masking   |
                    | Engine    |     |  Engine    |
                    | (Presidio |     | (Faker +   |
                    |  + spaCy) |     |  Custom)   |
                    +-----------+     +-----------+
                          |                 |
                    +-----+-----+     +-----+-----+
                    | Encrypted |     |   Audit   |
                    | Vault     |     |   Trail   |
                    | (SQLite + |     | (Hash-    |
                    |  Fernet)  |     |  chain)   |
                    +-----------+     +-----------+

Core Components

| Component | Description |
|---|---|
| Detection Engine | Presidio + spaCy NER + 8 custom HIPAA recognizers |
| Masking Engine | Faker-based semantic replacement with healthcare-specific Faker providers |
| Encrypted Vault | Fernet-encrypted SQLite for PHI-to-synthetic mappings |
| Proxy Server | FastAPI reverse proxy with OpenAI + Anthropic adapters |
| Audit Trail | Append-only hash-chain JSON Lines log (tamper-evident) |
| Compliance Reports | HIPAA Safe Harbor evidence report generator |

API Endpoints

Proxy Routes

| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI chat proxy (drop-in compatible) |
| POST | /v1/embeddings | OpenAI embeddings proxy |
| POST | /anthropic/v1/messages | Anthropic Messages API proxy |

Library Routes

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/redact | Detect and redact PHI from text |
| POST | /api/v1/rehydrate | Restore original PHI from redacted text |

Management Routes

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/health | Health check and system info |
| GET | /api/v1/stats | Aggregate redaction statistics |
| GET | /api/v1/sessions | List all sessions |
| GET | /api/v1/compliance/report | Full HIPAA compliance report |
| GET | /api/v1/compliance/summary | Quick compliance status |
| GET | /api/v1/audit | Query audit trail events |
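
The management routes are plain GET endpoints, so they can be polled with the standard library alone. The sketch below assumes the proxy is listening on localhost:8080 and that each route returns JSON; the response fields are not documented here, so the code just pretty-prints whatever comes back. `endpoint`, `get_json`, and `main` are hypothetical helper names.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # assumed proxy address


def endpoint(path):
    """Build the absolute URL for a management route."""
    return urljoin(BASE_URL, path)


def get_json(path):
    """GET a management route and decode the JSON body."""
    with urlopen(endpoint(path), timeout=5) as resp:
        return json.load(resp)


def main():
    for path in ("/api/v1/health", "/api/v1/compliance/summary"):
        try:
            print(path, json.dumps(get_json(path), indent=2))
        except (OSError, ValueError) as exc:
            # OSError covers connection failures; ValueError covers non-JSON bodies.
            print(f"{path}: request failed ({exc})")


if __name__ == "__main__":
    main()
```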

CLI Commands

phi-redactor serve [--port 8080] [--host 0.0.0.0]   # Start the proxy
phi-redactor redact --file patient_notes.txt          # Batch file redaction
phi-redactor report --full --output report.json       # Compliance report
phi-redactor version                                   # Show version

Configuration

All settings can be configured via environment variables with the PHI_REDACTOR_ prefix:

PHI_REDACTOR_PORT=8080              # Proxy port
PHI_REDACTOR_HOST=0.0.0.0          # Bind address
PHI_REDACTOR_SENSITIVITY=0.5       # Detection sensitivity (0.0=aggressive, 1.0=permissive)
PHI_REDACTOR_LOG_LEVEL=INFO        # Logging level
PHI_REDACTOR_VAULT_PASSPHRASE=...  # Optional vault encryption passphrase
PHI_REDACTOR_SESSION_IDLE_TIMEOUT=1800   # Session idle timeout (seconds)
PHI_REDACTOR_SESSION_MAX_LIFETIME=86400  # Session max lifetime (seconds)
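
As an illustration of prefix-based configuration, the following sketch reads the variables above into a typed settings object. It is not phi-redactor's own loader: the `Settings` class, `load_settings`, and the use of the listed values as defaults are assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "PHI_REDACTOR_"


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the example values listed above (an assumption).
    port: int = 8080
    host: str = "0.0.0.0"
    sensitivity: float = 0.5
    log_level: str = "INFO"
    vault_passphrase: Optional[str] = None
    session_idle_timeout: int = 1800
    session_max_lifetime: int = 86400


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from PHI_REDACTOR_* variables, falling back to defaults."""
    defaults = Settings()

    def read(name, cast, default):
        raw = environ.get(PREFIX + name)
        return default if raw is None else cast(raw)

    settings = Settings(
        port=read("PORT", int, defaults.port),
        host=read("HOST", str, defaults.host),
        sensitivity=read("SENSITIVITY", float, defaults.sensitivity),
        log_level=read("LOG_LEVEL", str, defaults.log_level),
        vault_passphrase=read("VAULT_PASSPHRASE", str, defaults.vault_passphrase),
        session_idle_timeout=read("SESSION_IDLE_TIMEOUT", int, defaults.session_idle_timeout),
        session_max_lifetime=read("SESSION_MAX_LIFETIME", int, defaults.session_max_lifetime),
    )
    if not 0.0 <= settings.sensitivity <= 1.0:
        raise ValueError("PHI_REDACTOR_SENSITIVITY must be between 0.0 and 1.0")
    return settings


# Passing a dict instead of os.environ keeps the example deterministic.
print(load_settings({"PHI_REDACTOR_PORT": "9090"}).port)  # 9090
```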

Security Design

  • PHI never logged: PHI-safe log formatter scrubs all known patterns
  • Encryption at rest: Fernet encryption (AES-128-CBC with HMAC-SHA256 authentication) for vault entries
  • Hash-chain audit: Every redaction event is chained via SHA-256 hashes
  • Fail-safe: If detection or masking fails, the request is blocked (never passed through unmasked)
  • Session isolation: Each session has independent vault mappings
  • Key rotation: Built-in support for encryption key rotation
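
The hash-chain idea can be sketched in a few lines: each entry stores the hash of the previous entry, so altering any earlier record breaks verification from that point on. The entry layout, `append_event`, and `verify_chain` below are assumptions for illustration, not phi-redactor's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    # Canonical JSON (sorted keys) so the same event always hashes identically.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Events record the category and session only, never the PHI itself.
log = []
append_event(log, {"action": "redact", "category": "SSN", "session": "s1"})
append_event(log, {"action": "redact", "category": "PERSON", "session": "s1"})
print(verify_chain(log))  # True

log[0]["event"]["category"] = "EMAIL"  # simulate tampering with an old record
print(verify_chain(log))  # False
```

A chain like this is tamper-evident rather than tamper-proof: someone who can rewrite the whole file can recompute every hash, so the latest hash usually needs to be anchored somewhere the writer cannot modify.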

Development

# Clone and install
git clone https://github.com/dilawar-gopang/phi-redactor.git
cd phi-redactor
pip install -e ".[dev]"
python -m spacy download en_core_web_lg

# Run tests
pytest

# Lint and type check
ruff check src/ tests/
mypy src/

License

Apache License 2.0. See LICENSE for details.

Contributing

Contributions welcome! Please open an issue first to discuss what you'd like to change.


Built for healthcare AI developers who take HIPAA seriously.
