HIPAA-native PHI redaction proxy for AI/LLM interactions
phi-redactor
phi-redactor is an open-source, drop-in PHI redaction proxy that sits between your healthcare AI applications and LLM providers (OpenAI, Anthropic). It automatically detects and masks all 18 HIPAA Safe Harbor identifiers in real time, then restores the original values locally, so PHI never leaves your infrastructure.
```text
Your App --> phi-redactor (localhost:8080) --> OpenAI / Anthropic
                  |                                 |
             [detect PHI]                      [masked request]
             [mask with fakes]                 [LLM processes]
             [vault mapping]                   [response back]
                  |                                 |
             [rehydrate response]     <--      [masked response]
```
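The round trip in the diagram can be illustrated with a toy, in-memory sketch. This is not phi-redactor's implementation: it assumes a single SSN regex, a plain dict standing in for the per-session vault, and hypothetical helper names (`ToyVault`, `mask`, `rehydrate`). It shows the property the flow relies on: within a session the same real value always maps to the same synthetic value, so the LLM's reply can be mapped back locally.

```python
import re
import secrets

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class ToyVault:
    """Hypothetical per-session mapping store (phi-redactor itself uses an encrypted SQLite vault)."""

    def __init__(self) -> None:
        self.real_to_fake: dict[str, str] = {}
        self.fake_to_real: dict[str, str] = {}

    def fake_for(self, real: str) -> str:
        # Reuse an existing mapping so multi-turn conversations stay consistent.
        if real not in self.real_to_fake:
            while True:
                # Synthetic SSNs start with 9, a range the SSA does not issue as SSNs.
                fake = (
                    f"9{secrets.randbelow(100):02d}-"
                    f"{secrets.randbelow(100):02d}-{secrets.randbelow(10000):04d}"
                )
                if fake not in self.fake_to_real:
                    break
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
        return self.real_to_fake[real]


def mask(text: str, vault: ToyVault) -> str:
    # Replace every SSN with its session-stable synthetic value.
    return SSN_RE.sub(lambda m: vault.fake_for(m.group()), text)


def rehydrate(text: str, vault: ToyVault) -> str:
    # Swap synthetic values back to the originals before returning text to the app.
    return SSN_RE.sub(lambda m: vault.fake_to_real.get(m.group(), m.group()), text)
```

A real proxy also has to cope with the model reformatting or paraphrasing synthetic values, which an exact-match lookup like this does not handle.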
Why phi-redactor?
| Problem | Solution |
|---|---|
| PHI leaks to cloud LLMs | Transparent proxy masks all 18 HIPAA identifiers |
| Inconsistent fake data | Semantic masking generates clinically coherent replacements |
| No audit trail | Tamper-evident hash-chain audit log for every redaction |
| Complex integration | Drop-in: point your existing SDK client at the proxy's base URL |
| Multi-turn context loss | Encrypted vault preserves mappings across conversation turns |
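One common way to keep synthetic dates clinically coherent is to shift every date in a session by the same random offset, so intervals between events (admission to discharge, age at diagnosis) survive masking. The README does not say whether phi-redactor uses this technique; the sketch below is an illustration under that assumption, with hypothetical helpers `session_offset` and `shift_dates` and MM/DD/YYYY input only.

```python
import random
import re
from datetime import datetime, timedelta

DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def session_offset(seed: str, max_days: int = 365) -> timedelta:
    # Hypothetical: derive one stable, non-zero offset per session from its id.
    rng = random.Random(seed)
    days = rng.randint(1, max_days) * rng.choice((-1, 1))
    return timedelta(days=days)


def shift_dates(text: str, offset: timedelta) -> str:
    # Shift every MM/DD/YYYY date by the same offset; invalid dates are left untouched.
    def repl(m: re.Match) -> str:
        try:
            d = datetime.strptime(m.group(), "%m/%d/%Y")
        except ValueError:
            return m.group()
        return (d + offset).strftime("%m/%d/%Y")

    return DATE_RE.sub(repl, text)
```

Rehydration is the same call with the negated offset, `shift_dates(text, -offset)`, which is why the offset (or its seed) has to persist for the life of the session.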
Quick Start
Install
```shell
pip install phi-redactor
python -m spacy download en_core_web_lg
```
Start the proxy
```shell
phi-redactor serve --port 8080
```
Use with OpenAI (only the base URL changes)
```python
from openai import OpenAI

# Just change the base_url -- everything else stays the same
client = OpenAI(
    api_key="your-openai-key",
    base_url="http://localhost:8080/v1",  # <-- only change needed
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Patient John Smith (SSN: 123-45-6789) has Type 2 Diabetes.",
    }],
)
print(response.choices[0].message.content)
# PHI is automatically redacted before reaching OpenAI,
# and restored in the response you receive
```
Use with Anthropic
```python
import anthropic

client = anthropic.Anthropic(
    api_key="your-anthropic-key",
    base_url="http://localhost:8080/anthropic",  # the SDK appends /v1/messages
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Dr. Maria Garcia (NPI: 1234567890) prescribed metformin.",
    }],
)
print(message.content[0].text)  # PHI restored locally in the response
```
Library API (no LLM needed)
```python
import httpx

# Redact text directly
resp = httpx.post("http://localhost:8080/api/v1/redact", json={
    "text": "Patient Jane Doe SSN 987-65-4321 seen on 01/15/2026."
})
resp.raise_for_status()  # stop here if redaction failed
result = resp.json()
print(result["redacted_text"])  # PHI replaced with synthetic values
session_id = result["session_id"]

# Rehydrate later
resp = httpx.post("http://localhost:8080/api/v1/rehydrate", json={
    "text": result["redacted_text"],
    "session_id": session_id,
})
resp.raise_for_status()
print(resp.json()["text"])  # Original PHI restored
```
All 18 HIPAA Safe Harbor Identifiers
phi-redactor detects and masks all 18 identifier types that the HIPAA Safe Harbor method requires removing:
| # | Category | Detection Method | Example (original -> synthetic) |
|---|---|---|---|
| 1 | Person Names | NER + Pattern | John Smith -> James Wilson |
| 2 | Geographic Data | NER + Pattern | Springfield, IL -> Portland, OR |
| 3 | Dates | Pattern + NER | 03/15/1956 -> 07/22/1955 |
| 4 | Phone Numbers | Pattern | (555) 123-4567 -> (555) 987-6543 |
| 5 | Fax Numbers | Pattern | Fax: 555-0100 -> Fax: 555-0299 |
| 6 | Email Addresses | Pattern | john@test.com -> james@example.net |
| 7 | SSN | Pattern | 123-45-6789 -> 987-65-4321 |
| 8 | Medical Record Numbers | Pattern | MRN: 00456789 -> MRN: 00891234 |
| 9 | Health Plan IDs | Pattern | BCBS-987654321 -> AETNA-123456789 |
| 10 | Account Numbers | Pattern | ACC-00112233 -> ACC-99887766 |
| 11 | License/DEA/NPI | Pattern | NPI: 1234567890 -> NPI: 9876543210 |
| 12 | Vehicle IDs | Pattern | VIN: 1HGBH41... -> VIN: 2FGCD52... |
| 13 | Device IDs (UDI) | Pattern | UDI: (01)12345... -> UDI: (01)98765... |
| 14 | URLs | Pattern | https://patient-portal.com -> https://example.com |
| 15 | IP Addresses | Pattern | 192.168.1.100 -> 10.0.0.42 |
| 16 | Biometric IDs | Pattern | Fingerprint hash -> BIO-a1b2c3d4 |
| 17 | Photos | Detection | [REDACTED_PHOTO] |
| 18 | Other Unique IDs | Pattern | ID-12345678 -> ID-87654321 |
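For the rows marked "Pattern", detection amounts to matching well-defined formats. The sketch below uses plain `re` patterns for five categories; the patterns, the `Finding` type, and the `find_phi` helper are hypothetical simplifications rather than phi-redactor's Presidio recognizers, and they will both miss and over-match real-world variants.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    category: str
    start: int
    end: int
    text: str


# Hypothetical, deliberately simple patterns for a few Safe Harbor categories.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
    "IP_ADDRESS": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "MRN": re.compile(r"\bMRN:?\s*(\d{6,10})\b"),
}


def find_phi(text: str) -> list[Finding]:
    """Return all pattern matches, ordered by position in the text."""
    findings = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # If the pattern has a capture group (MRN), report only the number, not the label.
            group = 1 if pattern.groups else 0
            findings.append(Finding(category, m.start(group), m.end(group), m.group(group)))
    return sorted(findings, key=lambda f: f.start)
```

Span offsets (rather than just matched strings) are what a masking step needs to splice replacements back into the original text; the NER-based rows (names, locations) would contribute findings of the same shape.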
Architecture
```text
+------------------+     +-------------------+     +------------------+
|     Your App     | --> |   phi-redactor    | --> |   LLM Provider   |
|   (OpenAI SDK)   |     |    (localhost)    |     | (OpenAI/Claude)  |
+------------------+     +-------------------+     +------------------+
                            |             |
                      +-----+-----+ +-----+-----+
                      | Detection | |  Masking  |
                      |  Engine   | |  Engine   |
                      | (Presidio | | (Faker +  |
                      |  + spaCy) | |  Custom)  |
                      +-----------+ +-----------+
                            |             |
                      +-----+-----+ +-----+-----+
                      | Encrypted | |   Audit   |
                      |   Vault   | |   Trail   |
                      | (SQLite + | |  (Hash-   |
                      |  Fernet)  | |  chain)   |
                      +-----------+ +-----------+
```
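The request path in this diagram, combined with the fail-safe rule listed under Security Design, can be sketched as one small function: redaction runs first, and if it raises, the request is blocked instead of being forwarded unmasked. The names `handle_request` and `BlockedRequest`, and the injected `redact`, `forward`, and `rehydrate` callables, are hypothetical stand-ins for the detection/masking engines, the upstream LLM call, and the vault lookup.

```python
from typing import Callable


class BlockedRequest(Exception):
    """Raised when PHI handling fails; the request must not reach the LLM."""


def handle_request(
    text: str,
    redact: Callable[[str], str],
    forward: Callable[[str], str],
    rehydrate: Callable[[str], str],
) -> str:
    try:
        masked = redact(text)  # detection + masking engines
    except Exception as exc:
        # Fail closed: never fall back to sending the original text upstream.
        raise BlockedRequest("PHI redaction failed; request blocked") from exc
    reply = forward(masked)  # only masked text leaves the host
    return rehydrate(reply)  # restore original values locally
```

The important design choice is that the `except` branch has no path to `forward`: an error in detection can only ever produce a refused request, never a leaked one.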
Core Components
| Component | Description |
|---|---|
| Detection Engine | Presidio + spaCy NER + 8 custom HIPAA recognizers |
| Masking Engine | Faker-based semantic replacement using custom healthcare-specific Faker providers |
| Encrypted Vault | Fernet-encrypted SQLite for PHI-to-synthetic mappings |
| Proxy Server | FastAPI reverse proxy with OpenAI + Anthropic adapters |
| Audit Trail | Append-only hash-chain JSON Lines log (tamper-evident) |
| Compliance Reports | HIPAA Safe Harbor evidence report generator |
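The Encrypted Vault row can be pictured as a table of per-session mappings in which original values are stored only as ciphertext. This sketch uses the standard-library `sqlite3` module; the schema, the `MappingVault` class, and the injected `encrypt`/`decrypt` callables are assumptions made so the example runs without extra packages. With the `cryptography` package you would pass a Fernet instance's `encrypt` and `decrypt` methods, matching the Fernet cipher the table names.

```python
import sqlite3
from typing import Callable, Optional


class MappingVault:
    """Hypothetical session-scoped synthetic -> original store; originals are kept encrypted."""

    def __init__(
        self,
        encrypt: Callable[[bytes], bytes],
        decrypt: Callable[[bytes], bytes],
        path: str = ":memory:",
    ) -> None:
        self._encrypt, self._decrypt = encrypt, decrypt
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS mappings ("
            " session_id TEXT NOT NULL,"
            " synthetic TEXT NOT NULL,"
            " original_enc BLOB NOT NULL,"
            " PRIMARY KEY (session_id, synthetic))"
        )

    def put(self, session_id: str, original: str, synthetic: str) -> None:
        # Only the ciphertext of the original value is written to the database.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO mappings VALUES (?, ?, ?)",
                (session_id, synthetic, self._encrypt(original.encode())),
            )

    def lookup(self, session_id: str, synthetic: str) -> Optional[str]:
        # Queries are scoped to one session, so sessions cannot read each other's mappings.
        row = self._db.execute(
            "SELECT original_enc FROM mappings WHERE session_id = ? AND synthetic = ?",
            (session_id, synthetic),
        ).fetchone()
        return self._decrypt(row[0]).decode() if row else None
```

Because Fernet tokens are randomized, the reverse question ("which synthetic did we already assign to this original?") cannot be answered by querying the ciphertext column; a keyed index such as an HMAC of the original would be one way to support it, and is left out here.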
API Endpoints
Proxy Routes
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI chat proxy (drop-in compatible) |
| POST | /v1/embeddings | OpenAI embeddings proxy |
| POST | /anthropic/v1/messages | Anthropic Messages API proxy |
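Because /v1/chat/completions is drop-in compatible, the proxy must find every piece of message text in an OpenAI-style request body before forwarding it. A minimal sketch, assuming an injected `redact` callable and covering both plain-string `content` and the list-of-parts form (`{"type": "text", "text": ...}`) that the Chat Completions API accepts; `redact_chat_payload` is a hypothetical name, not phi-redactor's adapter.

```python
import copy
from typing import Any, Callable


def redact_chat_payload(
    payload: dict[str, Any], redact: Callable[[str], str]
) -> dict[str, Any]:
    """Return a copy of a Chat Completions request with message text redacted."""
    out = copy.deepcopy(payload)  # never mutate the caller's request
    for message in out.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            message["content"] = redact(content)
        elif isinstance(content, list):
            for part in content:
                if not (isinstance(part, dict) and part.get("type") == "text"):
                    # Fail closed on parts this sketch cannot inspect (e.g. images),
                    # and do not echo the part, since it may itself contain PHI.
                    raise ValueError("unsupported content part; request blocked")
                part["text"] = redact(part.get("text", ""))
    return out
```

A complete adapter would also need to cover other text-bearing fields, such as tool-call arguments; this sketch only handles message content.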
Library Routes
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/redact | Detect and redact PHI from text |
| POST | /api/v1/rehydrate | Restore original PHI from redacted text |
Management Routes
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/health | Health check and system info |
| GET | /api/v1/stats | Aggregate redaction statistics |
| GET | /api/v1/sessions | List all sessions |
| GET | /api/v1/compliance/report | Full HIPAA compliance report |
| GET | /api/v1/compliance/summary | Quick compliance status |
| GET | /api/v1/audit | Query audit trail events |
CLI Commands
```shell
phi-redactor serve [--port 8080] [--host 0.0.0.0]   # Start the proxy
phi-redactor redact --file patient_notes.txt        # Batch file redaction
phi-redactor report --full --output report.json     # Compliance report
phi-redactor version                                # Show version
```
Configuration
All settings can be configured via environment variables with the PHI_REDACTOR_ prefix:
```shell
PHI_REDACTOR_PORT=8080                    # Proxy port
PHI_REDACTOR_HOST=0.0.0.0                 # Bind address
PHI_REDACTOR_SENSITIVITY=0.5              # Detection sensitivity (0.0=aggressive, 1.0=permissive)
PHI_REDACTOR_LOG_LEVEL=INFO               # Logging level
PHI_REDACTOR_VAULT_PASSPHRASE=...         # Optional vault encryption passphrase
PHI_REDACTOR_SESSION_IDLE_TIMEOUT=1800    # Session idle timeout (seconds)
PHI_REDACTOR_SESSION_MAX_LIFETIME=86400   # Session max lifetime (seconds)
```
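The variables above map naturally onto a typed settings object. The sketch below is an assumption about how such a loader could look: `Settings`, `load_settings`, and `session_expired` are hypothetical names, and the defaults are the example values listed above. It also shows one plausible reading of the two session settings, where a session ends at whichever limit is reached first, idle time or total lifetime.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "PHI_REDACTOR_"


@dataclass(frozen=True)
class Settings:
    port: int = 8080
    host: str = "0.0.0.0"
    sensitivity: float = 0.5
    log_level: str = "INFO"
    vault_passphrase: Optional[str] = None
    session_idle_timeout: int = 1800    # seconds
    session_max_lifetime: int = 86400   # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Read PHI_REDACTOR_* variables, falling back to the dataclass defaults.
    def get(name: str) -> Optional[str]:
        return env.get(PREFIX + name.upper())

    d = Settings()
    s = Settings(
        port=int(get("port") or d.port),
        host=get("host") or d.host,
        sensitivity=float(get("sensitivity") or d.sensitivity),
        log_level=(get("log_level") or d.log_level).upper(),
        vault_passphrase=get("vault_passphrase"),
        session_idle_timeout=int(get("session_idle_timeout") or d.session_idle_timeout),
        session_max_lifetime=int(get("session_max_lifetime") or d.session_max_lifetime),
    )
    if not 0.0 <= s.sensitivity <= 1.0:
        raise ValueError("PHI_REDACTOR_SENSITIVITY must be between 0.0 and 1.0")
    return s


def session_expired(created_at: float, last_seen: float, now: float, s: Settings) -> bool:
    # Expired if idle too long OR older than the absolute lifetime cap.
    idle = now - last_seen
    age = now - created_at
    return idle > s.session_idle_timeout or age > s.session_max_lifetime
```

Passing `env` explicitly (instead of always reading `os.environ`) keeps the loader easy to test.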
Security Design
- PHI never logged: PHI-safe log formatter scrubs all known patterns
- Encryption at rest: Fernet encryption (AES-128-CBC with HMAC-SHA256 authentication) for vault entries
- Hash-chain audit: Every redaction event is chained via SHA-256 hashes
- Fail-safe: Detection/masking failures block requests (never pass through)
- Session isolation: Each session has independent vault mappings
- Key rotation: Built-in support for encryption key rotation
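A hash-chain audit log is tamper-evident because each entry stores the SHA-256 hash of the previous entry, so editing, reordering, or removing an entry breaks every link after it. The sketch below shows the technique with `hashlib` and JSON Lines; the field names (`prev_hash`, `event`, `hash`), the all-zero genesis hash, and the helper names are assumptions, not phi-redactor's actual log format. Events carry only metadata such as category and count, never PHI.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the hash is reproducible.
    body = json.dumps({"prev_hash": prev_hash, "event": event}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode()).hexdigest()


def append(lines: list[str], event: dict) -> None:
    # Link the new entry to the hash of the last one (or the genesis value).
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    entry = {"prev_hash": prev, "event": event, "hash": _digest(prev, event)}
    lines.append(json.dumps(entry, sort_keys=True))


def verify(lines: list[str]) -> bool:
    # Recompute every link; an edited, reordered, or removed entry breaks the chain.
    prev = GENESIS
    for line in lines:
        entry = json.loads(line)
        if entry["prev_hash"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Truncating the tail of the log is not detectable from the chain alone; recording the latest hash somewhere outside the log file is one way to close that gap.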
Development
```shell
# Clone and install
git clone https://github.com/dilawar-gopang/phi-redactor.git
cd phi-redactor
pip install -e ".[dev]"
python -m spacy download en_core_web_lg

# Run tests
pytest

# Lint and type check
ruff check src/ tests/
mypy src/
```
License
Apache License 2.0. See LICENSE for details.
Contributing
Contributions welcome! Please open an issue first to discuss what you'd like to change.
Built for healthcare AI developers who take HIPAA seriously.
Download files
File details
Details for the file phi_redactor-0.1.0.tar.gz.
File metadata
- Download URL: phi_redactor-0.1.0.tar.gz
- Upload date:
- Size: 242.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3031519eac62a37c241476c318a3fa3ce30abc0eb20c35a739b6cdd006f7f67f |
| MD5 | 4bc4a79d4b94bef865653cf85e967fa4 |
| BLAKE2b-256 | 1a179f566947aab791b675c4f1508c3533d80a1ecc020dc94866f3b62686d14e |
File details
Details for the file phi_redactor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: phi_redactor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 98.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5367ad7d06d571066c389713879d9f5ba41acb7422b2fea6b45ae56134f1b657 |
| MD5 | e08a32fdd903063ccfdf5ce038d7e8d7 |
| BLAKE2b-256 | 5902fcbc6f182dfd14269191a17284d540ca9bcf2403c2ba0ed833f536ae8255 |