DE-ID/RE-ID SDK for LLMs - Full-cycle PHI protection with automatic re-identification
Redact Proxy RE-ID SDK
Full-cycle PHI protection for LLMs with automatic re-identification.
Unlike simple PHI redaction, which permanently removes sensitive data, this SDK:
- Tokenizes PHI with unique reversible tokens (John Smith → [NAME_a1b2c3])
- Sends tokenized text to your LLM (OpenAI, Anthropic, Gemini)
- Re-identifies the response by restoring original PHI values
Your PHI never leaves your environment. LLMs only see anonymized tokens.
Installation
pip install redact-proxy-reid
# With specific LLM support
pip install redact-proxy-reid[openai]
pip install redact-proxy-reid[anthropic]
pip install redact-proxy-reid[all] # All LLM providers
Quick Start (2 minutes)
Get Your API Key
- Sign up at redact.health
- Go to Dashboard → API Keys
- Create a new RE-ID API key (starts with rr_live_)
Basic Usage
from redact_proxy_reid import PHITokenizer, PHIReidentifier
# Set your API key (or use REDACT_API_KEY env var)
API_KEY = "rr_live_your_key_here"
# 1. Tokenize PHI
tokenizer = PHITokenizer(api_key=API_KEY)
result = tokenizer.tokenize("Patient John Smith, DOB 01/15/1980, SSN 123-45-6789")
print(result.tokenized_text)
# "Patient [NAME_a1b2c3], DOB [DATE_d4e5f6], SSN [SSN_g7h8i9]"
# 2. Send to your LLM (tokens are safe!)
# your_llm_call is a placeholder for your own LLM client function
llm_response = your_llm_call(result.tokenized_text)
# 3. Re-identify the response
reidentifier = PHIReidentifier(api_key=API_KEY)
restored = reidentifier.reidentify(llm_response, result.token_map)
print(restored.text)
# Original PHI values restored
Drop-in OpenAI Wrapper
from openai import OpenAI
from redact_proxy_reid import OpenAIWrapper
# Wrap your existing client
client = OpenAI()
wrapped = OpenAIWrapper(client, api_key="rr_live_your_key_here")
# Or set REDACT_API_KEY env var and omit api_key
# Use exactly like normal - PHI protection is automatic
response = wrapped.chat(
model="gpt-4",
messages=[{
"role": "user",
"content": "Summarize this patient's condition: John Smith, 45yo male, MRN 12345"
}]
)
print(response["content"])
# Response with original PHI restored automatically
Drop-in Anthropic Wrapper
from anthropic import Anthropic
from redact_proxy_reid import AnthropicWrapper
client = Anthropic()
wrapped = AnthropicWrapper(client, api_key="rr_live_your_key_here")
response = wrapped.message(
model="claude-3-opus-20240229",
messages=[{
"role": "user",
"content": "Patient Jane Doe needs a referral for her diabetes management"
}]
)
Drop-in Gemini Wrapper
import google.generativeai as genai
from redact_proxy_reid import GeminiWrapper
genai.configure(api_key="your-gemini-key")
model = genai.GenerativeModel("gemini-pro")
wrapped = GeminiWrapper(model, api_key="rr_live_your_key_here")
response = wrapped.generate(
"Create a care plan for patient Bob Johnson, age 67"
)
Using Environment Variables
# In your shell: set once, use everywhere
export REDACT_API_KEY="rr_live_your_key_here"
# In Python: no api_key parameter needed, the env var is used automatically
tokenizer = PHITokenizer()
wrapped = OpenAIWrapper(client)
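The fallback from an explicit `api_key` argument to the `REDACT_API_KEY` environment variable can be sketched as below. `resolve_api_key` is a hypothetical helper written for illustration, not part of the SDK, and the precedence (explicit argument first) is an assumption. The prefix check relies only on the `rr_live_` prefix mentioned in the Quick Start.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, otherwise fall back to REDACT_API_KEY.

    Hypothetical helper: the SDK's real lookup logic is not documented here.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("REDACT_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set REDACT_API_KEY")
    if not key.startswith("rr_live_"):
        raise ValueError("RE-ID API keys are expected to start with 'rr_live_'")
    return key
```

Failing early on a missing or malformed key keeps the error close to configuration rather than surfacing later during a tokenize call.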
Linking to Email (Optional)
You can link your API key to an email for account recovery and notifications:
tokenizer = PHITokenizer(
api_key="rr_live_your_key_here",
email="your@email.com"
)
Multi-turn Conversations
Token mappings persist across conversation turns:
wrapped = OpenAIWrapper(client)
# First message
response1 = wrapped.chat(
model="gpt-4",
messages=[{"role": "user", "content": "Patient John Smith has diabetes"}],
conversation_id="conv-123"
)
# Second message - same conversation
response2 = wrapped.chat(
model="gpt-4",
messages=[
{"role": "user", "content": "Patient John Smith has diabetes"},
{"role": "assistant", "content": response1["tokenized_content"]},
{"role": "user", "content": "What medications should John take?"}
],
conversation_id="conv-123"
)
# "John Smith" is consistently tokenized across both turns
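To illustrate what "consistently tokenized" means, here is a minimal sketch of a per-conversation token store. `ConversationTokens` is a hypothetical class, not the SDK's implementation; it only assumes that a `conversation_id` selects a mapping in which each PHI value keeps one stable token.

```python
import secrets
from collections import defaultdict


class ConversationTokens:
    """Hypothetical per-conversation store: one stable token per PHI value."""

    def __init__(self):
        # conversation_id -> {(phi_type, value): token}
        self._by_conversation = defaultdict(dict)

    def token_for(self, conversation_id, phi_type, value):
        tokens = self._by_conversation[conversation_id]
        key = (phi_type, value)
        if key not in tokens:
            # token_hex(3) gives 6 hex characters, matching the [NAME_a1b2c3] shape
            tokens[key] = f"[{phi_type}_{secrets.token_hex(3)}]"
        return tokens[key]


store = ConversationTokens()
first = store.token_for("conv-123", "NAME", "John Smith")
second = store.token_for("conv-123", "NAME", "John Smith")
print(first == second)  # the same value in the same conversation reuses its token
```

Keeping a separate mapping per conversation means a token from one conversation reveals nothing about values in another.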
Tier Configuration
Different tiers offer different capabilities:
from redact_proxy_reid import PHITokenizer, TierConfig, PHIType, TokenFormat
# Basic tier - sensible defaults
tokenizer = PHITokenizer(config=TierConfig.basic())
# Pro tier - more customization
config = TierConfig.pro()
config.tokenize_types = [PHIType.NAME, PHIType.SSN, PHIType.MRN] # Only these types
config.token_format = TokenFormat.ANGLE # <NAME_a1b2c3> instead of [NAME_a1b2c3]
tokenizer = PHITokenizer(config=config)
# Enterprise tier - full control
config = TierConfig.enterprise()
config.token_id_length = 10 # Longer, more unique tokens
config.custom_format = "<<{type}:{id}>>" # Custom format
tokenizer = PHITokenizer(config=config)
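The effect of `token_format`, `custom_format`, and `token_id_length` can be pictured with a small template renderer. This is a sketch under assumptions: `render_token` is a hypothetical function, and the way the SDK generates ids is not documented here, so random hex is used as a stand-in.

```python
import secrets


def render_token(phi_type, template="[{type}_{id}]", id_length=6):
    """Build a token from a format template and a random hex id of id_length chars."""
    # token_hex(n) yields 2*n hex characters, so generate enough and trim.
    token_id = secrets.token_hex((id_length + 1) // 2)[:id_length]
    return template.format(type=phi_type, id=token_id)


default_token = render_token("NAME")                            # bracket style
angle_token = render_token("NAME", template="<{type}_{id}>")    # angle style
custom_token = render_token("SSN", template="<<{type}:{id}>>", id_length=10)
print(default_token, angle_token, custom_token)
```

Longer ids lower the chance that two different values receive the same token, at the cost of a few more characters in the prompt.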
PHI Types Supported
- NAME - Patient and provider names
- DATE - Dates of birth, admission, discharge
- SSN - Social Security Numbers
- PHONE - Phone numbers
- EMAIL - Email addresses
- ADDRESS - Street addresses
- MRN - Medical Record Numbers
- FACILITY - Hospital/clinic names
- AGE - Ages over 89
- ZIP - ZIP codes
- ACCOUNT - Account numbers, Medicare/Medicaid IDs
- LICENSE - License plate numbers
- VIN - Vehicle identification numbers
- DEVICE - Device identifiers
- URL - Web URLs
- IP - IP addresses
Token Map Serialization
Save and restore token maps for later re-identification:
import json
# Tokenize
result = tokenizer.tokenize("Patient data here")
# Save token map
token_map_json = json.dumps(result.token_map.to_dict())
# Store securely (database, encrypted file, etc.)
# Later: restore and re-identify
from redact_proxy_reid import TokenMap
token_map = TokenMap.from_dict(json.loads(token_map_json))
restored = reidentifier.reidentify(llm_response, token_map)
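As a rough picture of the save and restore steps, the sketch below round-trips a token map through a JSON file. It assumes the map can be represented as a plain token-to-value dict (the real `to_dict()` layout is not shown in this document), and `save_token_map` / `load_token_map` are hypothetical helpers. Owner-only file permissions are not a substitute for encryption.

```python
import json
import os
import tempfile

# Assumed shape: token -> original value
token_map = {"[NAME_a1b2c3]": "John Smith", "[SSN_g7h8i9]": "123-45-6789"}


def save_token_map(mapping, path):
    # Create the file readable and writable by the owner only (POSIX semantics).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle)


def load_token_map(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


path = os.path.join(tempfile.mkdtemp(), "token_map.json")
save_token_map(token_map, path)
loaded = load_token_map(path)
print(loaded == token_map)
```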
Selective Re-identification
Re-identify only certain PHI types:
# Only restore names, keep dates tokenized
restored = reidentifier.reidentify(
text=llm_response,
token_map=token_map,
types_to_restore=[PHIType.NAME]
)
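Conceptually, selective re-identification is a filtered substitution over the tokens found in the text. The sketch below is a standalone illustration, not the SDK's code: it assumes the default bracket format and a plain token-to-value dict, and this `reidentify` function is a hypothetical stand-in for `PHIReidentifier.reidentify`.

```python
import re

# Matches tokens in the default bracket format, e.g. [NAME_a1b2c3]
TOKEN_PATTERN = re.compile(r"\[([A-Z]+)_([0-9a-z]+)\]")


def reidentify(text, token_map, types_to_restore=None):
    """Replace known tokens with originals; leave other types and unknown tokens as-is."""
    def substitute(match):
        token, phi_type = match.group(0), match.group(1)
        if types_to_restore is not None and phi_type not in types_to_restore:
            return token
        return token_map.get(token, token)
    return TOKEN_PATTERN.sub(substitute, text)


token_map = {"[NAME_a1b2c3]": "John Smith", "[DATE_d4e5f6]": "01/15/1980"}
text = "[NAME_a1b2c3] was born on [DATE_d4e5f6]."
names_only = reidentify(text, token_map, types_to_restore={"NAME"})
everything = reidentify(text, token_map)
print(names_only)
print(everything)
```

Leaving unknown tokens untouched means a token the LLM invented or altered stays visible in the output instead of being silently dropped.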
How It Works
Your App                     Redact RE-ID SDK                     LLM Provider
   |                                 |                                  |
   |  "Patient John Smith..."        |                                  |
   | ------------------------------> |                                  |
   |                                 |                                  |
   |                Tokenize: "Patient [NAME_a1b2c3]..."                |
   |                Store: {[NAME_a1b2c3]: "John Smith"}                |
   |                                 |                                  |
   |                                 |  "[NAME_a1b2c3]..."              |
   |                                 | -------------------------------> |
   |                                 |                                  |
   |                                 |  "...[NAME_a1b2c3]..."           |
   |                                 | <------------------------------- |
   |                                 |                                  |
   |                Re-identify: "...John Smith..."                     |
   |                                 |                                  |
   |  "...John Smith..."             |                                  |
   | <------------------------------ |                                  |
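The flow in the diagram can be sketched end to end with a toy tokenizer and a fake LLM. Everything here is illustrative: the SSN regex stands in for the SDK's PHI detection (whose method is not described in this document), and `tokenize`, `reidentify`, `protected_call`, and `fake_llm` are hypothetical names.

```python
import re
import secrets

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tokenize(text):
    """Replace each SSN-shaped value with a token; return the text and the token map."""
    token_map = {}  # token -> original value
    seen = {}       # original value -> token, so repeats reuse one token

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            token = f"[SSN_{secrets.token_hex(3)}]"
            seen[value] = token
            token_map[token] = value
        return seen[value]

    return SSN_PATTERN.sub(substitute, text), token_map


def reidentify(text, token_map):
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


def protected_call(prompt, llm):
    tokenized, token_map = tokenize(prompt)
    response = llm(tokenized)  # the LLM only ever sees tokens
    return reidentify(response, token_map)


def fake_llm(prompt):
    # Stand-in for a real provider call; records what it was sent.
    fake_llm.last_prompt = prompt
    return f"Noted: {prompt}"


answer = protected_call("Patient SSN 123-45-6789", fake_llm)
print(fake_llm.last_prompt)  # contains a token, not the SSN
print(answer)
```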
Security Notes
- PHI tokenization happens locally in your environment
- Only anonymized tokens are sent to LLM providers
- Token maps should be stored securely (they contain the PHI!)
- Consider encrypting token maps at rest
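One way to act on these notes is a defensive check before any text leaves your environment. The sketch below is a hypothetical helper, not an SDK feature: it scans outgoing text for original values recorded in a token map (assumed to be a plain token-to-value dict). It can only catch values that were detected in the first place; PHI the tokenizer missed will not be in the map.

```python
def find_leaks(outgoing_text, token_map):
    """Return original PHI values from token_map that still appear in outgoing_text."""
    lowered = outgoing_text.lower()
    return [value for value in token_map.values() if value.lower() in lowered]


token_map = {"[NAME_a1b2c3]": "John Smith", "[SSN_g7h8i9]": "123-45-6789"}
safe = "Patient [NAME_a1b2c3], SSN [SSN_g7h8i9]"
unsafe = "Patient [NAME_a1b2c3] (John Smith), SSN [SSN_g7h8i9]"

print(find_leaks(safe, token_map))
print(find_leaks(unsafe, token_map))
```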
License
Commercial license required. See redact.health for pricing.
File details
Details for the file redact_proxy_reid-0.1.1.tar.gz.
File metadata
- Download URL: redact_proxy_reid-0.1.1.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21b313c71090d8d112d2540accf4127b0a7c264e5520e2ca27ed58099bef11f6 |
| MD5 | a01652804841a65f01ff67c5837caf18 |
| BLAKE2b-256 | 46634fcb617f85d38dd44bc348e10fdebb3661055a02e93c6e7b471a0313f60f |
File details
Details for the file redact_proxy_reid-0.1.1-py3-none-any.whl.
File metadata
- Download URL: redact_proxy_reid-0.1.1-py3-none-any.whl
- Upload date:
- Size: 23.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc45c690469e2dc1d94ade709c70d4a268efc65f4ec9996789e13a3d7a154f08 |
| MD5 | a81c5596668ff1d915686404e53fa8ed |
| BLAKE2b-256 | 20380751c81da5d8120a5dc989e7185c045241912fad9efe8e7bd90a70e91bad |