DE-ID/RE-ID SDK for LLMs - Full-cycle PHI protection with automatic re-identification

Redact Proxy RE-ID SDK

Full-cycle PHI protection for LLMs with automatic re-identification.

Unlike simple PHI redaction, which permanently removes sensitive data, this SDK:

  1. Tokenizes PHI with unique reversible tokens (John Smith → [NAME_a1b2c3])
  2. Sends tokenized text to your LLM (OpenAI, Anthropic, Gemini)
  3. Re-identifies the response by restoring original PHI values

Your PHI never leaves your environment. LLMs only see anonymized tokens.

Installation

pip install redact-proxy-reid

# With specific LLM support
pip install redact-proxy-reid[openai]
pip install redact-proxy-reid[anthropic]
pip install redact-proxy-reid[all]  # All LLM providers

Quick Start (2 minutes)

Get Your API Key

  1. Sign up at redact.health
  2. Go to Dashboard → API Keys
  3. Create a new RE-ID API key (starts with rr_live_)

Basic Usage

from redact_proxy_reid import PHITokenizer, PHIReidentifier

# Set your API key (or use REDACT_API_KEY env var)
API_KEY = "rr_live_your_key_here"

# 1. Tokenize PHI
tokenizer = PHITokenizer(api_key=API_KEY)
result = tokenizer.tokenize("Patient John Smith, DOB 01/15/1980, SSN 123-45-6789")

print(result.tokenized_text)
# "Patient [NAME_a1b2c3], DOB [DATE_d4e5f6], SSN [SSN_g7h8i9]"

# 2. Send to your LLM (tokens are safe!)
# Placeholder so this example runs: replace with your_llm_call(result.tokenized_text)
llm_response = result.tokenized_text

# 3. Re-identify the response
reidentifier = PHIReidentifier(api_key=API_KEY)
restored = reidentifier.reidentify(llm_response, result.token_map)

print(restored.text)
# Original PHI values restored

Drop-in OpenAI Wrapper

from openai import OpenAI
from redact_proxy_reid import OpenAIWrapper

# Wrap your existing client
client = OpenAI()
wrapped = OpenAIWrapper(client, api_key="rr_live_your_key_here")
# Or set REDACT_API_KEY env var and omit api_key

# Use exactly like normal - PHI protection is automatic
response = wrapped.chat(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Summarize this patient's condition: John Smith, 45yo male, MRN 12345"
    }]
)

print(response["content"])
# Response with original PHI restored automatically

Drop-in Anthropic Wrapper

from anthropic import Anthropic
from redact_proxy_reid import AnthropicWrapper

client = Anthropic()
wrapped = AnthropicWrapper(client, api_key="rr_live_your_key_here")

response = wrapped.message(
    model="claude-3-opus-20240229",
    messages=[{
        "role": "user",
        "content": "Patient Jane Doe needs a referral for her diabetes management"
    }]
)

Drop-in Gemini Wrapper

import google.generativeai as genai
from redact_proxy_reid import GeminiWrapper

genai.configure(api_key="your-gemini-key")
model = genai.GenerativeModel("gemini-pro")
wrapped = GeminiWrapper(model, api_key="rr_live_your_key_here")

response = wrapped.generate(
    "Create a care plan for patient Bob Johnson, age 67"
)

Using Environment Variables

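# Shell: set once, use everywhere
export REDACT_API_KEY="rr_live_your_key_here"

# Python: no api_key parameter needed - uses env var automatically
from openai import OpenAI
from redact_proxy_reid import PHITokenizer, OpenAIWrapper

client = OpenAI()
tokenizer = PHITokenizer()
wrapped = OpenAIWrapper(client)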

Linking to Email (Optional)

You can link your API key to an email for account recovery and notifications:

tokenizer = PHITokenizer(
    api_key="rr_live_your_key_here",
    email="your@email.com"
)

Multi-turn Conversations

Token mappings persist across conversation turns:

wrapped = OpenAIWrapper(client)

# First message
response1 = wrapped.chat(
    model="gpt-4",
    messages=[{"role": "user", "content": "Patient John Smith has diabetes"}],
    conversation_id="conv-123"
)

# Second message - same conversation
response2 = wrapped.chat(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Patient John Smith has diabetes"},
        {"role": "assistant", "content": response1["tokenized_content"]},
        {"role": "user", "content": "What medications should John take?"}
    ],
    conversation_id="conv-123"
)

# "John Smith" is consistently tokenized across both turns
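
One way to picture the consistency guarantee is a per-conversation lookup that hands out a token the first time a value is seen and reuses it afterwards. The sketch below is illustrative only: ConversationTokenStore and token_for are hypothetical names, not part of the SDK, and the SDK's internal storage may work differently.

```python
import secrets
from collections import defaultdict


class ConversationTokenStore:
    """Hypothetical helper: remembers which token each PHI value received."""

    def __init__(self):
        # conversation_id -> {original value: token}
        self._maps = defaultdict(dict)

    def token_for(self, conversation_id, phi_type, value):
        mapping = self._maps[conversation_id]
        if value not in mapping:
            # First sighting in this conversation: mint a new random token.
            mapping[value] = f"[{phi_type}_{secrets.token_hex(3)}]"
        return mapping[value]


store = ConversationTokenStore()
first = store.token_for("conv-123", "NAME", "John Smith")
second = store.token_for("conv-123", "NAME", "John Smith")
print(first == second)  # the same value maps to the same token within a conversation
```

Keying the lookup by conversation ID keeps mappings from one conversation separate from those of another.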

Tier Configuration

Different tiers offer different capabilities:

from redact_proxy_reid import PHITokenizer, TierConfig, PHIType, TokenFormat

# Basic tier - sensible defaults
tokenizer = PHITokenizer(config=TierConfig.basic())

# Pro tier - more customization
config = TierConfig.pro()
config.tokenize_types = [PHIType.NAME, PHIType.SSN, PHIType.MRN]  # Only these types
config.token_format = TokenFormat.ANGLE  # <NAME_a1b2c3> instead of [NAME_a1b2c3]
tokenizer = PHITokenizer(config=config)

# Enterprise tier - full control
config = TierConfig.enterprise()
config.token_id_length = 10  # Longer, more unique tokens
config.custom_format = "<<{type}:{id}>>"  # Custom format
tokenizer = PHITokenizer(config=config)
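
To see how the three token formats relate, the sketch below expands format templates with Python's str.format. It assumes the {type} and {id} placeholders in custom_format behave like ordinary str.format fields; render_token is a hypothetical helper for illustration, not an SDK function.

```python
def render_token(template, phi_type, token_id):
    """Fill a token template that uses {type} and {id} placeholders."""
    return template.format(type=phi_type, id=token_id)


bracket = render_token("[{type}_{id}]", "NAME", "a1b2c3")
angle = render_token("<{type}_{id}>", "NAME", "a1b2c3")
custom = render_token("<<{type}:{id}>>", "NAME", "a1b2c3d4e5")

print(bracket)  # [NAME_a1b2c3]
print(angle)    # <NAME_a1b2c3>
print(custom)   # <<NAME:a1b2c3d4e5>>
```

Whatever format you choose, pick delimiters that are unlikely to appear in clinical text or in LLM output, so tokens can be found again during re-identification.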

PHI Types Supported

  • NAME - Patient and provider names
  • DATE - Dates of birth, admission, discharge
  • SSN - Social Security Numbers
  • PHONE - Phone numbers
  • EMAIL - Email addresses
  • ADDRESS - Street addresses
  • MRN - Medical Record Numbers
  • FACILITY - Hospital/clinic names
  • AGE - Ages over 89
  • ZIP - ZIP codes
  • ACCOUNT - Account numbers, Medicare/Medicaid IDs
  • LICENSE - License plate numbers
  • VIN - Vehicle identification numbers
  • DEVICE - Device identifiers
  • URL - Web URLs
  • IP - IP addresses

Token Map Serialization

Save and restore token maps for later re-identification:

import json

# Tokenize
result = tokenizer.tokenize("Patient data here")

# Save token map
token_map_json = json.dumps(result.token_map.to_dict())
# Store securely (database, encrypted file, etc.)

# Later: restore and re-identify
from redact_proxy_reid import TokenMap

token_map = TokenMap.from_dict(json.loads(token_map_json))
restored = reidentifier.reidentify(llm_response, token_map)

Selective Re-identification

Re-identify only certain PHI types:

# Only restore names, keep dates tokenized
restored = reidentifier.reidentify(
    text=llm_response,
    token_map=token_map,
    types_to_restore=[PHIType.NAME]
)
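
Conceptually, selective restoration can be thought of as a filter on the type prefix embedded in each token. The following is a minimal sketch under that assumption, using a plain dict in place of the SDK's TokenMap; reidentify_selected and TOKEN_PATTERN are hypothetical names and do not describe the SDK's internals.

```python
import re

# Matches default-style tokens such as [NAME_a1b2c3]; group 1 is the PHI type.
TOKEN_PATTERN = re.compile(r"\[([A-Z]+)_[0-9a-z]+\]")


def reidentify_selected(text, token_map, types_to_restore):
    """Restore only tokens whose type is listed; leave all others untouched."""
    allowed = set(types_to_restore)

    def _restore(match):
        token = match.group(0)
        if match.group(1) in allowed and token in token_map:
            return token_map[token]
        return token

    return TOKEN_PATTERN.sub(_restore, text)


token_map = {"[NAME_a1b2c3]": "John Smith", "[DATE_d4e5f6]": "01/15/1980"}
text = "[NAME_a1b2c3] was born on [DATE_d4e5f6]."
partial = reidentify_selected(text, token_map, ["NAME"])
print(partial)  # John Smith was born on [DATE_d4e5f6].
```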

How It Works

Your App                    Redact RE-ID SDK                    LLM Provider
    |                              |                                  |
    |  "Patient John Smith..."     |                                  |
    | ---------------------------> |                                  |
    |                              |                                  |
    |                    Tokenize: "Patient [NAME_a1b2c3]..."         |
    |                    Store: {[NAME_a1b2c3]: "John Smith"}         |
    |                              |                                  |
    |                              |  "[NAME_a1b2c3]..."              |
    |                              | -------------------------------> |
    |                              |                                  |
    |                              |  "...[NAME_a1b2c3]..."           |
    |                              | <------------------------------- |
    |                              |                                  |
    |                    Re-identify: "...John Smith..."              |
    |                              |                                  |
    |  "...John Smith..."          |                                  |
    | <--------------------------- |                                  |
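
The flow in the diagram can be approximated in a few lines of plain Python. This is a toy sketch, not the SDK's implementation: it detects only SSNs with a simple regex, and tokenize, reidentify, and fake_llm are hypothetical stand-ins written for this example.

```python
import re
import secrets

# Illustrative pattern only: matches US SSNs written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tokenize(text):
    """Replace each SSN with a reversible token; return (tokenized_text, token_map)."""
    token_map = {}  # token -> original value
    seen = {}       # original value -> token, so repeats reuse one token

    def _swap(match):
        value = match.group(0)
        if value not in seen:
            token = f"[SSN_{secrets.token_hex(3)}]"
            seen[value] = token
            token_map[token] = value
        return seen[value]

    return SSN_PATTERN.sub(_swap, text), token_map


def reidentify(text, token_map):
    """Put the original values back wherever their tokens appear."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


def fake_llm(prompt):
    """Stand-in for a real LLM call: echoes the prompt with a prefix."""
    return "Summary: " + prompt


original = "SSN 123-45-6789 on file; confirm 123-45-6789 with the patient."
tokenized, token_map = tokenize(original)
restored = reidentify(fake_llm(tokenized), token_map)
print(tokenized)  # the SSN is replaced by a token such as [SSN_xxxxxx]
print(restored)
```

Only the tokenized string is passed to fake_llm; the token map stays on the caller's side and is used solely for the final restoration step.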

Security Notes

  • PHI tokenization happens locally in your environment
  • Only anonymized tokens are sent to LLM providers
  • Token maps should be stored securely (they contain the PHI!)
  • Consider encrypting token maps at rest
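
As one small piece of secure storage, a serialized token map can be written to a file that only its owner can read. The sketch below uses only the standard library and a plain dict in place of TokenMap.to_dict(); save_token_map and load_token_map are hypothetical helpers. It complements encryption at rest and does not replace it.

```python
import json
import os
import stat
import tempfile


def save_token_map(path, mapping):
    """Write the mapping as JSON, creating the file with owner-only access (0o600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle)


def load_token_map(path):
    """Read a mapping previously written by save_token_map."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "token_map.json")
save_token_map(path, {"[NAME_a1b2c3]": "John Smith"})
loaded = load_token_map(path)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(loaded, oct(mode))
```

The 0o600 mode applies only when the file is newly created and is meaningful on POSIX systems; an existing file keeps its current permissions, and Windows handles access control differently.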

License

Commercial license required. See redact.health for pricing.

Download files

Source Distribution

redact_proxy_reid-0.1.1.tar.gz (22.5 kB)

Built Distribution

redact_proxy_reid-0.1.1-py3-none-any.whl (23.2 kB)

File details

Details for the file redact_proxy_reid-0.1.1.tar.gz.

File metadata

  • Download URL: redact_proxy_reid-0.1.1.tar.gz
  • Size: 22.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for redact_proxy_reid-0.1.1.tar.gz
Algorithm Hash digest
SHA256 21b313c71090d8d112d2540accf4127b0a7c264e5520e2ca27ed58099bef11f6
MD5 a01652804841a65f01ff67c5837caf18
BLAKE2b-256 46634fcb617f85d38dd44bc348e10fdebb3661055a02e93c6e7b471a0313f60f

File details

Details for the file redact_proxy_reid-0.1.1-py3-none-any.whl.

File hashes

Hashes for redact_proxy_reid-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 dc45c690469e2dc1d94ade709c70d4a268efc65f4ec9996789e13a3d7a154f08
MD5 a81c5596668ff1d915686404e53fa8ed
BLAKE2b-256 20380751c81da5d8120a5dc989e7185c045241912fad9efe8e7bd90a70e91bad
