Scrub PII from text before sending to LLMs. Detect and redact emails, phones, SSNs, credit cards, names, and more.
Project description
pii-scrub
Scrub PII from text before sending to LLMs. Detect and redact emails, phone numbers, SSNs, credit cards, IP addresses, and more. Zero dependencies. GDPR/HIPAA friendly.
The Pain
You're sending customer data to OpenAI and your compliance team is panicking. Names, emails, SSNs, and credit card numbers are leaking to third-party AI providers.
Install
pip install pii-scrub
Quick Start
from pii_scrub import scrub, detect
text = "Contact John Smith at john@example.com or 555-123-4567. SSN: 123-45-6789"
# Scrub all PII
clean = scrub(text)
print(clean)
# "Contact [NAME] at [EMAIL] or [PHONE]. SSN: [SSN]"
# Detect without scrubbing
findings = detect(text)
for f in findings:
print(f"{f.type}: '{f.text}' at position {f.start}-{f.end}")
# EMAIL: 'john@example.com' at position 26-42
# PHONE: '555-123-4567' at position 46-58
# SSN: '123-45-6789' at position 65-76
What It Detects
| PII Type | Examples |
|---|---|
| user@example.com | |
| Phone | (555) 123-4567, +1-555-123-4567, 555.123.4567 |
| SSN | 123-45-6789, 123 45 6789 |
| Credit Card | 4111-1111-1111-1111, 5500 0000 0000 0004 |
| IP Address | 192.168.1.1, 10.0.0.1 |
| Date of Birth | born 01/15/1990, DOB: 1990-01-15 |
| Street Address | 123 Main Street, Apt 4B |
| Passport | Passport: AB1234567 |
| Driver License | DL: D1234567 |
| Bank Account | Account: 1234567890 |
| API Key | sk-proj-xxx, Bearer xxx |
API
from pii_scrub import scrub, detect, scrub_dict
# Scrub text (returns cleaned string)
clean = scrub(text, types=None, placeholder="[{type}]")
# Custom placeholders
clean = scrub(text, placeholder="***") # all replaced with ***
clean = scrub(text, placeholder="[REDACTED]")
# Detect only (returns list of Finding objects)
findings = detect(text, types=["EMAIL", "PHONE"]) # specific types only
# Scrub nested dicts/lists (for JSON payloads)
data = {"user": {"name": "John", "email": "john@test.com"}, "msg": "Call 555-1234"}
clean_data = scrub_dict(data)
# {"user": {"name": "John", "email": "[EMAIL]"}, "msg": "Call [PHONE]"}
OpenAI Integration
from pii_scrub import scrub
import openai
def safe_completion(messages):
# Scrub PII from all messages before sending
clean_messages = [
{**m, "content": scrub(m["content"])} for m in messages
]
return openai.chat.completions.create(
model="gpt-4o", messages=clean_messages
)
Features
- Zero dependencies — pure Python regex-based detection
- Fast — <1ms for typical inputs
- 11 PII types — email, phone, SSN, credit card, IP, DOB, address, passport, DL, bank, API keys
- Custom placeholders —
[EMAIL],***,[REDACTED], or any format - Dict/JSON support — recursively scrub nested data structures
- Type filtering — detect/scrub only specific PII types
- Position tracking — know exactly where PII was found
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pii_scrubber_lite-0.1.0.tar.gz.
File metadata
- Download URL: pii_scrubber_lite-0.1.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cfad8f1105b6b40e82ecfbbbf53fb01b4a47275c68b8b65a454c3daa25ea9bf2
|
|
| MD5 |
5e03de634657bfe51c1a66808b7a82d5
|
|
| BLAKE2b-256 |
1cb4dc1a9ea4bcb5bb618ca680dad656b334a3898ffab202d4e6d480d3b92578
|
File details
Details for the file pii_scrubber_lite-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pii_scrubber_lite-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
dea6bc5e3ae02f63734afb88f87e1b5991116a07ece1cd86a2f1aae2cccf3776
|
|
| MD5 |
ded1d586632cf68dba59c481a2af40fa
|
|
| BLAKE2b-256 |
4ccadfd3cf247800c6f2bfbd7b8b17145a29fb2e6cc897fc83eacc5dfe456aa9
|