Skip to main content

Scrub PII from text before sending to LLMs. Detect and redact emails, phones, SSNs, credit cards, names, and more.

Project description

pii-scrub

Scrub PII from text before sending to LLMs. Detect and redact emails, phone numbers, SSNs, credit cards, IP addresses, and more. Zero dependencies. GDPR/HIPAA friendly.

The Pain

You're sending customer data to OpenAI and your compliance team is panicking. Names, emails, SSNs, and credit card numbers are leaking to third-party AI providers.

Install

pip install pii-scrub

Quick Start

from pii_scrub import scrub, detect

text = "Contact John Smith at john@example.com or 555-123-4567. SSN: 123-45-6789"

# Scrub all PII
clean = scrub(text)
print(clean)
# "Contact [NAME] at [EMAIL] or [PHONE]. SSN: [SSN]"

# Detect without scrubbing
findings = detect(text)
for f in findings:
    print(f"{f.type}: '{f.text}' at position {f.start}-{f.end}")
# EMAIL: 'john@example.com' at position 26-42
# PHONE: '555-123-4567' at position 46-58
# SSN: '123-45-6789' at position 65-76

What It Detects

PII Type Examples
Email user@example.com
Phone (555) 123-4567, +1-555-123-4567, 555.123.4567
SSN 123-45-6789, 123 45 6789
Credit Card 4111-1111-1111-1111, 5500 0000 0000 0004
IP Address 192.168.1.1, 10.0.0.1
Date of Birth born 01/15/1990, DOB: 1990-01-15
Street Address 123 Main Street, Apt 4B
Passport Passport: AB1234567
Driver License DL: D1234567
Bank Account Account: 1234567890
API Key sk-proj-xxx, Bearer xxx

API

from pii_scrub import scrub, detect, scrub_dict

# Scrub text (returns cleaned string)
clean = scrub(text, types=None, placeholder="[{type}]")

# Custom placeholders
clean = scrub(text, placeholder="***")  # all replaced with ***
clean = scrub(text, placeholder="[REDACTED]")

# Detect only (returns list of Finding objects)
findings = detect(text, types=["EMAIL", "PHONE"])  # specific types only

# Scrub nested dicts/lists (for JSON payloads)
data = {"user": {"name": "John", "email": "john@test.com"}, "msg": "Call 555-1234"}
clean_data = scrub_dict(data)
# {"user": {"name": "John", "email": "[EMAIL]"}, "msg": "Call [PHONE]"}

OpenAI Integration

from pii_scrub import scrub
import openai

def safe_completion(messages):
    # Scrub PII from all messages before sending
    clean_messages = [
        {**m, "content": scrub(m["content"])} for m in messages
    ]
    return openai.chat.completions.create(
        model="gpt-4o", messages=clean_messages
    )

Features

  • Zero dependencies — pure Python regex-based detection
  • Fast — <1ms for typical inputs
  • 11 PII types — email, phone, SSN, credit card, IP, DOB, address, passport, DL, bank, API keys
  • Custom placeholders[EMAIL], ***, [REDACTED], or any format
  • Dict/JSON support — recursively scrub nested data structures
  • Type filtering — detect/scrub only specific PII types
  • Position tracking — know exactly where PII was found

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pii_scrubber_lite-0.1.0.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pii_scrubber_lite-0.1.0-py3-none-any.whl (5.9 kB view details)

Uploaded Python 3

File details

Details for the file pii_scrubber_lite-0.1.0.tar.gz.

File metadata

  • Download URL: pii_scrubber_lite-0.1.0.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for pii_scrubber_lite-0.1.0.tar.gz
Algorithm Hash digest
SHA256 cfad8f1105b6b40e82ecfbbbf53fb01b4a47275c68b8b65a454c3daa25ea9bf2
MD5 5e03de634657bfe51c1a66808b7a82d5
BLAKE2b-256 1cb4dc1a9ea4bcb5bb618ca680dad656b334a3898ffab202d4e6d480d3b92578

See more details on using hashes here.

File details

Details for the file pii_scrubber_lite-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for pii_scrubber_lite-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dea6bc5e3ae02f63734afb88f87e1b5991116a07ece1cd86a2f1aae2cccf3776
MD5 ded1d586632cf68dba59c481a2af40fa
BLAKE2b-256 4ccadfd3cf247800c6f2bfbd7b8b17145a29fb2e6cc897fc83eacc5dfe456aa9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page