
A Python library for redacting PII from text using the OpenAI Privacy Filter model


pii-safe — Redact PII from text


Uses the OpenAI Privacy Filter model to detect and redact personally identifiable information (PII) from text.

Why hash-based redaction?

Plain [REDACTED] placeholders discard which redacted values were identical. Replacing each PII span with hash(salt | pii_data) instead gives:

  • Consistent identifiers: For a given salt, the same PII always maps to the same hash, enabling cross-document correlation (e.g., "how many documents mention the same person?")
  • Matchable with salt: The hash itself cannot be reversed, but with the salt you can recompute the hash of a candidate value to identify the original PII if needed
  • Salt prevents rainbow table attacks: Without a salt, hashes could be precomputed for common names/emails to reverse-identify PII from redacted text
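A minimal sketch of the hash(salt | pii_data) idea using only the standard library. It is not pii-safe's code: the README does not say which hash function is used, so SHA-256 over the UTF-8 string "salt|pii" is an assumption, and redaction_token is a hypothetical helper. The placeholder shape [REDACTED_<hash>] follows the format shown under Usage.

```python
import hashlib


def redaction_token(pii: str, salt: str) -> str:
    """Hypothetical helper: turn one PII value into a salted placeholder.

    Assumes SHA-256 over "salt|pii"; pii-safe's actual digest may differ.
    """
    digest = hashlib.sha256(f"{salt}|{pii}".encode("utf-8")).hexdigest()
    return f"[REDACTED_{digest}]"


salt = "my_secret_salt"

# Consistent identifiers: the same PII always yields the same token.
token = redaction_token("Dario Clavijo", salt)
assert token == redaction_token("Dario Clavijo", salt)

# Matching with the salt: hash candidate values and compare tokens.
candidates = ["Alice Smith", "Dario Clavijo"]
match = next(c for c in candidates if redaction_token(c, salt) == token)
print(match)  # Dario Clavijo

# A table precomputed without the salt (plain SHA-256 of the name) does not match.
unsalted = f"[REDACTED_{hashlib.sha256(b'Dario Clavijo').hexdigest()}]"
assert unsalted != token
```

The candidate-matching loop is also why the salt has to stay secret: anyone who holds it can run the same loop over a list of common names or emails.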

Install

pip install pii-safe

Usage

from pii_safe import redact_text

text = "mi nombre es Dario Clavijo"  # Spanish for "my name is Dario Clavijo"
redacted = redact_text(text)
print(redacted)  # mi nombre es[REDACTED_<hash>]  (the detected span includes the leading space)

Salt for hashing

By default, a random 64-character salt is generated at startup, so the same PII produces different hashes in different runs. Specify a salt to keep hashes consistent across runs:

from pii_safe import redact_text, set_salt

# Option 1: Pass salt to redact_text
redacted = redact_text("mi nombre es Dario Clavijo", salt="my_secret_salt")

# Option 2: Set salt globally
set_salt("my_secret_salt")
redacted = redact_text("mi nombre es Dario Clavijo")
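Because the default salt is random, hashes only line up across runs if you persist a salt yourself. One way to do that, sketched with the standard library: generate 64 hex characters once with secrets.token_hex(32) and store them in a private file. load_or_create_salt and the file path are hypothetical, not part of pii-safe.

```python
import os
import secrets
from pathlib import Path


def load_or_create_salt(path: Path) -> str:
    """Hypothetical helper: read a saved salt, or create one on first use."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    # 32 random bytes rendered as 64 hex characters.
    salt = secrets.token_hex(32)
    # O_EXCL refuses to overwrite an existing file; mode 0o600 keeps it
    # owner-only on POSIX (anyone with the salt can test guesses).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(salt)
    return salt
```

The returned string can then be passed to set_salt(...) or to redact_text(text, salt=...) as in the options above, for example set_salt(load_or_create_salt(Path("pii_safe_salt.txt"))).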

Using the Redacter class

from pii_safe import Redacter

redacter = Redacter(salt="my_secret_salt")
result1 = redacter.redact("mi nombre es Dario Clavijo")
result2 = redacter.redact("el es Dario Clavijo")  # Spanish for "he is Dario Clavijo"

# Same PII gets consistent hash within this instance
hash_map = redacter.get_hash_map()
print(hash_map)  # {' Dario Clavijo': '<hash>'}
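Since a Redacter (or a fixed salt) maps the same PII to the same hash, redacted output alone is enough to answer the correlation question from the introduction: how many documents mention the same person? A sketch that scans already-redacted texts for [REDACTED_<hash>] placeholders; the regex and documents_per_identity are assumptions, and the short hashes in the sample documents are made up for readability.

```python
import re
from collections import Counter

# Matches placeholders of the form [REDACTED_<hash>]; the hash is captured
# without assuming a particular digest format.
TOKEN_RE = re.compile(r"\[REDACTED_([^\]]+)\]")


def documents_per_identity(redacted_docs: list[str]) -> Counter[str]:
    """Count, for each redaction hash, how many documents contain it."""
    counts: Counter[str] = Counter()
    for doc in redacted_docs:
        # set(): a document mentioning the same person twice counts once.
        counts.update(set(TOKEN_RE.findall(doc)))
    return counts


docs = [
    "my name is[REDACTED_ab12]",
    "he is[REDACTED_ab12] and she is[REDACTED_cd34]",
    "no personal data here",
]
print(documents_per_identity(docs))  # Counter({'ab12': 2, 'cd34': 1})
```

This only works if every document was redacted with the same salt; with the random default salt, hashes from separate runs will not match.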

CLI

pii-safe input.txt
pii-safe input.txt -o output.txt
pii-safe input.txt --salt my_secret_salt

Development

git clone https://github.com/daedalus/pii-safe.git
cd pii-safe
pip install -e ".[test]"

# run tests
pytest

# format
ruff format src/ tests/

# lint
ruff check src/ tests/

# type check
mypy src/

API

redact_text(text: str, salt: str | None = None) -> str

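Redacts PII from text using the openai/privacy-filter model and returns the redacted string. If salt is omitted, the default redacter's salt is used (see set_salt).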

set_salt(salt: str) -> None

Set the salt for hashing PII in the default redacter.

class Redacter

Context manager for consistent PII-to-hash mapping across calls.

  • __init__(salt: str | None = None): Initialize with optional salt
  • redact(text: str) -> str: Redact PII from text
  • get_hash_map() -> dict[str, str]: Get PII-to-hash mapping
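To make the Redacter contract concrete, here is a toy class with the same three methods. It is only a sketch of the idea, not pii-safe's implementation: ToyRedacter is hypothetical, an email regex stands in for the openai/privacy-filter model, and SHA-256 over "salt|pii" is an assumed digest.

```python
from __future__ import annotations

import hashlib
import re
import secrets

# Stand-in detector for this sketch only; pii-safe uses a model, not a regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class ToyRedacter:
    """Hypothetical mini redacter that records a PII-to-hash map."""

    def __init__(self, salt: str | None = None) -> None:
        # Mirrors the documented default: a random 64-character salt.
        self.salt = salt if salt is not None else secrets.token_hex(32)
        self._hash_map: dict[str, str] = {}

    def _hash(self, pii: str) -> str:
        # Assumed digest: SHA-256 over "salt|pii", cached per PII value.
        if pii not in self._hash_map:
            data = f"{self.salt}|{pii}".encode("utf-8")
            self._hash_map[pii] = hashlib.sha256(data).hexdigest()
        return self._hash_map[pii]

    def redact(self, text: str) -> str:
        # Replace every detected span with its salted placeholder.
        return EMAIL_RE.sub(lambda m: f"[REDACTED_{self._hash(m.group())}]", text)

    def get_hash_map(self) -> dict[str, str]:
        # Return a copy so callers cannot alter the internal map.
        return dict(self._hash_map)


redacter = ToyRedacter(salt="my_secret_salt")
first = redacter.redact("write to ana@example.com today")
second = redacter.redact("ana@example.com replied")
assert first.split()[2] == second.split()[0]
assert list(redacter.get_hash_map()) == ["ana@example.com"]
```

Caching is not what makes hashes consistent (the digest is already deterministic for a fixed salt); it gives get_hash_map() a record of every value redacted so far.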


Download files

Download the file for your platform.

Source Distribution

pii_safe-0.1.1.tar.gz (5.7 kB)


Built Distribution


pii_safe-0.1.1-py3-none-any.whl (6.8 kB)


File details

Details for the file pii_safe-0.1.1.tar.gz.

File metadata

  • Download URL: pii_safe-0.1.1.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pii_safe-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       1691ef4243938149886cef88a5b0482bfcae2c7e8fe855066faf02bb7ca81d9d
MD5          e3c3460f6c61b3579b2bec7636be8852
BLAKE2b-256  19e0aa9b3a85bf000c818fb21224e7606dbf1993f5439cb5f57d6f8fa035b137
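The digests above can be checked before installing. A minimal standard-library sketch; file_digest_hex is a hypothetical helper, and the filename assumes the sdist was downloaded into the current directory. pip can enforce the same check itself in hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).

```python
import hashlib
from pathlib import Path


def file_digest_hex(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 published above for pii_safe-0.1.1.tar.gz.
EXPECTED = "1691ef4243938149886cef88a5b0482bfcae2c7e8fe855066faf02bb7ca81d9d"

sdist = Path("pii_safe-0.1.1.tar.gz")
if sdist.exists():
    ok = file_digest_hex(sdist) == EXPECTED
    print("SHA256 OK" if ok else "SHA256 MISMATCH")
```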


Provenance

The following attestation bundles were made for pii_safe-0.1.1.tar.gz:

Publisher: pypi-publish.yml on daedalus/pii-safe


File details

Details for the file pii_safe-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pii_safe-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pii_safe-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       4049dcd7dcc3d3aa92b53008cd0285b9adc0946ee8cb3cfdc8764ac7cadee94a
MD5          f1bb34bcf08fcbf266f6862f8f2238ab
BLAKE2b-256  81f3b75f67677f91548f12ea5b8184987c14b5d37b6b709f8bc7d9529eb523eb


Provenance

The following attestation bundles were made for pii_safe-0.1.1-py3-none-any.whl:

Publisher: pypi-publish.yml on daedalus/pii-safe

