Pseudonymise patient identifiers and PII in text (and restore them) — a local, dependency-free pattern engine.
redacta
Pseudonymise patient identifiers and PII in text — and restore them. A local, dependency-free Python pattern engine.
pip install redacta
from redacta import redact, reinstate
redacted, report, token_map = redact(
"Dear patient, NHS Number: 943 476 5919, tel 0113 278 4532."
)
# redacted -> "Dear patient, NHS Number: [NHS_NUMBER_1], tel [PHONE_1]."
original = reinstate(redacted, token_map)
# original -> "Dear patient, NHS Number: 943 476 5919, tel 0113 278 4532."
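The round trip above depends on a reversible token map. The following is a minimal sketch of that idea only, not redacta's implementation: the `PHONE` pattern, `toy_redact`, and `toy_reinstate` are hypothetical names invented for illustration, and the real token_map layout may differ.

```python
import re

# Hypothetical toy pattern: a UK-style number such as "0113 278 4532".
# This is NOT the pattern redacta uses.
PHONE = re.compile(r"\b0\d{3} \d{3} \d{4}\b")


def toy_redact(text):
    """Replace each phone number with a stable token and record the mapping."""
    token_map = {}  # token -> original value (assumed layout)
    seen = {}       # original value -> token, so repeats reuse one token

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            token = f"[PHONE_{len(seen) + 1}]"
            seen[value] = token
            token_map[token] = value
        return seen[value]

    return PHONE.sub(substitute, text), token_map


def toy_reinstate(text, token_map):
    """Restore the original values by replacing each token."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


text = "Call 0113 278 4532 today; if busy, retry 0113 278 4532 or 0113 496 0000."
redacted, token_map = toy_redact(text)
restored = toy_reinstate(redacted, token_map)
```

Because a repeated value reuses its token, the redacted text still shows which mentions refer to the same thing, and the map alone is enough to undo the substitution.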
What it detects
Deterministic patterns, checksum-validated where noted: NHS numbers (Modulus-11), UK National
Insurance numbers, dates of birth (keyword-anchored; appointment dates are left
intact by default), UK postcodes, US SSNs and ZIP codes, hospital numbers/MRNs,
emails, phone numbers, URLs, IP addresses, Luhn-validated payment cards, IBANs,
account numbers, and UK vehicle registrations, plus keyword-anchored patient,
relative and carer names (clinician names are preserved). The same value maps to the
same token, and the returned token_map lets you reverse the substitution.
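For readers unfamiliar with the two checksums named above, here is a sketch of the standard Modulus-11 check for NHS numbers and the Luhn check for payment cards. These are the published algorithms written from scratch; the function names are hypothetical and this is not redacta's own code.

```python
def nhs_number_is_valid(candidate):
    """Modulus-11 check for a 10-digit NHS number (spaces/hyphens allowed)."""
    compact = candidate.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()) or len(compact) != 10:
        return False
    digits = [int(ch) for ch in compact]
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check value of 10 means the number is invalid.
    return check != 10 and check == digits[9]


def luhn_is_valid(candidate):
    """Luhn check for a card number (spaces/hyphens allowed)."""
    compact = candidate.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()) or len(compact) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


nhs_ok = nhs_number_is_valid("943 476 5919")   # the number from the example above
nhs_bad = nhs_number_is_valid("943 476 5918")  # last digit altered
card_ok = luhn_is_valid("4111 1111 1111 1111")   # widely used test card number
card_bad = luhn_is_valid("4111 1111 1111 1112")  # last digit altered
```

Validating the checksum before redacting is what keeps an arbitrary 10-digit or 16-digit run from being mislabelled as an NHS number or a card.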
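Names in free prose are not caught, because nothing anchors them to a keyword; detecting them needs an LLM (see the Redacta agent skill). The package uses only the Python standard library and makes no network calls. Review the output before sharing it.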
Safe Harbor mode
redacted, report, token_map = redact(text, safe_harbor=True)
Runs a stricter HIPAA Safe Harbor (§164.514) pass on top of the default. It additionally redacts all dates (appointment dates included, not just dates of birth), specific ages, fax numbers, certificate/licence numbers, device serial numbers, VINs, and health-plan/beneficiary numbers. It deliberately over-redacts slightly relative to the letter of the standard, erring on the safe side. Not legal advice.
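The difference between keyword-anchored date-of-birth matching and the all-dates behaviour can be sketched with two toy regular expressions. Everything here is an assumption made for illustration: the patterns, the `[DATE]` token, and the `redact_dates` helper are hypothetical and much simpler than whatever redacta does internally.

```python
import re

# Toy date shape: DD/MM/YYYY only. Real-world dates take many more forms.
DATE = r"\d{1,2}/\d{1,2}/\d{4}"

# Default-style pass: only dates that directly follow a date-of-birth keyword.
DOB_ANCHORED = re.compile(
    rf"\b(DOB|date of birth)(\s*:?\s*)({DATE})", re.IGNORECASE
)

# Strict-style pass: every date, whatever its context.
ANY_DATE = re.compile(DATE)


def redact_dates(text, strict=False):
    """Redact DOB-anchored dates, or all dates when strict is True."""
    if strict:
        return ANY_DATE.sub("[DATE]", text)
    # Keep the keyword and separator, replace only the date itself.
    return DOB_ANCHORED.sub(lambda m: m.group(1) + m.group(2) + "[DATE]", text)


letter = "DOB: 04/07/1961. Next appointment 12/03/2025."
default_pass = redact_dates(letter)
strict_pass = redact_dates(letter, strict=True)
# default_pass -> "DOB: [DATE]. Next appointment 12/03/2025."
# strict_pass  -> "DOB: [DATE]. Next appointment [DATE]."
```

The default keeps appointment dates readable for clinical use, while the strict pass trades that readability for coverage.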
Self-check
from redacta import self_check
leftovers = self_check(redacted) # [{'label': ..., 'sample': ...}, ...]
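As a rough illustration of what a leftover scan can look like, the sketch below re-runs a couple of toy patterns over already-redacted text and reports anything that still matches, using the same `label`/`sample` keys shown in the comment above. The patterns and the `toy_self_check` name are hypothetical; redacta's own `self_check` may work quite differently.

```python
import re

# Hypothetical leftover patterns, far cruder than a real detector.
LEFTOVER_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "TEN_DIGIT_RUN": re.compile(r"\b\d{3} ?\d{3} ?\d{4}\b"),
}


def toy_self_check(text):
    """Return one {'label', 'sample'} dict per suspicious leftover match."""
    findings = []
    for label, pattern in LEFTOVER_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"label": label, "sample": match.group(0)})
    return findings


leaky = "NHS Number: [NHS_NUMBER_1], contact jo@example.org or 943 476 5919."
clean = "NHS Number: [NHS_NUMBER_1], tel [PHONE_1]."
leaky_findings = toy_self_check(leaky)
clean_findings = toy_self_check(clean)
```

An empty list means only that these patterns found nothing; it is not proof that the text is free of identifiers.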
CLI
redacta letter.txt # prints JSON: redacted_text, report, token_map
redacta letter.txt --text-only # just the redacted text
redacta letter.txt --safe-harbor # strict HIPAA Safe Harbor pass
redacta-reinstate redacted.txt --map token_map.json
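One way to bridge the two commands is to split the JSON that `redacta` prints into the two files that `redacta-reinstate` takes. The sketch below assumes the output is a single JSON object with the keys `redacted_text`, `report`, and `token_map` named above, and that the map can be saved as plain JSON; the sample payload, the inner shape of `report` and `token_map`, and the `save_for_reinstate` helper are all made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_for_reinstate(cli_json, out_dir):
    """Write redacted.txt and token_map.json from the CLI's JSON output."""
    payload = json.loads(cli_json)
    out_dir = Path(out_dir)
    text_path = out_dir / "redacted.txt"
    map_path = out_dir / "token_map.json"
    text_path.write_text(payload["redacted_text"], encoding="utf-8")
    map_path.write_text(json.dumps(payload["token_map"], indent=2), encoding="utf-8")
    return text_path, map_path


# Illustrative payload only: the real report and token_map shapes may differ.
sample_output = json.dumps(
    {
        "redacted_text": "tel [PHONE_1].",
        "report": {},
        "token_map": {"[PHONE_1]": "0113 278 4532"},
    }
)

with tempfile.TemporaryDirectory() as tmp:
    text_path, map_path = save_for_reinstate(sample_output, tmp)
    saved_text = text_path.read_text(encoding="utf-8")
    saved_map = json.loads(map_path.read_text(encoding="utf-8"))
```

Keeping the map in a separate file from the redacted text makes it easier to share the text while storing the map somewhere more restricted.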
License
MIT-0 (MIT No Attribution). Built by PharmaTools.AI.
File details
Details for the file redacta-1.2.0.tar.gz.
File metadata
- Download URL: redacta-1.2.0.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 175faaf709dafd596d74b0b43c4e4a25e7b0dcef12d98d80ccc4125f961277c1 |
| MD5 | b7e6ec814745cdc4234f77413e7a59bf |
| BLAKE2b-256 | 9f9a0faf2edaa571d142420a6e1ee023139ac14525adef807f7653c05df9c7fe |
File details
Details for the file redacta-1.2.0-py3-none-any.whl.
File metadata
- Download URL: redacta-1.2.0-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fec295d04d56ecc0a30025c864717075653fe8207f4d49950d5e0923eb63bc7b |
| MD5 | a5df72a7641b54a6a6d612e860b0e19f |
| BLAKE2b-256 | 9fc75737acbb72162cfc6faa77a2e4acddf342b00e9a81cba3799f3847d704a6 |