redacta

Pseudonymise patient identifiers and PII in text — and restore them. A local, dependency-free Python pattern engine.

pip install redacta

from redacta import redact, reinstate

redacted, report, token_map = redact(
    "Dear patient, NHS Number: 943 476 5919, tel 0113 278 4532."
)
# redacted -> "Dear patient, NHS Number: [NHS_NUMBER_1], tel [PHONE_1]."

original = reinstate(redacted, token_map)
# original -> "Dear patient, NHS Number: 943 476 5919, tel 0113 278 4532."
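
The round trip above depends on a reversible mapping between tokens and original values. As a rough illustration only, the sketch below shows one way such a mapping could work for a single pattern. The names `pseudonymise`, `restore` and `EMAIL` are hypothetical, the regex is a simplification, and none of this is redacta's actual implementation or token_map format.

```python
import re

# Hypothetical, simplified email pattern for illustration only;
# redacta's real patterns are not shown in this document.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymise(text):
    """Replace each distinct email with a numbered token and return the map."""
    token_map = {}  # token -> original value (assumed shape)
    seen = {}       # original value -> token, so repeats reuse the same token

    def replace_match(match):
        value = match.group(0)
        if value not in seen:
            token = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = token
            token_map[token] = value
        return seen[value]

    return EMAIL.sub(replace_match, text), token_map


def restore(text, token_map):
    """Reverse the substitution by swapping every token back to its value."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


sample = "Mail a@x.org, cc b@y.org, again a@x.org."
masked, mapping = pseudonymise(sample)
print(masked)
print(restore(masked, mapping) == sample)
```

Because the closing bracket is part of each token, `[EMAIL_1]` cannot be mistaken for a prefix of `[EMAIL_10]` during restoration.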

What it detects

Deterministic patterns, checksum-validated where noted: NHS numbers (Modulus-11), UK National Insurance numbers, dates of birth (keyword-anchored; appointment dates are left intact by default), UK postcodes, US SSNs and ZIP codes, hospital/MRN numbers, emails, phone numbers, URLs, IP addresses, Luhn-validated payment cards, IBANs, account numbers, and UK vehicle registrations. It also detects keyword-anchored patient, relative and carer names (clinician names are preserved). The same value always maps to the same token, and the returned token_map lets you reverse the substitution.
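
To illustrate what checksum validation means here, the sketch below implements the two named checks, the NHS number Modulus-11 check and the Luhn check, as they are generally defined. The function names are hypothetical and this is not redacta's own code; how the library combines these checks with its patterns is not documented here.

```python
def nhs_number_is_valid(candidate):
    """Modulus-11 check for a 10-digit NHS number (spaces/hyphens allowed)."""
    compact = candidate.replace(" ", "").replace("-", "")
    if len(compact) != 10 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(ch) for ch in compact]
    # Weight the first nine digits 10 down to 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number can never be valid.
    return check != 10 and check == digits[9]


def luhn_is_valid(candidate):
    """Luhn check for a payment card number (spaces/hyphens allowed)."""
    compact = candidate.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(nhs_number_is_valid("943 476 5919"))   # the example number used above
print(luhn_is_valid("4111 1111 1111 1111"))  # a well-known test card number
```

Validating the checksum before redacting is what keeps arbitrary 10-digit or 16-digit strings from being treated as identifiers.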

Names in free prose are not caught; detecting them needs an LLM (see the Redacta agent skill). The package uses only the standard library and makes no network calls. Review the output before sharing it.

Safe Harbor mode

redacted, report, token_map = redact(text, safe_harbor=True)

Applies the stricter HIPAA Safe Harbor (§164.514) pass on top of the default detection. It additionally redacts all dates (appointment dates included, not just dates of birth), specific ages, fax numbers, certificate/licence numbers, device serial numbers, VINs, and health-plan/beneficiary numbers. It deliberately over-redacts slightly relative to the letter of the standard, to err on the safe side. Not legal advice.
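
The difference between keyword-anchored date handling and the all-dates Safe Harbor behaviour can be sketched as follows. This is an assumption-laden illustration: the regexes, the `[DOB]` and `[DATE]` tokens, and the `redact_dates` function are hypothetical and far simpler than anything a real engine would use.

```python
import re

# Hypothetical numeric date pattern (e.g. 02/03/1980); illustration only.
DATE = r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"

# Default-style rule: only dates that follow a date-of-birth keyword.
DOB_ANCHORED = re.compile(
    r"(\b(?:DOB|date of birth)\b[:\s]*)(" + DATE + r")",
    re.IGNORECASE,
)

# Strict-style rule: every date, regardless of context.
ANY_DATE = re.compile(DATE)


def redact_dates(text, strict=False):
    """Redact only keyword-anchored dates, or all dates when strict=True."""
    if strict:
        return ANY_DATE.sub("[DATE]", text)
    # Keep the keyword prefix (group 1) and replace just the date (group 2).
    return DOB_ANCHORED.sub(r"\g<1>[DOB]", text)


letter = "DOB: 02/03/1980. Appointment on 14/05/2024."
print(redact_dates(letter))
print(redact_dates(letter, strict=True))
```

In the default-style call the appointment date survives, while the strict call removes both dates.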

Self-check

from redacta import self_check
leftovers = self_check(redacted)   # [{'label': ..., 'sample': ...}, ...]
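
The result shape in the comment above (a list of dicts with 'label' and 'sample' keys) suggests a second pass over already-redacted text. A minimal sketch of that idea follows, assuming two simplified patterns. `scan_leftovers` and `LEFTOVER_PATTERNS` are hypothetical names, and the real self_check may use different patterns or mask its samples.

```python
import re

# Hypothetical, simplified patterns used only to illustrate a leftover scan.
LEFTOVER_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "UK_POSTCODE": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
}


def scan_leftovers(redacted_text):
    """Return one {'label', 'sample'} dict per pattern match still present."""
    findings = []
    for label, pattern in LEFTOVER_PATTERNS.items():
        for match in pattern.finditer(redacted_text):
            findings.append({"label": label, "sample": match.group(0)})
    return findings


for finding in scan_leftovers("Contact [EMAIL_1], lives at LS6 2AB."):
    print(finding["label"], finding["sample"])
```

An empty list means none of the scanned patterns matched; it does not prove the text is free of identifiers, which is why manual review is still advised.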

CLI

redacta letter.txt                 # prints JSON: redacted_text, report, token_map
redacta letter.txt --text-only     # just the redacted text
redacta letter.txt --safe-harbor   # strict HIPAA Safe Harbor pass
redacta-reinstate redacted.txt --map token_map.json
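
Since the default CLI output is a single JSON document while redacta-reinstate takes a text file and a map file, a small helper can split one into the other. The sketch below assumes the JSON keys are exactly redacted_text and token_map as listed in the comment above, and that the map file is expected to be plain JSON. The helper name `split_cli_output`, the output file names, and the sample token_map shape are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def split_cli_output(json_text, out_dir):
    """Write redacted.txt and token_map.json from the CLI's JSON output."""
    payload = json.loads(json_text)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    text_path = out / "redacted.txt"
    map_path = out / "token_map.json"
    text_path.write_text(payload["redacted_text"], encoding="utf-8")
    map_path.write_text(
        json.dumps(payload["token_map"], indent=2), encoding="utf-8"
    )
    return text_path, map_path


if __name__ == "__main__":
    # Stand-in for captured CLI output; the token_map shape is an assumption.
    captured = json.dumps(
        {
            "redacted_text": "tel [PHONE_1]",
            "report": {},
            "token_map": {"[PHONE_1]": "0113 278 4532"},
        }
    )
    with tempfile.TemporaryDirectory() as tmp:
        text_file, map_file = split_cli_output(captured, tmp)
        print(text_file.read_text(encoding="utf-8"))
```

The token map holds the original identifiers, so it should be stored with the same care as the unredacted source.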

License

MIT-0 (MIT No Attribution). Built by PharmaTools.AI.
