A CLI tool to scan files for configurable regex patterns (PHI identifiers) and optionally replace matches with deterministic pseudonyms

ShredGuard by WISC Lab

Scan files for PHI (Protected Health Information) patterns and replace them with deterministic pseudonyms. Integrates seamlessly with pre-commit hooks.

Installation

pip install shred-guard

Quick Start

Run the interactive setup wizard:

shredguard init

This walks you through:

  • Selecting PHI patterns to detect (SSNs, emails, MRNs, custom patterns)
  • Configuring file restrictions
  • Setting up pre-commit hooks

Commands

shredguard init

Interactive setup wizard. Creates your configuration and optionally sets up pre-commit integration.

shredguard check

Scan for PHI patterns:

shredguard check .                    # Scan current directory
shredguard check data/ notes.txt     # Scan specific paths

Output uses ruff-style formatting:

patient_notes.txt:1:9: SG001 Subject ID [SUB-1234]
patient_notes.txt:2:6: SG002 SSN [123-45-6789]
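
If you want to post-process the report (for example, to group findings by file), each line can be split into fields. The following is a minimal sketch that assumes the layout seen in the sample above, `path:line:column: CODE description [match]`; the `Finding` type and `parse_finding` helper are hypothetical names, not part of ShredGuard.

```python
import re
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    path: str
    line: int
    column: int
    code: str
    description: str
    match: str


# Assumed layout, inferred from the sample output only:
#   path:line:column: CODE description [match]
FINDING_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<code>SG\d+) (?P<description>.*) \[(?P<match>.*)\]$"
)


def parse_finding(text: str) -> Optional[Finding]:
    """Return a Finding for one report line, or None if it does not fit the layout."""
    m = FINDING_RE.match(text.strip())
    if m is None:
        return None
    return Finding(
        m["path"], int(m["line"]), int(m["column"]),
        m["code"], m["description"], m["match"],
    )
```

Lines that do not fit the assumed layout return `None` instead of raising, so summary or verbose lines can be skipped safely.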

shredguard fix

Replace PHI with pseudonyms:

shredguard fix .                                    # Replace with REDACTED-0, REDACTED-1, ...
shredguard fix --prefix ANON .                     # Custom prefix: ANON-0, ANON-1, ...
shredguard fix --output-map mapping.json .         # Save original -> pseudonym mapping

Replacements are deterministic: the same value always gets the same pseudonym within a run.
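
To illustrate what "deterministic within a run" means, here is a minimal sketch of one way such a mapping could work. It assumes pseudonyms are numbered in first-seen order, which matches the `REDACTED-0, REDACTED-1, ...` naming above but is not confirmed to be ShredGuard's actual implementation; `pseudonymize` is a hypothetical helper.

```python
import re


def pseudonymize(text, patterns, prefix="REDACTED"):
    """Replace every match with PREFIX-N, reusing N for repeated values.

    Assumption: numbering follows first-seen order; the real tool may differ.
    """
    mapping = {}
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))

    def replace(m):
        original = m.group(0)
        if original not in mapping:
            mapping[original] = f"{prefix}-{len(mapping)}"
        return mapping[original]

    return combined.sub(replace, text), mapping


redacted, mapping = pseudonymize(
    "SUB-1234 saw SUB-5678, then SUB-1234 again", [r"SUB-\d{4,6}"]
)
print(redacted)  # REDACTED-0 saw REDACTED-1, then REDACTED-0 again
```

The returned `mapping` dictionary plays the role of an original-to-pseudonym map; the exact JSON layout written by `--output-map` is not specified here.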

Configuration

Configuration lives in pyproject.toml (or in another TOML file passed with --config):

[tool.shredguard]

[[tool.shredguard.patterns]]
regex = "SUB-\\d{4,6}"
description = "Subject ID"

[[tool.shredguard.patterns]]
regex = "\\b\\d{3}-\\d{2}-\\d{4}\\b"
description = "SSN"

Each pattern can optionally include files and exclude_files globs to control which files are scanned.

Pre-commit

Add to .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: shredguard-check
        name: shredguard check
        entry: shredguard check
        language: system
        types: [text]

Or let shredguard init set this up for you.

Reference

CLI Options

shredguard check [OPTIONS] [FILES]...

Option          Description
--all-files     Scan all files recursively
--no-gitignore  Don't respect .gitignore patterns
--config PATH   Path to config file
-v, --verbose   Show verbose output (skipped files, etc.)

shredguard fix [OPTIONS] [FILES]...

Option             Description
--prefix TEXT      Prefix for pseudonyms (default: REDACTED)
--output-map PATH  Write JSON mapping of originals to pseudonyms
--all-files        Scan all files recursively
--no-gitignore     Don't respect .gitignore patterns
--config PATH      Path to config file
-v, --verbose      Show verbose output

Configuration Reference

[[tool.shredguard.patterns]]
regex = "SUB-\\d{4,6}"        # Required: regex pattern
description = "Subject ID"     # Optional: shown in output
files = ["*.csv", "data/**"]   # Optional: only scan matching files
exclude_files = ["*_test.*"]   # Optional: skip matching files
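
As a rough mental model of how `files` and `exclude_files` interact, the sketch below treats exclusion as taking precedence and uses Python's `fnmatch` for glob matching. Both choices are assumptions for illustration; ShredGuard's actual glob semantics (for example, how `**` or path separators are handled) may differ, and `pattern_applies` is a hypothetical helper.

```python
from fnmatch import fnmatch


def pattern_applies(path, files=None, exclude_files=None):
    """Decide whether a pattern should be checked against `path`.

    Assumptions: exclusions win over inclusions, and an absent `files`
    list means "all files". fnmatch's `*` also matches `/`.
    """
    if exclude_files and any(fnmatch(path, glob) for glob in exclude_files):
        return False
    if files:
        return any(fnmatch(path, glob) for glob in files)
    return True


files = ["*.csv", "data/**"]
exclude = ["*_test.*"]
print(pattern_applies("data/visits.csv", files, exclude))       # True
print(pattern_applies("data/visits_test.csv", files, exclude))  # False
print(pattern_applies("notes.txt", files, exclude))             # False
```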

Built-in Pattern Suggestions

When running shredguard init, you can choose from these common patterns:

Pattern                 Description
SUB-\d{4,6}             Subject ID
\b\d{3}-\d{2}-\d{4}\b   Social Security Number
MRN\d{6,10}             Medical Record Number
[email pattern]         Email addresses
[phone pattern]         Phone numbers (10 digits)
\b\d{5}(?:-\d{4})?\b    ZIP codes

Exit Codes

Code  Meaning
0     Success (no matches found for check)
1     Matches found or error

Binary File Handling

Binary files are automatically detected and skipped (null byte check in first 8KB). Use --verbose to see skipped files.
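
The null-byte heuristic described above can be sketched in a few lines. This is an illustration of the technique, not ShredGuard's source; it assumes "8KB" means 8192 bytes, and the function names are hypothetical.

```python
SAMPLE_SIZE = 8192  # assumption: "8KB" interpreted as 8192 bytes


def sample_looks_binary(sample: bytes) -> bool:
    """Treat data as binary if the sample contains a null byte."""
    return b"\x00" in sample


def looks_binary(path, sample_size=SAMPLE_SIZE):
    """Read only the first `sample_size` bytes of the file and apply the check."""
    with open(path, "rb") as handle:
        return sample_looks_binary(handle.read(sample_size))
```

One consequence of sampling only the start of a file is that a file whose first null byte appears after the sampled region would be treated as text under this heuristic.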

License

MIT

