A CLI tool to scan files for configurable regex patterns (PHI identifiers) and optionally replace matches with deterministic pseudonyms
ShredGuard by WISC Lab
Scan files for PHI (Protected Health Information) patterns and replace them with deterministic pseudonyms. Integrates seamlessly with pre-commit hooks.
Installation
pip install shred-guard
Quick Start
Run the interactive setup wizard:
shredguard init
This walks you through:
- Selecting PHI patterns to detect (SSNs, emails, MRNs, custom patterns)
- Configuring file restrictions
- Setting up pre-commit hooks
Commands
shredguard init
Interactive setup wizard. Creates your configuration and optionally sets up pre-commit integration.
shredguard check
Scan for PHI patterns:
shredguard check . # Scan current directory
shredguard check data/ notes.txt # Scan specific paths
Output uses ruff-style formatting:
patient_notes.txt:1:9: SG001 Subject ID [SUB-1234]
patient_notes.txt:2:6: SG002 SSN [123-45-6789]
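To make the output format concrete, here is a minimal Python sketch that produces lines of the same shape. It is not ShredGuard's implementation: the function name `format_findings` is hypothetical, and it assumes that line and column numbers are 1-based and that the `SG001`, `SG002`, ... codes follow the order of the configured patterns.

```python
import re

def format_findings(path, text, patterns):
    """Return ruff-style lines: path:line:col: CODE description [match]."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for index, (regex, description) in enumerate(patterns, start=1):
            for match in re.finditer(regex, line):
                code = f"SG{index:03d}"   # assumption: codes follow pattern order
                col = match.start() + 1   # assumption: 1-based column
                findings.append(
                    f"{path}:{line_no}:{col}: {code} {description} [{match.group(0)}]"
                )
    return findings

patterns = [(r"SUB-\d{4,6}", "Subject ID"), (r"\b\d{3}-\d{2}-\d{4}\b", "SSN")]
sample = "Patient SUB-1234 enrolled\nSSN: 123-45-6789\n"
for finding in format_findings("patient_notes.txt", sample, patterns):
    print(finding)
```

With this sample text, the sketch prints the two lines shown above.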
shredguard fix
Replace PHI with pseudonyms:
shredguard fix . # Replace with REDACTED-0, REDACTED-1, ...
shredguard fix --prefix ANON . # Custom prefix: ANON-0, ANON-1, ...
shredguard fix --output-map mapping.json . # Save original -> pseudonym mapping
Replacements are deterministic: the same value always gets the same pseudonym within a run.
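One way to get this behaviour is to keep a dictionary from each original value to its pseudonym and number new values in the order they are first seen. The sketch below illustrates that idea only; `pseudonymize` is a hypothetical helper, not ShredGuard's code, and the exact layout of the file written by --output-map is not specified here.

```python
import re

def pseudonymize(text, regexes, prefix="REDACTED", mapping=None):
    """Replace every match with PREFIX-N, reusing N for repeated values."""
    mapping = {} if mapping is None else mapping

    def replace(match):
        original = match.group(0)
        if original not in mapping:
            # First time this value is seen: assign the next counter.
            mapping[original] = f"{prefix}-{len(mapping)}"
        return mapping[original]

    for regex in regexes:
        text = re.sub(regex, replace, text)
    return text, mapping

text = "SUB-1234 visited; SUB-5678 visited; SUB-1234 returned"
redacted, mapping = pseudonymize(text, [r"SUB-\d{4,6}"])
print(redacted)  # REDACTED-0 visited; REDACTED-1 visited; REDACTED-0 returned
print(mapping)   # {'SUB-1234': 'REDACTED-0', 'SUB-5678': 'REDACTED-1'}
```

Passing the same `mapping` dictionary to several calls keeps pseudonyms consistent across files in this sketch.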
Configuration
Configuration lives in pyproject.toml (or in another .toml file passed with --config):
[tool.shredguard]
[[tool.shredguard.patterns]]
regex = "SUB-\\d{4,6}"
description = "Subject ID"
[[tool.shredguard.patterns]]
regex = "\\b\\d{3}-\\d{2}-\\d{4}\\b"
description = "SSN"
Each pattern can optionally include files and exclude_files globs to control which files are scanned.
Pre-commit
Add to .pre-commit-config.yaml:
repos:
  - repo: local
    hooks:
      - id: shredguard-check
        name: shredguard check
        entry: shredguard check
        language: system
        types: [text]
Or let shredguard init set this up for you.
Reference
CLI Options
shredguard check [OPTIONS] [FILES]...
| Option | Description |
|---|---|
| --all-files | Scan all files recursively |
| --no-gitignore | Don't respect .gitignore patterns |
| --config PATH | Path to config file |
| -v, --verbose | Show verbose output (skipped files, etc.) |
shredguard fix [OPTIONS] [FILES]...
| Option | Description |
|---|---|
| --prefix TEXT | Prefix for pseudonyms (default: REDACTED) |
| --output-map PATH | Write JSON mapping of originals to pseudonyms |
| --all-files | Scan all files recursively |
| --no-gitignore | Don't respect .gitignore patterns |
| --config PATH | Path to config file |
| -v, --verbose | Show verbose output |
Configuration Reference
[[tool.shredguard.patterns]]
regex = "SUB-\\d{4,6}" # Required: regex pattern
description = "Subject ID" # Optional: shown in output
files = ["*.csv", "data/**"] # Optional: only scan matching files
exclude_files = ["*_test.*"] # Optional: skip matching files
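As a rough illustration of how `files` and `exclude_files` can combine, the sketch below applies both lists with Python's `fnmatch` module. This is an assumption made for illustration: the glob dialect ShredGuard actually uses (for example, whether `*` crosses directory separators) is not documented here, and `pattern_applies` is a hypothetical name.

```python
from fnmatch import fnmatchcase

def pattern_applies(path, files=None, exclude_files=None):
    """Decide whether a pattern should be checked against `path`.

    Assumed semantics: if `files` is given, the path must match at least
    one of its globs; if `exclude_files` is given, the path must match
    none of them. Note that fnmatch's `*` also matches "/".
    """
    if files and not any(fnmatchcase(path, glob) for glob in files):
        return False
    if exclude_files and any(fnmatchcase(path, glob) for glob in exclude_files):
        return False
    return True

files = ["*.csv", "data/**"]
exclude_files = ["*_test.*"]
for candidate in ["data/visits.csv", "data/visits_test.csv", "notes.txt"]:
    print(candidate, pattern_applies(candidate, files, exclude_files))
```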
Built-in Pattern Suggestions
When running shredguard init, you can choose from these common patterns:
| Pattern | Description |
|---|---|
| SUB-\d{4,6} | Subject ID |
| \b\d{3}-\d{2}-\d{4}\b | Social Security Number |
| MRN\d{6,10} | Medical Record Number |
| [email pattern] | Email addresses |
| [phone pattern] | Phone numbers (10 digits) |
| \b\d{5}(?:-\d{4})?\b | ZIP codes |
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (no matches found for check) |
| 1 | Matches found or error |
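When calling the tool from a script, these codes can be read from the process return code. The wrapper below is a hedged sketch: `run_and_classify` is a hypothetical helper, and a stand-in command is used so the example runs without ShredGuard installed.

```python
import subprocess
import sys

def run_and_classify(command):
    """Run a command and map its exit status onto the two documented codes."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode == 0:
        return "clean", completed.stdout
    # A non-zero code covers both "matches found" and "error", so the
    # caller has to inspect the output to tell the two apart.
    return "matches-or-error", completed.stdout + completed.stderr

# Stand-in command that exits with status 1; in practice this would be
# something like ["shredguard", "check", "."].
status, output = run_and_classify([sys.executable, "-c", "raise SystemExit(1)"])
print(status)
```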
Binary File Handling
Binary files are automatically detected and skipped: a file is treated as binary if a null byte appears in its first 8 KB. Use --verbose to see skipped files.
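The null-byte heuristic is simple to express in Python. The sketch below follows the description above (read up to 8 KB, look for `\x00`); the function name `looks_binary` and the exact sample size of 8192 bytes are assumptions, not taken from ShredGuard's source.

```python
def looks_binary(path, sample_size=8192):
    """Return True if a null byte appears in the first `sample_size` bytes."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)
```

A null byte that first appears after the sampled prefix is not noticed by this check, so such a file would be treated as text.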
License