Local-only CLI for batch PII redaction in PDFs

Redactron — Local-only PII redaction for PDFs

Your files stay on your machine. No cloud. No subscription. No telemetry.

License: AGPL-3.0. Requires Python 3.11+.

Redactron is a command-line tool that redacts personally identifiable information (PII) from PDFs on your machine.

Define your PII once in a profile, run it against any number of documents, and get a verified redacted output. The PDF never leaves your machine.

Encrypted multi-client vault. Touch ID gated on macOS. Audit log. OCR fallback for scanned documents. AGPL-3.0.



Why Redactron?

Free online redactors are usually ad-supported, and many run analytics on the documents you upload. For medical records, legal documents, or anything covered by HIPAA, GDPR, or attorney-client privilege, that is a serious risk. Adobe Acrobat's online tools upload files to Adobe's servers. iLovePDF and SmallPDF are cloud services with freemium models. Once a file is uploaded, you have no visibility into what happens to it.

Redactron runs entirely on your machine. The codebase has no HTTP client dependency and no outbound socket calls. You can verify this with a packet capture while running a redaction.
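A packet capture is the strongest check. As a lighter in-process complement, the sketch below shows one way to make any Python-level outbound connection fail loudly while code runs. It is an illustration, not part of Redactron: `no_network` and `NetworkAccessError` are hypothetical names, and the guard only sees `socket.connect()` calls made from Python in the same process.

```python
import socket
from contextlib import contextmanager


class NetworkAccessError(RuntimeError):
    """Raised when guarded code tries to open an outbound connection."""


@contextmanager
def no_network():
    """Temporarily replace socket.connect() so every attempt is recorded and refused."""
    attempts = []
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # Record the target, then refuse instead of connecting.
        attempts.append(address)
        raise NetworkAccessError(f"outbound connection attempted: {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield attempts
    finally:
        # Always restore the real connect(), even if the guarded code raised.
        socket.socket.connect = original_connect
```

Wrapping a call in `with no_network() as attempts:` and checking that `attempts` stays empty gives a quick signal. It does not cover native code or child processes, which is why the packet capture remains the authoritative test.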

The source is AGPL-3.0 and available for inspection. There is no black-box model deciding what to redact. You define exactly what gets removed, and the tool re-scans the output to confirm it worked.

For professional use, the vault stores multiple client profiles encrypted with AES-256-GCM. The master key lives in the macOS login keychain, and Redactron prompts for Touch ID before each vault access (soft enforcement; see Security model). Every run is logged to a local SQLite database.

Quickstart

pip install redactron
redactron init
redactron vault init

# Get the profile template — use this for every new profile
redactron profile template --output /tmp/me.yaml
# edit /tmp/me.yaml with your name, addresses, account numbers, etc.
redactron profile add --client me --from /tmp/me.yaml

# Redact a single file
redactron run document.pdf --client me

# Redact multiple files in a folder — outputs go to ./documents/redacted/
redactron run ./documents/ --client me

For a single file, the output document_redacted.pdf is written to the same directory as the input. For a folder, all redacted files go to documents/redacted/ and a consolidated summary report is written to documents/redacted-reports/.
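The naming conventions described above can be sketched in a few lines of Python, which may help when scripting around the tool. This only mirrors the paths and the YYYY-MM-DD-HHMM_batch-summary.md pattern stated in this README; the function names are hypothetical and it is not Redactron's own code.

```python
from datetime import datetime
from pathlib import Path


def default_output_path(input_pdf: Path) -> Path:
    """Single-file case: <name>_redacted.pdf next to the input."""
    return input_pdf.with_name(f"{input_pdf.stem}_redacted{input_pdf.suffix}")


def batch_output_dirs(input_dir: Path) -> tuple[Path, Path]:
    """Folder case: (redacted files directory, summary reports directory)."""
    return input_dir / "redacted", input_dir / "redacted-reports"


def batch_summary_name(when: datetime) -> str:
    """Summary report name following the YYYY-MM-DD-HHMM_batch-summary.md pattern."""
    return f"{when.strftime('%Y-%m-%d-%H%M')}_batch-summary.md"
```

For example, `default_output_path(Path("document.pdf"))` gives `document_redacted.pdf`, and `batch_output_dirs(Path("documents"))` gives `documents/redacted` and `documents/redacted-reports`.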

Features

  • Profile-driven. Define your PII once (names, aliases, addresses, phones, emails, SSNs, account numbers, custom regex) and redact any number of PDFs.
  • Encrypted vault. AES-256-GCM encrypted multi-client profile store. Master key in macOS Keychain.
  • Touch ID gate. LocalAuthentication soft-gate before every vault access on macOS.
  • OCR fallback. Auto-triggers on image-only pages via pytesseract. No flag needed.
  • Layout-aware. Column-aware address bridging prevents cross-column false positives in two-column PDFs.
  • Verification. Re-scans the redacted output and reports any PII survivors.
  • Audit log. SQLite record of every run (filename, detections, verification status).
  • Batch mode. redactron run ./docs/ redacts an entire directory. Outputs go to redacted/ subdir.
  • Consolidated report. Single YYYY-MM-DD-HHMM_batch-summary.md per batch run.
  • Dry run. Preview detections without writing output.
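To make the audit log feature concrete, here is a minimal sketch of a SQLite run log that records the three fields named above (filename, detections, verification status). The table name, column names, and helper functions are assumptions for illustration; Redactron's actual schema may differ.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per redaction run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT    NOT NULL,
    filename   TEXT    NOT NULL,
    detections INTEGER NOT NULL,
    verified   INTEGER NOT NULL
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_run(conn: sqlite3.Connection, filename: str, detections: int, verified: bool) -> int:
    """Insert one run inside a transaction and return its row id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO runs (started_at, filename, detections, verified) VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), filename, detections, int(verified)),
        )
    return cur.lastrowid


def recent_runs(conn: sqlite3.Connection, limit: int = 10) -> list[tuple]:
    """Return the newest runs first, similar in spirit to a --limit N option."""
    return conn.execute(
        "SELECT id, filename, detections, verified FROM runs ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Storing only counts and status, rather than the detected values themselves, keeps PII out of the log in this sketch.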

Profile example

version: 1
subject:
  display_name: "Jane Smith"
  aliases: ["Jane", "J. Smith"]
  addresses: ["123 Main Street, Springfield, IL 62701"]
  phones: ["+1-555-867-5309"]
  emails: ["jane@example.com"]
  account_numbers:
    - value: "0021305789Q834"
      preserve_last: 4
detection:
  fuzzy_match: true
  match_threshold: 0.85

Copy docs/examples/profile-template.yaml for the full annotated schema.
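As a rough intuition for two of the fields in the profile example, the sketch below shows one plausible reading of `preserve_last` (mask everything except the final N characters) and of `match_threshold` (accept a candidate when a similarity ratio reaches the threshold). Both interpretations are assumptions, and `difflib.SequenceMatcher` is a stand-in: this README does not say which matching algorithm Redactron uses.

```python
from difflib import SequenceMatcher


def mask_preserving_last(value: str, preserve_last: int, fill: str = "X") -> str:
    """Replace every character with `fill` except the last `preserve_last` ones."""
    if preserve_last <= 0:
        return fill * len(value)
    keep = value[-preserve_last:]
    return fill * (len(value) - len(keep)) + keep


def is_fuzzy_match(candidate: str, target: str, threshold: float = 0.85) -> bool:
    """Case-insensitive similarity check; the ratio ranges from 0.0 to 1.0."""
    ratio = SequenceMatcher(None, candidate.casefold(), target.casefold()).ratio()
    return ratio >= threshold
```

With the example profile, `mask_preserving_last("0021305789Q834", 4)` yields `XXXXXXXXXXQ834`, and a near-miss spelling such as "Jane Smyth" still matches "Jane Smith" at the 0.85 threshold under this stand-in metric.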

Multi-client vault

redactron vault init
redactron profile add --client alice --from alice.yaml
redactron profile add --client bob --from bob.yaml
redactron run statement.pdf --client alice
redactron profile list

Security model

The vault is AES-256-GCM encrypted at rest. On macOS, the master key is stored in the login keychain and access is gated by a Touch ID prompt via LocalAuthentication.

Touch ID is soft enforcement. It gates redactron's code path, not the keychain item itself. An unsigned Python package cannot use kSecAttrAccessControl (requires Apple code-signing entitlements). See docs/SECURITY.md for the full threat model.

Performance targets

Scenario                    Target
10-page text PDF            < 3 seconds end-to-end
10-page image PDF (OCR)     < 30 seconds
Peak memory per document    < 500 MB

Platform support

Platform    Status
macOS       First-class (Touch ID vault)
Linux       Planned for v1.1 (keyring via libsecret)
Windows     Planned for v1.1 (DPAPI)

CLI reference

redactron run <path> [--client <id>] [--no-ocr] [--force-ocr] [--no-verify]
                     [--json] [--output <path>] [--quiet] [--per-file-reports]
redactron dry-run <path> [--json]
redactron verify <path>
redactron init
redactron vault init
redactron profile add --client <id> [--name <name>] [--from <yaml>]
redactron profile template [--output <path>] [--client <id>]
redactron profile list
redactron profile show <id> [--reveal]
redactron profile edit <id>
redactron profile delete <id>
redactron profile import <yaml> [--client <id>]
redactron log [--subject <id>] [--limit N]
redactron report <run-id>
redactron --version
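When driving the CLI from another Python script, assembling the argument list in one place avoids shell-quoting mistakes. The helper below is a hypothetical convenience wrapper, not part of Redactron; it only uses flags listed in the reference above and makes no assumption about the tool's exit codes or JSON output format.

```python
from typing import Optional


def build_run_command(
    path: str,
    client: Optional[str] = None,
    output: Optional[str] = None,
    no_ocr: bool = False,
    no_verify: bool = False,
    json_output: bool = False,
    quiet: bool = False,
) -> list[str]:
    """Build an argv list for `redactron run` from the documented options."""
    cmd = ["redactron", "run", path]
    if client is not None:
        cmd += ["--client", client]
    if output is not None:
        cmd += ["--output", output]
    # Boolean switches are appended only when enabled.
    for enabled, flag in (
        (no_ocr, "--no-ocr"),
        (no_verify, "--no-verify"),
        (json_output, "--json"),
        (quiet, "--quiet"),
    ):
        if enabled:
            cmd.append(flag)
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd)`; passing a list rather than a string means file names with spaces need no extra quoting.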

License

AGPL-3.0. See LICENSE.

Redactron depends on PyMuPDF, which is also licensed under AGPL-3.0. If you distribute Redactron as part of a proprietary product, the AGPL requires you to release your source. See docs/PRIVACY.md for details.

Download files

Source Distribution

redactron-1.0.0.tar.gz (277.2 kB)

Uploaded Source

Built Distribution

redactron-1.0.0-py3-none-any.whl (67.2 kB)

Uploaded Python 3

File details

Details for the file redactron-1.0.0.tar.gz.

File metadata

  • Download URL: redactron-1.0.0.tar.gz
  • Upload date:
  • Size: 277.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for redactron-1.0.0.tar.gz
Algorithm Hash digest
SHA256 ef6fa0d696874af8159efcc2b0d424f9c7bdcb1d1bc157affac96db126c24b53
MD5 722d71497be69573ca4522342f012616
BLAKE2b-256 f2d4d4912506ea4cb6c196231e40a2ea8ba83886cdef485a32b1b7f6b454e272
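
A downloaded archive can be checked against the SHA256 digest listed above before installing it. The sketch below uses only the Python standard library; the function names are illustrative, and the file path passed in is whatever location you saved the download to.

```python
import hashlib
import hmac

# SHA256 digest for redactron-1.0.0.tar.gz, copied from the listing above.
EXPECTED_SHA256 = "ef6fa0d696874af8159efcc2b0d424f9c7bdcb1d1bc157affac96db126c24b53"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the published one, ignoring hex case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

Calling `matches_expected("redactron-1.0.0.tar.gz")` returns `True` only if the file on disk hashes to the published value. The same approach works for the wheel with its own digest.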

Provenance

The following attestation bundles were made for redactron-1.0.0.tar.gz:

Publisher: release.yml on tjndr/redactron

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file redactron-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: redactron-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 67.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for redactron-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cdde9bcbd394ddcb0d34bb39e6ea49176195bde7a755bcb517cb6cfb2f111923
MD5 dab2389ea1077390b0afd68f429f4ae5
BLAKE2b-256 24d4f3214493b902b6b07ab7bc19a338a4496b6f734be13dca66f100501f5dc9

Provenance

The following attestation bundles were made for redactron-1.0.0-py3-none-any.whl:

Publisher: release.yml on tjndr/redactron

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
