Rule-based PII and secret redaction for Markdown documents — audit log, risk-level filtering, LLM pipeline ready

markdown-redactor

Quick start

pip install markdown-redactor
printf "Contact me at jane@example.com\n" | markdown-redactor -

Expected output:

Contact me at [REDACTED]

See docs/GUIDE.md for the full API and CLI usage guide.

Who is this for

  • Teams feeding Markdown documents into LLMs (RAG, agents, chat pipelines)
  • Security-conscious teams that need deterministic redaction before inference
  • Developers who want a small codebase with extensible rules

Key features

  • Pluggable architecture: register custom redaction rules without touching the core engine
  • Markdown-aware behavior: by default, skips fenced code blocks and inline code spans
  • Config file support: drop a redactor.toml (or use pyproject.toml) — auto-discovered from the working directory
  • Lightweight runtime: zero runtime dependencies on Python 3.11+ (tomli on 3.10)
  • Typed API: designed to work cleanly with strict type checking
  • Operational visibility: per-rule match counters, timing stats, and opt-in audit log

Built-in redaction rules

The default engine includes 24 rules:

  • email, phone
  • ipv4, ipv6
  • us_ssn, us_ein
  • uk_nino
  • in_pan, in_aadhaar, in_gstin
  • br_cpf, br_cnpj
  • iban, swift_bic, eu_vat
  • labeled_sensitive_id (tax ID, driver license, passport, national ID labels)
  • secret_assignment (password/api_key/token style assignments)
  • credential_uri (connection-string credentials)
  • aws_access_key, generic_token, google_api_key, jwt, private_key
  • credit_card (Luhn-validated to reduce false positives)
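The Luhn check mentioned for credit_card can be sketched as follows. This is a minimal standalone illustration of the checksum itself, not the package's own code; the function name `luhn_valid` is an assumption made for this example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Hypothetical helper for illustration; separators such as spaces
    and hyphens are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A candidate that matches the card-number pattern but fails this check (for example an arbitrary 16-digit order number) can be left unredacted, which is how the checksum helps reduce false positives.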

How redaction works

  1. Markdown text is segmented.
  2. Depending on config, non-redactable segments (such as fenced code) are preserved unchanged.
  3. Each redactable segment is processed by registered rules in order.
  4. Output and stats are returned.

This makes behavior explicit and easy to extend.
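The four steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the names `split_segments`, `redact`, `CODE_SEGMENT`, and `EMAIL` are hypothetical, the code-detection regex is much looser than real Markdown parsing, and the email pattern is a stand-in for a built-in rule.

```python
import re
from collections import Counter

# Hypothetical rule shape: (rule name, compiled pattern).
Rule = tuple[str, re.Pattern[str]]

# Simplified detection of fenced code blocks and inline code spans.
CODE_SEGMENT = re.compile(r"```.*?```|`[^`\n]+`", re.DOTALL)

# Stand-in email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def split_segments(markdown: str) -> list[tuple[bool, str]]:
    """Step 1: split text into (redactable, text) pairs."""
    segments: list[tuple[bool, str]] = []
    position = 0
    for match in CODE_SEGMENT.finditer(markdown):
        if match.start() > position:
            segments.append((True, markdown[position:match.start()]))
        segments.append((False, match.group()))  # code is preserved
        position = match.end()
    if position < len(markdown):
        segments.append((True, markdown[position:]))
    return segments


def redact(
    markdown: str, rules: list[Rule], placeholder: str = "[REDACTED]"
) -> tuple[str, Counter[str]]:
    """Steps 2-4: skip code, apply rules in order, return output and stats."""
    stats: Counter[str] = Counter()
    output: list[str] = []
    for redactable, text in split_segments(markdown):
        if redactable:
            for name, pattern in rules:
                text, count = pattern.subn(placeholder, text)
                stats[name] += count
        output.append(text)
    return "".join(output), stats
```

Calling `redact(text, [("email", EMAIL)])` on a document replaces addresses in ordinary prose while leaving anything inside fenced or inline code untouched, and the returned counter records how many matches each rule produced.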

Performance

Runs in $O(n \cdot r)$ time where $n$ is input length and $r$ is active rule count. No network I/O, no AST parsing, no heavy dependencies.

Security and compliance notes

  • This is best-effort pattern redaction, not formal DLP certification
  • Always validate on your real data and threat model
  • Combine with downstream controls (access controls, logging, policy engines)
  • Add organization-specific rules for identifiers, ticket IDs, or internal secrets
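As a rough idea of what organization-specific patterns might look like, the sketch below uses plain `re` rather than the package's rule-registration API (see docs/GUIDE.md for that). The identifier formats ("SEC-1042" style ticket IDs, "EMP" plus six digits for employee IDs) and the names `ORG_PATTERNS` and `redact_org_identifiers` are invented for this example.

```python
import re

# Hypothetical identifier formats, chosen only for illustration.
ORG_PATTERNS: dict[str, re.Pattern[str]] = {
    "ticket_id": re.compile(r"\b[A-Z]{2,6}-\d{1,6}\b"),
    "employee_id": re.compile(r"\bEMP\d{6}\b"),
}


def redact_org_identifiers(text: str) -> tuple[str, dict[str, int]]:
    """Replace each hypothetical identifier and count matches per pattern."""
    counts: dict[str, int] = {}
    for name, pattern in ORG_PATTERNS.items():
        text, counts[name] = pattern.subn("[REDACTED]", text)
    return text, counts
```

Patterns like these are worth validating against a sample of real documents before they are wired into the engine as custom rules.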

Troubleshooting

Nothing is being redacted

  • Verify you are using create_default_engine() or registering custom rules
  • Check whether content is inside fenced/inline code that is skipped by default

Too much is being redacted

  • Tighten custom regex patterns
  • Keep --redact-inline-code / --redact-fenced-code-blocks disabled unless required
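To illustrate what tightening a pattern can mean in practice, the sketch below compares a loose and a stricter regex for a made-up account-number format ("ACCT-" followed by eight digits). The format and the names `LOOSE` and `TIGHT` are assumptions for this example only.

```python
import re

# Loose: any run of eight digits, even inside a longer number.
LOOSE = re.compile(r"\d{8}")

# Tighter: require the (hypothetical) "ACCT-" prefix and word boundaries.
TIGHT = re.compile(r"\bACCT-\d{8}\b")

sample = "Order 202401150042 shipped; billing ref ACCT-12345678."

loose_result = LOOSE.sub("[REDACTED]", sample)
tight_result = TIGHT.sub("[REDACTED]", sample)

print(loose_result)  # Order [REDACTED]0042 shipped; billing ref ACCT-[REDACTED].
print(tight_result)  # Order 202401150042 shipped; billing ref [REDACTED].
```

The loose pattern clips part of the order number, while the anchored pattern only touches the labeled account reference.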

CLI command not found

  • Ensure the package is installed in the active environment
  • Try module mode: python -m markdown_redactor.cli input.md

Development and contribution

See CONTRIBUTING.md for setup and quality checks.

Primary local quality command:

PYTHONPATH=src .venv/bin/python -m ruff check src tests && \
PYTHONPATH=src .venv/bin/python -m mypy src && \
PYTHONPATH=src .venv/bin/python -m pytest

Release process

Maintainers can follow docs/RELEASING.md.

Publishing is automated via .github/workflows/release.yml on tags matching v*. GitHub Release notes and signed provenance attestations are generated via .github/workflows/github-release.yml.
