Rule-based PII and secret redaction for Markdown documents — audit log, risk-level filtering, LLM pipeline ready
markdown-redactor
Quick start
pip install markdown-redactor
printf "Contact me at jane@example.com\n" | markdown-redactor -
Expected output:
Contact me at [REDACTED]
See docs/GUIDE.md for the full API and CLI usage guide.
Table of contents
- Who is this for
- Key features
- Built-in redaction rules
- How redaction works
- Performance
- Security and compliance notes
- Troubleshooting
- Additional resources
- Development and contribution
- Release process
Who is this for
- Teams feeding Markdown documents into LLMs (RAG, agents, chat pipelines)
- Security-conscious teams that need deterministic redaction before inference
- Developers who want a small codebase with extensible rules
Key features
- Pluggable architecture: register custom redaction rules without touching the core engine
- Markdown-aware behavior: by default, skips fenced code blocks and inline code spans
- Config file support: drop a redactor.toml (or use pyproject.toml), auto-discovered from the working directory
- Lightweight runtime: zero runtime dependencies on Python 3.11+ (tomli on 3.10)
- Typed API: strict typing-friendly design
- Operational visibility: per-rule match counters, timing stats, and opt-in audit log
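To illustrate the config-file and lightweight-runtime points above, here is a minimal sketch of how working-directory discovery could look. It is not the package's actual loader: the function name `discover_config`, the lookup order, and the `[tool.markdown-redactor]` table name are all assumptions made for illustration, and no real configuration keys are shown. See docs/GUIDE.md for the supported format.

```python
from __future__ import annotations

import sys
from pathlib import Path
from typing import Any

# tomllib is in the standard library from Python 3.11; tomli is the 3.10 backport.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib


def discover_config(start: Path, tool_table: str = "markdown-redactor") -> dict[str, Any]:
    """Return settings found in `start`, or an empty dict if none exist.

    Assumed order (hypothetical): redactor.toml first, then the
    [tool.<tool_table>] table of pyproject.toml.
    """
    dedicated = start / "redactor.toml"
    if dedicated.is_file():
        with dedicated.open("rb") as fh:
            return tomllib.load(fh)

    pyproject = start / "pyproject.toml"
    if pyproject.is_file():
        with pyproject.open("rb") as fh:
            data = tomllib.load(fh)
        # Missing tables fall back to an empty mapping.
        return data.get("tool", {}).get(tool_table, {})

    return {}
```

Calling `discover_config(Path.cwd())` would return whatever the discovered file contains; how the real engine interprets those values is outside this sketch.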
Built-in redaction rules
Default engine includes 24 rules:
- email, phone
- ipv4, ipv6
- us_ssn, us_ein
- uk_nino
- in_pan, in_aadhaar, in_gstin
- br_cpf, br_cnpj
- iban, swift_bic, eu_vat
- labeled_sensitive_id (tax ID, driver license, passport, national ID labels)
- secret_assignment (password/api_key/token style assignments)
- credential_uri (connection-string credentials)
- aws_access_key, generic_token, google_api_key, jwt, private_key
- credit_card (Luhn-validated to reduce false positives)
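The Luhn validation mentioned for credit_card is a standard checksum, so it can be sketched independently of the package. The helper name `luhn_valid` below is hypothetical and this is not the library's own code; it only shows why a checksum filters out most random digit runs that merely look like card numbers.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the decimal digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and dashes.
    digits = [int(ch) for ch in number if ch.isdecimal()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, the widely used test number `4111 1111 1111 1111` passes, while changing its last digit to `2` makes the check fail.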
How redaction works
- Markdown text is segmented.
- Based on config, non-redactable segments (like fenced code) can be preserved.
- Each redactable segment is processed by registered rules in order.
- Output and stats are returned.
This makes behavior explicit and easy to extend.
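The four steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the package's implementation: `Rule`, `Result`, `segment`, and `redact` are hypothetical names, the code detection is a simplified regex rather than full Markdown parsing, and the email pattern is deliberately naive.

```python
import re
from dataclasses import dataclass, field

FENCE = "`" * 3  # three backticks, built programmatically
# Fenced blocks first, then single-backtick inline spans (simplified).
CODE_RE = re.compile(f"{FENCE}.*?{FENCE}|`[^`\n]+`", re.DOTALL)


@dataclass
class Rule:
    name: str
    pattern: re.Pattern[str]


@dataclass
class Result:
    text: str
    counts: dict[str, int] = field(default_factory=dict)


def segment(markdown: str) -> list[tuple[bool, str]]:
    """Step 1: split text into (is_code, text) segments."""
    segments: list[tuple[bool, str]] = []
    pos = 0
    for match in CODE_RE.finditer(markdown):
        if match.start() > pos:
            segments.append((False, markdown[pos:match.start()]))
        segments.append((True, match.group()))
        pos = match.end()
    if pos < len(markdown):
        segments.append((False, markdown[pos:]))
    return segments


def redact(markdown: str, rules: list[Rule], skip_code: bool = True,
           placeholder: str = "[REDACTED]") -> Result:
    counts = {rule.name: 0 for rule in rules}
    out: list[str] = []
    for is_code, text in segment(markdown):
        if is_code and skip_code:
            out.append(text)  # Step 2: preserve non-redactable segments
            continue
        for rule in rules:  # Step 3: apply rules in registration order
            text, hits = rule.pattern.subn(placeholder, text)
            counts[rule.name] += hits
        out.append(text)
    return Result("".join(out), counts)  # Step 4: output plus stats


rules = [Rule("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"))]
result = redact("Contact me at jane@example.com, not `bob@example.com`\n", rules)
print(result.text, result.counts)
```

With `skip_code=True` the inline span is left untouched and only one match is counted; passing `skip_code=False` redacts inside code as well.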
Performance
Runs in $O(n \cdot r)$ time where $n$ is input length and $r$ is active rule count. No network I/O, no AST parsing, no heavy dependencies.
Security and compliance notes
- This is best-effort pattern redaction, not formal DLP certification
- Always validate on your real data and threat model
- Combine with downstream controls (access controls, logging, policy engines)
- Add organization-specific rules for identifiers, ticket IDs, or internal secrets
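As an example of the last point, organization-specific identifiers usually reduce to a handful of regular expressions. The sketch below uses only the standard `re` module; the two ID formats and the helper `redact_internal_ids` are invented for illustration, and how such patterns are registered with the engine is covered in docs/GUIDE.md rather than shown here.

```python
import re

# Hypothetical internal formats; replace with your organization's real ones.
TICKET_ID = re.compile(r"\b[A-Z]{2,10}-\d{1,6}\b")  # e.g. OPS-4821
EMPLOYEE_ID = re.compile(r"\bEMP\d{6}\b")           # e.g. EMP004217


def redact_internal_ids(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every ticket or employee ID with the placeholder."""
    for pattern in (TICKET_ID, EMPLOYEE_ID):
        text = pattern.sub(placeholder, text)
    return text


print(redact_internal_ids("See OPS-4821, filed by EMP004217."))
```

Testing patterns like these against real documents before deploying them helps catch both misses and over-matching.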
Troubleshooting
Nothing is being redacted
- Verify you are using create_default_engine() or registering custom rules
- Check whether content is inside fenced/inline code that is skipped by default
Too much is being redacted
- Tighten custom regex patterns
- Keep --redact-inline-code / --redact-fenced-code-blocks disabled unless required
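Tightening a custom pattern usually means requiring separators and refusing to match inside longer digit runs. The comparison below is a sketch with two hypothetical custom patterns (neither is the package's built-in rule) run against the same sample text.

```python
import re

# Loose: optional dashes, no boundaries, so it fires inside longer numbers.
LOOSE = re.compile(r"\d{3}-?\d{2}-?\d{4}")
# Tight: dashes required, and no digit may touch either end of the match.
TIGHT = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")

sample = "Order 20240115987654 shipped; SSN 123-45-6789 on file."

loose_hits = LOOSE.findall(sample)
tight_hits = TIGHT.findall(sample)
print(loose_hits)
print(tight_hits)
```

The loose pattern also captures the first nine digits of the order number, while the tight pattern matches only the dashed identifier.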
CLI command not found
- Ensure package is installed in active environment
- Try module mode:
python -m markdown_redactor.cli input.md
Additional resources
- Full usage guide: docs/GUIDE.md
- Architecture guide: docs/ARCHITECTURE.md
- FAQ: docs/FAQ.md
- Support process: SUPPORT.md
- Security policy: SECURITY.md
- Changelog: CHANGELOG.md
- Releasing guide: docs/RELEASING.md
- Guided onboarding docs: docs/README.md
- Runnable examples:
Development and contribution
See CONTRIBUTING.md for setup and quality checks.
Primary local quality command:
PYTHONPATH=src .venv/bin/python -m ruff check src tests && \
PYTHONPATH=src .venv/bin/python -m mypy src && \
PYTHONPATH=src .venv/bin/python -m pytest
Release process
Maintainers can follow docs/RELEASING.md.
Publishing is automated via .github/workflows/release.yml on tags matching v*.
GitHub Release notes and signed provenance attestations are generated via .github/workflows/github-release.yml.