redactkit
Production-hardened Python redaction for structured logs, LLM agent payloads, and
AWS-signed URLs. Zero dependencies — pure stdlib. Built for dict[str, Any], not
for free-form prose.
from redactkit import redact_args
redact_args({"password": "hunter2", "user": "alice"})
# → {"password": "***", "user": "alice"}
Install
pip install redactkit
Requires Python ≥ 3.10. No runtime dependencies.
Why redactkit?
- Structured-data first. Recursive dict/list/tuple traversal out of the box. Drop into any code that already speaks Mapping[str, Any].
- AWS SigV4 redaction. Scrubs X-Amz-Signature, X-Amz-Credential, and X-Amz-Security-Token from presigned URLs, a real production gap most redaction libraries ignore.
- Overflow splitting. summarize_payload returns a short wire-safe summary plus an optional full body, so a 5 MB tool result doesn't blow your log budget.
- Comprehensive default denylist. Passwords, tokens, JWT, OAuth, bearer, credentials, cookies, sessions: all covered. Extensible without mutating module state.
- Zero dependencies. Drop into any Python ≥ 3.10 project, no transitive bloat.
- Production-hardened. Extracted from the Convilyn agent platform; battle-tested in LLM agent middleware, supervisor handoffs, event emission, and HTTP response redaction.
When to pick redactkit vs. alternatives
| Tool | Approach | Best at | Deps | Pick when… |
|---|---|---|---|---|
| Microsoft Presidio | ML-based NER (spaCy / transformers) | Free-form text PII (names, addresses) | Heavy (~1 GB models) | Document / chat PII detection in regulated industries |
| scrubadub | NLP rules + named recognizers | Free-form text (emails, phones, names) | nltk, textblob | Scrubbing user-generated prose |
| Hand-rolled logging.Filter | Custom filter per team | Logging-specific | None | Reinventing the wheel; AWS SigV4 never covered |
| redactkit | Key denylist + AWS SigV4 + overflow splitting | Structured data: dicts, JSON, kwargs, OTel attrs, request bodies | None | LLM agent logs, API request/response logs, anywhere dict[str, Any] is the unit |
Non-goals. redactkit does not do free-form NLP PII detection, cryptographic anonymization, or database-column encryption. Reach for Presidio or scrubadub for those.
Three killer examples
1. Structured dict redaction
from redactkit import redact_args
payload = {
    "account": {"bearer_token": "xyz", "user": "alice"},
    "items": [{"secret": "s"}, {"name": "plain"}],
}
redact_args(payload)
# → {
#     "account": {"bearer_token": "***", "user": "alice"},
#     "items": [{"secret": "***"}, {"name": "plain"}],
# }
Case-insensitive, recursive, and non-mutating. Covers nested dicts, list-of-dicts, and tuples (preserving type).
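The traversal described above can be sketched in a few lines of stdlib Python. This is an illustration under assumptions, not redactkit's source: sketch_redact, SKETCH_KEY_RE, and SKETCH_MASK are hypothetical names, and the three-term denylist is far smaller than the real default.

```python
import re
from typing import Any

# Hypothetical denylist, much smaller than redactkit's DEFAULT_KEY_TERMS.
SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
SKETCH_MASK = "***"


def sketch_redact(value: Any, key_re: re.Pattern = SKETCH_KEY_RE) -> Any:
    """Return a redacted copy of value; the input is never modified."""
    if isinstance(value, dict):
        # Mask the value when the key matches, otherwise recurse into it.
        return {
            k: (
                SKETCH_MASK
                if isinstance(k, str) and key_re.search(k)
                else sketch_redact(v, key_re)
            )
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sketch_redact(v, key_re) for v in value]
    if isinstance(value, tuple):
        # Rebuild as a tuple so the container type is preserved.
        return tuple(sketch_redact(v, key_re) for v in value)
    return value


payload = {
    "account": {"Bearer_Token": "xyz", "user": "alice"},
    "items": ({"secret": "s"}, {"name": "plain"}),
}
clean = sketch_redact(payload)
```

Because every container is rebuilt rather than edited, payload still holds the original values after the call, which is the non-mutating property the real function promises.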
2. AWS presigned URL scrubbing
from redactkit import redact_args
redact_args({
    "url": "https://s3.example.com/x?X-Amz-Signature=abcdef123&foo=1",
})
# → {"url": "https://s3.example.com/x?X-Amz-Signature=***&foo=1"}
Strings inside payloads get scanned for AWS Signature V4 query fragments and scrubbed in place. Most logging filters miss this — redactkit doesn't.
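For intuition, here is a minimal regex-based sketch of that kind of scrubbing. The pattern below is an assumption written for this example; redactkit's own PRESIGNED_QUERY_RE may match more parameters or use different boundaries. sketch_scrub_presigned is a hypothetical helper name.

```python
import re

# Hypothetical pattern; redactkit's PRESIGNED_QUERY_RE may differ.
SKETCH_PRESIGNED_RE = re.compile(
    r"(X-Amz-(?:Signature|Credential|Security-Token)=)[^&#\s]+",
    re.IGNORECASE,
)


def sketch_scrub_presigned(text: str, mask: str = "***") -> str:
    """Replace the value of each sensitive SigV4 query parameter with mask."""
    # Keep the parameter name (group 1) and swap only the value.
    return SKETCH_PRESIGNED_RE.sub(lambda m: m.group(1) + mask, text)


url = "https://s3.example.com/x?X-Amz-Signature=abcdef123&foo=1"
scrubbed = sketch_scrub_presigned(url)
```

Matching on the parameter name rather than parsing the URL lets the same function work on log lines that merely contain a URL somewhere in the text.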
3. Overflow splitting for big payloads
from redactkit import summarize_payload, output_digest
big_tool_result = {"items": [...]} # 5 MB
summary, overflow = summarize_payload(big_tool_result, max_bytes=2048)
# Attach the short summary to your span / log event:
span.set_attribute("tool.output_summary", summary)
# Persist the full body to S3 if it overflowed:
if overflow is not None:
    digest = output_digest(overflow)
    s3.put_object(Bucket="logs", Key=f"overflow/{digest}", Body=overflow)
    span.set_attribute("tool.output_ref", f"s3://logs/overflow/{digest}")
UTF-8 boundary safe — no mojibake even if the cut lands mid-character.
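One common way to get a boundary-safe cut, shown here as a sketch rather than as redactkit's implementation, is to slice the encoded bytes and decode with errors="ignore" so a half-cut character is dropped. sketch_summarize and sketch_digest are hypothetical names; serializing with json.dumps and returning the overflow as a str are assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Optional, Tuple


def sketch_summarize(payload: Any, max_bytes: int = 2048) -> Tuple[str, Optional[str]]:
    """Return (summary, overflow); overflow is None when the body fits."""
    body = json.dumps(payload, ensure_ascii=False, default=str)
    encoded = body.encode("utf-8")
    if len(encoded) <= max_bytes:
        return body, None
    # errors="ignore" drops a multi-byte character that the cut split in half.
    summary = encoded[:max_bytes].decode("utf-8", errors="ignore")
    return summary, body


def sketch_digest(body: str) -> str:
    """First 16 hex characters of the SHA-256 of the UTF-8 encoded body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]


# 21 bytes lands in the middle of the sixth two-byte "é".
summary, overflow = sketch_summarize({"note": "é" * 50}, max_bytes=21)
digest = sketch_digest(overflow) if overflow is not None else None
```

The summary can end up a byte or two shorter than max_bytes, which is the price of never emitting a partial character.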
Public API
| Symbol | Kind | Purpose |
|---|---|---|
| redact_args(payload, *, key_pattern=SENSITIVE_KEY_RE) | function | Deep-redact dicts/lists/strings |
| redact_text(value, *, key_pattern=SENSITIVE_KEY_RE) | function | Scrub key=value patterns + presigned URLs from free-form text |
| redact_url_query(url) | function | Redact sensitive values in URL query strings |
| summarize_payload(payload, *, max_bytes=2048, already_redacted=False) | function | Wire-safe truncation; returns (summary, overflow_body) |
| output_digest(body) | function | 16-hex SHA-256 prefix; content-addressable overflow reference |
| extend_key_pattern(extra_terms) | function | Open/Closed denylist extension without module-state mutation |
| OutboundErrorRedactor(pattern) | class | Mask caller-supplied internal terms in error/log payloads |
| OutboundErrorRedactor.from_terms(terms) | classmethod | Compile a term list into a redactor |
| SENSITIVE_KEY_RE, SENSITIVE_KV_TEXT_RE, PRESIGNED_QUERY_RE | regex | Public patterns for direct use |
| DEFAULT_KEY_TERMS | tuple[str, ...] | Raw fragments backing SENSITIVE_KEY_RE |
| MASK | str | The redaction placeholder ("***") |
| MAX_SUMMARY_BYTES | int | Default cap for summarize_payload (2048) |
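The table does not show how OutboundErrorRedactor is used, so the following is only a sketch of the general idea behind "compile a term list into a redactor". SketchTermRedactor and its redact method are hypothetical stand-ins; the real class may expose a different interface.

```python
import re
from typing import Iterable


class SketchTermRedactor:
    """Hypothetical stand-in, not redactkit's OutboundErrorRedactor."""

    def __init__(self, pattern: re.Pattern, mask: str = "***") -> None:
        self._pattern = pattern
        self._mask = mask

    @classmethod
    def from_terms(cls, terms: Iterable[str]) -> "SketchTermRedactor":
        # Longest first, so overlapping terms prefer the longer match.
        ordered = sorted(set(terms), key=len, reverse=True)
        if not ordered:
            # "(?!)" never matches, so an empty term list redacts nothing.
            return cls(re.compile(r"(?!)"))
        # re.escape makes each term match literally.
        joined = "|".join(re.escape(t) for t in ordered)
        return cls(re.compile(joined, re.IGNORECASE))

    def redact(self, text: str) -> str:
        return self._pattern.sub(lambda m: self._mask, text)


redactor = SketchTermRedactor.from_terms(["orders-db", "orders-db-internal"])
message = redactor.redact("connect to Orders-DB-internal failed")
```

Escaping the terms matters because internal hostnames and service names often contain regex metacharacters such as dots.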
Extending the denylist
The default denylist covers password / token / secret / bearer / cookie / session families. To add project-specific field names without monkey-patching:
from redactkit import extend_key_pattern, redact_args
my_pattern = extend_key_pattern([r"vendor_passcode", r"internal_id"])
redact_args({"vendor_passcode": "x", "user": "alice"}, key_pattern=my_pattern)
# → {"vendor_passcode": "***", "user": "alice"}
extend_key_pattern returns a new compiled regex; SENSITIVE_KEY_RE is never
mutated (Open/Closed principle).
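The non-mutating extension can be pictured like this. It is a sketch under assumptions, not the library's code: SKETCH_BASE_TERMS, SKETCH_BASE_RE, and sketch_extend are hypothetical names, and the base term list is deliberately tiny.

```python
import re
from typing import Iterable, Tuple

# Hypothetical base terms; redactkit's DEFAULT_KEY_TERMS is larger.
SKETCH_BASE_TERMS: Tuple[str, ...] = ("password", "secret", "token")
SKETCH_BASE_RE = re.compile("|".join(SKETCH_BASE_TERMS), re.IGNORECASE)


def sketch_extend(extra_terms: Iterable[str]) -> re.Pattern:
    """Compile a new pattern from the base terms plus extra regex fragments."""
    # A fresh pattern is built on every call; the base pattern is left alone.
    return re.compile("|".join((*SKETCH_BASE_TERMS, *extra_terms)), re.IGNORECASE)


extended = sketch_extend([r"vendor_passcode", r"internal_id"])
```

Since the caller holds the extended pattern and passes it explicitly, two parts of one process can use different denylists without interfering with each other.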
FAQ
Does it handle free-form text PII (names, addresses)?
No. Use Presidio or scrubadub for that. redactkit's strength is structured data and known-schema secrets.
Does it handle nested dicts and lists?
Yes. redact_args walks recursively. Tuples preserve their type.
Can I add my own sensitive key names?
Yes — use extend_key_pattern([r"my_term"]) and pass the result via the
key_pattern= argument. No module-level state is mutated.
Is it thread-safe?
Yes. All redaction functions are pure (no global mutation, no I/O) and the module-level regexes are immutable compiled patterns.
Is redact_args non-mutating?
Yes. It always returns a new object — the input dict / list / tuple is never
modified in place.
Will redaction slow down my hot path?
The dominant cost is the regex .search per dict key. For typical agent payloads
(< 100 keys, < 10 KB serialized) the overhead is sub-millisecond.
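To check the per-key cost on your own hardware, a rough stdlib timing loop is enough. The sketch below measures only a regex .search over 100 keys with a hypothetical three-term pattern (SKETCH_KEY_RE); it does not call redactkit, and the result varies by machine, so no expected figure is given.

```python
import re
import time
from typing import List

# Hypothetical pattern standing in for a key denylist.
SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)


def sketch_time_key_scan(keys: List[str], repeats: int = 200) -> float:
    """Average seconds for one pass of .search over every key."""
    start = time.perf_counter()
    for _ in range(repeats):
        for key in keys:
            SKETCH_KEY_RE.search(key)
    return (time.perf_counter() - start) / repeats


keys = [f"field_{i}" for i in range(99)] + ["api_token"]
per_pass = sketch_time_key_scan(keys)
print(f"{per_pass * 1e6:.1f} microseconds per 100-key pass")
```

Swap in the real pattern and a representative payload before drawing conclusions about a production hot path.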
Contributing
Issues and PRs welcome. Please add tests for any behavior change and keep the zero-runtime-dependency invariant.
Production users
Used in production by Convilyn. Open a PR adding your project here once you've shipped redactkit to prod.
License
MIT. Copyright © 2026 CoreNovus.