redactkit

Production-hardened Python redaction for structured logs, LLM agent payloads, and AWS-signed URLs. Zero dependencies — pure stdlib. Built for dict[str, Any], not for free-form prose.

from redactkit import redact_args

redact_args({"password": "hunter2", "user": "alice"})
# → {"password": "***", "user": "alice"}

Install

pip install redactkit

Requires Python ≥ 3.10. No runtime dependencies.

Why redactkit?

  1. Structured-data first. Recursive dict / list / tuple traversal out of the box. Drop into any code that already speaks Mapping[str, Any].
  2. AWS SigV4 redaction. Scrubs X-Amz-Signature, X-Amz-Credential, X-Amz-Security-Token from presigned URLs — a real production gap most redaction libraries ignore.
  3. Overflow splitting. summarize_payload returns a short wire-safe summary plus an optional full body, so a 5 MB tool result doesn't blow your log budget.
  4. Comprehensive default denylist. Passwords, tokens, JWT, OAuth, bearer, credentials, cookies, sessions — covered. Extensible without mutating module state.
  5. Zero dependencies. Drop into any Python ≥ 3.10 project, no transitive bloat.
  6. Production-hardened. Extracted from the Convilyn agent platform; battle-tested in LLM agent middleware, supervisor handoffs, event emission, and HTTP response redaction.

When to pick redactkit vs. alternatives

| Tool | Approach | Best at | Deps | Pick when… |
| --- | --- | --- | --- | --- |
| Microsoft Presidio | ML-based NER (spaCy / transformers) | Free-form text PII (names, addresses) | Heavy (~1 GB models) | Document / chat PII detection in regulated industries |
| scrubadub | NLP rules + named recognizers | Free-form text (emails, phones, names) | nltk, textblob | Scrubbing user-generated prose |
| Hand-rolled logging.Filter | Custom filter per team | Logging-specific | None | Reinventing the wheel; AWS SigV4 never covered |
| redactkit | Key denylist + AWS SigV4 + overflow splitting | Structured data: dicts, JSON, kwargs, OTel attrs, request bodies | None | LLM agent logs, API request/response logs, anywhere dict[str, Any] is the unit |

Non-goals. redactkit does not do free-form NLP PII detection, cryptographic anonymization, or database-column encryption. Reach for Presidio or scrubadub for those.

Three killer examples

1. Structured dict redaction

from redactkit import redact_args

payload = {
    "account": {"bearer_token": "xyz", "user": "alice"},
    "items": [{"secret": "s"}, {"name": "plain"}],
}
redact_args(payload)
# → {
#     "account": {"bearer_token": "***", "user": "alice"},
#     "items": [{"secret": "***"}, {"name": "plain"}],
# }

Case-insensitive, recursive, and non-mutating. Covers nested dicts, list-of-dicts, and tuples (preserving type).
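
To make those properties concrete, here is a minimal stdlib-only sketch of a recursive, non-mutating walker. It is not redactkit's implementation: `sketch_redact`, `SKETCH_KEY_RE`, and `SKETCH_MASK` are hypothetical names, and the three-term pattern is an assumption that is far smaller than the real denylist.

```python
import re
from typing import Any

# Hypothetical stand-ins; redactkit's real SENSITIVE_KEY_RE and MASK may differ.
SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
SKETCH_MASK = "***"


def sketch_redact(value: Any, key_re: re.Pattern[str] = SKETCH_KEY_RE) -> Any:
    """Return a redacted copy of value; the input is never modified."""
    if isinstance(value, dict):
        # Mask the value when the key matches, otherwise recurse into it.
        return {
            k: (
                SKETCH_MASK
                if isinstance(k, str) and key_re.search(k)
                else sketch_redact(v, key_re)
            )
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sketch_redact(v, key_re) for v in value]
    if isinstance(value, tuple):
        # Rebuild as a tuple so the container type is preserved.
        return tuple(sketch_redact(v, key_re) for v in value)
    return value


original = {
    "Account": {"Bearer_Token": "xyz", "user": "alice"},
    "pair": ({"SECRET": "s"}, 1),
}
redacted = sketch_redact(original)
```

Because every container is rebuilt rather than edited, `original` still holds its secrets after the call, which is what "non-mutating" means here.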

2. AWS presigned URL scrubbing

from redactkit import redact_args

redact_args({
    "url": "https://s3.example.com/x?X-Amz-Signature=abcdef123&foo=1",
})
# → {"url": "https://s3.example.com/x?X-Amz-Signature=***&foo=1"}

Strings inside payloads get scanned for AWS Signature V4 query fragments and scrubbed in place. Most logging filters miss this — redactkit doesn't.
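
As a rough illustration of that scan, the sketch below masks the three SigV4 query parameters named earlier with a single regex. The pattern and the `sketch_scrub_presigned` helper are assumptions written for this example; the library's own `PRESIGNED_QUERY_RE` may be defined differently.

```python
import re

# Hypothetical pattern; redactkit's PRESIGNED_QUERY_RE may be defined differently.
SKETCH_PRESIGNED_RE = re.compile(
    r"(X-Amz-(?:Signature|Credential|Security-Token)=)[^&\s#]+",
    re.IGNORECASE,
)


def sketch_scrub_presigned(text: str, mask: str = "***") -> str:
    """Replace the value of each SigV4 query parameter, keeping its name."""
    return SKETCH_PRESIGNED_RE.sub(lambda m: m.group(1) + mask, text)


url = (
    "https://s3.example.com/x"
    "?X-Amz-Credential=AKIA%2F20260101&X-Amz-Signature=abcdef123&foo=1"
)
scrubbed = sketch_scrub_presigned(url)
# scrubbed == "https://s3.example.com/x?X-Amz-Credential=***&X-Amz-Signature=***&foo=1"
```

Keeping the parameter name and masking only the value leaves the log line readable while removing the parts that grant access.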

3. Overflow splitting for big payloads

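from redactkit import summarize_payload, output_digest

# `span` (e.g. an OpenTelemetry span) and `s3` (e.g. a boto3 S3 client)
# are assumed to be set up elsewhere.
big_tool_result = {"items": [...]}  # 5 MB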
summary, overflow = summarize_payload(big_tool_result, max_bytes=2048)

# Attach the short summary to your span / log event:
span.set_attribute("tool.output_summary", summary)

# Persist the full body to S3 if it overflowed:
if overflow is not None:
    digest = output_digest(overflow)
    s3.put_object(Bucket="logs", Key=f"overflow/{digest}", Body=overflow)
    span.set_attribute("tool.output_ref", f"s3://logs/overflow/{digest}")

UTF-8 boundary safe: the summary stays valid UTF-8 (no mojibake) even if the byte limit falls in the middle of a multi-byte character.
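
One common way to get a boundary-safe cut with only the standard library is to slice the encoded bytes and decode with `errors="ignore"`, which drops a trailing partial character. The sketch below shows that technique alongside a 16-hex SHA-256 prefix; `sketch_truncate_utf8` and `sketch_digest` are hypothetical helpers, and redactkit may implement both differently.

```python
import hashlib


def sketch_truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # The slice may end inside a multi-byte character; errors="ignore"
    # drops that incomplete trailing sequence instead of raising.
    return raw[:max_bytes].decode("utf-8", errors="ignore")


def sketch_digest(body: bytes) -> str:
    """First 16 hex characters of the SHA-256 digest of body."""
    return hashlib.sha256(body).hexdigest()[:16]


short = sketch_truncate_utf8("héllo", 2)  # "é" is two bytes, so the cut lands mid-character
ref = sketch_digest("héllo".encode("utf-8"))
```

Since the bytes come from a valid encode, the only invalid sequence the decoder can meet is the one at the cut, so nothing else is silently discarded.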

Public API

| Symbol | Kind | Purpose |
| --- | --- | --- |
| redact_args(payload, *, key_pattern=SENSITIVE_KEY_RE) | function | Deep-redact dicts/lists/strings |
| redact_text(value, *, key_pattern=SENSITIVE_KEY_RE) | function | Scrub key=value patterns + presigned URLs from free-form text |
| redact_url_query(url) | function | Redact sensitive values in URL query strings |
| summarize_payload(payload, *, max_bytes=2048, already_redacted=False) | function | Wire-safe truncation; returns (summary, overflow_body) |
| output_digest(body) | function | 16-hex SHA-256 prefix; content-addressable overflow reference |
| extend_key_pattern(extra_terms) | function | Open/Closed denylist extension without module-state mutation |
| OutboundErrorRedactor(pattern) | class | Mask caller-supplied internal terms in error/log payloads |
| OutboundErrorRedactor.from_terms(terms) | classmethod | Compile a term list into a redactor |
| SENSITIVE_KEY_RE, SENSITIVE_KV_TEXT_RE, PRESIGNED_QUERY_RE | regex | Public patterns for direct use |
| DEFAULT_KEY_TERMS | tuple[str, ...] | Raw fragments backing SENSITIVE_KEY_RE |
| MASK | str | The redaction placeholder ("***") |
| MAX_SUMMARY_BYTES | int | Default cap for summarize_payload (2048) |
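
The key=value scrubbing listed for redact_text can be pictured with a small regex substitution. This is only a sketch under assumptions: `SKETCH_KV_RE` and `sketch_redact_text` are hypothetical, and the real `SENSITIVE_KV_TEXT_RE` likely recognizes more keys and separators.

```python
import re

# Hypothetical pattern; redactkit's SENSITIVE_KV_TEXT_RE may be defined differently.
SKETCH_KV_RE = re.compile(
    r"\b((?:password|secret|token)\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def sketch_redact_text(text: str, mask: str = "***") -> str:
    """Mask the value in `key=value` or `key: value` pairs with sensitive keys."""
    return SKETCH_KV_RE.sub(lambda m: m.group(1) + mask, text)


line = "login failed user=alice password=hunter2 Token: abc123"
clean = sketch_redact_text(line)
# clean == "login failed user=alice password=*** Token: ***"
```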

Extending the denylist

The default denylist covers password / token / secret / bearer / cookie / session families. To add project-specific field names without monkey-patching:

from redactkit import extend_key_pattern, redact_args

my_pattern = extend_key_pattern([r"vendor_passcode", r"internal_id"])
redact_args({"vendor_passcode": "x", "user": "alice"}, key_pattern=my_pattern)
# → {"vendor_passcode": "***", "user": "alice"}

extend_key_pattern returns a new compiled regex; SENSITIVE_KEY_RE is never mutated (Open/Closed principle).
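
The "new pattern, untouched default" behavior can be sketched in a few lines of stdlib code. The names `SKETCH_DEFAULT_TERMS`, `SKETCH_DEFAULT_RE`, and `sketch_extend` are hypothetical and the default term list is an assumption; the point is only that extension builds a fresh compiled regex instead of editing shared state.

```python
import re
from collections.abc import Iterable

# Hypothetical defaults; redactkit's DEFAULT_KEY_TERMS is larger.
SKETCH_DEFAULT_TERMS: tuple[str, ...] = ("password", "secret", "token")
SKETCH_DEFAULT_RE = re.compile("|".join(SKETCH_DEFAULT_TERMS), re.IGNORECASE)


def sketch_extend(extra_terms: Iterable[str]) -> re.Pattern[str]:
    """Compile a new pattern from the defaults plus extra regex fragments."""
    terms = (*SKETCH_DEFAULT_TERMS, *extra_terms)
    # Wrap each fragment in a non-capturing group so alternation inside
    # a caller-supplied fragment cannot leak into its neighbours.
    return re.compile("|".join(f"(?:{t})" for t in terms), re.IGNORECASE)


extended = sketch_extend([r"vendor_passcode", r"internal_id"])
```

Callers that never pass the extended pattern keep the default behavior, so two parts of one process can use different denylists without interfering with each other.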

FAQ

Does it handle free-form text PII (names, addresses)? No. Use Presidio or scrubadub for that. redactkit's strength is structured data and known-schema secrets.

Does it handle nested dicts and lists? Yes. redact_args walks recursively. Tuples preserve their type.

Can I add my own sensitive key names? Yes — use extend_key_pattern([r"my_term"]) and pass the result via the key_pattern= argument. No module-level state is mutated.

Is it thread-safe? Yes. All redaction functions are pure (no global mutation, no I/O) and the module-level regexes are immutable compiled patterns.

Is redact_args non-mutating? Yes. It always returns a new object — the input dict / list / tuple is never modified in place.

Will redaction slow down my hot path? The dominant cost is the regex .search per dict key. For typical agent payloads (< 100 keys, < 10 KB serialized) the overhead is sub-millisecond.
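
If you want a number for your own payload shape rather than a rule of thumb, a quick `timeit` run over the key scan is enough. The sketch below times only the per-key regex search on 100 synthetic keys with an assumed three-term pattern; it does not call redactkit, and results will vary by machine and pattern size.

```python
import re
import timeit

# Assumed stand-in pattern and synthetic keys, not redactkit's defaults.
key_re = re.compile(r"password|secret|token", re.IGNORECASE)
keys = [f"field_{i}" for i in range(99)] + ["api_token"]


def scan_keys() -> int:
    """Count keys that the pattern flags as sensitive."""
    return sum(1 for k in keys if key_re.search(k))


runs = 1000
per_call = timeit.timeit(scan_keys, number=runs) / runs
print(f"{per_call * 1e6:.1f} us per 100-key scan")
```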

Contributing

Issues and PRs welcome. Please add tests for any behavior change and keep the zero-runtime-dependency invariant.

Production users

Used in production by Convilyn. Open a PR adding your project here once you've shipped redactkit to prod.

License

MIT. Copyright © 2026 CoreNovus.
