Production-hardened Python redaction for structured logs, LLM agent payloads, and AWS-signed URLs.

redactkit

Production-hardened Python redaction for structured logs, LLM agent payloads, and AWS-signed URLs. Zero dependencies — pure stdlib. Built for dict[str, Any], not for free-form prose.

Status: Beta (0.1.x) · Pre-1.0 — the API may evolve based on early-user feedback before v1.0. Pin to redactkit~=0.1.1 if you want patch-level updates only.

from redactkit import redact_args

redact_args({"password": "hunter2", "user": "alice"})
# → {"password": "***", "user": "alice"}

Install

pip install redactkit

Requires Python ≥ 3.10. No runtime dependencies.

Why redactkit?

  1. Structured-data first. Recursive dict / list / tuple traversal out of the box. Drop into any code that already speaks Mapping[str, Any].
  2. AWS SigV4 redaction. Scrubs X-Amz-Signature, X-Amz-Credential, X-Amz-Security-Token from presigned URLs — a real production gap most redaction libraries ignore.
  3. Overflow splitting. summarize_payload returns a short wire-safe summary plus an optional full body, so a 5 MB tool result doesn't blow your log budget.
  4. Comprehensive + extensible denylist. Passwords, tokens, JWT, OAuth, bearer, credentials, cookies, sessions covered out of the box. Add your own via extend_key_pattern — no module-state mutation, no monkey-patching.
  5. Zero deps, production provenance. Pure stdlib (Python ≥ 3.10). Extracted from the Convilyn agent platform — used in LLM agent middleware, supervisor handoffs, event emission, and HTTP response redaction.

When to pick redactkit vs. alternatives

| Tool | Approach | Best at | Deps | Pick when… |
| --- | --- | --- | --- | --- |
| Microsoft Presidio | ML-based NER (spaCy / transformers) | Free-form text PII (names, addresses) | Heavy (~1 GB models) | Document / chat PII detection in regulated industries |
| scrubadub | NLP rules + named recognizers | Free-form text (emails, phones, names) | nltk, textblob | Scrubbing user-generated prose |
| Hand-rolled logging.Filter | Custom filter per team | Logging-specific | None | Reinventing the wheel; AWS SigV4 never covered |
| redactkit | Key denylist + AWS SigV4 + overflow splitting | Structured data: dicts, JSON, kwargs, OTel attrs, request bodies | None | LLM agent logs, API request/response logs, anywhere dict[str, Any] is the unit |

Non-goals. redactkit does not do free-form NLP PII detection, cryptographic anonymization, or database-column encryption. Reach for Presidio or scrubadub for those.

Three killer examples

1. Structured dict redaction

from redactkit import redact_args

payload = {
    "account": {"bearer_token": "xyz", "user": "alice"},
    "items": [{"secret": "s"}, {"name": "plain"}],
}
redact_args(payload)
# → {
#     "account": {"bearer_token": "***", "user": "alice"},
#     "items": [{"secret": "***"}, {"name": "plain"}],
# }

Case-insensitive, recursive, and non-mutating. Covers nested dicts, list-of-dicts, and tuples (preserving type).
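
To make the traversal concrete, here is a minimal stdlib sketch of a recursive, non-mutating key-denylist walk. It is not redactkit's implementation: `redact_sketch`, `SKETCH_KEY_RE`, and `SKETCH_MASK` are hypothetical names, and the three-term denylist is an assumption chosen only for illustration.

```python
import re
from typing import Any

# Assumed denylist for illustration only; not redactkit's actual pattern.
SKETCH_KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
SKETCH_MASK = "***"


def redact_sketch(value: Any) -> Any:
    """Return a redacted copy of value; the input is never modified."""
    if isinstance(value, dict):
        return {
            key: SKETCH_MASK
            if isinstance(key, str) and SKETCH_KEY_RE.search(key)
            else redact_sketch(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_sketch(item) for item in value]
    if isinstance(value, tuple):
        # Rebuild as a tuple so the container type is preserved.
        return tuple(redact_sketch(item) for item in value)
    return value


original = {"Account": {"Bearer_Token": "xyz"}, "pairs": ({"SECRET": "s"}, 1)}
redacted = redact_sketch(original)
print(redacted)
# → {'Account': {'Bearer_Token': '***'}, 'pairs': ({'SECRET': '***'}, 1)}
```

Because every container is rebuilt rather than edited, `original` still holds its secrets afterwards. That is the property "non-mutating" refers to.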

2. AWS presigned URL scrubbing

from redactkit import redact_args

redact_args({
    "url": "https://s3.example.com/x?X-Amz-Signature=abcdef123&foo=1",
})
# → {"url": "https://s3.example.com/x?X-Amz-Signature=***&foo=1"}

Strings inside payloads are scanned for AWS Signature V4 query parameters, and the matching values are masked in the returned copy. Most logging filters miss this; redactkit doesn't.
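
For readers curious how such a scan might work, the following is a rough stdlib sketch. The pattern `SKETCH_PRESIGNED_RE` and the helper `scrub_presigned` are hypothetical; redactkit's own `PRESIGNED_QUERY_RE` may match differently.

```python
import re

# Assumed pattern for illustration; redactkit's PRESIGNED_QUERY_RE may differ.
SKETCH_PRESIGNED_RE = re.compile(
    r"(X-Amz-(?:Signature|Credential|Security-Token)=)[^&#\s]+",
    re.IGNORECASE,
)


def scrub_presigned(text: str) -> str:
    """Mask SigV4 query values, keeping parameter names and other params."""
    return SKETCH_PRESIGNED_RE.sub(r"\g<1>***", text)


url = (
    "https://s3.example.com/x?X-Amz-Credential=AKIAEXAMPLE%2F20240101"
    "&X-Amz-Signature=abcdef123&foo=1"
)
scrubbed = scrub_presigned(url)
print(scrubbed)
# → https://s3.example.com/x?X-Amz-Credential=***&X-Amz-Signature=***&foo=1
```

Keeping the parameter name and masking only the value leaves the URL shape readable in logs while removing the parts that grant access.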

3. Overflow splitting for big payloads

from redactkit import summarize_payload, output_digest

big_tool_result = {"items": [...]}  # 5 MB
summary, overflow = summarize_payload(big_tool_result, max_bytes=2048)

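# Attach the short summary to your span / log event.
# `span` (an OpenTelemetry-style span) and `s3` (a boto3 S3 client) are
# assumed to be created elsewhere in your application.
span.set_attribute("tool.output_summary", summary)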

# Persist the full body to S3 if it overflowed:
if overflow is not None:
    digest = output_digest(overflow)
    s3.put_object(Bucket="logs", Key=f"overflow/{digest}", Body=overflow)
    span.set_attribute("tool.output_ref", f"s3://logs/overflow/{digest}")

Truncation is UTF-8 boundary safe: the summary never ends in a partial character, so there is no mojibake even if the byte cut lands mid-character.
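
The sketch below illustrates one way to get boundary-safe truncation and a short content digest with the stdlib alone. It is an assumption-laden illustration, not redactkit's code: `truncate_utf8`, `split_sketch`, and `digest_sketch` are hypothetical names, and JSON serialization with a `bytes` overflow body is an assumed design.

```python
from __future__ import annotations

import hashlib
import json


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete trailing byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def split_sketch(payload: object, max_bytes: int = 2048) -> tuple[str, bytes | None]:
    """Return (summary, overflow); overflow is None when the payload fits."""
    body = json.dumps(payload, ensure_ascii=False, default=str)
    encoded = body.encode("utf-8")
    if len(encoded) <= max_bytes:
        return body, None
    return truncate_utf8(body, max_bytes), encoded


def digest_sketch(body: bytes) -> str:
    """First 16 hex characters of the SHA-256 digest of body."""
    return hashlib.sha256(body).hexdigest()[:16]


summary, overflow = split_sketch({"note": "é" * 10}, max_bytes=15)
print(summary)
# → {"note": "éé
```

Here the 15-byte cut falls in the middle of the third "é" (two bytes in UTF-8), so the sketch drops the dangling byte and the summary ends on a whole character.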

Public API

| Symbol | Kind | Purpose |
| --- | --- | --- |
| redact_args(payload, *, key_pattern=SENSITIVE_KEY_RE) | function | Deep-redact dicts/lists/strings |
| redact_text(value, *, key_pattern=SENSITIVE_KEY_RE) | function | Scrub key=value patterns + presigned URLs from free-form text |
| redact_url_query(url) | function | Redact sensitive values in URL query strings |
| summarize_payload(payload, *, max_bytes=2048, already_redacted=False) | function | Wire-safe truncation; returns (summary, overflow_body) |
| output_digest(body) | function | 16-hex SHA-256 prefix; content-addressable overflow reference |
| extend_key_pattern(extra_terms) | function | Open/Closed denylist extension without module-state mutation |
| OutboundErrorRedactor(pattern) | class | Mask caller-supplied internal terms in error/log payloads |
| OutboundErrorRedactor.from_terms(terms) | classmethod | Compile a term list into a redactor |
| SENSITIVE_KEY_RE, SENSITIVE_KV_TEXT_RE, PRESIGNED_QUERY_RE | regex | Public patterns for direct use |
| DEFAULT_KEY_TERMS | tuple[str, ...] | Raw fragments backing SENSITIVE_KEY_RE |
| MASK | str | The redaction placeholder ("***") |
| MAX_SUMMARY_BYTES | int | Default cap for summarize_payload (2048) |
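
As an illustration of what "scrub key=value patterns" can mean for free-form text, here is a small stdlib sketch. `SKETCH_KV_RE` and `scrub_kv` are hypothetical; the real `SENSITIVE_KV_TEXT_RE` and `redact_text` may recognize different keys and separators.

```python
import re

# Assumed pattern for illustration; not redactkit's SENSITIVE_KV_TEXT_RE.
SKETCH_KV_RE = re.compile(
    r"((?:password|secret|token)\s*[=:]\s*)\S+",
    re.IGNORECASE,
)


def scrub_kv(text: str) -> str:
    """Mask the value that follows a sensitive key in free-form text."""
    return SKETCH_KV_RE.sub(r"\g<1>***", text)


line = "login failed: password=hunter2 user=alice"
print(scrub_kv(line))
# → login failed: password=*** user=alice
```

A pattern this simple stops at the first whitespace, so quoted values containing spaces would need extra handling.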

Extending the denylist

The default denylist covers password / token / secret / bearer / cookie / session families. To add project-specific field names without monkey-patching:

from redactkit import extend_key_pattern, redact_args

my_pattern = extend_key_pattern([r"vendor_passcode", r"internal_id"])
redact_args({"vendor_passcode": "x", "user": "alice"}, key_pattern=my_pattern)
# → {"vendor_passcode": "***", "user": "alice"}

extend_key_pattern returns a new compiled regex; SENSITIVE_KEY_RE is never mutated (Open/Closed principle).
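
The "new pattern, nothing mutated" idea can be sketched in a few lines of stdlib Python. This is an assumption about the general approach, not the library's source: `BASE_TERMS`, `BASE_RE`, and `extend_sketch` are hypothetical stand-ins for `DEFAULT_KEY_TERMS`, `SENSITIVE_KEY_RE`, and `extend_key_pattern`.

```python
from __future__ import annotations

import re

# Assumed base terms for illustration; redactkit's DEFAULT_KEY_TERMS may differ.
BASE_TERMS: tuple[str, ...] = ("password", "secret", "token")
BASE_RE = re.compile("|".join(BASE_TERMS), re.IGNORECASE)


def extend_sketch(extra_terms: list[str]) -> re.Pattern[str]:
    """Build a new pattern from base + extra terms; BASE_RE is left untouched."""
    return re.compile("|".join((*BASE_TERMS, *extra_terms)), re.IGNORECASE)


extended = extend_sketch([r"vendor_passcode"])
print(bool(BASE_RE.search("vendor_passcode")))   # → False
print(bool(extended.search("Vendor_Passcode")))  # → True
```

Because the base tuple and compiled pattern are never reassigned, two callers can hold different extended patterns without affecting each other.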

FAQ

Does it handle free-form text PII (names, addresses)?

No. Use Presidio or scrubadub for that. redactkit's strength is structured data and known-schema secrets.

Does it handle nested dicts and lists?

Yes. redact_args walks recursively. Tuples preserve their type.

Can I add my own sensitive key names?

Yes — use extend_key_pattern([r"my_term"]) and pass the result via the key_pattern= argument. No module-level state is mutated.

Is it thread-safe?

Yes. All redaction functions are pure (no global mutation, no I/O) and the module-level regexes are immutable compiled patterns.

Is redact_args non-mutating?

Yes. It always returns a new object — the input dict / list / tuple is never modified in place.

Will redaction slow down my hot path?

The dominant cost is the regex .search per dict key. For typical agent payloads (< 100 keys, < 10 KB serialized) the overhead is sub-millisecond.
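
If you want to check the cost on your own hardware, a rough timing sketch like the one below measures a regex `.search` over 100 keys. It uses an assumed three-term pattern rather than redactkit's `SENSITIVE_KEY_RE`, so treat the number it prints as indicative only.

```python
import re
import timeit

# Assumed pattern for illustration; swap in the pattern you actually use.
KEY_RE = re.compile(r"password|secret|token", re.IGNORECASE)
keys = [f"field_{i}" for i in range(99)] + ["api_token"]


def scan_keys() -> int:
    """Count keys that match the denylist, one .search call per key."""
    return sum(1 for key in keys if KEY_RE.search(key))


runs = 1000
seconds = timeit.timeit(scan_keys, number=runs)
per_call_us = seconds / runs * 1e6
print(f"{per_call_us:.1f} µs per 100-key scan")
```

Results vary by machine and Python version, so measure in the environment where the redaction will run.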

Contributing

Issues and PRs welcome. See CONTRIBUTING.md for the dev setup, what we do and don't accept, and the PR checklist. Conduct expectations are in CODE_OF_CONDUCT.md.

Security

Found a redaction bypass or ReDoS pattern? Please don't open a public issue. See SECURITY.md for the private-advisory process and our 90-day coordinated disclosure window.

Production users

Used in production by Convilyn. Open a PR adding your project here once you've shipped redactkit to prod.

License

MIT.
