Skip to main content

Scrub personal info, secrets, and API keys from Claude Code transcripts before publishing

Project description

🧼 claude-code-scrubber🫧

Scrub API keys, secrets, and personal information from Claude Code transcripts before publishing them to public repos.

Inspired by Simon Willison's claude-code-transcripts — this tool sits between your raw session data and your public GitHub Pages, giving you peace of mind that you're not leaking credentials.

What it catches

Severity What's detected
🔴 High Anthropic, OpenAI, GitHub, AWS, Google, Slack, Stripe, HuggingFace, SendGrid, Twilio, npm, PyPI, Vercel, Supabase, Netlify API keys & tokens. Bearer/Basic auth headers. JWT tokens. SSH private keys. Database connection strings. Generic SECRET/TOKEN/PASSWORD assignments.
🟡 Medium Email addresses. Private IP addresses (10.x, 192.168.x, 172.16-31.x).
🔵 Low OS usernames in file paths (/Users/you/, /home/you/). Encoded Claude project paths. Shell prompts with user@host.local.

Supported formats

  • JSONL — Local Claude Code sessions from ~/.claude/projects/
  • JSON — Claude Code for web session exports
  • HTML — Output from claude-code-transcripts (preserves HTML structure)

Install

# With pip
pip install claude-code-scrubber

# With uv (no install needed)
uvx claude-code-scrubber scan session.jsonl

# From source
git clone https://github.com/yanndebray/claude-code-scrubber
cd claude-code-scrubber
pip install -e .

Quick start

# 1. Scan first (dry-run) — see what would be scrubbed
claude-code-scrubber scan my-session.jsonl -u $(whoami)

# 2. Scrub and write clean output
claude-code-scrubber scrub my-session.jsonl -u $(whoami)
# → writes my-session.scrubbed.jsonl

# 3. Scrub HTML transcripts into an output directory
claude-code-scrubber scrub transcripts/*.html -u $(whoami) -o clean/

Usage

scan — Dry-run detection

# Scan with verbose output showing every match
claude-code-scrubber scan session.jsonl -u myuser -v

# Scan only high-severity items
claude-code-scrubber scan session.jsonl -s high

# Scan multiple files
claude-code-scrubber scan *.jsonl *.html

Example output:

📄 Scanning session.jsonl ...
  🔴 [HIGH] Anthropic API key  at line 1.message.content  → sk-ant-***…
  🔴 [HIGH] AWS access key     at line 3.message.content  → AKIAIOSFO…
  🟡 [MEDIUM] Email address    at line 3.message.content  → yann.dup…
  🔵 [LOW] Home directory path  at line 2.message.content  → /Users/yan…

🧼 Found 23 item(s) to scrub across 1 file(s):
  🔴 High:   19  (API keys, tokens, passwords)
  🟡 Medium: 2   (emails, private IPs)
  🔵 Low:    2   (usernames in paths)

The scan command exits with code 1 if findings exist — useful in CI.

scrub — Redact and write clean files

# Write to a suffixed file (default: .scrubbed)
claude-code-scrubber scrub session.jsonl -u myuser
# → session.scrubbed.jsonl

# Write to an output directory
claude-code-scrubber scrub session.jsonl -o clean/

# Overwrite originals (careful!)
claude-code-scrubber scrub session.jsonl --in-place

# Custom suffix
claude-code-scrubber scrub session.jsonl --suffix .clean

init — Create a config file

claude-code-scrubber init           # creates .claude-code-scrubber.json
claude-code-scrubber init -f toml   # creates .claude-code-scrubber.toml

Configuration

Create a .claude-code-scrubber.json (or .toml) in your project root:

{
    "username": "yann",
    "severity": ["high", "medium", "low"],
    "output_suffix": ".scrubbed",
    "allowlist": [
        "sk-ant-this-is-a-dummy-key-for-docs"
    ],
    "string_replacements": {
        "MyCompanyName": "ACME",
        "my-secret-project": "project-x"
    },
    "patterns": [
        {
            "name": "Internal ticket ID",
            "regex": "PROJ-[0-9]{4,}",
            "replacement": "PROJ-XXXX",
            "severity": "medium"
        }
    ]
}
Field Description
username Your OS username, used to redact paths like /Users/you/
severity Which levels to scrub. Default: all three.
allowlist Strings that should never be redacted (false positive prevention)
string_replacements Exact string → replacement pairs, applied after regex
patterns Extra regex patterns with name, regex, replacement, severity
output_suffix Suffix for output files. Default: .scrubbed

Config files are auto-discovered by walking up from the current directory.

Workflow: Claude Code → Public GitHub

# 1. Generate HTML transcripts with Simon's tool
claude-code-transcripts local -o transcripts/

# 2. Scrub them
claude-code-scrubber scrub transcripts/*.html -u $(whoami) -o docs/

# 3. Push to GitHub Pages
git add docs/
git commit -m "Add scrubbed transcripts"
git push

CI gate (GitHub Actions)

- name: Check transcripts for secrets
  run: |
    pip install claude-code-scrubber
    claude-code-scrubber scan docs/**/*.html -u runner -s high

How it works

  1. Pattern matching — A curated set of 30+ regex patterns detect common API key formats, PII, and secrets
  2. Format-aware parsing — JSONL/JSON files are parsed and scrubbed recursively at the value level (preserving valid JSON structure). HTML is split on tags to avoid breaking markup.
  3. Layered severity — You control what gets scrubbed. Need to keep email addresses but redact API keys? Use -s high
  4. Allowlisting — Known-safe strings (like dummy keys in docs) are never touched

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claude_code_scrubber-0.1.0.tar.gz (17.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

claude_code_scrubber-0.1.0-py3-none-any.whl (17.3 kB view details)

Uploaded Python 3

File details

Details for the file claude_code_scrubber-0.1.0.tar.gz.

File metadata

  • Download URL: claude_code_scrubber-0.1.0.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for claude_code_scrubber-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7cf100a4becd78aada72198b6a3c8f979190b5ee8fa845d2df4e95dbca39e911
MD5 95e48de674aa2c70fcfefc568c4599e2
BLAKE2b-256 f9b6d3fb0837b9cc94f00c342e0414e97bd6a56aee126365c1d0b4d536fa0306

See more details on using hashes here.

File details

Details for the file claude_code_scrubber-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: claude_code_scrubber-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for claude_code_scrubber-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7ff21a90826d54bdcf6b52657751cc5b56ca46f158940364ca6abc15a9dc1257
MD5 eb958977e23924febbabd44d4b6b9e85
BLAKE2b-256 dda5947f41ac375524f56ae9cac69528035707739662b3baccb88cdfa797d2ce

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page