Drop-in LLM secret scrubber to prevent API key and credential leaks

scrub-llm

A lightweight, drop-in LLM secret scrubber to prevent API key and credential leaks in your AI applications.

Features

Drop-in wrapper for OpenAI/httpx - no code rewrite required
Bidirectional redaction - scrubs secrets before requests and after responses
30+ built-in patterns - AWS, GCP, GitHub, Slack, JWT tokens, and more
Entropy detection - catches high-entropy strings that look like secrets
Placeholder system - replaces secrets in prompts with placeholders and keeps a mapping to the original values
Zero-copy streaming - works with stream=True responses
CLI tool - scrub logs and files from the command line

Installation

pip install scrub-llm

Quick Start

OpenAI Integration

from scrub_llm import OpenAIScrubber
import openai

# Wrap your OpenAI client
client = openai.OpenAI(api_key="your-key")
scrubbed_client = OpenAIScrubber(client)

# Use normally - secrets are automatically redacted
response = scrubbed_client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "My AWS key is AKIAIOSFODNN7EXAMPLE"  # ← Automatically redacted
    }]
)

# Response secrets are also redacted
print(response.choices[0].message.content)
# "Your AWS key <REDACTED_AWS_ACCESS_KEY_ID> has been hidden"

Direct Usage

from scrub_llm import Scrubber

scrubber = Scrubber()

# Scrub prompts (with placeholder mapping)
text = "My GitHub token is ghp_1234567890abcdefghijklmnopqrstuvwxyz"
clean_text, mappings = scrubber.scrub_prompt(text)
print(clean_text)  # "My GitHub token is <SECRET_1>"

# Scrub responses (one-way redaction)  
response = "Generated API key: sk-proj-abc123xyz789"
clean_response = scrubber.scrub_response(response)
print(clean_response)  # "Generated API key: <REDACTED_OPENAI_API_KEY>"
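
To illustrate the difference between the two modes, here is a minimal standalone sketch that uses only the standard library. It is not scrub-llm's implementation: the two patterns and the helper names `scrub_prompt_sketch` and `scrub_response_sketch` are assumptions made for illustration.

```python
import re

# Illustrative patterns only; the library's real patterns may differ.
PATTERNS = {
    "GITHUB_TOKEN": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "AWS_ACCESS_KEY_ID": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def scrub_prompt_sketch(text):
    """Replace each match with a numbered placeholder and record the mapping."""
    mapping = {}

    def replace(match):
        placeholder = f"<SECRET_{len(mapping) + 1}>"
        mapping[placeholder] = match.group(0)
        return placeholder

    for pattern in PATTERNS.values():
        text = pattern.sub(replace, text)
    return text, mapping


def scrub_response_sketch(text):
    """Replace each match with a typed marker; nothing is recorded."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<REDACTED_{name}>", text)
    return text


clean, mapping = scrub_prompt_sketch(
    "My GitHub token is ghp_1234567890abcdefghijklmnopqrstuvwxyz"
)
print(clean)    # My GitHub token is <SECRET_1>
print(mapping)  # placeholder -> original value, kept locally
print(scrub_response_sketch("Key: AKIAIOSFODNN7EXAMPLE"))
```

The prompt path keeps a mapping so the caller can still relate a placeholder to its original value, while the response path discards the value entirely.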

CLI Usage

# Check files for secrets
scrub-llm scan file.log

# Scrub secrets from files
scrub-llm scan file.log -o cleaned.log

# Pipe from stdin
cat production.log | scrub-llm scan

# Scan multiple files
scrub-llm scan *.log
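
For readers who want to see what a scan pass amounts to, the following is a rough Python sketch of a line-by-line scanner. It does not reproduce the CLI's behavior or output format; `scan_stream` and the two patterns are hypothetical.

```python
import io
import re

# Hypothetical patterns for illustration.
PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "slack_token": re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"),
}


def scan_stream(stream):
    """Yield (line_number, secret_type) for every match in a text stream."""
    for line_number, line in enumerate(stream, start=1):
        for secret_type, pattern in PATTERNS.items():
            for _ in pattern.finditer(line):
                yield line_number, secret_type


# io.StringIO stands in for an open log file or sys.stdin.
log = io.StringIO(
    "2025-06-04 INFO service started\n"
    "2025-06-04 DEBUG key=AKIAIOSFODNN7EXAMPLE\n"
)
findings = list(scan_stream(log))
for line_number, secret_type in findings:
    print(f"line {line_number}: {secret_type}")
```

Because the function accepts any iterable of lines, the same code would work for a file opened with `open()` or for piped input.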

Detected Secret Types

The library detects 30+ secret patterns out of the box:

Cloud Providers: AWS keys, GCP keys, Azure credentials
Source Control: GitHub, GitLab, Bitbucket tokens
API Services: OpenAI, Anthropic, Stripe, Twilio, Mailgun keys
Communication: Slack tokens/webhooks, Discord tokens
Package Managers: npm, PyPI tokens
Monitoring: DataDog, New Relic keys
Authentication: JWTs, OAuth tokens, passwords in URLs
Encryption: Private keys (RSA, SSH, PGP)
High Entropy: Any string with high randomness (configurable)

Advanced Usage

Custom Detectors

import re

from scrub_llm import Scrubber
from scrub_llm.detectors import RegexDetector

# Add custom patterns
scrubber = Scrubber()
custom_detector = RegexDetector()
custom_detector.patterns["my_pattern"] = re.compile(r"CUSTOM-[A-Z0-9]{16}")
scrubber.add_detector(custom_detector)
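
Before registering a custom pattern, it can help to check what it matches using the standard `re` module alone. The sample strings below are made up for this check.

```python
import re

pattern = re.compile(r"CUSTOM-[A-Z0-9]{16}")

samples = [
    "CUSTOM-ABCDEFGH12345678",  # 16 uppercase letters/digits after the prefix
    "CUSTOM-abcdefgh12345678",  # lowercase letters are outside the character class
    "CUSTOM-ABC123",            # too short
]
results = [bool(pattern.search(s)) for s in samples]
print(results)  # [True, False, False]
```

Note that the pattern has no boundary anchors, so a longer run such as `CUSTOM-` followed by 20 valid characters would still match on its first 16. Add `\b` or a lookahead if that matters for your format.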

Entropy Configuration

from scrub_llm import Scrubber

# Adjust entropy detection sensitivity
scrubber = Scrubber(
    enable_entropy=True,
    min_entropy=4.0,      # Higher = more selective (default: 3.5)
    min_entropy_length=25  # Minimum length to check (default: 20)
)
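
To give a feel for what these two thresholds mean, here is a small sketch of Shannon entropy measured in bits per character. Whether scrub-llm computes entropy exactly this way is an assumption; `shannon_entropy` and `looks_like_secret` are illustrative names, with defaults copied from the comments above.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    return sum(
        (n / total) * math.log2(total / n) for n in Counter(s).values()
    )


def looks_like_secret(token, min_entropy=3.5, min_length=20):
    """Flag tokens that are both long enough and random-looking enough."""
    return len(token) >= min_length and shannon_entropy(token) >= min_entropy


print(shannon_entropy("abcd"))          # 2.0: four equally likely symbols
print(looks_like_secret("a" * 30))      # False: long but no randomness
print(looks_like_secret("abcdefghijklmnopqrstuvwxyz012345"))  # True
```

Raising `min_entropy` cuts false positives on ordinary words and identifiers, at the cost of missing secrets drawn from a small alphabet.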

Streaming Responses

# Works seamlessly with streaming
# (scrubbed_client is the wrapped client from the OpenAI Integration example)
response = scrubbed_client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    stream=True
)

for chunk in response:
    if chunk.flagged:  # True if secrets detected
        print(f"Secrets found: {chunk.secrets}")
    print(chunk.safe_text())  # Always safe to display
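
One difficulty with chunk-by-chunk processing is a secret that is split across two chunks. The sketch below shows one common way to handle that: hold back a short tail of the buffer until enough text has arrived to rule out a partial match. It is an assumption about the general technique, not a description of scrub-llm's internals, and it relies on the pattern having a fixed length.

```python
import re

AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")  # fixed length: 20 characters
HOLDBACK = 19  # longest possible incomplete match (20 - 1)


def scrub_stream(chunks):
    """Yield scrubbed text, holding back a tail that may hold a partial secret."""
    buffer = ""
    for chunk in chunks:
        buffer = AWS_KEY.sub("<REDACTED_AWS_ACCESS_KEY_ID>", buffer + chunk)
        if len(buffer) > HOLDBACK:
            yield buffer[:-HOLDBACK]
            buffer = buffer[-HOLDBACK:]
    if buffer:
        yield buffer  # flush the tail once the stream ends


# The key is deliberately split across two chunks.
chunks = ["The key is AKIAIOSF", "ODNN7EXAMPLE, keep it safe."]
print("".join(scrub_stream(chunks)))
# The key is <REDACTED_AWS_ACCESS_KEY_ID>, keep it safe.
```

The trade-off is latency: output lags the input by up to `HOLDBACK` characters, which is why the holdback should be no larger than the longest pattern requires.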

httpx Integration

from scrub_llm.transport import ScrubberHTTPXHook

# Create a scrubbed httpx client
hook = ScrubberHTTPXHook()
client = hook.create_client()

# All requests/responses are automatically scrubbed
response = client.post("https://api.example.com", json={
    "api_key": "sk-1234567890abcdef"  # Automatically redacted
})

How It Works

Pattern Matching: Detects secrets using regex patterns for known formats
Entropy Analysis: Identifies high-entropy strings that look like secrets
Placeholder Mapping: Replaces secrets with placeholders, maintaining a secure mapping
Streaming Safety: Processes streaming responses chunk-by-chunk
Bidirectional: Scrubs both outgoing prompts and incoming responses
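
The bidirectional flow described above can be sketched with a stand-in model function. Everything here is hypothetical: `scrubbed_call`, `fake_model`, and the `sk-` pattern exist only to show the order of operations.

```python
import re

SECRET = re.compile(r"sk-[A-Za-z0-9]{16,}")  # hypothetical key format


def redact(text):
    """One-way redaction of anything matching the pattern."""
    return SECRET.sub("<REDACTED>", text)


def scrubbed_call(model, prompt):
    """Scrub the prompt, call the model, then scrub what comes back."""
    outgoing = redact(prompt)
    incoming = model(outgoing)
    return redact(incoming)


seen_by_model = []


def fake_model(prompt):
    """Stand-in for a remote LLM: records its input and emits a new key."""
    seen_by_model.append(prompt)
    return f"You said: {prompt} Here is a new key: sk-abcdefghijklmnop1234"


result = scrubbed_call(fake_model, "Use sk-1234567890abcdef to log in.")
print(seen_by_model[0])  # Use <REDACTED> to log in.
print(result)
```

The model never sees the original key, and the key it generates is removed before the caller sees the response.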

Development

# Clone the repository
git clone https://github.com/haasonsaas/scrub-llm.git
cd scrub-llm

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check .
mypy .

Security Notes

Placeholders are stored in thread-local storage for safety
Original secrets never leave your application
No external API calls or network access required
All processing happens locally in-memory
Safe for concurrent/async usage
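
As a rough illustration of what thread-local storage provides, the sketch below uses the standard library's `threading.local` to give each thread its own placeholder mapping. The helper names are hypothetical and the library's internal layout may differ.

```python
import threading

_local = threading.local()


def get_mapping():
    """Return the placeholder mapping that belongs to the current thread."""
    if not hasattr(_local, "mapping"):
        _local.mapping = {}
    return _local.mapping


def worker(name, results):
    # Each thread writes the same placeholder key without clobbering the other.
    get_mapping()["<SECRET_1>"] = f"secret-of-{name}"
    results[name] = dict(get_mapping())


results = {}
threads = [threading.Thread(target=worker, args=(n, results)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["a"])    # {'<SECRET_1>': 'secret-of-a'}
print(results["b"])    # {'<SECRET_1>': 'secret-of-b'}
print(get_mapping())   # {} in the main thread
```

Thread-local state isolates threads, not coroutines that share one thread. In asyncio code, the standard library's `contextvars` module is the usual tool for per-task state.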

Performance

Minimal overhead (<1ms for typical prompts)
Zero-copy streaming responses
Efficient regex compilation and caching
Thread-safe for production use
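
Overhead depends on prompt size and the number of active patterns, so it is worth measuring on your own inputs. Below is a minimal timing sketch with precompiled patterns. The `scrub` function here is a stand-in rather than the library's, and the printed number will vary by machine.

```python
import re
import time

# Compile once at import time so repeated calls reuse the compiled objects.
PATTERNS = [
    re.compile(p) for p in (r"AKIA[0-9A-Z]{16}", r"ghp_[A-Za-z0-9]{36}")
]


def scrub(text):
    """Stand-in scrubber: apply every pattern in turn."""
    for pattern in PATTERNS:
        text = pattern.sub("<REDACTED>", text)
    return text


def mean_latency_ms(text, runs=1000):
    """Average wall-clock time per scrub call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        scrub(text)
    return (time.perf_counter() - start) * 1000 / runs


prompt = "Summarize this log line: key=AKIAIOSFODNN7EXAMPLE " * 5
print(f"{mean_latency_ms(prompt):.4f} ms per call")
```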

License

This project is released under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Roadmap

LangChain & LlamaIndex middleware
Automatic PII detection (names, emails, phone numbers)
ML-based false positive reduction
Vault/secrets manager integration
Rust port for performance-critical paths
YARA rule support for advanced patterns

Support

Issues: GitHub Issues
Discussions: GitHub Discussions

Built with ❤️ to keep your secrets secret.

Project details

Version 0.1.0, released Jun 4, 2025

Download files

Source Distribution

scrub_llm-0.1.0.tar.gz (14.1 kB), uploaded Jun 4, 2025

Built Distribution

scrub_llm-0.1.0-py3-none-any.whl (18.1 kB), uploaded Jun 4, 2025, Python 3

File details

Details for the file scrub_llm-0.1.0.tar.gz.

File metadata

Download URL: scrub_llm-0.1.0.tar.gz
Upload date: Jun 4, 2025
Size: 14.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for scrub_llm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0df00cfe76e38054f4594ed9b3f26b03946a77e0bf5a359f4a6051fc793fe9de`
MD5	`bab7d505a957df07bbaf2d94815b17df`
BLAKE2b-256	`d2a22b6fb6278ca84588b72ea78defc83a899a9ed271ddf614ea8f3828fd4a91`

File details

Details for the file scrub_llm-0.1.0-py3-none-any.whl.

File metadata

Download URL: scrub_llm-0.1.0-py3-none-any.whl
Upload date: Jun 4, 2025
Size: 18.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for scrub_llm-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`429c843b5ea8b1404fb8d4d796c5223e527a02200db09431ce65178afa5888dd`
MD5	`6ea3f8cecb266ef7be93769a8604965c`
BLAKE2b-256	`7c1030480d7aa1d6b73ab4c2f54d5e68741d09386c00a662e25b95a2202c7428`
