Drop-in LLM secret scrubber to prevent API key and credential leaks
scrub-llm
A lightweight, drop-in LLM secret scrubber to prevent API key and credential leaks in your AI applications.
Features
- Drop-in wrapper for OpenAI/httpx - no code rewrite required
- Bidirectional redaction - scrubs secrets before requests and after responses
- 30+ built-in patterns - AWS, GCP, GitHub, Slack, JWT tokens, and more
- Entropy detection - catches high-entropy strings that look like secrets
- Placeholder system - preserves secret functionality while hiding values
- Zero-copy streaming - works with stream=True responses
- CLI tool - scrub logs and files from the command line
Installation
pip install scrub-llm
Quick Start
OpenAI Integration
from scrub_llm import OpenAIScrubber
import openai
# Wrap your OpenAI client
client = openai.OpenAI(api_key="your-key")
scrubbed_client = OpenAIScrubber(client)
# Use normally - secrets are automatically redacted
response = scrubbed_client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "My AWS key is AKIAIOSFODNN7EXAMPLE"  # ← Automatically redacted
    }]
)
# Response secrets are also redacted
print(response.choices[0].message.content)
# "Your AWS key <REDACTED_AWS_ACCESS_KEY_ID> has been hidden"
Direct Usage
from scrub_llm import Scrubber
scrubber = Scrubber()
# Scrub prompts (with placeholder mapping)
text = "My GitHub token is ghp_1234567890abcdefghijklmnopqrstuvwxyz"
clean_text, mappings = scrubber.scrub_prompt(text)
print(clean_text) # "My GitHub token is <SECRET_1>"
# Scrub responses (one-way redaction)
response = "Generated API key: sk-proj-abc123xyz789"
clean_response = scrubber.scrub_response(response)
print(clean_response) # "Generated API key: <REDACTED_OPENAI_API_KEY>"
CLI Usage
# Check files for secrets
scrub-llm scan file.log
# Scrub secrets from files
scrub-llm scan file.log -o cleaned.log
# Pipe from stdin
cat production.log | scrub-llm scan
# Scan multiple files
scrub-llm scan *.log
Detected Secret Types
The library detects 30+ secret patterns out of the box:
- Cloud Providers: AWS keys, GCP keys, Azure credentials
- Source Control: GitHub, GitLab, Bitbucket tokens
- API Services: OpenAI, Anthropic, Stripe, Twilio, Mailgun keys
- Communication: Slack tokens/webhooks, Discord tokens
- Package Managers: npm, PyPI tokens
- Monitoring: DataDog, New Relic keys
- Authentication: JWTs, OAuth tokens, passwords in URLs
- Encryption: Private keys (RSA, SSH, PGP)
- High Entropy: Any string with high randomness (configurable)
Advanced Usage
Custom Detectors
import re
from scrub_llm import Scrubber
from scrub_llm.detectors import RegexDetector
# Add custom patterns
scrubber = Scrubber()
custom_detector = RegexDetector()
custom_detector.patterns["my_pattern"] = re.compile(r"CUSTOM-[A-Z0-9]{16}")
scrubber.add_detector(custom_detector)
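Before registering a custom pattern, it can help to sanity-check the regex on its own with the standard `re` module. The snippet below is a minimal sketch that exercises the example pattern against a few made-up sample strings; it does not touch scrub-llm at all, and the sample tokens are invented for illustration.

```python
import re

# Same pattern as in the custom detector example above.
pattern = re.compile(r"CUSTOM-[A-Z0-9]{16}")

# Invented sample strings mapped to whether the pattern should match.
samples = {
    "token=CUSTOM-ABCD1234EFGH5678": True,   # 16 uppercase letters/digits
    "token=CUSTOM-abcd1234efgh5678": False,  # lowercase is outside the class
    "token=CUSTOM-ABCD1234": False,          # suffix is too short
}

for text, expected in samples.items():
    found = pattern.search(text) is not None
    print(f"{text!r}: matched={found}")
    assert found == expected
```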
Entropy Configuration
# Adjust entropy detection sensitivity
scrubber = Scrubber(
    enable_entropy=True,
    min_entropy=4.0,  # Higher = more selective (default: 3.5)
    min_entropy_length=25  # Minimum length to check (default: 20)
)
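To get a feel for what an entropy threshold means, the sketch below computes Shannon entropy in bits per character, a common measure for this kind of check. It is an illustration only: the source does not say which formula scrub-llm uses, and `shannon_entropy` and `looks_like_secret` are hypothetical helper names, not part of the library.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    # Sum p * log2(1/p) over the distinct characters.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def looks_like_secret(token: str, min_entropy: float = 3.5, min_length: int = 20) -> bool:
    """Hypothetical check combining a length floor with an entropy floor."""
    return len(token) >= min_length and shannon_entropy(token) >= min_entropy


# A repeated character is long enough but carries no entropy.
print(looks_like_secret("aaaaaaaaaaaaaaaaaaaa"))
# A token with many distinct characters clears both thresholds.
print(looks_like_secret("ghp_1234567890abcdefghijklmnopqrstuvwxyz"))
```

Under this measure, raising the threshold from 3.5 to 4.0 requires a more even spread over a larger set of characters, which is why a higher value is more selective.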
Streaming Responses
# Works seamlessly with streaming
response = scrubbed_client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    stream=True
)
for chunk in response:
    if chunk.flagged:  # True if secrets detected
        print(f"Secrets found: {chunk.secrets}")
    print(chunk.safe_text())  # Always safe to display
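One subtlety with chunked output is that a secret can be split across two chunks. The source does not describe how scrub-llm handles this, so the following is only a sketch of one common approach: hold back a short tail of the buffer until enough text has arrived to rule out a partial match. The pattern, the `<REDACTED>` marker, and the `scrub_stream` helper are assumptions made for this example.

```python
import re
from typing import Iterable, Iterator

# Simplified AWS access key ID pattern (an assumption for this sketch).
PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")
MAX_SECRET_LEN = 20  # longest string PATTERN can match


def scrub_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield redacted text, holding back a tail that may start a secret."""
    keep = MAX_SECRET_LEN - 1
    buffer = ""
    for chunk in chunks:
        # Redact everything that is already complete in the buffer.
        buffer = PATTERN.sub("<REDACTED>", buffer + chunk)
        # An incomplete secret has at most `keep` characters, all at the
        # end of the buffer, so anything before that tail is safe to emit.
        if len(buffer) > keep:
            yield buffer[:-keep]
            buffer = buffer[-keep:]
    if buffer:
        yield buffer


# The key is split across the two chunks.
pieces = list(scrub_stream(["My key is AKIAIOSF", "ODNN7EXAMPLE, keep it safe"]))
print("".join(pieces))
```

The trade-off is a small delay: the last few characters of the stream are only emitted once the next chunk (or the end of the stream) arrives.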
httpx Integration
from scrub_llm.transport import ScrubberHTTPXHook
# Create a scrubbed httpx client
hook = ScrubberHTTPXHook()
client = hook.create_client()
# All requests/responses are automatically scrubbed
response = client.post("https://api.example.com", json={
    "api_key": "sk-1234567890abcdef"  # Automatically redacted
})
How It Works
- Pattern Matching: Detects secrets using regex patterns for known formats
- Entropy Analysis: Identifies high-entropy strings that look like secrets
- Placeholder Mapping: Replaces secrets with placeholders, maintaining a secure mapping
- Streaming Safety: Processes streaming responses chunk-by-chunk
- Bidirectional: Scrubs both outgoing prompts and incoming responses
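The placeholder-mapping step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not scrub-llm's actual implementation: the single regex, the numbering scheme, and the `scrub` and `restore` helper names are all hypothetical.

```python
import re

# Simplified pattern for AWS access key IDs (an assumption for this sketch).
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scrub(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with a numbered placeholder and record the mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original secret
    seen: dict[str, str] = {}     # original secret -> placeholder

    def _sub(match: re.Match) -> str:
        secret = match.group(0)
        if secret not in seen:
            placeholder = f"<SECRET_{len(seen) + 1}>"
            seen[secret] = placeholder
            mapping[placeholder] = secret
        return seen[secret]

    return AWS_KEY.sub(_sub, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back, for local use only."""
    for placeholder, secret in mapping.items():
        text = text.replace(placeholder, secret)
    return text


original = "key=AKIAIOSFODNN7EXAMPLE, again AKIAIOSFODNN7EXAMPLE"
clean, mapping = scrub(original)
print(clean)
print(restore(clean, mapping) == original)
```

Reusing the same placeholder for a repeated secret keeps the scrubbed text coherent for the model, while the mapping itself stays in local memory.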
Development
# Clone the repository
git clone https://github.com/haasonsaas/scrub-llm.git
cd scrub-llm
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check .
mypy .
Security Notes
- Placeholders are stored in thread-local storage for safety
- Original secrets never leave your application
- No external API calls or network access required
- All processing happens locally in-memory
- Safe for concurrent/async usage
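As a rough picture of what thread-local placeholder storage implies, the sketch below uses the standard library's `threading.local` so that each thread only sees the mappings it created. The `remember` and `current_mapping` helpers are hypothetical names for this example and are not scrub-llm's API.

```python
import threading

_local = threading.local()  # attributes set here are private to each thread


def remember(placeholder: str, secret: str) -> None:
    """Store a placeholder mapping for the calling thread only."""
    if not hasattr(_local, "mapping"):
        _local.mapping = {}
    _local.mapping[placeholder] = secret


def current_mapping() -> dict:
    """Return a copy of the calling thread's mappings."""
    return dict(getattr(_local, "mapping", {}))


results = {}


def worker(name: str, secret: str) -> None:
    remember("<SECRET_1>", secret)
    results[name] = current_mapping()


threads = [
    threading.Thread(target=worker, args=(f"t{i}", f"secret-{i}"))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)            # each thread saw only its own secret
print(current_mapping())  # the main thread stored nothing
```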
Performance
- Minimal overhead (<1ms for typical prompts)
- Zero-copy streaming responses
- Efficient regex compilation and caching
- Thread-safe for production use
License
This project is released under the MIT License.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Roadmap
- LangChain & LlamaIndex middleware
- Automatic PII detection (names, emails, phone numbers)
- ML-based false positive reduction
- Vault/secrets manager integration
- Rust port for performance-critical paths
- YARA rule support for advanced patterns
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ to keep your secrets secret.
File details
Details for the file scrub_llm-0.1.0.tar.gz.
File metadata
- Download URL: scrub_llm-0.1.0.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0df00cfe76e38054f4594ed9b3f26b03946a77e0bf5a359f4a6051fc793fe9de |
| MD5 | bab7d505a957df07bbaf2d94815b17df |
| BLAKE2b-256 | d2a22b6fb6278ca84588b72ea78defc83a899a9ed271ddf614ea8f3828fd4a91 |
File details
Details for the file scrub_llm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scrub_llm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 429c843b5ea8b1404fb8d4d796c5223e527a02200db09431ce65178afa5888dd |
| MD5 | 6ea3f8cecb266ef7be93769a8604965c |
| BLAKE2b-256 | 7c1030480d7aa1d6b73ab4c2f54d5e68741d09386c00a662e25b95a2202c7428 |