Runtime privacy leak scanner - detects and optionally blocks sensitive data leakage
ahh-leek
Runtime Privacy Leak Scanner - Detects and optionally blocks sensitive data leakage at runtime.
Features
- Secret Detection: API keys, tokens, credentials, private keys
- PII Detection: Credit cards (Luhn validated), SSN, email, phone numbers
- Exfiltration Detection: Base64-encoded secrets, env var exposure, large payloads
- Multiple Modes: Alert, Block, or Redact sensitive data
- Runtime Hooks: Intercepts network requests, logging, and file writes
- Zero False Positives: Entropy validation, Luhn algorithm, format validation
- CLI Tool: Scan files or wrap scripts with leak detection
Installation
pip install ahh-leek
Or install from source:
pip install -e .
Quick Start
Library Mode
import ahhleek
# Enable with defaults (alert only)
ahhleek.enable()
# Enable with blocking (raises exception on leak)
ahhleek.enable(mode="block")
# Custom configuration
ahhleek.enable(
    detect=["secrets", "pii"],  # Categories to scan
    mode="alert",  # alert | block | redact
    on_leak=my_callback,  # Custom handler
    exclude=[r"/health", r"localhost"],  # URLs to ignore
    confidence_threshold=0.8,  # Min confidence (0.0-1.0)
)
# Disable when done
ahhleek.disable()
One-Shot Scanning
import ahhleek
# Scan a string
result = ahhleek.scan("my api key is sk_live_abc123xyz456")
if result.has_leaks:
    for leak in result:
        print(f"Found: {leak.pattern_name}")
        print(f" Redacted: {leak.redacted_value}")
        print(f" Confidence: {leak.confidence:.0%}")
CLI Mode
# Scan a file
ahh-leek scan config.py
# Scan text directly
ahh-leek scan --text "AKIAIOSFODNN7EXAMPLE"
# JSON output
ahh-leek scan file.py --json
# Run a script with leak detection
ahh-leek run script.py --mode alert
# Run with blocking enabled
ahh-leek run script.py --mode block --detect secrets --detect pii
Detection Patterns
Secrets (High Confidence)
| Pattern | Example |
|---|---|
| AWS Access Key | AKIAIOSFODNN7EXAMPLE |
| AWS Secret Key | wJalrXUtnFEMI/K7MDENG... |
| GitHub Token | ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
| Slack Token | xoxb-1234567890-abcdefghijk |
| Stripe Key | sk_live_1234567890abcdefghij |
| JWT Token | eyJhbGciOiJIUzI1NiIs... |
| Private Key | -----BEGIN RSA PRIVATE KEY----- |
| Database URI | postgres://user:pass@host/db |
PII (Validated)
| Pattern | Validation |
|---|---|
| Credit Card | Luhn algorithm checksum |
| SSN | Format + range validation |
| Email | Format validation |
| Phone | Country code patterns |
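The first two validations in the table can be sketched as follows. This illustrates the standard Luhn checksum and the publicly documented SSN range rules; it is not ahh-leek's actual code, and the names `luhn_valid` and `ssn_valid` are hypothetical.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    # Assumption: only consider typical payment card lengths.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Assumption: SSNs are written in the dashed AAA-GG-SSSS form.
SSN_RE = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")


def ssn_valid(ssn: str) -> bool:
    """Check format, then reject area/group/serial values that are never issued."""
    m = SSN_RE.match(ssn)
    if not m:
        return False
    area, group, serial = (int(part) for part in m.groups())
    if area == 0 or area == 666 or area >= 900:
        return False
    return group != 0 and serial != 0
```

A 16-digit string that merely looks like a card number fails `luhn_valid` unless its checksum is correct, which is how a checksum-based detector filters out order IDs and similar numeric noise.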
Exfiltration
- Base64-encoded secrets
- Sensitive env vars in outbound data
- Large POST payloads
- Writes to suspicious paths (/tmp, /public, etc.)
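As an illustration of the first item, Base64-encoded secrets can be found by decoding candidate tokens and re-scanning the decoded text. The sketch below is an assumption about the general technique, not the library's implementation; `find_encoded_secrets` and both regular expressions are hypothetical, and only the AWS access key format is checked.

```python
import base64
import binascii
import re

# Candidate Base64 runs: 20+ characters of the standard alphabet plus optional padding.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")
# Illustrative secret pattern (AWS access key ID format).
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def find_encoded_secrets(payload: str) -> list[str]:
    """Decode Base64-looking tokens in `payload` and return secrets found inside."""
    hits = []
    for match in B64_CANDIDATE.finditer(payload):
        token = match.group(0)
        try:
            decoded = base64.b64decode(token, validate=True)
        except (binascii.Error, ValueError):
            continue  # Not valid Base64 after all; skip it.
        text = decoded.decode("utf-8", errors="ignore")
        hits.extend(AWS_KEY.findall(text))
    return hits
```

A simple scanner like this misses tokens that run directly into other alphanumeric text, because the greedy match then starts at the wrong offset; a production detector would need to try several alignments.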
Operating Modes
Alert Mode (Default)
Logs warnings but allows data through:
[ahh-leek] LEAK DETECTED: aws_access_key
Location: requests.post() -> api.example.com
Match: AKIA****************
Confidence: 95%
Action: Allowed (alert mode)
Block Mode
Raises LeakBlockedError to prevent data from leaving:
import ahhleek
import requests
ahhleek.enable(mode="block")
try:
    requests.post(url, data={"key": "AKIAIOSFODNN7EXAMPLE"})
except ahhleek.LeakBlockedError as e:
    print(f"Blocked: {e.event.pattern_name}")
Redact Mode
Automatically replaces sensitive data with placeholders:
ahhleek.enable(mode="redact")
# "sk_live_abc123" becomes "[REDACTED:stripe_secret_key]"
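Conceptually, redaction is a pattern substitution that swaps each match for a placeholder naming the pattern. The following is a minimal sketch under that assumption; the `PATTERNS` table and the `redact` helper are hypothetical and the library's real patterns may differ.

```python
import re

# Hypothetical pattern table for illustration only.
PATTERNS = {
    "stripe_secret_key": re.compile(r"sk_live_[0-9A-Za-z]+"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def redact(text: str) -> str:
    """Replace every match of every pattern with a [REDACTED:<name>] placeholder."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

Naming the pattern in the placeholder keeps redacted logs debuggable: a reader can see which kind of secret was removed without seeing its value.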
Configuration Options
ahhleek.enable(
    # Detection categories
    detect=["secrets", "pii", "exfil"],
    # Operating mode
    mode="alert",  # alert | block | redact
    # Custom leak handler
    on_leak=lambda event: print(f"Leak: {event}"),
    # URL patterns to exclude (regex)
    exclude=[r"localhost", r"/health"],
    # Patterns to allowlist (won't trigger)
    allowlist=[r"test_", r"example\.com"],
    # Minimum confidence threshold
    confidence_threshold=0.7,
    # Hook controls
    hook_network=True,  # Intercept HTTP requests
    hook_logging=True,  # Intercept logging/print
    hook_files=False,  # Intercept file writes (more invasive)
)
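One plausible reading of how `exclude`, `allowlist`, and `confidence_threshold` interact is sketched below. The helper `should_report` and the use of `re.search` (which matches anywhere in the string) are assumptions for illustration; the library's actual matching rules may differ.

```python
import re


def should_report(url, matched_value, confidence, *,
                  exclude=(), allowlist=(), confidence_threshold=0.7):
    """Decide whether a detected match should be reported (hypothetical logic)."""
    # Skip destinations the user chose to ignore.
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    # Skip values the user marked as known-safe.
    if any(re.search(pattern, matched_value) for pattern in allowlist):
        return False
    # Report only matches at or above the confidence threshold.
    return confidence >= confidence_threshold
```

Under this reading, an unanchored pattern such as `r"localhost"` excludes any URL containing that substring, so anchoring (for example `r"^https?://localhost"`) would be the way to narrow it.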
Hooks
Network Hooks
Intercepts outbound network requests:
- urllib.request.urlopen
- requests.Session.request
- httpx.Client.request / AsyncClient.request
- aiohttp.ClientSession._request
- socket.socket.send / sendall
Logging Hooks
Intercepts log output:
- logging.Handler.emit
- builtins.print
- sys.stdout / sys.stderr
File Hooks
Monitors file writes (disabled by default):
- builtins.open (write modes)
- pathlib.Path.write_text / write_bytes
Zero False Positive Strategy
- Entropy Validation: Generic patterns require high entropy (>3.5 bits/char)
- Luhn Algorithm: Credit cards must pass checksum validation
- Format + Range: SSN must have valid area/group/serial
- Allowlists: User-defined patterns to ignore
- Confidence Scoring: Only alert above threshold
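The entropy check in the first item can be illustrated with Shannon entropy over the characters of a candidate token. This is a sketch of the general measure using the 3.5 bits/char threshold quoted above; the function names are hypothetical and the library may compute entropy differently.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of `s` in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_random(token: str, threshold: float = 3.5) -> bool:
    """True if the token's entropy exceeds the threshold (default 3.5 bits/char)."""
    return shannon_entropy(token) > threshold
```

Repetitive placeholders such as `xxxxxxxxxxxxxxxx` score near zero and are rejected, while randomly generated keys, which rarely repeat characters, score well above the threshold.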
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run specific test file
pytest tests/test_patterns.py -v
Examples
Custom Callback
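import logging
import ahhleek
# Logger standing in for the security logging system (the name is illustrative)
security_log = logging.getLogger("security")
def my_leak_handler(event):
    # Send to security logging system
    security_log.warning(
        "Data leak detected",
        extra={
            "pattern": event.pattern_name,
            "location": event.location,
            "confidence": event.confidence,
        },
    )
ahhleek.enable(on_leak=my_leak_handler)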
Context Manager
from contextlib import contextmanager
import ahhleek
import requests
@contextmanager
def leak_protection():
    ahhleek.enable(mode="block")
    try:
        yield
    finally:
        ahhleek.disable()
with leak_protection():
    # All network/logging activity is monitored
    requests.post(url, data=sensitive_data)
Integration with Web Frameworks
# Flask middleware
@app.before_request
def enable_leak_detection():
    if app.config.get("LEAK_DETECTION"):
        ahhleek.enable(mode="alert")
@app.teardown_request
def disable_leak_detection(exception):
    ahhleek.disable()
License
MIT
File details
Details for the file ahh_leek-0.1.0.tar.gz.
File metadata
- Download URL: ahh_leek-0.1.0.tar.gz
- Upload date:
- Size: 26.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2fab2fafd3d91830769ca86766162c1545c8fd1f47fcef19dc44fe82f8c31660 |
| MD5 | 7013bac6d7edc0f06adce911964877e7 |
| BLAKE2b-256 | c4441a680904d4693ea070d00e8e8464f9a56e8fdbf97354608ec44208c425e9 |
File details
Details for the file ahh_leek-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ahh_leek-0.1.0-py3-none-any.whl
- Upload date:
- Size: 27.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e2bbdb9968317d77670f23bca00d4712ea51676abd3d7f95cc3c79b3e954065 |
| MD5 | 357029bb503a3a1355229dd7286a2c36 |
| BLAKE2b-256 | 5ca09dbeb53a7c3888fdb83b49053c23025f2762beffe129eb1a1d65be53f124 |