entro-scan

Entropy-based secret scanner for source code — detects API keys, tokens, passwords, and other sensitive data leaks before they reach production.

Features

  • Shannon entropy analysis — finds high-entropy strings that look like secrets
  • Known pattern detection — regex matching for JWT, AWS keys, GitHub tokens, private keys, DB URLs, and more
  • Git history scanning — scan commit history for accidentally committed secrets
  • Multiple output formats — terminal (colorized), JSON, CSV, SARIF (GitHub code scanning compatible)
  • Fast parallel scanning — multi-process worker pool for large codebases
  • Low false-positive rate — smart filters reject common strings, hex dumps, and natural language
  • Configurable — TOML-based config with custom thresholds, excludes, and file types
  • Zero external dependencies — pure Python 3.11+, uses only the standard library
  • Pre-commit hook ready — catch secrets before they're committed

Installation

pip install entro-scan

Or install from source:

git clone https://github.com/vyofgod/entro-scan.git
cd entro-scan
pip install -e .

Usage

# Scan current directory
entro-scan

# Scan a specific path
entro-scan /path/to/project

# Custom entropy threshold (lower = more findings)
entro-scan /path --threshold 4.0

# Output in JSON format
entro-scan /path --format json

# Output to file
entro-scan /path --format json -o results.json

# Scan git history (last 100 commits)
entro-scan /path --git

# Scan git history with custom depth
entro-scan /path --git --max-commits 500

# Parallel scan with 8 workers
entro-scan /path --workers 8

# Quiet mode (only findings, no banner)
entro-scan /path --quiet

# Generate a default config file
entro-scan --init

Output Formats

Terminal (default)

Color-coded output with severity levels:

  • Red (score > 4.5): Critical, likely a secret
  • Yellow (3.9 < score <= 4.5): High, suspicious
  • Green (score <= 3.9): Medium, low-confidence finding
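
The bands above can be read as a simple mapping from score to severity. The following is a minimal sketch of that mapping, not entro-scan's actual code; the function name `severity_for` and the lowercase labels are assumptions made for illustration.

```python
def severity_for(score: float) -> str:
    """Map a score to the severity bands listed above.

    Hypothetical helper: the boundaries (4.5 and 3.9) come from the
    documentation, the function itself is illustrative only.
    """
    if score > 4.5:
        return "critical"  # shown in red
    if score > 3.9:
        return "high"      # shown in yellow
    return "medium"        # shown in green
```

Note that the comparisons are strict, so a score of exactly 4.5 falls in the High band and a score of exactly 3.9 falls in the Medium band.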

JSON

Machine-readable output for CI/CD pipelines.

CSV

Spreadsheet-friendly output for reporting.

SARIF

Static Analysis Results Interchange Format — compatible with GitHub code scanning.
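
For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF 2.1.0 log of the kind GitHub code scanning accepts. It is not entro-scan's own serializer: the function `to_sarif`, the finding keys (`rule`, `message`, `path`, `line`, `severity`), and the severity-to-level mapping are all assumptions chosen for illustration.

```python
import json


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts.

    Assumption: each finding has the hypothetical keys
    "rule", "message", "path", "line", and "severity".
    """
    # SARIF result levels are "none", "note", "warning", and "error".
    level_for = {"critical": "error", "high": "warning", "medium": "note"}
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["rule"],
            "level": level_for.get(finding["severity"], "warning"),
            "message": {"text": finding["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["path"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "entro-scan"}},
            "results": results,
        }],
    }


if __name__ == "__main__":
    example = [{
        "rule": "aws-access-key-id",  # hypothetical rule identifier
        "message": "Possible AWS Access Key ID",
        "path": "src/settings.py",
        "line": 12,
        "severity": "critical",
    }]
    print(json.dumps(to_sarif(example), indent=2))
```

The required parts of a SARIF log are `version` and `runs`; each run names the tool under `tool.driver` and lists its `results`.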

Configuration

Create .entro-scan.toml in your project root:

threshold = 3.5
workers = 4
quiet = false
verbose = false
output_format = "terminal"
git_enabled = false
max_commits = 100

exclude_dirs = [
    ".git", "node_modules", "venv", "__pycache__",
    ".idea", ".vscode", "build", "dist", "target",
]

exclude_files = [
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "cargo.lock", "go.sum",
]

include_extensions = [
    ".py", ".rs", ".js", ".ts", ".go", ".java", ".kt", ".swift",
    ".rb", ".php", ".sh", ".json", ".yaml", ".yml", ".toml", ".env",
]

Alternatively, config can live under [tool.entro-scan] in your pyproject.toml.

Supported Patterns

Pattern | Severity
JWT (JSON Web Tokens) | Critical
AWS Access Key ID | Critical
AWS Secret Key | Critical
GitHub Token | Critical
Slack Token | Critical
Private Keys (RSA/DSA/EC/OpenSSH) | Critical
GitLab Token | High
Heroku API Key | High
Database URLs (Postgres, MySQL, MongoDB, Redis) | High
Generic API Keys / Secrets | Medium
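
Known-pattern detection of this kind is usually a table of regular expressions applied line by line. The sketch below shows the idea for three of the listed patterns. The regexes are common public approximations, not the ones entro-scan ships, and `PATTERNS` and `match_patterns` are hypothetical names.

```python
import re

# name -> (severity, compiled regex). Illustrative approximations only.
PATTERNS = {
    "AWS Access Key ID": (
        "critical",
        re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    ),
    "JWT": (
        "critical",
        # Three base64url segments; a JSON header encodes to a string starting "eyJ".
        re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    ),
    "Private Key": (
        "critical",
        re.compile(r"-----BEGIN (?:RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----"),
    ),
}


def match_patterns(line: str) -> list[tuple[str, str, str]]:
    """Return (pattern name, severity, matched text) for every hit in `line`."""
    hits = []
    for name, (severity, regex) in PATTERNS.items():
        for match in regex.finditer(line):
            hits.append((name, severity, match.group(0)))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    print(match_patterns('aws_key = "AKIAIOSFODNN7EXAMPLE"'))
```

Pattern matching complements the entropy check: a prefix such as `AKIA` identifies a key even when the string's entropy is below the threshold.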

Pre-commit Hook

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/vyofgod/entro-scan
    rev: v1.0.0
    hooks:
      - id: entro-scan

CI/CD Integration

GitHub Actions

See .github/workflows/ci.yml for a complete example that runs entro-scan on every push and PR.

Exit Codes

  • 0: No secrets found (or scan completed successfully)
  • 1: Error (config issue, path not found)

Development

# Install dev dependencies
pip install pytest ruff mypy

# Run tests
pytest tests/ -v

# Lint
ruff check .

# Type check
mypy entro_scan/

Why entropy?

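Secrets such as API keys, tokens, and passwords are typically random strings with high Shannon entropy: their characters are spread close to uniformly over a large alphabet. Natural-language text and code identifiers reuse a small set of characters and score much lower. By measuring the Shannon entropy of strings in your codebase, entro-scan flags the ones that look random; known-pattern matching and the false-positive filters listed under Features cover the cases that entropy alone would misjudge.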
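
As a concrete illustration, Shannon entropy in bits per character can be computed from character frequencies. This is a minimal sketch of the textbook formula, H = -Σ p(c) · log2 p(c); entro-scan's own implementation may tokenize or normalize strings differently, and the function name `shannon_entropy` is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum p * log2(p) over the distinct characters, then negate.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for sample in ("aaaaaaaa", "password", "abcdefghijklmnop"):
        print(f"{sample!r}: {shannon_entropy(sample):.2f}")
```

Entropy is bounded by log2 of the number of distinct characters, so a string of 16 distinct characters scores exactly 4.0, and a string needs at least 23 distinct characters to score above 4.5. Short strings therefore cannot reach the highest band on entropy alone.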

License

MIT — see LICENSE for details.
