Deep GitHub repository audit CLI with context, documentation, and security heuristics

repo-auditor

Objective GitHub repository audit: a compact terminal card and a full Markdown report. Four balanced categories (purpose · security · quality · maturity), transparent methodology, optionally powered by gitleaks, osv-scanner, tokei, semgrep.

Pure Python 3.12+, zero third-party dependencies. Networking via urllib with gh/curl fallback; TOML via tomllib.

👋 Not a programmer but want to check a project? Open GETTING_STARTED.ru.md (Russian): a step-by-step guide in plain language, from installing Python to reading your first report.


What you'll see

After running, a card is printed to the terminal:

┌─ vercel/turbo ──────────────────────────────── Rust · MIT · ★30258
│ Purpose          ██████████  10/10
│ Security         ██████░░░░  6/10
│ Code Quality     ██████████  10/10
│ Maturity         ██████████  10/10
├─ TOTAL: 90/100 · ~ reliable
│ Key: 0 high + 5 medium security findings (heuristic)
│ Suggestion: review .devcontainer/Dockerfile, .github/workflows/turborepo-library-release.yml
└─ full report: data/github.com/vercel__turbo/runs/2026-04-25_071849/report.ru.md

At the same time, data/.../latest/ receives:

  • report.ru.md: full Markdown report (8 sections, methodology, delta with previous run);
  • report.json: versioned JSON (schema_version) for automation;
  • card.txt: copy of the terminal card;
  • raw/: raw tool outputs for manual drill-down.

Note: Reports are generated in Russian by default. Use --language en for English output.
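
The score bars in the card can be reproduced with a few lines of Python. This is only an illustrative sketch, not the project's actual renderer: the names render_bar and render_row are hypothetical, and the 16-character label column is inferred from the sample card above.

```python
def render_bar(score: int, width: int = 10) -> str:
    """Return `score` filled cells followed by empty cells, `width` cells in total."""
    if not 0 <= score <= width:
        raise ValueError(f"score must be between 0 and {width}, got {score}")
    return "█" * score + "░" * (width - score)


def render_row(label: str, score: int) -> str:
    """Format one category row; the column width is an assumption taken from the sample card."""
    return f"│ {label:<16} {render_bar(score)}  {score}/10"


for name, value in [("Purpose", 10), ("Security", 6), ("Code Quality", 10), ("Maturity", 10)]:
    print(render_row(name, value))
```

Padding with str.format counts characters rather than terminal cells, which is the same limitation the Troubleshooting section mentions for CJK and emoji names.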


Quick start

# One-off audit, no installation needed
python3 -m repo_auditor https://github.com/pallets/flask

# View the full report
cat data/github.com/pallets__flask/latest/report.ru.md

That's enough to get your first result. External scanners (gitleaks, osv-scanner, tokei, semgrep) are optional; the built-in heuristics work without them.


Installation

Base requirements: Python 3.12+ and git.

# Install from PyPI (recommended)
pip install sfild-repo-auditor

# Verify
repo-audit --version

Or install from source:

git clone https://github.com/SfilD/repo-auditor.git
cd repo-auditor
pip install -e .

Also works without pip install: invoke via python3 -m repo_auditor.

Docker (alternative to local install)

# Build image
docker build -t repo-auditor .

# Run an audit (the volume path is quoted so directories with spaces work)
docker run --rm -v "$(pwd)/data:/data" repo-auditor \
  https://github.com/pallets/flask --output-root /data

The image is based on python:3.13-slim (Debian 13 "trixie") and includes all external tools (gitleaks, osv-scanner, tokei, semgrep, gh). It is single-stage by design, prioritising simplicity and fast rebuilds for audit environments over minimal runtime size.

GitHub Actions

- uses: SfilD/repo-auditor@v0.9.2
  with:
    target: 'owner/repo'
    fail-below: '60'
    format: 'json'
    exclude: 'vendor/**,*.min.js'
    clone-depth: '1'

Action outputs: total, verdict, output-path.

External tools (optional but recommended)

The more you install, the more accurate the assessment.

| Tool | What it gives | Linux (apt/snap) | macOS (brew) | Cargo |
| --- | --- | --- | --- | --- |
| gh | Authenticated GitHub API access (higher rate limits) | sudo apt install gh | brew install gh | - |
| gitleaks | Leaked secret detection | sudo apt install gitleaks | brew install gitleaks | - |
| osv-scanner | CVE dependency scanning | binary from GitHub Releases | brew install osv-scanner | - |
| tokei | Accurate LOC count | sudo apt install tokei | brew install tokei | cargo install tokei |
| semgrep | Static analysis | pip install semgrep | brew install semgrep | - |

Authentication for private repos and higher API rate limits:

gh auth login                       # interactive via browser
# or
export GITHUB_TOKEN=ghp_xxxxxxxx    # personal access token

Adapter prerequisites

v0.8.0 adds optional adapters that normalize external-tool signals into the report. They do not affect the four-category score in this release.

| Adapter | What it needs | Install |
| --- | --- | --- |
| OpenSSF Scorecard | scorecard binary on PATH + GITHUB_AUTH_TOKEN env var | go install github.com/ossf/scorecard/v5/cmd/scorecard@latest |
| GitHub Community Profile | Existing GitHub client (no extra install) | - |

Token for Scorecard: any GitHub personal access token with public_repo scope (repo scope for private repositories). Export it as GITHUB_AUTH_TOKEN.

Both adapters are optional. If prerequisites are missing, the adapter returns UNAVAILABLE and the audit continues. Use --no-external-tools to disable all adapters.


MCP server

repo-auditor can run as an MCP (Model Context Protocol) server, exposing the audit pipeline to AI agents such as Claude Desktop, Claude Code, Cline, and Continue.

This is opt-in; the default CLI install has zero extra dependencies.

pip install 'sfild-repo-auditor[mcp]'

Launch the server:

repo-audit-mcp

Then wire it into your MCP client. See docs/mcp.md for full setup instructions, tool reference, and trust-model notes.


Usage

repo-audit <target> [flags]

Target:

  • owner/repo: short form
  • https://github.com/owner/repo: repository
  • https://github.com/owner: organization or user

Flags:

| Flag | Purpose |
| --- | --- |
| --output-root PATH | Storage root (default ./data) |
| --quick | clone_depth=50, skip osv-scanner and semgrep (2–3× faster) |
| --no-external-tools | Built-in heuristics only (if nothing is installed) |
| --format {card,json,sarif} | Output format (card by default) |
| --json-only | Alias for --format json (legacy, still works) |
| --language {ru,en} | Report language (ru by default) |
| --plugin PATH | External plugin executable (repeatable). Trusted: honoured even with --ignore-repo-config |
| --allow-repo-plugins | Allow repo-local executable plugins from .repo-auditor.toml. Disabled by default for security |
| --exclude PATTERN | Skip files/dirs matching glob (repeatable) |
| --clone-depth N | Git clone depth (overrides --quick) |
| --max-workers N | Max parallel collector workers (auto by default) |
| --fail-below N | CI gate: exit 4 if total < N (0..100) |
| --no-raw-tool-output | Don't save raw outputs to raw/ |
| --config-file PATH | Path to .repo-auditor.toml (overrides auto-discovery) |
| --ignore-repo-config | Ignore .repo-auditor.toml in the audited repository |
| --org-mode {primary,multiple,off} | Behavior for organization URLs |
| --org-limit N | How many repos in multiple mode (default 5) |
| --keep-last N | Keep N latest runs after audit |
| --gc --keep-last N | Standalone GC: apply retention to all slugs in index.json |
| --doctor | Environment diagnostics (tool versions) |
| --version | Version |

Exit codes:

  • 0: success
  • 1: clone/collection error
  • 2: invalid target / --gc without --keep-last
  • 3: organization URL with --org-mode off
  • 4: total below --fail-below (CI gate)
  • 5: no suitable repos in organization (all forks/archived)
  • 6: regression detected (--since)
  • 8: policy threshold breach (--fail-on); if both 4 and 8 trigger, 4 wins
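
In a CI script the exit codes above can be turned into readable messages. The wrapper below is a sketch: EXIT_MEANINGS, describe_exit and run_gate are hypothetical helper names, and only the documented python -m repo_auditor invocation and --fail-below flag are used.

```python
import subprocess
import sys

# Exit codes as listed in this README; any other value is reported as undocumented.
EXIT_MEANINGS = {
    0: "success",
    1: "clone/collection error",
    2: "invalid target, or --gc without --keep-last",
    3: "organization URL with --org-mode off",
    4: "total below --fail-below (CI gate)",
    5: "no suitable repos in organization",
    6: "regression detected (--since)",
    8: "policy threshold breach (--fail-on)",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_gate(target: str, threshold: int) -> int:
    """Run an audit with a CI gate and print what the exit code means."""
    cmd = [sys.executable, "-m", "repo_auditor", target, "--fail-below", str(threshold)]
    proc = subprocess.run(cmd)
    print(f"repo-auditor: {describe_exit(proc.returncode)}")
    return proc.returncode
```

A pipeline step could then end with sys.exit(run_gate("owner/repo", 60)) so the gate result propagates to the CI system.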

Repository configuration

repo-auditor looks for .repo-auditor.toml in the root of the cloned repository. It can contain scoring overrides and security suppression rules.

Security suppressions

You can suppress known-false-positive security findings with [[security.suppress]] tables:

[[security.suppress]]
tool = "gitleaks"
path_pattern = "tests/fixtures/*"
check_id_prefix = "generic-api-key"

Each rule matches when all specified fields match a finding. Fields act as wildcards when omitted:

  • tool: scanner name (gitleaks, osv-scanner, semgrep)
  • path_pattern: glob pattern against the file path
  • check_id_prefix: prefix of the check/rule identifier

Trust boundary

.repo-auditor.toml inside the audited repository is repo-local config. When you audit a third-party repository you do not control, that config is untrusted: the repository could hide findings via local suppressions or override scoring.

For high-trust audits of third-party repositories, use --ignore-repo-config to skip loading .repo-auditor.toml from the cloned repository. This disables all repo-local config, including:

  • scoring overrides;
  • [[security.suppress]] rules;
  • repo-local executable [[plugins]];
  • [[scoring.plugins]] rules.

Repo-local executable plugins execute code and remain disabled by default unless --allow-repo-plugins is provided. --ignore-repo-config still wins even when --allow-repo-plugins is passed.

Trusted suppressions passed programmatically via AuditConfig.security_suppressions are always honored and remain separate from repo-local suppressions. Trusted CLI plugins from --plugin PATH are also always honored.
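
The precedence between these flags can be summarised in a few lines. The function below is a hypothetical sketch of the rules stated above, not code from repo-auditor: CLI plugins are always kept, and repo-local plugins are kept only when they are explicitly allowed and repo config is not ignored.

```python
def effective_plugins(
    cli_plugins: list[str],
    repo_plugins: list[str],
    *,
    allow_repo_plugins: bool = False,
    ignore_repo_config: bool = False,
) -> list[str]:
    """Return the plugins that would run under the trust rules described above."""
    selected = list(cli_plugins)  # --plugin PATH entries are trusted and always honored
    if allow_repo_plugins and not ignore_repo_config:
        # Repo-local plugins need an explicit opt-in, and --ignore-repo-config still wins.
        selected.extend(repo_plugins)
    return selected


print(effective_plugins(["./mine.sh"], ["./theirs.sh"], allow_repo_plugins=True, ignore_repo_config=True))
```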

Plugin scoring documentation is in docs/plugins.md.


Where to find results

After a run, the report lives here:

data/github.com/<owner>__<repo>/latest/report.ru.md

Handy commands:

# Latest report
cat data/github.com/pallets__flask/latest/report.ru.md

# Just the JSON score section (for scripts)
jq '.score' data/github.com/pallets__flask/latest/report.json

# List all audited repos
jq '.entries[] | "\(.slug)\t\(.total)/100\t\(.verdict)"' data/index.json

# Open in editor
$EDITOR data/github.com/pallets__flask/latest/report.ru.md
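
The jq commands above have a straightforward standard-library equivalent. In this sketch load_score and list_audited are hypothetical helper names; the JSON keys (score, entries, slug, total, verdict) are the ones the jq expressions above rely on.

```python
import json
from pathlib import Path


def load_score(root: Path, slug: str) -> dict:
    """Return the score section of the latest report for one repository slug."""
    report_path = root / "github.com" / slug / "latest" / "report.json"
    report = json.loads(report_path.read_text(encoding="utf-8"))
    return report["score"]


def list_audited(root: Path) -> list[str]:
    """Return one tab-separated summary line per entry in index.json."""
    index = json.loads((root / "index.json").read_text(encoding="utf-8"))
    return [f"{e['slug']}\t{e['total']}/100\t{e['verdict']}" for e in index["entries"]]
```

For example, load_score(Path("data"), "pallets__flask")["total"] reads the same value as jq '.score.total'.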

Policy scoring (v0.9.0+)

v0.9.0 introduces optional policy profiles that map adapter evidence (OpenSSF Scorecard, GitHub Community Profile) into bounded per-category contributions. The default profile preserves backward compatibility; use --profile to opt in.

See docs/scoring.md for profiles, trust matrix, CI gating with --fail-on, and migration notes.


Scoring explained

Categories (each 0..10)

| Category | What counts |
| --- | --- |
| Purpose | README, Installation/Usage sections, ARCHITECTURE, CONTRIBUTING |
| Security | High/medium findings from gitleaks + osv-scanner, SECURITY.md, committed .env |
| Code Quality | CI, tests, linter, type checking, TODO density |
| Maturity | License, releases, contributors, stars, activity ≤ 90 days |

Total = sum(categories) × 100 / 40. Red flag = any category ≤ 3.

Verdicts

| Condition | Verdict | Meaning |
| --- | --- | --- |
| Any category ≤ 3 and total < 40 | not-recommended | Critical gap and overall weak |
| Any category ≤ 3 and total ≥ 40 | caution | Mostly OK, but one block is weak |
| No red flags, total ≥ 80 | reliable | Strong across all categories |
| No red flags, total ≥ 60 | working | Viable, no obvious gaps |
| No red flags, total ≥ 40 | caution | Medium level, needs attention |
| No red flags, total < 40 | not-recommended | Weak overall |
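
The total formula and the verdict table translate directly into code. This is a sketch rather than the project's scoring module: total_score and verdict are hypothetical names, and rounding the total to an integer is an assumption (the README does not say how fractional totals are handled).

```python
def total_score(categories: dict[str, int]) -> int:
    """Total = sum(categories) * 100 / 40; rounding is an assumption of this sketch."""
    return round(sum(categories.values()) * 100 / 40)


def verdict(categories: dict[str, int]) -> str:
    """Apply the verdict table: a red flag is any category scoring 3 or lower."""
    total = total_score(categories)
    red_flag = any(score <= 3 for score in categories.values())
    if red_flag:
        return "not-recommended" if total < 40 else "caution"
    if total >= 80:
        return "reliable"
    if total >= 60:
        return "working"
    if total >= 40:
        return "caution"
    return "not-recommended"


# Values from the vercel/turbo sample card.
sample = {"purpose": 10, "security": 6, "quality": 10, "maturity": 10}
print(total_score(sample), verdict(sample))
```

With the sample card's values the sum is 36, which gives 90/100 and the verdict reliable, matching the card.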

Storage layout

data/
  github.com/
    <owner>__<repo>/
      repo/                       ← git clone, reused via git fetch
      runs/
        YYYY-MM-DD_HHMMSS/        ← UTC timestamp
          report.ru.md
          report.json
          card.txt
          raw/
            gitleaks.json
            osv-scanner.json
            tokei.json
      latest -> runs/<ts>
    <org>__org/
      runs/...
  index.json                      ← aggregate entry per slug
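
The layout above implies a simple mapping from owner, repo and run time to a directory. The helpers below are a hypothetical sketch of that mapping (repo_slug and run_dir are not documented names); the slug and timestamp formats are taken from the tree above.

```python
from datetime import datetime, timezone
from pathlib import Path


def repo_slug(owner: str, repo: str) -> str:
    """Slug format from the layout above: <owner>__<repo>."""
    return f"{owner}__{repo}"


def run_dir(root: Path, owner: str, repo: str, when: datetime) -> Path:
    """Directory of one run, named by its UTC timestamp (YYYY-MM-DD_HHMMSS)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d_%H%M%S")
    return root / "github.com" / repo_slug(owner, repo) / "runs" / stamp


when = datetime(2026, 4, 25, 7, 18, 49, tzinfo=timezone.utc)
print((run_dir(Path("data"), "vercel", "turbo", when) / "report.ru.md").as_posix())
```

The printed path is the one shown at the bottom of the sample card.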

Retention

runs/ grows linearly: each run writes a new directory named by its UTC timestamp. Cleanup:

  • --keep-last N: after a successful audit, keep N latest runs, delete older ones.
  • --gc --keep-last N: standalone pass over index.json that applies retention to all slugs at once (without a new audit).

Invariant: the target of the latest symlink is never deleted.

# Clean entire storage, keeping 3 latest runs per repo
repo-audit --gc --keep-last 3

Examples

# Basic audit
python3 -m repo_auditor https://github.com/pallets/flask

# Fast run without heavy external tools
python3 -m repo_auditor pallets/flask --quick

# Built-in heuristics only (no gitleaks/osv/tokei)
python3 -m repo_auditor pallets/flask --no-external-tools

# Primary repo of an organization
python3 -m repo_auditor https://github.com/amnezia-vpn

# Top-3 by stars in an organization
python3 -m repo_auditor https://github.com/amnezia-vpn \
    --org-mode multiple --org-limit 3

# JSON to stdout for pipelines
python3 -m repo_auditor pallets/flask --format json | jq '.score.total'

# SARIF for GitHub Advanced Security
python3 -m repo_auditor pallets/flask --format sarif --output-root ./sarif

# Exclude vendor and minified assets
python3 -m repo_auditor pallets/flask --exclude 'vendor/**' --exclude '*.min.js'

# CI gate: fail if total < 60
python3 -m repo_auditor pallets/flask --fail-below 60 || echo "Audit failed"

# With retention: keep only the 5 latest runs
python3 -m repo_auditor pallets/flask --keep-last 5

Auditing large repositories

Repositories like torvalds/linux, chromium/chromium, or semgrep/semgrep can take a long time to clone and consume significant disk space.

| Approach | Command | When to use |
| --- | --- | --- |
| Fast audit (recommended) | --quick | Shallow clone (depth=50) and skip osv-scanner + semgrep (2–3× faster) |
| Shallow clone | --clone-depth 1 | CI pipelines where you only need the latest snapshot |
| No clone | --no-external-tools | Skip clone entirely; relies on GitHub API only (poorer card, but instant) |

# Example: audit a very large repo without downloading full history
python3 -m repo_auditor torvalds/linux --quick

# CI pipeline: minimal clone, fail gate
python3 -m repo_auditor owner/repo --clone-depth 1 --fail-below 60

Tip: semgrep/semgrep clones at ~170 MB. --quick brings it down to ~50 MB and cuts scan time by half because semgrep and osv-scanner are skipped.


Troubleshooting

gh: command not found or 403 rate-limit
Unauthenticated GitHub API access is limited to 60 requests/hour. Install gh and run gh auth login, or export GITHUB_TOKEN.

fatal: could not read Username for 'https://github.com' (private repo)
Authentication is required. Run gh auth login, or use git config --global credential.helper store together with GITHUB_TOKEN.

Score seems low for model-card / dataset repos
The rubric is geared toward software projects. Repos without tests/, CI, or pyproject.toml will score low on quality. This is by design.

tool unavailable for osv-scanner/tokei/gitleaks in JSON
The scanner is not installed, so the pipeline falls back to built-in heuristics. For full accuracy install the tools (see table above), or run with --no-external-tools to suppress warnings.

Clone hangs / consumes a lot of disk
Use --quick (clone_depth=50). Large repos (100+ MB) can be audited without cloning via --no-external-tools, using the GitHub API only, but the card will be poorer (no file-walk findings).

Card breaks on CJK/emoji in owner/name
Known limitation: alignment is character-based, not cell-width. Use --format json for machine-readable output without the visual card.

Want just the header without the full report
Use --format json | jq '.score', or cat data/.../latest/card.txt.


Limitations

  • Does not replace manual security review or CodeQL.
  • Heuristic mode (without gitleaks) may produce rare false positives, e.g. on 12-character PowerShell cmdlets ('Get-Location' after pwd:).
  • Without tokei, todo_density is approximated via source LOC from code_analysis, with lower accuracy.
  • Monorepo with nested manifests: each manifest is counted separately, no aggregation across "total" dependencies.
  • HTML fallback for organization listing is not implemented โ€” requires working gh or GitHub REST API.

See also

  • README.ru.md: Russian version of this README.
  • GETTING_STARTED.ru.md: step-by-step guide for non-programmers (Russian).
  • AUDIT_GUIDE.md: for external auditors / AI tools before review (load-bearing constraints, hot zones, regression baseline).
  • ARCHITECTURE.md: 30-second overview of layers and data flow.
  • ROADMAP.md: public direction of development.
  • SECURITY.md: disclosure policy.
  • CHANGELOG.md: release history.
  • CONTRIBUTING.md: dev cycle, conventional commits, stdlib-only rule.
  • CLAUDE.md: architecture, data flow, scoring rules, false-positive filters (for contributors and Claude Code).
  • LICENSE: MIT.
