
Deep GitHub repository audit CLI with context, documentation, and security heuristics

Project description

🇺🇸 English | 🇷🇺 Russian

repo-auditor

Objective GitHub repository audit: a compact terminal card and a full Markdown report. Four balanced categories (purpose · security · quality · maturity), a transparent methodology, and optional integration with gitleaks, osv-scanner, tokei, and semgrep.

Pure Python 3.12+, zero third-party dependencies. Networking via urllib with gh/curl fallback; TOML via tomllib.

👋 Not a programmer but want to check a project? Open GETTING_STARTED.ru.md (Russian): a step-by-step guide in plain language, from installing Python to reading your first report.


What you'll see

After running, a card is printed to the terminal:

┌─ vercel/turbo ──────────────────────────────── Rust · MIT · ★30258
│ Purpose          ██████████  10/10
│ Security         ██████░░░░  6/10
│ Code Quality     ██████████  10/10
│ Maturity         ██████████  10/10
├─ TOTAL: 90/100 · ~ reliable
│ Key: 0 high + 5 medium security findings (heuristic)
│ Suggestion: review .devcontainer/Dockerfile, .github/workflows/turborepo-library-release.yml
└─ full report: data/github.com/vercel__turbo/runs/2026-04-25_071849/report.ru.md

At the same time, data/.../latest/ receives:

  • report.ru.md: full Markdown report (8 sections, methodology, delta against the previous run);
  • report.json: versioned JSON (with schema_version) for automation;
  • card.txt: copy of the terminal card;
  • raw/: raw tool outputs for manual drill-down.

Note: Reports are generated in Russian by default. Use --language en for English output.


Quick start

# One-off audit from a source checkout (no pip install needed)
python3 -m repo_auditor https://github.com/pallets/flask

# View the full report
cat data/github.com/pallets__flask/latest/report.ru.md

That's enough to get your first result. External scanners (gitleaks, osv-scanner, tokei, semgrep) are optional: the built-in heuristics work without them.


Installation

Base requirements: Python 3.12+ and git.

# Install from PyPI (recommended)
pip install sfild-repo-auditor

# Verify
repo-audit --version

Or install from source:

git clone https://github.com/SfilD/repo-auditor.git
cd repo-auditor
pip install -e .

Also works without pip install: from the repository root, invoke python3 -m repo_auditor.

Docker (alternative to local install)

# Build image
docker build -t repo-auditor .

# Run an audit (quote the volume path so directories with spaces work)
docker run --rm -v "$(pwd)/data:/data" repo-auditor \
  https://github.com/pallets/flask --output-root /data

The image is based on python:3.13-slim (Debian 13 "trixie") and includes all external tools (gitleaks, osv-scanner, tokei, semgrep, gh). It is single-stage by design, prioritising simplicity and fast rebuilds for audit environments over minimal runtime size.

GitHub Actions

- uses: SfilD/repo-auditor@v0.5.1
  with:
    target: 'owner/repo'
    fail-below: '60'
    format: 'json'
    exclude: 'vendor/**,*.min.js'
    clone-depth: '1'

Action outputs: total, verdict, output-path.

External tools (optional but recommended)

The more you install, the more accurate the assessment.

| Tool | What it gives | Linux (apt/snap) | macOS (brew) | Cargo |
|---|---|---|---|---|
| gh | Authenticated GitHub API access (much higher rate limits) | sudo apt install gh | brew install gh | - |
| gitleaks | Leaked secret detection | sudo apt install gitleaks | brew install gitleaks | - |
| osv-scanner | CVE scanning of dependencies | binary from GitHub Releases | brew install osv-scanner | - |
| tokei | Accurate LOC count | sudo apt install tokei | brew install tokei | cargo install tokei |
| semgrep | Static analysis | pip install semgrep | brew install semgrep | - |
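To see which of these tools a run will be able to use, a quick PATH check is enough. The snippet below is a minimal sketch, not part of repo-auditor (the project's own check is repo-audit --doctor, which also reports versions). It assumes each tool's executable has the name shown in the table and only tests for presence.

```python
import shutil

# Executable names assumed from the table above; adjust if your install differs.
OPTIONAL_TOOLS = ("gh", "gitleaks", "osv-scanner", "tokei", "semgrep")


def detect_tools(tools=OPTIONAL_TOOLS):
    """Hypothetical helper: map each tool name to its path on PATH, or None."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in detect_tools().items():
        print(f"{name:12} {path or 'not found'}")
```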

Authentication (required for private repos; also raises the GitHub API rate limit):

gh auth login                       # interactive via browser
# or
export GITHUB_TOKEN=ghp_xxxxxxxx    # personal access token
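repo-auditor's networking is built on urllib. Under that assumption, the sketch below shows how a GITHUB_TOKEN from the environment typically ends up on a GitHub REST API request. The helper name github_request is hypothetical and this is not the project's own client code; the request is built but not sent.

```python
import os
import urllib.request

API_ROOT = "https://api.github.com"


def github_request(path, token=None):
    """Hypothetical helper: build (but do not send) a GitHub REST API request.

    Falls back to the GITHUB_TOKEN environment variable when no token is given.
    Without any token, GitHub allows only 60 unauthenticated requests per hour.
    """
    if token is None:
        token = os.environ.get("GITHUB_TOKEN")
    req = urllib.request.Request(API_ROOT + path)
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

Passing the result to urllib.request.urlopen(github_request("/rate_limit")) queries GitHub's rate-limit endpoint, a quick way to confirm that the token is being picked up.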

Usage

repo-audit <target> [flags]

Target:

  • owner/repo: short form
  • https://github.com/owner/repo: repository
  • https://github.com/owner: organization or user
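The three target forms can be told apart with the standard library alone. This is a hypothetical classifier written against the list above, not repo-auditor's actual parser; it rejects deeper paths such as /tree/main, which the real CLI may handle differently.

```python
from urllib.parse import urlparse


def parse_target(target):
    """Hypothetical helper: classify a repo-audit target string.

    Returns ("repo", owner, repo) or ("org", owner, None); raises ValueError otherwise.
    """
    if target.startswith(("https://", "http://")):
        parsed = urlparse(target)
        if parsed.netloc.lower() != "github.com":
            raise ValueError(f"not a github.com URL: {target}")
        parts = [p for p in parsed.path.split("/") if p]
    else:
        # Short form must be exactly owner/repo.
        parts = [p for p in target.split("/") if p]
        if len(parts) != 2:
            raise ValueError(f"short form must be owner/repo: {target}")
    if len(parts) == 1:
        return ("org", parts[0], None)
    if len(parts) == 2:
        return ("repo", parts[0], parts[1].removesuffix(".git"))
    raise ValueError(f"unsupported target: {target}")
```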

Flags:

| Flag | Purpose |
|---|---|
| --output-root PATH | Storage root (default ./data) |
| --quick | clone_depth=50, skip osv-scanner and semgrep (2–3× faster) |
| --no-external-tools | Built-in heuristics only (if nothing is installed) |
| --format {card,json,sarif} | Output format (card by default) |
| --json-only | Alias for --format json (legacy, still works) |
| --language {ru,en} | Report language (ru by default) |
| --exclude PATTERN | Skip files/dirs matching a glob (repeatable) |
| --clone-depth N | Git clone depth (overrides --quick) |
| --max-workers N | Max parallel collector workers (auto by default) |
| --fail-below N | CI gate: exit with code 4 if total < N (0..100) |
| --no-raw-tool-output | Don't save raw outputs to raw/ |
| --config-file PATH | Path to .repo-auditor.toml (overrides auto-discovery) |
| --ignore-repo-config | Ignore .repo-auditor.toml in the audited repository |
| --org-mode {primary,multiple,off} | Behavior for organization URLs |
| --org-limit N | How many repos to audit in multiple mode (default 5) |
| --keep-last N | Keep the N latest runs after the audit |
| --gc --keep-last N | Standalone GC: apply retention to all slugs in index.json |
| --doctor | Environment diagnostics (tool versions) |
| --version | Show version |

Exit codes:

  • 0: success
  • 1: clone/collection error
  • 2: invalid target / --gc without --keep-last
  • 3: organization URL with --org-mode off
  • 4: total below --fail-below (CI gate)
  • 5: no suitable repos in organization (all forks/archived)
  • 6: regression detected (--since)
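When repo-audit is driven from a Python script rather than a shell, the exit codes above can be mapped back to their meanings. A minimal sketch, assuming repo-audit is on PATH; run_audit and describe_exit are hypothetical helper names.

```python
import subprocess

# Exit codes as documented in the list above.
EXIT_CODES = {
    0: "success",
    1: "clone/collection error",
    2: "invalid target / --gc without --keep-last",
    3: "organization URL with --org-mode off",
    4: "total below --fail-below (CI gate)",
    5: "no suitable repos in organization (all forks/archived)",
    6: "regression detected (--since)",
}


def describe_exit(code):
    """Hypothetical helper: human-readable meaning of a repo-audit exit code."""
    return EXIT_CODES.get(code, f"unknown exit code {code}")


def run_audit(target, *extra_args):
    """Hypothetical helper: run repo-audit and return (exit_code, description)."""
    proc = subprocess.run(["repo-audit", target, *extra_args])
    return proc.returncode, describe_exit(proc.returncode)
```

For example, run_audit("pallets/flask", "--fail-below", "60") returns 4 with its description when the gate fails, which lets a pipeline tell a low score apart from an infrastructure error such as a failed clone (exit code 1).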

Repository configuration

repo-auditor looks for .repo-auditor.toml in the root of the cloned repository. It can contain scoring overrides and security suppression rules.

Security suppressions

You can suppress known-false-positive security findings with [[security.suppress]] tables:

[[security.suppress]]
tool = "gitleaks"
path_pattern = "tests/fixtures/*"
check_id_prefix = "generic-api-key"

Each rule matches when all specified fields match a finding. Fields act as wildcards when omitted:

  • tool: scanner name (gitleaks, osv-scanner, semgrep)
  • path_pattern: glob pattern matched against the file path
  • check_id_prefix: prefix of the check/rule identifier

Trust boundary

.repo-auditor.toml inside the audited repository is repo-local config. When you audit a third-party repository you do not control, that config is untrusted: the repository could hide findings via local suppressions.

For high-trust audits of third-party repositories, use --ignore-repo-config to skip loading .repo-auditor.toml from the cloned repository. Scoring overrides and local [[security.suppress]] rules will both be ignored.

Trusted suppressions passed programmatically via AuditConfig.security_suppressions are always honored and remain separate from repo-local suppressions.


Where to find results

After a run, the report lives here:

data/github.com/<owner>__<repo>/latest/report.ru.md

Handy commands:

# Latest report
cat data/github.com/pallets__flask/latest/report.ru.md

# Just the JSON score section (for scripts)
jq '.score' data/github.com/pallets__flask/latest/report.json

# List all audited repos
jq -r '.entries[] | "\(.slug)\t\(.total)/100\t\(.verdict)"' data/index.json

# Open in editor
$EDITOR data/github.com/pallets__flask/latest/report.ru.md
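For scripts that cannot rely on jq, the index listing can be reproduced in Python. This sketch reads only the fields the jq one-liner uses (entries[].slug, .total, .verdict); nothing else about the index.json schema is assumed. Unlike the jq command, it sorts rows by total, highest first.

```python
import json
from pathlib import Path


def load_index(index_path="data/index.json"):
    """Hypothetical helper: read the aggregate index written by repo-audit."""
    return json.loads(Path(index_path).read_text(encoding="utf-8"))


def summarize_index(index):
    """Hypothetical helper: (slug, total, verdict) rows, best score first.

    Only the fields used by the jq one-liner are read.
    """
    rows = [(e["slug"], e["total"], e["verdict"]) for e in index.get("entries", [])]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Typical use: for slug, total, verdict in summarize_index(load_index()): print(f"{slug}\t{total}/100\t{verdict}").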

Scoring explained

Categories (each 0..10)

| Category | What counts |
|---|---|
| Purpose | README, Installation/Usage sections, ARCHITECTURE, CONTRIBUTING |
| Security | High/medium findings from gitleaks + osv-scanner, SECURITY.md, committed .env |
| Code Quality | CI, tests, linter, type checking, TODO density |
| Maturity | License, releases, contributors, stars, activity within the last 90 days |

Total = sum(categories) × 100 / 40. Red flag = any category ≤ 3.
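As a worked example, the vercel/turbo card shown earlier has 10 + 6 + 10 + 10 = 36 points, and 36 × 100 / 40 = 90. A minimal sketch of the formula and the red-flag check follows; how repo-auditor rounds non-integer totals is not documented here, so the sketch simply returns a float. The function names are hypothetical.

```python
def total_score(purpose, security, quality, maturity):
    """Hypothetical helper: scale four 0..10 category scores to a 0..100 total."""
    return (purpose + security + quality + maturity) * 100 / 40


def red_flags(scores):
    """Hypothetical helper: names of categories scoring 3 or lower."""
    return [name for name, value in scores.items() if value <= 3]
```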

Verdicts

| Condition | Verdict | Meaning |
|---|---|---|
| Any category ≤ 3 and total < 40 | not-recommended | Critical gap and weak overall |
| Any category ≤ 3 and total ≥ 40 | caution | Mostly OK, but one block is weak |
| No red flags, total ≥ 80 | reliable | Strong across all categories |
| No red flags, 60 ≤ total < 80 | working | Viable, no obvious gaps |
| No red flags, 40 ≤ total < 60 | caution | Medium level, needs attention |
| No red flags, total < 40 | not-recommended | Weak overall |
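Read top to bottom, the verdict table amounts to checking for red flags first and then applying the score thresholds. A sketch of that reading, assuming category scores in 0..10 and the total formula given above; verdict is a hypothetical name and this is an illustration, not the project's scoring code.

```python
def verdict(scores):
    """Hypothetical helper: map four category scores (0..10) to a verdict."""
    total = sum(scores) * 100 / 40
    has_red_flag = any(s <= 3 for s in scores)
    # Red-flag rows of the table take precedence over the plain thresholds.
    if has_red_flag:
        return "not-recommended" if total < 40 else "caution"
    if total >= 80:
        return "reliable"
    if total >= 60:
        return "working"
    if total >= 40:
        return "caution"
    return "not-recommended"
```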

Storage layout

data/
  github.com/
    <owner>__<repo>/
      repo/                       ← git clone, reused via git fetch
      runs/
        YYYY-MM-DD_HHMMSS/        ← UTC timestamp
          report.ru.md
          report.json
          card.txt
          raw/
            gitleaks.json
            osv-scanner.json
            tokei.json
      latest -> runs/<ts>
    <org>__org/
      runs/...
  index.json                      ← aggregate entry per slug
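The run directory name is a UTC timestamp in the same format as the report path on the card (for example 2026-04-25_071849). A small sketch of how such a path can be built with pathlib and datetime; run_dir is a hypothetical helper, and the sketch neither creates directories nor updates the latest symlink.

```python
from datetime import datetime, timezone
from pathlib import Path


def run_dir(output_root, owner, repo, when=None):
    """Hypothetical helper: <root>/github.com/<owner>__<repo>/runs/<UTC timestamp>."""
    if when is None:
        when = datetime.now(timezone.utc)
    # Normalise to UTC; naive datetimes are interpreted as local time by astimezone().
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d_%H%M%S")
    return Path(output_root) / "github.com" / f"{owner}__{repo}" / "runs" / stamp
```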

Retention

The runs/ directory grows linearly: each run writes a new UTC-timestamped directory. Cleanup options:

  • --keep-last N: after a successful audit, keep the N latest runs and delete older ones.
  • --gc --keep-last N: a standalone pass over index.json that applies retention to all slugs at once (without a new audit).

Invariant: the target of the latest symlink is never deleted.

# Clean entire storage, keeping 3 latest runs per repo
repo-audit --gc --keep-last 3
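Because run names are UTC timestamps in YYYY-MM-DD_HHMMSS form, sorting them as strings also sorts them chronologically, which makes the retention rule easy to express. A minimal sketch of the --keep-last semantics including the latest invariant; runs_to_delete is hypothetical, it only computes names and deletes nothing, and treating keep_last=0 as "keep only the latest target" is an assumption of this sketch.

```python
def runs_to_delete(run_names, keep_last, latest_target):
    """Hypothetical helper: run directory names to delete, oldest first.

    The run that `latest` points at is never returned, even if it falls
    outside the keep window.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be >= 0")
    ordered = sorted(run_names)  # timestamp names sort chronologically
    start = max(len(ordered) - keep_last, 0)
    keep = set(ordered[start:])
    keep.add(latest_target)  # invariant: never delete the latest target
    return [name for name in ordered if name not in keep]
```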

Examples

# Basic audit
python3 -m repo_auditor https://github.com/pallets/flask

# Fast run without heavy external tools
python3 -m repo_auditor pallets/flask --quick

# Built-in heuristics only (no gitleaks/osv/tokei)
python3 -m repo_auditor pallets/flask --no-external-tools

# Primary repo of an organization
python3 -m repo_auditor https://github.com/amnezia-vpn

# Top-3 by stars in an organization
python3 -m repo_auditor https://github.com/amnezia-vpn \
    --org-mode multiple --org-limit 3

# JSON to stdout for pipelines
python3 -m repo_auditor pallets/flask --format json | jq '.score.total'

# SARIF for GitHub Advanced Security
python3 -m repo_auditor pallets/flask --format sarif --output-root ./sarif

# Exclude vendor and minified assets
python3 -m repo_auditor pallets/flask --exclude 'vendor/**' --exclude '*.min.js'

# CI gate: fail if total < 60
python3 -m repo_auditor pallets/flask --fail-below 60 || echo "Audit failed"

# With retention โ€” keep only 5 latest runs
python3 -m repo_auditor pallets/flask --keep-last 5

Auditing large repositories

Repositories like torvalds/linux, chromium/chromium, or semgrep/semgrep can take a long time to clone and consume significant disk space.

| Approach | Command | When to use |
|---|---|---|
| Fast audit (recommended) | --quick | Shallow clone (depth=50), skips osv-scanner and semgrep (2–3× faster) |
| Shallow clone | --clone-depth 1 | CI pipelines where you only need the latest snapshot |
| No clone | --no-external-tools | Skips the clone entirely and relies on the GitHub API only (poorer card, but instant) |

# Example: audit a very large repo without downloading full history
python3 -m repo_auditor torvalds/linux --quick

# CI pipeline: minimal clone, fail gate
python3 -m repo_auditor owner/repo --clone-depth 1 --fail-below 60

Tip: semgrep/semgrep clones at ~170 MB. --quick brings it down to ~50 MB and cuts scan time by half because semgrep and osv-scanner are skipped.


Troubleshooting

gh: command not found or 403 rate-limit
Without authentication, the GitHub API allows only 60 requests per hour. Install gh and run gh auth login, or export GITHUB_TOKEN.

fatal: could not read Username for 'https://github.com' (private repo)
Authentication is required. Run gh auth login, or use git config --global credential.helper store together with GITHUB_TOKEN.

Score seems low for model-card / dataset repos
The rubric is geared toward software projects. Repos without tests/, CI, or pyproject.toml will score low on quality. This is by design.

"tool unavailable" for osv-scanner/tokei/gitleaks in JSON
The scanner is not installed, so the pipeline falls back to built-in heuristics. For full accuracy, install the tools (see the table above), or run with --no-external-tools to suppress the warnings.

Clone hangs / consumes a lot of disk
Use --quick (clone_depth=50). Large repos (100+ MB) can be audited without cloning via --no-external-tools (GitHub API only), but the card will be poorer (no file-walk findings).

Card breaks on CJK/emoji in owner/name
Known limitation: alignment is character-based, not cell-width-based. Use --format json for machine-readable output without the visual card.

Want just the header without the full report
Use --format json | jq '.score', or cat data/.../latest/card.txt.


Limitations

  • Does not replace manual security review or CodeQL.
  • Heuristic mode (without gitleaks) may produce rare false positives, e.g. on 12-character PowerShell cmdlets ('Get-Location' after pwd:).
  • Without tokei, todo_density is approximated from the source LOC reported by code_analysis, with lower accuracy.
  • Monorepos with nested manifests: each manifest is counted separately; dependencies are not aggregated into a single total.
  • HTML fallback for organization listing is not implemented: a working gh or access to the GitHub REST API is required.

See also

  • README.ru.md: Russian version of this README.
  • GETTING_STARTED.ru.md: step-by-step guide for non-programmers (Russian).
  • AUDIT_GUIDE.md: for external auditors / AI tools before review (load-bearing constraints, hot zones, regression baseline).
  • ARCHITECTURE.md: 30-second overview of layers and data flow.
  • ROADMAP.md: public direction of development.
  • SECURITY.md: disclosure policy.
  • CHANGELOG.md: release history.
  • CONTRIBUTING.md: dev cycle, conventional commits, stdlib-only rule.
  • CLAUDE.md: architecture, data flow, scoring rules, false-positive filters (for contributors and Claude Code).
  • LICENSE: MIT.



Download files


Source Distribution

sfild_repo_auditor-0.5.2.tar.gz (192.9 kB)

Uploaded Source

Built Distribution


sfild_repo_auditor-0.5.2-py3-none-any.whl (100.3 kB)

Uploaded Python 3

File details

Details for the file sfild_repo_auditor-0.5.2.tar.gz.

File metadata

  • Download URL: sfild_repo_auditor-0.5.2.tar.gz
  • Upload date:
  • Size: 192.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for sfild_repo_auditor-0.5.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07c415ee9e3c64ca4e6f9298345b80c6ff2685c0ba278a706ff5858c7aae017e |
| MD5 | 301c13e3da74b7e9a3c7a4eee6d35792 |
| BLAKE2b-256 | 05ab8849e3134ace95f4a3ddd68526802e69f4f7fe0e72d49a5c9488a6f903f6 |


Provenance

The following attestation bundles were made for sfild_repo_auditor-0.5.2.tar.gz:

Publisher: pypi-publish.yml on SfilD/repo-auditor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sfild_repo_auditor-0.5.2-py3-none-any.whl.

File metadata

File hashes

Hashes for sfild_repo_auditor-0.5.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e442e75ab780fd8fcf82732efce0f29ccb6853440c0af0793347adbeafa8f79b |
| MD5 | 94fe7a2b12c55e031a15faf917312d6d |
| BLAKE2b-256 | 85d4fb89697ded8e1c06219191e9702b8bf3901cb06599b2c273e00fbc5a8f0a |


Provenance

The following attestation bundles were made for sfild_repo_auditor-0.5.2-py3-none-any.whl:

Publisher: pypi-publish.yml on SfilD/repo-auditor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
