Deep GitHub repository audit CLI with context, documentation, and security heuristics

Maintainers

Motoroller

Project description

🇺🇸 English | 🇷🇺 Русский

repo-auditor

Objective GitHub repository audit: a compact terminal card and a full Markdown report. Four balanced categories (purpose · security · quality · maturity), transparent methodology, optionally powered by gitleaks, osv-scanner, tokei, semgrep.

Pure Python 3.12+, zero third-party dependencies. Networking via urllib with gh/curl fallback; TOML via tomllib.

👋 Not a programmer but want to check a project? Open GETTING_STARTED.ru.md (Russian) — a step-by-step guide in plain language, from installing Python to reading your first report.

What you'll see

After running, a card is printed to the terminal:

┌─ vercel/turbo ──────────────────────────────── Rust · MIT · ★30258
│ Purpose          ██████████  10/10
│ Security         ██████░░░░  6/10
│ Code Quality     ██████████  10/10
│ Maturity         ██████████  10/10
├─ TOTAL: 90/100 · ~ reliable
│ Key: 0 high + 5 medium security findings (heuristic)
│ Suggestion: review .devcontainer/Dockerfile, .github/workflows/turborepo-library-release.yml
└─ full report: data/github.com/vercel__turbo/runs/2026-04-25_071849/report.ru.md

At the same time, data/.../latest/ receives:

report.ru.md — full Markdown report (8 sections, methodology, delta with previous run);
report.json — versioned JSON (schema_version) for automation;
card.txt — copy of the terminal card;
raw/ — raw tool outputs for manual drill-down.

Note: Reports are generated in Russian by default. Use --language en for English output.

Quick start

# One-off audit — no installation needed
python3 -m repo_auditor https://github.com/pallets/flask

# View the full report
cat data/github.com/pallets__flask/latest/report.ru.md

That's enough to get your first result. External scanners (gitleaks, osv-scanner, tokei, semgrep) are optional — built-in heuristics work without them.

Installation

Base requirements: Python 3.12+ and git.

# Install from PyPI (recommended)
pip install sfild-repo-auditor

# Verify
repo-audit --version

Or install from source:

git clone https://github.com/SfilD/repo-auditor.git
cd repo-auditor
pip install -e .

Also works without pip install — invoke via python3 -m repo_auditor.

Docker (alternative to local install)

# Build image
docker build -t repo-auditor .

# Run an audit (the volume path is quoted so directories with spaces work)
docker run --rm -v "$(pwd)/data:/data" repo-auditor \
  https://github.com/pallets/flask --output-root /data

The image is based on python:3.13-slim (Debian 13 trixie-slim) and includes all external tools (gitleaks, osv-scanner, tokei, semgrep, gh). It is single-stage by design, prioritising simplicity and fast rebuilds for audit environments over minimal runtime size.

GitHub Actions

- uses: SfilD/repo-auditor@v0.5.1
  with:
    target: 'owner/repo'
    fail-below: '60'
    format: 'json'
    exclude: 'vendor/**,*.min.js'
    clone-depth: '1'

Action outputs: total, verdict, output-path.

External tools (optional but recommended)

The more you install, the more accurate the assessment.

Tool	What it gives	Linux (apt/snap)	macOS (brew)	Cargo
`gh`	Authenticated GitHub API access (higher rate limits)	`sudo apt install gh`	`brew install gh`	—
`gitleaks`	Leaked secret detection	`sudo apt install gitleaks`	`brew install gitleaks`	—
`osv-scanner`	CVE dependency scanning	— (binary from GitHub Releases)	`brew install osv-scanner`	—
`tokei`	Accurate LOC count	`sudo apt install tokei`	`brew install tokei`	`cargo install tokei`
`semgrep`	Static analysis	`pip install semgrep`	`brew install semgrep`	—

Authentication for private repos and higher API rate limits:

gh auth login                       # interactive via browser
# or
export GITHUB_TOKEN=ghp_xxxxxxxx    # personal access token

Usage

repo-audit <target> [flags]

Target:

owner/repo — short form
https://github.com/owner/repo — repository
https://github.com/owner — organization or user

Flags:

Flag	Purpose
`--output-root PATH`	Storage root (default `./data`)
`--quick`	`clone_depth=50`, skip `osv-scanner` and `semgrep` (2–3× faster)
`--no-external-tools`	Built-in heuristics only (if nothing is installed)
`--format {card,json,sarif}`	Output format (`card` by default)
`--json-only`	Alias for `--format json` (legacy, still works)
`--language {ru,en}`	Report language (`ru` by default)
`--exclude PATTERN`	Skip files/dirs matching glob (repeatable)
`--clone-depth N`	Git clone depth (overrides `--quick`)
`--max-workers N`	Max parallel collector workers (`auto` by default)
`--fail-below N`	CI gate: exit 4 if total < N (0..100)
`--no-raw-tool-output`	Don't save raw outputs to `raw/`
`--config-file PATH`	Path to `.repo-auditor.toml` (overrides auto-discovery)
`--ignore-repo-config`	Ignore `.repo-auditor.toml` in the audited repository
`--org-mode {primary,multiple,off}`	Behavior for organization URLs
`--org-limit N`	How many repos in `multiple` mode (default 5)
`--keep-last N`	Keep N latest runs after audit
`--gc --keep-last N`	Standalone GC: apply retention to all slugs in `index.json`
`--doctor`	Environment diagnostics (tool versions)
`--version`	Version

Exit codes:

0 — success
1 — clone/collection error
2 — invalid target / --gc without --keep-last
3 — organization URL with --org-mode off
4 — total below --fail-below (CI gate)
5 — no suitable repos in organization (all forks/archived)
6 — regression detected (--since)
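
When the CLI is driven from a script, the exit codes above can be turned into readable messages. The sketch below assumes the tool is invoked as python -m repo_auditor, as shown elsewhere in this document; the names EXIT_MEANINGS, describe_exit, and run_audit are hypothetical.

```python
import subprocess
import sys

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "clone/collection error",
    2: "invalid target / --gc without --keep-last",
    3: "organization URL with --org-mode off",
    4: "total below --fail-below (CI gate)",
    5: "no suitable repos in organization (all forks/archived)",
    6: "regression detected (--since)",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_audit(target: str, *flags: str) -> int:
    """Run the audit in a child process and return its exit code."""
    cmd = [sys.executable, "-m", "repo_auditor", target, *flags]
    return subprocess.run(cmd, check=False).returncode
```

A caller could then write describe_exit(run_audit("pallets/flask", "--fail-below", "60")) to distinguish a failed CI gate (4) from a clone error (1).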

Repository configuration

repo-auditor looks for .repo-auditor.toml in the root of the cloned repository. It can contain scoring overrides and security suppression rules.

Security suppressions

You can suppress known-false-positive security findings with [[security.suppress]] tables:

[[security.suppress]]
tool = "gitleaks"
path_pattern = "tests/fixtures/*"
check_id_prefix = "generic-api-key"

Each rule matches when all specified fields match a finding. Fields act as wildcards when omitted:

tool — scanner name (gitleaks, osv-scanner, semgrep)
path_pattern — glob pattern against the file path
check_id_prefix — prefix of the check/rule identifier

Trust boundary

.repo-auditor.toml inside the audited repository is repo-local config. When you audit a third-party repository you do not control, that config is untrusted — the repository could hide findings via local suppressions.

For high-trust audits of third-party repositories, use --ignore-repo-config to skip loading .repo-auditor.toml from the cloned repository. Scoring overrides and local [[security.suppress]] rules will both be ignored.

Trusted suppressions passed programmatically via AuditConfig.security_suppressions are always honored and remain separate from repo-local suppressions.

Where to find results

After a run, the report lives here:

data/github.com/<owner>__<repo>/latest/report.ru.md

Handy commands:

# Latest report
cat data/github.com/pallets__flask/latest/report.ru.md

# Just the JSON score section (for scripts)
jq '.score' data/github.com/pallets__flask/latest/report.json

# List all audited repos
jq -r '.entries[] | "\(.slug)\t\(.total)/100\t\(.verdict)"' data/index.json

# Open in editor
$EDITOR data/github.com/pallets__flask/latest/report.ru.md

Scoring explained

Categories (each 0..10)

Category	What counts
Purpose	README, Installation/Usage sections, ARCHITECTURE, CONTRIBUTING
Security	High/medium findings from `gitleaks` + `osv-scanner`, SECURITY.md, committed `.env`
Code Quality	CI, tests, linter, type checking, TODO density
Maturity	License, releases, contributors, stars, activity ≤ 90 days

Total = sum(categories) × 100 / 40. Red flag = any category ≤ 3.
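
As a worked example of the formula, here is a minimal sketch using the category scores from the vercel/turbo card shown earlier. The function names are hypothetical, and the result is left as a float because the rounding rule is not stated here.

```python
def total_score(categories: dict[str, int]) -> float:
    """Total = sum(categories) * 100 / 40, for four categories of 0..10 each."""
    if len(categories) != 4:
        raise ValueError("expected exactly four categories")
    if any(not 0 <= value <= 10 for value in categories.values()):
        raise ValueError("each category must be in 0..10")
    return sum(categories.values()) * 100 / 40


def has_red_flag(categories: dict[str, int]) -> bool:
    """A red flag is any category scored 3 or lower."""
    return any(value <= 3 for value in categories.values())


card = {"purpose": 10, "security": 6, "quality": 10, "maturity": 10}
print(total_score(card))   # 90.0, matching "TOTAL: 90/100" on the card
print(has_red_flag(card))  # False
```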

Verdicts

Condition	Verdict	Meaning
Any category ≤ 3 and total < 40	not-recommended	Critical gap and overall weak
Any category ≤ 3 and total ≥ 40	caution	Mostly OK, but one block is weak
No red flags, total ≥ 80	reliable	Strong across all categories
No red flags, total ≥ 60	working	Viable, no obvious gaps
No red flags, total ≥ 40	caution	Medium level, needs attention
No red flags, total < 40	not-recommended	Weak overall
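
The verdict table reduces to a short decision function. This is a sketch of the table as written, not the tool's source; the name verdict and its signature are assumptions.

```python
def verdict(categories: list[int], total: float) -> str:
    """Apply the verdict table: red flags first, then total thresholds."""
    red_flag = any(score <= 3 for score in categories)
    if red_flag:
        # A critical gap caps the verdict at "caution".
        return "not-recommended" if total < 40 else "caution"
    if total >= 80:
        return "reliable"
    if total >= 60:
        return "working"
    if total >= 40:
        return "caution"
    return "not-recommended"
```

For the card shown earlier, verdict([10, 6, 10, 10], 90) gives "reliable". With a single weak category, verdict([10, 3, 10, 10], 82.5) gives "caution" even though the total is above 80.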

Storage layout

data/
  github.com/
    <owner>__<repo>/
      repo/                       ← git clone, reused via git fetch
      runs/
        YYYY-MM-DD_HHMMSS/        ← UTC timestamp
          report.ru.md
          report.json
          card.txt
          raw/
            gitleaks.json
            osv-scanner.json
            tokei.json
      latest -> runs/<ts>
    <org>__org/
      runs/...
  index.json                      ← aggregate entry per slug
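
Paths in this layout can be built mechanically from the owner and repository name. The helpers below are a hypothetical sketch that follows the tree above; they do not touch the filesystem.

```python
from datetime import datetime, timezone
from pathlib import Path


def repo_slug(owner: str, repo: str) -> str:
    """Directory name for a repository: <owner>__<repo>."""
    return f"{owner}__{repo}"


def latest_dir(owner: str, repo: str, output_root: str = "data") -> Path:
    """Path of the "latest" symlink for a repository."""
    return Path(output_root) / "github.com" / repo_slug(owner, repo) / "latest"


def run_dir_name(moment: datetime) -> str:
    """Run directory name in the YYYY-MM-DD_HHMMSS (UTC) form."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d_%H%M%S")


print(latest_dir("pallets", "flask") / "report.json")
```

Because the timestamp format is fixed-width and ordered from year down to second, run directory names sort chronologically as plain strings.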

Retention

runs/ grows linearly: each run writes a new UTC-timestamped directory. Cleanup:

--keep-last N — after a successful audit, keep N latest runs, delete older ones.
--gc --keep-last N — standalone pass over index.json: apply retention to all slugs at once (without a new audit).

Invariant: the target of the latest symlink is never deleted.

# Clean entire storage, keeping 3 latest runs per repo
repo-audit --gc --keep-last 3

Examples

# Basic audit
python3 -m repo_auditor https://github.com/pallets/flask

# Fast run without heavy external tools
python3 -m repo_auditor pallets/flask --quick

# Built-in heuristics only (no gitleaks/osv/tokei)
python3 -m repo_auditor pallets/flask --no-external-tools

# Primary repo of an organization
python3 -m repo_auditor https://github.com/amnezia-vpn

# Top-3 by stars in an organization
python3 -m repo_auditor https://github.com/amnezia-vpn \
    --org-mode multiple --org-limit 3

# JSON to stdout for pipelines
python3 -m repo_auditor pallets/flask --format json | jq '.score.total'

# SARIF for GitHub Advanced Security
python3 -m repo_auditor pallets/flask --format sarif --output-root ./sarif

# Exclude vendor and minified assets
python3 -m repo_auditor pallets/flask --exclude 'vendor/**' --exclude '*.min.js'

# CI gate: fail if total < 60
python3 -m repo_auditor pallets/flask --fail-below 60 || echo "Audit failed"

# With retention — keep only 5 latest runs
python3 -m repo_auditor pallets/flask --keep-last 5

Auditing large repositories

Repositories like torvalds/linux, chromium/chromium, or semgrep/semgrep can take a long time to clone and consume significant disk space.

Approach	Command	When to use
Fast audit (recommended)	`--quick`	Shallow clone (`depth=50`) and skip `osv-scanner` + `semgrep` (2–3× faster)
Shallow clone	`--clone-depth 1`	CI pipelines where you only need the latest snapshot
No clone	`--no-external-tools`	Skip clone entirely; relies on GitHub API only (poorer card, but instant)

# Example: audit a very large repo without downloading full history
python3 -m repo_auditor torvalds/linux --quick

# CI pipeline: minimal clone, fail gate
python3 -m repo_auditor owner/repo --clone-depth 1 --fail-below 60

Tip: semgrep/semgrep clones at ~170 MB. --quick brings it down to ~50 MB and cuts scan time by half because semgrep and osv-scanner are skipped.

Troubleshooting

gh: command not found or 403 rate-limit
Without authentication, the GitHub API allows 60 requests/hour. Install gh and run gh auth login, or export GITHUB_TOKEN.

fatal: could not read Username for 'https://github.com' (private repo)
Authentication is required. Run gh auth login, or use git config --global credential.helper store together with GITHUB_TOKEN.

Score seems low for model-card / dataset repos
The rubric is geared toward software projects. Repos without tests/, CI, or pyproject.toml will score low on quality. This is by design.

tool unavailable for osv-scanner/tokei/gitleaks in JSON
The scanner is not installed, so the pipeline falls back to built-in heuristics. For full accuracy, install the tools (see the table above), or run with --no-external-tools to suppress the warnings.

Clone hangs / consumes a lot of disk
Use --quick (clone_depth=50). Large repos (100+ MB) can be audited without cloning via --no-external-tools, using the GitHub API only, but the card will be poorer (no file-walk findings).

Card breaks on CJK/emoji in owner/name
Known limitation: alignment is character-based, not cell-width. Use --format json for machine-readable output without the visual card.

Want just the header without the full report
Use --format json | jq '.score' or cat data/.../latest/card.txt.
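
On the CJK/emoji alignment entry above: the difference between character count and terminal cell width can be seen with the standard unicodedata module. This is an approximate sketch for illustration only. It is not part of repo-auditor, the helper names are hypothetical, and real terminals handle some emoji sequences differently.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Approximate number of terminal cells a string occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the previous cell
        # Wide (W) and fullwidth (F) characters take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_cells(text: str, width: int) -> str:
    """Right-pad text with spaces up to the given cell width."""
    return text + " " * max(0, width - cell_width(text))


name = "日本語"
print(len(name), cell_width(name))  # 3 characters, 6 cells
```

Padding by len() would leave such a name three cells too wide, which is the kind of drift that breaks the card border.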

Limitations

Does not replace manual security review or CodeQL.
Heuristic mode (without gitleaks) may produce rare false positives — e.g. on 12-character PowerShell cmdlets ('Get-Location' after pwd:).
Without tokei, todo_density is approximated via source LOC from code_analysis — lower accuracy.
Monorepos with nested manifests: each manifest is counted separately, with no aggregated dependency total.
HTML fallback for organization listing is not implemented — requires working gh or GitHub REST API.

Release history

1.1.0

May 25, 2026

1.0.0

May 25, 2026

0.9.2

May 23, 2026

0.9.1

May 23, 2026

0.9.0

May 22, 2026

0.8.1

May 21, 2026

0.8.0

May 21, 2026

0.7.0

May 18, 2026

0.6.1

May 8, 2026

0.6.0

May 7, 2026

0.5.4

May 6, 2026

0.5.3

May 6, 2026

This version

0.5.2

May 6, 2026

Download files

Source Distribution

sfild_repo_auditor-0.5.2.tar.gz (192.9 kB)

Uploaded May 6, 2026 Source

Built Distribution

sfild_repo_auditor-0.5.2-py3-none-any.whl (100.3 kB)

Uploaded May 6, 2026 Python 3

File details

Details for the file sfild_repo_auditor-0.5.2.tar.gz.

File metadata

Download URL: sfild_repo_auditor-0.5.2.tar.gz
Upload date: May 6, 2026
Size: 192.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for sfild_repo_auditor-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`07c415ee9e3c64ca4e6f9298345b80c6ff2685c0ba278a706ff5858c7aae017e`
MD5	`301c13e3da74b7e9a3c7a4eee6d35792`
BLAKE2b-256	`05ab8849e3134ace95f4a3ddd68526802e69f4f7fe0e72d49a5c9488a6f903f6`

Provenance

The following attestation bundles were made for sfild_repo_auditor-0.5.2.tar.gz:

Publisher: pypi-publish.yml on SfilD/repo-auditor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfild_repo_auditor-0.5.2.tar.gz
- Subject digest: 07c415ee9e3c64ca4e6f9298345b80c6ff2685c0ba278a706ff5858c7aae017e
- Sigstore transparency entry: 1448327236
- Sigstore integration time: May 6, 2026
Source repository:
- Permalink: SfilD/repo-auditor@b7376fbc201d972d0a1fa48622e53f82b551f0c0
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/SfilD
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@b7376fbc201d972d0a1fa48622e53f82b551f0c0
- Trigger Event: push

File details

Details for the file sfild_repo_auditor-0.5.2-py3-none-any.whl.

File metadata

Download URL: sfild_repo_auditor-0.5.2-py3-none-any.whl
Upload date: May 6, 2026
Size: 100.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for sfild_repo_auditor-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e442e75ab780fd8fcf82732efce0f29ccb6853440c0af0793347adbeafa8f79b`
MD5	`94fe7a2b12c55e031a15faf917312d6d`
BLAKE2b-256	`85d4fb89697ded8e1c06219191e9702b8bf3901cb06599b2c273e00fbc5a8f0a`

Provenance

The following attestation bundles were made for sfild_repo_auditor-0.5.2-py3-none-any.whl:

Publisher: pypi-publish.yml on SfilD/repo-auditor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfild_repo_auditor-0.5.2-py3-none-any.whl
- Subject digest: e442e75ab780fd8fcf82732efce0f29ccb6853440c0af0793347adbeafa8f79b
- Sigstore transparency entry: 1448327312
- Sigstore integration time: May 6, 2026
Source repository:
- Permalink: SfilD/repo-auditor@b7376fbc201d972d0a1fa48622e53f82b551f0c0
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/SfilD
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@b7376fbc201d972d0a1fa48622e53f82b551f0c0
- Trigger Event: push
