Deep GitHub repository audit CLI with context, documentation, and security heuristics
🇺🇸 English | 🇷🇺 Russian
repo-auditor
Objective GitHub repository audit: a compact terminal card and a full Markdown report. Four balanced categories (purpose · security · quality · maturity), transparent methodology, optionally powered by gitleaks, osv-scanner, tokei, and semgrep.
Pure Python 3.12+, zero third-party dependencies. Networking via urllib with gh/curl fallback; TOML via tomllib.
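To illustrate the "urllib first, gh/curl as fallback" idea, here is a minimal stdlib sketch. It is not repo-auditor's transport code: the function names (`via_urllib`, `via_gh`, `via_curl`, `fetch_json`) and the fallback order are assumptions chosen for illustration.

```python
import json
import shutil
import subprocess
import urllib.request
from typing import Callable, Sequence

# Hypothetical transports, tried in order by fetch_json below.


def via_urllib(url: str) -> bytes:
    """Plain HTTPS GET using only the standard library."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def via_gh(url: str) -> bytes:
    """Delegate to the gh CLI, which reuses its stored authentication."""
    if shutil.which("gh") is None:
        raise FileNotFoundError("gh not on PATH")
    path = url.removeprefix("https://api.github.com/")
    return subprocess.run(["gh", "api", path], check=True, capture_output=True).stdout


def via_curl(url: str) -> bytes:
    """Last resort: shell out to curl (fails on HTTP errors because of -f)."""
    if shutil.which("curl") is None:
        raise FileNotFoundError("curl not on PATH")
    return subprocess.run(["curl", "-fsSL", url], check=True, capture_output=True).stdout


def fetch_json(url: str,
               fetchers: Sequence[Callable[[str], bytes]] = (via_urllib, via_gh, via_curl)):
    """Return parsed JSON from the first transport that succeeds."""
    errors = []
    for fetch in fetchers:
        try:
            return json.loads(fetch(url))
        except (OSError, subprocess.CalledProcessError, json.JSONDecodeError) as exc:
            # URLError/HTTPError and FileNotFoundError are all OSError subclasses.
            errors.append(f"{fetch.__name__}: {exc}")
    raise RuntimeError("all transports failed: " + "; ".join(errors))
```

For example, `fetch_json("https://api.github.com/repos/pallets/flask")` tries urllib first and only spawns `gh` or `curl` if the previous transport raised.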
Not a programmer but want to check a project? Open GETTING_STARTED.ru.md (Russian): a step-by-step guide in plain language, from installing Python to reading your first report.
What you'll see
After running, a card is printed to the terminal:
┌─ vercel/turbo ──────────────────────────────── Rust · MIT · ★ 30258
│ Purpose ██████████ 10/10
│ Security ██████░░░░ 6/10
│ Code Quality ██████████ 10/10
│ Maturity ██████████ 10/10
├─ TOTAL: 90/100 · ~ reliable
│ Key: 0 high + 5 medium security findings (heuristic)
│ Suggestion: review .devcontainer/Dockerfile, .github/workflows/turborepo-library-release.yml
└─ full report: data/github.com/vercel__turbo/runs/2026-04-25_071849/report.ru.md
At the same time, the following are written to data/.../latest/:
- report.ru.md: full Markdown report (8 sections, methodology, delta against the previous run);
- report.json: versioned JSON (schema_version) for automation;
- card.txt: copy of the terminal card;
- raw/: raw tool outputs for manual drill-down.
Note: reports are generated in Russian by default. Use --language en for English output.
Quick start
# One-off audit, no installation needed
python3 -m repo_auditor https://github.com/pallets/flask
# View the full report
cat data/github.com/pallets__flask/latest/report.ru.md
That's enough to get your first result. External scanners (gitleaks, osv-scanner, tokei, semgrep) are optional; built-in heuristics work without them.
Installation
Base requirements: Python 3.12+ and git.
# Install from PyPI (recommended)
pip install sfild-repo-auditor
# Verify
repo-audit --version
Or install from source:
git clone https://github.com/SfilD/repo-auditor.git
cd repo-auditor
pip install -e .
It also works without pip install: invoke it via python3 -m repo_auditor.
Docker (alternative to local install)
# Build image
docker build -t repo-auditor .
# Run an audit
docker run --rm -v "$(pwd)/data:/data" repo-auditor \
https://github.com/pallets/flask --output-root /data
The image is based on python:3.13-slim (Debian 13 "trixie") and includes
all external tools (gitleaks, osv-scanner, tokei, semgrep, gh).
It is single-stage by design, prioritising simplicity and fast rebuilds
for audit environments over minimal runtime size.
GitHub Actions
- uses: SfilD/repo-auditor@v1.0.0
  with:
    target: 'owner/repo'
    fail-below: '60'
    format: 'json'
    exclude: 'vendor/**,*.min.js'
    clone-depth: '1'
Action outputs: total, verdict, output-path.
External tools (optional but recommended)
The more you install, the more accurate the assessment.
| Tool | What it gives | Linux (apt/snap) | macOS (brew) | Cargo |
|---|---|---|---|---|
| gh | Authenticated GitHub API access (higher rate limits) | sudo apt install gh | brew install gh | n/a |
| gitleaks | Leaked secret detection | sudo apt install gitleaks | brew install gitleaks | n/a |
| osv-scanner | CVE scanning of dependencies | n/a (binary from GitHub Releases) | brew install osv-scanner | n/a |
| tokei | Accurate LOC count | sudo apt install tokei | brew install tokei | cargo install tokei |
| semgrep | Static analysis | pip install semgrep | brew install semgrep | n/a |
Authentication, required for private repos and for higher API rate limits:
gh auth login # interactive via browser
# or
export GITHUB_TOKEN=ghp_xxxxxxxx # personal access token
Adapter prerequisites
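v0.8.0 added optional adapters that normalize external-tool signals into the report. Under the default profile they do not affect the four-category score; since v0.9.0 they can contribute only when you opt in to a policy profile (see Policy scoring below).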
| Adapter | What it needs | Install |
|---|---|---|
| OpenSSF Scorecard | scorecard binary on PATH + GITHUB_AUTH_TOKEN env var | go install github.com/ossf/scorecard/v5/cmd/scorecard@latest |
| GitHub Community Profile | Existing GitHub client (no extra install) | n/a |
Token for Scorecard: any GitHub personal access token with public_repo scope (repo scope for private repositories). Export it as GITHUB_AUTH_TOKEN.
Both adapters are optional. If prerequisites are missing, the adapter returns UNAVAILABLE and the audit continues. Use --no-external-tools to disable all adapters.
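The Scorecard availability rule can be expressed as a small check. This is a sketch under assumptions: only the `scorecard` binary name, the `GITHUB_AUTH_TOKEN` variable, and the `UNAVAILABLE` status come from this README; the function name and the `AVAILABLE` string are hypothetical.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def scorecard_status(env: Optional[Mapping[str, str]] = None,
                     which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Hypothetical check: both Scorecard prerequisites must be present."""
    env = os.environ if env is None else env
    if which("scorecard") is None:
        return "UNAVAILABLE"  # binary missing from PATH
    if not env.get("GITHUB_AUTH_TOKEN"):
        return "UNAVAILABLE"  # token not exported
    return "AVAILABLE"
```

Injecting `env` and `which` keeps the check testable without touching the real environment.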
MCP server
repo-auditor can run as an MCP (Model Context Protocol) server, exposing the audit pipeline to AI agents such as Claude Desktop, Claude Code, Cline, and Continue.
This is opt-in: the default CLI install has zero extra dependencies.
pip install 'sfild-repo-auditor[mcp]'
Launch the server:
repo-audit-mcp
Then wire it into your MCP client. See docs/mcp.md for full setup instructions, tool reference, and trust-model notes.
Usage
repo-audit <target> [flags]
Target:
- owner/repo: short form
- https://github.com/owner/repo: repository
- https://github.com/owner: organization or user
Flags:
| Flag | Purpose |
|---|---|
| --output-root PATH | Storage root (default ./data) |
| --quick | clone_depth=50, skip osv-scanner and semgrep (2-3× faster) |
| --no-external-tools | Built-in heuristics only (if nothing is installed) |
| --format {card,json,sarif} | Output format (card by default) |
| --json-only | Alias for --format json (legacy, still works) |
| --language {ru,en} | Report language (ru by default) |
| --plugin PATH | External plugin executable (repeatable). Trusted: honoured even with --ignore-repo-config |
| --allow-repo-plugins | Allow repo-local executable plugins from .repo-auditor.toml. Disabled by default for security |
| --exclude PATTERN | Skip files/dirs matching a glob (repeatable) |
| --clone-depth N | Git clone depth (overrides --quick) |
| --max-workers N | Max parallel collector workers (auto by default) |
| --fail-below N | CI gate: exit 4 if total < N (0..100) |
| --no-raw-tool-output | Don't save raw outputs to raw/ |
| --config-file PATH | Path to .repo-auditor.toml (overrides auto-discovery) |
| --ignore-repo-config | Ignore .repo-auditor.toml in the audited repository |
| --org-mode {primary,multiple,off} | Behavior for organization URLs |
| --org-limit N | How many repos to audit in multiple mode (default 5) |
| --keep-last N | Keep the N latest runs after the audit |
| --gc --keep-last N | Standalone GC: apply retention to all slugs in index.json |
| --doctor | Environment diagnostics (tool versions) |
| --version | Print the version |
Exit codes:
- 0: success
- 1: clone/collection error
- 2: invalid target / --gc without --keep-last
- 3: organization URL with --org-mode off
- 4: total below --fail-below (CI gate)
- 5: no suitable repos in organization (all forks/archived)
- 6: regression detected (--since)
- 8: policy threshold breach (--fail-on); if both 4 and 8 trigger, 4 wins
- 16: regression detected in compare mode (--regression-fail)
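CI scripts that call the CLI can translate these exit codes into readable messages. A minimal sketch: the codes and meanings are copied from the list above, while `describe_exit` and `run_gate` are hypothetical helpers, not part of repo-auditor.

```python
import subprocess
import sys

# Exit codes as documented in this README.
EXIT_CODES = {
    0: "success",
    1: "clone/collection error",
    2: "invalid target / --gc without --keep-last",
    3: "organization URL with --org-mode off",
    4: "total below --fail-below (CI gate)",
    5: "no suitable repos in organization (all forks/archived)",
    6: "regression detected (--since)",
    8: "policy threshold breach (--fail-on)",
    16: "regression detected in compare mode (--regression-fail)",
}


def describe_exit(code: int) -> str:
    return EXIT_CODES.get(code, f"undocumented exit code {code}")


def run_gate(target: str, fail_below: int) -> int:
    """Hypothetical wrapper: run one audit with a score gate and explain the exit."""
    proc = subprocess.run(
        [sys.executable, "-m", "repo_auditor", target,
         "--format", "json", "--fail-below", str(fail_below)],
        capture_output=True, text=True,
    )
    print(f"exit {proc.returncode}: {describe_exit(proc.returncode)}", file=sys.stderr)
    return proc.returncode
```

Usage in a pipeline step: `sys.exit(run_gate("pallets/flask", 60))` propagates the original code, so the job still fails on 4.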
Repository configuration
repo-auditor looks for .repo-auditor.toml in the root of the cloned repository.
It can contain scoring overrides and security suppression rules.
Security suppressions
You can suppress known-false-positive security findings with [[security.suppress]] tables:
[[security.suppress]]
tool = "gitleaks"
path_pattern = "tests/fixtures/*"
check_id_prefix = "generic-api-key"
Each rule matches when all specified fields match a finding. Fields act as wildcards when omitted:
- tool: scanner name (gitleaks, osv-scanner, semgrep)
- path_pattern: glob pattern matched against the file path
- check_id_prefix: prefix of the check/rule identifier
Trust boundary
.repo-auditor.toml inside the audited repository is repo-local config. When you audit a third-party repository you do not control, that config is untrusted: the repository could hide findings via local suppressions or override scoring.
For high-trust audits of third-party repositories, use --ignore-repo-config to skip loading .repo-auditor.toml from the cloned repository. This disables all repo-local config, including:
- scoring overrides;
- [[security.suppress]] rules;
- repo-local executable [[plugins]];
- [[scoring.plugins]] rules.
Repo-local executable plugins execute code and remain disabled by default unless --allow-repo-plugins is provided. --ignore-repo-config still wins even when --allow-repo-plugins is passed.
Trusted suppressions passed programmatically via AuditConfig.security_suppressions are always honored and remain separate from repo-local suppressions. Trusted CLI plugins from --plugin PATH are also always honored.
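The plugin precedence rules in this subsection reduce to a small decision function. It is an illustrative sketch: the parameter names mirror the CLI flags, but the function itself is hypothetical.

```python
def plugins_to_run(cli_plugins: list[str], repo_plugins: list[str],
                   allow_repo_plugins: bool, ignore_repo_config: bool) -> list[str]:
    """Hypothetical: return the plugin executables that would be honoured."""
    selected = list(cli_plugins)  # --plugin PATH is trusted: always honoured
    # Repo-local plugins need the opt-in flag, and --ignore-repo-config still wins.
    if allow_repo_plugins and not ignore_repo_config:
        selected.extend(repo_plugins)
    return selected
```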
Plugin scoring documentation is in docs/plugins.md.
Where to find results
After a run, the report lives here:
data/github.com/<owner>__<repo>/latest/report.ru.md
Handy commands:
# Latest report
cat data/github.com/pallets__flask/latest/report.ru.md
# Just the JSON score section (for scripts)
jq '.score' data/github.com/pallets__flask/latest/report.json
# List all audited repos
jq -r '.entries[] | "\(.slug)\t\(.total)/100\t\(.verdict)"' data/index.json
# Open in editor
$EDITOR data/github.com/pallets__flask/latest/report.ru.md
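Without jq, the same listing can be produced with the standard library. This assumes index.json has the `entries`, `slug`, `total`, and `verdict` fields that the jq filter above reads; the helper names are illustrative.

```python
import json
from pathlib import Path


def format_entries(index: dict) -> list[str]:
    """One tab-separated line per audited slug, mirroring the jq filter."""
    return [f"{e['slug']}\t{e['total']}/100\t{e['verdict']}" for e in index.get("entries", [])]


def list_audits(index_path: Path = Path("data/index.json")) -> list[str]:
    return format_entries(json.loads(index_path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    path = Path("data/index.json")
    if path.exists():  # nothing to list before the first audit
        print("\n".join(list_audits(path)))
```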
Policy scoring (v0.9.0+)
v0.9.0 introduces optional policy profiles that map adapter evidence (OpenSSF Scorecard, GitHub Community Profile) into bounded per-category contributions. The default profile preserves backward compatibility; use --profile to opt in.
See docs/scoring.md for profiles, trust matrix, CI gating with --fail-on, and migration notes.
Cross-repo compare (v1.0+)
Compare two repositories side-by-side before adoption or track your own repository over time.
# Pairwise library evaluation
repo-audit pallets/click --compare-with pallets/flask
# Regression mode against latest stored run
repo-audit owner/repo --diff-previous --regression-fail
Compare produces a terminal card, Markdown report, JSON document, and SARIF export. In compare mode, repo-local config is always ignored for trust reasons; use --plugin <path> for explicit scoring rules.
See docs/compare.md for the full reference: trust boundary, storage layout, SARIF structure, schema evolution, and exit code 16.
Scoring explained
Categories (each 0..10)
| Category | What counts |
|---|---|
| Purpose | README, Installation/Usage sections, ARCHITECTURE, CONTRIBUTING |
| Security | High/medium findings from gitleaks + osv-scanner, SECURITY.md, committed .env |
| Code Quality | CI, tests, linter, type checking, TODO density |
| Maturity | License, releases, contributors, stars, activity within the last 90 days |
Total = sum(categories) × 100 / 40. Red flag = any category ≤ 3.
Verdicts
| Condition | Verdict | Meaning |
|---|---|---|
| Any category ≤ 3 and total < 40 | not-recommended | Critical gap and overall weak |
| Any category ≤ 3 and total ≥ 40 | caution | Mostly OK, but one block is weak |
| No red flags, total ≥ 80 | reliable | Strong across all categories |
| No red flags, 60 ≤ total < 80 | working | Viable, no obvious gaps |
| No red flags, 40 ≤ total < 60 | caution | Medium level, needs attention |
| No red flags, total < 40 | not-recommended | Weak overall |
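The total formula and the verdict table can be checked with a short re-implementation. It is written from the two tables above, not taken from repo-auditor's code; the category keys are illustrative, and the tool may round the total for display.

```python
def total_score(categories: dict[str, int]) -> float:
    """Total = sum of the four 0..10 categories x 100 / 40."""
    return sum(categories.values()) * 100 / 40


def verdict(categories: dict[str, int]) -> str:
    """Apply the verdict table; a red flag is any category <= 3."""
    total = total_score(categories)
    if any(score <= 3 for score in categories.values()):
        return "not-recommended" if total < 40 else "caution"
    if total >= 80:
        return "reliable"
    if total >= 60:
        return "working"
    if total >= 40:
        return "caution"
    return "not-recommended"


# Scores from the vercel/turbo card at the top of this README.
card = {"purpose": 10, "security": 6, "quality": 10, "maturity": 10}
print(total_score(card), verdict(card))  # 90.0 reliable
```

Note that a single category at 3 forces at best caution, even when the total would otherwise be reliable (3 + 10 + 10 + 10 gives 82.5).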
Storage layout
data/
  github.com/
    <owner>__<repo>/
      repo/                   ← git clone, reused via git fetch
      runs/
        YYYY-MM-DD_HHMMSS/    ← UTC timestamp
          report.ru.md
          report.json
          card.txt
          raw/
            gitleaks.json
            osv-scanner.json
            tokei.json
      latest -> runs/<ts>
    <org>__org/
      runs/...
  index.json                  ← aggregate entry per slug
Retention
runs/ grows linearly: each run writes a new UTC-timestamped directory. Cleanup options:
- --keep-last N: after a successful audit, keep the N latest runs and delete older ones.
- --gc --keep-last N: standalone pass over index.json that applies retention to all slugs at once (without a new audit).
Invariant: the target of the latest symlink is never deleted.
# Clean entire storage, keeping 3 latest runs per repo
repo-audit --gc --keep-last 3
Examples
# Basic audit
python3 -m repo_auditor https://github.com/pallets/flask
# Fast run without heavy external tools
python3 -m repo_auditor pallets/flask --quick
# Built-in heuristics only (no gitleaks/osv/tokei)
python3 -m repo_auditor pallets/flask --no-external-tools
# Primary repo of an organization
python3 -m repo_auditor https://github.com/amnezia-vpn
# Top-3 by stars in an organization
python3 -m repo_auditor https://github.com/amnezia-vpn \
--org-mode multiple --org-limit 3
# JSON to stdout for pipelines
python3 -m repo_auditor pallets/flask --format json | jq '.score.total'
# SARIF for GitHub Advanced Security
python3 -m repo_auditor pallets/flask --format sarif --output-root ./sarif
# Exclude vendor and minified assets
python3 -m repo_auditor pallets/flask --exclude 'vendor/**' --exclude '*.min.js'
# CI gate: fail if total < 60
python3 -m repo_auditor pallets/flask --fail-below 60 || echo "Audit failed"
# With retention โ keep only 5 latest runs
python3 -m repo_auditor pallets/flask --keep-last 5
Auditing large repositories
Repositories like torvalds/linux, chromium/chromium, or semgrep/semgrep can take a long time to clone and consume significant disk space.
| Approach | Command | When to use |
|---|---|---|
| Fast audit (recommended) | --quick |
Shallow clone (depth=50) and skip osv-scanner + semgrep (2โ3ร faster) |
| Shallow clone | --clone-depth 1 |
CI pipelines where you only need the latest snapshot |
| No clone | --no-external-tools |
Skip clone entirely; relies on GitHub API only (poorer card, but instant) |
# Example: audit a very large repo without downloading full history
python3 -m repo_auditor torvalds/linux --quick
# CI pipeline: minimal clone, fail gate
python3 -m repo_auditor owner/repo --clone-depth 1 --fail-below 60
Tip: semgrep/semgrep clones at ~170 MB. --quick brings it down to ~50 MB and cuts scan time by half because semgrep and osv-scanner are skipped.
Troubleshooting
gh: command not found or 403 rate-limit
Unauthenticated GitHub API access is limited to 60 requests per hour. Install gh and run gh auth login, or export GITHUB_TOKEN.
fatal: could not read Username for 'https://github.com' (private repo)
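Authentication is required. Run gh auth login, or configure a credential helper (git config --global credential.helper store) and enter GITHUB_TOKEN as the password when prompted. Note that the store helper saves credentials in plain text.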
Score seems low for model-card / dataset repos
The rubric is geared toward software projects. Repos without tests/, CI, or pyproject.toml will score low on Code Quality. This is by design.
tool unavailable for osv-scanner/tokei/gitleaks in JSON
The scanner is not installed, so the pipeline falls back to built-in heuristics. For full accuracy install the tools (see the table above), or run with --no-external-tools to suppress the warnings.
Clone hangs / consumes a lot of disk
Use --quick (clone_depth=50). Large repos (100+ MB) can be audited without cloning via --no-external-tools and the GitHub API only, but the card will be poorer (no file-walk findings).
Card breaks on CJK/emoji in owner/name
Known limitation: alignment is character-based, not cell-width. Use --format json for machine-readable output without the visual card.
Want just the header without the full report
Use --format json | jq '.score', or cat data/.../latest/card.txt.
Limitations
- Does not replace manual security review or CodeQL.
- Heuristic mode (without gitleaks) may produce rare false positives, e.g. on 12-character PowerShell cmdlets ('Get-Location' after pwd:).
- Without tokei, todo_density is approximated via source LOC from code_analysis, which is less accurate.
- Monorepos with nested manifests: each manifest is counted separately; dependencies are not aggregated into a single total.
- HTML fallback for organization listing is not implemented; it requires a working gh or the GitHub REST API.
See also
- README.ru.md: Russian version of this README.
- GETTING_STARTED.ru.md: step-by-step guide for non-programmers (Russian).
- AUDIT_GUIDE.md: for external auditors / AI tools before review (load-bearing constraints, hot zones, regression baseline).
- ARCHITECTURE.md: 30-second overview of layers and data flow.
- ROADMAP.md: public direction of development.
- SECURITY.md: disclosure policy.
- CHANGELOG.md: release history.
- CONTRIBUTING.md: dev cycle, conventional commits, stdlib-only rule.
- docs/compare.md: cross-repo compare reference.
- CLAUDE.md: architecture, data flow, scoring rules, false-positive filters (for contributors and Claude Code).
- LICENSE: MIT.
File details
Details for the file sfild_repo_auditor-1.0.0.tar.gz.
File metadata
- Download URL: sfild_repo_auditor-1.0.0.tar.gz
- Upload date:
- Size: 279.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b0c4e671d72d4b7d5f9d985ed414366c0938b4f54b517253b11e5a50461c4d3 |
| MD5 | 1bfff46d49c615bdb300338af624a600 |
| BLAKE2b-256 | 9ac072534bf0e2243da5cc74faa87a35814eecdcb1241724381679773a81f81b |
Provenance
The following attestation bundles were made for sfild_repo_auditor-1.0.0.tar.gz:
Publisher: pypi-publish.yml on SfilD/repo-auditor
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfild_repo_auditor-1.0.0.tar.gz
- Subject digest: 1b0c4e671d72d4b7d5f9d985ed414366c0938b4f54b517253b11e5a50461c4d3
- Sigstore transparency entry: 1628578421
- Sigstore integration time:
- Permalink: SfilD/repo-auditor@665d9356eccff57180f024ccc7d4c0d12f1642d1
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/SfilD
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@665d9356eccff57180f024ccc7d4c0d12f1642d1
- Trigger Event: push
File details
Details for the file sfild_repo_auditor-1.0.0-py3-none-any.whl.
File metadata
- Download URL: sfild_repo_auditor-1.0.0-py3-none-any.whl
- Upload date:
- Size: 150.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 58fab6a49b13174b591b5c9682b05d4f963946801a794c5a1e0afe3974b22c55 |
| MD5 | 82f8ec66e4bb6b4c89903b0cd6b782d1 |
| BLAKE2b-256 | 0f13e7a5582537b78e1289fce386d9c2e335cf0b4affa81b0c27c336df837053 |
Provenance
The following attestation bundles were made for sfild_repo_auditor-1.0.0-py3-none-any.whl:
Publisher: pypi-publish.yml on SfilD/repo-auditor
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfild_repo_auditor-1.0.0-py3-none-any.whl
- Subject digest: 58fab6a49b13174b591b5c9682b05d4f963946801a794c5a1e0afe3974b22c55
- Sigstore transparency entry: 1628578436
- Sigstore integration time:
- Permalink: SfilD/repo-auditor@665d9356eccff57180f024ccc7d4c0d12f1642d1
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/SfilD
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@665d9356eccff57180f024ccc7d4c0d12f1642d1
- Trigger Event: push