A polished Python CLI that scans repos for signals consistent with AI-assisted code.
GitZero
GitZero is an explainable Python CLI that scans a GitHub repository URL or local repository folder for signals consistent with AI-assisted code.
It is designed for careful review, not accusations. GitZero does not prove authorship. It surfaces evidence, ranks the strongest signals, and explains why a repository may deserve closer inspection.
Introduction | How It Works | Install | Usage | Evaluation | Demo
Introduction
AI-generated and AI-assisted code often leaves patterns across commit history, project shape, documentation style, and static code structure. GitZero combines those signals into an explainable terminal report.
The project was built as a data, software, and AI systems portfolio piece:
- Data pipeline: batch scanning, labeled corpus export, feature columns, and model-ready JSONL/CSV output.
- Software engineering: Typer CLI, Rich terminal UI, test coverage, linting, local and GitHub URL support, and safe temporary clone cleanup.
- AI evaluation: heuristic scoring plus an optional experimental ML model for calibration.
- Responsible language: risk bands are triage labels, not authorship claims.
Screenshot Slots
Use these slots for screenshots before publishing the README:
| Slot | What to capture | Suggested file |
|---|---|---|
| CLI summary | A normal gitzero scan <repo> report showing the summary and signal map. | docs/images/scan-summary.png |
| Hard evidence | A scan where GitZero finds an AI config file or explicit README phrase. | docs/images/hard-evidence.png |
| JSON output | A terminal/editor view of gitzero scan <repo> --json. | docs/images/json-output.png |
| Batch workflow | A corpus scan or evaluation output. | docs/images/batch-evaluation.png |
After adding images, place them near the relevant sections, for example:

How It Works
GitZero runs a multi-stage scan:
- Load the repository
  - Accepts a local folder, public GitHub URL, or public git URL.
  - URL scans are cloned into a temporary directory and deleted after the scan.
- Analyze git history
  - Looks for large commit bursts, file creation waves, short project timelines, single/drop-style histories, no-merge histories, formulaic commit messages, author uniformity, and unusual time clustering.
- Analyze source files
  - Uses Python ast, regex heuristics, and radon complexity metrics.
  - Supports Python, JavaScript, TypeScript, JSX/TSX, notebooks, and common source files.
  - Ignores common generated files, lockfiles, vendor libraries, framework config files, build output, caches, virtual environments, and oversized files.
- Detect hard evidence
  - Flags explicit AI-assistant project files such as AGENTS.md, CLAUDE.md, .cursorrules, Copilot instructions, .aider, .continue, Windsurf/Cline/Roo rules, and README phrases like "built with ChatGPT", "made with Cursor", or "built with help from ChatGPT".
- Apply false-positive dampeners
  - Reduces risk when the repo shows organic development patterns: long-lived history, multiple authors, merge commits, debug residue, personal TODOs, substantive tests, README/code alignment, used dependencies, and starter-template patterns.
- Explain the result
  - Prints a risk band, confidence score, top signals, highest-signal files, skipped-file counts, and optional verbose per-file findings.
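The hard-evidence stage is the easiest one to picture in code. The sketch below is an illustration only, not GitZero's actual implementation: the function name find_hard_evidence, the two constants, and the finding strings are hypothetical, and the file names and phrases are a small subset of the ones listed above.

```python
import os

# Hypothetical, illustrative subsets of the markers listed above.
AI_CONFIG_NAMES = {"AGENTS.md", "CLAUDE.md", ".cursorrules"}
README_PHRASES = (
    "built with chatgpt",
    "made with cursor",
    "built with help from chatgpt",
)


def find_hard_evidence(repo_root):
    """Return findings for explicit AI-assistant markers under repo_root."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Skip git internals and walk directories in a stable order.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, repo_root)
            if name in AI_CONFIG_NAMES:
                findings.append(f"config file: {rel_path}")
            elif name.lower().startswith("readme"):
                with open(full_path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read().lower()
                # Case-insensitive substring match on each known phrase.
                findings.extend(
                    f"README phrase: {phrase}"
                    for phrase in README_PHRASES
                    if phrase in text
                )
    return findings
```

A match here is explicit evidence that an AI assistant was configured or credited in the project. It still says nothing about which lines of code it produced.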
Install
GitZero is a Python package with a CLI entrypoint named gitzero.
Recommended: install from GitHub with pipx
pipx installs CLI tools into isolated environments.
pipx install git+https://github.com/Ivansost/gitzero.git
Then run:
gitzero help
Install with pip
python -m pip install git+https://github.com/Ivansost/gitzero.git
Local development install
git clone https://github.com/Ivansost/gitzero.git
cd gitzero
python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev,ml,parsing]"
Run the local CLI:
.venv/bin/gitzero help
Usage
Scan a public GitHub repository:
gitzero scan https://github.com/user/project
Scan a local repository:
gitzero scan ./my-local-repo
Print machine-readable JSON:
gitzero scan ./my-local-repo --json
Show every per-file signal:
gitzero scan ./my-local-repo --verbose
Skip git history and score only the current source tree:
gitzero scan ./my-local-repo --no-git-history
Exclude folders or globs:
gitzero scan ./my-local-repo --exclude node_modules --exclude dist
Use the optional experimental ML model:
gitzero scan ./my-local-repo --ml-model ./model.joblib
--ml-model is experimental. Use the probability as a calibration aid next to the
heuristic score, not as a standalone authorship claim.
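For scripting, the flags above can be assembled programmatically. This is a minimal sketch, assuming gitzero is on PATH and that --json writes a single JSON document to stdout. The helper names build_scan_command and run_scan_json are hypothetical, and the JSON schema is not documented here, so the parsed object is returned as-is.

```python
import json
import subprocess


def build_scan_command(target, *, json_output=False, verbose=False,
                       git_history=True, exclude=(), ml_model=None):
    """Build an argv list using only the flags shown in this section."""
    cmd = ["gitzero", "scan", target]
    if json_output:
        cmd.append("--json")
    if verbose:
        cmd.append("--verbose")
    if not git_history:
        cmd.append("--no-git-history")
    for pattern in exclude:
        # --exclude is repeated once per folder or glob.
        cmd += ["--exclude", pattern]
    if ml_model:
        cmd += ["--ml-model", ml_model]
    return cmd


def run_scan_json(target, **options):
    """Run a scan and parse its output.

    Assumption: --json prints one JSON document to stdout and nothing else.
    """
    cmd = build_scan_command(target, json_output=True, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, build_scan_command("./my-local-repo", exclude=["node_modules", "dist"]) reproduces the exclude example above as an argument list.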
Risk Bands
| Band | Range | Meaning |
|---|---|---|
| Low | 0-39 | Few signals consistent with AI-assisted code. |
| Medium | 40-69 | Several signals are elevated. Review the top findings. This is not an AI claim. |
| High | 70-100 | Many signals are elevated. Inspect history and files closely. |
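The table translates directly into a threshold lookup. The function name risk_band is hypothetical, and treating fractional scores such as 39.5 as falling below the next threshold is an assumption, since the table only lists integer ranges.

```python
def risk_band(score):
    """Map a 0-100 risk score to the triage band from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"
```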
Batch And Corpus Workflow
Create a labeled fixture corpus:
gitzero fixtures ./fixtures/gitzero-corpus
Scan a labeled corpus into JSONL:
gitzero batch ./fixtures/gitzero-corpus \
  --labels ./fixtures/gitzero-corpus/labels.csv \
  --format jsonl \
  --output ./fixtures/results.jsonl
Scan a two-level corpus layout:
corpus/
  ai_generated/repo-a
  ai_assisted/repo-b
  human/repo-c
  template/repo-d

gitzero batch ./corpus --recursive --label-from-parent --format jsonl -o corpus.jsonl
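The two-level layout implies that each repository's label is the name of its parent folder. The sketch below shows that idea only. The function name discover_labeled_repos and the KNOWN_LABELS set are hypothetical, and GitZero's own discovery logic may differ, for example in how it treats unknown folder names.

```python
from pathlib import Path

# Label folders from the layout above; treated here as the only valid labels.
KNOWN_LABELS = {"ai_generated", "ai_assisted", "human", "template"}


def discover_labeled_repos(corpus_root):
    """Return (repo_name, label) pairs, taking the label from the parent folder."""
    root = Path(corpus_root)
    pairs = []
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if label_dir.name not in KNOWN_LABELS:
            continue  # Assumption: folders with unknown names are skipped.
        for repo_dir in sorted(p for p in label_dir.iterdir() if p.is_dir()):
            pairs.append((repo_dir.name, label_dir.name))
    return pairs
```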
Batch rows include inspection fields and ML-ready feature columns:
signal.git.large_commits_present
signal.git.large_commits_score
signal.git.large_commits_weight
signal.dampener.git.multi_author_history_score
signal.dampener.static.personal_todo_patterns_score
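Column names like these can be turned into a feature matrix with a few lines of standard-library Python. This is a sketch under two assumptions: each JSONL row is a flat object keyed by the dotted column names shown above, and a column missing from a row can be read as 0.0. The function name load_score_features is hypothetical.

```python
import json


def load_score_features(jsonl_text):
    """Extract signal.*_score columns from batch JSONL text.

    Returns (columns, matrix), where matrix has one list of floats per row.
    """
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Collect every score column seen in any row, in a stable sorted order.
    columns = sorted({
        key
        for row in rows
        for key in row
        if key.startswith("signal.") and key.endswith("_score")
    })
    # Assumption: a missing column means the signal did not fire.
    matrix = [[float(row.get(col, 0.0)) for col in columns] for row in rows]
    return columns, matrix
```

The same filter with a _present or _weight suffix would select the other column families.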
Evaluation
GitZero currently uses heuristic scoring as the primary product behavior. The ML model is kept optional because the live tests showed the heuristic is more reliable for public scans.
Current validation artifacts:
- Cleaned labeled corpus: 129 repositories across ai_generated, ai_assisted, human, and template.
- Grouped cross-validation: grouped by repository owner to reduce leakage.
- Ablation model without hard evidence: ROC-AUC 0.903, PR-AUC 0.853.
- Live out-of-corpus smoke test: 60 GitHub repositories.
  - Hard-evidence AI: 17/20 scored High.
  - Human OSS: 0/20 scored High.
  - AI-assisted candidates: mostly Medium/High, with intentionally conservative ML scores.
The main takeaway: GitZero is useful as an explainable review tool. It should not be framed as a definitive detector.
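Grouping cross-validation by repository owner means that all repositories from one owner land in the same fold, so a model is never tested on an owner it saw during training. Below is a dependency-free sketch of that idea. The function names and the owner/repo slug format are assumptions, and the project's own evaluation may instead rely on scikit-learn's GroupKFold.

```python
from collections import defaultdict


def owner_of(repo_slug):
    """Return the owner part of an 'owner/repo' slug (assumed format)."""
    return repo_slug.split("/", 1)[0]


def grouped_folds(repo_slugs, n_folds):
    """Split repository indices into folds without splitting any owner."""
    by_owner = defaultdict(list)
    for index, slug in enumerate(repo_slugs):
        by_owner[owner_of(slug)].append(index)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest owners first, each into the
    # currently smallest fold (ties go to the earliest fold).
    for owner in sorted(by_owner, key=lambda o: (-len(by_owner[o]), o)):
        smallest = min(folds, key=len)
        smallest.extend(by_owner[owner])
    return folds
```

Each fold in turn serves as the held-out set while the remaining folds are used for training.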
Tech Stack
- Python package with a gitzero console script.
- Typer for CLI commands.
- Rich for terminal UI.
- PyDriller plus git fallback for history analysis.
- radon for complexity metrics.
- Optional scikit-learn / joblib model loading for experimental ML probability.
Development
python -m pip install -e ".[dev,ml,parsing]"
python -m pytest
python -m ruff check .
Demo
Full demo, screenshots, and technical write-up:
GitZero project write-up and demo
Replace the demo link with your portfolio URL after publishing the write-up.
File details
Details for the file gitzero-0.1.0.tar.gz.
File metadata
- Download URL: gitzero-0.1.0.tar.gz
- Upload date:
- Size: 56.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce8dd9d0346b12356b4ea3bd4fe5e29c814878988d0538c0d52228eaa65b9488 |
| MD5 | 94fec9e73c2b9aa6862dbb0bfb6c418e |
| BLAKE2b-256 | 68b6e81cffd23bc4dcff15b7da2aab0b86a2325eaaee449013a9e2318646d05a |
File details
Details for the file gitzero-0.1.0-py3-none-any.whl.
File metadata
- Download URL: gitzero-0.1.0-py3-none-any.whl
- Upload date:
- Size: 49.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ce067d17b57fbcb027ee183b9b5dbb951f20a9a85bbbaf40acd3f4d0856186b |
| MD5 | d8f24b24c415a7f26fdd526ce3403867 |
| BLAKE2b-256 | aa23bff4c307c6384f0a093fa531196ef6f858875e62a86ec5b417bb23a242c2 |