A polished Python CLI that scans repos for signals consistent with AI-assisted code.

GitZero

GitZero is an explainable Python CLI that scans a GitHub repository URL or local repository folder for signals consistent with AI-assisted code.

It is designed for careful review, not accusations. GitZero does not prove authorship. It surfaces evidence, ranks the strongest signals, and explains why a repository may deserve closer inspection.

Introduction | How It Works | Install | Usage | Evaluation | Demo

Introduction

AI-generated and AI-assisted code often leaves patterns across commit history, project shape, documentation style, and static code structure. GitZero combines those signals into an explainable terminal report.

The project was built as a data, software, and AI systems portfolio piece:

  • Data pipeline: batch scanning, labeled corpus export, feature columns, and model-ready JSONL/CSV output.
  • Software engineering: Typer CLI, Rich terminal UI, test coverage, linting, local and GitHub URL support, and safe temporary clone cleanup.
  • AI evaluation: heuristic scoring plus an optional experimental ML model for calibration.
  • Responsible language: risk bands are triage labels, not authorship claims.

Screenshot Slots

Use these slots for screenshots before publishing the README:

Slot | What to capture | Suggested file
CLI summary | A normal gitzero scan <repo> report showing the summary and signal map. | docs/images/scan-summary.png
Hard evidence | A scan where GitZero finds an AI config file or explicit README phrase. | docs/images/hard-evidence.png
JSON output | A terminal/editor view of gitzero scan <repo> --json. | docs/images/json-output.png
Batch workflow | A corpus scan or evaluation output. | docs/images/batch-evaluation.png

After adding images, place them near the relevant sections, for example:

![GitZero scan summary](docs/images/scan-summary.png)

How It Works

GitZero runs a multi-stage scan:

  1. Load the repository

    • Accepts a local folder, public GitHub URL, or public git URL.
    • URL scans are cloned into a temporary directory and deleted after the scan.
  2. Analyze git history

    • Looks for large commit bursts, file creation waves, short project timelines, single-commit or code-drop histories, histories with no merge commits, formulaic commit messages, author uniformity, and unusual time clustering.
  3. Analyze source files

    • Uses Python ast, regex heuristics, and radon complexity metrics.
    • Supports Python, JavaScript, TypeScript, JSX/TSX, notebooks, and common source files.
    • Ignores common generated files, lockfiles, vendor libraries, framework config files, build output, caches, virtual environments, and oversized files.
  4. Detect hard evidence

    • Flags explicit AI-assistant project files such as AGENTS.md, CLAUDE.md, .cursorrules, Copilot instructions, .aider, .continue, Windsurf/Cline/Roo rules, and README phrases like built with ChatGPT, made with Cursor, or built with help from ChatGPT.
  5. Apply false-positive dampeners

    • Reduces risk when the repo shows organic development patterns: long-lived history, multiple authors, merge commits, debug residue, personal TODOs, substantive tests, README/code alignment, used dependencies, and starter-template patterns.
  6. Explain the result

    • Prints a risk band, confidence score, top signals, highest-signal files, skipped-file counts, and optional verbose per-file findings.
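
As an illustration of the hard-evidence stage (step 4), here is a minimal sketch that looks for a few of the marker files and README phrases named above. The function name `find_hard_evidence`, the two constant lists, and the matching rules (exact file name match, case-insensitive substring match on README text) are assumptions made for this example; GitZero's actual lists and logic may differ.

```python
from pathlib import Path

# Illustrative subset of the markers listed above (assumption: exact-name match).
MARKER_FILES = {"AGENTS.md", "CLAUDE.md", ".cursorrules"}
# Phrases are compared against lowercased README text.
README_PHRASES = (
    "built with chatgpt",
    "made with cursor",
    "built with help from chatgpt",
)


def find_hard_evidence(repo_root: Path) -> list[str]:
    """Return human-readable findings for explicit AI-assistant markers."""
    findings: list[str] = []

    # Pass 1: marker files anywhere in the tree, skipping git internals.
    for path in sorted(repo_root.rglob("*")):
        relative = path.relative_to(repo_root)
        if ".git" in relative.parts:
            continue
        if path.is_file() and path.name in MARKER_FILES:
            findings.append(f"marker file: {relative.as_posix()}")

    # Pass 2: explicit phrases in top-level README files.
    for readme in sorted(repo_root.glob("README*")):
        if not readme.is_file():
            continue
        text = readme.read_text(encoding="utf-8", errors="ignore").lower()
        for phrase in README_PHRASES:
            if phrase in text:
                findings.append(f"README phrase: {phrase!r}")

    return findings
```

Calling `find_hard_evidence(Path("./my-local-repo"))` returns one string per finding, which keeps the result easy to print in a report or to count as a single signal.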

Install

GitZero is a Python package with a CLI entrypoint named gitzero.

Recommended: install from GitHub with pipx

pipx installs CLI tools into isolated environments.

pipx install git+https://github.com/Ivansost/gitzero.git

Then run:

gitzero help

Install with pip

python -m pip install git+https://github.com/Ivansost/gitzero.git

Local development install

git clone https://github.com/Ivansost/gitzero.git
cd gitzero
python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev,ml,parsing]"

Run the local CLI:

.venv/bin/gitzero help

Usage

Scan a public GitHub repository:

gitzero scan https://github.com/user/project

Scan a local repository:

gitzero scan ./my-local-repo

Print machine-readable JSON:

gitzero scan ./my-local-repo --json
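
To consume the JSON report from another Python script, a small wrapper like the one below may help. It is a sketch under two assumptions: that `gitzero` is on the PATH, and that `--json` prints a single JSON object to stdout. The helper names `build_scan_command` and `scan_to_dict` are hypothetical, and the code makes no assumption about which keys the report contains.

```python
import json
import subprocess


def build_scan_command(target: str) -> list[str]:
    """Build the argument list for `gitzero scan <target> --json`."""
    return ["gitzero", "scan", target, "--json"]


def scan_to_dict(target: str, run=subprocess.run) -> dict:
    """Run a scan and parse its stdout as JSON.

    `run` is injectable so the wrapper can be exercised without the CLI installed.
    Assumption: the report is a single JSON object on stdout.
    """
    result = run(
        build_scan_command(target),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Inspect the returned dictionary's keys before relying on any particular field, since the report schema is not documented here.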

Show every per-file signal:

gitzero scan ./my-local-repo --verbose

Skip git history and score only the current source tree:

gitzero scan ./my-local-repo --no-git-history

Exclude folders or globs:

gitzero scan ./my-local-repo --exclude node_modules --exclude dist

Use the optional experimental ML model:

gitzero scan ./my-local-repo --ml-model ./model.joblib

--ml-model is experimental. Use the probability as a calibration aid next to the heuristic score, not as a standalone authorship claim.

Risk Bands

Band | Range | Meaning
Low | 0-39 | Few signals consistent with AI-assisted code.
Medium | 40-69 | Several signals are elevated. Review the top findings. This is not an AI claim.
High | 70-100 | Many signals are elevated. Inspect history and files closely.
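
The band thresholds in the table can be expressed as a small lookup. This is a sketch of the mapping only, not GitZero's own code; the function name `risk_band` is hypothetical, and treating fractional scores such as 39.5 as belonging to the lower band is an assumption, since the table lists integer ranges.

```python
def risk_band(score: float) -> str:
    """Map a 0-100 score to the triage band from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"
```

For example, `risk_band(39)` gives `"Low"` and `risk_band(40)` gives `"Medium"`.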

Batch And Corpus Workflow

Create a labeled fixture corpus:

gitzero fixtures ./fixtures/gitzero-corpus

Scan a labeled corpus into JSONL:

gitzero batch ./fixtures/gitzero-corpus \
  --labels ./fixtures/gitzero-corpus/labels.csv \
  --format jsonl \
  --output ./fixtures/results.jsonl

Scan a two-level corpus layout:

corpus/
  ai_generated/repo-a
  ai_assisted/repo-b
  human/repo-c
  template/repo-d

gitzero batch ./corpus --recursive --label-from-parent --format jsonl -o corpus.jsonl
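
The idea behind labeling from the parent folder can be sketched as follows: every directory two levels below the corpus root is treated as a repository, and the name of its parent directory becomes its label. The function `labels_from_parent` is a hypothetical helper written for this example; how GitZero resolves labels internally is not shown in this README.

```python
from pathlib import Path


def labels_from_parent(corpus_root: Path) -> dict[str, str]:
    """Map each repo path (relative to the corpus root) to its parent folder name."""
    labels: dict[str, str] = {}
    # First level: label folders such as ai_generated/ or human/.
    for label_dir in sorted(p for p in corpus_root.iterdir() if p.is_dir()):
        # Second level: one directory per repository.
        for repo_dir in sorted(p for p in label_dir.iterdir() if p.is_dir()):
            relative = repo_dir.relative_to(corpus_root).as_posix()
            labels[relative] = label_dir.name
    return labels
```

With the layout above, the mapping would contain entries such as `"human/repo-c": "human"`.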

Batch rows include inspection fields and ML-ready feature columns:

signal.git.large_commits_present
signal.git.large_commits_score
signal.git.large_commits_weight
signal.dampener.git.multi_author_history_score
signal.dampener.static.personal_todo_patterns_score
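
A sketch of turning batch rows into a numeric feature matrix is shown below. It assumes each JSONL line is a flat JSON object whose feature keys start with `signal.` and hold numeric or boolean values, as the column names above suggest; the helper names are hypothetical and the real output may contain other fields or nesting.

```python
import json
from typing import Iterable


def load_rows(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL text, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def feature_columns(rows: list[dict]) -> list[str]:
    """Collect every `signal.*` key seen in any row, in sorted order."""
    return sorted({key for row in rows for key in row if key.startswith("signal.")})


def to_matrix(rows: list[dict], columns: list[str]) -> list[list[float]]:
    """Build one float vector per row; missing or null values become 0.0."""
    return [[float(row.get(col) or 0.0) for col in columns] for row in rows]
```

Typical use is `rows = load_rows(open("corpus.jsonl", encoding="utf-8"))`, followed by `to_matrix(rows, feature_columns(rows))`. Fixing the column order once and reusing it keeps training and inference vectors aligned.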

Evaluation

GitZero currently uses heuristic scoring as the primary product behavior. The ML model is kept optional because the live tests showed the heuristic is more reliable for public scans.

Current validation artifacts:

  • Cleaned labeled corpus: 129 repositories across ai_generated, ai_assisted, human, and template.
  • Grouped cross-validation: grouped by repository owner to reduce leakage.
  • Ablation model without hard evidence: ROC-AUC 0.903, PR-AUC 0.853.
  • Live out-of-corpus smoke test: 60 GitHub repositories.
    • Hard-evidence AI: 17/20 scored High.
    • Human OSS: 0/20 scored High.
    • AI-assisted candidates: mostly Medium/High, with intentionally conservative ML scores.
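
Grouping cross-validation folds by repository owner means that all repositories from one owner land in the same fold, so a model is never tested on an owner it has already seen during training. The standard-library sketch below illustrates the idea with a greedy balancing rule; it is not the project's evaluation code, and the function name `grouped_folds` is hypothetical. In practice, scikit-learn's `GroupKFold` provides the same guarantee.

```python
from collections import defaultdict


def grouped_folds(groups: list[str], n_folds: int) -> list[list[int]]:
    """Assign sample indices to folds so that each group appears in exactly one fold."""
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")

    # Collect the sample indices that belong to each group (e.g. repository owner).
    by_group: dict[str, list[int]] = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)

    folds: list[list[int]] = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest fold,
    # which keeps fold sizes roughly balanced.
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), g)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[group])
    return folds
```

Each fold in the result serves once as the held-out test set, with the remaining folds used for training.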

The main takeaway: GitZero is useful as an explainable review tool. It should not be framed as a definitive detector.

Tech Stack

  • Python package with a gitzero console script.
  • Typer for CLI commands.
  • Rich for terminal UI.
  • PyDriller plus git fallback for history analysis.
  • radon for complexity metrics.
  • Optional scikit-learn / joblib model loading for experimental ML probability.

Development

python -m pip install -e ".[dev,ml,parsing]"
python -m pytest
python -m ruff check .

Demo

Full demo, screenshots, and technical write-up:

GitZero project write-up and demo

Replace the demo link with your portfolio URL after publishing the write-up.
