repogerbil

License Python 3.11+ uv Ruff CI Mutation

Git history documentation and consolidation tool.

Turns messy git histories into clean, documented daily commits by combining changelog generation with commit consolidation. Use it to produce per-day YAML changelog records, distill a noisy branch in place, or emit an entirely fresh repo with a clean derived history (private→public, monorepo→public, ecosystem→single timeline).

Install

pip install repogerbil
# or
uv add repogerbil

# Run without a permanent install
uvx --from repogerbil gerbil --help

# Optional: vector database for semantic search
pip install repogerbil[vectordb]

Quick Start

# See what's in a repo
gerbil status /path/to/repo

# Generate a changelog for a specific date
gerbil changelog /path/to/repo --date 2026-04-07 --analyze

# Generate an LLM prompt with diffs
gerbil changelog /path/to/repo --date 2026-04-07 --prompt

# Generate a release-span changelog prompt
gerbil changelog-span /path/to/repo --from v0.3.21 --to v0.4.0 --output prompt.md

# Audit commit message quality
gerbil audit /path/to/repo --show-bad

# Verify changelog accuracy
gerbil verify /path/to/changelogs /path/to/repo

# Fix stats to match git truth
gerbil fix-stats /path/to/changelogs /path/to/repo

# Enrich changelogs with per-section stats + impact
gerbil enrich /path/to/changelogs /path/to/repo --depth package

# Generate weekly summary
gerbil summary /path/to/changelogs --year 2026 --week 15

# Show missing changelog dates across all tracked repos
gerbil missing /path/to/changelogs --config .repogerbil.toml

# Backfill all missing changelogs
gerbil backfill /path/to/changelogs --config .repogerbil.toml

# Preview a distill
gerbil distill /path/to/repo --dry-run

# Distill with changelog-based commit messages
gerbil distill /path/to/repo --changelog-dir /path/to/changelogs

# Inspect a source repo before distilling — surface artifacts to exclude
gerbil preflight /path/to/repo

# Emit ready-to-paste --exclude-path flags for gerbil snapshot
gerbil preflight /path/to/repo --emit-flags

# Create an independent distilled snapshot repo
gerbil snapshot /path/to/source /path/to/dest \
  --cadence gap:15m \
  --exclude-path '^\.claude(/|$)' \
  --exclude-path '\.lock$' \
  --time-window-start 20:00 \
  --time-window-end 00:00 \
  --timezone America/Los_Angeles

# Merge multiple repos into one ecosystem-labeled distilled snapshot
gerbil multi-snapshot /path/to/dest \
  --repo api:/path/to/api \
  --repo web:/path/to/web \
  --ecosystem-label my-platform \
  --timezone America/Los_Angeles

# Index changelogs for semantic search (requires vectordb extra)
gerbil index /path/to/changelogs

# Semantic search across all changelogs
gerbil search "security hardening" --top 5

# Find related cross-repo work
gerbil related provide-telemetry --date 2026-04-07

# Find similar file-change history from path signatures
gerbil similar src/repogerbil/cli/main.py tests/cli/test_main.py --top 5

# Search likely impact context from indexed path/diff history
gerbil impact "src/repogerbil/cli/main.py" --source filepaths --top 5
gerbil impact "retry backoff" --source diffs --top 5

# Record missing commit metadata in sidecar records after history changes
gerbil catch-up /path/to/repo /path/to/repo.summaries.jsonl
gerbil realign /path/to/repo /path/to/repo.summaries.jsonl

Global Options

--verbose is a group-level flag with no short form, because -v is reserved for per-command use (for example, preflight -v). Pass it before the subcommand to enable INFO-level logging from repogerbil.* loggers:

gerbil --verbose snapshot /path/to/source /path/to/dest --cadence daily

Without --verbose, logging defaults to WARNING level.
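
As a rough illustration of what the flag controls, the sketch below sets the level on the repogerbil logger namespace using the standard logging module. The configure_logging helper and the repogerbil.snapshot logger name are hypothetical and are not taken from the package's source.

```python
import logging


def configure_logging(verbose: bool) -> logging.Logger:
    """Hypothetical helper: INFO when verbose, WARNING otherwise."""
    logger = logging.getLogger("repogerbil")
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    return logger


configure_logging(verbose=True)
# Child loggers (name assumed for illustration) inherit the parent's level.
child = logging.getLogger("repogerbil.snapshot")
print(child.isEnabledFor(logging.INFO))
```

Setting the level on the parent logger is enough here because loggers named repogerbil.* fall back to it unless they set a level of their own.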

Commands

Command What it does
status Show repo info: active dates, date range
changelog Generate changelog YAML (draft, analyze, or prompt mode)
changelog-span Generate a release-span prompt or synthesized changelog for from..to
fix-stats Correct changelog stats to match git truth
verify Check stats accuracy + file coverage
enrich Add per-section stats + import impact to changelogs
audit Report commit message prefix adoption
preflight Scan a source repo — classify committed files as artifact/source/unknown, emit exclude flags
snapshot Create an independent repo with distilled history
multi-snapshot Merge multiple source repos into one distilled snapshot
distill Consolidate commits into daily/weekly groups (same repo, destructive)
distill-ecosystem Distill multiple repos in parallel with conventional commits
preview Rich table preview of what distillation would produce
export-cadence Export cadence-grouped commits as JSON
probe Probe candidate commit sources for a repo/date pair
summary Generate weekly cross-repo summary
missing Show missing changelog dates across tracked repos
backfill Batch generate changelogs for all missing dates
catch-up Record missing HEAD commit metadata to a .summaries.jsonl sidecar
append Legacy alias for catch-up
realign Re-key legacy .summaries.jsonl records to current local commit SHAs
lint Validate changelog YAML files against schema
plugin Export or install bundled assistant plugin files
index Index changelogs into vector database (requires [vectordb])
search Semantic search across changelogs (requires [vectordb])
related Find related work in other repos (requires [vectordb])
similar Find changelogs that touched similar file paths (requires [vectordb])
impact Search filepath/diff history for impact context (requires [vectordb])

Snapshot Workflow

snapshot creates an entirely independent destination repo with a clean, distilled history derived from the source. The source is never modified.

# 1. Inspect the source repo — see what would be excluded
gerbil preflight /path/to/source
gerbil preflight /path/to/source -v   # also show source files
gerbil preflight /path/to/source --emit-flags  # print ready-to-paste flags

# 2. Create the snapshot
gerbil snapshot /path/to/source /path/to/dest \
  --cadence gap:15m \
  --exclude-path '__pycache__' \
  --exclude-path '(poetry|yarn|Pipfile|Gemfile|Cargo|composer|packages|uv)\.lock$' \
  --exclude-path '^\.claude(/|$)' \
  --time-window-start 20:00 \
  --time-window-end 00:00 \
  --timezone America/Los_Angeles

--exclude-path

Each pattern is a full Python regex applied with re.search(), so it matches anywhere in the path unless anchored. Matched paths are stripped from every committed tree. Repeatable.

Pattern Excludes
__pycache__ All __pycache__ dirs
\.lock$ All lock files
^\.claude(/|$) The top-level .claude directory
^mutants/ Mutation testing output
\.bak$ Stale backup files
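
Because the patterns are matched with re.search(), anchoring matters. The sketch below checks a few example paths against the patterns from the table. The is_excluded helper and the sample paths are illustrative assumptions, not repogerbil's own code.

```python
import re

# Patterns from the table above.
EXCLUDE_PATTERNS = [
    r"__pycache__",
    r"\.lock$",
    r"^\.claude(/|$)",
    r"^mutants/",
    r"\.bak$",
]


def is_excluded(path: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """Hypothetical helper: True if any pattern matches somewhere in the path."""
    return any(re.search(pattern, path) for pattern in patterns)


for path in [
    "src/pkg/__pycache__/mod.cpython-311.pyc",  # unanchored: matches at any depth
    "uv.lock",                                  # ends with .lock
    ".claude/settings.json",                    # top-level .claude directory
    "docs/.claude/notes.md",                    # not excluded: ^ anchors to the path start
    ".claudeignore",                            # not excluded: needs "/" or end after .claude
    "go.sum",                                   # not excluded: not a .lock suffix
]:
    print(f"{path}: {is_excluded(path)}")
```

Unanchored patterns such as __pycache__ match at any depth, while ^-anchored ones only match from the repository root.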

--time-window-start / --time-window-end

Spread snapshot commits across a daily time window (HH:MM format). Commits are spaced in proportion to their number of changed files, with random jitter so the reconstructed history looks organic. Requires --timezone. Mutually exclusive with --commit-time.

--time-window-start 20:00 --time-window-end 00:00 --timezone America/Los_Angeles
# 3 commits on 2026-04-10 land at e.g. 20:14, 21:47, 23:22

Windows crossing midnight are supported (23:00→01:00).
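
One way to picture the spacing is the sketch below. It gives each commit a slice of the window sized by its changed-file count and picks a seeded random point inside that slice. This is an assumed algorithm for illustration only: spread_commits and its parameters are hypothetical, and timezone handling is left out for brevity.

```python
import random
from datetime import date, datetime, time, timedelta


def spread_commits(day, start, end, changed_files, seed=0):
    """Hypothetical sketch: place one timestamp per commit inside the window."""
    window_start = datetime.combine(day, start)
    window_end = datetime.combine(day, end)
    if window_end <= window_start:
        # The window crosses midnight, so the end falls on the next day.
        window_end += timedelta(days=1)
    window = window_end - window_start

    weights = [max(n, 1) for n in changed_files]  # avoid zero-width slices
    total = sum(weights)
    rng = random.Random(seed)  # fixed seed gives repeatable jitter

    stamps = []
    slot_start = window_start
    for weight in weights:
        slot = window * (weight / total)               # slice proportional to changed files
        stamps.append(slot_start + slot * rng.random())  # jitter within the slice
        slot_start += slot
    return stamps


for stamp in spread_commits(date(2026, 4, 10), time(20, 0), time(0, 0), [3, 10, 2]):
    print(stamp.strftime("%Y-%m-%d %H:%M"))
```

Keeping each commit inside its own slice preserves commit order, and the fixed seed keeps repeated runs identical.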

Preflight artifact categories

preflight classifies every committed file path against known artifact patterns:

Category Examples
Python bytecode __pycache__/, .pyc, .pyo, .pytest_cache, .mypy_cache
Lock files poetry.lock, yarn.lock, go.sum, go.mod, package-lock.json
Build artifacts dist/, build/, .egg-info/, .so, .zip
Generated stubs .pyi
Mutation testing mutants/, .meta
Backup files .bak
Coverage reports htmlcov/, .coverage, cov.xml, coverage.xml
AI tool configs .claude/, .codex/, .cursor/, .aider/, .continue/
IDE configs .idea/, .vscode/
VCS meta CODEOWNERS
Ephemeral docs HANDOFF.md, SCRATCH.md, NOTES.md, .provide/
Tool configs .python-version, .actrc, .pyre_configuration
Vendored deps vendor/, node_modules/
Binary fixtures .msgpack
OS noise .DS_Store, Thumbs.db
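
The classification can be pictured as a category-to-pattern lookup. The sketch below covers a small subset of the categories above. The regexes and the classify helper are assumptions written for illustration; the patterns preflight actually uses may differ.

```python
import re

# Illustrative subset only; these regexes are assumptions, not preflight's own.
ARTIFACT_PATTERNS = {
    "Python bytecode": [r"(^|/)__pycache__/", r"\.py[co]$"],
    "Lock files": [r"(^|/)(poetry|yarn)\.lock$", r"(^|/)package-lock\.json$"],
    "AI tool configs": [r"^\.(claude|codex|cursor)/"],
    "OS noise": [r"(^|/)\.DS_Store$", r"(^|/)Thumbs\.db$"],
}


def classify(path: str):
    """Return the first matching artifact category, or None if nothing matches."""
    for category, patterns in ARTIFACT_PATTERNS.items():
        if any(re.search(pattern, path) for pattern in patterns):
            return category
    return None


for path in ["src/__pycache__/a.pyc", "web/yarn.lock", ".claude/settings.json", "src/main.py"]:
    print(f"{path}: {classify(path)}")
```

A path that matches no category comes back as None, leaving the source/unknown decision to whatever logic sits on top.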

Changelog Modes

  • Draft (default): Skeleton with Draft: placeholders, commit subjects as points
  • Analyze (--analyze): Complete changelog with real titles, summaries, grouped sections
  • Prompt (--prompt): LLM-ready markdown with diffs for external analysis

Reproducibility

  • Non-LLM workflows are deterministic and reproducible for the same inputs/config.
  • Snapshot time-window jitter is deterministic by default (stable seeded output).
  • LLM-generated commit messages are the only intentionally non-deterministic surface.

Vector Database

With pip install repogerbil[vectordb], changelogs are indexed into 4 ChromaDB collections:

Collection What it stores
changelogs Title + summary embeddings with repo/date/stats/category/quality metadata
changes Per-section title + point embeddings with category, severity, scope metadata
filepaths Space-joined file paths per changelog
diffs Optional per-file diff chunks when indexing with source repos

Those collections support 7 practical search facets:

Dimension What it enables
Title + summary Semantic search across repos
Change sections Per-section search, category filtering
File paths "What else changed when login.py was modified?"
Diff content (opt-in) Code-level semantic search
Category distribution Work pattern matching
Scopes search --scope parity across all repos
Quality metrics Surface changelogs needing the most work

Configuration

Create .repogerbil.toml in your project root:

cadence = "daily"
message_depth = "subject"       # subject | refs | full
backfill_depth = "heuristic"    # heuristic | thorough
tolerance = 20                  # verify_stats % tolerance

llm_ollama_url = "http://localhost:11434"
llm_model = "qwen3-coder-next:q8_0"
llm_temperature = 0.0
llm_timeout_seconds = 120.0
llm_concurrency = 1
llm_refine = false               # when true, snapshot/multi-snapshot auto-enable LLM refinement

[[file_rules]]
pattern = "*.lock"
action = "bulk"
category = "baseline"
reason = "Lock file update"

[[file_rules]]
pattern = "*.pyc"
action = "skip"

[repos.my-important-repo]
backfill_depth = "thorough"
message_depth = "refs"
skip_dates = ["2026-04-01"]

[tracked]
uwarp-space = "/path/to/uwarp-space"
provide-telemetry = "/path/to/provide-telemetry"

Resolution order: CLI flags > env vars (REPOGERBIL_*) > walked .repogerbil.toml > ~/.config/repogerbil/config.toml > defaults
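
The precedence chain can be modelled with collections.ChainMap, which returns the first value found when searching its mappings left to right. This is a minimal sketch of the ordering only. The resolve helper is hypothetical, the layers are assumed to be parsed into plain dicts already, and the DEFAULTS values are copied from the sample config rather than confirmed package defaults.

```python
from collections import ChainMap

# Assumed defaults for illustration (values taken from the sample config above).
DEFAULTS = {"cadence": "daily", "message_depth": "subject", "tolerance": 20}


def resolve(cli, env, project_toml, user_toml):
    """Hypothetical helper: earlier layers win over later ones."""
    return ChainMap(cli, env, project_toml, user_toml, DEFAULTS)


settings = resolve(
    cli={"cadence": "weekly"},
    env={"cadence": "daily", "tolerance": 10},
    project_toml={"message_depth": "refs", "tolerance": 5},
    user_toml={"message_depth": "full"},
)
print(settings["cadence"], settings["tolerance"], settings["message_depth"])
# weekly 10 refs
```

Each key is resolved independently, so a CLI flag for one setting does not hide environment or file values for the others.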

Vocabulary

Category Conventional Description
instantiate feat New capability or feature
remediate fix Bug fix
decouple refactor Reduce coupling, improve modularity
deprecate remove Retire dead/unused code
interface feat Define connections between subsystems
specify docs Documentation, specs
qualify test Tests, verification
margin fix Add buffer/slack (timeouts, limits)
harden fix Resist failure/attack (validation, retries)
streamline perf Performance optimization
baseline chore Dependencies, config, environment

Severity Semver Description
architectural major Breaking change or foundational redesign
behavioral minor Observable behavior change
internal patch Implementation detail only
errata Cosmetic, near-invisible
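
The two tables read naturally as lookup maps. The sketch below encodes them and shows one possible use: building a conventional-commit subject and picking the largest semver bump from a set of severities. The conventional_subject and release_bump helpers are hypothetical, and errata is mapped to None because the table lists no semver level for it.

```python
CATEGORY_TO_CONVENTIONAL = {
    "instantiate": "feat", "remediate": "fix", "decouple": "refactor",
    "deprecate": "remove", "interface": "feat", "specify": "docs",
    "qualify": "test", "margin": "fix", "harden": "fix",
    "streamline": "perf", "baseline": "chore",
}

# errata has no semver level in the table, so it maps to None here.
SEVERITY_TO_SEMVER = {
    "architectural": "major", "behavioral": "minor",
    "internal": "patch", "errata": None,
}

BUMP_ORDER = ["patch", "minor", "major"]


def conventional_subject(category, summary, scope=None):
    """Hypothetical helper: format a conventional-commit subject line."""
    prefix = CATEGORY_TO_CONVENTIONAL[category]
    if scope:
        prefix = f"{prefix}({scope})"
    return f"{prefix}: {summary}"


def release_bump(severities):
    """Hypothetical helper: largest semver bump implied by the severities."""
    bumps = [SEVERITY_TO_SEMVER[s] for s in severities]
    bumps = [b for b in bumps if b is not None]
    return max(bumps, key=BUMP_ORDER.index, default=None)


print(conventional_subject("harden", "validate retry limits", scope="cli"))
print(release_bump(["internal", "behavioral", "errata"]))
```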

File Rules

Control how files are handled during --analyze:

  • bulk: Count toward bulk entries, remove from detailed changes
  • skip: Ignore entirely (not in stats, bulk, or changes)
  • classify: Keep in changes but force a specific category
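
The three actions can be sketched as a small routing function. The example below reuses the two rules from the sample configuration and assumes glob matching against the file's basename with the first matching rule winning. Both are assumptions, and apply_rules is a hypothetical helper rather than repogerbil's implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The two rules from the sample .repogerbil.toml above.
FILE_RULES = [
    {"pattern": "*.lock", "action": "bulk", "category": "baseline", "reason": "Lock file update"},
    {"pattern": "*.pyc", "action": "skip"},
]


def apply_rules(paths, rules=FILE_RULES):
    """Hypothetical helper: route each path by its first matching rule."""
    result = {"changes": [], "bulk": [], "skipped": []}
    for path in paths:
        name = PurePosixPath(path).name  # assumption: match on the basename
        rule = next((r for r in rules if fnmatchcase(name, r["pattern"])), None)
        if rule is None:
            result["changes"].append((path, None))  # no rule: keep, category undecided
        elif rule["action"] == "skip":
            result["skipped"].append(path)          # left out of stats, bulk, and changes
        elif rule["action"] == "bulk":
            result["bulk"].append(path)             # counted in bulk, not detailed
        else:
            # classify: keep in changes with a forced category
            result["changes"].append((path, rule["category"]))
    return result


print(apply_rules(["uv.lock", "src/app.py", "src/__pycache__/app.cpython-311.pyc"]))
```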

AI Plugin Integration

repogerbil ships a shared plugin at plugins/repogerbil/ with:

  • Skill (gerbil): Context-aware changelog and history management
  • Agent (analyzer): Deep diff analysis for thorough changelog generation
  • Claude manifest: plugins/repogerbil/.claude-plugin/plugin.json
  • Codex manifest: plugins/repogerbil/.codex-plugin/plugin.json

For Claude Code development and testing:

claude --plugin-dir ./plugins

Codex uses the same shared plugin directory, with local marketplace metadata in .agents/plugins/marketplace.json.

To install the bundled plugin files from an installed package:

# Codex: writes plugin files into ~/.codex/plugins/repogerbil and marketplace metadata into ~/.agents/plugins/marketplace.json
uvx --from repogerbil gerbil plugin install --target codex

# Claude Code: writes into ./plugins/repogerbil and ./plugins/.claude-plugin/marketplace.json from the current directory
uvx --from repogerbil gerbil plugin install --target claude

Development

uv sync --all-extras
make quality          # Run all quality gates
make test             # Run tests (100% coverage required)
make lint             # ruff format + check
make type-check       # mypy strict
make security         # bandit
make complexity       # xenon
make dead-code        # vulture
make mutation         # mutmut

License

Apache-2.0 — © 2026 provide.io llc. See REUSE.toml for SPDX metadata.
