repogerbil
Git history documentation and consolidation tool.
Turns messy git histories into clean, documented daily commits by combining changelog generation with commit consolidation. Use it to produce per-day YAML changelog records, distill a noisy branch in place, or emit an entirely fresh repo with a clean derived history (private→public, monorepo→public, ecosystem→single timeline).
- Source: https://github.com/livingstaccato/repogerbil
- Issues: https://github.com/livingstaccato/repogerbil/issues
- Releases: https://github.com/livingstaccato/repogerbil/releases
- Changelog: CHANGELOG.md
- Architecture: docs/ARCHITECTURE.md
- Configuration: docs/CONFIGURATION.md
- Schema reference: docs/SCHEMA.md
- Vocabulary: docs/VOCABULARY.md
- Vector DB design: docs/VECTOR-DB-DESIGN.md
- Assistant integration: docs/INTEGRATION.md
Install
pip install repogerbil
# or
uv add repogerbil
# Run without a permanent install
uvx --from repogerbil gerbil --help
# Optional: vector database for semantic search
pip install repogerbil[vectordb]
Quick Start
# See what's in a repo
gerbil status /path/to/repo
# Generate a changelog for a specific date
gerbil changelog /path/to/repo --date 2026-04-07 --analyze
# Generate an LLM prompt with diffs
gerbil changelog /path/to/repo --date 2026-04-07 --prompt
# Generate a release-span changelog prompt
gerbil changelog-span /path/to/repo --from v0.3.21 --to v0.4.0 --output prompt.md
# Audit commit message quality
gerbil audit /path/to/repo --show-bad
# Verify changelog accuracy
gerbil verify /path/to/changelogs /path/to/repo
# Fix stats to match git truth
gerbil fix-stats /path/to/changelogs /path/to/repo
# Enrich changelogs with per-section stats + impact
gerbil enrich /path/to/changelogs /path/to/repo --depth package
# Generate weekly summary
gerbil summary /path/to/changelogs --year 2026 --week 15
# Show missing changelog dates across all tracked repos
gerbil missing /path/to/changelogs --config .repogerbil.toml
# Backfill all missing changelogs
gerbil backfill /path/to/changelogs --config .repogerbil.toml
# Preview a distill
gerbil distill /path/to/repo --dry-run
# Distill with changelog-based commit messages
gerbil distill /path/to/repo --changelog-dir /path/to/changelogs
# Inspect a source repo before distilling — surface artifacts to exclude
gerbil preflight /path/to/repo
# Emit ready-to-paste --exclude-path flags for gerbil snapshot
gerbil preflight /path/to/repo --emit-flags
# Create an independent distilled snapshot repo
gerbil snapshot /path/to/source /path/to/dest \
--cadence gap:15m \
--exclude-path '^\.claude(/|$)' \
--exclude-path '\.lock$' \
--time-window-start 20:00 \
--time-window-end 00:00 \
--timezone America/Los_Angeles
# Merge multiple repos into one ecosystem-labeled distilled snapshot
gerbil multi-snapshot /path/to/dest \
--repo api:/path/to/api \
--repo web:/path/to/web \
--ecosystem-label my-platform \
--timezone America/Los_Angeles
# Index changelogs for semantic search (requires vectordb extra)
gerbil index /path/to/changelogs
# Semantic search across all changelogs
gerbil search "security hardening" --top 5
# Find related cross-repo work
gerbil related provide-telemetry --date 2026-04-07
# Find similar file-change history from path signatures
gerbil similar src/repogerbil/cli/main.py tests/cli/test_main.py --top 5
# Search likely impact context from indexed path/diff history
gerbil impact "src/repogerbil/cli/main.py" --source filepaths --top 5
gerbil impact "retry backoff" --source diffs --top 5
# Record missing commit metadata in sidecar records after history changes
gerbil catch-up /path/to/repo /path/to/repo.summaries.jsonl
gerbil realign /path/to/repo /path/to/repo.summaries.jsonl
Global Options
--verbose is a group-level flag (no short form — -v is reserved for per-command use such as preflight -v). Pass it at the group level, before the subcommand, to enable INFO-level logging from repogerbil.* loggers:
gerbil --verbose snapshot /path/to/source /path/to/dest --cadence daily
Without --verbose, logging defaults to WARNING level.
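As a rough illustration of what the flag implies, the sketch below sets the level on a parent repogerbil logger so that child loggers inherit it. The helper name `configure_logging` is hypothetical and this is not taken from repogerbil's source; only the logger namespace and the INFO/WARNING levels come from the text above.

```python
import logging


def configure_logging(verbose: bool) -> logging.Logger:
    """Hypothetical helper: INFO when --verbose is given, WARNING otherwise."""
    parent = logging.getLogger("repogerbil")
    parent.setLevel(logging.INFO if verbose else logging.WARNING)
    return parent


configure_logging(verbose=True)

# Child loggers such as repogerbil.snapshot have no level of their own,
# so they pick up the effective level of the parent logger.
snapshot_log = logging.getLogger("repogerbil.snapshot")
info_enabled = snapshot_log.isEnabledFor(logging.INFO)
```

Setting the level once on the parent is what makes a single group-level flag sufficient: no subcommand needs its own logging setup.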
Commands
| Command | What it does |
|---|---|
| `status` | Show repo info: active dates, date range |
| `changelog` | Generate changelog YAML (draft, analyze, or prompt mode) |
| `changelog-span` | Generate a release-span prompt or synthesized changelog for from..to |
| `fix-stats` | Correct changelog stats to match git truth |
| `verify` | Check stats accuracy + file coverage |
| `enrich` | Add per-section stats + import impact to changelogs |
| `audit` | Report commit message prefix adoption |
| `preflight` | Scan a source repo: classify committed files as artifact/source/unknown, emit exclude flags |
| `snapshot` | Create an independent repo with distilled history |
| `multi-snapshot` | Merge multiple source repos into one distilled snapshot |
| `distill` | Consolidate commits into daily/weekly groups (same repo, destructive) |
| `distill-ecosystem` | Distill multiple repos in parallel with conventional commits |
| `preview` | Rich table preview of what distillation would produce |
| `export-cadence` | Export cadence-grouped commits as JSON |
| `probe` | Probe candidate commit sources for a repo/date pair |
| `summary` | Generate weekly cross-repo summary |
| `missing` | Show missing changelog dates across tracked repos |
| `backfill` | Batch generate changelogs for all missing dates |
| `catch-up` | Record missing HEAD commit metadata to a .summaries.jsonl sidecar |
| `append` | Legacy alias for catch-up |
| `realign` | Re-key legacy .summaries.jsonl records to current local commit SHAs |
| `lint` | Validate changelog YAML files against schema |
| `plugin` | Export or install bundled assistant plugin files |
| `index` | Index changelogs into vector database (requires [vectordb]) |
| `search` | Semantic search across changelogs (requires [vectordb]) |
| `related` | Find related work in other repos (requires [vectordb]) |
| `similar` | Find changelogs that touched similar file paths (requires [vectordb]) |
| `impact` | Search filepath/diff history for impact context (requires [vectordb]) |
Snapshot Workflow
snapshot creates an entirely independent destination repo with a clean, distilled history derived from the source. The source is never modified.
# 1. Inspect the source repo — see what would be excluded
gerbil preflight /path/to/source
gerbil preflight /path/to/source --verbose # also show source files
gerbil preflight /path/to/source --emit-flags # print ready-to-paste flags
# 2. Create the snapshot
gerbil snapshot /path/to/source /path/to/dest \
--cadence gap:15m \
--exclude-path '__pycache__' \
--exclude-path '(poetry|yarn|Pipfile|Gemfile|Cargo|composer|packages|uv)\.lock$' \
--exclude-path '^\.claude(/|$)' \
--time-window-start 20:00 \
--time-window-end 00:00 \
--timezone America/Los_Angeles
--exclude-path
Full Python re.search() regex. Matched paths are stripped from every committed tree. Repeatable.
| Pattern | Excludes |
|---|---|
| `__pycache__` | All `__pycache__` dirs |
| `\.lock$` | All lock files |
| `^\.claude(/\|$)` | The top-level .claude directory |
| `^mutants/` | Mutation testing output |
| `\.bak$` | Stale backup files |
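Because the patterns are applied with re.search(), an unanchored pattern matches anywhere in the path, while `^` and `$` pin it to the start or end. The sketch below shows that behaviour with plain Python; the helper `is_excluded` and the sample paths are made up for illustration and are not part of repogerbil.

```python
import re

# Patterns copied from the examples in this README.
EXCLUDE_PATTERNS = [
    r"__pycache__",      # unanchored: matches at any depth
    r"\.lock$",          # anchored to the end of the path
    r"^\.claude(/|$)",   # anchored to the repo root
    r"^mutants/",
    r"\.bak$",
]


def is_excluded(path: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """Hypothetical helper: True if any pattern is found in the path."""
    return any(re.search(pattern, path) for pattern in patterns)


sample_paths = [
    "src/pkg/__pycache__/mod.cpython-312.pyc",
    "uv.lock",
    ".claude/settings.json",
    "docs/.claude/notes.md",   # kept: ^ only matches at the repo root
    "src/lockfile.py",         # kept: ".lock" is not at the end
    "src/main.py",
]
kept = [path for path in sample_paths if not is_excluded(path)]
```

Testing patterns this way before a run is cheap, and it catches the common mistake of forgetting to escape the dot or anchor the pattern.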
--time-window-start / --time-window-end
Spread snapshot commits across a daily time window (HH:MM format). Commits are spaced in proportion to the number of changed files, with jitter applied so that the reconstructed history looks organic. Requires --timezone. Mutually exclusive with --commit-time.
--time-window-start 20:00 --time-window-end 00:00 --timezone America/Los_Angeles
# 3 commits on 2026-04-10 land at e.g. 20:14, 21:47, 23:22
Windows crossing midnight are supported (23:00–01:00).
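One way to picture the spacing is sketched below. It is only an interpretation of the description above, not repogerbil's actual algorithm: each commit gets a slice of the window proportional to its changed-file count, and a seeded random offset places the commit inside its slice. The function name `spread_commits` and its parameters are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from random import Random


def spread_commits(day, start, end, tz, file_counts, seed=0):
    """Return one ordered timestamp per commit inside the daily window."""
    window_start = datetime.combine(day, start, tzinfo=tz)
    window_end = datetime.combine(day, end, tzinfo=tz)
    if window_end <= window_start:
        # e.g. 20:00 -> 00:00 or 23:00 -> 01:00: the window ends the next day
        window_end += timedelta(days=1)
    total_seconds = (window_end - window_start).total_seconds()

    weights = [max(count, 1) for count in file_counts]  # avoid zero-width slices
    total_weight = sum(weights)
    rng = Random(seed)  # fixed seed: same timestamps on every run

    stamps = []
    cursor = 0.0
    for weight in weights:
        slice_seconds = total_seconds * weight / total_weight
        # Stay in the middle half of the slice so commits never reorder.
        offset = cursor + slice_seconds * rng.uniform(0.25, 0.75)
        stamps.append(window_start + timedelta(seconds=offset))
        cursor += slice_seconds
    return stamps


# Three commits touching 3, 1, and 8 files, spread over 20:00-00:00 at UTC-7.
stamps = spread_commits(
    date(2026, 4, 10), time(20, 0), time(0, 0),
    timezone(timedelta(hours=-7)), [3, 1, 8],
)
```

A zoneinfo.ZoneInfo("America/Los_Angeles") object can be passed as `tz` instead of a fixed offset. Seeding the generator keeps the jitter repeatable across runs.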
Preflight artifact categories
preflight classifies every committed file path against known artifact patterns:
| Category | Examples |
|---|---|
| Python bytecode | __pycache__/, .pyc, .pyo, .pytest_cache, .mypy_cache |
| Lock files | poetry.lock, yarn.lock, go.sum, go.mod, package-lock.json |
| Build artifacts | dist/, build/, .egg-info/, .so, .zip |
| Generated stubs | .pyi |
| Mutation testing | mutants/, .meta |
| Backup files | .bak |
| Coverage reports | htmlcov/, .coverage, cov.xml, coverage.xml |
| AI tool configs | .claude/, .codex/, .cursor/, .aider/, .continue/ |
| IDE configs | .idea/, .vscode/ |
| VCS meta | CODEOWNERS |
| Ephemeral docs | HANDOFF.md, SCRATCH.md, NOTES.md, .provide/ |
| Tool configs | .python-version, .actrc, .pyre_configuration |
| Vendored deps | vendor/, node_modules/ |
| Binary fixtures | .msgpack |
| OS noise | .DS_Store, Thumbs.db |
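The classification step can be approximated with a category-to-regex table. The sketch below covers only a few of the categories listed above, and the regexes are my own rendering of the examples rather than the patterns preflight actually uses; `classify` is a hypothetical name.

```python
import re
from typing import Optional

# A small, illustrative subset of the categories in the table above.
ARTIFACT_PATTERNS = {
    "Python bytecode": [
        r"(^|/)__pycache__/",
        r"\.py[co]$",
        r"(^|/)\.(pytest|mypy)_cache(/|$)",
    ],
    "Lock files": [
        r"(^|/)(poetry\.lock|yarn\.lock|go\.sum|go\.mod|package-lock\.json)$",
    ],
    "IDE configs": [r"(^|/)\.(idea|vscode)/"],
    "OS noise": [r"(^|/)(\.DS_Store|Thumbs\.db)$"],
}


def classify(path: str) -> Optional[str]:
    """Return the first matching artifact category, or None if nothing matches."""
    for category, patterns in ARTIFACT_PATTERNS.items():
        if any(re.search(pattern, path) for pattern in patterns):
            return category
    return None
```

Paths that return None here would still need the source/unknown distinction that preflight reports; that part is not modelled in this sketch.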
Changelog Modes
- Draft (default): Skeleton with `Draft:` placeholders, commit subjects as points
- Analyze (`--analyze`): Complete changelog with real titles, summaries, grouped sections
- Prompt (`--prompt`): LLM-ready markdown with diffs for external analysis
Reproducibility
- Non-LLM workflows are deterministic and reproducible for the same inputs/config.
- Snapshot time-window jitter is deterministic by default (stable seeded output).
- LLM-generated commit messages are the only intentionally non-deterministic surface.
Vector Database
With the vectordb extra installed (pip install repogerbil[vectordb]), gerbil index stores changelogs in 4 ChromaDB collections:
| Collection | What it stores |
|---|---|
| `changelogs` | Title + summary embeddings with repo/date/stats/category/quality metadata |
| `changes` | Per-section title + point embeddings with category, severity, scope metadata |
| `filepaths` | Space-joined file paths per changelog |
| `diffs` | Optional per-file diff chunks when indexing with source repos |
Those collections support 7 practical search facets:
| Dimension | What it enables |
|---|---|
| Title + summary | Semantic search across repos |
| Change sections | Per-section search, category filtering |
| File paths | "What else changed when login.py was modified?" |
| Diff content (opt-in) | Code-level semantic search |
| Category distribution | Work pattern matching |
| Scopes | search --scope parity across all repos |
| Quality metrics | Surface changelogs needing the most work |
Configuration
Create .repogerbil.toml in your project root:
cadence = "daily"
message_depth = "subject" # subject | refs | full
backfill_depth = "heuristic" # heuristic | thorough
tolerance = 20 # verify_stats % tolerance
llm_ollama_url = "http://localhost:11434"
llm_model = "qwen3-coder-next:q8_0"
llm_temperature = 0.0
llm_timeout_seconds = 120.0
llm_concurrency = 1
llm_refine = false # when true, snapshot/multi-snapshot auto-enable LLM refinement
[[file_rules]]
pattern = "*.lock"
action = "bulk"
category = "baseline"
reason = "Lock file update"
[[file_rules]]
pattern = "*.pyc"
action = "skip"
[repos.my-important-repo]
backfill_depth = "thorough"
message_depth = "refs"
skip_dates = ["2026-04-01"]
[tracked]
uwarp-space = "/path/to/uwarp-space"
provide-telemetry = "/path/to/provide-telemetry"
Resolution order: CLI flags > env vars (REPOGERBIL_*) > walked .repogerbil.toml > ~/.config/repogerbil/config.toml > defaults
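The precedence chain can be modelled as a stack of dictionaries where the first layer that defines a key wins. The sketch below is an assumption-laden illustration, not repogerbil's loader: the helper names are hypothetical, "walked" is read as searching the current directory and then its parents, and the REPOGERBIL_CADENCE to cadence mapping is a guess at how env vars map to keys. Parsing the TOML files is left out.

```python
from collections import ChainMap
from pathlib import Path
from typing import Mapping, Optional


def find_project_config(start: Path, name: str = ".repogerbil.toml") -> Optional[Path]:
    """Walk from `start` up to the filesystem root; return the first config found."""
    for directory in [start, *start.parents]:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def env_layer(environ: Mapping[str, str], prefix: str = "REPOGERBIL_") -> dict:
    """Assumed mapping: REPOGERBIL_CADENCE -> cadence (values stay strings)."""
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def resolve(cli, environ, project, user, defaults) -> ChainMap:
    """Stack the layers so that lookups hit the highest-priority source first."""
    cli_set = {key: value for key, value in cli.items() if value is not None}
    return ChainMap(cli_set, env_layer(environ), project, user, defaults)


settings = resolve(
    cli={"cadence": None, "tolerance": 5},          # --cadence not passed
    environ={"REPOGERBIL_CADENCE": "weekly"},
    project={"cadence": "daily", "message_depth": "refs"},
    user={"message_depth": "full"},
    defaults={"cadence": "daily", "message_depth": "subject", "tolerance": 20},
)
```

Here `settings["cadence"]` comes from the environment, `settings["tolerance"]` from the CLI, and `settings["message_depth"]` from the project file, which mirrors the order stated above.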
Vocabulary
| Category | Conventional | Description |
|---|---|---|
| `instantiate` | feat | New capability or feature |
| `remediate` | fix | Bug fix |
| `decouple` | refactor | Reduce coupling, improve modularity |
| `deprecate` | remove | Retire dead/unused code |
| `interface` | feat | Define connections between subsystems |
| `specify` | docs | Documentation, specs |
| `qualify` | test | Tests, verification |
| `margin` | fix | Add buffer/slack (timeouts, limits) |
| `harden` | fix | Resist failure/attack (validation, retries) |
| `streamline` | perf | Performance optimization |
| `baseline` | chore | Dependencies, config, environment |
| Severity | Semver | Description |
|---|---|---|
| `architectural` | major | Breaking change or foundational redesign |
| `behavioral` | minor | Observable behavior change |
| `internal` | patch | Implementation detail only |
| `errata` | none | Cosmetic, near-invisible |
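The two tables above translate directly into lookup dictionaries. The sketch below uses them to build a Conventional Commits style subject and to pick a release bump; the function names are hypothetical, and both the `type(scope): summary` output and the "highest severity wins" rule are assumptions about how the mapping might be used, not documented repogerbil behaviour.

```python
from typing import Iterable, Optional

# Values copied from the vocabulary tables above.
CATEGORY_TO_CONVENTIONAL = {
    "instantiate": "feat",
    "remediate": "fix",
    "decouple": "refactor",
    "deprecate": "remove",
    "interface": "feat",
    "specify": "docs",
    "qualify": "test",
    "margin": "fix",
    "harden": "fix",
    "streamline": "perf",
    "baseline": "chore",
}
SEVERITY_TO_SEMVER = {
    "architectural": "major",
    "behavioral": "minor",
    "internal": "patch",
    "errata": None,  # no version bump
}


def conventional_subject(category: str, summary: str, scope: Optional[str] = None) -> str:
    """Format a commit subject as type(scope): summary."""
    prefix = CATEGORY_TO_CONVENTIONAL[category]
    scope_part = f"({scope})" if scope else ""
    return f"{prefix}{scope_part}: {summary}"


def release_bump(severities: Iterable[str]) -> Optional[str]:
    """Assumed rule: the most severe change decides the bump."""
    bumps = {SEVERITY_TO_SEMVER[severity] for severity in severities}
    for level in ("major", "minor", "patch"):
        if level in bumps:
            return level
    return None


subject = conventional_subject("harden", "validate webhook signatures", scope="api")
```

Note that several categories share a conventional type (`margin`, `harden`, and `remediate` all map to fix), so the mapping is one-way: a conventional prefix alone cannot recover the category.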
File Rules
Control how files are handled during --analyze:
- bulk: Count toward bulk entries, remove from detailed changes
- skip: Ignore entirely (not in stats, bulk, or changes)
- classify: Keep in changes but force a specific category
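The three actions can be sketched as a small dispatch over matching rules. This is an illustration under assumptions rather than repogerbil's implementation: patterns are treated as shell-style globs matched against the whole path, the first matching rule wins, and the names `FileRule` and `apply_file_rules` are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class FileRule:
    pattern: str
    action: str                      # "bulk", "skip", or "classify"
    category: Optional[str] = None
    reason: Optional[str] = None


def apply_file_rules(paths, rules):
    """Sort changed paths into detailed changes, bulk entries, and skipped files."""
    changes, bulk, skipped = [], [], []
    for path in paths:
        # Assumption: the first rule whose glob matches the path decides.
        rule = next((r for r in rules if fnmatchcase(path, r.pattern)), None)
        if rule is None:
            changes.append((path, None))           # no forced category
        elif rule.action == "skip":
            skipped.append(path)                   # not in stats, bulk, or changes
        elif rule.action == "bulk":
            bulk.append((path, rule.category))     # counted, not detailed
        elif rule.action == "classify":
            changes.append((path, rule.category))  # detailed, category forced
        else:
            raise ValueError(f"unknown action: {rule.action!r}")
    return {"changes": changes, "bulk": bulk, "skipped": skipped}


rules = [
    FileRule("*.lock", "bulk", category="baseline", reason="Lock file update"),
    FileRule("*.pyc", "skip"),
    FileRule("docs/*", "classify", category="specify"),
]
result = apply_file_rules(["uv.lock", "src/a.py", "src/a.pyc", "docs/x.md"], rules)
```

The first two rules mirror the [[file_rules]] entries in the sample configuration; the `docs/*` rule is added here only to show the classify action.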
AI Plugin Integration
repogerbil ships a shared plugin at plugins/repogerbil/ with:
- Skill (`gerbil`): Context-aware changelog and history management
- Agent (`analyzer`): Deep diff analysis for thorough changelog generation
- Claude manifest: `plugins/repogerbil/.claude-plugin/plugin.json`
- Codex manifest: `plugins/repogerbil/.codex-plugin/plugin.json`
For Claude Code development and testing:
claude --plugin-dir ./plugins
Codex uses the same shared plugin directory, with local marketplace metadata in .agents/plugins/marketplace.json.
To install the bundled plugin files from an installed package:
# Codex: writes plugin files into ~/.codex/plugins/repogerbil and marketplace metadata into ~/.agents/plugins/marketplace.json
uvx --from repogerbil gerbil plugin install --target codex
# Claude Code: writes into ./plugins/repogerbil and ./plugins/.claude-plugin/marketplace.json from the current directory
uvx --from repogerbil gerbil plugin install --target claude
Development
uv sync --all-extras
make quality # Run all quality gates
make test # Run tests (100% coverage required)
make lint # ruff format + check
make type-check # mypy strict
make security # bandit
make complexity # xenon
make dead-code # vulture
make mutation # mutmut
License
Apache-2.0 — © 2026 provide.io llc. See REUSE.toml for SPDX metadata.