LLM-friendly replacement for git diff --staged
Project description
llm-commit-helper
A smarter replacement for git diff --staged, designed to feed LLMs a clean,
size-bounded summary of staged changes rather than raw diff noise.
The core problem with git diff --staged for LLM commit message generation:
- Large files (netlists, SVD files) flood the context window
- Reformatted Python files produce massive diffs with zero logic change
- Verilog AUTO-expanded sections (
AUTOWIRE,AUTOINST, …) produce long, order-dependent diffs that obscure real changes - Submodule updates show only a hash pair — no indication of what actually changed
llm-commit-helper handles all of these.
Requirements
- Python 3.11+
gitinPATH- Optional:
black(for Python formatting isolation) - Optional:
emacswithverilog-mode(for Verilog AUTO stripping)
No extra Python packages are required. Run directly from the source tree.
Installation
pip install -e /path/to/llm-commit-helper
Or from inside the project directory:
pip install -e .
# or
make install
Quick start
# From inside a git repository, with staged changes:
llm-commit-helper
Pipe the output to an LLM to generate a commit message:
llm-commit-helper | llm "Write a commit message"
Options
| Flag | Description |
|---|---|
--config PATH |
Use a specific config file instead of searching the hierarchy |
--max-total-size SIZE |
Override the output size limit (e.g. 500, 10KB, 1MB) |
-v, --verbose |
Print diagnostics to stderr (git root, config file used, budget) |
Output format
=== Staged Changes Summary ===
Files: 8 total (4 modified, 2 added, 1 excluded, 1 submodule)
Config: /home/user/project/config.jsonc
--- File: src/foo.py [modified] ---
@@ -10,6 +10,8 @@
...
--- File: src/bar.py [modified] [formatting-only] ---
[no logic changes - formatting only]
--- File: chip.netlist.v [excluded] ---
[changed - excluded by rule]
--- File: data/blob.bin [binary] ---
[binary file changed]
--- File: include/new_header.h [added] ---
[new file - contents not shown]
--- Submodule: support/imported/socbuilder ---
Updated: b2ae0f8 -> 153b625
153b625 Update create_svd.py
bfee9c1 feat(latex): Add LaTeX table generation
=== End of Staged Changes (3842 chars) ===
Diagnostic messages (config warnings, fallback notices) go to stderr. The staged-changes summary goes to stdout, so it can be piped cleanly.
File handling
Added files
New files are reported as [added] with no content. Adding file content to a
commit message prompt is rarely useful and wastes context budget.
Deleted files
Reported as [deleted] with no diff shown.
Binary files
Detected via git diff --numstat (shows - - for binary). Reported as
[binary file changed].
Excluded files
Files matching an exclude pattern in the config are reported as
[changed - excluded by rule]. Useful for generated files, netlists, or
anything too noisy to be useful in a commit message.
Files exceeding max_file_size
Reported as [changed - file too large]. Default threshold: 200 MB.
Submodules
For each updated submodule, the output includes a git log --oneline of the
commits between the old and new hash. If the submodule is not initialized on
disk, a warning is printed to stderr and the section is skipped.
Smart formatters
Python (.py)
Runs black --quiet on temporary copies of both the old and new versions of
the file, then diffs the formatted results. If the formatted versions are
identical, the file is marked [formatting-only] and no diff is shown. This
suppresses the large diffs produced by black reformatting passes.
Falls back to generic formatting if black is not installed.
Verilog (.v, .sv)
Detects files using AUTO macros (AUTOARG, AUTOINPUT, AUTOOUTPUT,
AUTOINST, AUTOWIRE, AUTOREG, …). When found, runs
emacs --batch -f verilog-batch-delete-auto -f save-buffer on temporary copies
of both versions to delete the AUTO-generated sections before diffing.
This is the correct approach: diffing the hand-written source (before AUTO expansion) rather than the expanded output, which is order-dependent and produces spurious diffs even when nothing real changed.
Falls back to generic formatting if emacs is not installed or if no AUTO
macros are detected.
Generic (all other files)
Per-hunk whitespace normalization: strips leading/trailing whitespace and
collapses internal runs of spaces/tabs, then checks if normalized removed and
added lines are equal. Hunks that are whitespace-only are annotated
[formatting-only] inline in the diff.
Configuration
llm-commit-helper searches for config.jsonc starting from the current
working directory, walking up to the git root, then falling back to a global
location. The first file found wins.
Search order:
<cwd>/config.jsonc<cwd>/.llm-commit-helper/config.jsonc- Same two patterns repeated for each parent directory up to the git root
~/.config/llm-commit-helper/config.jsonc
To skip the search and use a specific file:
python -m llm_commit_helper --config /path/to/my-config.jsonc
Config file format
The file is JSONC — standard JSON with // line comments and trailing commas
allowed.
{
"version": 1,
"rules": {
// Glob patterns for files to suppress (report as 'changed' only)
"exclude": [
"sim/firmware_ctests/**",
"atpg/from_genus_1d-comp-sdf/chip.test_netlist.v",
"*.netlist.v"
],
// Files larger than this are suppressed (supports B, KB, MB, GB)
"max_file_size": "200MB",
// Total output budget; truncates with a summary when exceeded
"max_total_size": 20000,
}
}
Defaults (no config file)
| Setting | Default |
|---|---|
exclude |
[] (nothing excluded) |
max_file_size |
200MB |
max_total_size |
20000 (chars) |
Size values
max_file_size and max_total_size accept:
- Plain integers:
20000 - Suffixed strings:
200MB,20KB,1GB,4096B max_total_sizeon the CLI (--max-total-size) accepts the same formats
Output truncation
When the accumulated output exceeds max_total_size, remaining files are
listed by name only, followed by an [OUTPUT TRUNCATED] notice. The footer
always shows the actual character count of the output produced.
Running tests
cd support/local/llm-commit-helper
python -m pytest tests/ -v
# or
make test
Tests use pytest with unittest.mock for all subprocess calls — no git
repository or external tools required.
Project layout
llm_commit_helper/
├── __main__.py # python -m llm_commit_helper entry point
├── cli.py # argument parsing and main pipeline
├── config.py # JSONC loading and hierarchical config search
├── git_staged.py # staged file listing and classification
├── submodule.py # submodule log retrieval and formatting
├── diff_engine.py # difflib wrapper with formatting-only annotation
├── output.py # size-budgeted output assembly
├── utils.py # subprocess, size parsing, glob matching
└── formatters/
├── __init__.py # extension-based dispatcher
├── generic_fmt.py # whitespace normalization
├── python_fmt.py # black-based logic/formatting separation
└── verilog_fmt.py # emacs AUTO deletion
tests/
├── test_config.py
├── test_formatters.py
├── test_git_staged.py
├── test_output.py
└── test_submodule.py
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llm_commit_helper-0.1.0.tar.gz.
File metadata
- Download URL: llm_commit_helper-0.1.0.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5e839f872b64c81f5da461543fa1bf91e93a594bb3617572429598c1bd97a069
|
|
| MD5 |
d2af15c958b5918557b72c8ea0ec50a0
|
|
| BLAKE2b-256 |
7917b9aa71f30d9c7cdfd27df3474e00c3f9596589a967f6dc7c3ec66d969510
|
Provenance
The following attestation bundles were made for llm_commit_helper-0.1.0.tar.gz:
Publisher:
publish.yml on rbarzic/llm-commit-helper
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llm_commit_helper-0.1.0.tar.gz -
Subject digest:
5e839f872b64c81f5da461543fa1bf91e93a594bb3617572429598c1bd97a069 - Sigstore transparency entry: 1554390319
- Sigstore integration time:
-
Permalink:
rbarzic/llm-commit-helper@1e049994e2e8297d8848d996d203afaf2efb3a09 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/rbarzic
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@1e049994e2e8297d8848d996d203afaf2efb3a09 -
Trigger Event:
release
-
Statement type:
File details
Details for the file llm_commit_helper-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_commit_helper-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f021ea3d7c0049713f897d5204e81387670297a15a694d819edc170c1619e71a
|
|
| MD5 |
475fa9ec067ac7cdf836db89ac8ef110
|
|
| BLAKE2b-256 |
b2a7b487c96e0aa3a65cd4dacefe19b1c562a434ee84b15e1530856921a2b04f
|
Provenance
The following attestation bundles were made for llm_commit_helper-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on rbarzic/llm-commit-helper
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llm_commit_helper-0.1.0-py3-none-any.whl -
Subject digest:
f021ea3d7c0049713f897d5204e81387670297a15a694d819edc170c1619e71a - Sigstore transparency entry: 1554390375
- Sigstore integration time:
-
Permalink:
rbarzic/llm-commit-helper@1e049994e2e8297d8848d996d203afaf2efb3a09 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/rbarzic
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@1e049994e2e8297d8848d996d203afaf2efb3a09 -
Trigger Event:
release
-
Statement type: