git-semantic-similarity
Search git commit messages by semantic similarity with embeddings from sentence-transformers.
Embeddings are stored on disk for faster retrieval, and can easily be checked into git.
$ gitsem "project scaffolding"
Commit 403836d2ee4900579b0d1e8169dd4bfebddab0ba
Author: Foo Bar <foo@bar.com>
Date: 2024-09-23 19:08:05
Similarity: 0.2299
Change model, add src folder
Commit d2909a8ec352a881ab05cab8b8a67038b063f37a
Author: Foo Bar <foo@bar.com>
Date: 2024-09-23 19:08:05
Similarity: 0.2086
Initial commit
...
Commit a09923166072aca4910e92272ef161e3398b1d89
Author: Foo Bar <foo@bar.com>
Date: 2024-09-23 19:08:05
Similarity: -0.0716
Remove buggy rounding
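The Similarity values above fall between -1 and 1, which is consistent with cosine similarity between the query embedding and each commit-message embedding. That is an assumption here, not something this README states. A minimal sketch of such a ranking, using hand-made 3-dimensional vectors in place of real sentence-transformers embeddings (the function names and the toy vectors are hypothetical):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_commits(query_vec, commit_vecs, max_count=None):
    """Return (score, sha) pairs sorted from most to least similar.

    max_count=None keeps every result, mirroring the absence of -n.
    """
    scored = [
        (cosine_similarity(query_vec, vec), sha)
        for sha, vec in commit_vecs.items()
    ]
    scored.sort(reverse=True)
    return scored[:max_count]


# Toy stand-ins for embeddings; real ones have hundreds of dimensions.
toy_embeddings = {
    "403836d": [0.9, 0.1, 0.0],
    "d2909a8": [0.5, 0.5, 0.0],
    "a099231": [-0.2, 0.1, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]

for score, sha in rank_commits(query_embedding, toy_embeddings):
    print(f"{sha}  {score:.4f}")
```

A negative score, as in the last commit of the example output, means the two vectors point in roughly opposite directions.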
Installation
First, install pipx. Then install git-semantic-similarity with it:
pipx install git-semantic-similarity
Usage
In a git repository, run:
gitsem "query string"
To only show the 10 most relevant commits:
gitsem "changes to project documentation" -n 10
To use another pretrained model, for example a smaller and faster model:
gitsem "user service refactoring" --model sentence-transformers/all-MiniLM-L6-v2
A list of supported models can be found in the sentence-transformers documentation on pretrained models.
The tool supports forwarding arguments to git rev-list.
For example, to only search in the 10 most recent commits:
gitsem "query string" -- -n 10
Or to filter by a specific author:
gitsem "query string" -- --author bob
Or you can format the output in a single line for further shell processing:
gitsem "query string" --sort False --oneline -- -n 100 | sort -n -r | head -n 10
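The forwarding convention used in these examples can be sketched as follows. This is an illustration of the general `--` pattern, not the tool's actual implementation: the helper names are hypothetical, and the use of HEAD as the starting revision is an assumption.

```python
import shlex


def split_forwarded_args(argv):
    """Split argv at the first '--' into (tool_args, git_args)."""
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    return list(argv), []


def build_rev_list_command(git_args, rev="HEAD"):
    """Build a git rev-list invocation with the forwarded arguments.

    Starting from HEAD is an assumption made for this sketch.
    """
    return ["git", "rev-list", *git_args, rev]


tool_args, git_args = split_forwarded_args(
    ["query string", "--oneline", "--", "-n", "10"]
)
print("tool arguments:", tool_args)
print("git command:   ", shlex.join(build_rev_list_command(git_args)))
```

Everything before `--` is interpreted by the tool itself, which is why `-n 10` limits displayed results when placed before the separator and limits the commits searched when placed after it.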
Arguments
- -m, --model [STRING]: A sentence-transformers model to use for embeddings. Default is all-mpnet-base-v2.
- -c, --cache [BOOLEAN]: Whether to cache commit embeddings on disk for faster retrieval. Default is True.
- --cache-dir [PATH]: Directory to store cached embeddings. If not specified, defaults to git_root/.git_semsim/model_name.
- --oneline: Use a concise output format.
- --sort [BOOLEAN]: Sort results by similarity score. Default is True.
- -n, --max-count [INTEGER]: Limit the number of results displayed. If not provided, no limit is applied.
- -b, --batch-size [INTEGER]: Batch size for embedding commits. Default is 100.
- query [STRING]: The query string to compare against commit messages.
- git_args [STRING...]: Arguments after -- will be forwarded to git rev-list.
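To make the caching options more concrete, here is a rough sketch of how a default cache directory of the form git_root/.git_semsim/model_name could be derived, and how a per-commit embedding cache might avoid recomputation. The one-JSON-file-per-commit layout, the function names, and the fake embedding function are all assumptions for illustration; the tool's real on-disk format is not described in this README.

```python
import json
import tempfile
from pathlib import Path


def default_cache_dir(git_root, model_name):
    """Default location described above: git_root/.git_semsim/model_name."""
    return Path(git_root) / ".git_semsim" / model_name


def load_or_compute(cache_dir, sha, message, embed):
    """Return the cached embedding for a commit, computing it on a miss.

    Storing one JSON file per commit hash is a hypothetical layout.
    """
    path = Path(cache_dir) / f"{sha}.json"
    if path.exists():
        return json.loads(path.read_text())
    vector = embed(message)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(vector))
    return vector


calls = []


def fake_embed(text):
    """Stand-in for a real model; records each call so misses can be counted."""
    calls.append(text)
    return [float(len(text)), 0.0]


with tempfile.TemporaryDirectory() as root:
    cache_dir = default_cache_dir(root, "all-mpnet-base-v2")
    first = load_or_compute(cache_dir, "403836d", "Initial commit", fake_embed)
    second = load_or_compute(cache_dir, "403836d", "Initial commit", fake_embed)
    print("same vector:", first == second, "| embed calls:", len(calls))
```

Keying the cache directory by model name keeps embeddings from different models apart, since vectors produced by different models are not comparable.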
License
MIT