git-semantic-similarity

Search git commit messages by semantic similarity with embeddings from sentence-transformers.

Embeddings are generated locally, can be cached on disk for faster reuse, and can be checked into git to share them with other users.

$ gitsem "project scaffolding"

Commit 403836d2ee4900579b0d1e8169dd4bfebddab0ba
Author: Foo Bar <foo@bar.com>
Date:   2024-09-23 19:08:05
Similarity: 0.2299

    Change model, add src folder

Commit d2909a8ec352a881ab05cab8b8a67038b063f37a
Author: Foo Bar <foo@bar.com>
Date:   2024-09-23 19:08:05
Similarity: 0.2086

    Initial commit

...

Commit a09923166072aca4910e92272ef161e3398b1d89
Author: Foo Bar <foo@bar.com>
Date:   2024-09-23 19:08:05
Similarity: -0.0716

    Remove buggy rounding
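
The scores in the output above behave like cosine similarities between the query embedding and each commit-message embedding. The README does not name the metric, so treat that as an assumption. The following is a minimal sketch of such a ranking in plain Python. The three-dimensional vectors are invented stand-ins for real sentence-transformers embeddings, and `cosine_similarity` and `rank_commits` are hypothetical names, not part of gitsem.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_commits(query_vec, commit_vecs, max_count=None):
    """Return (message, score) pairs sorted by descending similarity."""
    scored = [
        (message, cosine_similarity(query_vec, vec))
        for message, vec in commit_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if max_count is None else scored[:max_count]


# Toy vectors, invented for illustration only (not real embeddings).
query = [1.0, 0.0, 0.0]
commits = {
    "Change model, add src folder": [0.9, 0.1, 0.0],
    "Initial commit": [0.5, 0.5, 0.0],
    "Remove buggy rounding": [-0.1, 1.0, 0.0],
}

for message, score in rank_commits(query, commits, max_count=2):
    print(f"{score:.4f}  {message}")
```

A negative score, as in the last commit of the sample output, simply means the two vectors point in roughly opposite directions.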

Installation

First, install pipx. Then install the package with pipx:

pipx install git-semantic-similarity

Usage

In a git repository, run:

gitsem "query string"

To only show the 10 most relevant commits:

gitsem "changes to project documentation" -n 10

To use another pretrained model, for example a smaller and faster model:

gitsem "user service refactoring" --model sentence-transformers/all-MiniLM-L6-v2

A list of supported pretrained models can be found in the sentence-transformers documentation.

The tool supports forwarding arguments to git rev-list: everything after -- is passed through. For example, to only search in the 10 most recent commits:

gitsem "query string" -- -n 10

Or to filter by a specific author:

gitsem "query string" -- --author bob

Or you can format the output in a single line for further shell processing:

gitsem "query string" --sort False --oneline -- -n 100 | sort -n -r | head -n 10
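
The forwarding of arguments after -- can be pictured as a split of the command line at the first -- token. The sketch below is an illustration of that idea, not gitsem's actual argument handling. The helper name `split_forwarded_args` and the shape of the git rev-list command are assumptions.

```python
def split_forwarded_args(argv):
    """Split argv at the first "--" into (own_args, forwarded_args)."""
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    return list(argv), []


own, forwarded = split_forwarded_args(
    ["query string", "--oneline", "--", "-n", "100"]
)

# Assumption: the forwarded part is appended unchanged to a git rev-list call.
rev_list_command = ["git", "rev-list", "HEAD", *forwarded]

print(own)
print(rev_list_command)
```

Because the forwarded part is passed through untouched, options after -- must keep their leading dashes, exactly as they would be typed for git rev-list itself.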

Arguments

  • -m, --model [STRING]:
    A sentence-transformers model to use for embeddings. Default is all-mpnet-base-v2.

  • --model-args [STRING]:
    Additional arguments for SentenceTransformers model initialization in format: key1=value1,key2=value2. For example: truncate_dim=256,trust_remote_code=true

  • -c, --cache [BOOLEAN]:
    Whether to cache commit embeddings on disk for faster retrieval. Default is True.

  • --cache-dir [PATH]:
    Directory to store cached embeddings. If not specified, defaults to git_root/.gitsem/model_name.

  • --oneline:
    Use a concise output format.

  • --sort [BOOLEAN]:
    Sort results by similarity score. Default is True.

  • -n, --max-count [INTEGER]:
    Limit the number of results displayed. If not provided, no limit is applied.

  • -b, --batch-size [INTEGER]:
    Batch size for embedding commits. Default is 100.

  • query [STRING]:
    The query string to compare against commit messages.

  • git_args [STRING...]:
    Arguments after -- will be forwarded to git rev-list.
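
The key1=value1,key2=value2 format accepted by --model-args can be parsed along the following lines. This is a sketch under assumptions, not the tool's own parser: the function names and the type-coercion rules (booleans, then integers, then floats, else strings) are hypothetical.

```python
def _coerce(raw):
    """Turn a raw string into bool, int, or float where possible."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_model_args(text):
    """Parse "key1=value1,key2=value2" into a dict of keyword arguments."""
    result = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, raw = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        result[key.strip()] = _coerce(raw.strip())
    return result


print(parse_model_args("truncate_dim=256,trust_remote_code=true"))
# prints {'truncate_dim': 256, 'trust_remote_code': True}
```

A dictionary like this could then be passed as keyword arguments when constructing the sentence-transformers model, assuming the tool forwards them that way.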

License

MIT
