CommitMind

Semantic search for Git commit history, powered by TurboQuant vector compression (ICLR 2026).

Stop searching by keywords. Search by meaning.

Python 3.9+ | License: MIT

The Problem

# Current: keyword matching only
git log --grep="memory leak"     # Only matches commit messages containing "memory leak"
                                  # Misses related memory bugs described in other words:
                                  #   "fix kfree_skb double free"
                                  #   "plug UAF in reset path"
                                  #   "resolve dangling pointer"

The Solution

# CommitMind: semantic search
commitmind search "memory leak"
# >> #1 [0.94] a3f2c1d  Fix kfree_skb double free in netfilter
# >> #2 [0.91] b7e4a2f  Plug use-after-free in device reset path
# >> #3 [0.87] c9d1b3e  Resolve dangling pointer in slab allocator

CommitMind embeds your query and every commit as vectors and ranks commits by similarity of meaning, so it finds related commits even when the exact words don't match.
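Ranking by meaning comes down to comparing vectors. Here is a minimal sketch of exact, uncompressed cosine-similarity ranking, the baseline that CommitMind's compressed scoring approximates. The 3-dimensional vectors are made-up stand-ins for real 384-dimensional embeddings, and `cosine` and `top_k` are hypothetical helpers, not part of CommitMind.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, commit_vecs, k=3):
    """Return (score, sha) pairs for the k commits most similar to the query."""
    scored = ((cosine(query_vec, vec), sha) for sha, vec in commit_vecs.items())
    return heapq.nlargest(k, scored)


# Toy "embeddings" (hypothetical values, for illustration only)
commits = {
    "a3f2c1d": [0.9, 0.1, 0.0],
    "b7e4a2f": [0.8, 0.3, 0.1],
    "d00d00d": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
for score, sha in top_k(query, commits, k=2):
    print(f"[{score:.2f}] {sha}")
```

Exact scoring like this touches every full-precision vector; the compressed index trades a small amount of accuracy for a much smaller memory footprint.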

How It Works

Git commits --> Sentence embeddings --> TurboQuant compression --> Semantic search
                (all-MiniLM-L6-v2)      (7.6x compression)       (asymmetric scoring)
  1. Extract commit messages + file change metadata from git history
  2. Embed each commit into a 384-dimensional vector (local model, no API needed)
  3. Compress vectors with TurboQuant (Google Research, ICLR 2026), cutting memory use by about 87%
  4. Search using asymmetric inner-product estimation (no decompression needed)
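Step 1 can be approximated with plain `git log`. The sketch below is a minimal illustration, assuming each commit is represented by its subject line plus its changed file paths. `read_git_log`, `parse_git_log` and `commit_text` are hypothetical helpers, not CommitMind's API, and the exact text CommitMind feeds to the model may differ.

```python
import subprocess

RECORD_SEP = "\x1e"  # git prints this byte for %x1e: marks the start of each commit
FIELD_SEP = "\x1f"   # git prints this byte for %x1f: separates hash from subject


def read_git_log(repo=".", max_commits=None):
    """Return raw `git log` output: one header line per commit, then its changed paths."""
    cmd = ["git", "-C", repo, "log", "--pretty=format:%x1e%H%x1f%s", "--name-only"]
    if max_commits is not None:
        cmd.append(f"--max-count={max_commits}")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_git_log(raw):
    """Split raw log output into dicts with sha, subject and changed files."""
    commits = []
    for chunk in raw.split(RECORD_SEP):
        lines = [ln.rstrip("\r") for ln in chunk.split("\n") if ln.strip()]
        if not lines:
            continue  # the text before the first separator is empty
        sha, _, subject = lines[0].partition(FIELD_SEP)
        commits.append({"sha": sha, "subject": subject, "files": lines[1:]})
    return commits


def commit_text(commit, max_files=10):
    """Hypothetical embedding input: the subject followed by up to max_files paths."""
    if not commit["files"]:
        return commit["subject"]
    return f"{commit['subject']} (files: {', '.join(commit['files'][:max_files])})"
```

The resulting strings could then be passed to the sentence-transformers library, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(texts, normalize_embeddings=True)`, to obtain 384-dimensional unit vectors for step 2.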

Installation

pip install commitmind

Or install from source:

git clone https://github.com/wjddusrb03/commitmind.git
cd commitmind
pip install -e ".[dev]"

Quick Start

# 1. Index your repository
cd your-project
commitmind index

# Output:
# Indexing complete!
#   > 3,842 commits indexed
#   > Compressed: 18.2 MB -> 2.4 MB (7.6x)
#   > Saved to .commitmind/index.pkl

# 2. Search by meaning
commitmind search "authentication bug fix"

# 3. View stats
commitmind stats

CLI Commands

Command                      Description
commitmind index             Index commits with TurboQuant compression
commitmind search "query"    Semantic search over commits
commitmind stats             Show index statistics
commitmind update            Add new commits to existing index

Options

# Index with options
commitmind index --max-commits 1000    # Limit to recent 1000 commits
commitmind index --branch main         # Index specific branch
commitmind index --bits 2              # Use 2-bit quantization (more compression)

# Search with options
commitmind search "query" -k 10        # Return top 10 results

Use Cases

  • New team member: "What authentication changes were made recently?"
  • Bug tracking: "Find commits related to network timeout issues"
  • Security audit: "Show all SQL injection related fixes"
  • Code archaeology: Search the Linux kernel's 1M+ commits by meaning
  • Cross-language: Search English commits with Korean queries (and vice versa)

Memory Efficiency

Approximate index size with and without TurboQuant compression:

Commits      Uncompressed   CommitMind   Savings
1,000        1.5 MB         0.2 MB       87%
10,000       15 MB          2.0 MB       87%
100,000      150 MB         20 MB        87%
1,000,000    1.5 GB         200 MB       87%
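The figures in the table follow from simple arithmetic on 384-dimensional embeddings. This is a rough estimator, assuming float32 storage (4 bytes per dimension) for the uncompressed case, 4 bits per dimension (3-bit + 1-bit) for TurboQuant, and a hypothetical 8-byte per-vector overhead such as stored norms. CommitMind's actual on-disk format may differ.

```python
def uncompressed_bytes(n_commits, dim=384):
    """float32 embeddings: 4 bytes per dimension."""
    return n_commits * dim * 4


def compressed_bytes(n_commits, dim=384, bits_per_dim=4, overhead_bytes=8):
    """Bit-packed codes plus an assumed fixed per-vector overhead (hypothetical)."""
    packed = (dim * bits_per_dim + 7) // 8  # round up to whole bytes
    return n_commits * (packed + overhead_bytes)


for n in (1_000, 10_000, 100_000, 1_000_000):
    raw, packed = uncompressed_bytes(n), compressed_bytes(n)
    print(f"{n:>9,}  {raw / 1e6:>8.1f} MB  {packed / 1e6:>6.1f} MB  {1 - packed / raw:.0%}")
```

Under these assumptions, each vector shrinks from 1,536 to 200 bytes, about 7.7x. That is close to the 7.6x reported above.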

How TurboQuant Works

CommitMind uses TurboQuant (Google Research, ICLR 2026):

  1. PolarQuant: Random orthogonal rotation + Lloyd-Max scalar quantization (3-bit)
  2. QJL: Quantized Johnson-Lindenstrauss residual correction (1-bit)
  3. Asymmetric scoring: Compute similarity WITHOUT decompressing vectors

At 3 + 1 bits per dimension instead of a 32-bit float, the ideal ratio is 8x; CommitMind reports about 7.6x in practice, with minimal accuracy loss.
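The three steps above can be illustrated with a small pure-Python sketch. It is not CommitMind's implementation and makes several assumptions. Embeddings are unit-norm. The 3-bit codebook is fitted by running Lloyd's algorithm on Gaussian samples, since rotated coordinates of a unit vector are approximately N(0, 1/d). The 1-bit QJL stage uses d random Gaussian projections. `TurboQuantSketch`, `lloyd_max_codebook` and the other names are hypothetical.

```python
import bisect
import math
import random


def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def random_rotation(d, rng):
    """Random orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for q in rows:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def lloyd_max_codebook(bits, rng, n_samples=10000, iters=25):
    """Fit a 2**bits-level scalar quantizer to N(0, 1) samples with Lloyd's algorithm."""
    samples = sorted(rng.gauss(0.0, 1.0) for _ in range(n_samples))
    k = 2 ** bits
    levels = [samples[int((i + 0.5) * n_samples / k)] for i in range(k)]
    for _ in range(iters):
        bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]  # decision boundaries
        buckets = [[] for _ in range(k)]
        for s in samples:
            buckets[bisect.bisect(bounds, s)].append(s)
        levels = [sum(b) / len(b) if b else c for b, c in zip(buckets, levels)]  # centroids
    return levels


class TurboQuantSketch:
    """Hypothetical 3-bit scalar + 1-bit QJL quantizer for unit-norm vectors."""

    def __init__(self, d, bits=3, seed=0):
        rng = random.Random(seed)
        self.rot = random_rotation(d, rng)
        # after rotation, each coordinate of a unit vector is roughly N(0, 1/d)
        self.levels = [c / math.sqrt(d) for c in lloyd_max_codebook(bits, rng)]
        self.bounds = [(a + b) / 2 for a, b in zip(self.levels, self.levels[1:])]
        # Gaussian projections for the 1-bit residual sketch (d rows -> 1 bit per dimension)
        self.proj = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

    def encode(self, x):
        z = matvec(self.rot, x)
        codes = [bisect.bisect(self.bounds, v) for v in z]  # 3-bit index per coordinate
        resid = [v - self.levels[c] for v, c in zip(z, codes)]
        signs = [1 if s >= 0 else -1 for s in matvec(self.proj, resid)]  # 1 bit per row
        return codes, signs, math.sqrt(sum(r * r for r in resid))

    def search(self, query, database, k=3):
        """Asymmetric scoring: the query stays full precision, entries stay encoded."""
        qz = matvec(self.rot, query)  # rotate the query once
        qs = matvec(self.proj, qz)    # project the query once for the QJL term
        m = len(qs)
        scored = []
        for key, (codes, signs, r_norm) in database.items():
            coarse = sum(q * self.levels[c] for q, c in zip(qz, codes))
            correction = math.sqrt(math.pi / 2) * r_norm / m * sum(a * s for a, s in zip(qs, signs))
            scored.append((coarse + correction, key))
        return sorted(scored, reverse=True)[:k]
```

Because the rotation is orthogonal, the exact inner product equals the codebook term plus the inner product with the residual. The QJL term is an unbiased estimate of that residual part, so scores stay close to the true similarity without ever rebuilding the original vector.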

Requirements

  • Python 3.9+
  • Git repository
  • Runs on CPU (no GPU required)
  • ~500 MB disk for embedding model (downloaded once)

Contributing

Issues and pull requests are welcome! If you find a bug or have suggestions, please open an issue.

License

MIT License

Citation

If you use CommitMind in your research, please cite:

@software{commitmind2026,
  title={CommitMind: Semantic Git Commit Search with TurboQuant Compression},
  author={wjddusrb03},
  year={2026},
  url={https://github.com/wjddusrb03/commitmind}
}

Download files

Download the file for your platform.

Source Distribution

commitmind-0.1.0.tar.gz (36.1 kB)

Built Distribution

commitmind-0.1.0-py3-none-any.whl (11.7 kB)

File details

Details for the file commitmind-0.1.0.tar.gz.

File metadata

  • Download URL: commitmind-0.1.0.tar.gz
  • Size: 36.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for commitmind-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1197da59c9b95e35d6b7e0d3701dcae262c469a7a7cb0b768a2e83ccc653ef33
MD5 72d59ab724c6c7291d556d9a40057425
BLAKE2b-256 6bf2d90445acefed2653a5b72f9a989812a59a9fa1e979ca2f29cc09db73861c

File details

Details for the file commitmind-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: commitmind-0.1.0-py3-none-any.whl
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for commitmind-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1ffc2c82d1e6143018dde1a3ed10beb3b95c806e7b52fe08849ed1d10bcd7a04
MD5 3b4d94f32df4a5f3ac1d08575a21418d
BLAKE2b-256 303c47dc8af7ac3fb9915e3e608e4d26e64de12b3c74601f8b1f99590b112cc4
