CommitMind
Semantic search for Git commit history, powered by TurboQuant vector compression (ICLR 2026).
Stop searching by keywords. Search by meaning.
The Problem
# Current: keyword matching only
git log --grep="memory leak" # Only finds commits whose message contains the text "memory leak"
# Misses: "fix kfree_skb double free"
# Misses: "plug UAF in reset path"
# Misses: "resolve dangling pointer"
The Solution
# CommitMind: semantic search
commitmind search "memory leak"
# >> #1 [0.94] a3f2c1d Fix kfree_skb double free in netfilter
# >> #2 [0.91] b7e4a2f Plug use-after-free in device reset path
# >> #3 [0.87] c9d1b3e Resolve dangling pointer in slab allocator
CommitMind understands the meaning of your query and finds semantically related commits - even when the exact words don't match.
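As a rough illustration of what "finding semantically related commits" means, the sketch below ranks commits by cosine similarity between a query vector and per-commit vectors. The 3-dimensional vectors, the `d4e5f6a` commit, and the helper names `cosine` and `top_k` are invented for this example; they are not CommitMind's actual embeddings or API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, commits, k=3):
    """Return the k (score, sha, message) tuples most similar to the query."""
    scored = [(cosine(query_vec, vec), sha, msg) for sha, msg, vec in commits]
    return sorted(scored, reverse=True)[:k]

# Toy 3-dimensional "embeddings", made up for illustration only.
commits = [
    ("a3f2c1d", "Fix kfree_skb double free in netfilter", [0.9, 0.1, 0.0]),
    ("d4e5f6a", "Update README badges", [0.0, 0.1, 0.9]),  # hypothetical commit
    ("b7e4a2f", "Plug use-after-free in device reset path", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "memory leak"

for score, sha, msg in top_k(query, commits, k=2):
    print(f"[{score:.2f}] {sha} {msg}")
```

The two memory-related commits rank above the unrelated one because their vectors point in nearly the same direction as the query, regardless of which words the messages share.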
How It Works
Git commits --> Sentence embeddings (all-MiniLM-L6-v2) --> TurboQuant compression (7.6x compression) --> Semantic search (asymmetric scoring)
- Extract commit messages + file change metadata from git history
- Embed each commit into a 384-dimensional vector (local model, no API needed)
- Compress vectors with TurboQuant (Google's ICLR 2026 algorithm) - 87% memory savings
- Search using asymmetric inner-product estimation (no decompression needed)
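The first step above, extracting commit messages, could be sketched as follows. This is a minimal sketch that assumes `git` is on the PATH; the helper names `parse_log` and `read_commits` and the separator choice are assumptions of this example, not CommitMind's internals, and file change metadata is left out for brevity.

```python
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator between fields
RECORD_SEP = "\x1e"  # ASCII record separator between commits

def parse_log(raw):
    """Split raw `git log` output into dicts with sha, subject and body."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Strip only newlines: str.strip() with no argument would also
        # remove the \x1f separators, which Python treats as whitespace.
        record = record.strip("\n")
        if not record:
            continue
        sha, subject, body = record.split(FIELD_SEP, 2)
        commits.append({"sha": sha, "subject": subject, "body": body.strip()})
    return commits

def read_commits(repo_path=".", max_commits=1000):
    """Run `git log` in repo_path and return parsed commits, newest first."""
    fmt = "%H%x1f%s%x1f%b%x1e"  # hash, subject, body, then record separator
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--max-count={max_commits}",
         f"--pretty=format:{fmt}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

Using control characters as separators keeps multi-line commit bodies intact, since they are very unlikely to appear in a commit message.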
Installation
pip install commitmind
Or install from source:
git clone https://github.com/wjddusrb03/commitmind.git
cd commitmind
pip install -e ".[dev]"
Quick Start
# 1. Index your repository
cd your-project
commitmind index
# Output:
# Indexing complete!
# > 3,842 commits indexed
# > Compressed: 18.2 MB -> 2.4 MB (7.6x)
# > Saved to .commitmind/index.pkl
# 2. Search by meaning
commitmind search "authentication bug fix"
# 3. View stats
commitmind stats
CLI Commands
| Command | Description |
|---|---|
| commitmind index | Index commits with TurboQuant compression |
| commitmind search "query" | Semantic search over commits |
| commitmind stats | Show index statistics |
| commitmind update | Add new commits to existing index |
Options
# Index with options
commitmind index --max-commits 1000 # Limit to recent 1000 commits
commitmind index --branch main # Index specific branch
commitmind index --bits 2 # Use 2-bit quantization (more compression)
# Search with options
commitmind search "query" -k 10 # Return top 10 results
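For scripting, the options above can be assembled into argument lists and passed to `subprocess`. The sketch below uses only the flags shown in this README; the helper names `index_command`, `search_command`, and `run` are hypothetical, and whether the index flags can be combined in one call is an assumption.

```python
import subprocess

def index_command(max_commits=None, branch=None, bits=None):
    """Build the argv for `commitmind index` from the documented options."""
    argv = ["commitmind", "index"]
    if max_commits is not None:
        argv += ["--max-commits", str(max_commits)]
    if branch is not None:
        argv += ["--branch", branch]
    if bits is not None:
        argv += ["--bits", str(bits)]
    return argv

def search_command(query, k=None):
    """Build the argv for `commitmind search`, optionally limiting results."""
    argv = ["commitmind", "search", query]
    if k is not None:
        argv += ["-k", str(k)]
    return argv

def run(argv, cwd="."):
    """Run a commitmind command in cwd and return its stdout as text."""
    return subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True, check=True
    ).stdout
```

Passing the query as a single list element avoids shell quoting problems with multi-word queries, for example `run(search_command("network timeout", k=10))`.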
Use Cases
- New team member: "What authentication changes were made recently?"
- Bug tracking: "Find commits related to network timeout issues"
- Security audit: "Show all SQL injection related fixes"
- Code archaeology: Search Linux kernel's 1M+ commits by meaning
- Cross-language: Search English commits with Korean queries (and vice versa)
Memory Efficiency
Thanks to TurboQuant compression:
| Commits | Uncompressed | CommitMind | Savings |
|---|---|---|---|
| 1,000 | 1.5 MB | 0.2 MB | 87% |
| 10,000 | 15 MB | 2.0 MB | 87% |
| 100,000 | 150 MB | 20 MB | 87% |
| 1,000,000 | 1.5 GB | 200 MB | 87% |
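The figures in the table can be reproduced with a short calculation. This sketch assumes uncompressed vectors are stored as 384 float32 values (4 bytes each) and applies the 7.6x ratio quoted in this README; it ignores commit metadata, and `estimate_mb` is a name invented for the example.

```python
DIM = 384                 # embedding size stated in "How It Works"
BYTES_PER_FLOAT = 4       # assumption: uncompressed vectors are float32
COMPRESSION_RATIO = 7.6   # ratio quoted in this README

def estimate_mb(num_commits):
    """Return (uncompressed MB, compressed MB, fractional savings)."""
    raw_bytes = num_commits * DIM * BYTES_PER_FLOAT
    compressed_bytes = raw_bytes / COMPRESSION_RATIO
    savings = 1 - compressed_bytes / raw_bytes
    return raw_bytes / 1e6, compressed_bytes / 1e6, savings

for n in (1_000, 10_000, 100_000, 1_000_000):
    raw, packed, savings = estimate_mb(n)
    print(f"{n:>9,} commits: {raw:8.1f} MB -> {packed:7.1f} MB ({savings:.0%} saved)")
```

Because the savings depend only on the compression ratio (1 - 1/7.6), the percentage is the same at every scale.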
How TurboQuant Works
CommitMind uses TurboQuant (Google Research, ICLR 2026):
- PolarQuant: Random orthogonal rotation + Lloyd-Max scalar quantization (3-bit)
- QJL: Quantized Johnson-Lindenstrauss residual correction (1-bit)
- Asymmetric scoring: Compute similarity WITHOUT decompressing vectors
This achieves ~7.6x compression with minimal accuracy loss.
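To give a feel for asymmetric scoring, the sketch below keeps the query in full precision and scores it directly against 3-bit integer codes. It is a deliberately simplified stand-in, not TurboQuant: it uses a plain uniform quantizer with a per-vector scale and omits the random rotation, the Lloyd-Max codebook, and the QJL residual correction. All function names here are hypothetical.

```python
import math
import random

def quantize(vec, bits=3):
    """Return (codes, scale): one integer code in [0, 2**bits - 1] per coordinate."""
    top = 2 ** bits - 1
    scale = max(abs(x) for x in vec) or 1.0
    # Map [-scale, scale] onto [0, top] and round to the nearest level.
    codes = [round((x / scale + 1.0) / 2.0 * top) for x in vec]
    return codes, scale

def asymmetric_dot(query, codes, scale, bits=3):
    """Estimate <query, vec> from a float query and the integer codes of vec."""
    top = 2 ** bits - 1
    # Code c stands for the level (2c/top - 1) * scale; the level is computed
    # inside the sum, so no decompressed copy of the vector is materialized.
    acc = sum(q * (2.0 * c / top - 1.0) for q, c in zip(query, codes))
    return acc * scale

def unit_vector(dim, rng):
    """Draw a random direction on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

rng = random.Random(0)
stored = unit_vector(384, rng)   # plays the role of a commit embedding
query = unit_vector(384, rng)    # plays the role of a query embedding

codes, scale = quantize(stored)
exact = sum(q * s for q, s in zip(query, stored))
approx = asymmetric_dot(query, codes, scale)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Each coordinate is off by at most `scale / (2**bits - 1)`, so the score error is bounded by that amount times the sum of the absolute query values. A scheme like the one described above aims to tighten this further through rotation and residual correction.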
Requirements
- Python 3.9+
- Git repository
- CPU only (no GPU required)
- ~500 MB disk for embedding model (downloaded once)
Contributing
Issues and pull requests are welcome! If you find a bug or have suggestions, please open an issue.
License
MIT License
Citation
If you use CommitMind in your research:
@software{commitmind2026,
title={CommitMind: Semantic Git Commit Search with TurboQuant Compression},
author={wjddusrb03},
year={2026},
url={https://github.com/wjddusrb03/commitmind}
}
Related
- langchain-turboquant - LangChain VectorStore with TurboQuant compression
- TurboQuant paper - Original ICLR 2026 paper by Google Research