embgrep
Local semantic search — embedding-powered grep for files, zero external services.
Search your codebase and documentation by meaning, not just keywords. embgrep indexes files into local embeddings and lets you run semantic queries — no API keys, no cloud services, no vector database servers.
Features
- Local embeddings — Uses fastembed (ONNX Runtime), no API keys needed
- SQLite storage — Single-file index, no external vector DB
- Incremental indexing — Only re-indexes changed files (SHA-256 hash comparison)
- Smart chunking — Function-level splitting for code, heading-level for docs
- MCP native — 4-tool FastMCP server for LLM agent integration
- 15+ file types: .py, .js, .ts, .java, .go, .rs, .md, .txt, .yaml, .json, .toml, and more
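To illustrate what function-level splitting can look like for Python sources, here is a minimal sketch built on the standard-library ast module. The helper name python_chunks and its return shape are assumptions made for this example and are not part of embgrep's API; the library's own chunker may handle decorators, nested definitions, and other languages differently.

```python
import ast


def python_chunks(source: str) -> list[tuple[int, int, str]]:
    """Return (line_start, line_end, text) for each top-level function or class.

    Hypothetical helper for illustration; not embgrep's actual chunker.
    """
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
            start, end = node.lineno, node.end_lineno
            chunks.append((start, end, "\n".join(lines[start - 1:end])))
    return chunks


if __name__ == "__main__":
    sample = "import os\n\ndef a():\n    return 1\n\nclass B:\n    def m(self):\n        pass\n"
    for start, end, text in python_chunks(sample):
        print(f"lines {start}-{end}: {text.splitlines()[0]}")
```

Each chunk carries its own line range, which is the kind of information a search result needs in order to report a file path together with start and end lines.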
Install
pip install embgrep # core (fastembed + numpy)
pip install "embgrep[cli]" # + click/rich CLI (quotes keep shells like zsh from expanding the brackets)
pip install "embgrep[mcp]" # + FastMCP server
pip install "embgrep[all]" # everything
Quick Start
Python API
from embgrep import EmbGrep
eg = EmbGrep()
# Index a directory
eg.index("./my-project", patterns=["*.py", "*.md"])
# Semantic search
results = eg.search("database connection pooling", top_k=5)
for r in results:
    print(f"{r.file_path}:{r.line_start}-{r.line_end} (score: {r.score:.4f})")
    print(f" {r.chunk_text[:80]}...")
# Incremental update (only changed files)
eg.update()
# Index statistics
status = eg.status()
print(f"{status.total_files} files, {status.total_chunks} chunks, {status.index_size_mb} MB")
eg.close()
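The incremental update relies on SHA-256 hash comparison to decide which files need re-indexing. The sketch below shows the general idea with the standard library only. The function names and the known_hashes dictionary are assumptions for illustration; embgrep keeps its own records in its SQLite index.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(paths: list[Path], known_hashes: dict[str, str]) -> list[Path]:
    """Return the paths whose current digest differs from the recorded one.

    known_hashes maps str(path) to the digest stored at the last indexing
    run (a hypothetical structure used only in this sketch). Files that
    were never indexed have no entry and are therefore reported as changed.
    """
    stale = []
    for path in paths:
        if known_hashes.get(str(path)) != file_sha256(path):
            stale.append(path)
    return stale
```

Only the files returned by changed_files would need to be re-chunked and re-embedded, which is what keeps an update cheap on a large tree with few edits.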
CLI
# Index a project
embgrep index ./my-project --patterns "*.py,*.md"
# Search
embgrep search "error handling patterns"
# Filter by file type
embgrep search "async database query" --path-filter "%.py"
# Check status
embgrep status
# Update changed files
embgrep update
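The "%.py" value passed to --path-filter looks like a SQL LIKE pattern, where % matches any run of characters and _ matches a single character. Assuming that is how the filter is applied (the README does not say so explicitly), the following sketch shows how such patterns select paths in SQLite. The table and column names are made up for the example.

```python
import sqlite3

# In-memory table standing in for an index; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (file_path TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?)",
    [("src/db.py",), ("docs/db.md",), ("src/app.py",)],
)


def filter_paths(pattern: str) -> list[str]:
    """Return stored paths matching a SQL LIKE pattern, sorted by path."""
    rows = conn.execute(
        "SELECT file_path FROM chunks WHERE file_path LIKE ? ORDER BY file_path",
        (pattern,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(filter_paths("%.py"))       # paths ending in .py
    print(filter_paths("docs/%"))     # paths under docs/
```

If the assumption holds, shell-style globs such as "*.py" would not match anything here, since * has no special meaning in LIKE.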
Convenience functions
import embgrep
embgrep.index("./src")
results = embgrep.search("authentication middleware")
status = embgrep.status()
embgrep.update()
MCP Server
Add to your Claude Desktop / MCP client configuration:
{
"mcpServers": {
"embgrep": {
"command": "embgrep-mcp"
}
}
}
Or with uvx:
{
"mcpServers": {
"embgrep": {
"command": "uvx",
"args": ["--from", "embgrep[mcp]", "embgrep-mcp"]
}
}
}
MCP Tools
| Tool | Description |
|---|---|
| index_directory | Index files in a directory for semantic search |
| semantic_search | Search indexed files using natural language |
| index_status | Get current index statistics |
| update_index | Incremental update: re-index changed files only |
How It Works
- Chunking: Files are split into semantically meaningful chunks:
  - Code files (.py, .js, .ts, etc.): split by function/class boundaries
  - Documents (.md, .txt): split by headings or paragraph breaks
  - Config files: fixed-size chunking
- Embedding: Each chunk is converted to a 384-dimensional vector using BGE-small-en-v1.5 via ONNX Runtime (no PyTorch needed)
- Storage: Embeddings are stored as BLOBs in a local SQLite database
- Search: Query text is embedded and compared against all chunks using cosine similarity
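The storage and search steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not embgrep's implementation: the schema, the helper names, and the 3-dimensional toy vectors (standing in for real 384-dimensional embeddings) are all invented for the example, and the real package depends on numpy rather than pure-Python math.

```python
import math
import sqlite3
import struct


def to_blob(vector: list[float]) -> bytes:
    """Pack a vector as little-endian float32 values for a BLOB column."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_blob(blob: bytes) -> list[float]:
    """Unpack a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn: sqlite3.Connection, query_vec: list[float], top_k: int = 5):
    """Score every stored chunk against the query and return the best top_k."""
    rows = conn.execute("SELECT chunk_text, embedding FROM chunks")
    scored = [(cosine(query_vec, from_blob(blob)), text) for text, blob in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical schema and toy data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (chunk_text TEXT, embedding BLOB)")
toy = {
    "open a pooled connection": [1.0, 0.0, 0.0],
    "render the login page": [0.0, 1.0, 0.0],
    "retry failed queries": [0.5, 0.0, 0.5],
}
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [(text, to_blob(vec)) for text, vec in toy.items()],
)

if __name__ == "__main__":
    for score, text in search(conn, [1.0, 0.0, 0.0], top_k=2):
        print(f"{score:.4f}  {text}")
```

Comparing the query against every stored chunk is a brute-force scan, so search time grows linearly with the number of chunks; that trade-off is what removes the need for a separate vector database.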
Configuration
| Parameter | Default | Description |
|---|---|---|
| db_path | ~/.local/share/embgrep/embgrep.db | SQLite database location |
| model | BAAI/bge-small-en-v1.5 | fastembed model name |
| max_chunk_size | 1000 chars | Maximum chunk size for fixed-size splitting |
| top_k | 5 | Number of search results |
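To make the effect of max_chunk_size concrete, here is a minimal sketch of fixed-size splitting that prefers line boundaries and falls back to hard cuts for over-long lines. The function name and the line-boundary preference are assumptions for illustration; embgrep's actual splitter may cut differently or add overlap between chunks.

```python
def fixed_size_chunks(text: str, max_chunk_size: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chunk_size characters.

    Hypothetical helper: whole lines are kept together where possible,
    and a single line longer than the limit is sliced into hard pieces.
    """
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        while len(line) > max_chunk_size:
            # Flush whatever is pending, then emit a full-size slice.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chunk_size])
            line = line[max_chunk_size:]
        if len(current) + len(line) > max_chunk_size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    config = "key: value\n" * 200  # 2200 characters of config-like text
    print([len(chunk) for chunk in fixed_size_chunks(config)])
```

Joining the chunks back together reproduces the input exactly, so no text is dropped at chunk boundaries.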
QuartzUnit Ecosystem
| Package | Description |
|---|---|
| markgrab | HTML/YouTube/PDF/DOCX to LLM-ready markdown |
| snapgrab | URL to screenshot + metadata |
| docpick | OCR + LLM document structure extraction |
| browsegrab | Local LLM browser agent |
| feedkit | RSS feed collection + MCP |
| embgrep | Local semantic search for files |
License
MIT