Local semantic search — embedding-powered grep for files, zero external services.

embgrep

Korean documentation

Search your codebase and documentation by meaning, not just keywords. embgrep indexes files into local embeddings and lets you run semantic queries — no API keys, no cloud services, no vector database servers.

Features

  • Local embeddings — Uses fastembed (ONNX Runtime), no API keys needed
  • SQLite storage — Single-file index, no external vector DB
  • Incremental indexing — Only re-indexes changed files (SHA-256 hash comparison)
  • Smart chunking — Function-level splitting for code, heading-level for docs
  • MCP native — 4-tool FastMCP server for LLM agent integration
  • 15+ file types: .py, .js, .ts, .java, .go, .rs, .md, .txt, .yaml, .json, .toml, and more
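
The incremental indexing feature relies on SHA-256 hash comparison. The following is a minimal sketch of that idea using only the standard library. The helper names `file_sha256` and `changed_files`, and the in-memory `known_hashes` dictionary, are hypothetical and are not part of embgrep's API; the real index presumably keeps its hashes in the SQLite file.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in blocks so large files are not loaded into memory at once.
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(paths: list[Path], known_hashes: dict[str, str]) -> list[Path]:
    """Return paths whose current hash differs from the stored one.

    known_hashes (hypothetical store) is updated in place, so a second
    call with unchanged files returns an empty list.
    """
    stale = []
    for path in paths:
        current = file_sha256(path)
        if known_hashes.get(str(path)) != current:
            stale.append(path)
            known_hashes[str(path)] = current
    return stale
```

Only the files returned by `changed_files` would need to be re-chunked and re-embedded, which is what keeps an update cheap on a large tree.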

Install

pip install embgrep              # core (fastembed + numpy)
pip install embgrep[cli]         # + click/rich CLI
pip install embgrep[mcp]         # + FastMCP server
pip install embgrep[all]         # everything

Quick Start

Python API

from embgrep import EmbGrep

eg = EmbGrep()

# Index a directory
eg.index("./my-project", patterns=["*.py", "*.md"])

# Semantic search
results = eg.search("database connection pooling", top_k=5)
for r in results:
    print(f"{r.file_path}:{r.line_start}-{r.line_end} (score: {r.score:.4f})")
    print(f"  {r.chunk_text[:80]}...")

# Incremental update (only changed files)
eg.update()

# Index statistics
status = eg.status()
print(f"{status.total_files} files, {status.total_chunks} chunks, {status.index_size_mb} MB")

eg.close()

CLI

# Index a project
embgrep index ./my-project --patterns "*.py,*.md"

# Search
embgrep search "error handling patterns"

# Filter by file type
embgrep search "async database query" --path-filter "%.py"

# Check status
embgrep status

# Update changed files
embgrep update
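
The `%` in `--path-filter "%.py"` resembles a SQL `LIKE` wildcard. Assuming the filter is passed to a SQLite `LIKE` clause (an assumption; the source does not confirm this), the matching would behave like this sketch. The `chunks` table and `file_path` column are hypothetical names.

```python
import sqlite3

# In-memory database with a hypothetical table of indexed paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (file_path TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?)",
    [("src/db.py",), ("docs/setup.md",), ("src/app.py",)],
)

# '%' matches any run of characters, so '%.py' keeps paths ending in .py.
rows = conn.execute(
    "SELECT file_path FROM chunks WHERE file_path LIKE ? ORDER BY file_path",
    ("%.py",),
).fetchall()
matches = [row[0] for row in rows]
print(matches)  # ['src/app.py', 'src/db.py']
```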

Convenience functions

import embgrep

embgrep.index("./src")
results = embgrep.search("authentication middleware")
status = embgrep.status()
embgrep.update()

MCP Server

Add to your Claude Desktop / MCP client configuration:

{
  "mcpServers": {
    "embgrep": {
      "command": "embgrep-mcp"
    }
  }
}

Or with uvx:

{
  "mcpServers": {
    "embgrep": {
      "command": "uvx",
      "args": ["--from", "embgrep[mcp]", "embgrep-mcp"]
    }
  }
}

MCP Tools

| Tool | Description |
| --- | --- |
| index_directory | Index files in a directory for semantic search |
| semantic_search | Search indexed files using natural language |
| index_status | Get current index statistics |
| update_index | Incremental update: re-index changed files only |

How It Works

  1. Chunking — Files are split into semantically meaningful chunks:

    • Code files (.py, .js, .ts, etc.): split by function/class boundaries
    • Documents (.md, .txt): split by headings or paragraph breaks
    • Config files: fixed-size chunking
  2. Embedding — Each chunk is converted to a 384-dimensional vector using BGE-small-en-v1.5 via ONNX Runtime (no PyTorch needed)

  3. Storage — Embeddings are stored as BLOBs in a local SQLite database

  4. Search — Query text is embedded and compared against all chunks using cosine similarity
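
Steps 3 and 4 above can be sketched with the standard library alone. This is an illustration under stated assumptions, not embgrep's implementation: the table layout, the float32 packing, and the helper names (`to_blob`, `from_blob`, `cosine`, `search_chunks`) are hypothetical, and the 3-dimensional toy vectors stand in for real 384-dimensional embeddings.

```python
import math
import sqlite3
import struct


def to_blob(vec):
    """Pack a list of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def from_blob(blob):
    """Unpack float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_chunks(conn, query_vec, top_k=5):
    """Score every stored chunk against the query and return the best top_k."""
    rows = conn.execute("SELECT chunk_text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, from_blob(blob)), text) for text, blob in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (chunk_text TEXT, embedding BLOB)")
toy_embeddings = {  # made-up vectors, for illustration only
    "connection pool": [1.0, 0.0, 0.0],
    "retry logic": [0.0, 1.0, 0.0],
    "pool sizing": [0.5, 0.5, 0.0],
}
for text, vec in toy_embeddings.items():
    conn.execute("INSERT INTO chunks VALUES (?, ?)", (text, to_blob(vec)))

top = search_chunks(conn, [1.0, 0.0, 0.0], top_k=2)
for score, text in top:
    print(f"{score:.4f}  {text}")
```

Comparing the query against all chunks is a linear scan, which matches the description in step 4 and stays simple because no separate vector index has to be maintained.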

Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| db_path | ~/.local/share/embgrep/embgrep.db | SQLite database location |
| model | BAAI/bge-small-en-v1.5 | fastembed model name |
| max_chunk_size | 1000 chars | Maximum chunk size for fixed-size splitting |
| top_k | 5 | Number of search results |
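
To make `max_chunk_size` concrete, here is a minimal sketch of fixed-size splitting with the default of 1000 characters. The function `fixed_size_chunks` is a hypothetical helper, not embgrep's API; the actual splitter may also respect line boundaries or overlap chunks, which this sketch does not attempt.

```python
def fixed_size_chunks(text: str, max_chunk_size: int = 1000) -> list[str]:
    """Split text into consecutive pieces of at most max_chunk_size characters."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    return [
        text[start:start + max_chunk_size]
        for start in range(0, len(text), max_chunk_size)
    ]


chunks = fixed_size_chunks("a" * 2500)
print([len(chunk) for chunk in chunks])  # [1000, 1000, 500]
```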

QuartzUnit Ecosystem

| Package | Description |
| --- | --- |
| markgrab | HTML/YouTube/PDF/DOCX to LLM-ready markdown |
| snapgrab | URL to screenshot + metadata |
| docpick | OCR + LLM document structure extraction |
| browsegrab | Local LLM browser agent |
| feedkit | RSS feed collection + MCP |
| embgrep | Local semantic search for files |

License

MIT
