MCP server for semantic code search using local embeddings
Semantic Search MCP Server
An MCP server that provides semantic code search using local embeddings. Search your codebase with natural language queries like "authentication middleware" or "database connection pooling".
Features
- Hybrid search: Combines vector similarity (Jina code embeddings) with SQLite FTS5 keyword matching, merging the two result lists with Reciprocal Rank Fusion
- 30+ languages: Tree-sitter parsing for Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, PHP, and more
- Incremental indexing: File watcher automatically detects additions, modifications, and deletions
- Respects .gitignore: Honors your project's .gitignore files (including nested ones)
- Auto-initialization: The embedding model loads and the codebase is indexed in the background on server startup
- Zero external APIs: All embeddings generated locally with FastEmbed
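Reciprocal Rank Fusion merges the vector and keyword result lists by rank rather than by raw score, so the two incompatible score scales never need to be normalized. The sketch below is a generic RRF implementation, not this server's code; the constant k=60 is the value commonly used in the RRF literature and is an assumption here, and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. k=60 is an assumed default, not the server's setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results: one list from vector similarity, one from FTS5 keyword search.
vector_hits = ["auth.py:login", "middleware.py:require_auth", "db.py:connect"]
keyword_hits = ["middleware.py:require_auth", "auth.py:logout"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for chunk_id, score in fused:
    print(f"{score:.4f}  {chunk_id}")
```

A chunk returned by both retrievers (here `middleware.py:require_auth`) outranks chunks that top only one list, which is the reason for fusing by rank.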
Installation
uv tool install semantic-search-mcp
Or with pip:
pip install semantic-search-mcp
Quick Start
Add to Claude Code
Option A: Project-level config
Create .mcp.json in your project root:
{
  "mcpServers": {
    "semantic-search": {
      "command": "uvx",
      "args": ["semantic-search-mcp"]
    }
  }
}
Option B: CLI
claude mcp add semantic-search -- uvx semantic-search-mcp
Use
The server auto-initializes on startup. Available tools:
- search_code: Search with natural language queries
- initialize: Force a re-index if needed
- reindex_file: Manually reindex a specific file
How It Works
Indexing
On startup, the server:
- Scans your codebase for supported file types
- Parses code into semantic chunks (functions, classes, methods) using Tree-sitter
- Generates embeddings for each chunk using Jina's code embedding model
- Stores everything in a local SQLite database with vector search support
File Watching
The server monitors your codebase for changes in real time:
| Event | Action |
|---|---|
| File created | Parsed, embedded, and added to index |
| File modified | Re-indexed if content hash changed |
| File deleted | Removed from index |
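The "re-indexed if content hash changed" rule lets the watcher skip modification events (for example a save with no edits) that leave the file's bytes unchanged, avoiding a needless re-embedding. A minimal sketch, assuming SHA-256 over the raw bytes and an in-memory dict standing in for the SQLite index; the server's actual hash algorithm and storage are not documented here, and `needs_reindex` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the raw file bytes (the hash algorithm is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_reindex(path: Path, known_hashes: dict[str, str]) -> bool:
    """Return True (and record the new hash) only if the file's bytes changed.

    known_hashes is a stand-in for the hashes persisted in the index database.
    """
    new_hash = content_hash(path)
    key = str(path)
    if known_hashes.get(key) == new_hash:
        return False  # "modified" event, identical content: skip re-embedding
    known_hashes[key] = new_hash
    return True


# Usage: the second write leaves the bytes unchanged, so no re-index is needed.
with tempfile.TemporaryDirectory() as tmp:
    source_file = Path(tmp) / "app.py"
    hashes: dict[str, str] = {}
    source_file.write_bytes(b"print('hi')\n")
    first = needs_reindex(source_file, hashes)   # True: never seen before
    source_file.write_bytes(b"print('hi')\n")
    second = needs_reindex(source_file, hashes)  # False: same content rewritten
    source_file.write_bytes(b"print('bye')\n")
    third = needs_reindex(source_file, hashes)   # True: content changed
    print(first, second, third)
```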
Changes are debounced (default 1 s, configurable via SEMANTIC_SEARCH_DEBOUNCE_MS) so that rapid successive modifications are processed as one batch.
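Debouncing means waiting until events stop arriving for the configured interval, then processing everything accumulated in one batch. A minimal thread-based sketch, assuming events arrive as (path, event type) pairs and that a later event for the same path supersedes an earlier one; the `Debouncer` class is hypothetical and the server may instead rely on its file-watching library's own batching. The documented default of 1000 ms would correspond to `delay_s=1.0`.

```python
import threading
import time
from collections.abc import Callable


class Debouncer:
    """Collect (path, event) pairs; flush them once no event arrives for delay_s."""

    def __init__(self, delay_s: float, on_flush: Callable[[dict[str, str]], None]) -> None:
        self.delay_s = delay_s
        self.on_flush = on_flush
        self._pending: dict[str, str] = {}
        self._generation = 0
        self._timer: threading.Timer | None = None
        self._lock = threading.Lock()

    def submit(self, path: str, event: str) -> None:
        with self._lock:
            # The latest event per path wins, e.g. "modified" then "deleted" -> "deleted".
            self._pending[path] = event
            # Every new event restarts the quiet-period countdown.
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay_s, self._flush, args=(self._generation,))
            self._timer.daemon = True
            self._timer.start()

    def _flush(self, generation: int) -> None:
        with self._lock:
            if generation != self._generation:
                return  # a newer event arrived; its own timer will flush
            batch, self._pending = self._pending, {}
            self._timer = None
        if batch:
            self.on_flush(batch)  # called outside the lock so the callback may submit()


# Usage: three rapid events collapse into one batch after a 0.1 s quiet period.
batches: list[dict[str, str]] = []
debouncer = Debouncer(0.1, batches.append)
debouncer.submit("a.py", "modified")
debouncer.submit("a.py", "deleted")
debouncer.submit("b.py", "created")
time.sleep(0.5)
print(batches)
```

The generation counter guards against a timer that had already fired when `cancel()` was called: such a stale flush sees a newer generation and does nothing, leaving the batch to the most recent timer.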
What Gets Indexed
Included:
- Files with code extensions: .py, .js, .ts, .tsx, .jsx, .go, .rs, .java, .c, .cpp, .h, .rb, .php, .swift, .kt, .scala, and more
Excluded:
- Files matching .gitignore patterns (all .gitignore files in your project are respected)
- Common non-code directories: node_modules, __pycache__, .venv, build, dist, .git, vendor, etc.
- Binary files and non-code file types
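The include/exclude rules above can be approximated with a directory walk that prunes excluded directories and keeps only whitelisted extensions. This is a simplified sketch, not the server's implementation: the extension and directory sets are abbreviated from the lists above, and .gitignore handling is reduced to `fnmatch` on patterns from the root .gitignore only, ignoring nested files, negation (`!`), and anchoring rules that real gitignore semantics support. `iter_indexable_files` is a hypothetical name.

```python
import fnmatch
import os
import tempfile
from collections.abc import Iterator
from pathlib import Path

# Abbreviated from the lists above; the server itself covers more extensions.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".java",
                   ".c", ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala"}
EXCLUDED_DIRS = {"node_modules", "__pycache__", ".venv", "build", "dist", ".git", "vendor"}


def load_ignore_patterns(root: Path) -> list[str]:
    """Read patterns from root/.gitignore only (simplified gitignore handling)."""
    gitignore = root / ".gitignore"
    if not gitignore.is_file():
        return []
    patterns = []
    for raw in gitignore.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blanks, comments, and (unsupported) negations
        pattern = line.strip("/")
        if pattern:
            patterns.append(pattern)
    return patterns


def iter_indexable_files(root: Path) -> Iterator[Path]:
    patterns = load_ignore_patterns(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [
            d for d in dirnames
            if d not in EXCLUDED_DIRS and not any(fnmatch.fnmatch(d, p) for p in patterns)
        ]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix not in CODE_EXTENSIONS:
                continue
            rel = path.relative_to(root).as_posix()
            if any(fnmatch.fnmatch(name, p) or fnmatch.fnmatch(rel, p) for p in patterns):
                continue
            yield path


# Usage: only src/app.py survives the extension, directory, and .gitignore filters.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.py").write_text("x = 1\n")
    (root / "src" / "notes.txt").write_text("not code\n")
    (root / "node_modules").mkdir()
    (root / "node_modules" / "lib.js").write_text("")
    (root / "generated.py").write_text("")
    (root / ".gitignore").write_text("generated.py\n")
    found = sorted(p.relative_to(root).as_posix() for p in iter_indexable_files(root))
    print(found)
```

Pruning `dirnames` in place is what keeps a large `node_modules` tree from being walked at all, rather than being walked and then filtered.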
Configuration
Environment variables:
| Variable | Default | Description |
|---|---|---|
| SEMANTIC_SEARCH_DB_PATH | .semantic-search/index.db | Index database location |
| SEMANTIC_SEARCH_EMBEDDING_MODEL | jinaai/jina-embeddings-v2-base-code | Embedding model |
| SEMANTIC_SEARCH_MIN_SCORE | 0.3 | Minimum relevance threshold (0-1) |
| SEMANTIC_SEARCH_DEBOUNCE_MS | 1000 | File watcher debounce in milliseconds |
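A sketch of how these variables could be read with the documented defaults as fallbacks. The `Settings` dataclass and `load_settings` function are hypothetical and not this package's API; the range check simply enforces the documented 0-1 range for SEMANTIC_SEARCH_MIN_SCORE. Passing the environment in as a mapping makes the function easy to test without touching `os.environ`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    db_path: Path
    embedding_model: str
    min_score: float
    debounce_ms: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read SEMANTIC_SEARCH_* variables, falling back to the documented defaults."""
    min_score = float(env.get("SEMANTIC_SEARCH_MIN_SCORE", "0.3"))
    if not 0.0 <= min_score <= 1.0:
        raise ValueError(f"SEMANTIC_SEARCH_MIN_SCORE must be in [0, 1], got {min_score}")
    return Settings(
        db_path=Path(env.get("SEMANTIC_SEARCH_DB_PATH", ".semantic-search/index.db")),
        embedding_model=env.get("SEMANTIC_SEARCH_EMBEDDING_MODEL", "jinaai/jina-embeddings-v2-base-code"),
        min_score=min_score,
        debounce_ms=int(env.get("SEMANTIC_SEARCH_DEBOUNCE_MS", "1000")),
    )


# Usage: override one variable, keep the other defaults.
settings = load_settings({"SEMANTIC_SEARCH_MIN_SCORE": "0.5"})
print(settings.min_score, settings.debounce_ms)  # 0.5 1000
```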
Requirements
- Python 3.11+
- ~700MB disk for embedding model (downloaded on first run)
- ~1GB RAM for embedding model
License
Download files
File details
Details for the file semantic_search_mcp-0.1.0.tar.gz.
File metadata
- Download URL: semantic_search_mcp-0.1.0.tar.gz
- Upload date:
- Size: 29.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b1454a179f8051b16c81da7bc19ce6e7b2b7a9fab7a148a4cce2206f05fefe3c |
| MD5 | 87fd08e397a2b4cc204cebaae11c316f |
| BLAKE2b-256 | 6d521e7514eb80e5ad5b0d37bbadf1dc16856258b40133b37244dd2ba5e5855c |
File details
Details for the file semantic_search_mcp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: semantic_search_mcp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75aba1656da90302a6cc049a746e4e3bc04f25f7f5fc7b33cc9e11fe82745492 |
| MD5 | 3ea375a9c2a02e7ba94b8f4f550b0499 |
| BLAKE2b-256 | 3f54233207a2d44fe02fe77d7dc769f77067df3392c3ec7853b3265e7a2ea2aa |