MCP server: semantic code search with SQLite + local free embeddings
semantic-code-index-mcp
MCP server for Claude Code: semantic code search with SQLite + local embeddings (free, no API key needed).
Instead of reading your entire codebase, Claude searches semantically, finding relevant code by meaning rather than by keywords alone. In the sample measurements under Token savings, each query used 83-90% fewer tokens than reading the full repo.
Quick Install (npx)
cd /path/to/your/project
npx semantic-code-index-mcp install
This automatically:
- Creates `.claude/mcp.json` and `.mcp.json` (merged with existing config)
- Adds `.claude/rules/semantic-search.md` so Claude prefers semantic search
- Updates `.gitignore`
Requires Python 3.11+ and uv (brew install uv or pip install uv).
Uninstall
npx semantic-code-index-mcp uninstall
Cleanly removes all config. If you have other MCP servers configured, they are preserved.
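The merge-and-preserve behavior described for install and uninstall can be pictured with a small sketch. This is not the installer's actual code: it assumes the config file keeps servers under a top-level `mcpServers` object, and the entry name `semantic-code-index`, the helper names, and any `command`/`args` values passed in are hypothetical placeholders.

```python
import json
from pathlib import Path

SERVER_NAME = "semantic-code-index"  # hypothetical entry name, for illustration only


def install_entry(config_path: Path, entry: dict) -> None:
    """Add or replace one server entry, keeping every other key intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[SERVER_NAME] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")


def uninstall_entry(config_path: Path) -> None:
    """Remove only our entry; other configured servers are left untouched."""
    if not config_path.exists():
        return
    config = json.loads(config_path.read_text())
    config.get("mcpServers", {}).pop(SERVER_NAME, None)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
```

Because both helpers read the existing file first and touch a single key, running one after the other returns a config with other servers to its previous contents.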
Install from source (dev)
git clone https://github.com/thinhdo/semantic-code-index-mcp
cd semantic-code-index-mcp
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
# Install into a project using local binary
semantic-code-index-mcp install /path/to/your/project
# Or via npx with --local flag
npx semantic-code-index-mcp install /path/to/project --local .venv/bin/semantic-code-index-mcp
Usage
Once installed, open Claude Code in your project. The first time, ask Claude:
Index this project
After that, Claude will automatically use semantic_search for code exploration. The index auto-syncs before each search, so no manual steps are needed.
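To make the auto-sync idea concrete, here is a minimal sketch of detecting new, changed, and deleted files. It assumes change detection by SHA-256 content hash and a `{relative_path: digest}` map loaded from the index; the project may well use a different mechanism (for example modification times), and `file_digest` and `diff_index` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents (assumed change signal)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_index(root: Path, indexed: dict[str, str]) -> dict[str, list[str]]:
    """Compare files on disk with a {relative_path: digest} map from the index."""
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*.py")  # assumption: only Python files, for brevity
        if p.is_file()
    }
    shared = current.keys() & indexed.keys()
    return {
        "new": sorted(current.keys() - indexed.keys()),
        "changed": sorted(k for k in shared if current[k] != indexed[k]),
        "deleted": sorted(indexed.keys() - current.keys()),
    }
```

Only the paths in `new` and `changed` would need re-chunking and re-embedding, and those in `deleted` would be dropped from the index.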
Tools
| Tool | Description |
|---|---|
| `index_project` | Full re-index of the codebase |
| `sync_index` | Incremental sync (new/changed/deleted files only) |
| `semantic_search` | Hybrid search: semantic vectors + keyword BM25. Auto-syncs before searching |
| `list_indexed_files` | List all indexed files with token count and chunk count |
| `get_file_chunks` | Get full content of a file's indexed chunks |
| `token_usage_stats` | Compare: tokens if reading full repo vs tokens used by searches |
| `search_log` | View recent search history with token usage and savings |
How it works
- Chunking: source files are split into overlapping chunks (~100 lines, 15-line overlap)
- Embedding: each chunk is vectorized locally using `fastembed` (BAAI/bge-small-en-v1.5, ONNX)
- Storage: vectors + metadata are stored in SQLite (at `~/.cache/semantic-code-index/<hash>/`)
- Search: hybrid retrieval using cosine similarity + FTS5 BM25, fused with Reciprocal Rank Fusion
- Auto-sync: on each search, changed files are detected and re-indexed automatically
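Two of the steps above, overlapping chunking and Reciprocal Rank Fusion, can be sketched in plain Python. This is an illustration under assumptions, not the project's code: `chunk_lines` and `rrf_fuse` are hypothetical names, the chunk boundaries are purely line-based, and `k = 60` is the constant commonly used for RRF rather than a value taken from this project.

```python
def chunk_lines(lines: list[str], size: int = 100, overlap: int = 15) -> list[tuple[int, int]]:
    """Return (start, end) line ranges, end exclusive.

    Consecutive chunks share `overlap` lines, so each new chunk starts
    `size - overlap` lines after the previous one.
    """
    if not lines:
        return []
    step = size - overlap
    ranges = []
    start = 0
    while True:
        end = min(start + size, len(lines))
        ranges.append((start, end))
        if end == len(lines):
            break
        start += step
    return ranges


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank_d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# A 250-line file yields three chunks that overlap by 15 lines each.
print(chunk_lines(["x"] * 250))  # [(0, 100), (85, 185), (170, 250)]

# Fuse a vector-similarity ranking with a BM25 ranking of chunk ids.
print(rrf_fuse([["a", "b", "c"], ["c", "a", "d"]]))  # ['a', 'c', 'b', 'd']
```

RRF only needs each retriever's rank order, which is why cosine scores and BM25 scores can be combined without normalizing them to a common scale.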
Token savings
Each search returns only relevant snippets instead of the full repo. Example on a ~10k token repo:
| Query | Result tokens | Full repo | Saved |
|---|---|---|---|
| "how does embedding work" | 1,569 | 9,577 | 83% |
| "install setup" | 925 | 9,577 | 90% |
| "chunking strategy" | 1,331 | 9,577 | 86% |
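On larger repos (100k+ tokens), the relative savings should be larger still, since a search returns a similar number of snippets while the cost of reading the full repo grows.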
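The "Saved" column can be reproduced from the other two columns. The sketch below assumes the percentage is rounded down to a whole number, which matches the three rows in the table; `percent_saved` is a hypothetical helper name.

```python
def percent_saved(result_tokens: int, full_repo_tokens: int) -> int:
    """Share of tokens avoided by searching, as a whole percent (rounded down)."""
    return (full_repo_tokens - result_tokens) * 100 // full_repo_tokens


for query, result_tokens in [
    ("how does embedding work", 1569),
    ("install setup", 925),
    ("chunking strategy", 1331),
]:
    print(query, percent_saved(result_tokens, 9577))  # 83, 90, 86
```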
Environment variables
`SEMANTIC_CODE_ROOT` or `WORKSPACE_ROOT`: root directory of the project to index (default: MCP server's working directory)
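A minimal sketch of how such a lookup could work follows. The precedence (SEMANTIC_CODE_ROOT first, then WORKSPACE_ROOT) is an assumption, since the text does not say which variable wins when both are set, and `resolve_root` is a hypothetical name.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def resolve_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the project root from the environment, else the working directory."""
    env = os.environ if env is None else env
    # Assumed precedence: SEMANTIC_CODE_ROOT is checked before WORKSPACE_ROOT.
    for name in ("SEMANTIC_CODE_ROOT", "WORKSPACE_ROOT"):
        value = env.get(name)
        if value:
            return Path(value).expanduser().resolve()
    return Path.cwd()
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real process environment.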
Notes
- Token counting uses `tiktoken` encoding `cl100k_base` (approximate for Claude/GPT-4), not actual billing
- First run downloads the ONNX embedding model (~30MB)
- Vector search scans all chunks in SQLite; very large repos may need scaling (sqlite-vec / ANN)
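The last note describes a brute-force scan: every stored vector is compared with the query. A minimal sketch of that pattern using only the standard library is shown below. The `chunks` table with `id` and `embedding` columns, the float32 BLOB encoding, and the helper names are all assumptions for illustration, not the project's actual schema.

```python
import math
import sqlite3
from array import array


def to_blob(vec: list[float]) -> bytes:
    """Pack a vector as float32 bytes for storage in a BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob: bytes) -> list[float]:
    """Unpack float32 bytes back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return out.tolist()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn: sqlite3.Connection, query: list[float], top_k: int = 3) -> list[tuple[float, str]]:
    """Score every row against the query: O(number of chunks) per search."""
    rows = conn.execute("SELECT id, embedding FROM chunks")  # hypothetical schema
    scored = [(cosine(query, from_blob(blob)), chunk_id) for chunk_id, blob in rows]
    return sorted(scored, reverse=True)[:top_k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, embedding BLOB)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [("a", to_blob([1.0, 0.0])), ("b", to_blob([0.0, 1.0])), ("c", to_blob([1.0, 1.0]))],
)
print([chunk_id for _, chunk_id in search(conn, [1.0, 0.0], top_k=2)])  # ['a', 'c']
```

The cost grows linearly with the number of chunks, which is why an ANN index or an extension such as sqlite-vec becomes attractive for very large repos.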