MCP server: semantic code search with SQLite + local free embeddings

semantic-code-index-mcp

MCP server for Claude Code: semantic code search with SQLite + local embeddings (free, no API key needed).

Instead of reading your entire codebase, Claude searches it semantically, finding relevant code by meaning rather than by keywords alone. This saves roughly 80-90% of tokens per query compared with reading the full repo (see Token savings).

Quick Install (npx)

cd /path/to/your/project
npx semantic-code-index-mcp install

This automatically:

  • Creates .claude/mcp.json and .mcp.json (merged with existing config)
  • Adds .claude/rules/semantic-search.md so Claude prefers semantic search
  • Updates .gitignore
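
As a rough illustration of what "merged with existing config" can mean, here is a minimal sketch that assumes the usual .mcp.json layout with a top-level mcpServers object. The function name, the server key and the command/args values are hypothetical and are not taken from the installer's code.

```python
import json


def merge_mcp_server(existing_text: str, name: str, entry: dict) -> str:
    """Return .mcp.json text with `entry` stored under mcpServers[name].

    Other servers and unrelated top-level keys are left untouched.
    """
    config = json.loads(existing_text) if existing_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = entry  # add or overwrite only this server's entry
    return json.dumps(config, indent=2) + "\n"


# Hypothetical existing config and a hypothetical entry for this server.
existing = '{"mcpServers": {"other-server": {"command": "other"}}}'
entry = {"command": "uvx", "args": ["semantic-code-index-mcp"]}
merged = json.loads(merge_mcp_server(existing, "semantic-code-index", entry))
print(sorted(merged["mcpServers"]))
```

Writing only one key under mcpServers is what keeps entries for other servers intact.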

Requires Python 3.11+ and uv (brew install uv or pip install uv).

Uninstall

npx semantic-code-index-mcp uninstall

Cleanly removes all config. If you have other MCP servers configured, they are preserved.
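
The "other MCP servers are preserved" behavior could be sketched as follows. This is an assumption about how such an uninstall might work, not the actual uninstaller; remove_mcp_server and the server names are hypothetical.

```python
import json
from typing import Optional


def remove_mcp_server(config_text: str, name: str) -> Optional[str]:
    """Drop mcpServers[name]; return the new text, or None if nothing is left."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    servers.pop(name, None)  # no error if the entry is already gone
    if not servers:
        config.pop("mcpServers", None)
    if not config:
        return None  # the caller may delete the now-empty file
    return json.dumps(config, indent=2) + "\n"


# Hypothetical config with this server plus one unrelated server.
text = json.dumps({
    "mcpServers": {
        "semantic-code-index": {"command": "x"},
        "other-server": {"command": "other"},
    }
})
remaining = remove_mcp_server(text, "semantic-code-index")
print(remaining)
```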

Install from source (dev)

git clone https://github.com/thinhdo/semantic-code-index-mcp
cd semantic-code-index-mcp
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# Install into a project using local binary
semantic-code-index-mcp install /path/to/your/project

# Or via npx with --local flag
npx semantic-code-index-mcp install /path/to/project --local .venv/bin/semantic-code-index-mcp

Usage

Once installed, open Claude Code in your project. The first time, ask Claude:

Index this project

After that, Claude will automatically use semantic_search for code exploration. The index auto-syncs when files change — no manual steps needed.
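
One way to detect what needs re-indexing is to compare content fingerprints from the last index run against the files on disk now. The sketch below assumes SHA-256 content hashes; the project may rely on modification times or another signal instead, and the function names are illustrative.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to tell whether a file changed."""
    return hashlib.sha256(content).hexdigest()


def diff_snapshots(indexed: dict[str, str], current: dict[str, str]):
    """Compare {path: fingerprint} maps and return (new, changed, deleted)."""
    new = sorted(p for p in current if p not in indexed)
    changed = sorted(
        p for p in current if p in indexed and current[p] != indexed[p]
    )
    deleted = sorted(p for p in indexed if p not in current)
    return new, changed, deleted


# Toy snapshots: b.py was edited, c.py was added, old.py was removed.
indexed = {
    "a.py": fingerprint(b"x = 1\n"),
    "b.py": fingerprint(b"y = 2\n"),
    "old.py": fingerprint(b"pass\n"),
}
current = {
    "a.py": fingerprint(b"x = 1\n"),
    "b.py": fingerprint(b"y = 3\n"),
    "c.py": fingerprint(b"z = 0\n"),
}
print(diff_snapshots(indexed, current))
```

Only the new and changed files would need re-chunking and re-embedding; chunks belonging to deleted files would be dropped.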

Tools

Tool                 Description
index_project        Full re-index of the codebase
sync_index           Incremental sync (new/changed/deleted files only)
semantic_search      Hybrid search: semantic vectors + keyword BM25. Auto-syncs before searching
list_indexed_files   List all indexed files with token count and chunk count
get_file_chunks      Get full content of a file's indexed chunks
token_usage_stats    Compare: tokens if reading full repo vs tokens used by searches
search_log           View recent search history with token usage and savings
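
The keyword half of semantic_search can be illustrated with SQLite's FTS5 extension and its built-in bm25() ranking function. This is a sketch only: it assumes your Python build's SQLite includes FTS5, and the table name, column names and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table holding chunk text; `path` is stored but not searchable.
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(path UNINDEXED, content)")
conn.executemany(
    "INSERT INTO chunks_fts (path, content) VALUES (?, ?)",
    [
        ("embed.py", "def embed(texts): return model.embed(texts)"),
        ("chunk.py", "def split_into_chunks(lines, size, overlap): ..."),
        ("cli.py", "def install(project_root): write config files"),
    ],
)


def keyword_search(query: str, limit: int = 5) -> list[str]:
    """Return paths ordered by BM25 relevance (lower bm25() is a better match)."""
    rows = conn.execute(
        "SELECT path FROM chunks_fts WHERE chunks_fts MATCH ? "
        "ORDER BY bm25(chunks_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [path for (path,) in rows]


print(keyword_search("embed"))
```

Note that MATCH takes FTS5 query syntax, so free-form user input with quotes or operators would need escaping in real use.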

How it works

  1. Chunking — source files are split into overlapping chunks (~100 lines, 15-line overlap)
  2. Embedding — each chunk is vectorized locally using fastembed (BAAI/bge-small-en-v1.5, ONNX)
  3. Storage — vectors + metadata stored in SQLite (at ~/.cache/semantic-code-index/<hash>/)
  4. Search — hybrid retrieval: cosine similarity + FTS5 BM25, fused with Reciprocal Rank Fusion
  5. Auto-sync — on each search, changed files are detected and re-indexed automatically
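
Steps 1 and 4 above can be sketched in a few lines. The chunk size and overlap come from the description; everything else is an assumption for illustration, including the function names, the exact boundary handling, and the RRF constant k = 60 (a commonly used default, not confirmed for this project).

```python
def chunk_lines(lines: list[str], size: int = 100, overlap: int = 15) -> list[tuple[int, int]]:
    """Return (start, end) line ranges, end exclusive, sharing `overlap` lines."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    ranges = []
    start = 0
    while start < len(lines):
        end = min(start + size, len(lines))
        ranges.append((start, end))
        if end == len(lines):
            break  # last chunk reached the end of the file
        start += step
    return ranges


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(chunk_lines(["line"] * 250))
vector_hits = ["c1", "c2", "c3"]   # hypothetical cosine-similarity ranking
keyword_hits = ["c3", "c1", "c4"]  # hypothetical BM25 ranking
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Because RRF uses only ranks, the cosine and BM25 scores never need to be put on a common scale before fusing.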

Token savings

Each search returns only relevant snippets instead of the full repo. Example on a ~10k token repo:

Query                       Result tokens   Full repo   Saved
"how does embedding work"   1,569           9,577       83%
"install setup"             925             9,577       90%
"chunking strategy"         1,331           9,577       86%

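On larger repos (100k+ tokens), the relative savings should be larger still, since a search returns a bounded number of snippets while the cost of reading the full repo grows with its size.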
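
The Saved column is the share of full-repo tokens that a search avoids. A small worked check, assuming the percentages are rounded down to whole numbers (which reproduces the figures in the table):

```python
def percent_saved(result_tokens: int, full_repo_tokens: int) -> int:
    """Whole-percent savings, rounded down."""
    return (full_repo_tokens - result_tokens) * 100 // full_repo_tokens


FULL_REPO = 9_577
examples = [
    ("how does embedding work", 1_569),
    ("install setup", 925),
    ("chunking strategy", 1_331),
]
for query, result_tokens in examples:
    print(f"{query!r}: {percent_saved(result_tokens, FULL_REPO)}%")
# prints 83%, 90% and 86% for the three queries
```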

Environment variables

  • SEMANTIC_CODE_ROOT or WORKSPACE_ROOT: root directory of the project to index (default: MCP server's working directory)
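
A minimal sketch of how this lookup could work. The precedence order (SEMANTIC_CODE_ROOT before WORKSPACE_ROOT) is an assumption, and resolve_project_root is a hypothetical helper, not the server's actual function.

```python
import os
from pathlib import Path


def resolve_project_root(env=None) -> Path:
    """Pick the project root from the environment, else the working directory."""
    env = os.environ if env is None else env
    # Assumed precedence: the first variable that is set and non-empty wins.
    for name in ("SEMANTIC_CODE_ROOT", "WORKSPACE_ROOT"):
        value = env.get(name)
        if value:
            return Path(value).expanduser().resolve()
    return Path.cwd()


print(resolve_project_root())
```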

Notes

  • Token counts use the tiktoken encoding cl100k_base, which is only an approximation for Claude; they are estimates, not actual billing figures
  • First run downloads the ONNX embedding model (~30MB)
  • Vector search is a linear scan over all chunks in SQLite; very large repos may need a more scalable approach (sqlite-vec / ANN)
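
To make the last note concrete, here is a sketch of a brute-force vector search: every stored chunk vector is compared with the query, so the cost grows linearly with the number of chunks. It uses pure Python and toy 2-dimensional vectors for readability; real embeddings have hundreds of dimensions, and the names here are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[str]:
    """Score every chunk against the query and return the k best chunk IDs."""
    scored = [(cosine_similarity(query, vec), cid) for cid, vec in chunks.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored[:k]]


# Toy vectors standing in for chunk embeddings.
chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(top_k([1.0, 0.1], chunks, k=2))
```

An ANN index trades a little recall for avoiding this full scan, which is why it becomes attractive as the chunk count grows.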
