
SQLite-backed code index for Claude Code, exposed via MCP


code-index

A local, SQLite-backed code index for Claude Code, exposed over MCP. It replaces blind Read / Grep / Glob exploration with targeted retrieval — "where is parseAuthToken defined", "what calls Indexer.reindex_all", "find the rate-limiting code" — answered in milliseconds against an offline index.

No API keys. No external services. The embedder runs locally on your machine.

How it works (30-second tour)

  1. Parse your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
  2. Chunk code per symbol and expand identifiers (getUserAuthToken → get user auth token) so search matches both styles.
  3. Embed each chunk locally with jina-embeddings-v2-base-code (768-dim) via sentence-transformers.
  4. Store symbols, chunks, vectors, and call/import edges in .claude/index.db (SQLite + sqlite-vec + FTS5).
  5. Serve 14 retrieval tools + 1 admin tool over MCP (see Tools).
  6. Stay fresh via an optional PostToolUse hook that incrementally re-indexes touched files.
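Step 2's identifier expansion can be illustrated with a small regex splitter. This is a minimal sketch, not the project's chunker.py. The names `expand_identifier` and `expand_chunk_text` are hypothetical, and the real splitting rules may differ (for example, around digits).

```python
import re

# Hypothetical sketch of identifier expansion; not code-index's actual chunker.
# Alternatives, tried in order: an acronym followed by a capitalised word
# ("HTTP" in "HTTPServer"), an optionally capitalised lowercase word,
# a trailing acronym, or a run of digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand_identifier(name: str) -> str:
    """Split camelCase / PascalCase / snake_case / kebab-case into lowercase words."""
    words = []
    for part in re.split(r"[_\-]+", name):
        words.extend(_WORD.findall(part))
    return " ".join(word.lower() for word in words)


def expand_chunk_text(code: str) -> str:
    """Append the expanded form of every multi-word identifier to a chunk,
    so a query for "user auth token" can also match getUserAuthToken."""
    expansions = []
    for ident in dict.fromkeys(_IDENT.findall(code)):  # de-duplicate, keep order
        expanded = expand_identifier(ident)
        if " " in expanded:
            expansions.append(expanded)
    return code if not expansions else code + "\n" + "\n".join(expansions)


if __name__ == "__main__":
    print(expand_identifier("getUserAuthToken"))  # get user auth token
    print(expand_identifier("HTTPServerError"))   # http server error
```

Indexing the original text and the expansion side by side lets both exact identifiers and natural-language phrasings hit the same chunk.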

Tools

Retrieval

Tool Purpose
code_search Hybrid (vector + FTS) search for conceptual queries (e.g., "auth flow", "where do we parse JSON").
symbol_lookup Exact-name lookup of functions / classes / methods / types. Prefer over code_search for identifiers.
file_outline Symbols (with signatures) in a file, in source order. Use instead of Read when you only need shape.
module_outline Symbols across a directory subtree in one call. Use instead of looping file_outline.
where_am_i Given path + line, returns the innermost symbol and the full enclosing chain.
get_symbol_body Full chunk for a symbol_id from symbol_lookup / code_search / file_outline.
get_symbol_bodies Batch version of get_symbol_body (up to 20 ids per call).
callers Symbols that CALL the given symbol. depth (1-5) expands transitively.
callees Symbols that the given symbol CALLS. depth (1-5) expands transitively.
references Non-call uses (subclasses, free identifier references). Companion to callers / callees.
trace Build a call-graph tree from an entry symbol; flat=true returns nodes/edges for cheap LLM scans.
file_imports Files this file imports (direction=imports) or that import it (direction=imported_by).
recent_changes Files touched in the last N git commits.
propose_rename v1: same-file rename. Returns an edit list the agent applies via its own Edit tool; refuses on clash.
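The `depth` parameter on callers / callees amounts to a bounded breadth-first walk over the stored call edges. The sketch below makes that concrete under an assumed, hypothetical schema: a single `edges(src, dst, kind)` table. code-index's real tables and the function name `transitive_callers` are not taken from the project.

```python
import sqlite3
from collections import deque

# Hypothetical schema for illustration only; code-index's real tables may differ.
SCHEMA = """
CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, kind TEXT NOT NULL);
"""


def transitive_callers(conn: sqlite3.Connection, symbol: str, depth: int = 1):
    """Breadth-first walk over 'call' edges in reverse (callee -> caller).

    Returns (caller, distance) pairs, nearest first. Each symbol appears once,
    so cycles in the call graph cannot cause an infinite loop.
    """
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    distance = {symbol: 0}
    queue = deque([symbol])
    found = []
    while queue:
        current = queue.popleft()
        if distance[current] == depth:
            continue  # do not expand past the requested depth
        rows = conn.execute(
            "SELECT src FROM edges WHERE dst = ? AND kind = 'call' ORDER BY src",
            (current,),
        ).fetchall()
        for (caller,) in rows:
            if caller not in distance:
                distance[caller] = distance[current] + 1
                found.append((caller, distance[caller]))
                queue.append(caller)
    return found


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, 'call')",
        [("main", "handle_request"), ("handle_request", "parse_auth_token"),
         ("parse_auth_token", "handle_request")],  # a cycle, to show termination
    )
    print(transitive_callers(conn, "parse_auth_token", depth=3))
```

Swapping `src` and `dst` in the query gives the callees direction. A trace tree is the same walk with parent pointers kept.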

Admin

Tool / op Purpose
admin op=init Build or refresh the index. Incremental by default; force=true rebuilds from scratch.
admin op=setup_check Diagnose hook wiring + embedder + host. Round-trip-tests the hook end-to-end.
admin op=install_hook Wire the auto-reindex PostToolUse hook into .claude/settings.json. Idempotent.
admin op=stats Read-only: file counts by language, symbol totals, embed model fingerprint, last-index time.
admin op=verify Integrity sweep: orphan rows, parse-failure files, dangling edges.
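The kind of integrity sweep `op=verify` describes boils down to anti-joins: rows that point at something no longer present. The following is a sketch against a hypothetical `symbols` / `chunks` / `edges` schema, not the project's actual tables. A real index may deliberately store edges into unindexed code (for example, the standard library), which such a check would need to exclude.

```python
import sqlite3

# Hypothetical schema; the real code-index tables and columns may differ.
SCHEMA = """
CREATE TABLE symbols (id TEXT PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE chunks  (id INTEGER PRIMARY KEY, symbol_id TEXT NOT NULL);
CREATE TABLE edges   (src TEXT NOT NULL, dst TEXT NOT NULL, kind TEXT NOT NULL);
"""


def orphan_chunks(conn: sqlite3.Connection) -> list:
    """Chunk ids whose owning symbol row no longer exists."""
    return [row[0] for row in conn.execute(
        """SELECT c.id FROM chunks AS c
           LEFT JOIN symbols AS s ON s.id = c.symbol_id
           WHERE s.id IS NULL ORDER BY c.id""")]


def dangling_edges(conn: sqlite3.Connection) -> list:
    """Edges with at least one endpoint missing from symbols."""
    return conn.execute(
        """SELECT e.src, e.dst, e.kind FROM edges AS e
           LEFT JOIN symbols AS a ON a.id = e.src
           LEFT JOIN symbols AS b ON b.id = e.dst
           WHERE a.id IS NULL OR b.id IS NULL
           ORDER BY e.src, e.dst""").fetchall()


def verify(conn: sqlite3.Connection) -> dict:
    """Collect both checks into one JSON-friendly report."""
    return {"orphan_chunks": orphan_chunks(conn),
            "dangling_edges": dangling_edges(conn)}
```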

embed_query_debug is a dev-only ranking diagnostic, hidden from list_tools unless CODE_INDEX_DEBUG=1 is set.

All tools return size-bounded JSON. Large bodies are fetched on demand via get_symbol_body rather than inlined as whole files.
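For clients wired up by hand, each tool is invoked with an MCP `tools/call` JSON-RPC request over stdio, one JSON message per line. The sketch below assumes the server's JSON payload arrives as text content items. The argument name `name` for symbol_lookup is a guess, so check the input schema the server reports via `tools/list`. A real session must also complete the MCP `initialize` handshake first, which is omitted here.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialise a JSON-RPC 2.0 tools/call request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


def parse_tool_result(line: str) -> list:
    """Decode a tools/call response and JSON-parse each text content item.

    Assumes each text item is itself a JSON document (code-index returns JSON).
    """
    message = json.loads(line)
    if "error" in message:  # protocol-level failure
        raise RuntimeError(message["error"].get("message", "unknown error"))
    result = message["result"]
    texts = [item["text"] for item in result.get("content", [])
             if item.get("type") == "text"]
    if result.get("isError"):  # tool-level failure reported inside the result
        raise RuntimeError("; ".join(texts) or "tool error")
    return [json.loads(text) for text in texts]


if __name__ == "__main__":
    # Hypothetical arguments: confirm the real parameter names via tools/list.
    print(make_tool_call(1, "symbol_lookup", {"name": "parseAuthToken"}), end="")
```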

Requirements

  • Python 3.10+ with loadable SQLite extension support (required by sqlite-vec).
    • Python 3.13 has this enabled by default.
    • On 3.10–3.12, install via the python.org installer or via pyenv with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x.
    • Homebrew Python often ships without the extension hook — use one of the two methods above instead.
  • uv / uvx: the recommended runner. Or pip if you prefer a permanent install.
  • ~600 MB free disk for the embedding model on first init.
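Before installing, you can check whether a given interpreter meets the loadable-extension requirement. The following is a minimal stdlib-only probe. It only checks that extension loading can be enabled, not that sqlite-vec itself loads.

```python
import sqlite3
import sys


def supports_loadable_extensions() -> bool:
    """True if this interpreter's sqlite3 module can enable extension loading."""
    conn = sqlite3.connect(":memory:")
    try:
        # Builds compiled without --enable-loadable-sqlite-extensions lack this method.
        if not hasattr(conn, "enable_load_extension"):
            return False
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)  # switch it back off; this is only a probe
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    ok = supports_loadable_extensions()
    print(f"Python {sys.version.split()[0]}, SQLite {sqlite3.sqlite_version}: "
          f"loadable extensions {'available' if ok else 'NOT available'}")
    sys.exit(0 if ok else 1)
```

Run it with the same interpreter that uvx or pip will use. A non-zero exit means you need a different Python build.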

Quick start (Claude Code)

One command, no API keys:

claude mcp add-json -s user code-index "$(cat <<'JSON'
{
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"]
}
JSON
)"

Then open Claude Code in any repo and ask:

"Build the code index for this repo."

Claude calls the admin tool with op=init, which writes .claude/index.db. From then on, ask things like "where is parseAuthToken defined?" or "what calls Indexer.reindex_all?", and Claude routes them through symbol_lookup / callers / code_search instead of grepping.

What --refresh does — fetches the latest PyPI release on every Claude Code launch. Convenient during preview; drop it once you want to pin a version (saves ~1s of startup).

Project-only install: drop -s user to register the server with Claude Code's default local scope, so it is available only in the current project rather than in every project.

First-run model download: the first init pulls jina-embeddings-v2-base-code (~600 MB) into ~/.cache/huggingface and reuses the cached copy afterwards. Subsequent runs are fully offline. If your network blocks Hugging Face, pre-warm the cache from a machine that has access.

Already installed without --refresh? Run claude mcp remove code-index first, then re-run the command above.

Alternative: permanent install (no uvx)

pip install mcp-code-index
claude mcp add -s user code-index -- code-index-mcp

Optional: keep the index live as you edit

Without a hook, the index drifts whenever files change until you call init again. With one, every Edit / Write / MultiEdit that Claude performs triggers an incremental reindex of the touched file. Changes made outside the agent (mv, git checkout, IDE saves) never reach the hook, so re-run init after those.
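Claude Code passes PostToolUse hooks a JSON object on stdin that should include `tool_name`, `tool_input` and `cwd`. For Edit, Write and MultiEdit, `tool_input.file_path` names the touched file. The sketch below shows how a hook in that position could pick out the file to reindex. It is an illustration, not the code-index-hook source; `reindex_one_file` is a hypothetical stand-in for the real incremental reindex.

```python
import json
import sys
from pathlib import Path
from typing import Optional

WATCHED_TOOLS = {"Edit", "Write", "MultiEdit"}


def touched_file(payload: dict) -> Optional[Path]:
    """Return the file an Edit/Write/MultiEdit call touched, or None."""
    if payload.get("tool_name") not in WATCHED_TOOLS:
        return None
    file_path = (payload.get("tool_input") or {}).get("file_path")
    if not file_path:
        return None
    path = Path(file_path)
    if not path.is_absolute():
        path = Path(payload.get("cwd") or ".") / path
    return path


def reindex_one_file(path: Path) -> None:
    # Hypothetical placeholder for the real incremental reindex.
    print(f"would reindex {path}", file=sys.stderr)


def main() -> int:
    try:
        payload = json.load(sys.stdin)
    except json.JSONDecodeError:
        return 0  # never block the agent on a malformed payload
    if not isinstance(payload, dict):
        return 0
    path = touched_file(payload)
    if path is not None:
        reindex_one_file(path)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```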

Easiest path: ask Claude. On first use in a new project, ask "set up the code-index", and Claude runs the admin ops setup_check → install_hook → init. The hook command is derived from how the MCP server was launched (uvx-aware), so the hook uses the same Python toolchain as the server. Hook output goes to .claude/code-index-hook.log, so failures are debuggable.

Manual install: append this entry to the hooks.PostToolUse array in the project's .claude/settings.json. The right command depends on how you launch the server (install_hook derives it for you); the version below is for a uvx-based launch.

{
  "matcher": "Edit|Write|MultiEdit",
  "hooks": [
    {
      "type": "command",
      "command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' --from mcp-code-index code-index-hook"
    }
  ]
}
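If you prefer to script the manual install, the merge is a small, idempotent JSON edit. Below is a minimal sketch that assumes the entry above is the one you want. `add_post_tool_use_hook` is a hypothetical name: it is not the project's install_hook implementation, and it does not derive the command from how the server was launched.

```python
import json
import tempfile
from pathlib import Path

# The uvx-based command from the entry above; install_hook may derive a different one.
HOOK_COMMAND = (
    "uvx --with 'sentence-transformers<5' --with 'numpy<2' "
    "--from mcp-code-index code-index-hook"
)


def add_post_tool_use_hook(settings_path, command: str = HOOK_COMMAND) -> bool:
    """Merge the reindex hook into a Claude Code settings file.

    Returns True if the file changed, False if the command was already present.
    Other keys and existing hooks are preserved.
    """
    path = Path(settings_path)
    text = path.read_text() if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    entries = settings.setdefault("hooks", {}).setdefault("PostToolUse", [])
    for entry in entries:
        if any(h.get("command") == command for h in entry.get("hooks", [])):
            return False  # idempotent: already installed
    entries.append({
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [{"type": "command", "command": command}],
    })
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / ".claude" / "settings.json"
        print(add_post_tool_use_hook(target))  # True: hook added
        print(add_post_tool_use_hook(target))  # False: already present
```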

In other MCP-compatible agents

The server speaks standard MCP over stdio, so any client that supports MCP servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to launch uvx --refresh --from mcp-code-index code-index-mcp (or code-index-mcp after pip install mcp-code-index). Once connected, call the admin tool with op=init from inside the client to bootstrap the index. Drop --refresh when you want to pin a stable version instead of always pulling the latest.

From source (development)

git clone https://github.com/achreftlili/code-index
cd code-index
pip install -e .
code-index init        # CLI alternative to the admin op=init MCP tool
code-index-mcp         # starts the MCP server on stdio (for manual wiring)

Configuration

All settings are optional — the defaults work out of the box. Override them via environment variables. Inside Claude Code, set them in the env block of your code-index server entry in ~/.claude.json (then reconnect the MCP server).

Common knobs (most users only ever touch these):

Var                      Default           When to set it
CODE_INDEX_EMBED_DEVICE  auto              Force the torch device: cpu, mps, or cuda. Set cpu on Apple Silicon if init fails with MPS out-of-memory.
CODE_INDEX_EMBED_BATCH   32                Encode batch size. Lower it (e.g. 8 or 4) to cut peak GPU memory while staying on mps/cuda.
CODE_INDEX_DB            .claude/index.db  Override the SQLite index path (e.g. to share an index across sibling worktrees).

Advanced (rarely needed):

Var                     Default                              Notes
CODE_INDEX_EMBEDDER     jina                                 Only jina (local sentence-transformers) is supported today; the variable exists for future expansion.
CODE_INDEX_EMBED_MODEL  jinaai/jina-embeddings-v2-base-code  Hugging Face model id. Override only if you know the model is dimension-compatible (768-d).
CODE_INDEX_EMBED_DIM    768                                  Must match the embedding model's output dimension.
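The two tables above map naturally onto a small typed config object. The sketch below reads the same variables with the documented defaults and rejects values the docs rule out. `EmbedConfig` and `load_config` are hypothetical names, not code-index's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = {"auto", "cpu", "mps", "cuda"}


@dataclass(frozen=True)
class EmbedConfig:
    device: str
    batch: int
    db_path: str
    embedder: str
    model: str
    dim: int


def load_config(env: Optional[Mapping[str, str]] = None) -> EmbedConfig:
    """Read CODE_INDEX_* settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    cfg = EmbedConfig(
        device=env.get("CODE_INDEX_EMBED_DEVICE", "auto"),
        batch=int(env.get("CODE_INDEX_EMBED_BATCH", "32")),
        db_path=env.get("CODE_INDEX_DB", ".claude/index.db"),
        embedder=env.get("CODE_INDEX_EMBEDDER", "jina"),
        model=env.get("CODE_INDEX_EMBED_MODEL", "jinaai/jina-embeddings-v2-base-code"),
        dim=int(env.get("CODE_INDEX_EMBED_DIM", "768")),
    )
    if cfg.device not in VALID_DEVICES:
        raise ValueError(f"CODE_INDEX_EMBED_DEVICE must be one of {sorted(VALID_DEVICES)}")
    if cfg.batch < 1:
        raise ValueError("CODE_INDEX_EMBED_BATCH must be a positive integer")
    if cfg.embedder != "jina":
        raise ValueError("only the 'jina' embedder is supported")
    if cfg.dim < 1:
        raise ValueError("CODE_INDEX_EMBED_DIM must be a positive integer")
    return cfg
```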

Troubleshooting

init fails with MPS backend out of memory on Apple Silicon. A large file produced a chunk batch that needs more memory than the MPS backend can allocate. Quickest fix: re-run on CPU (slower but bulletproof):

"env": {
  "CODE_INDEX_EMBED_DEVICE": "cpu"
}

To stay on the GPU, shrink the batch instead: "CODE_INDEX_EMBED_BATCH": "8". Reconnect the MCP server (/mcp → reconnect, or restart Claude Code) so the new env takes effect. init is incremental — already-embedded files are skipped on the retry.
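Both knobs help because the device setting decides where encoding runs, while the batch size caps how many chunks are encoded per call, so peak memory grows roughly with the batch. The following sketch of that logic assumes `auto` prefers CUDA, then MPS, then CPU; the project's actual fallback order is not documented here. `resolve_device` and `batched` are hypothetical names.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Map CODE_INDEX_EMBED_DEVICE to a concrete torch device name."""
    if requested != "auto":
        return requested  # an explicit cpu / mps / cuda always wins
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items (one encode call each)."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With PyTorch installed, the two availability flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.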

init fails with a Hugging Face network error on first run. Your network is blocking model downloads. Pre-warm the cache on a machine that has access:

huggingface-cli download jinaai/jina-embeddings-v2-base-code
# then copy ~/.cache/huggingface/ to the offline machine

sqlite3.OperationalError: not authorized or sqlite-vec fails to load. Your Python build doesn't have loadable SQLite extensions. See Requirements — install via python.org or a pyenv build with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions.

code_search / symbol_lookup returns stale paths after a refactor or branch checkout. The auto-reindex hook only fires on Claude's Edit / Write / MultiEdit. After bulk file moves outside the agent (mv, git checkout, IDE rename), re-run init (it's incremental). Or wire up the hook so the index keeps up with agent edits automatically.
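Re-running init is cheap because only changed files need work. One common way to detect them is to hash each source file and diff against what the index recorded. This is shown here as a sketch; the project may compare mtimes, hashes, or both, and `plan_reindex` plus the stored-digest mapping are hypothetical. A file moved outside the agent shows up as one deletion plus one new path, which is exactly the stale-path case described above.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# Extensions for the languages listed in "How it works"; an assumption.
SOURCE_SUFFIXES = {".py", ".ts", ".tsx", ".js", ".jsx", ".go", ".rs"}


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def plan_reindex(root, stored: Dict[str, str]) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Compare on-disk source files with previously stored digests.

    Returns (changed_or_new, deleted, current_digests); paths are root-relative.
    """
    root = Path(root)
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES
    }
    changed = sorted(path for path, digest in current.items() if stored.get(path) != digest)
    deleted = sorted(path for path in stored if path not in current)
    return changed, deleted, current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "auth.py").write_text("def parse_auth_token(raw): ...\n")
        changed, deleted, stored = plan_reindex(root, {})
        print(changed, deleted)
        (root / "auth.py").rename(root / "tokens.py")  # a move outside the agent
        changed, deleted, _ = plan_reindex(root, stored)
        print(changed, deleted)
```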

Layout

src/code_index/
  db.py           SQLite schema, connection, sqlite-vec loading
  parser.py       Tree-sitter wrapper, symbol + edge extraction
  imports.py      Per-language import target → file path resolution
  chunker.py      Per-symbol chunks, identifier expansion
  embedder.py     Local Jina (sentence-transformers) backend
  indexer.py      Pipeline: walk → parse → chunk → embed → write
  reindexer.py    Per-root engine cache; one entry point for "reindex one file"
  retriever.py    Hybrid search (vector + FTS5) with RRF
  watcher.py      File watcher (watchdog)
  admin.py        setup_check / install_hook / init logic (pure, no MCP state)
  mcp_server.py   MCP wiring, shared helpers, schema fragments
  tool_registry.py  Shared `@_tool` decorator + `_TOOLS` registry
  tools/          Per-domain MCP handlers (graph, paths, refactor, …)
  hook.py         `code-index-hook` console script — the PostToolUse entry point
  cli.py          init / reindex / watch / stats
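retriever.py is described as fusing vector and FTS5 results with RRF (reciprocal rank fusion). The textbook form scores each result as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below uses the commonly cited k = 60, which may not be the constant code-index uses, and the function name is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]],
                           k: int = 60, limit: int = 10) -> List[Hashable]:
    """Fuse several best-first result lists into one ranking.

    Ranks are 1-based; a document missing from a list simply gets no
    contribution from it. Ties are broken by document id for determinism.
    """
    scores: Dict[Hashable, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:limit]]


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # e.g. nearest chunks by embedding
    fts_hits = ["b", "d", "a"]      # e.g. FTS5 keyword matches
    print(reciprocal_rank_fusion([vector_hits, fts_hits]))  # ['b', 'a', 'd', 'c']
```

RRF only needs ranks, not scores, so it sidesteps the problem that cosine similarities and FTS5 relevance scores live on incomparable scales.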



Download files


Source Distribution

mcp_code_index-1.0.2.tar.gz (241.9 kB)


Built Distribution


mcp_code_index-1.0.2-py3-none-any.whl (77.2 kB)


File details

Details for the file mcp_code_index-1.0.2.tar.gz.

File metadata

  • Download URL: mcp_code_index-1.0.2.tar.gz
  • Size: 241.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.15

File hashes

Hashes for mcp_code_index-1.0.2.tar.gz
Algorithm Hash digest
SHA256 e628669c357258d7bfdadc3cb3c5d881dc649edb3159081a6cf7142256b0d19c
MD5 ad25ddda13f7d3fc0f896c9f2a6ec25f
BLAKE2b-256 046a35700db7ef5e4fc61085b69ac7ec4c15a5aefb501b1ae13e48d1fd723d5b


File details

Details for the file mcp_code_index-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: mcp_code_index-1.0.2-py3-none-any.whl
  • Size: 77.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.15

File hashes

Hashes for mcp_code_index-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f6bf351d82d7d19af428c5e2d5974f01a390522bc6535707a2ce86947e430c8d
MD5 48253cc2447eb1f090556bc66098b2b8
BLAKE2b-256 0eece6f5de61df92c8b46cc7ffc707e621ed70b93c514fe0f979578bceddb274

