SQLite-backed code index for Claude Code, exposed via MCP
code-index
A local, SQLite-backed code index for Claude Code, exposed over MCP. It
replaces blind Read / Grep / Glob exploration with targeted retrieval —
"where is parseAuthToken defined", "what calls Indexer.reindex_all", "find
the rate-limiting code" — answered in milliseconds against an offline index.
No API keys. No external services. The embedder runs locally on your machine.
How it works (30-second tour)
- Parse your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
- Chunk code per symbol and expand identifiers (`getUserAuthToken` → `get user auth token`) so search matches both styles.
- Embed each chunk locally with `jina-embeddings-v2-base-code` (768-dim) via sentence-transformers.
- Store symbols, chunks, vectors, and call/import edges in `.claude/index.db` (SQLite + sqlite-vec + FTS5).
- Serve 8 retrieval tools + 3 admin tools over MCP (see Tools).
- Stay fresh via an optional `PostToolUse` hook that incrementally re-indexes touched files.
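The identifier-expansion step can be illustrated with a small splitter. This is a minimal sketch, not the project's `chunker.py`: `expand_identifier` is a hypothetical name, and the exact rules code-index applies to acronyms and digits are assumptions.

```python
import re

# Hypothetical splitter, not code-index's chunker.py.
# Zero-width boundaries: lower/digit -> Upper ("getUser"), and the end of an
# acronym before a capitalized word ("HTTPServer" -> "HTTP" | "Server").
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def expand_identifier(name: str) -> str:
    """Split camelCase / PascalCase / snake_case / kebab-case into lowercase words."""
    words: list[str] = []
    for part in re.split(r"[_\-\s]+", name):  # snake_case, kebab-case, spaces
        if part:
            words.extend(_CAMEL_BOUNDARY.split(part))  # camel/Pascal boundaries
    return " ".join(word.lower() for word in words if word)


print(expand_identifier("getUserAuthToken"))  # get user auth token
```

Storing the expansion next to the original name (for example `f"{name} {expand_identifier(name)}"`) is one way to let a full-text query for "auth token" hit `getUserAuthToken` while exact-name queries still match.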
Tools
| Tool | Purpose |
|---|---|
| `init` | Build or refresh the project's index. Incremental by default; `force=true` rebuilds from scratch. |
| `setup_check` | Diagnose hook wiring + embedder + host. Round-trip-tests the hook end-to-end. Suggests platform env (e.g. CPU on Apple Silicon). |
| `install_hook` | Wire the auto-reindex `PostToolUse` hook into `.claude/settings.json`, mirroring the MCP launch toolchain (uvx-aware). Idempotent. |
| `code_search` | Hybrid (vector + FTS) search for conceptual queries (e.g. "auth flow", "where do we parse JSON"). |
| `symbol_lookup` | Exact-name lookup of functions / classes / methods / types. Prefer it over `code_search` for identifiers. |
| `file_outline` | Symbols (with signatures) in a file, in source order. Use instead of Read when you only need the file's shape. |
| `get_symbol_body` | Full chunk for a `symbol_id` returned by `symbol_lookup` or `code_search`. |
| `callers` | Symbols that CALL the given symbol. `depth` (1-5) expands transitively. |
| `callees` | Symbols that the given symbol CALLS. `depth` (1-5) expands transitively. |
| `dependents` | Files that import the given file. |
| `dependencies` | Files that the given file imports. |
All tools return bounded JSON; large bodies are fetched with `get_symbol_body`
instead of being inlined as whole files.
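How `depth` expands `callers` transitively can be pictured as a breadth-first walk over reversed call edges. A sketch under assumptions: the in-memory `graph` dict stands in for the index's call-edge table, `transitive_callers` and the sample symbol names are hypothetical, and the clamp to 1-5 only mirrors the documented range.

```python
from collections import deque

# Hypothetical stand-in for the index's call edges: caller -> callees.
CallGraph = dict[str, set[str]]


def transitive_callers(graph: CallGraph, target: str, depth: int = 1) -> dict[str, int]:
    """Return {caller: hops} for every symbol that reaches `target` within `depth` hops."""
    depth = max(1, min(depth, 5))  # mirror the documented 1-5 range
    reverse: dict[str, set[str]] = {}  # callee -> direct callers
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    found: dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:  # breadth-first, so each caller keeps its shortest hop count
        symbol, hops = queue.popleft()
        if hops == depth:
            continue
        for caller in reverse.get(symbol, ()):
            if caller != target and caller not in found:
                found[caller] = hops + 1
                queue.append((caller, hops + 1))
    return found


graph = {
    "cli.main": {"Indexer.reindex_all"},
    "Indexer.reindex_all": {"Indexer.reindex_file"},
    "hook.run": {"Indexer.reindex_file"},
}
print(transitive_callers(graph, "Indexer.reindex_file", depth=2))
```

`callees` would be the same walk over the edges without reversing them; the `found` set also keeps recursive or mutually recursive symbols from looping forever.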
Requirements
- Python 3.10+ with loadable SQLite extension support (required by sqlite-vec).
  - Python 3.13 has this enabled by default.
  - On 3.10–3.12, install via the python.org installer or via pyenv with `PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x`.
  - Homebrew Python often ships without the extension hook; use one of the two methods above instead.
- `uv`/`uvx`: the recommended runner (install uv first). Use `pip` instead if you prefer a permanent install.
- ~600 MB of free disk for the embedding model on the first `init`.
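To check whether your interpreter qualifies before installing, ask its `sqlite3` module directly. This relies on standard-library behavior: CPython built without `--enable-loadable-sqlite-extensions` does not define `Connection.enable_load_extension`. The helper name is illustrative, and `setup_check` may diagnose this differently.

```python
import sqlite3
import sys


def sqlite_extensions_loadable() -> bool:
    """True if this interpreter's sqlite3 module can load extensions such as sqlite-vec."""
    conn = sqlite3.connect(":memory:")
    try:
        # CPython built without --enable-loadable-sqlite-extensions
        # does not define Connection.enable_load_extension at all.
        return hasattr(conn, "enable_load_extension")
    finally:
        conn.close()


print(f"Python {sys.version.split()[0]}, SQLite {sqlite3.sqlite_version}")
print("loadable extensions:", sqlite_extensions_loadable())
```

Run it with the same interpreter that `uvx` or `pip` will use; a `False` here corresponds to the `not authorized` / sqlite-vec load failure covered under Troubleshooting.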
Quick start (Claude Code)
One command, no API keys:
```bash
claude mcp add-json -s user code-index "$(cat <<'JSON'
{
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"]
}
JSON
)"
```
Then open Claude Code in any repo and ask:
"Build the code index for this repo."
Claude calls the init MCP tool, which writes .claude/index.db. From then on,
ask things like "where is parseAuthToken defined?" or "what calls
Indexer.reindex_all?" — Claude routes them through symbol_lookup /
callers / code_search instead of grepping.
- What `--refresh` does: fetches the latest PyPI release on every Claude Code launch. Convenient during preview; drop it once you want to pin a version (saves ~1 s of startup).
- Project-only install: drop `-s user` to register the server for the current project only, instead of for every project.
- First-run model download: the first `init` pulls `jina-embeddings-v2-base-code` (~600 MB) into `~/.cache/huggingface` and keeps it cached. Subsequent runs are fully offline. If your network blocks Hugging Face, pre-warm the cache from a machine that has access.
- Already installed without `--refresh`? Run `claude mcp remove code-index` first, then re-run the command above.
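If you script setup in Python rather than a shell, you can build the same server entry as a dict and skip the heredoc quoting. A sketch: it only prints the `claude mcp add-json` command line; passing `argv` to `subprocess.run(argv, check=True)` would actually register the server.

```python
import json
import shlex

# Same server entry as the heredoc in the quick-start command.
server = {
    "type": "stdio",
    "command": "uvx",
    "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"],
}
argv = ["claude", "mcp", "add-json", "-s", "user", "code-index", json.dumps(server)]

# Printed rather than executed so the sketch has no side effects.
print(shlex.join(argv))
```

Because `argv` is a list, no shell ever re-parses the JSON, so quotes inside it need no escaping.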
Alternative: permanent install (no uvx)
```bash
pip install mcp-code-index
claude mcp add -s user code-index -- code-index-mcp
```
Optional: keep the index live as you edit
Without a hook, the index drifts when files change outside the agent (mv,
git checkout, IDE saves) until you call init again. With one, every
Edit / Write / MultiEdit Claude performs triggers an incremental reindex
of the touched file.
Easiest path: ask Claude. On first use in a new project, ask "set up the
code-index" — Claude calls setup_check → install_hook → init. The hook
command is derived from how the MCP server was launched (uvx-aware), so it
uses the same Python toolchain. Hook output goes to .claude/code-index-hook.log
so failures are debuggable.
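What the hook has to do per call is small: read the tool event, pick out the edited file, and reindex it. A minimal sketch, assuming Claude Code's hook input shape (a JSON object on stdin with `tool_name` and `tool_input.file_path`); `touched_file` is hypothetical and is not the project's `hook.py`.

```python
import json
from typing import Optional

# Tools the hook is registered for (the "matcher" in the hook config).
WATCHED_TOOLS = {"Edit", "Write", "MultiEdit"}


def touched_file(payload_text: str) -> Optional[str]:
    """Return the edited file's path from a PostToolUse event, or None to skip."""
    try:
        payload = json.loads(payload_text)
    except json.JSONDecodeError:
        return None  # malformed input: skip rather than crash the hook
    if not isinstance(payload, dict) or payload.get("tool_name") not in WATCHED_TOOLS:
        return None
    tool_input = payload.get("tool_input")
    path = tool_input.get("file_path") if isinstance(tool_input, dict) else None
    return path if isinstance(path, str) and path else None


event = '{"tool_name": "Edit", "tool_input": {"file_path": "src/auth.py"}}'
print(touched_file(event))  # src/auth.py
```

A hook entry point would call `touched_file(sys.stdin.read())`, reindex only that one file, and exit quietly when the result is `None`.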
Manual install — add this block to the project's .claude/settings.json
under hooks.PostToolUse (the version you want depends on how you launch the
server — install_hook derives the right one for you):
```json
{
  "matcher": "Edit|Write|MultiEdit",
  "hooks": [
    {
      "type": "command",
      "command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' --from mcp-code-index code-index-hook"
    }
  ]
}
```
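`install_hook` does this merge for you. If you edit `.claude/settings.json` from a script instead, the merge should be idempotent so re-running it doesn't stack duplicate hooks. A sketch with a hypothetical `add_post_tool_use_hook`; treating "same command string" as "already installed" is an assumption.

```python
import copy
import json

# Command from the manual-install block above.
HOOK_COMMAND = (
    "uvx --with 'sentence-transformers<5' --with 'numpy<2' "
    "--from mcp-code-index code-index-hook"
)


def add_post_tool_use_hook(settings: dict, command: str = HOOK_COMMAND) -> dict:
    """Return a copy of `settings` with the reindex hook present exactly once."""
    merged = copy.deepcopy(settings)  # never mutate the caller's dict
    entries = merged.setdefault("hooks", {}).setdefault("PostToolUse", [])
    for entry in entries:
        if any(hook.get("command") == command for hook in entry.get("hooks", [])):
            return merged  # already wired: no-op
    entries.append({
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [{"type": "command", "command": command}],
    })
    return merged


once = add_post_tool_use_hook({"permissions": {}})
twice = add_post_tool_use_hook(once)
print(json.dumps(twice, indent=2))
```

Reading the file with `json.loads(Path(".claude/settings.json").read_text())` and writing the result back with `json.dumps(..., indent=2)` completes the round trip while leaving unrelated keys untouched.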
In other MCP-compatible agents
The server speaks standard MCP over stdio, so any client that supports MCP
servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to
launch `uvx --refresh --from mcp-code-index code-index-mcp` (or
`code-index-mcp` after `pip install mcp-code-index`). Once connected, call the
`init` tool from inside the client to bootstrap the index. Drop `--refresh`
when you want to pin to a stable version instead of always pulling the latest.
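Under the hood, a client invokes a tool with a JSON-RPC `tools/call` request written as one line on the server's stdin. A sketch of that wire shape for the `init` tool; the client normally builds it for you after the MCP `initialize` handshake (which this snippet does not perform), and `tools_call` is a hypothetical helper.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-delimited stdio message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact json.dumps output has no embedded newlines, as the stdio transport requires.
    return json.dumps(message) + "\n"


print(tools_call(2, "init", {"force": False}), end="")
```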
From source (development)
```bash
git clone https://github.com/achreftlili/code-index
cd code-index
pip install -e .
code-index init   # CLI alternative to the `init` MCP tool
code-index-mcp    # starts the MCP server on stdio (for manual wiring)
```
Configuration
All settings are optional — the defaults work out of the box. Override them via
environment variables. Inside Claude Code, set them in the env block of your
code-index server entry in ~/.claude.json (then reconnect the MCP server).
Common knobs (most users only ever touch these):
| Var | Default | When to set it |
|---|---|---|
| `CODE_INDEX_EMBED_DEVICE` | `auto` | Force the torch device: `cpu`, `mps`, or `cuda`. Set `cpu` on Apple Silicon if `init` fails with MPS out-of-memory. |
| `CODE_INDEX_EMBED_BATCH` | `32` | Encode batch size. Lower it (e.g. `8` or `4`) to cut peak GPU memory while staying on `mps`/`cuda`. |
| `CODE_INDEX_DB` | `.claude/index.db` | Override the SQLite index path (e.g. to share an index across sibling worktrees). |
Advanced (rarely needed):
| Var | Default | Notes |
|---|---|---|
| `CODE_INDEX_EMBEDDER` | `jina` | Only `jina` (local sentence-transformers) is supported today; the variable exists for future expansion. |
| `CODE_INDEX_EMBED_MODEL` | `jinaai/jina-embeddings-v2-base-code` | Hugging Face model id. Only override if you know the model is dimension-compatible (768-d). |
| `CODE_INDEX_EMBED_DIM` | `768` | Must match the embedding model's output dimension. |
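A sketch of how these variables might be read and validated, using the defaults listed in the tables above. `IndexConfig` and `load_config` are hypothetical; code-index's own parsing and error messages may differ, and `CODE_INDEX_EMBEDDER` is left out since `jina` is its only supported value.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    device: str
    batch: int
    db_path: str
    model: str
    dim: int


def load_config(env: Mapping[str, str] = os.environ) -> IndexConfig:
    """Read CODE_INDEX_* variables, falling back to the documented defaults."""
    device = env.get("CODE_INDEX_EMBED_DEVICE", "auto")
    if device not in {"auto", "cpu", "mps", "cuda"}:
        raise ValueError(f"unsupported CODE_INDEX_EMBED_DEVICE: {device!r}")
    batch = int(env.get("CODE_INDEX_EMBED_BATCH", "32"))
    if batch < 1:
        raise ValueError("CODE_INDEX_EMBED_BATCH must be a positive integer")
    return IndexConfig(
        device=device,
        batch=batch,
        db_path=env.get("CODE_INDEX_DB", ".claude/index.db"),
        model=env.get("CODE_INDEX_EMBED_MODEL", "jinaai/jina-embeddings-v2-base-code"),
        dim=int(env.get("CODE_INDEX_EMBED_DIM", "768")),
    )


print(load_config({"CODE_INDEX_EMBED_DEVICE": "cpu", "CODE_INDEX_EMBED_BATCH": "8"}))
```

Passing a plain dict as `env` makes overrides easy to try without touching the real environment.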
Troubleshooting
init fails with MPS backend out of memory on Apple Silicon. A large
file produced a chunk batch bigger than your GPU's free VRAM. Quickest fix —
re-run on CPU (slower but bulletproof):
```json
"env": {
  "CODE_INDEX_EMBED_DEVICE": "cpu"
}
```
To stay on the GPU, shrink the batch instead: "CODE_INDEX_EMBED_BATCH": "8".
Reconnect the MCP server (/mcp → reconnect, or restart Claude Code) so the
new env takes effect. init is incremental — already-embedded files are
skipped on the retry.
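The `auto` device setting has to resolve to one concrete backend. A sketch of one plausible policy (prefer CUDA, then MPS, then CPU); that order is an assumption rather than documented code-index behavior, and the availability flags are parameters so it runs without torch.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Turn a CODE_INDEX_EMBED_DEVICE value into a concrete torch device name.

    In real code the flags would come from torch.cuda.is_available() and
    torch.backends.mps.is_available(); they are parameters here so the
    sketch runs without torch. The cuda > mps > cpu order is an assumption.
    """
    requested = requested.strip().lower()
    if requested != "auto":
        return requested  # an explicit choice always wins (e.g. "cpu" on Apple Silicon)
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# An Apple Silicon laptop: auto picks MPS, the override pins the CPU.
print(resolve_device("auto", cuda_available=False, mps_available=True))  # mps
print(resolve_device("cpu", cuda_available=False, mps_available=True))   # cpu
```

The resolved name and `CODE_INDEX_EMBED_BATCH` would then feed sentence-transformers as `SentenceTransformer(model_id, device=...)` and `model.encode(texts, batch_size=...)`. A smaller batch keeps fewer chunks on the GPU at once, which is why lowering it can avoid the OOM without leaving MPS.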
init fails with a Hugging Face network error on first run. Your network
is blocking model downloads. Pre-warm the cache on a machine that has access:
```bash
huggingface-cli download jinaai/jina-embeddings-v2-base-code
# then copy ~/.cache/huggingface/ to the offline machine
```
sqlite3.OperationalError: not authorized or sqlite-vec fails to load.
Your Python build doesn't have loadable SQLite extensions. See
Requirements — install via python.org or a pyenv build with
PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions.
code_search / symbol_lookup returns stale paths after a refactor or
branch checkout. The auto-reindex hook only fires on Claude's Edit /
Write / MultiEdit, so after bulk file moves outside the agent (mv,
git checkout, IDE rename), re-run init (it's incremental). If you haven't
yet, also wire up the hook so the index keeps up with agent edits
automatically.
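Why re-running `init` fixes stale paths: an incremental pass only needs the files whose content changed plus the indexed paths that no longer exist. A sketch assuming per-file SHA-256 digests saved at index time; `plan_reindex` is hypothetical, and how code-index actually tracks file state isn't documented here.

```python
import hashlib
from pathlib import Path

# Extensions for the supported languages; an assumption for this sketch.
SOURCE_SUFFIXES = {".py", ".ts", ".tsx", ".js", ".jsx", ".go", ".rs"}


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_reindex(root: Path, stored: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare files on disk with digests saved at index time.

    Returns (changed, removed): paths to re-parse and re-embed, and indexed
    paths that no longer exist and should be dropped from the index.
    A real indexer would also skip .git, node_modules, build output, etc.
    """
    current = {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file() and path.suffix in SOURCE_SUFFIXES
    }
    changed = sorted(p for p, digest in current.items() if stored.get(p) != digest)
    removed = sorted(set(stored) - set(current))
    return changed, removed
```

After `mv a.py b.py`, `a.py` lands in `removed` (its symbols get dropped) and `b.py` in `changed` (it gets re-parsed), which is the drift the hook never sees.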
Layout
```
src/code_index/
  db.py           SQLite schema, connection, sqlite-vec loading
  parser.py       Tree-sitter wrapper, symbol + edge extraction
  imports.py      Per-language import target → file path resolution
  chunker.py      Per-symbol chunks, identifier expansion
  embedder.py     Local Jina (sentence-transformers) backend
  indexer.py      Pipeline: walk → parse → chunk → embed → write
  reindexer.py    Per-root engine cache; one entry point for "reindex one file"
  retriever.py    Hybrid search (vector + FTS5) with RRF
  watcher.py      File watcher (watchdog)
  admin.py        setup_check / install_hook / init logic (pure, no MCP state)
  mcp_server.py   11 MCP tools (8 retrieval + init / setup_check / install_hook)
  hook.py         `code-index-hook` console script, the PostToolUse entry point
  cli.py          init / reindex / watch / stats
```
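`retriever.py` fuses vector and FTS5 results with reciprocal rank fusion (RRF): each result earns `1 / (k + rank)` from every list it appears in, and the sums set the final order. A sketch of the standard formula; `k = 60` is the constant commonly used with RRF, the value code-index uses isn't stated here, and the sample result ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse best-first result lists: score(doc) = sum of 1 / (k + rank) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["auth.py:login", "auth.py:parse_token", "jwt.py:decode"]
fts_hits = ["auth.py:parse_token", "rate.py:limit", "auth.py:login"]
for doc_id, score in reciprocal_rank_fusion([vector_hits, fts_hits]):
    print(f"{score:.5f}  {doc_id}")
```

RRF only uses ranks, so it needs no calibration between cosine similarities and BM25 scores; a chunk that ranks well in both lists beats one that tops only one.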
File details
Details for the file mcp_code_index-0.4.0.tar.gz.
File metadata
- Download URL: mcp_code_index-0.4.0.tar.gz
- Upload date:
- Size: 45.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f3390a96b6eeb83a803686f2e7bc51fabdaace3d6fb424e8cd2de4b242b01e68 |
| MD5 | c61887ea1a6493b0c00caf3e9aadefb5 |
| BLAKE2b-256 | 684eed9ee61b8c2aaa4d4413e01adcfc8b4a15785b938e50002cb8f1d752c85c |
File details
Details for the file mcp_code_index-0.4.0-py3-none-any.whl.
File metadata
- Download URL: mcp_code_index-0.4.0-py3-none-any.whl
- Upload date:
- Size: 44.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e380340c9a177bf4d1e9cb209c44f85b45d2d0fa0f41f8cc9ecfbcde97c8df63 |
| MD5 | c59eafed20f35935e1dfc2ed81152aaa |
| BLAKE2b-256 | 3a0a651c45c0d08ae2bc1e3d9980e52f3e6670fd40376b5bec0e42f134984893 |