BM25 keyword search and graph walk for code intelligence
Project description
CodeMesh
BM25 keyword search + graph walk for code intelligence.
CodeMesh builds a local semantic knowledge graph of codebases — symbol relationships, call graphs, and code structure — so AI coding agents can query the graph instantly instead of scanning files with grep and glob.
100% local. No API keys. No external services. SQLite only.
Why CodeMesh?
The problem: AI coding agents waste tokens and time scanning files with grep and glob. On every question about code, they read entire files into context — even when the answer is in one function.
The solution: CodeMesh parses your codebase into a structured knowledge graph at index time. At query time, agents get concise, relevant context — not raw file dumps.
- 86% fewer tokens per query on average (measured across 9 real-world repos)
- 66% faster agent loops — 2 MCP calls vs 4+ grep/read cycles
- <0.2s query latency on codebases up to 50K nodes; <0.3s on 300K+ nodes
- Zero configuration — no API keys, no cloud services, no model downloads
Get Started
Install
Option 1: uv tool install (recommended)
uv tool install codemesh
Option 2: pip
pip install codemesh
Option 3: from source
git clone https://github.com/gkatte/codemesh.git
cd codemesh
pip install -e .
Upgrade:
uv tool install codemesh --force
Verify installation:
codemesh --help
Step 1: Initialize a Project
cd your-project
codemesh init -i
This creates a .codemesh/ directory and writes agent instruction files:
CLAUDE.md— instructions for Claude Code.cursor/rules/codemesh.mdc— instructions for CursorAGENTS.md— instructions for Codex CLI / opencode
Step 2: Build the Index
codemesh index
Parses all source files with tree-sitter, extracts symbols and relationships, and stores them in .codemesh/index.db with FTS5 full-text search.
Step 3: Configure Your Agent
codemesh install --yes
Auto-detects installed agents (Claude Code, Cursor, Codex CLI) and writes MCP server configuration + permissions to the appropriate config files:
- Claude Code:
~/.claude/claude.json+~/.claude/settings.json - Cursor:
.cursor/mcp.json(project-local) - Codex CLI:
~/.codex/config.json
Restart your agent for the MCP server to load.
That's It
When a .codemesh/ directory exists in a project, your agent uses CodeMesh MCP tools automatically for code exploration instead of grepping through files.
Using CodeMesh with Claude Code
Once codemesh install --yes has been run and Claude Code is restarted, the MCP server loads automatically.
In the main session, use lightweight tools for targeted lookups:
| Tool | Use For |
|---|---|
codemesh_search |
Find symbols by name |
codemesh_callers / codemesh_callees |
Trace call flow |
codemesh_impact |
Check what's affected before editing |
codemesh_node |
Get a single symbol's details |
For exploration questions ("how does X work?", "explain the Y system"), spawn an Explore agent with codemesh_explore as the primary tool. This returns full source code sections from all relevant files in one call.
If .codemesh/ does NOT exist in a project, CodeMesh will ask the user if they'd like to initialize it.
CLI Reference
codemesh init [path] # Initialize in a project (--index to also index)
codemesh install # Configure MCP server for your agents (--yes for non-interactive)
codemesh index [path] # Build the knowledge graph index (--force to re-index)
codemesh sync [path] # Watch for file changes and auto-sync (--debounce 1.0)
codemesh status [path] # Show index statistics
codemesh query <search> # Search symbols (--kind, --limit, --format)
codemesh callers <symbol> # Find what calls a function/method (--limit)
codemesh callees <symbol> # Find what a function/method calls (--limit)
codemesh impact <symbol> # Analyze what's affected by changing a symbol (--depth)
codemesh context <task> # Build context for a task (--max-nodes, --tokens)
codemesh files [path] # Show indexed file structure
codemesh serve --transport stdio # Start MCP server (--transport sse --port 3000)
codemesh graph [path] # Open interactive graph visualization (--json export)
MCP Tools
When running as an MCP server (codemesh serve --transport stdio), CodeMesh exposes 10 tools:
| Tool | Purpose |
|---|---|
codemesh_search |
Find symbols by name across the codebase |
codemesh_context |
Build relevant code context for a task or symbol |
codemesh_explore |
Return source for related symbols grouped by file, plus a relationship map |
codemesh_callers |
Find what calls a function/method |
codemesh_callees |
Find what a function/method calls |
codemesh_impact |
Analyze what code is affected by changing a symbol |
codemesh_node |
Get details about a specific symbol (optionally with source code) |
codemesh_status |
Check index health and statistics |
codemesh_files |
Get indexed file structure (faster than filesystem scanning) |
codemesh_graph |
Get the knowledge graph as JSON |
Benchmark Results
Measured locally on M-series Mac. 5 queries per repo. Each cell shows average latency.
Indexing + Query Performance
| Codebase | Language | Files | Nodes | Edges | Index Time | Avg Query |
|---|---|---|---|---|---|---|
| Excalidraw | TypeScript | 628 | 9,678 | 42,644 | 3.3s | 148.7ms |
| Tokio | Rust | 778 | 14,474 | 45,210 | 2.9s | 133.8ms |
| Gin | Go | 99 | 1,748 | 7,846 | 0.5s | 91.8ms |
| OkHttp | Java/Kotlin | 640 | 2,070 | 2,808 | 0.8s | 104.3ms |
| Alamofire | Swift | 108 | 3,705 | 3,820 | 0.6s | 92.5ms |
| libuv | C | 336 | 6,827 | 24,132 | 1.3s | 136.9ms |
| nlohmann/json | C++ | 491 | 6,377 | 18,780 | 2.2s | 139.0ms |
| Django | Python | 3,020 | 53,155 | 472,322 | 28.5s | 188.0ms |
| VS Code | TypeScript | 10,422 | 299,902 | 1,359,313 | 177.0s | 572.1ms |
Indexing scales linearly with codebase size: from 0.5s for ~100 files (Gin) to 177s for 10k+ files (VS Code at 1.3M edges). Query latency stays sub-second even on the largest repos.
Agent Efficiency
Measured across all 9 repos. For each query, we model the full agent loop — including model inference, tool execution, and token consumption — comparing an agent using CodeMesh MCP tools against one using only grep + read_file.
Average: 85% cheaper · 86% fewer tokens · 66% faster · 50% fewer tool calls
| Codebase | Cost Savings | Token Savings | Time Savings | Tool Call Savings |
|---|---|---|---|---|
| nlohmann/json | 98.6% | 98.9% | 93.3% | 50% |
| Alamofire | 96.0% | 96.8% | 85.1% | 50% |
| VS Code | 90.9% | 92.3% | 14.8% | 50% |
| Gin | 89.9% | 91.9% | 70.6% | 50% |
| Django | 89.3% | 90.3% | 72.7% | 50% |
| Tokio | 78.0% | 80.6% | 62.4% | 50% |
| OkHttp | 76.4% | 79.4% | 65.0% | 50% |
| Excalidraw | 72.8% | 72.6% | 61.5% | 50% |
| libuv | 71.0% | 71.1% | 69.3% | 50% |
The savings come from two sources: (1) CodeMesh returns compact structured results (hundreds of tokens) instead of full source files (thousands of tokens per file), and (2) fewer agent turns are needed — 2 MCP calls vs 4+ grep/read cycles. On large codebases like nlohmann/json and Django, the baseline agent reads hundreds of thousands of tokens per query while CodeMesh answers from a few thousand.
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ Claude Code │
│ │
│ "Implement user authentication" │
│ │ │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Explore Agent │ ──── │ Explore Agent │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
└───────────┼────────────────────────┼────────────────────────────┘
│ │
▼ ▼
┌───────────────────────────────────────────────────────────────────┐
│ CodeMesh MCP Server │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Search │ │ Callers │ │ Context │ │
│ │ "auth" │ │ "login()" │ │ for task │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ SQLite Graph DB │ │
│ │ • symbols │ │
│ │ • call edges │ │
│ │ • FTS5 BM25 search │ │
│ └───────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
-
Extraction — tree-sitter parses source code into ASTs. Language-specific queries extract nodes (functions, classes, methods) and edges (calls, imports, extends, implements).
-
Storage — Everything goes into a local SQLite database (
.codemesh/index.db) with FTS5 full-text search and BM25 ranking. -
Resolution — After extraction, references are resolved: function calls → definitions, imports → source files, class inheritance, and framework-specific patterns.
-
Auto-Sync — The file watcher uses native OS events (FSEvents/inotify) with debounced auto-sync. The graph stays fresh as you code.
Architecture
Source Code
│
└──── Tree-sitter AST Parser ──▶ Knowledge Graph (SQLite)
│
├──── FTS5 (BM25, weighted columns)
└──── Graph Edges (contains/calls/imports/extends)
User Query
│
▼
BM25 Keyword Search (3-tier)
│
├──── Tier 1: FTS5 prefix match (bm25 weights: name=20, qualified_name=5, docstring=1, signature=2)
├──── Tier 2: LIKE substring fallback (camelCase matching)
└──── Tier 3: Fuzzy edit-distance (Levenshtein ≤ 2)
│
▼
Post-hoc Scoring: kind_bonus + name_match_bonus
│
▼
Graph Walk Expansion (BFS depth=2)
│
▼
Context Builder (token-budget-aware XML output)
Supported Languages
TypeScript · JavaScript · Python · Rust · Go · Java · Kotlin · Swift · C · C++
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -x -q
# Lint
ruff check . --fix && ruff format .
# Type check
mypy codemesh/
License
MIT
Made for AI coding agents — Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, and Kiro
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file codemesh-0.1.14.tar.gz.
File metadata
- Download URL: codemesh-0.1.14.tar.gz
- Upload date:
- Size: 277.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
37256aa03f1972de3d783d73b8b2f6d4d368cc8a6d513f1160ba9f9c745a420a
|
|
| MD5 |
0d8415c56d855d761962ece6941cf02b
|
|
| BLAKE2b-256 |
989f6b34df73a5326ca6f6d7ba3a2e23250f170061f4f4f79b66366f554d007f
|
Provenance
The following attestation bundles were made for codemesh-0.1.14.tar.gz:
Publisher:
publish.yml on gkatte/codemesh
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
codemesh-0.1.14.tar.gz -
Subject digest:
37256aa03f1972de3d783d73b8b2f6d4d368cc8a6d513f1160ba9f9c745a420a - Sigstore transparency entry: 1696683118
- Sigstore integration time:
-
Permalink:
gkatte/codemesh@2d16504037f3ca47a2765e4fb8fe44ea78318718 -
Branch / Tag:
refs/tags/v0.1.14 - Owner: https://github.com/gkatte
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@2d16504037f3ca47a2765e4fb8fe44ea78318718 -
Trigger Event:
release
-
Statement type:
File details
Details for the file codemesh-0.1.14-py3-none-any.whl.
File metadata
- Download URL: codemesh-0.1.14-py3-none-any.whl
- Upload date:
- Size: 210.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e4d48447aaa0d01aa1378f1c1be2f19556cc794f5f83c941b3591bf58ef8fced
|
|
| MD5 |
c8738026509e6f956a3901781a979fd3
|
|
| BLAKE2b-256 |
ff3f696fad7e2805420d4e4685790a06d87ba14db92055485f79a5abaff29269
|
Provenance
The following attestation bundles were made for codemesh-0.1.14-py3-none-any.whl:
Publisher:
publish.yml on gkatte/codemesh
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
codemesh-0.1.14-py3-none-any.whl -
Subject digest:
e4d48447aaa0d01aa1378f1c1be2f19556cc794f5f83c941b3591bf58ef8fced - Sigstore transparency entry: 1696683264
- Sigstore integration time:
-
Permalink:
gkatte/codemesh@2d16504037f3ca47a2765e4fb8fe44ea78318718 -
Branch / Tag:
refs/tags/v0.1.14 - Owner: https://github.com/gkatte
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@2d16504037f3ca47a2765e4fb8fe44ea78318718 -
Trigger Event:
release
-
Statement type: