CodeMesh


BM25 keyword search + graph walk for code intelligence.

CodeMesh builds a local semantic knowledge graph of codebases — symbol relationships, call graphs, and code structure — so AI coding agents can query the graph instantly instead of scanning files with grep and glob.

100% local. No API keys. No external services. SQLite only.


Why CodeMesh?

The problem: AI coding agents waste tokens and time scanning files with grep and glob. On every question about code, they read entire files into context — even when the answer is in one function.

The solution: CodeMesh parses your codebase into a structured knowledge graph at index time. At query time, agents get concise, relevant context — not raw file dumps.

  • 86% fewer tokens per query on average (measured across 9 real-world repos)
  • 66% faster agent loops — 2 MCP calls vs 4+ grep/read cycles
  • Query latency under 0.2s on codebases up to ~50K nodes, and under 0.6s on ~300K nodes (see Benchmark Results)
  • Zero configuration — no API keys, no cloud services, no model downloads

Get Started

Install

Option 1: uv tool install (recommended)

uv tool install codemesh

Option 2: pip

pip install codemesh

Option 3: from source

git clone https://github.com/gkatte/codemesh.git
cd codemesh
pip install -e .

Upgrade:

uv tool install codemesh --force

Verify installation:

codemesh --help

Step 1: Initialize a Project

cd your-project
codemesh init -i

This creates a .codemesh/ directory and writes agent instruction files:

  • CLAUDE.md — instructions for Claude Code
  • .cursor/rules/codemesh.mdc — instructions for Cursor
  • AGENTS.md — instructions for Codex CLI / opencode

Step 2: Build the Index

codemesh index

Parses all source files with tree-sitter, extracts symbols and relationships, and stores them in .codemesh/index.db with FTS5 full-text search.
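
To make the storage step concrete, here is a minimal sketch of an FTS5 table queried with weighted BM25, using Python's built-in sqlite3 module. The table name, column layout, and sample rows are assumptions for illustration, not CodeMesh's actual schema; the column weights are the ones listed under Architecture. It requires an SQLite build with FTS5 enabled.

```python
import sqlite3

# Hypothetical schema: the README names FTS5 but does not show the real table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE symbols "
    "USING fts5(name, qualified_name, docstring, signature)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [
        ("login", "auth.views.login", "Log a user in.", "def login(request)"),
        ("logout", "auth.views.logout", "Log a user out.", "def logout(request)"),
        ("render", "ui.render", "Render the login page.", "def render(template)"),
        ("parse", "config.parse", "Parse a config file.", "def parse(path)"),
        ("connect", "db.connect", "Open a database handle.", "def connect(url)"),
    ],
)


def search(term: str, limit: int = 10) -> list[str]:
    """Prefix-match `term`, ranking hits with per-column BM25 weights.

    bm25() returns lower (more negative) scores for better matches,
    so ascending order puts the best hit first.
    """
    rows = conn.execute(
        "SELECT name FROM symbols WHERE symbols MATCH ? "
        "ORDER BY bm25(symbols, 20.0, 5.0, 1.0, 2.0) LIMIT ?",
        (f'"{term}"*', limit),
    ).fetchall()
    return [row[0] for row in rows]


results = search("login")
print(results)
```

Because the name column carries the largest weight, a symbol actually called login should outrank one that only mentions login in its docstring.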

Step 3: Configure Your Agent

codemesh install --yes

Auto-detects installed agents (Claude Code, Cursor, Codex CLI) and writes MCP server configuration + permissions to the appropriate config files:

  • Claude Code: ~/.claude/claude.json + ~/.claude/settings.json
  • Cursor: .cursor/mcp.json (project-local)
  • Codex CLI: ~/.codex/config.json

Restart your agent for the MCP server to load.
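
For readers who prefer to see what such a configuration step involves, the sketch below merges one server entry into an mcp.json-style file. It assumes the common mcpServers layout used by Cursor; the server name "codemesh" and the helper add_mcp_server are hypothetical, and the exact entry that codemesh install writes is not shown in this README. The demo writes to a temporary directory, not a real project.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Merge one server entry into an mcp.json-style file, keeping other entries."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Throwaway directory standing in for a project root.
project = Path(tempfile.mkdtemp())
config_file = project / ".cursor" / "mcp.json"

result = add_mcp_server(
    config_file,
    name="codemesh",  # assumed server name
    command="codemesh",
    args=["serve", "--transport", "stdio"],  # command taken from the CLI reference
)
print(json.dumps(result, indent=2))
```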

That's It

When a .codemesh/ directory exists in a project, your agent uses CodeMesh MCP tools automatically for code exploration instead of grepping through files.


Using CodeMesh with Claude Code

Once codemesh install --yes has been run and Claude Code is restarted, the MCP server loads automatically.

In the main session, use lightweight tools for targeted lookups:

Tool                                   Use For
codemesh_search                        Find symbols by name
codemesh_callers / codemesh_callees    Trace call flow
codemesh_impact                        Check what's affected before editing
codemesh_node                          Get a single symbol's details

For exploration questions ("how does X work?", "explain the Y system"), spawn an Explore agent with codemesh_explore as the primary tool. This returns full source code sections from all relevant files in one call.

If .codemesh/ does NOT exist in a project, CodeMesh will ask the user if they'd like to initialize it.


CLI Reference

codemesh init [path]              # Initialize in a project (--index to also index)
codemesh install                  # Configure MCP server for your agents (--yes for non-interactive)
codemesh index [path]             # Build the knowledge graph index (--force to re-index)
codemesh sync [path]              # Watch for file changes and auto-sync (--debounce 1.0)
codemesh status [path]            # Show index statistics
codemesh query <search>           # Search symbols (--kind, --limit, --format)
codemesh callers <symbol>         # Find what calls a function/method (--limit)
codemesh callees <symbol>         # Find what a function/method calls (--limit)
codemesh impact <symbol>          # Analyze what's affected by changing a symbol (--depth)
codemesh context <task>           # Build context for a task (--max-nodes, --tokens)
codemesh files [path]             # Show indexed file structure
codemesh serve --transport stdio  # Start MCP server (--transport sse --port 3000)
codemesh graph [path]             # Open interactive graph visualization (--json export)
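
The --debounce option on codemesh sync suggests that bursts of file events are batched before a re-index. A minimal sketch of that idea follows. The Debouncer class and its method names are hypothetical, and the clock is passed in explicitly so the example is deterministic; it is not CodeMesh's implementation.

```python
from typing import Optional


class Debouncer:
    """Collect changed paths and release them after `delay` seconds of quiet."""

    def __init__(self, delay: float = 1.0) -> None:
        self.delay = delay
        self.pending: set[str] = set()
        self.last_event: Optional[float] = None

    def record(self, path: str, now: float) -> None:
        """Note a change; every new event restarts the quiet period."""
        self.pending.add(path)
        self.last_event = now

    def flush_if_quiet(self, now: float) -> list[str]:
        """Return the batched paths once no event arrived for `delay` seconds."""
        if self.last_event is None or now - self.last_event < self.delay:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_event = None
        return batch


debouncer = Debouncer(delay=1.0)
debouncer.record("a.py", now=0.0)
debouncer.record("b.py", now=0.4)
debouncer.record("a.py", now=0.9)  # duplicate path, collapsed into one entry

early = debouncer.flush_if_quiet(now=1.5)  # only 0.6s since the last event
late = debouncer.flush_if_quiet(now=2.0)   # 1.1s of quiet, batch is released
print(early, late)
```

Three saves in quick succession produce a single batch of two files, which is the point of debouncing: one sync per burst of edits.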

MCP Tools

When running as an MCP server (codemesh serve --transport stdio), CodeMesh exposes 10 tools:

Tool                Purpose
codemesh_search     Find symbols by name across the codebase
codemesh_context    Build relevant code context for a task or symbol
codemesh_explore    Return source for related symbols grouped by file, plus a relationship map
codemesh_callers    Find what calls a function/method
codemesh_callees    Find what a function/method calls
codemesh_impact     Analyze what code is affected by changing a symbol
codemesh_node       Get details about a specific symbol (optionally with source code)
codemesh_status     Check index health and statistics
codemesh_files      Get indexed file structure (faster than filesystem scanning)
codemesh_graph      Get the knowledge graph as JSON
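
Conceptually, the callers, callees, and impact tools are walks over call edges. The sketch below shows one way to express them over an in-memory edge list. The edge data and function names are made up for illustration, and impact is modeled here as a depth-limited breadth-first walk up the caller edges, which is an assumption about how such an analysis could work.

```python
from collections import deque

# Hypothetical call edges as (caller, callee) pairs.
CALLS = [
    ("cli_main", "handle_request"),
    ("handle_request", "login"),
    ("login", "check_password"),
    ("login", "create_session"),
    ("reset_password", "check_password"),
]


def callers(symbol: str) -> list[str]:
    """Symbols that call `symbol` directly."""
    return sorted(src for src, dst in CALLS if dst == symbol)


def callees(symbol: str) -> list[str]:
    """Symbols that `symbol` calls directly."""
    return sorted(dst for src, dst in CALLS if src == symbol)


def impact(symbol: str, depth: int = 2) -> dict[str, int]:
    """Breadth-first walk up the caller edges, at most `depth` hops.

    Returns each affected symbol mapped to its distance from `symbol`.
    """
    distance = {symbol: 0}
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        if distance[current] == depth:
            continue  # do not expand past the depth limit
        for caller in callers(current):
            if caller not in distance:
                distance[caller] = distance[current] + 1
                queue.append(caller)
    del distance[symbol]
    return distance


affected = impact("check_password", depth=2)
print(callers("check_password"), callees("login"), affected)
```

With depth=2, changing check_password reaches login and reset_password at one hop and handle_request at two; cli_main is three hops away and is left out.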

Benchmark Results

Measured locally on an M-series Mac, with 5 queries per repo. Avg Query is the mean latency over those queries.

Indexing + Query Performance

Codebase       Language     Files   Nodes    Edges      Index Time  Avg Query
Excalidraw     TypeScript   628     9,678    42,644     3.3s        148.7ms
Tokio          Rust         778     14,474   45,210     2.9s        133.8ms
Gin            Go           99      1,748    7,846      0.5s        91.8ms
OkHttp         Java/Kotlin  640     2,070    2,808      0.8s        104.3ms
Alamofire      Swift        108     3,705    3,820      0.6s        92.5ms
libuv          C            336     6,827    24,132     1.3s        136.9ms
nlohmann/json  C++          491     6,377    18,780     2.2s        139.0ms
Django         Python       3,020   53,155   472,322    28.5s       188.0ms
VS Code        TypeScript   10,422  299,902  1,359,313  177.0s      572.1ms

Index time grows roughly in proportion to codebase size: from 0.5s for ~100 files (Gin) to 177s for 10K+ files (VS Code, with 1.3M edges). Query latency stays under one second even on the largest repo.
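
One way to sanity-check the scaling claim is to divide index time by node count for a few rows of the table above. The snippet below does that for four of the nine repos; the figures are copied from the table, and the per-node metric is just one possible yardstick.

```python
# (files, nodes, index_seconds) copied from the benchmark table.
BENCH = {
    "Gin": (99, 1_748, 0.5),
    "Excalidraw": (628, 9_678, 3.3),
    "Django": (3_020, 53_155, 28.5),
    "VS Code": (10_422, 299_902, 177.0),
}

# Index cost per node in milliseconds; a flat value would mean perfectly linear scaling.
ms_per_node = {
    name: seconds / nodes * 1000 for name, (_files, nodes, seconds) in BENCH.items()
}

for name, cost in ms_per_node.items():
    print(f"{name:<12} {cost:.2f} ms per node")

spread = max(ms_per_node.values()) / min(ms_per_node.values())
print(f"largest / smallest per-node cost: {spread:.1f}x")
```

The per-node cost about doubles between the smallest and largest repo while the node count grows by a factor of roughly 170, so the growth is close to linear but not exactly so.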

Agent Efficiency

Measured across all 9 repos. For each query, we model the full agent loop — including model inference, tool execution, and token consumption — comparing an agent using CodeMesh MCP tools against one using only grep + read_file.

Average: 85% cheaper · 86% fewer tokens · 66% faster · 50% fewer tool calls

Codebase       Cost Savings  Token Savings  Time Savings  Tool Call Savings
nlohmann/json  98.6%         98.9%          93.3%         50%
Alamofire      96.0%         96.8%          85.1%         50%
VS Code        90.9%         92.3%          14.8%         50%
Gin            89.9%         91.9%          70.6%         50%
Django         89.3%         90.3%          72.7%         50%
Tokio          78.0%         80.6%          62.4%         50%
OkHttp         76.4%         79.4%          65.0%         50%
Excalidraw     72.8%         72.6%          61.5%         50%
libuv          71.0%         71.1%          69.3%         50%
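
The headline averages can be reproduced from the per-repo rows. The snippet below takes the unweighted mean of each column, which is an assumption about how the averages were computed; the figures are copied from the table.

```python
# (cost, tokens, time) savings in percent, copied from the table above.
SAVINGS = {
    "nlohmann/json": (98.6, 98.9, 93.3),
    "Alamofire": (96.0, 96.8, 85.1),
    "VS Code": (90.9, 92.3, 14.8),
    "Gin": (89.9, 91.9, 70.6),
    "Django": (89.3, 90.3, 72.7),
    "Tokio": (78.0, 80.6, 62.4),
    "OkHttp": (76.4, 79.4, 65.0),
    "Excalidraw": (72.8, 72.6, 61.5),
    "libuv": (71.0, 71.1, 69.3),
}


def column_mean(index: int) -> float:
    """Unweighted mean of one savings column across all repos."""
    values = [row[index] for row in SAVINGS.values()]
    return sum(values) / len(values)


avg_cost, avg_tokens, avg_time = (column_mean(i) for i in range(3))
print(f"cost {avg_cost:.1f}%  tokens {avg_tokens:.1f}%  time {avg_time:.1f}%")
```

Rounded to whole percentages, the means match the 85%, 86%, and 66% figures quoted above. The time average is pulled down by the VS Code row, the only repo with time savings below 60%.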

The savings come from two sources: (1) CodeMesh returns compact structured results (hundreds of tokens) instead of full source files (thousands of tokens per file), and (2) fewer agent turns are needed: 2 MCP calls instead of 4+ grep/read cycles. On codebases like nlohmann/json and Django, the baseline agent reads hundreds of thousands of tokens per query while CodeMesh answers from a few thousand.


How It Works

┌─────────────────────────────────────────────────────────────────┐
│                        Claude Code                              │
│                                                                 │
│  "Implement user authentication"                                │
│           │                                                     │
│           ▼                                                     │
│  ┌─────────────────┐      ┌─────────────────┐                   │
│  │  Explore Agent  │ ──── │  Explore Agent  │                   │
│  └────────┬────────┘      └────────┬────────┘                   │
│           │                        │                            │
└───────────┼────────────────────────┼────────────────────────────┘
            │                        │
            ▼                        ▼
┌───────────────────────────────────────────────────────────────────┐
│                     CodeMesh MCP Server                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                │
│  │   Search    │  │   Callers   │  │   Context   │                │
│  │  "auth"     │  │  "login()"  │  │  for task   │                │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                │
│         │                │                │                       │
│         └────────────────┼────────────────┘                       │
│                          ▼                                        │
│              ┌───────────────────────┐                            │
│              │   SQLite Graph DB     │                            │
│              │   • symbols           │                            │
│              │   • call edges        │                            │
│              │   • FTS5 BM25 search  │                            │
│              └───────────────────────┘                            │
└───────────────────────────────────────────────────────────────────┘
  1. Extraction — tree-sitter parses source code into ASTs. Language-specific queries extract nodes (functions, classes, methods) and edges (calls, imports, extends, implements).

  2. Storage — Everything goes into a local SQLite database (.codemesh/index.db) with FTS5 full-text search and BM25 ranking.

  3. Resolution — After extraction, references are resolved: function calls → definitions, imports → source files, class inheritance, and framework-specific patterns.

  4. Auto-Sync — The file watcher uses native OS events (FSEvents/inotify) with debounced auto-sync. The graph stays fresh as you code.
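
The extraction and resolution steps above can be sketched in a few lines. This example swaps in Python's built-in ast module for tree-sitter so that it runs without extra dependencies, and it only handles plain function calls in a single Python file; the sample source and the extract helper are made up for illustration.

```python
import ast

SOURCE = '''
def check_password(user, password):
    return user.password == password

def login(user, password):
    if check_password(user, password):
        return create_session(user)
    print("login failed")

def create_session(user):
    return {"user": user}
'''


def extract(source: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (function names, resolved call edges) for one source file."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    nodes = sorted(f.name for f in functions)

    # Extraction: record every direct call by name inside each function body.
    raw_edges = set()
    for func in functions:
        for child in ast.walk(func):
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                raw_edges.add((func.name, child.func.id))

    # Resolution: keep only calls whose target is a definition we indexed,
    # which drops builtins such as print().
    known = set(nodes)
    edges = sorted(edge for edge in raw_edges if edge[1] in known)
    return nodes, edges


nodes, edges = extract(SOURCE)
print(nodes)
print(edges)
```

A real indexer would also need to handle methods, imports, nested scopes, and cross-file references, which is where the language-specific queries and the resolution pass come in.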


Architecture

Source Code
    │
    └──── Tree-sitter AST Parser ──▶ Knowledge Graph (SQLite)
                                        │
                                        ├──── FTS5 (BM25, weighted columns)
                                        └──── Graph Edges (contains/calls/imports/extends)

User Query
    │
    ▼
BM25 Keyword Search (3-tier)
    │
    ├──── Tier 1: FTS5 prefix match (bm25 weights: name=20, qualified_name=5, docstring=1, signature=2)
    ├──── Tier 2: LIKE substring fallback (camelCase matching)
    └──── Tier 3: Fuzzy edit-distance (Levenshtein ≤ 2)
    │
    ▼
Post-hoc Scoring: kind_bonus + name_match_bonus
    │
    ▼
Graph Walk Expansion (BFS depth=2)
    │
    ▼
Context Builder (token-budget-aware XML output)
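
The three search tiers in the pipeline above can be illustrated in pure Python. In this sketch, tier 1 is simplified to a case-insensitive prefix match standing in for the FTS5 query, and each later tier runs only when the earlier ones return nothing; that fallback order, the sample names, and the function names are assumptions, not CodeMesh's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + (char_a != char_b), # substitution
                )
            )
        previous = current
    return previous[-1]


# Hypothetical symbol names.
NAMES = ["getUserById", "parseConfig", "UserSession", "render"]


def tiered_search(query: str, names: list[str] = NAMES) -> tuple[int, list[str]]:
    """Return (tier that matched, matching names)."""
    q = query.lower()

    # Tier 1: prefix match (stand-in for the FTS5 prefix query).
    hits = [n for n in names if n.lower().startswith(q)]
    if hits:
        return 1, hits

    # Tier 2: substring match, which also finds words inside camelCase names.
    hits = [n for n in names if q in n.lower()]
    if hits:
        return 2, hits

    # Tier 3: fuzzy match within an edit distance of 2.
    return 3, [n for n in names if levenshtein(q, n.lower()) <= 2]


print(tiered_search("parse"))
print(tiered_search("config"))
print(tiered_search("rendr"))
```

Ordering the tiers from cheapest and most precise to most permissive keeps exact lookups fast while still tolerating a typo such as rendr.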

Supported Languages

TypeScript · JavaScript · Python · Rust · Go · Java · Kotlin · Swift · C · C++

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -x -q

# Lint
ruff check . --fix && ruff format .

# Type check
mypy codemesh/

License

MIT


Made for AI coding agents — Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, and Kiro
