CodeMesh

BM25 keyword search + graph walk for code intelligence.

CodeMesh builds a local semantic knowledge graph of codebases — symbol relationships, call graphs, and code structure — so AI coding agents can query the graph instantly instead of scanning files with grep and glob.

100% local. No API keys. No external services. SQLite only.


Why CodeMesh?

The problem: AI coding agents waste tokens and time scanning files with grep and glob. On every question about code, they read entire files into context — even when the answer is in one function.

The solution: CodeMesh parses your codebase into a structured knowledge graph at index time. At query time, agents get concise, relevant context — not raw file dumps.

  • 86% fewer tokens per query on average (measured across 9 real-world repos)
  • 66% faster agent loops — 2 MCP calls vs 4+ grep/read cycles
  • <0.2s query latency on codebases up to ~50K nodes; under 0.6s at ~300K nodes (see Benchmark Results)
  • Zero configuration — no API keys, no cloud services, no model downloads

Get Started

Install

Option 1: uv tool install (recommended)

uv tool install codemesh

Option 2: pip

pip install codemesh

Option 3: from source

git clone https://github.com/gkatte/codemesh.git
cd codemesh
pip install -e .

Upgrade:

uv tool install codemesh --force

Verify installation:

codemesh --help

Step 1: Initialize a Project

cd your-project
codemesh init -i

This creates a .codemesh/ directory and writes agent instruction files:

  • CLAUDE.md — instructions for Claude Code
  • .cursor/rules/codemesh.mdc — instructions for Cursor
  • AGENTS.md — instructions for Codex CLI / opencode

Step 2: Build the Index

codemesh index

Parses all source files with tree-sitter, extracts symbols and relationships, and stores them in .codemesh/index.db with FTS5 full-text search.

Step 3: Configure Your Agent

codemesh install --yes

Auto-detects installed agents (Claude Code, Cursor, Codex CLI) and writes MCP server configuration + permissions to the appropriate config files:

  • Claude Code: ~/.claude/claude.json + ~/.claude/settings.json
  • Cursor: .cursor/mcp.json (project-local)
  • Codex CLI: ~/.codex/config.json

Restart your agent for the MCP server to load.
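
To check what was written, it can help to know the general shape of an MCP server entry. The sketch below builds one for a project-local .cursor/mcp.json, assuming the common mcpServers layout. The entry name "codemesh" and the helper build_mcp_entry are hypothetical, and the file that codemesh install actually writes may differ. The serve arguments are the ones listed in the CLI Reference.

```python
import json


def build_mcp_entry(command="codemesh"):
    """Return a dict shaped like a typical MCP client config (assumed layout)."""
    return {
        "mcpServers": {
            # "codemesh" as the server name is an assumption for illustration.
            "codemesh": {
                "command": command,
                "args": ["serve", "--transport", "stdio"],
            }
        }
    }


if __name__ == "__main__":
    # Print the entry so it can be compared against the generated config file.
    print(json.dumps(build_mcp_entry(), indent=2))
```

Comparing this shape against the generated file is a quick way to confirm the server entry is present before restarting the agent.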

That's It

When a .codemesh/ directory exists in a project, your agent uses CodeMesh MCP tools automatically for code exploration instead of grepping through files.


Using CodeMesh with Claude Code

Once codemesh install --yes has been run and Claude Code is restarted, the MCP server loads automatically.

In the main session, use lightweight tools for targeted lookups:

| Tool | Use For |
| --- | --- |
| codemesh_search | Find symbols by name |
| codemesh_callers / codemesh_callees | Trace call flow |
| codemesh_impact | Check what's affected before editing |
| codemesh_node | Get a single symbol's details |

For exploration questions ("how does X work?", "explain the Y system"), spawn an Explore agent with codemesh_explore as the primary tool. This returns full source code sections from all relevant files in one call.

If .codemesh/ does NOT exist in a project, CodeMesh will ask the user if they'd like to initialize it.


CLI Reference

codemesh init [path]              # Initialize in a project (--index to also index)
codemesh install                  # Configure MCP server for your agents (--yes for non-interactive)
codemesh index [path]             # Build the knowledge graph index (--force to re-index)
codemesh sync [path]              # Watch for file changes and auto-sync (--debounce 1.0)
codemesh status [path]            # Show index statistics
codemesh query <search>           # Search symbols (--kind, --limit, --format)
codemesh callers <symbol>         # Find what calls a function/method (--limit)
codemesh callees <symbol>         # Find what a function/method calls (--limit)
codemesh impact <symbol>          # Analyze what's affected by changing a symbol (--depth)
codemesh context <task>           # Build context for a task (--max-nodes, --tokens)
codemesh files [path]             # Show indexed file structure
codemesh serve --transport stdio  # Start MCP server (--transport sse --port 3000)
codemesh graph [path]             # Open interactive graph visualization (--json export)

MCP Tools

When running as an MCP server (codemesh serve --transport stdio), CodeMesh exposes 10 tools:

| Tool | Purpose |
| --- | --- |
| codemesh_search | Find symbols by name across the codebase |
| codemesh_context | Build relevant code context for a task or symbol |
| codemesh_explore | Return source for related symbols grouped by file, plus a relationship map |
| codemesh_callers | Find what calls a function/method |
| codemesh_callees | Find what a function/method calls |
| codemesh_impact | Analyze what code is affected by changing a symbol |
| codemesh_node | Get details about a specific symbol (optionally with source code) |
| codemesh_status | Check index health and statistics |
| codemesh_files | Get indexed file structure (faster than filesystem scanning) |
| codemesh_graph | Get the knowledge graph as JSON |
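
The callers, callees, and impact tools can all be read as queries over call edges. The following is a minimal sketch of that idea, not CodeMesh's implementation: the edges table, its columns, and the sample symbols are hypothetical, and impact is modelled as a depth-limited breadth-first walk over reverse call edges.

```python
import sqlite3
from collections import deque

# Hypothetical schema: one row per "src calls dst" relationship.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, 'calls')",
    [
        ("cli_main", "handle_request"),
        ("handle_request", "login"),
        ("login", "verify_password"),
        ("login", "create_session"),
    ],
)


def callees(symbol):
    """Symbols that `symbol` calls directly."""
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? AND kind = 'calls' ORDER BY dst",
        (symbol,),
    )
    return [r[0] for r in rows]


def callers(symbol):
    """Symbols that call `symbol` directly."""
    rows = conn.execute(
        "SELECT src FROM edges WHERE dst = ? AND kind = 'calls' ORDER BY src",
        (symbol,),
    )
    return [r[0] for r in rows]


def impact(symbol, depth=2):
    """Map each transitive caller within `depth` hops to its distance."""
    distance = {symbol: 0}
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        if distance[current] == depth:
            continue  # do not expand beyond the depth limit
        for caller in callers(current):
            if caller not in distance:
                distance[caller] = distance[current] + 1
                queue.append(caller)
    del distance[symbol]  # the symbol itself is not part of its own impact
    return distance


if __name__ == "__main__":
    print(callees("login"))
    print(impact("verify_password", depth=2))
```

Limiting the walk by depth keeps the result small on dense graphs, which is presumably why the CLI exposes a --depth option for impact.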

Benchmark Results

Measured locally on an M-series Mac, with 5 queries per repo. The Avg Query column is the mean latency across those queries.

Indexing + Query Performance

| Codebase | Language | Files | Nodes | Edges | Index Time | Avg Query |
| --- | --- | --- | --- | --- | --- | --- |
| Excalidraw | TypeScript | 628 | 9,678 | 42,644 | 3.3s | 148.7ms |
| Tokio | Rust | 778 | 14,474 | 45,210 | 2.9s | 133.8ms |
| Gin | Go | 99 | 1,748 | 7,846 | 0.5s | 91.8ms |
| OkHttp | Java/Kotlin | 640 | 2,070 | 2,808 | 0.8s | 104.3ms |
| Alamofire | Swift | 108 | 3,705 | 3,820 | 0.6s | 92.5ms |
| libuv | C | 336 | 6,827 | 24,132 | 1.3s | 136.9ms |
| nlohmann/json | C++ | 491 | 6,377 | 18,780 | 2.2s | 139.0ms |
| Django | Python | 3,020 | 53,155 | 472,322 | 28.5s | 188.0ms |
| VS Code | TypeScript | 10,422 | 299,902 | 1,359,313 | 177.0s | 572.1ms |

Indexing time grows roughly in proportion to codebase size: from 0.5s for ~100 files (Gin) to 177s for 10K+ files (VS Code, 1.36M edges). Average query latency stays under one second even on the largest repo.
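
One way to sanity-check the scaling claim is to normalise index time by file count and by edge count. The sketch below does that arithmetic using only the figures from the table above; the variable names are illustrative. Per-file cost is not constant across repos, so the relationship is best read as approximate.

```python
# (codebase, files, edges, index time in seconds), copied from the table above.
BENCHMARKS = [
    ("Excalidraw", 628, 42_644, 3.3),
    ("Tokio", 778, 45_210, 2.9),
    ("Gin", 99, 7_846, 0.5),
    ("OkHttp", 640, 2_808, 0.8),
    ("Alamofire", 108, 3_820, 0.6),
    ("libuv", 336, 24_132, 1.3),
    ("nlohmann/json", 491, 18_780, 2.2),
    ("Django", 3_020, 472_322, 28.5),
    ("VS Code", 10_422, 1_359_313, 177.0),
]

# Milliseconds of indexing per file and microseconds per edge for each repo.
ms_per_file = {name: secs * 1000 / files for name, files, _, secs in BENCHMARKS}
us_per_edge = {name: secs * 1_000_000 / edges for name, _, edges, secs in BENCHMARKS}

if __name__ == "__main__":
    for name in ms_per_file:
        print(f"{name:15s} {ms_per_file[name]:6.2f} ms/file  {us_per_edge[name]:7.1f} us/edge")
```

Repos with many edges per file (Django, VS Code) cost more per file, which suggests edge count is at least as relevant a size measure as file count.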

Agent Efficiency

Measured across all 9 repos. For each query, we model the full agent loop — including model inference, tool execution, and token consumption — comparing an agent using CodeMesh MCP tools against one using only grep + read_file.

Average: 85% cheaper · 86% fewer tokens · 66% faster · 50% fewer tool calls

| Codebase | Cost Savings | Token Savings | Time Savings | Tool Call Savings |
| --- | --- | --- | --- | --- |
| nlohmann/json | 98.6% | 98.9% | 93.3% | 50% |
| Alamofire | 96.0% | 96.8% | 85.1% | 50% |
| VS Code | 90.9% | 92.3% | 14.8% | 50% |
| Gin | 89.9% | 91.9% | 70.6% | 50% |
| Django | 89.3% | 90.3% | 72.7% | 50% |
| Tokio | 78.0% | 80.6% | 62.4% | 50% |
| OkHttp | 76.4% | 79.4% | 65.0% | 50% |
| Excalidraw | 72.8% | 72.6% | 61.5% | 50% |
| libuv | 71.0% | 71.1% | 69.3% | 50% |
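
The headline averages can be reproduced from the per-repo rows. This short sketch takes the unweighted mean of each column, using only the numbers in the table above; the dictionary name is illustrative.

```python
from statistics import mean

# codebase -> (cost savings %, token savings %, time savings %), from the table.
SAVINGS = {
    "nlohmann/json": (98.6, 98.9, 93.3),
    "Alamofire": (96.0, 96.8, 85.1),
    "VS Code": (90.9, 92.3, 14.8),
    "Gin": (89.9, 91.9, 70.6),
    "Django": (89.3, 90.3, 72.7),
    "Tokio": (78.0, 80.6, 62.4),
    "OkHttp": (76.4, 79.4, 65.0),
    "Excalidraw": (72.8, 72.6, 61.5),
    "libuv": (71.0, 71.1, 69.3),
}

# Unweighted mean per column, rounded to whole percentages.
avg_cost, avg_tokens, avg_time = (round(mean(col)) for col in zip(*SAVINGS.values()))

if __name__ == "__main__":
    print(f"cost {avg_cost}% / tokens {avg_tokens}% / time {avg_time}%")
```

Note that the time average is pulled down by VS Code, the only repo whose time savings fall below 60%.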

The savings come from two sources: (1) CodeMesh returns compact structured results (hundreds of tokens) instead of full source files (thousands of tokens per file), and (2) fewer agent turns are needed — 2 MCP calls vs 4+ grep/read cycles. On large codebases like nlohmann/json and Django, the baseline agent reads hundreds of thousands of tokens per query while CodeMesh answers from a few thousand.


How It Works

┌─────────────────────────────────────────────────────────────────┐
│                        Claude Code                              │
│                                                                 │
│  "Implement user authentication"                                │
│           │                                                     │
│           ▼                                                     │
│  ┌─────────────────┐      ┌─────────────────┐                   │
│  │  Explore Agent  │ ──── │  Explore Agent  │                   │
│  └────────┬────────┘      └────────┬────────┘                   │
│           │                        │                            │
└───────────┼────────────────────────┼────────────────────────────┘
            │                        │
            ▼                        ▼
┌───────────────────────────────────────────────────────────────────┐
│                     CodeMesh MCP Server                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                │
│  │   Search    │  │   Callers   │  │   Context   │                │
│  │  "auth"     │  │  "login()"  │  │  for task   │                │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                │
│         │                │                │                       │
│         └────────────────┼────────────────┘                       │
│                          ▼                                        │
│              ┌───────────────────────┐                            │
│              │   SQLite Graph DB     │                            │
│              │   • symbols           │                            │
│              │   • call edges        │                            │
│              │   • FTS5 BM25 search  │                            │
│              └───────────────────────┘                            │
└───────────────────────────────────────────────────────────────────┘
  1. Extraction — tree-sitter parses source code into ASTs. Language-specific queries extract nodes (functions, classes, methods) and edges (calls, imports, extends, implements).

  2. Storage — Everything goes into a local SQLite database (.codemesh/index.db) with FTS5 full-text search and BM25 ranking.

  3. Resolution — After extraction, references are resolved: function calls → definitions, imports → source files, class inheritance, and framework-specific patterns.

  4. Auto-Sync — The file watcher uses native OS events (FSEvents/inotify) with debounced auto-sync. The graph stays fresh as you code.
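
The first three steps can be sketched in a few lines. This is an illustration under stated assumptions, not CodeMesh's code: it swaps in Python's built-in ast module for tree-sitter so the example has no dependencies, handles only plain function calls in a single Python file, and uses a hypothetical two-table schema (nodes, edges).

```python
import ast
import sqlite3

SOURCE = '''
def verify_password(user, password):
    return user.check(password)

def login(user, password):
    if verify_password(user, password):
        return create_session(user)

def create_session(user):
    return {"user": user}
'''


def extract(source):
    """Extraction: return function nodes and unresolved (caller, callee name) pairs."""
    nodes, raw_edges = [], []
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            nodes.append((func.name, func.lineno))
            for inner in ast.walk(func):
                # Only direct calls by name; attribute calls like user.check() are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    raw_edges.append((func.name, inner.func.id))
    return nodes, raw_edges


# Storage: a hypothetical minimal schema in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, line INTEGER)")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT)")

nodes, raw_edges = extract(SOURCE)
conn.executemany("INSERT INTO nodes (name, line) VALUES (?, ?)", nodes)

# Resolution: map callee names to node ids, dropping calls with no known definition.
ids = dict(conn.execute("SELECT name, id FROM nodes"))
resolved = [(ids[src], ids[dst], "calls") for src, dst in raw_edges if dst in ids]
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", resolved)

if __name__ == "__main__":
    print(sorted(raw_edges))
    print(conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0])
```

Resolving names after all nodes are stored means a call can point to a function defined later in the file, as login does with create_session here.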


Architecture

Source Code
    │
    └──── Tree-sitter AST Parser ──▶ Knowledge Graph (SQLite)
                                        │
                                        ├──── FTS5 (BM25, weighted columns)
                                        └──── Graph Edges (contains/calls/imports/extends)

User Query
    │
    ▼
BM25 Keyword Search (3-tier)
    │
    ├──── Tier 1: FTS5 prefix match (bm25 weights: name=20, qualified_name=5, docstring=1, signature=2)
    ├──── Tier 2: LIKE substring fallback (camelCase matching)
    └──── Tier 3: Fuzzy edit-distance (Levenshtein ≤ 2)
    │
    ▼
Post-hoc Scoring: kind_bonus + name_match_bonus
    │
    ▼
Graph Walk Expansion (BFS depth=2)
    │
    ▼
Context Builder (token-budget-aware XML output)
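
The three search tiers in the diagram can be approximated with plain SQLite. The sketch below is an assumption-laden illustration rather than the project's query code: the symbols table and sample rows are hypothetical, the column weights are the ones shown in the diagram, and it needs a SQLite build with FTS5 enabled. Each tier runs only if the previous one found nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical FTS5 table; column order must match the bm25() weights below.
conn.execute(
    "CREATE VIRTUAL TABLE symbols USING fts5(name, qualified_name, docstring, signature)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [
        ("login", "auth.login", "Authenticate a user.", "def login(user, password)"),
        ("create_session", "auth.create_session", "Start a session after login.",
         "def create_session(user)"),
        ("getUserName", "models.User.getUserName", "Return the display name.",
         "def getUserName(self)"),
        ("parse_config", "config.parse_config", "Read settings from disk.",
         "def parse_config(path)"),
        ("render_page", "views.render_page", "Render an HTML page.",
         "def render_page(request)"),
    ],
)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def search(term, limit=5):
    # Tier 1: FTS5 prefix match, ranked by weighted BM25 (lower score = better).
    fts_query = '"' + term.replace('"', '""') + '"*'
    hits = [r[0] for r in conn.execute(
        "SELECT name FROM symbols WHERE symbols MATCH ? "
        "ORDER BY bm25(symbols, 20.0, 5.0, 1.0, 2.0) LIMIT ?",
        (fts_query, limit),
    )]
    if hits:
        return hits
    # Tier 2: substring fallback, e.g. "UserName" inside "getUserName".
    hits = [r[0] for r in conn.execute(
        "SELECT name FROM symbols WHERE name LIKE '%' || ? || '%' LIMIT ?",
        (term, limit),
    )]
    if hits:
        return hits
    # Tier 3: fuzzy match on names within edit distance 2.
    names = [r[0] for r in conn.execute("SELECT name FROM symbols")]
    scored = sorted((levenshtein(term.lower(), n.lower()), n) for n in names)
    return [n for dist, n in scored if dist <= 2][:limit]


if __name__ == "__main__":
    for query in ("login", "UserName", "logn"):
        print(query, "->", search(query))
```

Weighting the name column most heavily means a symbol named login outranks one that only mentions login in its docstring, which matches the intent of the weights in the diagram.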

Supported Languages

TypeScript · JavaScript · Python · Rust · Go · Java · Kotlin · Swift · C · C++

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -x -q

# Lint
ruff check . --fix && ruff format .

# Type check
mypy codemesh/

License

MIT


Made for AI coding agents — Claude Code, Cursor, Codex CLI, opencode, Hermes Agent, Gemini CLI, Antigravity IDE, and Kiro
