CodeMunch Pro
Intelligent code indexing MCP server. Tree-sitter AST extraction, hybrid search (FTS5 + vector), call graphs, 10 languages, incremental indexing.
Save up to 99% of tokens: get exact function source via byte-offset seek instead of reading entire files.
Install
pip install codemunch-pro
Quick Start
Claude Desktop / Cline
Add to your MCP client config:
{
  "mcpServers": {
    "codemunch-pro": {
      "command": "codemunch-pro"
    }
  }
}
HTTP Server
codemunch-pro --transport streamable-http --port 5002
13 MCP Tools
| Tool | Description |
|---|---|
| index_folder | Index a local directory (incremental, SHA-256 based) |
| index_repo | Index a GitHub/GitLab repo (v1.1) |
| list_repos | List all indexed repositories with stats |
| invalidate_cache | Force re-index a repository |
| file_tree | Get directory tree with file counts |
| file_outline | List symbols in a single file |
| repo_outline | List all symbols in repo (summary) |
| get_symbol | Get full source of one symbol (O(1) byte seek) |
| get_symbols | Batch get multiple symbols |
| search_symbols | Hybrid search (FTS5 + vector RRF) |
| search_text | Full-text search in file contents |
| get_callees | What does this function call? |
| get_callers | Who calls this function? |
10 Languages
Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby
All via tree-sitter-language-pack — zero compilation, pre-built binaries.
Key Features
O(1) Symbol Retrieval
Every symbol stores its byte offset and length. get_symbol seeks directly to the function source instead of reading the entire file. Retrieving a 200-byte function from a 40 KB file reads about 0.5% of the file, roughly a 99.5% token saving if tokens scale with bytes.
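To make the byte-offset idea concrete, here is a minimal sketch of a seek-based read. The function name read_symbol and its parameters are hypothetical, chosen for illustration; this is not CodeMunch Pro's actual implementation.

```python
import os
import tempfile


def read_symbol(path: str, byte_offset: int, byte_length: int) -> str:
    """Return one symbol's source by seeking to a stored byte range.

    Hypothetical helper: only `byte_length` bytes are read, regardless
    of how large the file is.
    """
    with open(path, "rb") as f:
        f.seek(byte_offset)
        return f.read(byte_length).decode("utf-8")


# Demo: write a small file, then fetch one function without reading the rest.
source = (
    b"import os\n\n\n"
    b"def greet(name):\n    return f'hello {name}'\n"
    b"\n\nprint(greet('x'))\n"
)
target = b"def greet(name):\n    return f'hello {name}'\n"
offset = source.index(target)  # an indexer would store this at parse time

with tempfile.NamedTemporaryFile(delete=False, suffix=".py") as tmp:
    tmp.write(source)
    demo_path = tmp.name

snippet = read_symbol(demo_path, offset, len(target))
os.unlink(demo_path)
```

The file is opened in binary mode because the stored offsets are byte positions, which differ from character positions whenever the source contains non-ASCII text.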
Incremental Indexing
Files are hashed (SHA-256). Only changed files are re-parsed. Re-indexing a 10K file repo after changing one file takes milliseconds.
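The change-detection step can be sketched with the standard library alone. The helper names (sha256_of, changed_files), the in-memory dict of known hashes, and the restriction to *.py files are assumptions made for this illustration; the real indexer keeps its hashes in SQLite and may differ in detail.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(root: Path, known: dict[str, str]) -> list[str]:
    """Return relative paths whose hash differs from `known`.

    `known` maps relative path -> last seen digest and is updated in place.
    Deleted files are not handled in this sketch.
    """
    changed = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        digest = sha256_of(path)
        if known.get(rel) != digest:
            known[rel] = digest
            changed.append(rel)  # only these would be re-parsed
    return changed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("def a(): return 1\n")
    (root / "b.py").write_text("def b(): return 2\n")
    known: dict[str, str] = {}
    first_pass = changed_files(root, known)   # both files are new
    second_pass = changed_files(root, known)  # nothing changed
    (root / "b.py").write_text("def b(): return 3\n")
    third_pass = changed_files(root, known)   # only b.py changed
```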
Hybrid Search (FTS5 + Vector)
Combines BM25 keyword matching with semantic vector similarity using Reciprocal Rank Fusion. Search "authentication middleware" and find auth_middleware, verify_token, and login_handler.
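Reciprocal Rank Fusion scores each result as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below shows that formula on two made-up result lists. The constant k = 60 is a commonly used default and an assumption here, as are the function name and the sample rankings; the source does not state which value CodeMunch Pro uses.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists into one using RRF.

    Each item scores 1 / (k + rank) per list it appears in (rank is 1-based).
    Ties are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results for the query "authentication middleware".
keyword_hits = ["auth_middleware", "login_handler", "parse_config"]  # e.g. BM25
vector_hits = ["verify_token", "auth_middleware", "login_handler"]   # e.g. embeddings

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
fused_names = [name for name, _ in fused]
```

Items found by both searches rise to the top, while verify_token, which only the vector search found, still ranks above a keyword-only match that placed lower in its own list.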
Call Graphs
Traces function calls through the AST. get_callees("main") shows what main calls. get_callers("authenticate") shows who calls authenticate. Supports depth traversal.
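Depth traversal over stored call edges can be sketched as a breadth-first search. The edge data, the function names reachable_callees and invert, and the dict-based representation are all hypothetical; the actual tools read edges from the call_edges table and may resolve names differently.

```python
from collections import deque


def reachable_callees(
    edges: dict[str, list[str]], start: str, max_depth: int
) -> dict[str, int]:
    """Map each function reachable from `start` to its call depth (BFS)."""
    depths: dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for callee in edges.get(name, []):
            if callee not in seen:  # guards against cycles and recursion
                seen.add(callee)
                depths[callee] = depth + 1
                queue.append((callee, depth + 1))
    return depths


def invert(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip caller -> callee edges so the same BFS answers 'who calls X?'."""
    callers: dict[str, list[str]] = {}
    for caller, callees in edges.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


call_edges = {
    "main": ["load_config", "serve"],
    "serve": ["authenticate", "handle"],
    "handle": ["authenticate"],
}

callees_depth_1 = reachable_callees(call_edges, "main", 1)
callees_depth_2 = reachable_callees(call_edges, "main", 2)
callers_of_auth = reachable_callees(invert(call_edges), "authenticate", 1)
```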
Full-Text Content Search
Search raw file contents — string literals, TODO comments, config values, error messages. Not just symbol names.
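For a sense of what content search over FTS5 looks like, here is a small sketch using Python's built-in sqlite3 module. It assumes the local SQLite build includes FTS5, and the column names path and content, like the sample rows, are invented for the example rather than taken from the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: one row per indexed file, with its raw text.
conn.execute("CREATE VIRTUAL TABLE file_content_fts USING fts5(path, content)")
conn.executemany(
    "INSERT INTO file_content_fts (path, content) VALUES (?, ?)",
    [
        ("app/auth.py", "# TODO: rotate signing keys\nraise ValueError('token expired')"),
        ("app/config.py", "DEFAULT_TIMEOUT = 30  # seconds"),
    ],
)

# A quoted string is an FTS5 phrase query; snippet() returns matching context
# from column 1 (content) with the match wrapped in brackets.
rows = conn.execute(
    "SELECT path, snippet(file_content_fts, 1, '[', ']', '...', 8) "
    "FROM file_content_fts WHERE file_content_fts MATCH ? ORDER BY rank",
    ('"token expired"',),
).fetchall()

matched_paths = [path for path, _ in rows]
conn.close()
```

Because the index covers whole file contents, the query matches an error message inside a string literal, something a symbol-name search would not see.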
How It Works
- Parse — tree-sitter builds an AST for each source file
- Extract — Walk AST to find functions, classes, methods, types, interfaces
- Store — SQLite database per repo with FTS5 virtual tables
- Embed — FastEmbed (ONNX, CPU-only) generates 384-dim vectors for semantic search
- Graph — Call expressions extracted from function bodies, edges stored and resolved
- Serve — FastMCP exposes 13 tools via stdio or HTTP
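The Parse and Extract steps above can be approximated for Python source with the standard library's ast module. This deliberately swaps in ast for tree-sitter so the sketch runs without extra dependencies; it handles Python only, and the function name extract_symbols and the record fields are assumptions, not the project's real schema.

```python
import ast


def extract_symbols(source: str) -> list[dict]:
    """Find functions and classes and record their byte ranges.

    Stand-in for a tree-sitter walk: `ast` reports 1-based line numbers
    and UTF-8 byte columns, which are converted to absolute byte offsets.
    """
    data = source.encode("utf-8")

    # Byte offset at which each line starts.
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = line_starts[node.lineno - 1] + node.col_offset
            end = line_starts[node.end_lineno - 1] + node.end_col_offset
            symbols.append({
                "name": node.name,
                "kind": type(node).__name__,
                "byte_offset": start,
                "byte_length": end - start,
            })
    return symbols


source = (
    "import os\n\n"
    "class Greeter:\n"
    "    def greet(self, name):\n"
    "        return 'héllo ' + name\n\n"
    "def main():\n"
    "    print(Greeter().greet('x'))\n"
)

symbols = extract_symbols(source)
raw = source.encode("utf-8")
# Slice each symbol back out of the raw bytes to confirm the stored ranges.
slices = {
    s["name"]: raw[s["byte_offset"]:s["byte_offset"] + s["byte_length"]].decode("utf-8")
    for s in symbols
}
```

Records like these are what the Store step would persist, and the byte_offset and byte_length fields are what make a later seek-based read possible.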
Architecture
~/.codemunch-pro/
├── myproject_a1b2c3d4e5f6.db # Per-repo SQLite database
├── otherproject_7890abcdef.db
└── ...
Each DB contains:
├── files # Indexed files with SHA-256 hashes
├── symbols # Functions, classes, methods, types
├── symbols_fts # FTS5 full-text search index
├── symbols_vec # sqlite-vec 384-dim vector index
├── call_edges # Call graph (caller → callee)
└── file_content_fts # Raw file content search
Use Cases
- AI Coding Agents: Give your agent surgical access to codebases without burning context
- Code Review: Find all callers of a function before changing its signature
- Onboarding: Search symbols semantically — "where is error handling?" finds relevant code
- Refactoring: Map call graphs before moving functions between modules
- Documentation: Extract all public APIs with signatures and docstrings
License
MIT