CodeMunch Pro

Intelligent code indexing MCP server. Tree-sitter AST extraction, hybrid search (FTS5 + vector), call graphs, 10 languages, incremental indexing.

Save up to 99% of tokens by fetching the exact function source via byte-offset seek instead of reading entire files.

Install

pip install codemunch-pro

Quick Start

Claude Desktop / Cline

Add to your MCP client config:

{
  "mcpServers": {
    "codemunch-pro": {
      "command": "codemunch-pro"
    }
  }
}

HTTP Server

codemunch-pro --transport streamable-http --port 5002

13 MCP Tools

Tool              Description
index_folder      Index a local directory (incremental, SHA-256 based)
index_repo        Index a GitHub/GitLab repo (v1.1)
list_repos        List all indexed repositories with stats
invalidate_cache  Force re-index a repository
file_tree         Get directory tree with file counts
file_outline      List symbols in a single file
repo_outline      List all symbols in repo (summary)
get_symbol        Get full source of one symbol (O(1) byte seek)
get_symbols       Batch get multiple symbols
search_symbols    Hybrid search (FTS5 + vector RRF)
search_text       Full-text search in file contents
get_callees       What does this function call?
get_callers       Who calls this function?

10 Languages

Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby

All via tree-sitter-language-pack — zero compilation, pre-built binaries.

Key Features

O(1) Symbol Retrieval

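Every symbol stores its byte offset and length. get_symbol seeks directly to the function source instead of reading the entire file, so lookup cost does not grow with file size. Fetching a 200-byte function from a 40 KB file reads about 0.5% of the file, roughly a 99.5% token saving.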
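
The idea can be sketched in a few lines of Python. This is an illustration only, not CodeMunch Pro's actual code: the `read_symbol` helper and the demo file are hypothetical, and the offset is computed by hand here where a real index would look it up.

```python
import os
import tempfile


def read_symbol(path: str, byte_offset: int, byte_length: int) -> str:
    """Read one symbol's source by seeking to a stored byte offset (hypothetical helper)."""
    with open(path, "rb") as f:  # binary mode: offsets are bytes, not characters
        f.seek(byte_offset)
        return f.read(byte_length).decode("utf-8")


# Build a small demo file and note where the target function starts.
source = (
    "def helper():\n"
    "    return 1\n"
    "\n"
    "def target():\n"
    "    return helper() + 1\n"
)
data = source.encode("utf-8")
start = data.index(b"def target")  # an index would store this value
length = len(data) - start

with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(data)
    demo_path = tmp.name

snippet = read_symbol(demo_path, start, length)
os.remove(demo_path)
print(snippet)
```

Only the bytes of `target` are read; `helper` never enters the result, which is where the token saving comes from.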

Incremental Indexing

Files are hashed with SHA-256, and only files whose hash has changed are re-parsed. Re-indexing a 10,000-file repo after changing one file takes milliseconds.
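
A minimal sketch of hash-based change detection, assuming the index keeps a path-to-hash mapping. The function names and the in-memory dictionaries are hypothetical stand-ins for whatever the tool stores in its database.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def files_to_reindex(current: dict[str, bytes], stored_hashes: dict[str, str]) -> list[str]:
    """Return paths that are new or whose content hash differs from the stored one."""
    changed = []
    for path, data in current.items():
        if stored_hashes.get(path) != sha256_of(data):
            changed.append(path)
    return sorted(changed)


# First run: nothing is stored yet, so every file needs parsing.
files = {"a.py": b"x = 1\n", "b.py": b"y = 2\n"}
stored: dict[str, str] = {}
first_pass = files_to_reindex(files, stored)
stored = {path: sha256_of(data) for path, data in files.items()}

# Second run: only b.py changed, so only b.py is re-parsed.
files["b.py"] = b"y = 3\n"
second_pass = files_to_reindex(files, stored)
print(first_pass, second_pass)
```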

Hybrid Search (FTS5 + Vector)

Combines BM25 keyword matching with semantic vector similarity using Reciprocal Rank Fusion. Search "authentication middleware" and find auth_middleware, verify_token, and login_handler.
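
Reciprocal Rank Fusion itself is simple to sketch: each result scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 below is the commonly used default and is an assumption here, as are the example result lists; the source does not state which value CodeMunch Pro uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; k=60 is an assumed, commonly used constant."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; name breaks ties deterministically.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from the keyword (BM25) and vector searches.
keyword_hits = ["auth_middleware", "login_handler", "parse_config"]
vector_hits = ["verify_token", "auth_middleware", "login_handler"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Symbols that appear in both lists rise to the top even if neither search ranked them first, which is why a semantic match like `verify_token` can surface alongside exact keyword matches.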

Call Graphs

Traces function calls through the AST. get_callees("main") shows what main calls. get_callers("authenticate") shows who calls authenticate. Supports depth traversal.
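
As a rough illustration of the technique, the sketch below extracts call edges from an AST and walks them to a given depth. It uses Python's built-in `ast` module as a stand-in for tree-sitter, handles only direct calls by plain name, and all function names in it are hypothetical.

```python
import ast
from collections import defaultdict

SOURCE = '''
def authenticate(token):
    return verify(token)

def verify(token):
    return bool(token)

def handler(request):
    return authenticate(request)

def main():
    handler("req")
'''


def build_call_edges(source: str) -> set[tuple[str, str]]:
    """Collect (caller, callee) pairs for direct calls by name."""
    edges = set()
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.add((func.name, node.func.id))
    return edges


def traverse(edges: set[tuple[str, str]], start: str, depth: int, reverse: bool = False) -> set[str]:
    """Follow edges up to `depth` hops; reverse=True walks from callee to caller."""
    adjacency = defaultdict(set)
    for caller, callee in edges:
        if reverse:
            adjacency[callee].add(caller)
        else:
            adjacency[caller].add(callee)
    seen: set[str] = set()
    frontier = {start}
    for _ in range(depth):
        frontier = {n for cur in frontier for n in adjacency[cur]} - seen - {start}
        seen |= frontier
    return seen


edges = build_call_edges(SOURCE)
callees_of_main = traverse(edges, "main", depth=2)
callers_of_authenticate = traverse(edges, "authenticate", depth=2, reverse=True)
print(callees_of_main, callers_of_authenticate)
```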

Full-Text Content Search

Search raw file contents — string literals, TODO comments, config values, error messages. Not just symbol names.
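
A small sketch of content search with SQLite FTS5, assuming the Python build ships with the FTS5 extension enabled. The table name `file_content_fts` comes from the architecture listing; the column names, the sample rows, and the `search_text` helper shown here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are hypothetical; only the table name appears in the README.
conn.execute("CREATE VIRTUAL TABLE file_content_fts USING fts5(path, content)")
conn.executemany(
    "INSERT INTO file_content_fts (path, content) VALUES (?, ?)",
    [
        ("app/auth.py", "# TODO: rotate signing keys\ndef verify_token(t): ..."),
        ("app/config.py", 'TIMEOUT_SECONDS = 30\nERROR = "connection refused"'),
    ],
)


def search_text(query: str) -> list[str]:
    """Return paths of files whose content matches an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT path FROM file_content_fts WHERE file_content_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


todo_hits = search_text("TODO")
phrase_hits = search_text('"connection refused"')  # double quotes make it a phrase query
print(todo_hits, phrase_hits)
```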

How It Works

  1. Parse — tree-sitter builds an AST for each source file
  2. Extract — Walk AST to find functions, classes, methods, types, interfaces
  3. Store — SQLite database per repo with FTS5 virtual tables
  4. Embed — FastEmbed (ONNX, CPU-only) generates 384-dim vectors for semantic search
  5. Graph — Call expressions extracted from function bodies, edges stored and resolved
  6. Serve — FastMCP exposes 13 tools via stdio or HTTP
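
Steps 1 and 2 can be approximated as follows. This sketch substitutes Python's built-in `ast` module for tree-sitter so it runs without extra dependencies, and the dictionary fields (`byte_offset`, `byte_length`, and so on) are hypothetical names rather than the tool's real schema.

```python
import ast

SOURCE = (
    "import os\n"
    "\n"
    "def greet(name):\n"
    "    return 'héllo ' + name\n"
    "\n"
    "class Greeter:\n"
    "    def run(self):\n"
    "        return greet('x')\n"
)


def extract_symbols(source: str) -> list[dict]:
    """Walk the AST and record each function/class with its byte span."""
    data = source.encode("utf-8")
    # Byte offset at which each line starts (ast line numbers are 1-based).
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # col_offset and end_col_offset are UTF-8 byte offsets within the line.
            start = line_starts[node.lineno - 1] + node.col_offset
            end = line_starts[node.end_lineno - 1] + node.end_col_offset
            symbols.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "byte_offset": start,
                    "byte_length": end - start,
                }
            )
    return symbols


for symbol in extract_symbols(SOURCE):
    print(symbol)
```

Recording spans in bytes rather than characters matters as soon as a file contains non-ASCII text, as the `'héllo '` literal does here.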

Architecture

~/.codemunch-pro/
├── myproject_a1b2c3d4e5f6.db    # Per-repo SQLite database
├── otherproject_7890abcdef.db
└── ...

Each DB contains:
├── files          # Indexed files with SHA-256 hashes
├── symbols        # Functions, classes, methods, types
├── symbols_fts    # FTS5 full-text search index
├── symbols_vec    # sqlite-vec 384-dim vector index
├── call_edges     # Call graph (caller → callee)
└── file_content_fts  # Raw file content search
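
To make the layout concrete, here is a guess at what the core relational tables might look like, with a caller lookup as a join. Only the table names come from the listing above; every column name, the placeholder hash, and the sample rows are assumptions, and the FTS5 and sqlite-vec tables are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the real schema may differ.
conn.executescript(
    """
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        path TEXT UNIQUE NOT NULL,
        sha256 TEXT NOT NULL
    );
    CREATE TABLE symbols (
        id INTEGER PRIMARY KEY,
        file_id INTEGER NOT NULL REFERENCES files(id),
        name TEXT NOT NULL,
        kind TEXT NOT NULL,
        byte_offset INTEGER NOT NULL,
        byte_length INTEGER NOT NULL
    );
    CREATE TABLE call_edges (
        caller_id INTEGER NOT NULL REFERENCES symbols(id),
        callee_id INTEGER NOT NULL REFERENCES symbols(id)
    );
    """
)

# Sample rows; 'abc123' is a placeholder, not a real SHA-256 digest.
conn.execute("INSERT INTO files (id, path, sha256) VALUES (1, 'app.py', 'abc123')")
conn.executemany(
    "INSERT INTO symbols (id, file_id, name, kind, byte_offset, byte_length) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, "main", "function", 0, 40), (2, 1, "authenticate", "function", 42, 80)],
)
conn.execute("INSERT INTO call_edges (caller_id, callee_id) VALUES (1, 2)")

# "Who calls authenticate?" expressed as a join over call_edges.
callers = [
    row[0]
    for row in conn.execute(
        """
        SELECT caller.name
        FROM call_edges
        JOIN symbols AS caller ON caller.id = call_edges.caller_id
        JOIN symbols AS callee ON callee.id = call_edges.callee_id
        WHERE callee.name = ?
        """,
        ("authenticate",),
    )
]
print(callers)
```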

Use Cases

  • AI Coding Agents: Give your agent surgical access to codebases without burning context
  • Code Review: Find all callers of a function before changing its signature
  • Onboarding: Search symbols semantically — "where is error handling?" finds relevant code
  • Refactoring: Map call graphs before moving functions between modules
  • Documentation: Extract all public APIs with signatures and docstrings

License

MIT
