Call-graph-aware code context retrieval for AI coding agents (MCP server + CLI)
promptify-cmax
Call-graph-aware code context retrieval for AI coding agents.
When you ask Claude Code or Cursor to fix a specific function, the agent typically falls back to grep and pulls every file that mentions the symbol — specs, plans, ADRs, unrelated definitions that happen to share a name. The model pays attention tax on all of it.
promptify-cmax returns the files most likely to need editing, ranked by distance in the call graph rather than surface name match. It exposes both an MCP server (drop-in for Claude Code, Cursor, Continue) and a CLI.
When to use this (vs. semantic retrieval)
The MCP-server space already has good tools for discovery — open-ended "how does auth work in this codebase?" questions. Tools like zilliztech/claude-context use dense embeddings to find code that's semantically close to your query. That's the right call when you don't yet know the names of the symbols you're looking for.
promptify-cmax solves the complementary problem: editing tasks where you already know the symbol. "Fix threshold_for_complexity," "why does run_query break when I change Index.upsert?" — these have a specific entry-point identifier, and the right files to read are the ones structurally connected to it (callers, callees, transitive). Embeddings can't see structural reachability; they retrieve based on token similarity, which lets unrelated namesakes contaminate results. We do FQN-aware call-graph BFS, so two helper() functions in different files are different graph nodes.
The two approaches are orthogonal and can run side-by-side as separate MCP tools. A capable agent will pick the right one for the task.
| If your task looks like… | Use |
|---|---|
| "How does X work?" / unfamiliar codebase exploration | semantic retrieval (e.g. claude-context) |
| "Fix func_name" / "why does Y change when I edit Z?" / known-symbol editing | promptify-cmax |
| Pattern-matching across the codebase ("find all calls to deprecated API") | ast-grep MCP |
Why call-graph and not embeddings, for editing tasks
On SWE-bench-Verified Python bug-fix tasks at a 30 000-token budget, structural retrieval surfaces the file the agent needs to edit 24.6 pp more often than substring grep (41.0 % vs 16.4 %). The advantage held across three pre-registered spikes (v0.4 / v5 / v6) at n=250, n=250, and n=127. The v6 verdict is a clean PASS on a 127-instance sample fully disjoint from prior measurement runs:
| Budget | grep finds patch | structural finds patch | Δ |
|---|---|---|---|
| 5 000 tokens | 2.8 % | 16.6 % | +13.8 pp |
| 30 000 tokens | 16.4 % | 41.0 % | +24.6 pp |
| 100 000 tokens | 39.1 % | 58.6 % | +19.5 pp |
Statistics: paired Wilcoxon p = 1.2 × 10⁻⁵, BCa-99 lower bound +10.9 pp, McNemar p = 1.9 × 10⁻⁵, JZS Bayes factor ≈ 4 800, multiverse 5/5 budgets directional. Cross-spike effect-size: +0.170 → +0.213 → +0.246 (consistent across three independent samples).
Audit trail: the public claim above is lifted verbatim from SPIKE-PCM-BENCH-FULLDISJOINT-V6 VERDICT.md §"Construct ceiling". The full v0.4 → v5 → v6 spike chain, including a PARITY verdict (paired-median degenerate on binary outcomes) and a FLAGGED-PASS verdict (overlap > 30 % auto-downgrade), is preserved in the research-spikes dossier. The discipline (ADR-0025) gates every public-surface number on a closed-go spike's verdict.
What the bench measures: did the agent's structural retrieval surface the file the gold patch actually edits, anywhere in its ranked list, within a 30 000-token budget? It does NOT measure end-to-end editing success (whether the agent ultimately produces a correct fix); SWE-bench's evaluation harness is out of scope. The v3-era "49× lower token cost" framing was empirically falsified at n=109 and is retired.
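To make the metric concrete, here is a minimal sketch of a per-instance "did retrieval surface the patched file?" check. The function name, the whole-file budget accounting, and the toy token counts are assumptions for illustration; the actual bench harness may truncate or count tokens differently.

```python
def patch_file_surfaced(ranked_files, token_counts, gold_files, budget=30_000):
    """Return True if a gold-patch file appears in the ranked list
    before the cumulative token budget is exhausted.

    Assumption: files are consumed whole, in rank order, and retrieval
    stops at the first file that would overflow the budget.
    """
    used = 0
    for path in ranked_files:
        used += token_counts[path]
        if used > budget:
            break
        if path in gold_files:
            return True
    return False


def hit_rate(outcomes):
    """Fraction of benchmark instances where the patched file was surfaced."""
    return sum(outcomes) / len(outcomes)


# Hypothetical instance: three ranked files with made-up token counts.
ranked = ["a.py", "b.py", "c.py"]
tokens = {"a.py": 20_000, "b.py": 8_000, "c.py": 5_000}
```

Note that this only scores retrieval. Whether the agent then writes a correct fix is a separate question that the bench does not answer.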
The structural argument independent of the number: a senior engineer fixing a bug doesn't grep for the function name across the repo and read every match. They ask "what calls this, and what does this call?" That's a graph traversal, not a similarity ranking.
Status
v0.3 (general availability) — Python and TypeScript indexing, FQN-aware call resolution, MCP server, ~33 tests. Wedge claim audited via the v0.4 → v5 → v6 spike chain (see "Why call-graph and not embeddings, for editing tasks" above). License: Apache-2.0. Go / Rust / Java / C# planned for Pro tier.
Install
pip install promptify-cmax
Then index your project and wire it into Claude Code / Cursor / Continue. Five-minute walkthrough with copy-pasteable MCP config snippets and troubleshooting: QUICKSTART.md.
What it exposes
CLI:
- `promptify-cmax index --project-root <dir>`: build / incrementally update the structural index (one-time per repo, then automatic-on-change)
- `promptify-cmax query --project-root <dir> "<task>"`: return ranked files for a task description
- `promptify-cmax serve --project-root <dir>`: run as an MCP server over stdio
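If you want to drive the CLI from a script, a small wrapper along these lines may help. It is a sketch that assumes `promptify-cmax` is on your `PATH`; the helper name `cmax_command` is hypothetical, and only the subcommands and the `--project-root` flag listed above are used.

```python
import shlex


def cmax_command(subcommand, project_root, task=None):
    """Build the argv list for one of the three documented subcommands."""
    if subcommand not in {"index", "query", "serve"}:
        raise ValueError(f"unknown subcommand: {subcommand}")
    argv = ["promptify-cmax", subcommand, "--project-root", str(project_root)]
    if subcommand == "query":
        if not task:
            raise ValueError("query needs a task description")
        argv.append(task)  # the task is passed as one positional argument
    return argv


# Print a shell-safe version of the command for inspection.
print(shlex.join(cmax_command("query", ".", "Fix threshold_for_complexity")))

# To actually run it (requires the package to be installed):
#   import subprocess
#   subprocess.run(cmax_command("index", "."), check=True)
```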
MCP tools (when run as serve):
- `structural_context(task, top_k=5)`: rank files by call-graph distance from the task's identifiers
- `reindex()`: rebuild after large code changes
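For orientation, this is roughly the JSON-RPC message an MCP client sends over stdio when an agent invokes `structural_context`. The envelope follows the MCP `tools/call` method; the helper name and the example task are illustrative, and the shape of the server's response is not shown because it is not documented here. In practice your MCP client builds this for you after the initialization handshake.

```python
import json


def tools_call_request(request_id, tool, arguments):
    """Build an MCP `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tools_call_request(
    1,
    "structural_context",
    {"task": "Fix threshold_for_complexity", "top_k": 5},
)
print(json.dumps(request, indent=2))
```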
How it works
- Index (one-time per repo, then incremental on file change): tree-sitter walks every Python and TypeScript source file, extracts function definitions, intra-function call sites, and module-level imports; persists everything to a single SQLite file at `.promptify/code-index.db`.
- Resolve (query time): given a natural-language task, extract candidate identifiers (backtick / CamelCase / snake_case / dotted paths) and intersect with the symbols actually in the index.
- BFS (query time): walk the call graph two hops in both directions; resolve each call edge to a specific `(file, function)` tuple via the caller's import bindings and same-file scope, so two functions named `helper` in different files never collapse into one node.
- Rank: group reached nodes by file, sort by `(distance ASC, affected-function-count DESC)`, return the top-k.
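The query-time steps above (Resolve, BFS, Rank) can be sketched in a few dozen lines. This is a toy illustration, not the shipped implementation: the in-memory `CALLS` graph stands in for the SQLite index, all file and function names are hypothetical, edges are treated as undirected to approximate "both directions", and ties are broken by path for determinism.

```python
import re
from collections import deque

# Hypothetical index: caller (file, function) -> list of callee nodes.
CALLS = {
    ("app/api.py", "run_query"): [("app/index.py", "upsert"), ("app/util.py", "helper")],
    ("app/index.py", "upsert"): [("app/store.py", "write")],
    ("app/store.py", "write"): [("app/disk.py", "flush")],
    ("tools/cli.py", "main"): [("app/api.py", "run_query")],
    ("tools/fmt.py", "helper"): [],  # unrelated namesake, never linked to app/
}


def extract_identifiers(task, known_names):
    """Resolve: pull identifier-like tokens and keep those in the index."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", task)
    candidates = {t.strip(".").split(".")[-1] for t in tokens}
    return candidates & known_names


def neighbors(graph):
    """Adjacency covering both callers and callees of every node."""
    adj = {}
    for caller, callees in graph.items():
        adj.setdefault(caller, set())
        for callee in callees:
            adj[caller].add(callee)
            adj.setdefault(callee, set()).add(caller)
    return adj


def bfs(adj, seeds, max_hops=2):
    """BFS: distance of every node within max_hops of any seed."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def rank_files(dist, top_k=5):
    """Rank: group by file, sort by (distance ASC, function count DESC)."""
    per_file = {}
    for (path, _func), d in dist.items():
        best, count = per_file.get(path, (d, 0))
        per_file[path] = (min(best, d), count + 1)
    ordered = sorted(per_file, key=lambda p: (per_file[p][0], -per_file[p][1], p))
    return ordered[:top_k]


def structural_context_sketch(task, graph, top_k=5):
    adj = neighbors(graph)
    names = {func for _path, func in adj}
    wanted = extract_identifiers(task, names)
    seeds = [node for node in adj if node[1] in wanted]
    return rank_files(bfs(adj, seeds), top_k)


result = structural_context_sketch("Fix `upsert` in Index.upsert", CALLS, top_k=3)
```

Because nodes are `(file, function)` tuples, `tools/fmt.py` never enters the result for an `upsert` task even though it defines a function named `helper`, just like `app/util.py`.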
The discipline that makes this useful: fully-qualified-name resolution, not bare-name matching. A naive call graph treats every def main(): ... in the repo as the same node — typically 100+ collisions in any non-trivial Python project. We resolve through imports, so cross-file false positives don't enter the BFS frontier.
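A small sketch of what import-based resolution buys you. It swaps in Python's standard `ast` module for tree-sitter to stay dependency-free, handles only absolute `from X import Y` bindings and same-file definitions, and uses hypothetical file contents, so read it as an illustration of the idea rather than the indexer itself.

```python
import ast

# Hypothetical three-file project with two functions named `helper`.
SOURCES = {
    "pkg/a.py": "def helper():\n    return 1\n",
    "pkg/b.py": "def helper():\n    return 2\n",
    "pkg/main.py": "from pkg.a import helper\n\ndef run():\n    return helper()\n",
}


def module_to_path(module):
    """Map a dotted module name to a file path (assumes a flat package layout)."""
    return module.replace(".", "/") + ".py"


def resolve_calls(path, sources):
    """Return call edges as ((file, caller), (file, callee)) tuples."""
    tree = ast.parse(sources[path])

    # Local name -> (file, function), from imports and same-file definitions.
    bindings = {}
    for node in tree.body:
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            target = module_to_path(node.module)
            for alias in node.names:
                bindings[alias.asname or alias.name] = (target, alias.name)
        elif isinstance(node, ast.FunctionDef):
            bindings[node.name] = (path, node.name)

    edges = []
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                target = bindings.get(node.func.id)
                if target is not None:  # unresolved names are dropped, not guessed
                    edges.append(((path, func.name), target))
    return edges


edges = resolve_calls("pkg/main.py", SOURCES)
```

Here the call inside `run` resolves to `pkg/a.py`'s `helper` only. A bare-name graph would also link it to `pkg/b.py`, which is exactly the kind of false positive that would otherwise enter the BFS frontier.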
Roadmap
- Python + TypeScript indexing (v0.1)
- FQN-aware call resolution
- MCP server, CLI
- Go, Rust, Java, C# (Pro)
- Hosted multi-repo index (Pro)
- PR-bot / CI integration (Team)
- VSCode + JetBrains extensions
Pro / Team
This package is the open-source core. Promptify is building a hosted layer for teams (multi-repo indexing that survives laptop churn, additional language support, token-savings analytics, editor extensions, SSO/SAML, CI integration). Pricing and signup haven't shipped yet — watch the repo or open an issue if you'd like a heads-up when the hosted tier launches.
Contributing
See CONTRIBUTING.md. Issues and PRs welcome.
License
Apache-2.0. Copyright © 2026 Promptify LLC. See LICENSE and NOTICE.