Call-graph-aware code context retrieval for AI coding agents (MCP server + CLI)

promptify-cmax

Call-graph-aware code context retrieval for AI coding agents.

When you ask Claude Code or Cursor to fix a specific function, the agent typically falls back to grep and pulls every file that mentions the symbol — specs, plans, ADRs, unrelated definitions that happen to share a name. The model pays attention tax on all of it.

promptify-cmax returns the files most likely to need editing, ranked by distance in the call graph rather than surface name match. It exposes both an MCP server (drop-in for Claude Code, Cursor, Continue) and a CLI.

When to use this (vs. semantic retrieval)

The MCP-server space already has good tools for discovery — open-ended "how does auth work in this codebase?" questions. Tools like zilliztech/claude-context use dense embeddings to find code that's semantically close to your query. That's the right call when you don't yet know the names of the symbols you're looking for.

promptify-cmax solves the complementary problem: editing tasks where you already know the symbol. Requests like "Fix threshold_for_complexity" or "why does run_query break when I change Index.upsert?" have a specific entry-point identifier, and the right files to read are the ones structurally connected to it (callers, callees, and their transitive neighbors). Embeddings can't see structural reachability; they retrieve based on embedding similarity, which lets unrelated namesakes contaminate results. We run a breadth-first search (BFS) over a call graph keyed by fully-qualified name (FQN), so two helper() functions in different files are different graph nodes.

The two approaches are orthogonal and can run side-by-side as separate MCP tools. A capable agent will pick the right one for the task.

| If your task looks like… | Use |
| --- | --- |
| "How does X work?" / unfamiliar codebase exploration | semantic retrieval (e.g. claude-context) |
| "Fix func_name" / "why does Y change when I edit Z?" / known-symbol editing | promptify-cmax |
| Pattern-matching across the codebase ("find all calls to deprecated API") | ast-grep MCP |

Why call-graph and not embeddings, for editing tasks

On SWE-bench-Verified Python bug-fix tasks at a 30 000-token budget, structural retrieval surfaces the file the agent needs to edit 41.0 % of the time, versus 16.4 % for substring grep: a difference of +24.6 pp. The effect is robust across three pre-registered spikes (v0.4, v5, v6) at n=250, n=250, and n=127 respectively. The v6 verdict is a clean PASS on a 127-instance sample fully disjoint from prior measurement runs:

| Budget | grep finds patch | structural finds patch | Δ |
| --- | --- | --- | --- |
| 5 000 tokens | 2.8 % | 16.6 % | +13.8 pp |
| 30 000 tokens | 16.4 % | 41.0 % | +24.6 pp |
| 100 000 tokens | 39.1 % | 58.6 % | +19.5 pp |

Statistics: paired Wilcoxon p = 1.2 × 10⁻⁵, BCa-99 lower bound +10.9 pp, McNemar p = 1.9 × 10⁻⁵, JZS Bayes factor ≈ 4 800, multiverse 5/5 budgets directional. Cross-spike effect-size: +0.170 → +0.213 → +0.246 (consistent across three independent samples).

Audit trail: the public claim above is lifted verbatim from SPIKE-PCM-BENCH-FULLDISJOINT-V6 VERDICT.md §"Construct ceiling". The full v0.4 → v5 → v6 spike chain — including a PARITY verdict (paired-median degenerate on binary outcomes) and a FLAGGED-PASS verdict (overlap > 30 % auto-downgrade) — is preserved in the research-spikes dossier. The discipline (ADR-0025) gates every public-surface number on a closed-go spike's verdict.

What the bench measures: did the agent's structural retrieval surface the file the gold patch actually edits, anywhere in its ranked list, within a 30 000-token budget? It does NOT measure end-to-end editing success (whether the agent ultimately produces a correct fix); SWE-bench's evaluation harness is out of scope. The v3-era "49× lower token cost" framing was empirically falsified at n=109 and is retired.
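The hit criterion described above can be sketched in a few lines. This is an illustration only, not the bench harness: the function names are hypothetical, and it assumes the budget is consumed cumulatively in rank order, with retrieval stopping at the first file that no longer fits.

```python
from typing import Iterable, List, Sequence, Tuple


def surfaces_gold_file(
    ranked: Sequence[Tuple[str, int]],  # (file path, token count), best first
    gold_files: Iterable[str],          # files the gold patch edits
    budget_tokens: int = 30_000,
) -> bool:
    """Return True if any gold-patch file fits inside the token budget."""
    gold = set(gold_files)
    used = 0
    for path, tokens in ranked:
        used += tokens
        if used > budget_tokens:
            break  # assumption: files past the budget are never seen
        if path in gold:
            return True
    return False


def hit_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of instances where a gold file was surfaced."""
    results: List[bool] = list(outcomes)
    return sum(results) / len(results) if results else 0.0
```

Note that a hit only says the right file was in the retrieved context; it says nothing about whether the agent then produced a correct patch.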

The structural argument independent of the number: a senior engineer fixing a bug doesn't grep for the function name across the repo and read every match. They ask "what calls this, and what does this call?" That's a graph traversal, not a similarity ranking.

Status

v0.3 (general availability) — Python and TypeScript indexing, FQN-aware call resolution, MCP server, ~33 tests. Wedge claim audited via the v0.4 → v5 → v6 spike chain (see "Why call-graph and not embeddings, for editing tasks" above). License: Apache-2.0. Go / Rust / Java / C# planned for Pro tier.

Install

pip install promptify-cmax

Then index your project and wire it into Claude Code / Cursor / Continue. For a five-minute walkthrough with copy-pasteable MCP config snippets and troubleshooting, see QUICKSTART.md.

What it exposes

CLI:

  • promptify-cmax index --project-root <dir> — build / incrementally update the structural index (one-time per repo, then automatic-on-change)
  • promptify-cmax query --project-root <dir> "<task>" — return ranked files for a task description
  • promptify-cmax serve --project-root <dir> — run as an MCP server over stdio

MCP tools (when run as serve):

  • structural_context(task, top_k=5) — rank files by call-graph distance from the task's identifiers
  • reindex() — rebuild after large code changes

How it works

  1. Index (one-time per repo, then incremental on file change): tree-sitter walks every Python and TypeScript source file, extracts function definitions, intra-function call sites, and module-level imports; persists everything to a single SQLite file at .promptify/code-index.db.
  2. Resolve (query time): given a natural-language task, extract candidate identifiers (backtick / CamelCase / snake_case / dotted paths) and intersect with the symbols actually in the index.
  3. BFS (query time): walk the call graph two hops in both directions; resolve each call edge to a specific (file, function) tuple via the caller's import bindings and same-file scope, so two functions named helper in different files never collapse into one node.
  4. Rank: group reached nodes by file, sort by (distance ASC, affected-function-count DESC), return the top-k.
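Steps 2 to 4 above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the function names, the regex, and the in-memory edge list (standing in for the SQLite index) are all hypothetical. It also assumes that "both directions" means edges are walked as undirected, that a file's distance is the minimum over its reached functions, and that ties are broken by file name.

```python
import re
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]  # (file, function)

# Rough stand-in for identifier extraction: plain and dotted names.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def extract_identifiers(task: str, known_symbols: Set[str]) -> Set[str]:
    """Step 2: keep only tokens that match a symbol in the index."""
    found: Set[str] = set()
    for token in IDENT.findall(task):
        found.add(token)
        found.add(token.rsplit(".", 1)[-1])  # Index.upsert -> upsert
    return found & known_symbols


def bfs(seeds: Iterable[Node], edges: Iterable[Tuple[Node, Node]],
        max_hops: int = 2) -> Dict[Node, int]:
    """Step 3: walk callers and callees up to max_hops from the seeds."""
    neighbours: Dict[Node, Set[Node]] = defaultdict(set)
    for caller, callee in edges:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)
    dist = {seed: 0 for seed in seeds}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def rank_files(dist: Dict[Node, int], top_k: int = 5) -> List[str]:
    """Step 4: sort files by (distance ASC, reached-function count DESC)."""
    best: Dict[str, int] = {}
    count: Dict[str, int] = defaultdict(int)
    for (file, _func), d in dist.items():
        best[file] = min(d, best.get(file, d))
        count[file] += 1
    return sorted(best, key=lambda f: (best[f], -count[f], f))[:top_k]


# Hypothetical call edges: (caller, callee).
EDGES = [
    (("cli.py", "main"), ("api.py", "handle")),
    (("api.py", "handle"), ("query.py", "run_query")),
    (("query.py", "run_query"), ("index.py", "upsert")),
    (("index.py", "upsert"), ("store.py", "write")),
    (("store.py", "write"), ("disk.py", "flush")),
]
SYMBOLS: Dict[str, Set[Node]] = defaultdict(set)
for edge in EDGES:
    for node in edge:
        SYMBOLS[node[1]].add(node)

task = "why does run_query break when I change Index.upsert?"
names = extract_identifiers(task, set(SYMBOLS))
seeds = [node for name in sorted(names) for node in sorted(SYMBOLS[name])]
print(rank_files(bfs(seeds, EDGES)))
```

With these toy edges the task resolves to run_query and upsert, and the top five files come back as index.py, query.py, api.py, store.py, cli.py; disk.py is reached at distance 2 but falls outside top_k.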

The discipline that makes this useful: fully-qualified-name resolution, not bare-name matching. A naive call graph treats every def main(): ... in the repo as the same node — typically 100+ collisions in any non-trivial Python project. We resolve through imports, so cross-file false positives don't enter the BFS frontier.
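To make the collision problem concrete, here is a small sketch that resolves calls to (file, function) tuples through import bindings and same-file scope. It is not the package's resolver: it uses Python's built-in ast module in place of tree-sitter, handles only top-level functions and `from X import Y` imports in a flat module layout, and lets an import binding win over a same-file definition. The file contents and the name resolve_calls are made up for the example.

```python
import ast
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # (file, function)

# Hypothetical three-file project with two unrelated helper() functions.
SOURCES = {
    "a.py": "def helper():\n    return 1\n",
    "b.py": "def helper():\n    return 2\n\ndef run():\n    return helper()\n",
    "c.py": "from a import helper\n\ndef main():\n    return helper()\n",
}


def resolve_calls(sources: Dict[str, str]) -> Set[Tuple[Node, Node]]:
    """Return call edges as ((file, caller), (file, callee)) tuples."""
    trees = {path: ast.parse(src) for path, src in sources.items()}
    # Top-level function names defined in each file.
    defs = {
        path: {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
        for path, tree in trees.items()
    }
    module_to_path = {path[:-3]: path for path in sources}  # "a" -> "a.py"

    edges: Set[Tuple[Node, Node]] = set()
    for path, tree in trees.items():
        # Local name -> (defining file, original name) for this file's imports.
        bindings: Dict[str, Node] = {}
        for node in tree.body:
            if isinstance(node, ast.ImportFrom) and node.module in module_to_path:
                target = module_to_path[node.module]
                for alias in node.names:
                    bindings[alias.asname or alias.name] = (target, alias.name)

        for func in tree.body:
            if not isinstance(func, ast.FunctionDef):
                continue
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    name = node.func.id
                    if name in bindings:        # resolved through an import
                        edges.add(((path, func.name), bindings[name]))
                    elif name in defs[path]:    # resolved in same-file scope
                        edges.add(((path, func.name), (path, name)))
    return edges


edges = resolve_calls(SOURCES)
for caller, callee in sorted(edges):
    print(caller, "->", callee)
```

Here run() in b.py resolves to b.py's helper and main() in c.py resolves to a.py's helper. A bare-name graph would merge both into a single helper node, so a query about one would drag in the callers of the other.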

Roadmap

  • Python + TypeScript indexing (v0.1): shipped
  • FQN-aware call resolution: shipped
  • MCP server, CLI: shipped
  • Go, Rust, Java, C# (Pro): planned
  • Hosted multi-repo index (Pro): planned
  • PR-bot / CI integration (Team): planned
  • VSCode + JetBrains extensions: planned

Pro / Team

This package is the open-source core. Promptify is building a hosted layer for teams (multi-repo indexing that survives laptop churn, additional language support, token-savings analytics, editor extensions, SSO/SAML, CI integration). Pricing and signup haven't shipped yet — watch the repo or open an issue if you'd like a heads-up when the hosted tier launches.

Contributing

See CONTRIBUTING.md. Issues and PRs welcome.

License

Apache-2.0. Copyright © 2026 Promptify LLC. See LICENSE and NOTICE.
