
Local code-intelligence engine: one call returns all the code related to a question, explained with the real code spliced in.


megabrain


One call returns all the code related to a question, explained like a senior engineer, with the real code spliced in.

Python 3.10+ · No LLM in the retrieval path · Zero code hallucination · MCP ready


megabrain is a local code-intelligence engine. It replaces minutes of file-by-file crawling (grep, read, explore-agent chains) with a single grounded answer. Index a repo once; every later question retrieves all the related code and stitches it into a walkthrough narrated by an LLM that can only point at code, never rewrite it, so the code it shows cannot be hallucinated.

Install

No packaging step — runs straight from a clone:

git clone https://github.com/pinecall/megabrain.git
cd megabrain
pip install numpy                                # core (Python indexing)
pip install tree_sitter tree_sitter_typescript  # TS/JS (+ tree_sitter_ruby tree_sitter_go for Ruby/Go)
alias megabrain='python3 -m megabrain.cli'       # optional: clean invocation

Keys are read from the environment (with a ~/.zshrc fallback):

export PERPLEXITY_API_KEY=...   # required — embeddings
export ANTHROPIC_API_KEY=...    # only for `ask` and `--best`
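The key lookup (environment first, then ~/.zshrc) can be pictured with a small sketch. It assumes simple `export NAME=value` lines with optional quotes and does no shell expansion; `read_key` is a hypothetical name, not megabrain's actual function.

```python
import os
import re
from pathlib import Path
from typing import Optional

# Hypothetical pattern: `export NAME=value`, value optionally wrapped in ' or ".
# No shell expansion and no inline-comment handling.
EXPORT_LINE = re.compile(r"""^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(['"]?)(.*?)\2\s*$""")


def read_key(name: str, zshrc: Optional[Path] = None) -> Optional[str]:
    """Return an env var, falling back to the last matching export in ~/.zshrc."""
    value = os.environ.get(name)
    if value:
        return value
    path = zshrc if zshrc is not None else Path.home() / ".zshrc"
    if not path.is_file():
        return None
    found = None
    for line in path.read_text().splitlines():
        match = EXPORT_LINE.match(line)
        if match and match.group(1) == name:
            found = match.group(3)  # later exports win, as they would in the shell
    return found
```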

Usage

megabrain index  ~/repo                                      # incremental (sha256), no daemon
megabrain ask    ~/repo "how does auth work end to end"      # walkthrough + real code (~6–20s)
megabrain ask    ~/repo "how do I configure X" --docs        # explain the docs instead of code
megabrain query  ~/repo "request retry logic"                # raw code map, no LLM (~200ms)
megabrain get    ~/repo src/x.py --symbol Class.method       # one file or symbol
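`index` is incremental by sha256 and needs no daemon. A minimal sketch of that idea, assuming a hypothetical `stored` mapping of relative path to digest from the previous run (megabrain keeps its state in SQLite; `plan_reindex` and its arguments are illustrative names, not its API):

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the hex sha256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_reindex(
    root: Path, stored: dict[str, str], suffixes: set[str]
) -> tuple[list[str], list[str]]:
    """Compare files on disk against digests recorded by the last run.

    Returns (changed, deleted): paths that are new or whose hash differs,
    and previously indexed paths that no longer exist.
    """
    current: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            current[path.relative_to(root).as_posix()] = file_digest(path)
    changed = [p for p, digest in current.items() if stored.get(p) != digest]
    deleted = [p for p in stored if p not in current]
    return changed, deleted
```

In this sketch only `changed` files would be re-chunked and re-embedded; an unchanged file costs one hash, which is why rerunning `index` on demand can stand in for a file watcher.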

Indexes code (Python .py; TypeScript/JavaScript .ts · .tsx · .js · .jsx · .mjs · .cjs; Ruby; Go) and markdown (.md · .markdown · .mdx) through a strategy registry: adding a language or content type is a config entry, not a branch in the indexer.
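A strategy registry of this kind can be sketched as a table keyed by file extension. The `Strategy` dataclass, its fields, and both chunkers are hypothetical stand-ins, not megabrain's actual registry; real code strategies would use tree-sitter rather than the toy splitters below.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Callable, Optional

Chunker = Callable[[str], list[str]]


def split_blank_lines(text: str) -> list[str]:
    """Toy code chunker: split on blank lines (stand-in for a tree-sitter chunker)."""
    return [block for block in text.split("\n\n") if block.strip()]


def split_headings(text: str) -> list[str]:
    """Toy markdown chunker: start a new chunk at each '#' heading."""
    chunks: list[list[str]] = []
    for line in text.splitlines():
        if line.startswith("#") or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    return ["\n".join(chunk) for chunk in chunks]


@dataclass(frozen=True)
class Strategy:
    kind: str  # "code" or "docs"
    language: str
    chunker: Chunker


# Adding a language is one more entry here; chunk_file never changes.
REGISTRY: dict[str, Strategy] = {
    ".py": Strategy("code", "python", split_blank_lines),
    ".ts": Strategy("code", "typescript", split_blank_lines),
    ".md": Strategy("docs", "markdown", split_headings),
}


def chunk_file(path: str, text: str) -> Optional[tuple[Strategy, list[str]]]:
    """Dispatch on file extension; unknown extensions are skipped."""
    strategy = REGISTRY.get(PurePath(path).suffix)
    if strategy is None:
        return None
    return strategy, strategy.chunker(text)
```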

How it works

A three-stage pipeline. Only ask calls an LLM — and only to narrate.

stage | what it does
--- | ---
index | cAST chunk → Perplexity embed (int8, L2-normalized) → SQLite. Incremental by sha256, no watcher.
query | No-LLM retrieval (~200ms): dense-chunk + file-skeleton fusion, with import/call-graph candidates. Returns a map: CORE (full code of the top files) + RELATED (every connected file with its best chunk).
ask | One streamed Haiku call writes the walkthrough and cites code as [[k]]; the engine replaces each citation with the verbatim block (real file, real line numbers). Non-cited related files are listed at the end. Fail-open: any API error falls back to the full query bundle.

Because the model only emits citations and the engine splices code from disk, code cannot be hallucinated or rewritten.
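The splicing step can be sketched as a regex substitution: the model's text contains only `[[k]]` markers, and each marker is replaced by block k read from disk. The `Block` type, its fields, and the `file:start-end` header are assumptions for illustration; only the `[[k]]` citation syntax comes from the description above.

```python
import re
from dataclasses import dataclass
from pathlib import Path

CITATION = re.compile(r"\[\[(\d+)\]\]")


@dataclass(frozen=True)
class Block:
    path: Path
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive


def read_block(block: Block) -> str:
    """Return the verbatim source lines of a block, read from disk."""
    lines = block.path.read_text().splitlines()
    return "\n".join(lines[block.start - 1 : block.end])


def splice(narration: str, blocks: dict[int, Block]) -> tuple[str, list[int]]:
    """Replace each [[k]] with block k's real code; also return uncited keys.

    Unknown keys are left as-is, so a bad citation never produces code.
    """
    cited: set[int] = set()

    def substitute(match: re.Match) -> str:
        key = int(match.group(1))
        block = blocks.get(key)
        if block is None:
            return match.group(0)
        cited.add(key)
        header = f"{block.path.name}:{block.start}-{block.end}"
        return f"\n{header}\n{read_block(block)}\n"

    text = CITATION.sub(substitute, narration)
    uncited = sorted(k for k in blocks if k not in cited)
    return text, uncited
```

Returning the uncited keys mirrors the behavior described for `ask`, where non-cited related files are listed at the end.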

MCP

Use it from Claude Code or any MCP client:

claude mcp add megabrain -- python3 -m megabrain.mcp_server

Tools: megabrain_ask (primary), megabrain_query, megabrain_get, megabrain_index. The server auto-refreshes a stale index before answering, so results always match disk.

Design

Every choice below is backed by evidence from unit tests or an internal golden set (30 verified queries):

decision | evidence
--- | ---
cAST chunking (4K non-whitespace (nws) chars, breadcrumbs, partition-guaranteed) | unit-tested; every line lands in exactly one chunk, with no gaps and no overlaps
pplx-embed-v1 (1024-d, int8 wire, L2-normalized) | beats openai-3-large on code; ~$0.0016/repo
dense chunk + 0.5 × file-skeleton score | dual-granularity; precision up, no downside
graph (import + call edges) for candidates only | PageRank-as-ranking rejected by data (Acc@1 0.91 → 0.73)
no LLM in the retrieval path | every LLM prune variant cost completeness; ask explains, it never prunes
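The embedding and scoring rows can be illustrated together: int8 vectors of L2-normalized embeddings give an approximate cosine via an integer dot product, and file ranking adds 0.5 × the file-skeleton score. The symmetric scale of 127 and taking each file's best chunk score are assumptions for this sketch; the README does not state how int8 values are scaled or how chunk scores are aggregated per file.

```python
import numpy as np

SKELETON_WEIGHT = 0.5  # from the design table: dense chunk + 0.5 × file-skeleton score


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale vectors (last axis) to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def to_int8(v: np.ndarray) -> np.ndarray:
    """Quantize unit-length float vectors to int8 (assumed symmetric scale of 127)."""
    return np.clip(np.round(v * 127), -127, 127).astype(np.int8)


def cosine_int8(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Approximate cosine: int32 dot products rescaled by 127**2."""
    dots = matrix.astype(np.int32) @ query.astype(np.int32)
    return dots / 127.0**2


def fuse(
    chunk_files: list[str], chunk_scores: np.ndarray, skeleton_scores: dict[str, float]
) -> list[tuple[str, float]]:
    """Rank files by best chunk score + SKELETON_WEIGHT * file-skeleton score."""
    best: dict[str, float] = {}
    for path, score in zip(chunk_files, chunk_scores):
        best[path] = max(best.get(path, float("-inf")), float(score))
    fused = {p: s + SKELETON_WEIGHT * skeleton_scores.get(p, 0.0) for p, s in best.items()}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Because the vectors are unit length, the dot product is already the cosine, so no per-query normalization is needed at search time.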

Engine retrieval (internal golden set): R@1 0.86 · bundle_full 1.00 · p50 8 ms warm. SWE-bench Lite localization (no training): retrieval Acc@1 ≈ 0.52 / @5 ≈ 0.83 — on par with the trained CodeRankEmbed retriever.
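To make the localization numbers concrete, here is a minimal sketch of top-k accuracy under the common definition: a query is a hit when at least one gold file appears among its top k retrieved files. The README does not define R@1, Acc@k, or bundle_full precisely, so this illustrates the usual metric rather than megabrain's evaluation harness.

```python
from typing import Sequence


def acc_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[set[str]], k: int) -> float:
    """Fraction of queries whose top-k ranked files contain at least one gold file."""
    if len(ranked) != len(gold):
        raise ValueError("need exactly one ranking per gold set")
    hits = sum(1 for files, truth in zip(ranked, gold) if truth.intersection(files[:k]))
    return hits / len(ranked)
```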

Project layout

megabrain/   engine — chunkers, embeddings, SQLite store, graph, indexer, query, ask, cli, mcp_server
evals/       golden.json (30 verified queries) + swebench harness
tests/       engine + chunker gates

github.com/pinecall/megabrain



Download files


Source Distribution

megabrain-0.1.1.tar.gz (53.4 kB)

Uploaded Source

Built Distribution


megabrain-0.1.1-py3-none-any.whl (52.1 kB)

Uploaded Python 3

File details

Details for the file megabrain-0.1.1.tar.gz.

File metadata

  • Download URL: megabrain-0.1.1.tar.gz
  • Upload date:
  • Size: 53.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for megabrain-0.1.1.tar.gz
Algorithm | Hash digest
--- | ---
SHA256 | 2b4d86b5340ca23e5eaa9bd54a011727a77f70cdba0c4fc483f7229776a733a3
MD5 | c1833bf215803c98823b2583b5acab73
BLAKE2b-256 | c171f348607ecf4b49e75cb2cf64ea97ec657556276d9e7ae4eaa39dbee5327c


File details

Details for the file megabrain-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: megabrain-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 52.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for megabrain-0.1.1-py3-none-any.whl
Algorithm | Hash digest
--- | ---
SHA256 | 4c073d5698292c2eda69e71440ba451f565f34bfa9169853e3e915b0f1398075
MD5 | 499a41a600b1c18ec74c07fa965af587
BLAKE2b-256 | 186c6c28db29c6c9bcb0820c763c5acd8f826dc205253ae1d37f424f3e82d178

