Local code-intelligence engine: one call returns all the code related to a question, explained with the real code spliced in.

megabrain

One call returns all the code related to a question
— explained like a senior engineer, with the real code spliced in.

Python 3.10+ · No LLM in the retrieval path · Zero code hallucination · MCP ready


megabrain is a local code-intelligence engine. It replaces minutes of file-by-file crawling (grep, read, explore-agent chains) with a single grounded answer. Index a repo once; every later question retrieves all the related code and stitches it into a walkthrough narrated by an LLM that can only point at code, never rewrite it, so the code in the answer cannot be hallucinated.

Install

No packaging step — runs straight from a clone:

git clone https://github.com/pinecall/megabrain.git
cd megabrain
pip install numpy                                # core (Python indexing)
pip install tree_sitter tree_sitter_typescript  # TS/JS (+ tree_sitter_ruby tree_sitter_go for Ruby/Go)
alias megabrain='python3 -m megabrain.cli'       # optional: clean invocation

Keys are read from the environment (with a ~/.zshrc fallback):

export PERPLEXITY_API_KEY=...   # required — embeddings
export ANTHROPIC_API_KEY=...    # only for `ask` and `--best`

Usage

megabrain index  ~/repo                                      # incremental (sha256), no daemon
megabrain ask    ~/repo "how does auth work end to end"      # walkthrough + real code (~6–20s)
megabrain ask    ~/repo "how do I configure X" --docs        # explain the docs instead of code
megabrain query  ~/repo "request retry logic"                # raw code map, no LLM (~200ms)
megabrain get    ~/repo src/x.py --symbol Class.method       # one file or symbol
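Incremental indexing by sha256 means only files whose content hash changed need re-chunking and re-embedding, and no file watcher is required. A minimal sketch of that change detection, assuming the store keeps a path → digest map from the previous run; `plan_reindex` and its arguments are illustrative names, not megabrain's API.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's bytes, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def plan_reindex(root: Path, known: dict[str, str],
                 suffixes: set[str]) -> tuple[list[str], list[str]]:
    """Compare on-disk hashes with the ones stored at the last index run.

    known maps relative path -> sha256. Returns (changed_or_new, deleted);
    files whose digest is unchanged are skipped entirely.
    """
    current: dict[str, str] = {}
    for p in root.rglob("*"):
        if p.is_file() and p.suffix in suffixes:
            current[p.relative_to(root).as_posix()] = sha256_of(p)
    changed = sorted(rel for rel, digest in current.items() if known.get(rel) != digest)
    deleted = sorted(rel for rel in known if rel not in current)
    return changed, deleted
```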

Indexes code (.py · .ts · .tsx · .js · .jsx · .mjs · .cjs · Ruby · Go) and markdown (.md · .markdown · .mdx) through a strategy registry — adding a language or content type is a config entry, not a branch in the indexer.

How it works

A three-stage pipeline. Only ask calls an LLM — and only to narrate.

stage   what it does
index   cAST chunk → Perplexity embed (int8, L2-normalized) → SQLite. Incremental by sha256, no watcher.
query   No-LLM retrieval (~200ms): dense-chunk + file-skeleton fusion, with import/call-graph candidates. Returns a map: CORE (full code of the top files) + RELATED (every connected file with its best chunk).
ask     One streamed Haiku call writes the walkthrough and cites code as [[k]]; the engine replaces each citation with the verbatim block (real file, real line numbers). Non-cited related files are listed at the end. Fail-open: any API error falls back to the full query bundle.
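The index stage stores L2-normalized vectors, which turns cosine similarity into a plain dot product at query time. A sketch using NumPy (already the core dependency), assuming embeddings arrive as 1024-d int8 rows and are kept as raw bytes in a SQLite BLOB column; the function names are hypothetical.

```python
import numpy as np

DIM = 1024  # pplx-embed-v1 output size, per the Design section

def to_blob(vec: np.ndarray) -> bytes:
    """Serialize one int8 embedding for a SQLite BLOB column."""
    return np.asarray(vec, dtype=np.int8).tobytes()

def from_blob(blob: bytes) -> np.ndarray:
    """Inverse of to_blob (a read-only int8 view of the bytes)."""
    return np.frombuffer(blob, dtype=np.int8)

def l2_normalize(rows: np.ndarray) -> np.ndarray:
    """Cast int8 rows to float32 and scale each row to unit length."""
    m = np.asarray(rows, dtype=np.float32)
    norms = np.linalg.norm(m, axis=-1, keepdims=True)
    return m / np.maximum(norms, 1e-12)  # guard against all-zero rows

def top_k(query: np.ndarray, unit_store: np.ndarray, k: int = 5) -> list[int]:
    """Indices of the k most similar stored rows; with unit vectors, dot == cosine."""
    q = l2_normalize(query[None, :])[0]
    scores = unit_store @ q
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()
```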

Because the model only emits citations and the engine splices code from disk, code cannot be hallucinated or rewritten.
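The splicing step can be sketched as a regex substitution over the model's output: each `[[k]]` is looked up in a table of (path, first line, last line) ranges and replaced with lines read from disk. The names and the header format below are assumptions; the point is that the code text never passes through the model.

```python
import re
from typing import Callable

CITATION = re.compile(r"\[\[(\d+)\]\]")

def splice(narration: str,
           refs: dict[int, tuple[str, int, int]],
           read_lines: Callable[[str], list[str]]) -> tuple[str, list[int]]:
    """Replace each [[k]] with the verbatim lines refs[k] points at.

    refs maps citation id -> (path, first_line, last_line), 1-based inclusive.
    The model only produces ids; code text comes from read_lines (disk).
    Returns the spliced text and the ids that were never cited.
    """
    cited: set[int] = set()

    def expand(m: re.Match) -> str:
        k = int(m.group(1))
        if k not in refs:
            return m.group(0)  # unknown id: leave the marker untouched
        cited.add(k)
        path, first, last = refs[k]
        body = "\n".join(read_lines(path)[first - 1:last])
        return f"\n{path}:{first}-{last}\n{body}\n"

    text = CITATION.sub(expand, narration)
    uncited = sorted(k for k in refs if k not in cited)
    return text, uncited
```

The uncited ids correspond to the "non-cited related files" that the ask stage lists at the end.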

MCP

Use it from Claude Code or any MCP client:

claude mcp add megabrain -- python3 -m megabrain.mcp_server

Tools: megabrain_ask (primary), megabrain_query, megabrain_get, megabrain_index. The server auto-refreshes a stale index before answering, so results always match disk.

Design

Every choice below is backed by an internal golden set (30 verified queries):

  • cAST chunking (4K non-whitespace chars, breadcrumbs, partition-guaranteed): unit-tested; every line lands in exactly one chunk, with no gaps and no overlaps.
  • pplx-embed-v1 (1024-d, int8 wire, L2-normalized): beats openai-3-large on code; ~$0.0016/repo.
  • dense chunk + 0.5 × file-skeleton score: dual-granularity; precision up, no downside.
  • graph (import + call edges) for candidates only: PageRank-as-ranking rejected by data (Acc@1 0.91 → 0.73).
  • no LLM in the retrieval path: every LLM prune variant cost completeness; ask explains, it never prunes.
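The fusion and graph choices above can be combined into one small ranking routine. In this sketch (hypothetical names and cutoffs; every chunk is assumed to be already scored against the query), each chunk's score is its dense similarity plus 0.5 × its file's skeleton similarity, and files are ranked by their best chunk. Import/call-graph neighbors of the CORE files are admitted to RELATED, but they are ordered by their own fused score, never by graph centrality.

```python
SKELETON_WEIGHT = 0.5  # weight from the Design section

ChunkKey = tuple[str, int]  # (file path, chunk id)

def fuse(dense: dict[ChunkKey, float], skeleton: dict[str, float]) -> dict[ChunkKey, float]:
    """Fused chunk score = dense chunk similarity + 0.5 * its file's skeleton similarity."""
    return {key: s + SKELETON_WEIGHT * skeleton.get(key[0], 0.0) for key, s in dense.items()}

def code_map(fused: dict[ChunkKey, float],
             edges: dict[str, set[str]],
             core_n: int = 2,
             dense_n: int = 8) -> tuple[list[str], list[tuple[str, int]]]:
    """Split files into CORE and RELATED.

    CORE: the core_n files with the best fused chunk.
    RELATED: other files in the top dense_n plus graph neighbors of CORE files,
    each paired with its best chunk id. The graph only adds candidates.
    """
    best: dict[str, tuple[float, int]] = {}
    for (path, cid), s in fused.items():
        if path not in best or s > best[path][0]:
            best[path] = (s, cid)
    ranked = sorted(best, key=lambda p: (-best[p][0], p))
    core = ranked[:core_n]
    candidates = set(ranked[core_n:dense_n])
    for f in core:
        candidates |= {n for n in edges.get(f, set()) if n in best}
    candidates -= set(core)
    related = sorted(candidates, key=lambda p: (-best[p][0], p))
    return core, [(p, best[p][1]) for p in related]
```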

Engine retrieval (internal golden set): R@1 0.86 · bundle_full 1.00 · p50 8 ms warm. SWE-bench Lite localization (no training): retrieval Acc@1 ≈ 0.52 / @5 ≈ 0.83 — on par with the trained CodeRankEmbed retriever.
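For reference, Acc@k in file-localization benchmarks is commonly the share of queries for which at least one gold file appears among the top k retrieved files. A sketch of that metric plus a p50 latency, assuming one ranked file list per query; megabrain's own evals/ harness may define its metrics differently, and R@1 and bundle_full are not reproduced here.

```python
import statistics

def acc_at_k(rankings: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Fraction of queries whose top-k files include at least one gold file."""
    if len(rankings) != len(gold):
        raise ValueError("need exactly one gold set per ranking")
    if not rankings:
        return 0.0
    hits = sum(1 for ranked, g in zip(rankings, gold) if g.intersection(ranked[:k]))
    return hits / len(rankings)

def p50_ms(latencies_ms: list[float]) -> float:
    """Median latency in milliseconds."""
    return statistics.median(latencies_ms)
```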

Project layout

megabrain/   engine — chunkers, embeddings, SQLite store, graph, indexer, query, ask, cli, mcp_server
evals/       golden.json (30 verified queries) + swebench harness
tests/       engine + chunker gates
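The partition guarantee listed under Design (every line in exactly one chunk) is the kind of invariant a chunker gate can check mechanically. This sketch pairs such a check with a deliberately simple greedy line-window chunker that respects a non-whitespace character budget; it is a hypothetical stand-in for illustration, not cAST, which splits along syntax-tree boundaries.

```python
def nws_len(text: str) -> int:
    """Number of non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())

def greedy_chunks(lines: list[str], budget: int = 4000) -> list[tuple[int, int]]:
    """Consecutive 1-based line ranges, each within `budget` non-whitespace
    chars unless a single line alone exceeds it."""
    chunks: list[tuple[int, int]] = []
    start, size = 1, 0
    for i, line in enumerate(lines, start=1):
        n = nws_len(line)
        if size and size + n > budget:
            chunks.append((start, i - 1))  # close the current window
            start, size = i, 0
        size += n
    if lines:
        chunks.append((start, len(lines)))
    return chunks

def partition_errors(chunks: list[tuple[int, int]],
                     n_lines: int) -> tuple[list[int], list[int], list[tuple[int, int]]]:
    """Return (gaps, overlaps, bad_ranges) for 1-based inclusive ranges."""
    hits = [0] * (n_lines + 1)
    bad: list[tuple[int, int]] = []
    for start, end in chunks:
        if not 1 <= start <= end <= n_lines:
            bad.append((start, end))
            continue
        for line in range(start, end + 1):
            hits[line] += 1
    gaps = [i for i in range(1, n_lines + 1) if hits[i] == 0]
    overlaps = [i for i in range(1, n_lines + 1) if hits[i] > 1]
    return gaps, overlaps, bad
```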

github.com/pinecall/megabrain

Download files

Source Distribution

megabrain-0.1.2.tar.gz (53.9 kB)

Uploaded Source

Built Distribution

megabrain-0.1.2-py3-none-any.whl (52.6 kB)

Uploaded Python 3

File details

Details for the file megabrain-0.1.2.tar.gz.

File metadata

  • Download URL: megabrain-0.1.2.tar.gz
  • Upload date:
  • Size: 53.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for megabrain-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       3af17ddc383e7212164b588a0bab9b5a76b3829a1b3ca9ca3cbf44e079c45357
MD5          87334c090fe2f57bee18a5c82cc71c4e
BLAKE2b-256  fb4a0b7797d53c1723014e0bec9d05a2cc88d72f4b11be359f1367ff3b8cc122

File details

Details for the file megabrain-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: megabrain-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 52.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for megabrain-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       69ed7f2b29738915b1bcba4673805929cc27d74a5d66b3857f3956b5ec266646
MD5          ffb72ec054fb8b8c009b68f477742502
BLAKE2b-256  ce09fa29ae34e9195a539abf123253d49162fa2263b4dfaa344f01b3daeaf0e1
