Local code-intelligence engine: one call returns all the code related to a question, explained with the real code spliced in.
megabrain
One call returns all the code related to a question, explained like a senior engineer, with the real code spliced in.
megabrain is a local code-intelligence engine. It replaces minutes of file-by-file crawling (grep, read, explore-agent chains) with a single grounded answer. Index a repo once; every later question retrieves all the related code and stitches it into a walkthrough narrated by an LLM that can only point at code, never rewrite it, so the code shown in an answer is never hallucinated.
Install
No packaging step — runs straight from a clone:
git clone https://github.com/pinecall/megabrain.git
cd megabrain
pip install numpy # core (Python indexing)
pip install tree_sitter tree_sitter_typescript # TS/JS (+ tree_sitter_ruby tree_sitter_go for Ruby/Go)
alias megabrain='python3 -m megabrain.cli' # optional: clean invocation
Keys are read from the environment (with a ~/.zshrc fallback):
export PERPLEXITY_API_KEY=... # required — embeddings
export ANTHROPIC_API_KEY=... # only for `ask` and `--best`
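The environment-then-`~/.zshrc` lookup described above could look roughly like the sketch below. The function name `read_key` and the parsing rules are assumptions made for illustration; megabrain's actual loader may differ.

```python
import os
import re
from pathlib import Path
from typing import Optional

# Matches lines such as: export NAME=value  or  export NAME="value"
EXPORT_LINE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_key(name: str, rc_file: Optional[Path] = None) -> Optional[str]:
    """Return the key from the environment, else from an rc file, else None.

    Hypothetical helper: not part of megabrain's documented API.
    """
    value = os.environ.get(name)
    if value:
        return value
    rc_file = rc_file or Path.home() / ".zshrc"
    if not rc_file.is_file():
        return None
    for line in rc_file.read_text(errors="ignore").splitlines():
        match = EXPORT_LINE.match(line)
        if match and match.group(1) == name:
            # Drop a trailing comment and surrounding quotes.
            raw = match.group(2).split(" #", 1)[0].strip()
            return raw.strip("\"'") or None
    return None
```

Checking the environment first means a key exported in the current shell always wins over whatever is written in the rc file.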
Usage
megabrain index ~/repo # incremental (sha256), no daemon
megabrain ask ~/repo "how does auth work end to end" # walkthrough + real code (~6–20s)
megabrain ask ~/repo "how do I configure X" --docs # explain the docs instead of code
megabrain query ~/repo "request retry logic" # raw code map, no LLM (~200ms)
megabrain get ~/repo src/x.py --symbol Class.method # one file or symbol
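To illustrate what "incremental (sha256), no daemon" can mean in practice, here is a minimal sketch of hash-based change detection. The `file_hashes` table and the `changed_files` function are hypothetical names chosen for this example, not megabrain's real schema or API.

```python
import hashlib
import sqlite3
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex sha256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(db: sqlite3.Connection, root: Path, suffixes: set[str]) -> list[Path]:
    """Return files under root whose content hash differs from the stored one.

    Hypothetical schema: file_hashes(path TEXT PRIMARY KEY, sha256 TEXT).
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes (path TEXT PRIMARY KEY, sha256 TEXT)"
    )
    stored = dict(db.execute("SELECT path, sha256 FROM file_hashes"))
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        digest = sha256_of(path)
        rel = str(path.relative_to(root))
        if stored.get(rel) != digest:
            changed.append(path)
            # Record the new hash so the next run skips this file.
            db.execute(
                "INSERT OR REPLACE INTO file_hashes (path, sha256) VALUES (?, ?)",
                (rel, digest),
            )
    db.commit()
    return changed
```

Because the comparison happens at index time, no file watcher has to run in the background; only files returned by such a check would need re-chunking and re-embedding.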
Indexes code (.py · .ts · .tsx · .js · .jsx · .mjs · .cjs · Ruby · Go) and
markdown (.md · .markdown · .mdx) through a strategy registry — adding a language
or content type is a config entry, not a branch in the indexer.
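A strategy registry of the kind described above might be shaped like the following sketch. `REGISTRY`, `register`, `chunk_markdown` and `chunk_file` are illustrative names, and the markdown splitter is a toy stand-in rather than megabrain's chunker.

```python
from typing import Callable

# Hypothetical registry: file suffix -> chunking strategy.
Chunker = Callable[[str], list[str]]
REGISTRY: dict[str, Chunker] = {}


def register(*suffixes: str) -> Callable[[Chunker], Chunker]:
    """Decorator that registers one chunker for several file suffixes."""
    def wrap(fn: Chunker) -> Chunker:
        for suffix in suffixes:
            REGISTRY[suffix] = fn
        return fn
    return wrap


@register(".md", ".markdown", ".mdx")
def chunk_markdown(text: str) -> list[str]:
    """Toy strategy: split markdown into heading-led sections."""
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def chunk_file(name: str, text: str) -> list[str]:
    """Dispatch on suffix; the indexer itself has no per-language branches."""
    suffix = "." + name.rsplit(".", 1)[-1] if "." in name else ""
    chunker = REGISTRY.get(suffix)
    return chunker(text) if chunker else []
```

Supporting another content type then amounts to registering one more function; `chunk_file` stays unchanged.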
How it works
A three-stage pipeline. Only ask calls an LLM — and only to narrate.
| stage | what it does |
|---|---|
| index | cAST chunk → Perplexity embed (int8, L2-normalized) → SQLite. Incremental by sha256, no watcher. |
| query | No-LLM retrieval (~200ms): dense-chunk + file-skeleton fusion, with import/call-graph candidates. Returns a map — CORE (full code of the top files) + RELATED (every connected file with its best chunk). |
| ask | One streamed Haiku call writes the walkthrough and cites code as [[k]]; the engine replaces each citation with the verbatim block (real file, real line numbers). Non-cited related files are listed at the end. Fail-open: any API error falls back to the full query bundle. |
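The index row mentions int8, L2-normalized embeddings. The sketch below shows one simple way to normalize a vector, quantize it to int8 and still recover an approximate cosine similarity. The symmetric scale of 127 is an assumption for illustration and is not claimed to be the pplx-embed wire format; it uses only the standard library, whereas a real engine would likely vectorize this with numpy.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def quantize_int8(unit_vec: list[float]) -> list[int]:
    """Map components in [-1, 1] to integers in [-127, 127]."""
    return [max(-127, min(127, round(x * 127))) for x in unit_vec]


def cosine_from_int8(a: list[int], b: list[int]) -> float:
    """Approximate cosine similarity of two quantized unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (127 * 127)
```

Since both vectors are unit length before quantization, the integer dot product divided by 127² approximates the cosine directly, with a small rounding error per component.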
Because the model only emits citations and the engine splices code from disk, code cannot be hallucinated or rewritten.
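The citation-splicing step can be pictured with a short sketch. The record layout (`path`, `start`, `end`, `code`) and the `splice` function are hypothetical; the only detail taken from the description above is that the model emits `[[k]]` markers and the engine substitutes verbatim code.

```python
import re

CITATION = re.compile(r"\[\[(\d+)\]\]")


def splice(narration: str, blocks: dict[int, dict]) -> str:
    """Replace each [[k]] with the verbatim code block k.

    `blocks` maps a citation number to a hypothetical record with
    `path`, `start`, `end` and `code` keys. Unknown citations are left as-is.
    """
    def render(match: re.Match) -> str:
        block = blocks.get(int(match.group(1)))
        if block is None:
            return match.group(0)
        header = f"{block['path']}:{block['start']}-{block['end']}"
        return f"\n{header}\n{block['code']}\n"

    return CITATION.sub(render, narration)


def uncited(narration: str, blocks: dict[int, dict]) -> list[int]:
    """Return block numbers the narration never cited, in ascending order."""
    cited = {int(k) for k in CITATION.findall(narration)}
    return sorted(k for k in blocks if k not in cited)
```

In this shape the model's output never contains code at all, only integers, so whatever code appears in the final answer was copied from the indexed files.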
MCP
Use it from Claude Code or any MCP client:
claude mcp add megabrain -- python3 -m megabrain.mcp_server
Tools: megabrain_ask (primary), megabrain_query, megabrain_get, megabrain_index.
The server auto-refreshes a stale index before answering, so results always match disk.
Design
Every choice below is backed by an internal golden set (30 verified queries):
| decision | evidence |
|---|---|
| cAST chunking (4K nws chars, breadcrumbs, partition-guaranteed) | unit-tested; every line lands in exactly one chunk — no gaps, no overlaps |
| pplx-embed-v1 (1024-d, int8 wire, L2-normalized) | beats openai-3-large on code; ~$0.0016/repo |
| dense chunk + 0.5 × file-skeleton score | dual-granularity; precision up, no downside |
| graph (import + call edges) for candidates only | PageRank-as-ranking rejected by data (Acc@1 0.91 → 0.73) |
| no LLM in the retrieval path | every LLM prune variant cost completeness; ask explains, it never prunes |
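The "dense chunk + 0.5 × file-skeleton score" row can be read as a simple additive fusion. The sketch below is one possible interpretation: taking the best fused chunk per file is an assumption, and `fuse` is an illustrative name rather than the engine's function.

```python
SKELETON_WEIGHT = 0.5  # weight quoted in the design table


def fuse(
    chunk_scores: dict[tuple[str, int], float],
    skeleton_scores: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank files by their best chunk score plus a weighted file-level score.

    chunk_scores maps (path, chunk_id) to a dense similarity score;
    skeleton_scores maps path to the similarity of the file's skeleton.
    """
    best: dict[str, float] = {}
    for (path, _chunk_id), score in chunk_scores.items():
        fused = score + SKELETON_WEIGHT * skeleton_scores.get(path, 0.0)
        best[path] = max(best.get(path, float("-inf")), fused)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

With this form a file whose overall outline matches the question can outrank a file that merely contains one strongly matching chunk, which is the dual-granularity effect the table refers to.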
Engine retrieval (internal golden set): R@1 0.86 · bundle_full 1.00 · p50 8 ms warm. SWE-bench Lite localization (no training): retrieval Acc@1 ≈ 0.52 / @5 ≈ 0.83 — on par with the trained CodeRankEmbed retriever.
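For readers unfamiliar with the metric names, Acc@k is commonly defined as the share of queries for which at least one correct file appears in the top k results. The sketch below uses that common definition; the exact definitions used by megabrain's eval harness are not stated here, so treat `acc_at_k` as an assumption.

```python
def acc_at_k(ranked: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Fraction of queries with at least one gold file in the top-k results.

    ranked[i] is the ordered result list for query i; gold[i] is its set of
    correct files. Both lists must have the same, non-zero length.
    """
    hits = sum(
        1 for results, targets in zip(ranked, gold) if targets & set(results[:k])
    )
    return hits / len(gold)
```

For example, with three queries where the correct file is ranked first, second and not at all, this returns 1/3 at k=1 and 2/3 at k=2.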
Project layout
megabrain/ engine — chunkers, embeddings, SQLite store, graph, indexer, query, ask, cli, mcp_server
evals/ golden.json (30 verified queries) + swebench harness
tests/ engine + chunker gates
github.com/pinecall/megabrain
File details
Details for the file megabrain-0.1.2.tar.gz.
File metadata
- Download URL: megabrain-0.1.2.tar.gz
- Upload date:
- Size: 53.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3af17ddc383e7212164b588a0bab9b5a76b3829a1b3ca9ca3cbf44e079c45357 |
| MD5 | 87334c090fe2f57bee18a5c82cc71c4e |
| BLAKE2b-256 | fb4a0b7797d53c1723014e0bec9d05a2cc88d72f4b11be359f1367ff3b8cc122 |
File details
Details for the file megabrain-0.1.2-py3-none-any.whl.
File metadata
- Download URL: megabrain-0.1.2-py3-none-any.whl
- Upload date:
- Size: 52.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69ed7f2b29738915b1bcba4673805929cc27d74a5d66b3857f3956b5ec266646 |
| MD5 | ffb72ec054fb8b8c009b68f477742502 |
| BLAKE2b-256 | ce09fa29ae34e9195a539abf123253d49162fa2263b4dfaa344f01b3daeaf0e1 |