Privacy-first, offline knowledge graph for developers
NervaPack
NervaPack is a privacy-first, offline knowledge graph for your codebase. It solves two fundamental problems with standard Vector RAG:
- Token waste — chunk-based RAG retrieves blobs of text that may only tangentially relate to your query, bloating your context window.
- Privacy risk — sending code to cloud embedding APIs leaks your proprietary logic.
NervaPack runs 100% on your machine. It uses tree-sitter to parse your codebase into a deterministic Abstract Syntax Tree graph, then uses a local Ollama model to draw hard semantic edges between your documentation and your code. Queries traverse this graph with a K-Hop BFS, returning a hyper-targeted, token-efficient context window — no cloud required.
Why NervaPack vs. standard Vector RAG
| | Standard Vector RAG | NervaPack |
|---|---|---|
| Parsing | Arbitrary text chunks | Deterministic AST nodes (class, function, import) |
| Retrieval | Nearest-neighbor blob | K-Hop BFS on a structural graph |
| Doc ↔ Code links | None | Hard EXPLAINS edges drawn by local LLM |
| Privacy | Cloud embeddings | 100% local (ChromaDB + Ollama) |
| Incremental sync | Re-index everything | Surgical per-file update via GitPython diff |
| Token savings | No measurement | Built-in dashboard shows exact reduction per query |
| Graph visibility | Black box | Interactive HTML visualization of every node and edge |
Prerequisites
- Python 3.10+
- Ollama — install from ollama.com, then pull a model:
ollama pull llama3
NervaPack defaults to llama3. Any model that can follow instructions works.
- Git: your project must be a git repository (run git init if it is not).
Installation
Option A — Homebrew (Mac/Linux, recommended)
brew tap ramdhavepreetam/nervapack
brew install nervapack
Option B — pipx (any platform, cleanest Python install)
pipx install nervapack
Option C — pip
pip install nervapack
With exact token counting:
pip install "nervapack[metrics]" # adds tiktoken for precise token counts
With MCP server (for Claude Code / Cursor / any MCP-compatible tool):
pip install "nervapack[mcp]"
On first run, chromadb downloads onnxruntime embedding models to your cache and tree-sitter compiles its language bindings. This is a one-time setup (~1–2 min).
Quick Start
cd your-project/
# 1. Build the knowledge graph (run once)
nervapack ingest .
# 2. Query for context — see focused results + token savings dashboard
nervapack query "How does authentication work?"
# 3. Visualize the graph in your browser
nervapack visualize
# 4. After modifying files, sync the graph incrementally
nervapack sync .
# 5. Check graph health
nervapack status
Command Reference
nervapack ingest [PATH]
Scans PATH (default: .) and builds the full knowledge graph.
What happens:
- tree-sitter parses source files into Classes, Functions, and Imports (exact AST nodes, not text chunks).
- All .md files are chunked by header hierarchy.
- Each Markdown chunk is sent to your local Ollama model. If the model identifies a code entity the prose explains, a hard EXPLAINS edge is written into the graph.
- All nodes are embedded and stored in a local ChromaDB instance (.nervapack/chroma_db).
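To make the first step concrete, here is a minimal sketch of extracting class, function, and import entities with line ranges. It uses Python's standard-library ast module as a stand-in for tree-sitter, so it only handles Python source, and the Entity dataclass is a hypothetical simplification of ParsedEntity rather than NervaPack's actual type.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for NervaPack's ParsedEntity."""
    kind: str        # "class", "function", or "import"
    name: str
    start_line: int
    end_line: int


def extract_entities(source: str) -> list[Entity]:
    """Walk a Python module's AST and collect classes, functions, and imports."""
    entities = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            kind, name = "class", node.name
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind, name = "function", node.name
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            kind, name = "import", ", ".join(alias.name for alias in node.names)
        else:
            continue
        entities.append(Entity(kind, name, node.lineno, node.end_lineno))
    # Sort by position so the output order is stable across runs.
    return sorted(entities, key=lambda e: (e.start_line, e.kind, e.name))


SAMPLE = '''import os

class Greeter:
    def greet(self, name):
        return "hi " + name
'''

for entity in extract_entities(SAMPLE):
    print(entity)
```

Because each entity carries an exact line range, a retriever can return just that span instead of an arbitrary text chunk.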
The initial LLM binding pass is the slowest step. On a large repo with many docs, budget several minutes.
Supported languages (bundled): Python, JavaScript, JSX, TypeScript, TSX
Additional languages (optional extras):
pip install "nervapack[go]" # Go
pip install "nervapack[rust]" # Rust
pip install "nervapack[java]" # Java
pip install "nervapack[c]" # C / C headers
pip install "nervapack[cpp]" # C++
pip install "nervapack[ruby]" # Ruby
pip install "nervapack[csharp]" # C#
pip install "nervapack[all-languages]" # everything above
nervapack query PROMPT
Retrieves context from the graph for a natural-language prompt, then prints a token savings dashboard comparing NervaPack against naive RAG.
What happens:
- The prompt is embedded and ChromaDB returns the top-3 most semantically similar nodes.
- Those nodes seed a K-Hop Breadth-First Search (max_hops=1) through the NetworkX graph.
- Adjacent nodes, including any Markdown docs linked via EXPLAINS edges, are collected into a compressed Markdown snippet.
- The token efficiency panel is printed, showing how many tokens were saved vs. sending the raw files.
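The traversal in the steps above can be sketched in plain Python. This is an illustration, not the code in nervapack.graph.retrieval: the edge list and node names are made up, and treating edges as undirected is an assumption made so that a function can reach both its file and the docs that explain it.

```python
from collections import deque


def k_hop_bfs(edges: list[tuple[str, str]], seeds: list[str], max_hops: int = 1) -> dict[str, int]:
    """Return every node within max_hops of a seed, mapped to its hop distance."""
    # Build an undirected adjacency map (an assumption, see the note above).
    neighbors: dict[str, set[str]] = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


EDGES = [
    ("cli.py", "query"),             # DEFINES
    ("cli.py", "ingest"),            # DEFINES
    ("README.md#query", "query"),    # EXPLAINS
    ("builder.py", "GraphBuilder"),  # DEFINES
]

print(k_hop_bfs(EDGES, seeds=["query"], max_hops=1))
```

With max_hops=1 the seed query pulls in only its file and its linked doc chunk. Raising the limit to 2 would also reach ingest through cli.py, which is the trade-off between context breadth and token count.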
Example output from nervapack query:
Running query: How does the CLI work?
Found 3 seed nodes. Traversing graph...
--- Retrieved Context ---
# NervaPack Context Retrieval
## File: src/nervapack/cli.py
### FUNCTION: query (L200-L242)
...
--- End Context ---
╭────────────── NervaPack Token Efficiency ──────────────╮
│ Strategy Tokens Visual Relative │
│ Naive RAG (3 files) 12,840 ████████████████ 100% │
│ NervaPack 1,180 █░░░░░░░░░░░░░░░ 9.2% │
│ ──────────────────────────────────────────────────────────│
│ Tokens saved: 11,660 Reduction: 90.8% │
│ Cost saved (GPT-4o $2.50/1M): $0.0292 per query │
│ Cost saved (Claude Sonnet $3/1M): $0.0350 per query │
╰───────────────────────────────────────────────────────────╯
"Naive RAG" is defined as the full content of every source file that contains a matched node — the maximum a standard "find relevant files, dump them whole" approach would send to an LLM. The comparison is honest and conservative.
Install nervapack[metrics] for exact token counts via tiktoken. Without it, a character-based estimate is used and marked with ~.
The context output is designed to be pasted directly into an LLM prompt.
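The numbers in the panel follow from simple arithmetic, sketched below using the figures from the example output. The function names are hypothetical, and the 4-characters-per-token ratio for the fallback estimate is an assumption, not NervaPack's documented heuristic.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough character-based estimate; the default ratio is an assumption."""
    return round(len(text) / chars_per_token)


def savings(naive_tokens: int, focused_tokens: int, usd_per_million: float) -> dict:
    """Compare a focused context against the whole-file baseline."""
    saved = naive_tokens - focused_tokens
    return {
        "tokens_saved": saved,
        "reduction_pct": round(100 * saved / naive_tokens, 1),
        "usd_saved": saved * usd_per_million / 1_000_000,
    }


# Figures taken from the example panel above.
report = savings(12_840, 1_180, usd_per_million=2.50)
print(report["tokens_saved"], report["reduction_pct"])
print(f"{report['usd_saved']:.5f} USD per query")
```

The baseline matters more than the formula: because "Naive RAG" here means whole files, the reduction percentage describes savings against a dump-the-files approach, not against a chunk-level retriever.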
nervapack visualize
Renders the knowledge graph as an interactive HTML file and opens it in your browser.
nervapack visualize # saves to .nervapack/graph.html
nervapack visualize --output ~/my-graph.html # custom output path
nervapack visualize --no-browser # generate without opening
What the visualization shows:
- Node shapes: diamonds = files, dots = all other entities
- Node colors: blue = file, green = function, amber = class, gray = import, lavender = markdown
- Edge styles: solid = DEFINES, dashed = EXPLAINS
- Hover tooltips: type, name, file, line range, and a code preview
- Interactive: drag, zoom, click (spring-force physics layout)
The graph is a static HTML file with no external dependencies — share it, open it offline, or embed it in docs.
nervapack sync [PATH]
Incrementally updates the graph for files changed since the last ingest.
What happens:
- GitPython diffs your working tree to find modified and deleted files.
- For each changed file, old graph nodes and ChromaDB vectors are pruned.
- Only the changed files are re-parsed and re-ingested.
A full ingest on a large codebase can take minutes. sync turns that into a 2–5 second surgical update.
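The prune-then-re-ingest logic can be sketched as a pure function. This is an illustration under assumptions: the node ID format and the node-to-file mapping are hypothetical, and the modified and deleted sets are passed in directly rather than computed by GitPython.

```python
def plan_sync(
    node_files: dict[str, str], modified: set[str], deleted: set[str]
) -> tuple[dict[str, str], list[str]]:
    """Drop nodes owned by changed files and list the files to re-ingest.

    node_files maps a node ID to the file that defines it (a hypothetical layout).
    """
    stale = modified | deleted
    kept = {node: path for node, path in node_files.items() if path not in stale}
    # Deleted files are pruned only; modified files are pruned and then re-parsed.
    to_reingest = sorted(modified - deleted)
    return kept, to_reingest


nodes = {
    "cli.py::query": "cli.py",
    "cli.py::ingest": "cli.py",
    "old.py::legacy": "old.py",
    "builder.py::GraphBuilder": "builder.py",
}
kept, todo = plan_sync(nodes, modified={"cli.py"}, deleted={"old.py"})
print(kept)
print(todo)
```

Untouched files keep their nodes and vectors, which is why the cost of a sync scales with the size of the diff rather than the size of the repository.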
nervapack status
Prints the current state of the graph: node count, edge count, and any files that are out of sync with the graph.
Configuration
NervaPack reads the Ollama model name from the LLMSummarizer class (src/nervapack/llm/summarizer.py). To use a different model, set its model attribute to any model you have pulled locally:
# src/nervapack/llm/summarizer.py
self.model = "phi3" # or "mistral", "codellama", etc.
Ollama is expected at http://localhost:11434 (its default). To use a remote Ollama instance, set OLLAMA_HOST:
OLLAMA_HOST=http://my-server:11434 nervapack ingest .
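Pointing OLLAMA_HOST at another machine means prompts are no longer processed locally, so it can be worth checking where the endpoint resolves. The sketch below shows one way to do that with the standard library. Both helper names are hypothetical, and it assumes OLLAMA_HOST holds a full URL including the scheme.

```python
import ipaddress
import os
from urllib.parse import urlparse

DEFAULT_HOST = "http://localhost:11434"


def resolve_ollama_host(env=None) -> str:
    """Return OLLAMA_HOST if set, otherwise Ollama's default local address."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_HOST") or DEFAULT_HOST


def is_loopback(url: str) -> bool:
    """True when the URL points at this machine (localhost or a loopback IP)."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost


host = resolve_ollama_host({"OLLAMA_HOST": "http://my-server:11434"})
print(host, "local" if is_loopback(host) else "remote: prompts leave this machine")
```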
Architecture
nervapack ingest .
│
├─ ASTParser (tree-sitter) 16 extensions, 9 languages
│ └─ ParsedEntity[]: class, function, import
│
├─ GraphBuilder (NetworkX DiGraph)
│ ├─ Nodes: file, class, function, import, markdown
│ └─ Edges: DEFINES, EXPLAINS
│
├─ LLMSummarizer (Ollama)
│ └─ Draws EXPLAINS edges: markdown → code entity
│
└─ VectorStore (ChromaDB)
└─ Embeds node summaries for semantic search
nervapack query "..."
│
├─ VectorStore.search() → seed node IDs
├─ GraphRetriever.retrieve_context() → BFS subgraph → Markdown
└─ TokenMeter → savings vs. naive RAG (tokens, %, cost)
nervapack visualize
│
└─ Visualizer (pyvis) → .nervapack/graph.html
Storage layout (inside your project root):
.nervapack/
├── graph.graphml # NetworkX graph (deterministic structure)
├── graph.html # Interactive visualization (generated by visualize)
└── chroma_db/ # ChromaDB (semantic embeddings)
Source modules:
| Module | Responsibility |
|---|---|
| nervapack.parser.language_registry | Declarative registry of 16 file extensions and their tree-sitter grammars |
| nervapack.parser.ast_parser | Tree-sitter parsing → ParsedEntity objects |
| nervapack.parser.md_chunker | Markdown → header-delimited chunks |
| nervapack.graph.builder | Build and persist the NetworkX DiGraph |
| nervapack.graph.vector_store | ChromaDB ingest and semantic search |
| nervapack.graph.retrieval | K-Hop BFS context extraction |
| nervapack.graph.visualizer | pyvis interactive HTML export |
| nervapack.graph.token_meter | Token counting and savings panel |
| nervapack.llm.summarizer | Local Ollama interface for LLM binding |
| nervapack.git.tracker | GitPython diff for incremental sync |
Privacy
NervaPack is 100% offline. No code, documentation, or query ever leaves your machine:
- Embeddings are generated by ChromaDB's built-in local model.
- LLM calls go only to your Ollama instance (localhost:11434 unless you set OLLAMA_HOST).
- All graph and vector data is stored in .nervapack/ inside your project.
Add .nervapack/ to your .gitignore to keep it out of version control.
Using NervaPack in LLM Developer Tools
NervaPack ships a built-in MCP server, so any MCP-compatible tool (Claude Code, Cursor, etc.) can use it as a native context provider — no custom code required.
Setup (one-time per project)
1. Install the MCP extra:
pip install "nervapack[mcp]"
2. Build the graph:
nervapack ingest .
3. Add .mcp.json to your project root:
{
"mcpServers": {
"nervapack": {
"command": "nervapack-mcp",
"description": "NervaPack knowledge graph — query_codebase, graph_status, list_entities"
}
}
}
That's it. Reload your MCP-compatible tool and NervaPack's tools appear automatically.
Tools exposed
| Tool | What it does |
|---|---|
| query_codebase(prompt, max_hops?) | Vector search → K-Hop BFS → focused Markdown context + token savings summary |
| graph_status() | Node/edge counts by type, language breakdown, unsynced file warnings |
| list_entities(entity_type?, file_path?) | Browse all indexed classes, functions, imports, markdown docs |
How Claude Code uses it
Once .mcp.json is in place, Claude Code automatically calls query_codebase before answering questions about the codebase. Instead of reading whole files, it gets a surgical subgraph of only the relevant code — the same token savings you see in the CLI dashboard, applied to every single response.
You: "How does the sync command decide which files to re-ingest?"
Claude: → calls query_codebase("sync command file re-ingest logic")
→ gets 1,180 tokens of focused context (vs 12,840 tokens naive)
→ answers precisely, citing exact line numbers
Keeping the graph fresh
# After modifying files
nervapack sync .
# Check if Claude's context is stale
# (graph_status tool reports this automatically)
Python SDK (for building your own tool)
from nervapack.graph.builder import GraphBuilder
from nervapack.graph.vector_store import VectorStore
from nervapack.graph.retrieval import GraphRetriever

# Load the graph built by nervapack ingest
graph = GraphBuilder().load_graph()
retriever = GraphRetriever(graph)

# Semantic search picks the seed nodes for the traversal
results = VectorStore().search("your query", n_results=3)
start_nodes = results["ids"][0]

# Expand the seeds by one hop and render the subgraph as Markdown
subgraph = retriever.retrieve_context(start_nodes, max_hops=1)
context = retriever.format_as_markdown(subgraph)
# Inject context into your LLM system prompt
Contributing
- Fork the repo and create a branch.
- Make your changes with tests where applicable.
- Open a pull request against master.
Bug reports and feature requests go to the issue tracker.
License
MIT — see LICENSE.