GraphRAG knowledge base for codebases
nelgraph
An autonomous, zero-configuration GraphRAG (Graph Retrieval-Augmented Generation) knowledge base builder and semantic search engine optimized for local codebases and autonomous AI coding agents.
It automatically parses source code, builds Abstract Syntax Tree (AST) call graphs, resolves class hierarchies, maps Git commit histories, and ingests them into a unified hybrid database system (Neo4j for structural relations + ChromaDB for vector semantic indexes) powered by DeepSeek V4-Flash.
Table of Contents
- Core Philosophy
- Key Features
- Technology Stack
- Project Structure
- Installation & Setup
- CLI Command Reference
- Programmatic Python API
- Interactive Visualization Dashboard
- Autonomous Test Generation & Self-Healing
- Automated Testing Environment
- Agent Skill Integration
Core Philosophy
AI coding agents struggle with large codebases because reading raw files is slow, expensive, and lacks structural context. nelgraph bridges this gap:
- Graph-Driven Navigation: Instead of searching files blindly, agents query a structured knowledge graph to instantly understand function calls, dependencies, and inheritance paths.
- Isolated Zero-Config Storage: All database assets, environment details, and sync status profiles are stored locally in a hidden directory inside the target codebase (.graphrag_data/). There are no centralized servers to maintain or conflict with.
- Dynamic Code Resolution: Rather than duplicating source code into Neo4j (which bloats caches and causes drift), the graph maps methods to exact coordinates and code fingerprints. Code is loaded dynamically from disk on demand, with an auto-recovery parser that corrects coordinates if lines shift due to local edits.
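The coordinate-plus-fingerprint idea can be sketched as follows. This is a minimal illustration, not nelgraph's actual parser: the function names, the SHA-256 fingerprint, and the sliding-window recovery strategy are all assumptions made for the example.

```python
import hashlib


def fingerprint(lines):
    """Hash a code block, ignoring trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def resolve_code(file_lines, start, end, expected_hash):
    """Return (start, end, code) for a stored 1-based inclusive line range.

    The stored range is tried first. If its fingerprint no longer matches,
    a window of the same height slides over the file to find where the
    block moved (hypothetical recovery strategy).
    """
    height = end - start + 1
    candidates = [start] + [
        s for s in range(1, len(file_lines) - height + 2) if s != start
    ]
    for s in candidates:
        block = file_lines[s - 1 : s - 1 + height]
        if len(block) == height and fingerprint(block) == expected_hash:
            return s, s + height - 1, "\n".join(block)
    return None  # the block itself was edited, so a re-parse is needed
```

Only coordinates and a hash live in the graph here, so an edit that merely shifts lines is repaired by a cheap scan, while an edit that changes the function body returns None and signals that the file should be parsed again.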
Key Features
- Autonomous AI Test Generation: Automatically generates complete, runnable unit, integration, and system tests using a dual-agent Commander-Worker pattern.
- AI Self-Healing Loop: Executes generated tests inside an isolated sandbox, intercepts errors, diagnoses them via the Commander, and implements code fixes via the Worker (up to 3 retries).
- AST Call-Graph Parser: Powered by Tree-Sitter to parse Python, PHP, JavaScript, and TypeScript/JSX/TSX. It extracts classes, functions, complexity, input signatures, return types, and raised exceptions, and constructs precise call relationships.
- Dynamic Import & Dependency Tracker: Maps module imports (File -[:IMPORTS]-> Module) and automatically distinguishes standard library, external packages, and internal project dependencies.
- Git Commit-to-Function Mapper: Parses Git commit diffs to link modified lines directly to the specific functions they affected (Commit -[:CHANGED]-> Function), enabling precise Test Impact Analysis.
- Hybrid Vector-Graph Queries: Combines vector database semantic similarity searches (ChromaDB) with graph relation expansions (Neo4j) to synthesize comprehensive multi-layered context.
- Git Hooks Auto-Sync: Integrates post-commit and pre-push hooks to automatically run incremental synchronizations, so the graph does not become stale.
- Self-Healing LLM Extraction: Combines json-repair with a self-correction feedback loop. If the LLM generates malformed JSON metadata, the system automatically feeds the errors back to the LLM to self-heal and regenerate (up to 4 retries).
- Interactive Force-Directed Dashboard: Launches a local web explorer (nelgraph viz) with node collision protection, charge range limits, and a Test Generation & Run Drawer to visually trigger test generation and view test run logs.
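The Commit -[:CHANGED]-> Function link can be thought of as an interval-overlap check between the line ranges a diff touched and the line ranges each function occupies. The sketch below illustrates that check only; FunctionSpan and changed_functions are hypothetical names, and it assumes the diff has already been reduced to (first_line, last_line) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSpan:
    name: str
    start: int  # first line of the function, 1-based
    end: int    # last line of the function, inclusive


def changed_functions(spans, changed_ranges):
    """Return names of functions whose span overlaps any changed line range."""
    hit = []
    for span in spans:
        for lo, hi in changed_ranges:
            # Two closed intervals overlap when each starts before the other ends.
            if span.start <= hi and lo <= span.end:
                hit.append(span.name)
                break
    return hit
```

For example, with spans for load (lines 1 to 10) and save (lines 12 to 30), a commit that changed lines 11 to 14 maps to save only, which is the information a test impact analysis needs to choose which tests to rerun.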
Technology Stack
- Parsing: Tree-Sitter (Python, PHP, JS, TS)
- Graph Database: Neo4j (Bolt Protocol, Dockerized)
- Vector Database: ChromaDB (Flat Vector Indexing)
- LLM Engine: DeepSeek V4-Flash & OpenAI Text Embeddings via OpenRouter
- Visualization Backend: FastAPI (Uvicorn)
- Visualization Frontend: React (Vite) + react-force-graph-2d + D3 Force
Project Structure
D:\GraphRAG/
├── config.py             # System config and environment loader
├── docker-compose.yml    # Docker orchestration for local Neo4j
├── start_all.bat         # 1-Click developer launcher for Windows
├── Makefile              # Cross-platform orchestration tasks
├── initialize_graph.py   # CLI wrapper for ingestion & sync
├── knowledge_base.py     # Python programmatic API for AI agents
├── core/                 # Core synchronization & database pipelines
├── parsers/              # Code AST and Git history parsers
├── extractors/           # AI metadata extraction & enrichment loops
├── community/            # Graph clustering and community summarization
├── query/                # Hybrid search and context synthesis engine
├── updater/              # Filesystem watcher & Git hook scripts
├── visualization/        # FastAPI + React visualization dashboard
├── mcp/                  # Model Context Protocol TS/JS server
└── docs/                 # Comprehensive documentation & architecture references
Installation & Setup
1. Prerequisites
- Python 3.10+
- Docker Desktop (running and configured)
- Node.js 18+ (only if building or developing the visualization frontend)
2. Standard Installation
Install the package directly from PyPI:
pip install nelgraph
3. Local Development Setup
Clone the repository and install packages:
git clone https://github.com/anhluong447/GraphDB-Initialize.git D:\GraphRAG
cd D:\GraphRAG
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -e ./nelgraph
CLI Command Reference
Execute commands from your terminal:
nelgraph init # 1. First-time setup: launches the Neo4j container, parses code, then embeds and enriches the codebase
nelgraph sync # 2. Performs incremental sync to index changes since last synced commit
nelgraph sync --silent # Run synchronization silently (ideal for git hooks)
nelgraph status # 3. View current DB metrics, function counts, and enrichment coverage
nelgraph install-hook # 4. Install post-commit hooks for automatic graph synchronization
nelgraph viz # 5. Launch the local interactive visualization dashboard at http://localhost:8080
Programmatic Python API
Import nelgraph to query your codebase programmatically:
import nelgraph
# 1. Optional configuration (fallback to local .env if not specified)
nelgraph.configure(
    codebase_path="/absolute/path/to/project",
    openrouter_api_key="your-openrouter-api-key",
    commander_model="deepseek/deepseek-r1",
    worker_model="qwen/qwen3-coder-next",
)
# 2. Orient: Get high-level overview of codebase grouped by community clusters
snapshot = nelgraph.get_snapshot()
print(f"Total indexed functions: {snapshot['total']}")
for comm in snapshot["communities"]:
    print(f"Cluster: {comm['name']} - {comm['summary'][:100]}...")
# 3. Search: Retrieve relevant functions via semantic vector similarity
search_results = nelgraph.search("database connection handling", top_k=5)
for res in search_results:
    print(f"Match: {res['name']} in {res['file']} (Score: {res['score']})")
# 4. Retrieve Context: Get full signatures, calls, test plans, and raw source
ctx = nelgraph.get_function_context("execute", class_name="OrderProcessor")
print("Source Code:\n", ctx["raw_code"])
print("Parameters Input:", ctx["inputs"])
print("Test Recommendations:", ctx["test_recommendations"])
print("Exceptions Raised:", ctx["raises"])
print("Callers (Blast Radius):", ctx["callers"])
# 5. Retrieve Class: Get class hierarchy, parent classes, and child methods
class_ctx = nelgraph.get_class_context("BaseController")
print("Parent classes:", class_ctx["parent_classes"])
print("Class methods:", class_ctx["methods"])
# 6. Save Context: Export large context files to bypass terminal encoding limits on Windows
nelgraph.dump_context_to_file("execute", "context_export.md", format="markdown")
# 7. Mark Tested: Persist unit test completion status directly into Neo4j
nelgraph.mark_tested("execute", file="src/processors/order.py")
# 8. Autonomous Test Generation: Trigger the Commander-Worker self-healing loop
report = nelgraph.run_test_generation(
    target="execute",
    mode="unit",
    file="src/processors/order.py",
)
print("Test Generation Summary:", report["summary"])
print("Bugs Found:", report["bugs_found"])
Interactive Visualization Dashboard
Launch the visual explorer:
nelgraph viz
This starts a FastAPI backend and loads the React dashboard at http://localhost:8080.
Physical Layout Optimizations
To ensure complex codebases are easy to explore, the visualizer uses customized D3 force simulations:
- Anti-Overlap Collision: Integrates forceCollide, representing nodes as physical circles with safety margins (radius + 14px), so node labels and icons do not overlap.
- Compact Peripheries: Restricts many-body repulsion (charge) to a maximum radius using distanceMax(250). This prevents disconnected files and external libraries from drifting away indefinitely, keeping them compactly arranged around the main clusters.
- Stretched Clusters: Adjusts default link distances to 80px, spreading out highly connected clusters for clean visibility.
Autonomous Test Generation & Self-Healing
nelgraph includes a dual-model, agentic test generation pipeline that builds, executes, and repairs test suites autonomously:
1. Dual-Agent Architecture
- Commander (deepseek/deepseek-r1): Analyzes the graph structure, dependencies, and imports to outline a structured JSON test plan. If tests fail, it acts as a diagnostics agent to differentiate between test logic errors and real codebase bugs.
- Worker (qwen/qwen3-coder-next): Generates runnable test code using the specified framework (pytest/jest/vitest) based on the Commander's plan, and applies fixes based on the Commander's diagnoses.
2. Self-Healing Loop
When a generated test fails:
- The execution error output is caught and sent to the Commander.
- The Commander diagnoses the root cause. If it is a "real_bug" in the codebase, it logs a bug report. If it is a "test_error", it generates specific fix instructions.
- The Worker receives the instructions and regenerates the corrected test code.
- The cycle repeats for up to MAX_HEAL_RETRIES (default: 3).
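The loop above can be sketched as a small control function. This is an illustration under stated assumptions rather than nelgraph's implementation: the heal function, the three callables it takes, and the shape of the diagnosis dictionary are hypothetical, and the sketch assumes the loop stops once a "real_bug" verdict is logged.

```python
MAX_HEAL_RETRIES = 3  # default named in the text above


def heal(test_code, run_test, diagnose, regenerate, max_retries=MAX_HEAL_RETRIES):
    """Run a generated test and retry fixes until it passes or retries run out.

    run_test(code)         -> (passed: bool, error_output: str)
    diagnose(code, error)  -> {"verdict": "real_bug" | "test_error", "fix": str}
    regenerate(code, fix)  -> corrected test code
    """
    bugs = []
    retries = 0
    passed, error = run_test(test_code)
    while not passed and retries < max_retries:
        diagnosis = diagnose(test_code, error)      # Commander role
        if diagnosis["verdict"] == "real_bug":
            bugs.append(error)                      # report it, keep the test as is
            break
        test_code = regenerate(test_code, diagnosis["fix"])  # Worker role
        retries += 1
        passed, error = run_test(test_code)
    return {"passed": passed, "retries": retries, "bugs_found": bugs, "code": test_code}
```

Passing the Commander and Worker in as plain callables keeps the retry logic testable without any model calls, since stubs can stand in for both roles.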
3. Incremental Synchronization & Customization Protection
An incremental synchronization registry (.graphrag_data/test_registry.json) tracks both source function hashes and generated test file hashes.
- If the source code of a function changes, its test is regenerated.
- If a developer manually edits or customizes a generated test file, nelgraph detects the hash mismatch and skips regeneration to prevent overwriting user modifications.
Automated Testing Environment
The workspace includes a complete testing setup for both Frontend (React) and Backend (FastAPI).
1. Frontend UI Tests
Uses Vitest + React Testing Library + jsdom to test React components.
- Location: nelgraph/nelgraph/visualization/frontend/
- Execution:
cd nelgraph/nelgraph/visualization/frontend
npm run test # Run once
npm run test:watch # Run in watch mode
- Test Coverage:
  - DetailPanel.test.jsx: Verifies metadata cards, list rendering of complex JSON structures (resolves Error 31), and chip navigations.
  - GraphView.test.jsx: Mocks canvas elements, tests filter switching, and verifies that dangling links are filtered out.
  - ErrorBoundary.test.jsx: Verifies rendering of fallback panels and sending of POST error logs to the API.
2. Backend Integration Tests
Uses pytest to verify FastAPI API routes.
- Location: nelgraph/tests/
- Execution:
cd nelgraph
pytest -v tests/
- Test Coverage:
  - conftest.py: Configures the mock_neo4j fixture to intercept get_client calls, bypassing live database requirements.
  - test_api.py: Validates /status, /log, /node/{name}, /node/{name}/mark_tested, and checks dangling edge filtering in /graph/full.
Agent Skill Integration
When nelgraph init runs, it generates .agents/nelgraph/SKILL.md. This file contains strict instructions, workflows, and API descriptions that downstream LLM coding agents can load. Agents reading this file are instructed to:
- Always run synchronization (nelgraph.run_sync()) before taking actions.
- Read overall project structure via get_snapshot() rather than scanning directory trees.
- Query source code via get_function_context()["raw_code"] instead of opening files directly.
- Inspect ctx["callers"] to calculate change blast radii before refactoring.
- Use test_recommendations as a baseline blueprint for test writing.
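An agent following these rules might wrap the first lookup in a helper like the one below. The helper name blast_radius is hypothetical, and the sketch assumes that run_sync() takes no arguments and that ctx["callers"] is an iterable; only the method names and dictionary keys come from the text above.

```python
def blast_radius(kb, function_name, **lookup):
    """Sync the graph, then return (callers, test_recommendations) for a function.

    kb is the imported nelgraph module, or any object exposing the same
    run_sync() and get_function_context() methods.
    """
    kb.run_sync()  # rule 1: never act on a stale graph
    ctx = kb.get_function_context(function_name, **lookup)
    return list(ctx["callers"]), ctx["test_recommendations"]
```

A call such as blast_radius(nelgraph, "execute", class_name="OrderProcessor") would then give the agent the list of affected callers and a starting test plan before it touches any file.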
File details
Details for the file nelgraph-1.1.22.tar.gz.
File metadata
- Download URL: nelgraph-1.1.22.tar.gz
- Upload date:
- Size: 2.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c71568aa5070deea1602e58f6081c9341f42cda2f6d356bbf20fd048f19eda6 |
| MD5 | 3479379004f00392b9856f25d6d8ee51 |
| BLAKE2b-256 | b361d063846010cd4cf8994f821b828da177d17ac556953f7add26361c1dd841 |
File details
Details for the file nelgraph-1.1.22-py3-none-any.whl.
File metadata
- Download URL: nelgraph-1.1.22-py3-none-any.whl
- Upload date:
- Size: 2.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 89c2f24f0f1be1c0bbc4ef9df345798a8ef0e7f6f69a0da555fbbdf200b69002 |
| MD5 | acb0d6f58fcca166c1c2f97544d75895 |
| BLAKE2b-256 | 11fd7d31b7392ef78273d8dd5060f9a04d93dd6f19e9dd5c2677a1dd27757eb3 |