GraphRAG knowledge base for codebases

nelgraph 🚀

An autonomous, zero-configuration GraphRAG (Graph Retrieval-Augmented Generation) knowledge base builder and semantic search engine optimized for local codebases and autonomous AI coding agents.

It automatically parses source code, builds Abstract Syntax Tree (AST) call graphs, resolves class hierarchies, maps Git commit histories, and ingests them into a unified hybrid database system (Neo4j for structural relations + ChromaDB for semantic vector indexes) powered by DeepSeek V4-Flash.

📖 Table of Contents

Core Philosophy
Key Features
Technology Stack
Project Structure
Installation & Setup
CLI Command Reference
Programmatic Python API
Interactive Visualization Dashboard
Autonomous Test Generation & Self-Healing
Automated Testing Environment
Agent Skill Integration

💡 Core Philosophy

AI coding agents struggle with large codebases because reading raw files is slow, expensive, and lacks structural context. nelgraph bridges this gap:

Graph-Driven Navigation: Instead of searching files blindly, agents query a structured knowledge graph to instantly understand function calls, dependencies, and inheritance paths.
Isolated Zero-Config Storage: All database assets, environment details, and sync status profiles are stored locally in a hidden directory inside the target codebase (.graphrag_data/). There are no centralized servers to maintain and no shared state for projects to conflict over.
Dynamic Code Resolution: Rather than duplicating source code into Neo4j (which bloats caches and causes drift), the graph maps methods to exact coordinates and code fingerprints. Code is loaded dynamically from disk on demand, with an auto-recovery parser that corrects coordinates if lines shift due to local edits.
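
The coordinate-plus-fingerprint idea can be illustrated with a minimal sketch. It is not nelgraph's actual recovery parser: the CodePointer record, the SHA-256 fingerprint, and the sliding-window search are all assumptions chosen to show how a stored location can be corrected after lines shift.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodePointer:
    """Hypothetical graph record: where a function lives and what it looked like."""
    start_line: int   # 1-indexed, inclusive
    end_line: int     # 1-indexed, inclusive
    fingerprint: str  # SHA-256 of the function's source text


def fingerprint(lines: list[str]) -> str:
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def resolve(source: str, ptr: CodePointer) -> Optional[tuple[CodePointer, str]]:
    """Return (possibly corrected pointer, code), or None if the body itself changed."""
    lines = source.splitlines()
    size = ptr.end_line - ptr.start_line + 1

    # Fast path: the stored coordinates still point at the same text.
    window = lines[ptr.start_line - 1:ptr.end_line]
    if len(window) == size and fingerprint(window) == ptr.fingerprint:
        return ptr, "\n".join(window)

    # Recovery path: slide a same-sized window over the file and look for the fingerprint.
    for start in range(len(lines) - size + 1):
        window = lines[start:start + size]
        if fingerprint(window) == ptr.fingerprint:
            return CodePointer(start + 1, start + size, ptr.fingerprint), "\n".join(window)

    return None  # the function text was edited, so a re-parse would be needed


original = "import os\n\ndef greet(name):\n    return f'hi {name}'\n"
ptr = CodePointer(3, 4, fingerprint(original.splitlines()[2:4]))

# Two lines are inserted above the function, shifting it down.
edited = "import os\nimport sys\n\n\ndef greet(name):\n    return f'hi {name}'\n"
new_ptr, code = resolve(edited, ptr)
print(new_ptr.start_line, new_ptr.end_line)  # 5 6
```

Only the pointer and fingerprint would need to live in the graph under this scheme; the code itself always comes from the file on disk.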

✨ Key Features

🧠 Autonomous AI Test Generation: Automatically generates complete, runnable unit, integration, and system tests utilizing a dual-agent Commander-Worker pattern.
🛡️ AI Self-Healing Loop: Executes generated tests inside an isolated sandbox, automatically intercepts errors, diagnoses them via the Commander, and implements code fixes via the Worker (up to 3 retries).
🧬 AST Call-Graph Parser: Powered by Tree-Sitter to parse Python, PHP, JavaScript, and TypeScript/JSX/TSX. It extracts classes, functions, complexity, input signatures, return types, raises, and constructs precise call relationships.
📦 Dynamic Import & Dependency Tracker: Maps module imports (File -[:IMPORTS]-> Module). Automatically distinguishes standard library, external packages, and internal project dependencies.
🌿 Git Commit-to-Function Mapper: Parses Git commit diffs to link modified lines directly to the specific functions they affected (Commit -[:CHANGED]-> Function), enabling precise Test Impact Analysis.
🔍 Hybrid Vector-Graph Queries: Combines semantic similarity search over the vector database (ChromaDB) with graph relation expansion (Neo4j) to synthesize comprehensive, multi-layered context.
🔄 Git Hooks Auto-Sync: Integrates post-commit and pre-push hooks that run incremental synchronizations automatically, so the graph stays current with each commit and push.
🛠️ Self-Healing LLM Extraction: Combines json-repair with a self-correction feedback loop. If the LLM generates malformed JSON metadata, the system automatically feeds the errors back to the LLM to self-heal and regenerate (up to 4 retries).
📊 Interactive Force-Directed Dashboard: Launch a local web explorer (nelgraph viz) with node collision protection, charge range limits, and a Test Generation & Run Drawer to visually trigger test generation and view test run logs.
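
As an illustration of the commit-to-function mapping described in the feature list, the sketch below intersects the line ranges from unified diff hunk headers with known function spans. It assumes the diff was produced with zero context lines (git diff -U0), so each hunk covers only changed lines; FunctionSpan and the sample data are hypothetical and do not reflect nelgraph's internal schema.

```python
import re
from typing import NamedTuple

# Matches hunk headers such as "@@ -12 +12,2 @@" and captures the new-side start and count.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


class FunctionSpan(NamedTuple):
    name: str
    start_line: int
    end_line: int


def changed_ranges(diff_text: str) -> list[tuple[int, int]]:
    """Extract inclusive (start, end) line ranges on the post-commit side of each hunk."""
    ranges = []
    for match in HUNK_HEADER.finditer(diff_text):
        start = int(match.group(1))
        count = int(match.group(2)) if match.group(2) is not None else 1
        if count > 0:  # a count of 0 is a pure deletion with no new-side lines
            ranges.append((start, start + count - 1))
    return ranges


def affected_functions(diff_text: str, functions: list[FunctionSpan]) -> list[str]:
    """Names of functions whose span overlaps at least one changed range."""
    ranges = changed_ranges(diff_text)
    return [
        fn.name
        for fn in functions
        if any(start <= fn.end_line and fn.start_line <= end for start, end in ranges)
    ]


diff = "\n".join([
    "--- a/orders.py",
    "+++ b/orders.py",
    "@@ -12 +12,2 @@ def execute(self):",
    "-    total = 0",
    "+    total = 0.0",
    "+    log(total)",
    "@@ -40,0 +42 @@ def refund(self):",
    "+    audit()",
])
spans = [
    FunctionSpan("execute", 10, 20),
    FunctionSpan("cancel", 22, 35),
    FunctionSpan("refund", 38, 50),
]
print(affected_functions(diff, spans))  # ['execute', 'refund']
```

Each returned name would correspond to one Commit -[:CHANGED]-> Function edge. Pure deletions are ignored in this sketch, which a fuller implementation would want to handle using the pre-commit side of the hunk.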

🛠️ Technology Stack

Parsing: Tree-Sitter (Python, PHP, JS, TS)
Graph Database: Neo4j (Bolt Protocol, Dockerized)
Vector Database: ChromaDB (Flat Vector Indexing)
LLM Engine: DeepSeek V4-Flash & OpenAI Text Embeddings via OpenRouter
Visualization Backend: FastAPI (Uvicorn)
Visualization Frontend: React (Vite) + react-force-graph-2d + D3 Force

📂 Project Structure

D:\GraphRAG/
├── config.py                 # System config and environment loader
├── docker-compose.yml        # Docker orchestration for local Neo4j
├── start_all.bat             # 1-Click developer launcher for Windows
├── Makefile                  # Cross-platform orchestration tasks
├── initialize_graph.py       # CLI wrapper for ingestion & sync
├── knowledge_base.py         # Python programmatic API for AI agents
├── core/                     # Core synchronization & database pipelines
├── parsers/                  # Code AST and Git history parsers
├── extractors/               # AI metadata extraction & enrichment loops
├── community/                # Graph clustering and community summarization
├── query/                    # Hybrid search and context synthesis engine
├── updater/                  # Filesystem watcher & Git hook scripts
├── visualization/            # FastAPI + React visualization dashboard
├── mcp/                      # Model Context Protocol TS/JS server
└── docs/                     # Comprehensive documentation & architecture references

🚀 Installation & Setup

1. Prerequisites

Python 3.10+
Docker Desktop (running and configured)
Node.js 18+ (only if building/developing visualization frontend)

2. Standard Installation

Install the package directly from PyPI:

pip install nelgraph

3. Local Development Setup

Clone the repository and install packages:

git clone https://github.com/anhluong447/GraphDB-Initialize.git D:\GraphRAG
cd D:\GraphRAG
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -e ./nelgraph

💻 CLI Command Reference

Execute commands from your terminal:

nelgraph init            # 1. First-time setup: launches Neo4j container, parses code, embeds and enriches codebase
nelgraph sync            # 2. Performs incremental sync to index changes since last synced commit
nelgraph sync --silent   # Run synchronization silently (ideal for git hooks)
nelgraph status          # 3. View current DB metrics, function counts, and enrichment coverage
nelgraph install-hook    # 4. Install post-commit hooks for automatic graph synchronization
nelgraph viz             # 5. Launch the local interactive visualization dashboard at http://localhost:8080
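
For readers who prefer to wire the hook by hand, here is a hedged sketch of a post-commit hook that calls the documented nelgraph sync --silent command. The hook body and the write_post_commit_hook helper are assumptions for illustration; the script that nelgraph install-hook actually writes may differ.

```python
import stat
import tempfile
from pathlib import Path

# Assumed hook contents; the script that `nelgraph install-hook` writes may differ.
HOOK_BODY = (
    "#!/bin/sh\n"
    "# Re-index changes after every commit without printing progress.\n"
    "nelgraph sync --silent\n"
)


def write_post_commit_hook(repo_root: Path) -> Path:
    """Create an executable .git/hooks/post-commit that runs a silent sync."""
    hooks_dir = repo_root / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise FileNotFoundError(f"{hooks_dir} not found; is this a Git repository?")
    hook = hooks_dir / "post-commit"
    # Force LF line endings so the shebang also works under Git for Windows.
    with open(hook, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(HOOK_BODY)
    # Git skips hooks that are not executable on POSIX systems.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook


# Demo against a throwaway directory that only mimics a repository layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".git" / "hooks").mkdir(parents=True)
    hook_path = write_post_commit_hook(Path(tmp))
    hook_text = hook_path.read_text(encoding="utf-8")
    hook_name = hook_path.name
print(hook_text)
```

In a real repository the helper would be called with the project root instead of the temporary directory.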

🔌 Programmatic Python API

Import nelgraph to query your codebase programmatically:

import nelgraph

# 1. Optional configuration (falls back to the local .env file if not specified)
nelgraph.configure(
    codebase_path="/absolute/path/to/project",
    openrouter_api_key="your-openrouter-api-key",
    commander_model="deepseek/deepseek-r1",
    worker_model="qwen/qwen3-coder-next"
)

# 2. Orient: Get high-level overview of codebase grouped by community clusters
snapshot = nelgraph.get_snapshot()
print(f"Total indexed functions: {snapshot['total']}")
for comm in snapshot["communities"]:
    print(f"Cluster: {comm['name']} - {comm['summary'][:100]}...")

# 3. Search: Retrieve relevant functions via semantic vector similarity
search_results = nelgraph.search("database connection handling", top_k=5)
for res in search_results:
    print(f"Match: {res['name']} in {res['file']} (Score: {res['score']})")

# 4. Retrieve Context: Get full signatures, calls, test plans, and raw source
ctx = nelgraph.get_function_context("execute", class_name="OrderProcessor")
print("Source Code:\n", ctx["raw_code"])
print("Parameters Input:", ctx["inputs"])
print("Test Recommendations:", ctx["test_recommendations"])
print("Exceptions Raised:", ctx["raises"])
print("Callers (Blast Radius):", ctx["callers"])

# 5. Retrieve Class: Get class hierarchy, parent classes, and child methods
class_ctx = nelgraph.get_class_context("BaseController")
print("Parent classes:", class_ctx["parent_classes"])
print("Class methods:", class_ctx["methods"])

# 6. Save Context: Export large context files to bypass terminal encoding limits on Windows
nelgraph.dump_context_to_file("execute", "context_export.md", format="markdown")

# 7. Mark Tested: Persist unit test completion status directly into Neo4j
nelgraph.mark_tested("execute", file="src/processors/order.py")

# 8. Autonomous Test Generation: Trigger the Commander-Worker self-healing loop
report = nelgraph.run_test_generation(
    target="execute", 
    mode="unit", 
    file="src/processors/order.py"
)
print("Test Generation Summary:", report["summary"])
print("Bugs Found:", report["bugs_found"])

📊 Interactive Visualization Dashboard

Launch the visual explorer:

nelgraph viz

This starts a FastAPI backend and loads the React dashboard at http://localhost:8080.

Physical Layout Optimizations

To ensure complex codebases are easy to explore, the visualizer uses customized D3 force simulations:

Anti-Overlap Collision: Uses forceCollide to treat each node as a physical circle with a safety margin (radius + 14px), so node labels and icons do not overlap.
Compact Peripheries: Restricts many-body repulsion (charge) to a maximum radius using distanceMax(250). This prevents disconnected files and external libraries from floating away into infinity, keeping them compactly structured around the main clusters.
Stretched Clusters: Adjusts default link distances to 80px, spreading out highly connected clusters for clean visibility.

🧠 Autonomous Test Generation & Self-Healing

nelgraph includes a dual-model test generation suite that builds, executes, and fixes tests autonomously:

1. Dual-Agent Architecture

Commander (deepseek/deepseek-r1): Analyzes the graph structure, dependencies, and imports to outline a structured JSON test plan. If tests fail, it acts as a diagnostics agent to differentiate between test logic errors and real codebase bugs.
Worker (qwen/qwen3-coder-next): Generates runnable test code using the specified framework (pytest/jest/vitest) based on the Commander's plan, and applies fixes based on the Commander's diagnoses.

2. Self-Healing Loop

When a generated test fails:

The execution error output is caught and sent to the Commander.
The Commander diagnoses the root cause. If it is a "real_bug" in the codebase, it logs a bug report. If it is a "test_error", it generates specific fix instructions.
The Worker receives the instructions and regenerates the corrected test code.
The cycle repeats for up to MAX_HEAL_RETRIES (default: 3).
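
The control flow of the steps above can be sketched as a small loop. This is an assumption-laden outline, not nelgraph's implementation: run_test, diagnose, and fix are hypothetical callables standing in for the sandbox, the Commander, and the Worker, and the limit is read here as up to MAX_HEAL_RETRIES repair cycles after the first run.

```python
from typing import Callable, Optional

MAX_HEAL_RETRIES = 3  # default named in the text above


def heal(
    test_code: str,
    run_test: Callable[[str], Optional[str]],  # returns error output, or None on success
    diagnose: Callable[[str, str], dict],      # stands in for the Commander
    fix: Callable[[str, str], str],            # stands in for the Worker
) -> dict:
    """Run a test, and on failure alternate diagnosis and repair until it passes or retries run out."""
    for retries in range(MAX_HEAL_RETRIES + 1):
        error = run_test(test_code)
        if error is None:
            return {"status": "passed", "retries": retries, "test_code": test_code}
        if retries == MAX_HEAL_RETRIES:
            break  # out of repair cycles
        diagnosis = diagnose(test_code, error)
        if diagnosis["verdict"] == "real_bug":
            # The codebase is at fault, so the test is left alone and a bug is logged.
            return {"status": "bug_reported", "retries": retries, "error": error}
        test_code = fix(test_code, diagnosis["instructions"])
    return {"status": "gave_up", "retries": MAX_HEAL_RETRIES, "error": error}


# Stub agents: the "test" fails until the missing import is added.
def run_test(code: str) -> Optional[str]:
    return None if "import math" in code else "NameError: name 'math' is not defined"


def diagnose(code: str, error: str) -> dict:
    return {"verdict": "test_error", "instructions": "import math"}


def fix(code: str, instructions: str) -> str:
    return instructions + "\n" + code


result = heal("assert math.sqrt(4) == 2", run_test, diagnose, fix)
print(result["status"], result["retries"])  # passed 1
```

Returning early on a "real_bug" verdict keeps the loop from rewriting a test that has correctly exposed a defect.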

3. Incremental Synchronization & Customization Protection

An incremental synchronization registry (.graphrag_data/test_registry.json) tracks both source function hashes and generated test file hashes.

If the source code of a function changes, its test is regenerated.
If a developer manually edits or customizes a generated test file, nelgraph detects the hash mismatch and skips regeneration to prevent overwriting user modifications.
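
The two rules above amount to a small decision over a pair of hashes. The sketch below assumes a registry entry with source_hash and test_hash fields and SHA-256 hashing; both are illustrative guesses rather than the documented layout of test_registry.json. It also assumes that customization protection wins when both the source and the test have changed, which the text does not spell out.

```python
import hashlib
from typing import Optional


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def decide(entry: Optional[dict], source_code: str, test_code: Optional[str]) -> str:
    """Pick an action for one function given its (hypothetical) registry entry."""
    if entry is None or test_code is None:
        return "generate"          # never generated, or the test file is gone
    if sha256_text(test_code) != entry["test_hash"]:
        return "skip_customized"   # a developer edited the test, so leave it alone
    if sha256_text(source_code) != entry["source_hash"]:
        return "regenerate"        # the function changed and the test is untouched
    return "up_to_date"


source = "def add(a, b):\n    return a + b\n"
test = "def test_add():\n    assert add(1, 2) == 3\n"
entry = {"source_hash": sha256_text(source), "test_hash": sha256_text(test)}

print(decide(entry, source, test))                       # up_to_date
print(decide(entry, source + "# refactor\n", test))      # regenerate
print(decide(entry, source, test + "# hand-tuned\n"))    # skip_customized
print(decide(None, source, None))                        # generate
```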

🧪 Automated Testing Environment

The workspace includes a complete testing setup for both Frontend (React) and Backend (FastAPI).

1. Frontend UI Tests

Uses Vitest + React Testing Library + jsdom to test React components.

Location: nelgraph/nelgraph/visualization/frontend/

Execution:

cd nelgraph/nelgraph/visualization/frontend
npm run test          # Run once
npm run test:watch    # Run in watch mode

Test Coverage:
- DetailPanel.test.jsx: Verifies metadata cards, list rendering of complex JSON structures (resolves Error 31), and chip navigations.
- GraphView.test.jsx: Mocks canvas elements, tests filter switching, and verifies filtering out dangling links.
- ErrorBoundary.test.jsx: Verifies rendering fallback panels and sending POST error logs to the API.

2. Backend Integration Tests

Uses pytest to verify FastAPI API routes.

Location: nelgraph/tests/
Execution:
```
cd nelgraph
pytest -v tests/
```
Test Coverage:
- conftest.py: Configures mock_neo4j fixture to intercept get_client calls, bypassing live database requirements.
- test_api.py: Validates /status, /log, /node/{name}, /node/{name}/mark_tested, and checks dangling edge filtering in /graph/full.
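
The dangling-edge filtering that both test suites check can be pictured with a short sketch. The id, source, and target keys follow the graph data shape that react-force-graph consumes; whether the /graph/full payload uses exactly these keys is an assumption, and drop_dangling_links is a hypothetical helper.

```python
def drop_dangling_links(nodes: list[dict], links: list[dict]) -> list[dict]:
    """Keep only links whose source and target both refer to a known node id."""
    known_ids = {node["id"] for node in nodes}
    return [
        link for link in links
        if link["source"] in known_ids and link["target"] in known_ids
    ]


nodes = [{"id": "execute"}, {"id": "validate"}]
links = [
    {"source": "execute", "target": "validate"},
    {"source": "execute", "target": "deleted_helper"},  # target no longer exists
]
print(drop_dangling_links(nodes, links))
# [{'source': 'execute', 'target': 'validate'}]
```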

🤖 Agent Skill Integration

When nelgraph init runs, it generates .agents/nelgraph/SKILL.md. This file contains strict instructions, workflows, and API descriptions that downstream LLM coding agents can load. Agents reading this file are instructed to:

Always run synchronization (nelgraph.run_sync()) before taking actions.
Read overall project structure via get_snapshot() rather than scanning directory trees.
Query source code via get_function_context()["raw_code"] instead of opening files directly.
Inspect ctx["callers"] to calculate change blast radii before refactoring.
Use test_recommendations as a baseline blueprint for test writing.
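
The blast-radius instruction can be made concrete with a breadth-first walk over callers. In this sketch get_callers is a hypothetical callable; in practice it might wrap get_function_context(name)["callers"], but the shape of that field is not documented here, so the example runs against an in-memory mapping instead.

```python
from collections import deque
from typing import Callable, Iterable


def blast_radius(
    start: str,
    get_callers: Callable[[str], Iterable[str]],
    max_depth: int = 3,
) -> dict[str, int]:
    """Map every direct or transitive caller of `start` to its distance in call hops."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        depth = depth_of[current]
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for caller in get_callers(current):
            if caller not in depth_of:  # also guards against recursive call cycles
                depth_of[caller] = depth + 1
                queue.append(caller)
    depth_of.pop(start)
    return depth_of


# Toy caller graph: function name -> functions that call it.
callers = {
    "execute": ["checkout", "retry_order"],
    "checkout": ["api_post_order"],
    "retry_order": ["checkout"],
    "api_post_order": [],
}
print(blast_radius("execute", lambda name: callers.get(name, [])))
# {'checkout': 1, 'retry_order': 1, 'api_post_order': 2}
```

An agent could treat the size of this mapping as a rough risk signal before refactoring the starting function.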

Project details

Release history

1.1.22: Jun 26, 2026 (this version)
1.1.21: Jun 26, 2026
1.1.20: Jun 26, 2026
1.1.19: Jun 26, 2026
1.1.18: Jun 26, 2026
1.1.17: Jun 25, 2026
1.1.16: Jun 25, 2026
1.1.15: Jun 25, 2026
1.1.14: Jun 25, 2026
1.1.13: Jun 25, 2026
1.1.12: Jun 25, 2026
1.1.11: Jun 25, 2026
1.1.9: Jun 24, 2026
1.1.8: Jun 24, 2026
1.1.7: Jun 24, 2026
1.1.6: Jun 24, 2026
1.1.5: Jun 24, 2026
1.1.4: Jun 24, 2026
1.1.3: Jun 24, 2026
1.1.2: Jun 24, 2026
1.1.1: Jun 24, 2026
1.1.0: Jun 24, 2026
1.0.9: Jun 24, 2026
1.0.8: Jun 23, 2026
1.0.7: Jun 23, 2026
1.0.6: Jun 12, 2026
1.0.5: Jun 12, 2026
1.0.4: Jun 12, 2026
1.0.3: Jun 11, 2026
1.0.2: Jun 11, 2026
1.0.1: Jun 11, 2026
1.0.0: Jun 11, 2026

Download files

Source Distribution

nelgraph-1.1.22.tar.gz (2.9 MB)

Uploaded Jun 26, 2026 Source

Built Distribution

nelgraph-1.1.22-py3-none-any.whl (2.9 MB)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file nelgraph-1.1.22.tar.gz.

File metadata

Download URL: nelgraph-1.1.22.tar.gz
Upload date: Jun 26, 2026
Size: 2.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for nelgraph-1.1.22.tar.gz
Algorithm	Hash digest
SHA256	`5c71568aa5070deea1602e58f6081c9341f42cda2f6d356bbf20fd048f19eda6`
MD5	`3479379004f00392b9856f25d6d8ee51`
BLAKE2b-256	`b361d063846010cd4cf8994f821b828da177d17ac556953f7add26361c1dd841`

File details

Details for the file nelgraph-1.1.22-py3-none-any.whl.

File metadata

Download URL: nelgraph-1.1.22-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 2.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for nelgraph-1.1.22-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89c2f24f0f1be1c0bbc4ef9df345798a8ef0e7f6f69a0da555fbbdf200b69002`
MD5	`acb0d6f58fcca166c1c2f97544d75895`
BLAKE2b-256	`11fd7d31b7392ef78273d8dd5060f9a04d93dd6f19e9dd5c2677a1dd27757eb3`
