
GraphRAG knowledge base for codebases


nelgraph 🚀


An autonomous, zero-configuration GraphRAG (Graph Retrieval-Augmented Generation) knowledge base builder and semantic search engine optimized for local codebases and autonomous AI coding agents.

It automatically parses source code, builds Abstract Syntax Tree (AST) call graphs, resolves class hierarchies, maps Git commit histories, and ingests them into a unified hybrid database system (Neo4j for structural relations + ChromaDB for vector semantic indexes) powered by DeepSeek V4-Flash.


📖 Table of Contents

  1. Core Philosophy
  2. Key Features
  3. Technology Stack
  4. Project Structure
  5. Installation & Setup
  6. CLI Command Reference
  7. Programmatic Python API
  8. Interactive Visualization Dashboard
  9. Autonomous Test Generation & Self-Healing
  10. Automated Testing Environment
  11. Agent Skill Integration

💡 Core Philosophy

AI coding agents struggle with large codebases because reading raw files is slow, expensive, and lacks structural context. nelgraph bridges this gap:

  • Graph-Driven Navigation: Instead of searching files blindly, agents query a structured knowledge graph to instantly understand function calls, dependencies, and inheritance paths.
  • Isolated Zero-Config Storage: All database assets, environment details, and sync status profiles are nested locally within the target codebase's hidden directory (.graphrag_data/). There are no centralized servers to maintain and no cross-project conflicts to resolve.
  • Dynamic Code Resolution: Rather than duplicating source code into Neo4j (which bloats caches and causes drift), the graph maps methods to exact coordinates and code fingerprints. Code is loaded dynamically from disk on demand, with an auto-recovery parser that corrects coordinates if lines shift due to local edits.
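The coordinate auto-recovery idea can be illustrated with a minimal sketch. It assumes the fingerprint is a SHA-256 hash of the function's lines with trailing whitespace stripped, and that the function body itself is unchanged while lines above it shift. The names fingerprint and locate_block are hypothetical and are not part of the nelgraph API.

```python
import hashlib


def fingerprint(lines):
    """Hash a block of source lines, ignoring trailing whitespace."""
    normalized = "\n".join(line.rstrip() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def locate_block(source_lines, start, end, expected_fp):
    """Return the 0-based, end-exclusive span whose hash matches expected_fp.

    The stored coordinates are tried first. If they no longer match, every
    window of the same length is checked, nearest to the old position first.
    Returns None when no window matches (the block itself was edited).
    """
    length = end - start
    if fingerprint(source_lines[start:end]) == expected_fp:
        return start, end
    candidates = range(0, len(source_lines) - length + 1)
    for new_start in sorted(candidates, key=lambda s: abs(s - start)):
        window = source_lines[new_start:new_start + length]
        if fingerprint(window) == expected_fp:
            return new_start, new_start + length
    return None


original = ["import os", "", "def greet(name):", "    return 'hi ' + name"]
stored_span = (2, 4)
stored_fp = fingerprint(original[2:4])

# A local edit inserts two lines above the function, shifting it down.
edited = ["import os", "import sys", "", ""] + original[2:]
relocated = locate_block(edited, *stored_span, stored_fp)
print(relocated)  # (4, 6)
```

Searching outward from the old position favors the nearest match, which is a reasonable tie-breaker when a file contains duplicated blocks.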

✨ Key Features

  • 🧠 Autonomous AI Test Generation: Automatically generates complete, runnable unit, integration, and system tests utilizing a dual-agent Commander-Worker pattern.
  • 🛡️ AI Self-Healing Loop: Executes generated tests inside an isolated sandbox, automatically intercepts errors, diagnoses them via the Commander, and implements code fixes via the Worker (up to 3 retries).
  • 🧬 AST Call-Graph Parser: Powered by Tree-Sitter to parse Python, PHP, JavaScript, and TypeScript/JSX/TSX. It extracts classes, functions, complexity, input signatures, return types, and raised exceptions, and constructs precise call relationships.
  • 📦 Dynamic Import & Dependency Tracker: Maps module imports (File -[:IMPORTS]-> Module). Automatically distinguishes standard library, external packages, and internal project dependencies.
  • 🌿 Git Commit-to-Function Mapper: Parses Git commit diffs to link modified lines directly to the specific functions they affected (Commit -[:CHANGED]-> Function), enabling precise Test Impact Analysis.
  • 🔍 Hybrid Vector-Graph Queries: Combines vector database semantic similarity searches (ChromaDB) with graph relation expansions (Neo4j) to synthesize comprehensive multi-layered context.
  • 🔄 Git Hooks Auto-Sync: Integrates post-commit and pre-push hooks to automatically run incremental synchronizations, so the graph stays current with the repository.
  • 🛠️ Self-Healing LLM Extraction: Combines json-repair with a self-correction feedback loop. If the LLM generates malformed JSON metadata, the system automatically feeds the errors back to the LLM to self-heal and regenerate (up to 4 retries).
  • 📊 Interactive Force-Directed Dashboard: Launch a local web explorer (nelgraph viz) with node collision protection, charge range limits, and a Test Generation & Run Drawer to visually trigger test generation and view test run logs.
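As an illustration of the commit-to-function mapping, the sketch below intersects diff hunk ranges with function line spans. It assumes the diff was produced with zero context lines (for example git diff -U0), so each hunk covers only changed lines, and that function spans are 1-based and inclusive. The helper names and the sample spans are hypothetical; nelgraph's own parser may work differently.

```python
import re

# Matches unified-diff hunk headers such as "@@ -12 +12,2 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_line_ranges(diff_text):
    """Yield (first_line, last_line) in the new file for each diff hunk."""
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if not match:
            continue
        start = int(match.group(1))
        count = int(match.group(2)) if match.group(2) is not None else 1
        if count == 0:
            # Pure deletion: attribute it to the line next to the removal.
            yield start, start
        else:
            yield start, start + count - 1


def functions_touched(diff_text, function_spans):
    """Return sorted names of functions whose span overlaps any hunk."""
    touched = set()
    for first, last in changed_line_ranges(diff_text):
        for name, (fn_start, fn_end) in function_spans.items():
            if first <= fn_end and last >= fn_start:
                touched.add(name)
    return sorted(touched)


diff = """\
--- a/order.py
+++ b/order.py
@@ -12 +12,2 @@ def execute(self):
-        total = 0
+        total = 0.0
+        tax = 0.0
@@ -40,0 +42 @@ def refund(self):
+        audit()
"""
spans = {"execute": (10, 20), "validate": (22, 30), "refund": (38, 45)}
print(functions_touched(diff, spans))  # ['execute', 'refund']
```

Each resulting name would correspond to one Commit -[:CHANGED]-> Function edge, which is what makes test impact analysis a graph lookup rather than a file scan.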

🛠️ Technology Stack

  • Parsing: Tree-Sitter (Python, PHP, JS, TS)
  • Graph Database: Neo4j (Bolt Protocol, Dockerized)
  • Vector Database: ChromaDB (Flat Vector Indexing)
  • LLM Engine: DeepSeek V4-Flash & OpenAI Text Embeddings via OpenRouter
  • Visualization Backend: FastAPI (Uvicorn)
  • Visualization Frontend: React (Vite) + react-force-graph-2d + D3 Force

📂 Project Structure

D:\GraphRAG/
├── config.py                 # System config and environment loader
├── docker-compose.yml        # Docker orchestration for local Neo4j
├── start_all.bat             # 1-Click developer launcher for Windows
├── Makefile                  # Cross-platform orchestration tasks
├── initialize_graph.py       # CLI wrapper for ingestion & sync
├── knowledge_base.py         # Python programmatic API for AI agents
├── core/                     # Core synchronization & database pipelines
├── parsers/                  # Code AST and Git history parsers
├── extractors/               # AI metadata extraction & enrichment loops
├── community/                # Graph clustering and community summarization
├── query/                    # Hybrid search and context synthesis engine
├── updater/                  # Filesystem watcher & Git hook scripts
├── visualization/            # FastAPI + React visualization dashboard
├── mcp/                      # Model Context Protocol TS/JS server
└── docs/                     # Comprehensive documentation & architecture references

🚀 Installation & Setup

1. Prerequisites

  • Python 3.10+
  • Docker Desktop (running and configured)
  • Node.js 18+ (only needed to build or develop the visualization frontend)

2. Standard Installation

Install the package directly from PyPI:

pip install nelgraph

3. Local Development Setup

Clone the repository and install packages:

git clone https://github.com/anhluong447/GraphDB-Initialize.git D:\GraphRAG
cd D:\GraphRAG
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -e ./nelgraph

💻 CLI Command Reference

Execute commands from your terminal:

nelgraph init            # 1. First-time setup: launches Neo4j container, parses code, embeds and enriches codebase
nelgraph sync            # 2. Performs incremental sync to index changes since last synced commit
nelgraph sync --silent   # Run synchronization silently (ideal for git hooks)
nelgraph status          # 3. View current DB metrics, function counts, and enrichment coverage
nelgraph install-hook    # 4. Install post-commit hooks for automatic graph synchronization
nelgraph viz             # 5. Launch the local interactive visualization dashboard at http://localhost:8080

🔌 Programmatic Python API

Import nelgraph to query your codebase programmatically:

import nelgraph

# 1. Optional configuration (fallback to local .env if not specified)
nelgraph.configure(
    codebase_path="/absolute/path/to/project",
    openrouter_api_key="your-openrouter-api-key",
    commander_model="deepseek/deepseek-r1",
    worker_model="qwen/qwen3-coder-next"
)

# 2. Orient: Get high-level overview of codebase grouped by community clusters
snapshot = nelgraph.get_snapshot()
print(f"Total indexed functions: {snapshot['total']}")
for comm in snapshot["communities"]:
    print(f"Cluster: {comm['name']} - {comm['summary'][:100]}...")

# 3. Search: Retrieve relevant functions via semantic vector similarity
search_results = nelgraph.search("database connection handling", top_k=5)
for res in search_results:
    print(f"Match: {res['name']} in {res['file']} (Score: {res['score']})")

# 4. Retrieve Context: Get full signatures, calls, test plans, and raw source
ctx = nelgraph.get_function_context("execute", class_name="OrderProcessor")
print("Source Code:\n", ctx["raw_code"])
print("Parameters Input:", ctx["inputs"])
print("Test Recommendations:", ctx["test_recommendations"])
print("Exceptions Raised:", ctx["raises"])
print("Callers (Blast Radius):", ctx["callers"])

# 5. Retrieve Class: Get class hierarchy, parent classes, and child methods
class_ctx = nelgraph.get_class_context("BaseController")
print("Parent classes:", class_ctx["parent_classes"])
print("Class methods:", class_ctx["methods"])

# 6. Save Context: Export large context files to bypass terminal encoding limits on Windows
nelgraph.dump_context_to_file("execute", "context_export.md", format="markdown")

# 7. Mark Tested: Persist unit test completion status directly into Neo4j
nelgraph.mark_tested("execute", file="src/processors/order.py")

# 8. Autonomous Test Generation: Trigger the Commander-Worker self-healing loop
report = nelgraph.run_test_generation(
    target="execute", 
    mode="unit", 
    file="src/processors/order.py"
)
print("Test Generation Summary:", report["summary"])
print("Bugs Found:", report["bugs_found"])

📊 Interactive Visualization Dashboard

Launch the visual explorer:

nelgraph viz

This starts a FastAPI backend and loads the React dashboard at http://localhost:8080.

Physical Layout Optimizations

To ensure complex codebases are easy to explore, the visualizer uses customized D3 force simulations:

  • Anti-Overlap Collision: Uses forceCollide to treat each node as a physical circle with a safety margin (radius + 14px), which keeps node labels and icons from overlapping.
  • Compact Peripheries: Restricts many-body repulsion (charge) to a maximum radius using distanceMax(250). This prevents disconnected files and external libraries from floating away into infinity, keeping them compactly structured around the main clusters.
  • Stretched Clusters: Adjusts default link distances to 80px, spreading out highly connected clusters for clean visibility.

🧠 Autonomous Test Generation & Self-Healing

nelgraph includes an agentic, dual-model test generation pipeline designed to autonomously build, execute, and fix test suites:

1. Dual-Agent Architecture

  • Commander (deepseek/deepseek-r1): Analyzes the graph structure, dependencies, and imports to outline a structured JSON test plan. If tests fail, it acts as a diagnostics agent to differentiate between test logic errors and real codebase bugs.
  • Worker (qwen/qwen3-coder-next): Generates runnable test code using the specified framework (pytest/jest/vitest) based on the Commander's plan, and applies fixes based on the Commander's diagnoses.

2. Self-Healing Loop

When a generated test fails:

  1. The execution error output is caught and sent to the Commander.
  2. The Commander diagnoses the root cause. If it is a "real_bug" in the codebase, it logs a bug report. If it is a "test_error", it generates specific fix instructions.
  3. The Worker receives the instructions and regenerates the corrected test code.
  4. The cycle repeats for up to MAX_HEAL_RETRIES (default: 3).
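The loop above can be sketched as a small control function. This is only an outline under stated assumptions: the three callables stand in for the sandbox runner, the Commander, and the Worker, their signatures are hypothetical, and stopping as soon as a "real_bug" is diagnosed is an assumption rather than documented behavior.

```python
MAX_HEAL_RETRIES = 3


def heal_test(test_code, run_test, diagnose, regenerate,
              max_retries=MAX_HEAL_RETRIES):
    """Run a generated test and repair it until it passes or retries run out.

    run_test(code) -> (passed, error_output)
    diagnose(code, error_output) -> {"verdict": ..., "instructions": ...}
    regenerate(code, instructions) -> corrected test code
    """
    bugs = []
    attempts = 0
    while True:
        passed, error_output = run_test(test_code)
        if passed:
            status = "passed"
            break
        diagnosis = diagnose(test_code, error_output)
        if diagnosis["verdict"] == "real_bug":
            # The test is right and the codebase is wrong: report, do not fix.
            bugs.append(error_output)
            status = "bug_found"
            break
        if attempts >= max_retries:
            status = "gave_up"
            break
        test_code = regenerate(test_code, diagnosis["instructions"])
        attempts += 1
    return {"status": status, "attempts": attempts, "code": test_code, "bugs": bugs}


# Stand-ins for the sandbox, Commander, and Worker (illustration only).
def fake_run(code):
    passed = "== 4" in code
    return passed, "" if passed else "AssertionError: 2 + 2 != 5"


def fake_diagnose(code, error_output):
    return {"verdict": "test_error", "instructions": "expected value should be 4"}


def fake_regenerate(code, instructions):
    return code.replace("== 5", "== 4")


result = heal_test("assert 2 + 2 == 5", fake_run, fake_diagnose, fake_regenerate)
print(result["status"], result["attempts"])  # passed 1
```

Passing the agents in as callables keeps the retry policy testable without any LLM calls.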

3. Incremental Synchronization & Customization Protection

An incremental synchronization registry (.graphrag_data/test_registry.json) tracks both source function hashes and generated test file hashes.

  • If the source code of a function changes, its test is regenerated.
  • If a developer manually edits or customizes a generated test file, nelgraph detects the hash mismatch and skips regeneration to prevent overwriting user modifications.
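A minimal sketch of the two-hash decision follows. It assumes each registry entry stores a source_hash and a test_hash as SHA-256 hex digests, and that customization protection takes priority when both hashes differ. The field names, the action labels, and the decide_action helper are hypothetical; the real layout of test_registry.json is not specified here.

```python
import hashlib


def sha256_text(text):
    """Return the SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def decide_action(entry, source_code, test_code):
    """Decide what to do with one function's generated test.

    entry: registry record with "source_hash" and "test_hash", or None if
           the function has never been processed.
    test_code: current contents of the generated test file, or None if the
               file does not exist.
    """
    if entry is None or test_code is None:
        return "generate"
    if sha256_text(test_code) != entry["test_hash"]:
        # The developer edited the generated test: never overwrite it.
        return "skip_customized"
    if sha256_text(source_code) != entry["source_hash"]:
        return "regenerate"
    return "up_to_date"


source_v1 = "def execute():\n    return 1\n"
test_v1 = "def test_execute():\n    assert execute() == 1\n"
entry = {"source_hash": sha256_text(source_v1), "test_hash": sha256_text(test_v1)}

print(decide_action(entry, source_v1, test_v1))                            # up_to_date
print(decide_action(entry, source_v1 + "# edit\n", test_v1))               # regenerate
print(decide_action(entry, source_v1 + "# edit\n", test_v1 + "# mine\n"))  # skip_customized
```

Checking the test hash before the source hash is what prevents a source change from overwriting a hand-edited test.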

🧪 Automated Testing Environment

The workspace includes a complete testing setup for both Frontend (React) and Backend (FastAPI).

1. Frontend UI Tests

Uses Vitest + React Testing Library + jsdom to test React components.

  • Location: nelgraph/nelgraph/visualization/frontend/
  • Execution:
    cd nelgraph/nelgraph/visualization/frontend
    npm run test          # Run once
    npm run test:watch    # Run in watch mode
    
  • Test Coverage:
    • DetailPanel.test.jsx: Verifies metadata cards, list rendering of complex JSON structures (resolves Error 31), and chip navigation.
    • GraphView.test.jsx: Mocks canvas elements, tests filter switching, and verifies that dangling links are filtered out.
    • ErrorBoundary.test.jsx: Verifies that fallback panels render and that error logs are sent to the API via POST.

2. Backend Integration Tests

Uses pytest to verify FastAPI API routes.

  • Location: nelgraph/tests/
  • Execution:
    cd nelgraph
    pytest -v tests/
    
  • Test Coverage:
    • conftest.py: Configures the mock_neo4j fixture to intercept get_client calls, bypassing live database requirements.
    • test_api.py: Validates /status, /log, /node/{name}, and /node/{name}/mark_tested, and checks dangling edge filtering in /graph/full.

🤖 Agent Skill Integration

When nelgraph init runs, it generates .agents/nelgraph/SKILL.md. This file contains strict instructions, workflows, and API descriptions that downstream LLM coding agents can load. Agents reading this file are instructed to:

  1. Always run synchronization (nelgraph.run_sync()) before taking actions.
  2. Read overall project structure via get_snapshot() rather than scanning directory trees.
  3. Query source code via get_function_context()["raw_code"] instead of opening files directly.
  4. Inspect ctx["callers"] to calculate change blast radii before refactoring.
  5. Use test_recommendations as a baseline blueprint for test writing.
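For step 4, a blast radius can be estimated by walking callers breadth-first. The sketch below is a hedged illustration: it assumes callers can be obtained as a list of function names, and it uses an in-memory dictionary in place of real graph queries. The blast_radius helper and the sample call graph are hypothetical; in practice the lookup might wrap get_function_context(name)["callers"], whose exact shape is not documented here.

```python
from collections import deque


def blast_radius(target, callers_of, max_depth=3):
    """Return {function_name: depth} for direct and transitive callers.

    callers_of(name) must return an iterable of names that call `name`.
    Depth 1 means a direct caller of the target.
    """
    seen = {}
    queue = deque([(target, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue
        for caller in callers_of(name):
            # Skip already-visited nodes so call cycles terminate.
            if caller not in seen and caller != target:
                seen[caller] = depth + 1
                queue.append((caller, depth + 1))
    return seen


# Hypothetical caller map: key is a function, value is who calls it.
call_graph = {
    "execute": ["process_order", "retry_order"],
    "process_order": ["handle_request"],
    "retry_order": ["handle_request", "execute"],  # cycle back to the target
    "handle_request": [],
}

radius = blast_radius("execute", lambda name: call_graph.get(name, []))
print(radius)  # {'process_order': 1, 'retry_order': 1, 'handle_request': 2}
```

The max_depth cap bounds the number of lookups on large graphs, at the cost of missing very distant callers.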

Download files

Source Distribution: nelgraph-1.1.16.tar.gz (2.9 MB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10
  • SHA256: 6baad099946bd6748cbbdf46da029fe4e0f72118b3f1599781106d380f8ff053
  • MD5: 5b03f5ad2a3396134a19d57df2f4d7b3
  • BLAKE2b-256: 6f21c986ecde3cd0d594c4e56efe4ea7adcbadda7f10faf1256dd706b6319545

Built Distribution: nelgraph-1.1.16-py3-none-any.whl (2.9 MB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10
  • SHA256: 7e1ae02337dc4d9188d232a6c92d5cffe3efc3957568be6ae689e638fe945465
  • MD5: 2b894ccefeca9849df3b1c16ad298f45
  • BLAKE2b-256: da2f25d7b6d6026c5ee4211d6ae397d5445a6144b3b0c504ae2a028f4488af5d
