GraphRAG knowledge base for codebases

nelgraph 🚀

An autonomous, zero-configuration GraphRAG (Graph Retrieval-Augmented Generation) knowledge base builder and semantic search engine optimized for local codebases and autonomous AI coding agents.

It automatically parses source code, builds Abstract Syntax Tree (AST) call graphs, resolves class hierarchies, maps Git commit histories, and ingests them into a unified hybrid database system (Neo4j for structural graph relations + ChromaDB for vector semantic indexes) powered by DeepSeek V4-Flash.


📖 Table of Contents

  1. Core Philosophy
  2. Key Features
  3. Technology Stack
  4. Project Structure
  5. Installation & Setup
  6. CLI Command Reference
  7. Programmatic Python API
  8. Interactive Visualization Dashboard
  9. Automated Testing Environment
  10. Agent Skill Integration

💡 Core Philosophy

AI coding agents struggle with large codebases because reading raw files is slow, expensive, and lacks structural context. nelgraph bridges this gap:

  • Graph-Driven Navigation: Instead of searching files blindly, agents query a structured knowledge graph to instantly understand function calls, dependencies, and inheritance paths.
  • Isolated Zero-Config Storage: All database assets, environment details, and sync status profiles are stored locally in a hidden directory inside the target codebase (.graphrag_data/). There is no centralized server to maintain or to cause conflicts.
  • Dynamic Code Resolution: Rather than duplicating source code into Neo4j (which bloats caches and causes drift), the graph maps methods to exact coordinates and code fingerprints. Code is loaded dynamically from disk on demand, with an auto-recovery parser that corrects coordinates if lines shift due to local edits.
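
The coordinate-plus-fingerprint idea can be sketched in a few lines. This is only an illustration under stated assumptions, not nelgraph's actual implementation: the CodeRef record, the SHA-256 fingerprint, and the sliding-window recovery are all hypothetical choices made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CodeRef:
    """Hypothetical graph record: where a function lives and what it hashed to."""
    start: int        # 1-based first line
    end: int          # 1-based last line (inclusive)
    fingerprint: str  # SHA-256 of the function's source text


def fingerprint(lines):
    """Hash a block of source lines."""
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def resolve(ref, file_lines):
    """Return (start, end, source) for ref, re-locating it if lines shifted."""
    size = ref.end - ref.start + 1
    # Fast path: the stored coordinates still point at the same code.
    block = file_lines[ref.start - 1:ref.end]
    if fingerprint(block) == ref.fingerprint:
        return ref.start, ref.end, "\n".join(block)
    # Recovery path: slide a same-sized window over the file and compare hashes.
    for offset in range(len(file_lines) - size + 1):
        block = file_lines[offset:offset + size]
        if fingerprint(block) == ref.fingerprint:
            return offset + 1, offset + size, "\n".join(block)
    return None  # the function body itself changed; a re-parse is needed
```

If a comment is added above a function, the fast path fails and the window scan finds the same body a line lower, so the graph can update its coordinates without storing the source itself.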

✨ Key Features

  • 🧬 AST Call-Graph Parser: Powered by Tree-Sitter to parse Python, PHP, JavaScript, and TypeScript/JSX/TSX. It extracts classes, functions, complexity, input signatures, return types, and raised exceptions, and constructs precise call relationships.
  • 📦 Dynamic Import & Dependency Tracker: Maps module imports (File -[:IMPORTS]-> Module). Automatically distinguishes standard library, external packages, and internal project dependencies.
  • 🌿 Git Commit-to-Function Mapper: Parses Git commit diffs to link modified lines directly to the specific functions they affected (Commit -[:CHANGED]-> Function), enabling precise Test Impact Analysis.
  • 🔍 Hybrid Vector-Graph Queries: Combines semantic similarity search over the vector database (ChromaDB) with graph relation expansion (Neo4j) to synthesize comprehensive, multi-layered context.
  • 🔄 Git Hooks Auto-Sync: Integrates post-commit and pre-push hooks to run incremental synchronizations automatically, so the graph stays in step with the repository.
  • 🛠️ Self-Healing LLM Extraction: Combines json-repair with a self-correction feedback loop. If the LLM generates malformed JSON metadata, the system automatically feeds the errors back to the LLM so it can regenerate the output (up to 4 retries).
  • 📊 Interactive Force-Directed Dashboard: Launch a local web explorer (nelgraph viz) with optimized layout physics (node collision protection, charge range limits) to visually map and filter classes, functions, files, communities, and test coverage.
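
The self-correction loop described under Self-Healing LLM Extraction could look roughly like this. It is a minimal sketch, not the project's code: ask_llm is a hypothetical callable that takes a prompt and returns the model's text, the wording of the correction message is invented, and the json-repair pass that nelgraph applies is left out so the example needs only the standard library.

```python
import json


def extract_metadata(ask_llm, prompt, max_retries=4):
    """Ask for JSON; on a parse error, send the error back and retry."""
    message = prompt
    for _ in range(max_retries + 1):  # first attempt plus up to max_retries retries
        reply = ask_llm(message)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as err:
            # Feed the parser's complaint back so the model can correct itself.
            message = (
                f"{prompt}\n\nYour previous reply was not valid JSON "
                f"({err.msg} at position {err.pos}). "
                f"Reply with corrected JSON only.\nPrevious reply:\n{reply}"
            )
    raise ValueError(f"no valid JSON after {max_retries} retries")
```

Capping the retries keeps a persistently malformed response from looping forever or running up API costs; the caller decides what to do with the final ValueError.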

🛠️ Technology Stack

  • Parsing: Tree-Sitter (Python, PHP, JS, TS)
  • Graph Database: Neo4j (Bolt Protocol, Dockerized)
  • Vector Database: ChromaDB (Flat Vector Indexing)
  • LLM Engine: DeepSeek V4-Flash & OpenAI Text Embeddings via OpenRouter
  • Visualization Backend: FastAPI (Uvicorn)
  • Visualization Frontend: React (Vite) + react-force-graph-2d + D3 Force
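
To give a feel for what the parsing layer produces, here is a small call-graph extractor. Note the substitution: nelgraph uses Tree-Sitter so that it can handle several languages, whereas this sketch uses Python's built-in ast module as a stand-in and therefore only understands Python source. The output shape (function name mapped to called names) is an assumption for illustration.

```python
import ast


def call_graph(source):
    """Map each function name to the sorted names it calls directly."""
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = set()
            # Note: calls inside nested functions are also counted for the outer one.
            for child in ast.walk(node):
                if isinstance(child, ast.Call):
                    func = child.func
                    if isinstance(func, ast.Name):         # foo()
                        calls.add(func.id)
                    elif isinstance(func, ast.Attribute):  # obj.foo()
                        calls.add(func.attr)
            graph[node.name] = sorted(calls)
    return graph
```

Each entry in the result corresponds to a set of call edges in the graph; resolving a bare name such as read to a specific definition is the harder step that this sketch does not attempt.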

📂 Project Structure

D:\GraphRAG/
├── config.py                 # System config and environment loader
├── docker-compose.yml        # Docker orchestration for local Neo4j
├── start_all.bat             # 1-Click developer launcher for Windows
├── Makefile                  # Cross-platform orchestration tasks
├── initialize_graph.py       # CLI wrapper for ingestion & sync
├── knowledge_base.py         # Python programmatic API for AI agents
├── core/                     # Core synchronization & database pipelines
├── parsers/                  # Code AST and Git history parsers
├── extractors/               # AI metadata extraction & enrichment loops
├── community/                # Graph clustering and community summarization
├── query/                    # Hybrid search and context synthesis engine
├── updater/                  # Filesystem watcher & Git hook scripts
├── visualization/            # FastAPI + React visualization dashboard
├── mcp/                      # Model Context Protocol TS/JS server
└── docs/                     # Comprehensive documentation & architecture references
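
As a rough illustration of the job the Git history parser under parsers/ has to do (linking changed lines to functions), the sketch below maps diff hunks onto function line ranges. It is not the project's code. It assumes the diff was produced with zero context lines (git diff -U0), so hunk ranges cover only changed lines, and it assumes the function ranges come from the AST parser as a plain dict. Hunks that only delete lines are skipped, which is a known limitation of this simplified version.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,2 +10,3 @@".
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def changed_ranges(diff_text):
    """Yield (first_line, last_line) in the new file for each diff hunk."""
    for match in HUNK.finditer(diff_text):
        start = int(match.group(1))
        count = int(match.group(2)) if match.group(2) is not None else 1
        if count:  # count == 0 means the hunk only deleted lines
            yield start, start + count - 1


def changed_functions(diff_text, functions):
    """functions: {name: (start_line, end_line)}; returns touched names, sorted."""
    hit = set()
    for first, last in changed_ranges(diff_text):
        for name, (start, end) in functions.items():
            if first <= end and start <= last:  # the two line ranges overlap
                hit.add(name)
    return sorted(hit)
```

Each returned name would become the target of a Commit -[:CHANGED]-> Function edge for the commit being processed.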

🚀 Installation & Setup

1. Prerequisites

  • Python 3.10+
  • Docker Desktop (running and configured)
  • Node.js 18+ (only if building or developing the visualization frontend)

2. Standard Installation

Install the package directly from PyPI:

pip install nelgraph

3. Local Development Setup

Clone the repository and install packages:

git clone https://github.com/anhluong447/GraphDB-Initialize.git D:\GraphRAG
cd D:\GraphRAG
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -e ./nelgraph

💻 CLI Command Reference

Execute commands from your terminal:

nelgraph init            # 1. First-time setup: launches Neo4j container, parses code, embeds and enriches codebase
nelgraph sync            # 2. Performs incremental sync to index changes since last synced commit
nelgraph sync --silent   # Run synchronization silently (ideal for git hooks)
nelgraph status          # 3. View current DB metrics, function counts, and enrichment coverage
nelgraph install-hook    # 4. Install post-commit hooks for automatic graph synchronization
nelgraph viz             # 5. Launch the local interactive visualization dashboard at http://localhost:8080
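
For readers curious what a hook installer amounts to, the following sketch writes a post-commit hook that runs the silent sync command listed above. It is an assumption about the general approach, not a copy of what nelgraph install-hook does: the hook body and the function name are hypothetical, and this version overwrites any existing post-commit hook without merging.

```python
import stat
from pathlib import Path

# Hypothetical hook body: a shell script that runs the documented silent sync.
HOOK_BODY = (
    "#!/bin/sh\n"
    "# Keep the knowledge graph in step with each commit.\n"
    "nelgraph sync --silent\n"
)


def install_post_commit_hook(repo_root):
    """Write an executable .git/hooks/post-commit that runs a silent sync."""
    hooks_dir = Path(repo_root) / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise FileNotFoundError(f"{hooks_dir} not found; is this a Git repository?")
    hook = hooks_dir / "post-commit"
    # Write bytes so line endings stay as LF on every platform.
    hook.write_bytes(HOOK_BODY.encode("utf-8"))
    # Git only runs hooks that are executable (relevant on POSIX systems).
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```

Running the sync with --silent matters here because hook output is printed into the developer's terminal after every commit.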

🔌 Programmatic Python API

Import nelgraph to query your codebase programmatically:

import nelgraph

# 1. Optional configuration (falls back to the local .env if not specified)
nelgraph.configure(
    codebase_path="/absolute/path/to/project",
    openrouter_api_key="your-openrouter-api-key"
)

# 2. Orient: Get high-level overview of codebase grouped by community clusters
snapshot = nelgraph.get_snapshot()
print(f"Total indexed functions: {snapshot['total']}")
for comm in snapshot["communities"]:
    print(f"Cluster: {comm['name']} - {comm['summary'][:100]}...")

# 3. Search: Retrieve relevant functions via semantic vector similarity
search_results = nelgraph.search("database connection handling", top_k=5)
for res in search_results:
    print(f"Match: {res['name']} in {res['file']} (Score: {res['score']})")

# 4. Retrieve Context: Get full signatures, calls, test plans, and raw source
ctx = nelgraph.get_function_context("execute", class_name="OrderProcessor")
print("Source Code:\n", ctx["raw_code"])
print("Parameters Input:", ctx["inputs"])
print("Test Recommendations:", ctx["test_recommendations"])
print("Exceptions Raised:", ctx["raises"])
print("Callers (Blast Radius):", ctx["callers"])

# 5. Retrieve Class: Get class hierarchy, parent classes, and child methods
class_ctx = nelgraph.get_class_context("BaseController")
print("Parent classes:", class_ctx["parent_classes"])
print("Class methods:", class_ctx["methods"])

# 6. Save Context: Export large context files to bypass terminal encoding limits on Windows
nelgraph.dump_context_to_file("execute", "context_export.md", format="markdown")

# 7. Mark Tested: Persist unit test completion status directly into Neo4j
nelgraph.mark_tested("execute", file="src/processors/order.py")

📊 Interactive Visualization Dashboard

Launch the visual explorer:

nelgraph viz

This starts a FastAPI backend and loads the React dashboard at http://localhost:8080.

Physical Layout Optimizations

To ensure complex codebases are easy to explore, the visualizer uses customized D3 force simulations:

  • Anti-Overlap Collision: Uses forceCollide to treat each node as a physical circle with a safety margin (radius + 14px), which keeps node labels and icons from overlapping.
  • Compact Peripheries: Limits many-body repulsion (charge) to a maximum radius with distanceMax(250). This keeps disconnected files and external libraries from drifting far away from the graph, so they stay compactly arranged around the main clusters.
  • Stretched Clusters: Sets the link distance to 80px, spreading out highly connected clusters for clean visibility.
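
The two distance rules above can be expressed as simple predicates. The dashboard itself configures these through D3 in JavaScript; the Python below is only a simplified model of the geometry, using the 14px padding and 250px cutoff from the list, with node tuples of (x, y, radius) chosen for the example.

```python
import math

COLLISION_PADDING = 14     # px added to each node's radius
CHARGE_DISTANCE_MAX = 250  # px at or beyond which repulsion is ignored


def overlaps(a, b):
    """a, b: (x, y, radius). True if the padded circles intersect."""
    distance = math.hypot(a[0] - b[0], a[1] - b[1])
    return distance < (a[2] + COLLISION_PADDING) + (b[2] + COLLISION_PADDING)


def repels(a, b):
    """True if b is close enough to a to contribute to its charge force."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) < CHARGE_DISTANCE_MAX
```

With 6px nodes, two centers 30px apart count as colliding (the padded radii sum to 40px), while a node 300px away exerts no repulsion at all, which is why isolated nodes stop being pushed outward.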

🧪 Automated Testing Environment

The workspace includes a complete testing setup for both Frontend (React) and Backend (FastAPI).

1. Frontend UI Tests

Uses Vitest + React Testing Library + jsdom to test React components.

  • Location: nelgraph/nelgraph/visualization/frontend/
  • Execution:
    cd nelgraph/nelgraph/visualization/frontend
    npm run test          # Run once
    npm run test:watch    # Run in watch mode
    
  • Test Coverage:
    • DetailPanel.test.jsx: Verifies metadata cards, list rendering of complex JSON structures (resolves Error 31), and chip navigations.
    • GraphView.test.jsx: Mocks canvas elements, tests filter switching, and verifies filtering out dangling links.
    • ErrorBoundary.test.jsx: Verifies rendering fallback panels and sending POST error logs to the API.
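
The dangling-link filtering that GraphView.test.jsx checks is a small set-membership test. The frontend does this in JavaScript; the version below is a Python rendering of the same idea, assuming nodes carry an id field and links carry source and target fields (the usual force-graph convention, but an assumption about this project's payload).

```python
def drop_dangling_links(nodes, links):
    """Keep only links whose source and target both exist in nodes."""
    known = {node["id"] for node in nodes}
    return [
        link for link in links
        if link["source"] in known and link["target"] in known
    ]
```

Filtering before rendering matters because a force-graph link that points at a missing node id typically raises an error instead of being silently ignored.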

2. Backend Integration Tests

Uses pytest to verify FastAPI API routes.

  • Location: nelgraph/tests/
  • Execution:
    cd nelgraph
    pytest -v tests/
    
  • Test Coverage:
    • conftest.py: Configures mock_neo4j fixture to intercept get_client calls, bypassing live database requirements.
    • test_api.py: Validates /status, /log, /node/{name}, /node/{name}/mark_tested, and checks dangling edge filtering in /graph/full.
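
The pattern behind a mock_neo4j fixture can be sketched with unittest.mock alone. Everything named here is hypothetical: db stands in for whichever module owns get_client, function_count is an invented piece of code under test, and the Cypher string is illustrative. In a real conftest.py the helper would typically be a pytest fixture that yields the mock client.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock, patch


def _real_get_client():
    raise ConnectionError("no Neo4j server available")


# Stand-in for the module that owns get_client (hypothetical name).
db = SimpleNamespace(get_client=_real_get_client)


def function_count():
    """Code under test: asks the database how many functions are indexed."""
    record = db.get_client().run("MATCH (f:Function) RETURN count(f) AS n")
    return record["n"]


def with_mock_neo4j(test):
    """Run test(client) while db.get_client returns a MagicMock client."""
    client = MagicMock()
    with patch.object(db, "get_client", return_value=client):
        return test(client)
```

Because the patch is scoped to the with block, the real get_client is restored afterwards, so one test's mock cannot leak into the next.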

🤖 Agent Skill Integration

When nelgraph init runs, it generates .agents/nelgraph/SKILL.md. This file contains strict instructions, workflows, and API descriptions that downstream LLM coding agents can load. Agents reading this file are instructed to:

  1. Always run synchronization (nelgraph.run_sync()) before taking actions.
  2. Read overall project structure via get_snapshot() rather than scanning directory trees.
  3. Query source code via get_function_context()["raw_code"] instead of opening files directly.
  4. Inspect ctx["callers"] to calculate change blast radii before refactoring.
  5. Use test_recommendations as a baseline blueprint for test writing.
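
Step 4 can be extended from direct callers to transitive ones with a breadth-first walk. This is a sketch under assumptions: get_callers is a hypothetical callable that returns the names of a function's direct callers. With nelgraph it might wrap get_function_context(name)["callers"], but this README does not specify the shape of that value, so treat the adapter as something to verify.

```python
from collections import deque


def blast_radius(get_callers, name, max_depth=3):
    """Breadth-first walk over callers; returns {function: depth}."""
    seen = {name: 0}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        if seen[current] >= max_depth:
            continue  # do not expand beyond the depth limit
        for caller in get_callers(current):
            if caller not in seen:  # also guards against call cycles
                seen[caller] = seen[current] + 1
                queue.append(caller)
    del seen[name]  # report only the affected callers
    return seen
```

The depth value gives an agent a simple way to rank risk: depth 1 callers need review for any signature change, while deeper ones matter mainly when behavior changes.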
