Helix: Temporal GraphRAG combining LightRAG and Graphiti for time-aware knowledge graphs

🧬 Helix: Temporal GraphRAG

LightRAG + Graphiti = Temporal Knowledge Graphs for RAG


🎯 What is Helix?

Helix combines LightRAG's dual-level retrieval with Graphiti's bi-temporal knowledge graph to build a RAG system with the following capabilities:

| Feature | Capability |
|---|---|
| Temporal Awareness | Point-in-time queries, automatic edge invalidation |
| Multi-Hop Reasoning | BFS-based path exploration with scoring |
| Hallucination Detection | Composite Fidelity Index (CFI) verification |
| Incremental Updates | No full graph rebuild required |
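
To make "point-in-time queries" and "automatic edge invalidation" concrete, here is a minimal, self-contained sketch. It is not Helix's or Graphiti's actual schema: the `Edge` class, `add_edge`, and `facts_at` are hypothetical names, and the sketch covers only the validity-time axis of a bi-temporal model (it does not track when a fact was recorded).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Edge:
    """A fact with a validity interval (hypothetical, not Helix's schema)."""
    subject: str
    relation: str
    obj: str
    valid_at: datetime                     # when the fact became true
    invalid_at: Optional[datetime] = None  # None means "still true"


def add_edge(edges: list[Edge], new: Edge) -> None:
    """Append a fact, closing any open fact for the same subject and relation."""
    for edge in edges:
        same_slot = (edge.subject, edge.relation) == (new.subject, new.relation)
        if same_slot and edge.invalid_at is None and edge.valid_at < new.valid_at:
            edge.invalid_at = new.valid_at  # invalidate rather than delete
    edges.append(new)


def facts_at(edges: list[Edge], when: datetime) -> list[Edge]:
    """Return the facts whose validity interval contains `when`."""
    return [
        e for e in edges
        if e.valid_at <= when and (e.invalid_at is None or when < e.invalid_at)
    ]


edges: list[Edge] = []
add_edge(edges, Edge("Apple", "has_ceo", "Steve Jobs", datetime(1997, 9, 16)))
add_edge(edges, Edge("Apple", "has_ceo", "Tim Cook", datetime(2011, 8, 24)))

print([e.obj for e in facts_at(edges, datetime(2005, 1, 1))])  # ['Steve Jobs']
print([e.obj for e in facts_at(edges, datetime(2015, 1, 1))])  # ['Tim Cook']
```

Because the older edge is closed rather than removed, a new fact can be added without rebuilding the rest of the graph, and earlier points in time remain queryable.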

📊 Benchmark Targets

| Category | Datasets | Metrics | Target | Baseline |
|---|---|---|---|---|
| Temporal | TSQA, Time-LongQA, ECT-QA, MultiTQ | Hit@1, Hit@5, Acc | 70-75% | 45-55% |
| Hallucination | Legal QA, Medical QA, FEVER | AUC, CFI | >0.95 | 0.84-0.94 |
| Multi-Hop | MuSiQue, 2WikiMHQA, HotpotQA | F1, EM | 70-75 | 54-59 |
| Scalability | UltraDomain (all) | Tokens, Latency | <600K tokens | 14M tokens |
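
For readers unfamiliar with the metrics in the table, the following sketch shows how Hit@k, exact match (EM), and token-level F1 are commonly computed. It illustrates the standard definitions only; the function names are hypothetical, and Helix's evaluation scripts may normalize answers differently.

```python
from collections import Counter


def hit_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """1.0 if any of the top-k ranked answers is a gold answer, else 0.0."""
    return float(any(answer in gold for answer in ranked[:k]))


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the strings match after trimming and lowercasing."""
    return float(prediction.strip().lower() == gold.strip().lower())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


ranked = ["1911", "1912", "1954"]
print(hit_at_k(ranked, {"1912"}, k=1))             # 0.0
print(hit_at_k(ranked, {"1912"}, k=5))             # 1.0
print(exact_match("Alan Turing", "alan turing"))   # 1.0
print(round(token_f1("Alan Mathison Turing", "Alan Turing"), 2))  # 0.8
```

A benchmark score is then the mean of the per-question values over the dataset.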

📦 Installation

From PyPI

pip install helix-rag

From Source (Development)

git clone https://github.com/YashNuhash/Helix.git
cd Helix

# Install with Helix dependencies
pip install -e ".[helix]"

Dependencies

Helix requires:

  • Neo4j (for Graphiti Knowledge Graph)
  • Supabase (optional, for vector storage)
  • LLM API (any provider - configured via environment)

โš™๏ธ Configuration

Copy .env.example to .env and configure:

cp .env.example .env

Required Environment Variables

# Neo4j Configuration (for Graphiti)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password

# LLM Configuration (model-agnostic)
LLM_MODEL_NAME=your_model_name
LLM_API_KEY=your_api_key

# Supabase (optional)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your_key
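
As a sanity check before starting Helix, the variables above can be validated in a few lines. This is a minimal sketch: the `Settings` class and `load_settings` function are hypothetical helpers, not part of the Helix API, and the sketch assumes the values have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Names taken from the variable list above.
REQUIRED = (
    "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD",
    "LLM_MODEL_NAME", "LLM_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; Helix's own config object may differ."""
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    llm_model_name: str
    llm_api_key: str
    supabase_url: Optional[str] = None  # optional vector storage
    supabase_key: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail early with one message listing every missing required variable."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        neo4j_uri=env["NEO4J_URI"],
        neo4j_username=env["NEO4J_USERNAME"],
        neo4j_password=env["NEO4J_PASSWORD"],
        llm_model_name=env["LLM_MODEL_NAME"],
        llm_api_key=env["LLM_API_KEY"],
        supabase_url=env.get("SUPABASE_URL") or None,
        supabase_key=env.get("SUPABASE_KEY") or None,
    )


example = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "your_password",
    "LLM_MODEL_NAME": "your_model_name",
    "LLM_API_KEY": "your_api_key",
}
settings = load_settings(example)
print(settings.neo4j_uri)     # bolt://localhost:7687
print(settings.supabase_url)  # None
```

Passing a plain mapping instead of reading `os.environ` directly keeps the check easy to test.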

Supabase Setup (Optional)

Run scripts/supabase_schema.sql in your Supabase SQL Editor to create the vector storage table.


🚀 Quick Start

Basic Usage

import asyncio
from helix import Helix

async def main():
    # Initialize Helix
    async with Helix() as helix:
        # Insert document with temporal tracking
        result = await helix.insert(
            "Alan Turing was born on June 23, 1912. "
            "He is considered the father of computer science.",
            source_description="Wikipedia"
        )
        print(f"Extracted {result['entities_extracted']} entities")
        
        # Query with temporal awareness
        answer = await helix.query(
            "When was Alan Turing born?",
            mode="hybrid"
        )
        print(answer["answer"])

asyncio.run(main())

Temporal Queries

import asyncio
from datetime import datetime
from helix import Helix
from helix.utils import is_temporal_query, extract_temporal_params

async def temporal_example():
    async with Helix() as helix:
        # Detect temporal intent
        query = "Who was the CEO of Apple in 2015?"
        
        if is_temporal_query(query):
            params = extract_temporal_params(query)
            print(f"Temporal query detected: {params.temporal_keywords}")
        
        # Query with point-in-time context
        result = await helix.query(
            query,
            valid_at=datetime(2015, 1, 1),
            include_temporal_context=True
        )
        print(result)

asyncio.run(temporal_example())
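
To show what temporal intent detection can look like, here is a standalone sketch based on keyword matching and a year regex. It is an assumption about the general technique, not the implementation behind `is_temporal_query` or `extract_temporal_params`; the names `find_temporal_keywords`, `extract_year`, and `looks_temporal` and the keyword list are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Four-digit years from 1000 to 2099 (an arbitrary range for illustration).
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")
KEYWORDS = ("when", "before", "after", "during", "since", "until")


def find_temporal_keywords(query: str) -> list[str]:
    """Return the temporal cue words that appear in the query, in order."""
    words = re.findall(r"[a-z]+", query.lower())
    return [word for word in words if word in KEYWORDS]


def extract_year(query: str) -> Optional[datetime]:
    """Map the first year mentioned to January 1 of that year, if any."""
    match = YEAR.search(query)
    return datetime(int(match.group(1)), 1, 1) if match else None


def looks_temporal(query: str) -> bool:
    """A query counts as temporal if it has a cue word or an explicit year."""
    return bool(find_temporal_keywords(query)) or extract_year(query) is not None


print(looks_temporal("Who was the CEO of Apple in 2015?"))  # True
print(extract_year("Who was the CEO of Apple in 2015?"))    # 2015-01-01 00:00:00
print(find_temporal_keywords("When was Alan Turing born?"))  # ['when']
print(looks_temporal("Tell me about Alan Turing"))           # False
```

The extracted date could then be passed as the `valid_at` argument shown in the example above.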

Hallucination Detection

import asyncio

from helix import Helix
from helix.hallucination import HallucinationDetector

async def verify_response():
    async with Helix() as helix:
        detector = HallucinationDetector(graphiti=helix.graphiti)
        
        # Get response
        result = await helix.query("Tell me about Alan Turing")
        
        # Verify against knowledge graph
        verification = await detector.verify_response(
            response=result["answer"],
            query="Tell me about Alan Turing",
            context=result.get("temporal_context")
        )
        
        print(f"Grounded: {verification.is_grounded}")
        print(f"CFI Score: {verification.confidence_score:.2f}")
        print(f"Entity Coverage: {verification.entity_coverage:.2%}")

asyncio.run(verify_response())
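
The entity coverage value printed above can be understood as the share of entities in a response that the knowledge graph also contains. The sketch below illustrates that idea under stated assumptions: the CFI formula is not given here, so the weighted average in `composite_score`, the `relation_support` signal, the weights, and the 0.5 threshold are all hypothetical.

```python
def entity_coverage(response_entities: set[str], graph_entities: set[str]) -> float:
    """Fraction of response entities that also appear in the graph."""
    response = {name.lower() for name in response_entities}
    graph = {name.lower() for name in graph_entities}
    if not response:
        return 1.0  # design choice: no entities means nothing to contradict
    return len(response & graph) / len(response)


def composite_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of verification signals (illustrative, not the CFI)."""
    total = sum(weights.values())
    return sum(weights[name] * signals[name] for name in weights) / total


graph_entities = {"Alan Turing", "Enigma", "Bletchley Park"}
response_entities = {"Alan Turing", "Enigma", "Princeton"}

coverage = entity_coverage(response_entities, graph_entities)
score = composite_score(
    signals={"entity_coverage": coverage, "relation_support": 0.5},
    weights={"entity_coverage": 0.6, "relation_support": 0.4},
)
is_grounded = score >= 0.5  # hypothetical threshold

print(f"Entity coverage: {coverage:.2%}")  # Entity coverage: 66.67%
print(f"Composite score: {score:.2f}")     # Composite score: 0.60
print(f"Grounded: {is_grounded}")          # Grounded: True
```

Here "Princeton" is the unsupported entity: it appears in the response but not in the graph, which lowers coverage to two out of three.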

Multi-Hop Reasoning

import asyncio

from helix import Helix
from helix.multihop import MultiHopRetriever

async def multihop_example():
    async with Helix() as helix:
        retriever = MultiHopRetriever(graphiti=helix.graphiti)
        
        # Find reasoning paths
        paths = await retriever.find_paths(
            query="How is Alan Turing connected to modern AI?",
            max_hops=3
        )
        
        # Format as context
        context = retriever.format_paths_as_context(paths)
        print(context)

asyncio.run(multihop_example())
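
The BFS-based path exploration with scoring mentioned earlier can be sketched on a small in-memory graph. This is an illustration of the general technique, not the code behind `MultiHopRetriever`: the graph, the edge weights, and the choice to score a path by the product of its edge weights are all assumptions.

```python
from collections import deque

# node -> list of (neighbor, edge weight in [0, 1]); weights are made up.
Graph = dict[str, list[tuple[str, float]]]


def find_paths(graph: Graph, start: str, goal: str,
               max_hops: int = 3) -> list[tuple[list[str], float]]:
    """Breadth-first search for cycle-free paths of at most `max_hops` edges."""
    results = []
    queue = deque([([start], 1.0)])
    while queue:
        path, score = queue.popleft()
        node = path[-1]
        if node == goal and len(path) > 1:
            results.append((path, score))
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up
        for neighbor, weight in graph.get(node, []):
            if neighbor not in path:  # avoid revisiting nodes
                queue.append((path + [neighbor], score * weight))
    # Highest-scoring paths first.
    return sorted(results, key=lambda item: item[1], reverse=True)


graph: Graph = {
    "Alan Turing": [("Turing machine", 0.9), ("Turing test", 0.8)],
    "Turing machine": [("Computability theory", 0.9)],
    "Turing test": [("Modern AI", 0.7)],
    "Computability theory": [("Modern AI", 0.5)],
}

for path, score in find_paths(graph, "Alan Turing", "Modern AI", max_hops=3):
    print(" -> ".join(path), round(score, 3))
```

With these weights the two-hop path through "Turing test" ranks above the three-hop path through "Turing machine", and lowering `max_hops` to 2 drops the longer path entirely.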

📈 Evaluation

Running Benchmarks

Helix includes evaluation scripts for academic benchmarks. The snippets below use notebook syntax (a !pip cell and top-level await), so run them in Google Colab or Kaggle:

# Install Helix
!pip install helix-rag

# Run temporal benchmark
from helix.eval import TemporalBenchmark

benchmark = TemporalBenchmark(dataset="time-longqa")
results = await benchmark.run()
print(f"Hit@1: {results['hit_at_1']:.2%}")

Supported Benchmarks

| Benchmark | Dataset | Command |
|---|---|---|
| Temporal | TSQA | helix eval --dataset tsqa |
| Temporal | Time-LongQA | helix eval --dataset time-longqa |
| Temporal | ECT-QA | helix eval --dataset ect-qa |
| Multi-Hop | MuSiQue | helix eval --dataset musique |
| Multi-Hop | HotpotQA | helix eval --dataset hotpotqa |
| Hallucination | FEVER | helix eval --dataset fever |
| Scalability | UltraDomain | helix eval --dataset ultradomain |

Colab/Kaggle Notebook

# Quick evaluation notebook
import os
os.environ["LLM_API_KEY"] = "your_key"
os.environ["LLM_MODEL_NAME"] = "your_model"
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"

from helix import Helix
from helix.eval import run_all_benchmarks

# Run all benchmarks
results = await run_all_benchmarks()
print(results.to_dataframe())

๐Ÿ—๏ธ Architecture

┌─────────────────────────────────────────────────────────────┐
│                            Helix                            │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│  │   LightRAG  │  │   Graphiti   │  │  Helix Modules    │   │
│  │  (Retrieval)│  │ (Temporal KG)│  │                   │   │
│  ├─────────────┤  ├──────────────┤  ├───────────────────┤   │
│  │ - Chunking  │  │ - Episodes   │  │ - TemporalHandler │   │
│  │ - Embedding │  │ - Bi-temporal│  │ - Hallucination   │   │
│  │ - Vector DB │  │ - Resolution │  │ - MultiHop        │   │
│  │ - Dual-level│  │ - Invalidate │  │ - CFI Scoring     │   │
│  └──────┬──────┘  └──────┬───────┘  └─────────┬─────────┘   │
│         │                │                    │             │
│         └────────────────┼────────────────────┘             │
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                    Storage Layer                    │    │
│  ├─────────────────────────────────────────────────────┤    │
│  │  Neo4j (Graph)  │  Supabase (Vector)  │  Local KV   │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

๐Ÿ“ Project Structure

helix/
├── __init__.py           # Package entry (v0.1.1)
├── core/
│   └── helix.py          # Main Helix class
├── storage/
│   ├── graphiti_impl.py  # GraphitiStorage
│   └── supabase_impl.py  # SupabaseVectorStorage
├── temporal/
│   └── query_handler.py  # TemporalQueryHandler
├── hallucination/
│   └── detector.py       # HallucinationDetector (CFI)
├── multihop/
│   └── retriever.py      # MultiHopRetriever (BFS)
└── utils/
    └── temporal_utils.py # Temporal parsing

🔬 Research Goals

Helix aims for state-of-the-art performance in four areas:

  1. Temporal GraphRAG: 70-75% accuracy on temporal QA benchmarks
  2. Hallucination Detection: AUC >0.95 using graph-aligned verification
  3. Multi-Hop Reasoning: F1 70-75 on complex reasoning benchmarks
  4. Scalability: <600K tokens for indexing (vs 14M baseline)

See PLAN.md for detailed research methodology.


📚 Citation

If you use Helix in your research, please cite:

@software{helix2024,
  title = {Helix: Temporal GraphRAG with LightRAG and Graphiti},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/YashNuhash/Helix}
}

๐Ÿค Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.


📄 License

MIT License - see LICENSE for details.


Built with 🧬 Helix

LightRAG + Graphiti = Temporal GraphRAG
