ManasRAG

Hierarchical Retrieval-Augmented Generation with Haystack

ManasRAG implements HiRAG, a hierarchical knowledge retrieval approach, on top of the Haystack framework. It combines knowledge graphs with community-based summarization to improve RAG systems.

Features

  • Hierarchical Knowledge Structure: Uses Leiden clustering to build multi-level community hierarchies
  • Multiple Retrieval Modes:
    • naive: Basic RAG with document chunks
    • local: Local entity and relationship knowledge
    • global: Global community report knowledge
    • bridge: Cross-community reasoning paths
    • nobridge: Local + global combined (no paths)
    • hi: Full hierarchical retrieval combining all modes
  • Flexible Storage: Supports NetworkX (in-memory) and Neo4j graph databases
  • Haystack Integration: Built on Haystack's component and pipeline architecture
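To make the idea of a multi-level community hierarchy concrete, here is a minimal, stdlib-only sketch. It does not use Leiden: as a stand-in it takes connected components at decreasing edge-weight thresholds, which also yields fine communities at level 0 and coarser ones above. The function names, the toy graph, and the thresholds are illustrative assumptions, not part of ManasRAG.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into components with union-find over the given edges."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort for a deterministic order of communities.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def build_hierarchy(nodes, weighted_edges, thresholds):
    """Return one partition per level; lower thresholds give coarser communities."""
    levels = []
    for t in thresholds:
        kept = [(a, b) for a, b, w in weighted_edges if w >= t]
        levels.append(connected_components(nodes, kept))
    return levels


# Toy entity graph (hypothetical data).
nodes = ["ML", "Neural Networks", "Backprop", "Graphs", "Leiden"]
edges = [
    ("ML", "Neural Networks", 0.9),
    ("Neural Networks", "Backprop", 0.8),
    ("Graphs", "Leiden", 0.7),
    ("ML", "Graphs", 0.2),
]
levels = build_hierarchy(nodes, edges, thresholds=[0.5, 0.1])
for depth, communities in enumerate(levels):
    print(depth, [sorted(c) for c in communities])
```

Level 0 keeps only strong edges and yields two communities; level 1 also admits the weak ML-Graphs edge and merges them into one. A real Leiden run optimizes a quality function rather than thresholding, so treat this only as a picture of the resulting data shape.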

Installation

# Basic installation (editable, from a source checkout)
pip install -e .

# With OpenAI support
pip install -e ".[openai]"

# With Neo4j support
pip install -e ".[neo4j]"

# All optional dependencies
pip install -e ".[all]"

Configuration

Environment Variables

The project supports loading environment variables from a .env file. Copy the example file and configure it:

cp .env.example .env

Edit .env and add your API key:

OPENAI_API_KEY=your-openai-api-key-here

# Optional: Custom API base URL
# OPENAI_BASE_URL=https://api.openai.com/v1

The examples will automatically load environment variables from the .env file.
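The README does not say which loader the examples use (python-dotenv is a common choice). As a rough picture of what loading a .env file involves, here is a stdlib-only sketch; `load_env_file` is a hypothetical helper, and its parsing rules are simplified assumptions rather than the project's actual behavior.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Load KEY=VALUE pairs into os.environ without overriding existing values."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)  # variables already set win
    return loaded


loaded = load_env_file(".env")  # returns {} when the file is absent
print(sorted(loaded))  # print key names only, never the secret values
```

Using `setdefault` means a variable exported in the shell takes precedence over the file, which is the usual convention for .env loaders.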

Quick Start

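from manasrag import ManasRAG
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

# Initialize with OpenAI. Haystack 2.x generators take the key as a Secret,
# resolved here from the OPENAI_API_KEY environment variable.
manas = ManasRAG(
    working_dir="./manas_data",
    generator=OpenAIGenerator(
        model="gpt-4o-mini",
        api_key=Secret.from_env_var("OPENAI_API_KEY"),
    ),
)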

# Index documents
documents = """
# Machine Learning

Machine Learning is a subset of Artificial Intelligence focused on
algorithms that can learn from data...

# Neural Networks

Neural networks are computing systems inspired by biological neurons...
"""

manas.index(documents)

# Query with different modes
result = manas.query(
    "How are neural networks related to machine learning?",
    mode="hi"  # Full hierarchical retrieval
)

print(result["answer"])

Retrieval Modes

Mode      Description        Components
naive     Basic RAG          Document chunks only
local     Local knowledge    Entities + Relations + Chunks
global    Global knowledge   Community reports + Chunks
bridge    Bridge knowledge   Cross-community reasoning paths
nobridge  No-bridge          Local + Global (no paths)
hi        Full hierarchical  All components combined
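The table can be read as set arithmetic over context components. The sketch below encodes it that way; the component names and the `components_for` helper are labels chosen here for illustration and are not ManasRAG identifiers.

```python
# Component sets taken from the table above (names are illustrative).
LOCAL = frozenset({"entities", "relations", "chunks"})
GLOBAL = frozenset({"community_reports", "chunks"})
BRIDGE = frozenset({"reasoning_paths"})

MODE_COMPONENTS = {
    "naive": frozenset({"chunks"}),
    "local": LOCAL,
    "global": GLOBAL,
    "bridge": BRIDGE,
    "nobridge": LOCAL | GLOBAL,      # local + global, no paths
    "hi": LOCAL | GLOBAL | BRIDGE,   # everything combined
}


def components_for(mode: str) -> frozenset:
    """Return the context components a retrieval mode draws on."""
    try:
        return MODE_COMPONENTS[mode]
    except KeyError:
        valid = ", ".join(sorted(MODE_COMPONENTS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None


print(sorted(components_for("nobridge")))
# ['chunks', 'community_reports', 'entities', 'relations']
```

Seen this way, nobridge is exactly hi minus the reasoning paths, which helps when deciding whether cross-community paths are worth the extra retrieval cost for a given query.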

Advanced Usage

Custom Query Parameters

from manasrag import QueryParam

param = QueryParam(
    mode="hi",
    top_k=20,           # Number of entities to retrieve
    top_m=10,           # Key entities per community
    max_token_for_text_unit=20000,
    response_type="Multiple Paragraphs",
)

result = manas.query("Your query here", param=param)
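The README does not document how `max_token_for_text_unit` is applied. One plausible reading, shown purely as an assumption, is a budget that caps how much chunk text enters the context. The sketch below uses a crude whitespace token count instead of a real tokenizer, and `truncate_to_token_budget` is a hypothetical helper.

```python
def truncate_to_token_budget(chunks, max_tokens):
    """Keep chunks in order until the approximate token budget is exhausted."""
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())  # whitespace tokens, only an approximation
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        used += cost
    return kept


chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(truncate_to_token_budget(chunks, max_tokens=5))
# ['alpha beta gamma', 'delta epsilon']
```

Under this reading, a lower budget trades recall for a shorter prompt; the default shown above (20000) would rarely bind for small corpora.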

Using a Custom LLM

from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="HuggingFaceH4/zephyr-7b-beta"
)

manas = ManasRAG(generator=generator)

Accessing Communities

# After indexing, access detected communities
for comm_id, community in manas.communities.items():
    print(f"Community: {community.title}")
    print(f"Entities: {len(community.nodes)}")
    print(f"Report: {community.report_string[:200]}...")

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      Indexing Pipeline                       │
├──────────────────────────────────────────────────────────────┤
│  Documents → Splitter → EntityExtractor → GraphDocumentStore │
│                                    ↓                         │
│                            CommunityDetector                 │
│                                    ↓                         │
│                        CommunityReportGenerator              │
└──────────────────────────────────────────────────────────────┘
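The indexing stages in the diagram can be pictured as a chain of plain functions. The sketch below is a toy, offline stand-in for each stage: capitalized words replace LLM entity extraction, connected components replace Leiden, and a template string replaces the LLM-written report. None of these function names come from ManasRAG.

```python
import re
from collections import defaultdict
from itertools import combinations


def split(text):
    """Splitter stand-in: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def extract_entities(chunk):
    """EntityExtractor stand-in: capitalized words instead of an LLM call."""
    return sorted(set(re.findall(r"\b[A-Z][a-z]+\b", chunk)))


def build_graph(chunks):
    """GraphDocumentStore stand-in: link entities that co-occur in a chunk."""
    graph = defaultdict(set)
    for chunk in chunks:
        entities = extract_entities(chunk)
        for entity in entities:
            graph[entity]  # register isolated entities too
        for a, b in combinations(entities, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def detect_communities(graph):
    """CommunityDetector stand-in: connected components, not Leiden."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def report(community):
    """CommunityReportGenerator stand-in: a template, not an LLM summary."""
    return f"Community (size {len(community)}): {', '.join(community)}"


text = "Alice trained a model with Bob.\n\nCarol maintains the graph store."
communities = detect_communities(build_graph(split(text)))
reports = [report(c) for c in communities]
print(reports)
```

Each function consumes only the previous stage's output, which mirrors how the diagram's components would be wired in sequence.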

┌──────────────────────────────────────────────────────────────┐
│                        Query Pipeline                        │
├──────────────────────────────────────────────────────────────┤
│  Query → EntityRetriever → HierarchicalRetriever             │
│                            ↓                                 │
│                      ContextBuilder                          │
│                            ↓                                 │
│                       PromptBuilder                          │
│                            ↓                                 │
│                       ChatGenerator → Answer                 │
└──────────────────────────────────────────────────────────────┘
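The query flow in the diagram can likewise be sketched as a few composed functions. This is an offline toy under stated assumptions: substring matching stands in for entity retrieval, a dictionary lookup stands in for hierarchical retrieval, and `generate` is any callable from prompt to text. The data and function names are hypothetical, not ManasRAG's API.

```python
# Toy index (hypothetical data standing in for the graph store).
ENTITIES = {
    "neural networks": "Computing systems inspired by biological neurons.",
    "machine learning": "A subset of AI focused on algorithms that learn from data.",
}
COMMUNITY_OF = {"neural networks": "c0", "machine learning": "c0"}
REPORTS = {"c0": "Neural networks are one family of machine learning models."}


def retrieve_entities(query, top_k=20):
    """EntityRetriever stand-in: substring match instead of embedding search."""
    q = query.lower()
    return [name for name in ENTITIES if name in q][:top_k]


def retrieve_hierarchy(entities):
    """HierarchicalRetriever stand-in: reports of the communities that were hit."""
    community_ids = sorted({COMMUNITY_OF[e] for e in entities})
    return [REPORTS[c] for c in community_ids]


def build_context(entities, reports):
    """ContextBuilder stand-in: flatten retrieved items into text."""
    lines = ["Entities:"] + [f"- {e}: {ENTITIES[e]}" for e in entities]
    lines += ["Community reports:"] + [f"- {r}" for r in reports]
    return "\n".join(lines)


def build_prompt(query, context):
    """PromptBuilder stand-in: a fixed template."""
    return f"Answer using only the context.\n\n{context}\n\nQuestion: {query}"


def answer(query, generate):
    """Run the stages in the order shown in the diagram."""
    entities = retrieve_entities(query)
    reports = retrieve_hierarchy(entities)
    return generate(build_prompt(query, build_context(entities, reports)))


# An echo stub keeps the sketch offline; swap in a real generator call as needed.
prompt_seen = answer(
    "How are neural networks related to machine learning?",
    generate=lambda prompt: prompt,
)
print(prompt_seen)
```

Passing the generator in as a callable keeps retrieval testable without network access, which is the same separation the pipeline diagram implies between ContextBuilder and ChatGenerator.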

Project Structure

manasrag/
├── core/           # Core data structures
├── stores/         # Graph storage backends
├── components/     # Haystack components
├── pipelines/      # Indexing and query pipelines
└── __init__.py     # High-level API

License

MIT

Acknowledgments

Based on HiRAG by Haoyu Huang et al.
