Hierarchical Retrieval-Augmented Generation with Haystack
ManasRAG
ManasRAG implements HiRAG on the Haystack framework. It is a hierarchical knowledge retrieval approach that combines knowledge graphs with community-based summarization to improve RAG systems.
Features
- Hierarchical Knowledge Structure: Uses Leiden clustering to build multi-level community hierarchies
- Multiple Retrieval Modes:
  - naive: Basic RAG with document chunks
  - local: Local entity and relationship knowledge
  - global: Global community report knowledge
  - bridge: Cross-community reasoning paths
  - nobridge: Local + global combined (no paths)
  - hi: Full hierarchical retrieval combining all modes
- Flexible Storage: Supports NetworkX (in-memory) and Neo4j graph databases
- Haystack Integration: Built on Haystack's component and pipeline architecture
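The bridge mode in the list above refers to cross-community reasoning paths. As a rough illustration of the idea (not ManasRAG's actual implementation, which is not shown here), the sketch below finds a shortest path between two entities with breadth-first search and reports where that path crosses a community boundary. The toy graph, the `community_of` mapping, and the `shortest_path` helper are all invented for this example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency mapping; returns a node list or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Toy entity graph (hypothetical): two communities joined by a single edge.
graph = {
    "neural network": ["backpropagation", "machine learning"],
    "backpropagation": ["neural network"],
    "machine learning": ["neural network", "statistics"],
    "statistics": ["machine learning", "regression"],
    "regression": ["statistics"],
}
community_of = {
    "neural network": "A",
    "backpropagation": "A",
    "machine learning": "A",
    "statistics": "B",
    "regression": "B",
}

path = shortest_path(graph, "backpropagation", "regression")
# Edges whose endpoints sit in different communities are the "bridges".
crossings = [
    (a, b) for a, b in zip(path, path[1:]) if community_of[a] != community_of[b]
]
print(path)
print(crossings)
```

In this toy graph the only crossing is the edge between "machine learning" and "statistics", which is the kind of link a bridge-style retrieval step would surface.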
Installation
# Basic installation
pip install -e .
# With OpenAI support
pip install -e ".[openai]"
# With Neo4j support
pip install -e ".[neo4j]"
# All optional dependencies
pip install -e ".[all]"
Configuration
Environment Variables
The project supports loading environment variables from a .env file. Copy the example file and configure it:
cp .env.example .env
Edit .env and add your API key:
OPENAI_API_KEY=your-openai-api-key-here
# Optional: Custom API base URL
# OPENAI_BASE_URL=https://api.openai.com/v1
The examples will automatically load environment variables from the .env file.
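For readers who want to see what loading a .env file involves, here is a minimal standard-library sketch. It is not the loader the project uses (that code is not shown here, and many projects rely on a dedicated package for this). The `load_env_file` helper is hypothetical, and it assumes simple `KEY=VALUE` lines with `#` comments.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path, environ=None):
    """Parse KEY=VALUE lines, skipping blanks and # comments.

    Keys already present in `environ` are left untouched.
    Returns a dict of the keys that were actually set.
    """
    environ = os.environ if environ is None else environ
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        if key and key not in environ:
            environ[key] = value
            loaded[key] = value
    return loaded


# Demo against a throwaway file and a plain dict, so os.environ is not modified.
demo_env = {}
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "OPENAI_API_KEY=your-openai-api-key-here\n"
        "# OPENAI_BASE_URL=https://api.openai.com/v1\n",
        encoding="utf-8",
    )
    loaded = load_env_file(env_path, environ=demo_env)
print(loaded)
```

The commented-out `OPENAI_BASE_URL` line is skipped, so only the API key ends up in the mapping.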
Quick Start
from manasrag import ManasRAG
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret
# Initialize with OpenAI (Haystack 2.x generators take the API key as a Secret)
manas = ManasRAG(
    working_dir="./manas_data",
    generator=OpenAIGenerator(
        model="gpt-4o-mini",
        api_key=Secret.from_env_var("OPENAI_API_KEY"),
    ),
)
# Index documents
documents = """
# Machine Learning
Machine Learning is a subset of Artificial Intelligence focused on
algorithms that can learn from data...
# Neural Networks
Neural networks are computing systems inspired by biological neurons...
"""
manas.index(documents)
# Query with different modes
result = manas.query(
    "How are neural networks related to machine learning?",
    mode="hi",  # Full hierarchical retrieval
)
print(result["answer"])
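To compare how the modes answer the same question, a small helper along these lines may be useful. It relies only on what the example above shows: a `query(question, mode=...)` method that returns a mapping with an `"answer"` key. `compare_modes` and `FakeRAG` are hypothetical names introduced here; `FakeRAG` is a stand-in so the sketch runs without an API key or an index.

```python
MODES = ("naive", "local", "global", "bridge", "nobridge", "hi")


def compare_modes(rag, question, modes=MODES):
    """Run one question through each retrieval mode and collect the answers."""
    return {mode: rag.query(question, mode=mode)["answer"] for mode in modes}


class FakeRAG:
    """Stand-in for an indexed ManasRAG instance; not part of manasrag."""

    def query(self, question, mode="hi"):
        return {"answer": f"[{mode}] {question}"}


question = "How are neural networks related to machine learning?"
answers = compare_modes(FakeRAG(), question)
for mode, answer in answers.items():
    print(f"{mode}: {answer}")
```

With a real instance, pass `manas` in place of `FakeRAG()`. Each mode triggers a separate LLM call, so the loop costs roughly six times a single query.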
Retrieval Modes
| Mode | Description | Components |
|---|---|---|
| naive | Basic RAG | Document chunks only |
| local | Local knowledge | Entities + Relations + Chunks |
| global | Global knowledge | Community reports + Chunks |
| bridge | Bridge knowledge | Cross-community reasoning paths |
| nobridge | No-bridge | Local + Global (no paths) |
| hi | Full hierarchical | All components combined |
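The table above can be read as a mapping from each mode to the context sources it draws on. The sketch below encodes that reading as plain sets. The source names (`chunks`, `entities`, and so on) and the `sources_for` helper are labels chosen for this example, not identifiers from the manasrag package.

```python
# Context sources per mode, transcribed from the table.
MODE_SOURCES = {
    "naive": {"chunks"},
    "local": {"entities", "relations", "chunks"},
    "global": {"community_reports", "chunks"},
    "bridge": {"reasoning_paths"},
}
# nobridge is described as local + global without paths.
MODE_SOURCES["nobridge"] = MODE_SOURCES["local"] | MODE_SOURCES["global"]
# hi is described as all components combined.
MODE_SOURCES["hi"] = MODE_SOURCES["nobridge"] | MODE_SOURCES["bridge"]


def sources_for(mode):
    """Return the sorted context sources for a mode, or raise on an unknown mode."""
    try:
        return sorted(MODE_SOURCES[mode])
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


for mode in MODE_SOURCES:
    print(mode, sources_for(mode))
```

Seen this way, `hi` differs from `nobridge` only by the reasoning paths, which helps when deciding whether the extra retrieval step is worth it for a given query.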
Advanced Usage
Custom Query Parameters
from manasrag import QueryParam
param = QueryParam(
    mode="hi",
    top_k=20,  # Number of entities to retrieve
    top_m=10,  # Key entities per community
    max_token_for_text_unit=20000,
    response_type="Multiple Paragraphs",
)
result = manas.query("Your query here", param=param)
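The name `max_token_for_text_unit` suggests a token budget for the text chunks placed in the context. How ManasRAG enforces it is not shown here, so the following is only a sketch of one common approach: keep ranked chunks until the budget is spent. `truncate_by_token_budget` is a hypothetical helper, and the whitespace-based token count is a deliberate simplification of a real tokenizer.

```python
def truncate_by_token_budget(text_units, max_tokens, count_tokens=None):
    """Keep text units, in order, until adding the next one would exceed the budget."""
    if count_tokens is None:
        # Assumption: approximate tokens by whitespace-separated words.
        count_tokens = lambda text: len(text.split())
    kept, used = [], 0
    for unit in text_units:
        cost = count_tokens(unit)
        if used + cost > max_tokens:
            break
        kept.append(unit)
        used += cost
    return kept


units = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
kept = truncate_by_token_budget(units, max_tokens=5)
print(kept)
```

Because the loop stops at the first unit that does not fit, the order of `text_units` matters: put the most relevant chunks first.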
Using a Custom LLM
from haystack.components.generators import HuggingFaceLocalGenerator
generator = HuggingFaceLocalGenerator(
    model="HuggingFaceH4/zephyr-7b-beta"
)
manas = ManasRAG(generator=generator)
Accessing Communities
# After indexing, access detected communities
for comm_id, community in manas.communities.items():
    print(f"Community: {community.title}")
    print(f"Entities: {len(community.nodes)}")
    print(f"Report: {community.report_string[:200]}...")
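Once the communities are available, it is often handy to rank them, for example by size. The sketch below uses only the attributes shown above (`title`, `nodes`, `report_string`). The `largest_communities` helper and the demo objects are hypothetical stand-ins, so the example runs without an index.

```python
from types import SimpleNamespace


def largest_communities(communities, limit=3):
    """Return (title, entity_count) pairs for the biggest communities."""
    ranked = sorted(communities.values(), key=lambda c: len(c.nodes), reverse=True)
    return [(c.title, len(c.nodes)) for c in ranked[:limit]]


# Stand-in data shaped like manas.communities (invented for illustration).
demo_communities = {
    "0": SimpleNamespace(
        title="Neural Networks", nodes=["nn", "neuron"], report_string="..."
    ),
    "1": SimpleNamespace(
        title="Machine Learning", nodes=["ml", "ai", "data"], report_string="..."
    ),
}

top = largest_communities(demo_communities, limit=1)
print(top)
```

With a real index, call `largest_communities(manas.communities)` instead.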
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Indexing Pipeline │
├─────────────────────────────────────────────────────────────┤
│ Documents → Splitter → EntityExtractor → GraphDocumentStore │
│ ↓ │
│ CommunityDetector │
│ ↓ │
│ CommunityReportGenerator │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Query Pipeline │
├─────────────────────────────────────────────────────────────┤
│ Query → EntityRetriever → HierarchicalRetriever │
│ ↓ │
│ ContextBuilder │
│ ↓ │
│ PromptBuilder │
│ ↓ │
│ ChatGenerator → Answer │
└─────────────────────────────────────────────────────────────┘
Project Structure
manasrag/
├── core/ # Core data structures
├── stores/ # Graph storage backends
├── components/ # Haystack components
├── pipelines/ # Indexing and query pipelines
└── __init__.py # High-level API
License
MIT
Acknowledgments
Based on HiRAG by Haoyu Huang et al.