A hierarchical RAG framework for chunk retrieval.
📚 HieRAG – Hierarchical Retrieval-Augmented Generation
hie_rag is a modular, extensible Python package designed for Hierarchical Retrieval-Augmented Generation (Hie-RAG). It enables you to extract, split, embed, summarize, and query documents using both chunk- and tree-level semantics, all backed by a vector database.
✅ Features
- PDF/DOCX/XLSX/CSV/PPT ingestion and intelligent semantic splitting
- Hierarchical summarization tree building
- Embedding-based similarity search
- Vector DB indexing and querying (e.g., Qdrant)
- Full streaming interface for frontend integration
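To make the "embedding-based similarity search" feature concrete, here is a minimal sketch of cosine-similarity ranking in plain Python. It is an illustration only: the helper names `cosine_similarity` and `top_k` and the toy 3-dimensional vectors are assumptions, not part of hie_rag, which delegates this work to an embedding model and a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical); real ones come from an embedding model.
indexed = [
    ("contract duration", [1.0, 0.0, 0.0]),
    ("payment terms", [0.0, 1.0, 0.0]),
    ("termination clause", [0.7, 0.0, 0.7]),
]
best = top_k([1.0, 0.0, 0.1], indexed, k=2)
```

A vector database such as Qdrant performs the same kind of ranking, typically with approximate nearest-neighbour indexes so that it scales beyond a linear scan.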
📦 Components Used
| Module | Role |
|---|---|
| HieRAG | Main interface for processing, querying, and managing indexes. |
| Split | Split raw text into chunks |
| Process | Adds metadata and embeddings to chunks |
| TreeIndex | Builds tree-based hierarchical summaries |
| Utils | Text extraction and token handling |
| Vectordb | Stores and queries summaries/chunks |
| AiClient | Handles embedding API (e.g., OpenAI, HuggingFace, Ollama) |
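The idea behind a tree of hierarchical summaries can be sketched as follows. This is not the TreeIndex implementation: `Node`, `toy_summarize`, `build_tree`, and the `fanout` parameter are hypothetical names, and the placeholder summarizer just joins and truncates text where a real system would call a chat model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    summary: str
    children: list = field(default_factory=list)


def toy_summarize(texts, limit=40):
    """Placeholder summarizer: join and truncate (a real one would call an LLM)."""
    return " / ".join(texts)[:limit]


def build_tree(chunks, fanout=2):
    """Group nodes `fanout` at a time, level by level, until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [Node(summary=c) for c in chunks]  # leaves hold the raw chunks
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Node(toy_summarize([n.summary for n in group]), group))
        level = parents
    return level[0]


root = build_tree(["a", "b", "c", "d", "e"])
```

Under these assumptions, a query can match either a leaf (chunk-level semantics) or an inner node (tree-level semantics).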
🛠 Installation
pip install hie-rag
⏯︎ How to Use
Initialize HieRAG
from hie_rag import HieRag
hierag = HieRag(base_url="http://localhost:11434")
[!NOTE] Ensure you have set up an AI server. You should have a chat model and an embedding model running.
Process and Index a File
# Read the document as raw bytes.
with open("sample.pdf", "rb") as f:
    file_bytes = f.read()

# Status updates are yielded while the file is processed and indexed.
for status in hierag.process_and_save_index_stream(
    file_name="sample.pdf",
    uploaded_file=file_bytes,
    min_chunk_size=300,
    max_chunk_size=500,
):
    print(status)
{ "status": "✅ Done", "file_id": "abc123", "summary_count": 5, "chunk_count": 22 }
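If you need the `file_id` for later queries, one option is to keep the last status yielded by the stream. The sketch below assumes each status is a dict and that the final one carries `file_id`, as in the example output above; `run_to_completion` and `fake_stream` are hypothetical helpers, and the intermediate status is invented for illustration.

```python
def run_to_completion(status_stream, on_update=print):
    """Consume a status iterator and return the last item it yielded."""
    final = None
    for status in status_stream:
        on_update(status)  # e.g. forward progress to a frontend
        final = status
    if final is None:
        raise RuntimeError("stream produced no status updates")
    return final


def fake_stream():
    # Stand-in for hierag.process_and_save_index_stream(...).
    yield {"status": "Splitting"}  # invented intermediate status
    yield {"status": "✅ Done", "file_id": "abc123",
           "summary_count": 5, "chunk_count": 22}


final = run_to_completion(fake_stream(), on_update=lambda s: None)
file_id = final.get("file_id")
```

With the real package you would pass the generator returned by `process_and_save_index_stream` in place of `fake_stream()`.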
Query the Summaries or Chunks
Query Summaries by text:
results = hierag.query_summaries_by_text("What is the contract duration?")
Query Chunks by text:
results = hierag.query_chunks_by_text("Explain clause 3.4", file_id="abc123")
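The two query methods can be combined into a coarse-to-fine lookup: find the most relevant files through their summaries, then search chunks only inside those files. The sketch below is one possible pattern, not documented behaviour. It assumes that `query_summaries_by_text` returns a list of dicts with a `file_id` key, which the source does not confirm; `two_stage_query`, `top_files`, and `FakeClient` are hypothetical.

```python
def two_stage_query(client, question, top_files=1):
    """Query summaries first, then chunks within the best-matching files."""
    summaries = client.query_summaries_by_text(question)
    hits = []
    for summary in summaries[:top_files]:
        file_id = summary["file_id"]  # assumed result shape
        chunks = client.query_chunks_by_text(question, file_id=file_id)
        hits.append((file_id, chunks))
    return hits


class FakeClient:
    """Stand-in for a HieRag instance so the sketch runs without a server."""

    def query_summaries_by_text(self, text):
        return [{"file_id": "abc123"}, {"file_id": "def456"}]

    def query_chunks_by_text(self, text, file_id=None):
        return [f"chunk from {file_id}"]


hits = two_stage_query(FakeClient(), "What is the contract duration?")
```

Check the actual return values of both methods before relying on the `file_id` lookup.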
List & Manage Indexed Files
List All Indexed Files
hierag.list_summaries()
View Chunks of a File
hierag.list_chunks(file_id="abc123")
Delete a File Index
hierag.delete_index(file_id="abc123")
Get the Summary of a File
hierag.get_summary(file_id="abc123")
File details
Details for the file hie_rag-0.2.4.tar.gz.
File metadata
- Download URL: hie_rag-0.2.4.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9aaef0cb08a4c5f60184bfa936bd00abb7138d21fb7d0d62460d87c45e00f59e |
| MD5 | 1240023ec4b728a29f1df6245d64f9eb |
| BLAKE2b-256 | 431044c256820287ab73668df17b0259fa6a81bfcc358221bae8c454879dfb42 |
File details
Details for the file hie_rag-0.2.4-py3-none-any.whl.
File metadata
- Download URL: hie_rag-0.2.4-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22ecc57828025fb6c1efaffcdbc694608cec2e40b02cd6407443dcb9b8878563 |
| MD5 | f1b56e292c61bb5c492845781af94189 |
| BLAKE2b-256 | c540ac2be92f94ad9237ecf8d8c7c8d7751215d3f319aef847031f120aff8a3c |