A hierarchical RAG framework for chunk retrieval.

📚 HieRAG – Hierarchical Retrieval-Augmented Generation

hie_rag is a modular, extensible Python package designed for Hierarchical Retrieval-Augmented Generation (Hie-RAG). It enables you to extract, split, embed, summarize, and query documents using both chunk- and tree-level semantics, all backed by a vector database.


✅ Features

  • PDF/DOCX/XLSX/CSV/PPT ingestion and intelligent semantic splitting
  • Hierarchical summarization tree building
  • Embedding-based similarity search
  • Vector DB indexing and querying (e.g., Qdrant)
  • Full streaming interface for frontend integration
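
To make "embedding-based similarity search" concrete, here is a minimal sketch in plain Python. It is not hie_rag's implementation: the toy vectors and the `cosine_similarity` and `top_k` helpers are hypothetical, and in practice the embeddings come from the embedding model and the search runs inside the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings", made up for illustration only.
indexed = [
    ("contract duration clause", [0.9, 0.1, 0.0]),
    ("payment terms", [0.1, 0.9, 0.0]),
    ("termination notice", [0.7, 0.2, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], indexed, k=2)
print([text for _, text in results])
```

Cosine similarity is used here because it ignores vector length; whether hie_rag or its vector database uses cosine or another distance is not stated in this README.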

📦 Components Used

| Module | Role |
| --- | --- |
| HieRAG | Main interface for processing, querying, and managing indexes. |
| Split | Split raw text into chunks |
| Process | Adds metadata and embeddings to chunks |
| TreeIndex | Builds tree-based hierarchical summaries |
| Utils | Text extraction and token handling |
| Vectordb | Stores and queries summaries/chunks |
| AiClient | Handles embedding API (e.g., OpenAI, HuggingFace, Ollama) |
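
As a rough picture of what a tree of hierarchical summaries can look like, the sketch below groups chunks bottom-up and summarizes each group until a single root remains. This is an assumption about the general technique, not TreeIndex's actual algorithm; `build_summary_tree`, `fanout`, and `toy_summarize` are hypothetical names, and a real summarizer would call a chat model.

```python
def build_summary_tree(chunks, summarize, fanout=2):
    """Summarize nodes `fanout` at a time, level by level, until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2 so each level shrinks")
    # Leaves hold the raw chunk text and have no children.
    level = [{"text": chunk, "children": []} for chunk in chunks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append({
                "text": summarize([node["text"] for node in group]),
                "children": group,
            })
        level = parents
    return level[0]


def toy_summarize(texts):
    # Placeholder: a real system would ask a chat model for a summary here.
    return " + ".join(texts)


root = build_summary_tree(["a", "b", "c"], toy_summarize)
print(root["text"])
```

Querying the upper levels first narrows the search to a few documents or sections before any chunk-level lookup.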

🛠 Installation

pip install hie-rag

⏯︎ How to Use

Initialize HieRAG

from hie_rag import HieRag

hierag = HieRag(base_url="http://localhost:11434")

Note: Make sure you have set up an AI server with both a chat model and an embedding model running.
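
If the server at `base_url` is Ollama (an assumption based on the default port 11434 in the example above), one way to confirm both models are present is to list the server's models before constructing `HieRag`. The sketch below uses Ollama's `GET /api/tags` endpoint; the helper names and the model names in the commented usage are placeholders, since this README does not say which models hie_rag expects.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5):
    """GET a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_model_names(base_url, fetch=fetch_json):
    """Return the model names an Ollama server reports at base_url."""
    payload = fetch(base_url.rstrip("/") + "/api/tags")
    return [model["name"] for model in payload.get("models", [])]


def missing_models(available, required):
    """Return required names not found in available.

    A required name without a ':tag' suffix matches any tag of that model.
    """
    missing = []
    for name in required:
        exact = name in available
        any_tag = any(a.split(":")[0] == name for a in available)
        if not (exact or any_tag):
            missing.append(name)
    return missing


# Usage (requires a running server; model names are placeholders):
# names = list_model_names("http://localhost:11434")
# print(missing_models(names, ["llama3", "nomic-embed-text"]))
```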

Process and Index a File

with open("sample.pdf", "rb") as f:
    file_bytes = f.read()

for status in hierag.process_and_save_index_stream(
    file_name="sample.pdf",
    uploaded_file=file_bytes,
    min_chunk_size=300,
    max_chunk_size=500
):
    print(status)

Example output once processing completes:

{
  "status": "✅ Done",
  "file_id": "abc123",
  "summary_count": 5,
  "chunk_count": 22
}
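
Because the method yields status updates as it runs, a caller usually wants to forward each update (for example to a frontend) and keep the last one to read `file_id`. The sketch below shows that pattern with a stand-in generator. It assumes each update is a dict and that the last one carries `file_id`, as in the example output above; the intermediate message is invented.

```python
def consume_status_stream(stream, on_update=print):
    """Forward every update to on_update and return the last one seen."""
    last = None
    for status in stream:
        on_update(status)
        last = status
    return last


def fake_stream():
    # Stand-in for hierag.process_and_save_index_stream(...).
    # The intermediate message shape is an assumption for this sketch.
    yield {"status": "Splitting"}
    yield {
        "status": "✅ Done",
        "file_id": "abc123",
        "summary_count": 5,
        "chunk_count": 22,
    }


final = consume_status_stream(fake_stream(), on_update=lambda s: None)
# Guard against an empty stream or a non-dict update.
file_id = final.get("file_id") if isinstance(final, dict) else None
print(file_id)
```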

Query the Summaries or Chunks

Query Summaries by text:

results = hierag.query_summaries_by_text("What is the contract duration?")

Query Chunks by text:

results = hierag.query_chunks_by_text("Explain clause 3.4", file_id="abc123")
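
The two calls can be combined into a coarse-to-fine lookup: find the best-matching file through its summary, then search only that file's chunks. The sketch below is hedged heavily, because this README does not document the return shapes. It assumes `query_summaries_by_text` returns a list of hits ordered best-first and that each hit exposes a `file_id` key; adjust `get_file_id` to the real structure. `FakeRag` is a stand-in so the example runs without a server.

```python
def two_stage_query(rag, question, get_file_id=lambda hit: hit["file_id"]):
    """Pick the top file via summaries, then query that file's chunks."""
    summary_hits = rag.query_summaries_by_text(question)
    if not summary_hits:
        return None, []
    file_id = get_file_id(summary_hits[0])  # assumed: hits are ordered best-first
    chunks = rag.query_chunks_by_text(question, file_id=file_id)
    return file_id, chunks


class FakeRag:
    """Stand-in exposing the two query methods; return shapes are invented."""

    def query_summaries_by_text(self, text):
        return [{"file_id": "abc123", "summary": "Service contract"}]

    def query_chunks_by_text(self, text, file_id=None):
        return [{"file_id": file_id, "text": "Clause 3.4 ..."}]


file_id, chunks = two_stage_query(FakeRag(), "Explain clause 3.4")
print(file_id, len(chunks))
```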

List & Manage Indexed Files

List All Indexed Files

hierag.list_summaries()

View Chunks of a File

hierag.list_chunks(file_id="abc123")

Delete a File Index

hierag.delete_index(file_id="abc123")

Get the Summary of a File

hierag.get_summary(file_id="abc123")
