
A hierarchical RAG framework for chunk retrieval.


📚 HieRAG – Hierarchical Retrieval-Augmented Generation

hie_rag is a modular, extensible Python package designed for Hierarchical Retrieval-Augmented Generation (Hie-RAG). It enables you to extract, split, embed, summarize, and query documents using both chunk- and tree-level semantics, all backed by a vector database.


✅ Features

  • PDF/DOCX/XLSX/CSV/PPT ingestion and intelligent semantic splitting
  • Hierarchical summarization tree building
  • Embedding-based similarity search
  • Vector DB indexing and querying (e.g., Qdrant)
  • Full streaming interface for frontend integration
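To make the splitting feature concrete, here is a minimal sketch of size-bounded chunking. It is not hie_rag's own splitter: the function name `pack_sentences` is hypothetical, sizes are counted in whitespace-separated words (an assumption, since the package's unit is not stated here), and sentences are found with a simple regular expression rather than by semantic similarity.

```python
import re


def pack_sentences(text: str, min_size: int = 300, max_size: int = 500) -> list[str]:
    """Greedily pack sentences into chunks of roughly min_size..max_size words.

    Hypothetical helper for illustration; not part of hie_rag.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed max_size.
        # A single sentence longer than max_size is kept whole.
        if current and count + n > max_size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        # Merge an undersized tail into the previous chunk when one exists
        # (this may push that chunk slightly over max_size).
        if chunks and count < min_size:
            chunks[-1] = chunks[-1] + " " + " ".join(current)
        else:
            chunks.append(" ".join(current))
    return chunks
```

A semantic splitter would additionally decide chunk boundaries by comparing embeddings of neighbouring sentences; the size bounds would still act as guard rails in the same way.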

📦 Components Used

| Module | Role |
| --- | --- |
| HieRAG | Main interface for processing, querying, and managing indexes. |
| Split | Split raw text into chunks |
| Process | Adds metadata and embeddings to chunks |
| TreeIndex | Builds tree-based hierarchical summaries |
| Utils | Text extraction and token handling |
| Vectordb | Stores and queries summaries/chunks |
| AiClient | Handles embedding API (e.g., OpenAI, HuggingFace, Ollama) |
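The idea behind a tree of hierarchical summaries can be sketched as follows. This is an illustrative assumption about the general technique, not the TreeIndex implementation: `Node`, `build_summary_tree`, `naive_summary`, and the `fanout` parameter are all hypothetical names, and the summarizer is a stand-in for a chat-model call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """One node of the summary tree: leaves hold chunks, parents hold summaries."""
    text: str
    children: list["Node"] = field(default_factory=list)


def build_summary_tree(
    chunks: list[str],
    summarize: Callable[[list[str]], str],
    fanout: int = 3,
) -> Node:
    """Group nodes `fanout` at a time and summarize each group until one root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [Node(text=c) for c in chunks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            summary = summarize([n.text for n in group])
            parents.append(Node(text=summary, children=group))
        level = parents
    return level[0]


def naive_summary(texts: list[str]) -> str:
    # Stand-in for a chat-model call: keep the first three words of each text.
    return " / ".join(" ".join(t.split()[:3]) for t in texts)
```

Calling `build_summary_tree(chunks, naive_summary, fanout=2)` returns a root whose text summarizes the whole document, while each leaf still holds an original chunk. That is what lets retrieval work at either the summary level or the chunk level.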

🛠 Installation

pip install hie-rag

⏯︎ How to Use

Initialize HieRAG

from hie_rag import HieRag

hierag = HieRag(base_url="http://localhost:11434")

Note: Ensure you have set up an AI server with both a chat model and an embedding model running.
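Before initializing, it can help to confirm that the server is reachable. Port 11434 is Ollama's default, so the sketch below assumes an Ollama server and uses its `GET /api/tags` endpoint, which lists locally available models. The helper names `model_names` and `list_server_models` are hypothetical and not part of hie_rag; adapt the endpoint if you use a different backend.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def model_names(payload: dict) -> List[str]:
    """Extract model names from an Ollama-style /api/tags response body."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_server_models(base_url: str, timeout: float = 3.0) -> Optional[List[str]]:
    """Return the models the server reports, or None if it cannot be reached."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None
```

For example, `list_server_models("http://localhost:11434")` returns `None` when nothing is listening, and otherwise a list in which you can check that both your chat model and your embedding model appear.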

Process and Index a File

with open("sample.pdf", "rb") as f:
    file_bytes = f.read()

for status in hierag.process_and_save_index_stream(
    file_name="sample.pdf",
    uploaded_file=file_bytes,
    min_chunk_size=300,
    max_chunk_size=500
):
    print(status)

Example output:

{
  "status": "✅ Done",
  "file_id": "abc123",
  "summary_count": 5,
  "chunk_count": 22
}
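Since the later calls need the `file_id`, it is convenient to capture it while consuming the stream. The helper below is a sketch under one assumption: that the stream yields dicts shaped like the example above, with the `file_id` present in the last one. `final_status` and `fake_stream` are hypothetical names, and the intermediate message in `fake_stream` is invented for the demonstration.

```python
from typing import Any, Iterable, Optional


def final_status(stream: Iterable[Any]) -> Optional[dict]:
    """Print every status update and return the last dict that carries a file_id."""
    last = None
    for status in stream:
        print(status)
        if isinstance(status, dict) and "file_id" in status:
            last = status
    return last


def fake_stream():
    # Stand-in for hierag.process_and_save_index_stream(...).
    yield {"status": "working"}  # invented intermediate message
    yield {"status": "✅ Done", "file_id": "abc123", "summary_count": 5, "chunk_count": 22}
```

With a real instance you would pass the generator returned by `hierag.process_and_save_index_stream(...)` in place of `fake_stream()`, then read `result["file_id"]` from the returned dict if it is not `None`.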

Query the Summaries or Chunks

Query Summaries by text:

results = hierag.query_summaries_by_text("What is the contract duration?")

Query Chunks by text:

results = hierag.query_chunks_by_text("Explain clause 3.4", file_id="abc123")
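Text queries like these rest on embedding-based similarity search. The sketch below shows the core idea with plain cosine similarity over toy vectors. It does not reproduce hie_rag's internals or its result format: `cosine` and `top_k` are hypothetical helpers, and a vector database such as Qdrant would normally perform this ranking with an index instead of a full scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    items: list[tuple[str, list[float]]],
    k: int = 2,
) -> list[tuple[float, str]]:
    """Rank (text, vector) items by similarity to query_vec and keep the best k."""
    scored = [(cosine(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a hierarchical setup, the same ranking can run first over summary embeddings to pick a file and then over that file's chunk embeddings, which corresponds to the `file_id` argument in the chunk query above.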

List & Manage Indexed Files

List All Indexed Files

hierag.list_summaries()

View Chunks of a File

hierag.list_chunks(file_id="abc123")

Delete a File Index

hierag.delete_index(file_id="abc123")

Get the Summary of a File

hierag.get_summary(file_id="abc123")
