OptimisedRAG: Simple and Fast Retrieval-Augmented Generation, a modified version of LightRAG

or-lib

or-lib is a modular extension of LightRAG that enriches the Retrieval-Augmented Generation (RAG) pipeline in two ways: graph-based algorithms for faster, higher-quality retrieval, and image-based query support for multimodal reasoning. It builds on LightRAG’s hybrid architecture to improve both retrieval accuracy and user interactivity. Enhanced by Md Nazish Arman.


🔍 Key Enhancements Over LightRAG

1. Graph-Based Retrieval Optimization

Introduces several graph algorithms to rank and filter knowledge graph nodes and relationships for more relevant information retrieval:

  • Degree Centrality
  • PageRank
  • Article Rank (personalized PageRank)
  • Betweenness Centrality
  • CELF-Based Influence Maximization

These algorithms dynamically identify high-impact entities and relations in the knowledge graph. The author reports that they improve retrieval time by 50% and retrieval quality by 30%.
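To illustrate how one of these algorithms ranks entities, here is a minimal pure-Python PageRank (power iteration) over the retrieved subgraph. It is a sketch, not or-lib's implementation: it assumes, hypothetically, that each node record is a dict with an `"entity_name"` key and each edge record a dict with a `"src_tgt"` pair, loosely modeled on LightRAG's retrieval records, and it treats relations as undirected. In practice `networkx.pagerank` would be the usual choice.

```python
def pagerank_scores(node_datas, edge_datas, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over an undirected entity graph.

    Hypothetical record shapes:
      node_datas: [{"entity_name": "A", ...}, ...]
      edge_datas: [{"src_tgt": ("A", "B"), ...}, ...]
    """
    nodes = list(dict.fromkeys(n["entity_name"] for n in node_datas))
    node_set = set(nodes)
    neighbors = {node: set() for node in nodes}
    for edge in edge_datas:
        src, tgt = edge["src_tgt"]
        # Only keep relations whose endpoints were both retrieved; ignore self-loops.
        if src in node_set and tgt in node_set and src != tgt:
            neighbors[src].add(tgt)
            neighbors[tgt].add(src)

    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated (dangling) nodes is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not neighbors[node])
        new_rank = {}
        for node in nodes:
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def top_k_by_pagerank(node_datas, edge_datas, k):
    """Write each node's score into its record and return the k best records."""
    scores = pagerank_scores(node_datas, edge_datas)
    for node in node_datas:
        node["pagerank"] = scores[node["entity_name"]]
    return sorted(node_datas, key=lambda d: d["pagerank"], reverse=True)[:k]
```

In a star-shaped graph, for example, the hub entity receives the highest score, which is the "high-impact entity" behaviour described above.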

2. Image Query Support

Enhances RAG to handle image-based prompts via pre-indexed image metadata and summaries:

  • Extracts and processes image chunks in _build_query_context()
  • Associates each image with a unique S3 image_id
  • Returns presigned image URLs for downstream consumption
  • Adds visual context to RAG flows

📦 Features

| Feature | Description |
| --- | --- |
| GraphAlgorithms | Modular class with pluggable centrality/influence metrics, used at query time to score entities. |
| QueryParam.graph_algorithm | Lets users select, per query, which graph algorithm to apply (pagerank, degree_centrality, etc.). |
| Image Chunk Processing | Enhances query results with structured image summaries and image IDs that can be mapped to S3-hosted images. |
| Presigned URL Integration | Supports image result delivery through S3-backed URL mapping. |

🧠 Graph Algorithms

The GraphAlgorithms class (in algorithms.py) provides the following methods:

compute_degree_centrality(node_datas, edge_datas, k, weighted=False)
compute_pagerank(node_datas, edge_datas, k)
compute_article_rank(node_datas, edge_datas, k)
compute_betweenness_centrality(node_datas, edge_datas, k)
celf_influence_maximization(node_datas, edge_datas, k)

Each method updates node_datas with rank scores and returns the top k nodes.
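The shared contract (score the retrieved nodes, write the score into each `node_datas` entry, return the top k) can be sketched for degree centrality as a standalone function mirroring the documented signature. This is not or-lib's code: the `"entity_name"`, `"src_tgt"`, `"weight"` and `"degree_centrality"` keys are hypothetical, loosely modeled on LightRAG's retrieval records.

```python
from collections import defaultdict


def compute_degree_centrality(node_datas, edge_datas, k, weighted=False):
    """Score entities by (optionally weighted) degree and return the top k records.

    Hypothetical record shapes: node dicts carry "entity_name"; edge dicts carry
    "src_tgt" and, when weighted=True, a numeric "weight" (default 1.0).
    """
    names = {node["entity_name"] for node in node_datas}
    degree = defaultdict(float)
    for edge in edge_datas:
        src, tgt = edge["src_tgt"]
        if src not in names or tgt not in names:
            continue  # ignore relations that leave the retrieved subgraph
        contribution = float(edge.get("weight", 1.0)) if weighted else 1.0
        degree[src] += contribution
        degree[tgt] += contribution

    # Normalise by (n - 1), as networkx.degree_centrality does for unweighted graphs.
    scale = max(len(names) - 1, 1)
    for node in node_datas:
        node["degree_centrality"] = degree[node["entity_name"]] / scale
    return sorted(node_datas, key=lambda n: n["degree_centrality"], reverse=True)[:k]
```

Because the records are updated in place, callers can both use the returned top k and inspect the score of every retrieved node afterwards.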


🧾 Usage Example

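from orlib.algorithms import GraphAlgorithms

graph_algo = GraphAlgorithms()

# node_datas / edge_datas: the entity and relation records retrieved for the query.
# Scores are written back into node_datas; the 10 highest-ranked nodes are returned.
top_nodes = graph_algo.compute_pagerank(node_datas, edge_datas, k=10)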

To use it in a query:

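# Choose the ranking algorithm for this query, then run the knowledge-graph query.
# kg_query is a coroutine, so this call must run inside an async function.
query_param.graph_algorithm = "pagerank"
response, image_ids = await kg_query(
    query,
    knowledge_graph_inst,
    entities_vdb,
    relationships_vdb,
    text_chunks_db,
    query_param,
    global_config
)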

🖼️ Image Support

✨ What It Does:

  • Parses uploaded documents and stores image summaries as chunks, along with relevant metadata such as image_id (used for S3 mapping).
  • During image-related queries, retrieves the relevant image chunks.
  • Extracts summaries and metadata for matching image chunks.
  • Sends the image summaries in CSV format to the LLM.
  • Selects the image IDs whose summaries are most relevant to the query.
  • Caches the image IDs selected for previous queries to avoid redundant processing.
  • Returns the relevant image IDs.
  • Generates presigned URLs for image access.
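The caching step could look like the sketch below: a small least-recently-used cache keyed on a normalised query string. The `ImageIdCache` class, the key normalisation and the `select_fn` callback (standing in for the LLM filtering step) are all hypothetical illustrations, not or-lib's actual cache.

```python
from collections import OrderedDict


class ImageIdCache:
    """LRU cache mapping a normalised query to the image IDs selected for it (hypothetical)."""

    def __init__(self, max_entries=256):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(query):
        # Collapse case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._key(query)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return list(self._entries[key])

    def put(self, query, image_ids):
        key = self._key(query)
        self._entries[key] = list(image_ids)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry


def cached_image_ids(query, cache, select_fn):
    """Return cached IDs, or call select_fn (e.g. the LLM filtering step) and cache the result."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    ids = select_fn(query)
    cache.put(query, ids)
    return ids
```

Bounding the cache keeps memory predictable; exact-match keys are a deliberately conservative choice, since reusing IDs for merely similar queries could return the wrong images.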

🔁 Flow:

  1. Image metadata is indexed with image_id and content.

  2. During a query:

    • Keywords are matched against stored image chunks.
    • Relevant results are structured into an image_chunk.
  3. An img_prompt is generated using:

    image_csv_data = "serial number,image id,image summary\n1,img001,..."
    
  4. The LLM receives both textual and visual context for improved relevance.
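Steps 2 and 3 can be sketched with the standard `csv` module, which produces the `serial number,image id,image summary` layout shown in the flow and safely quotes summaries containing commas. The `"image_id"` and `"summary"` chunk keys are hypothetical, and `parse_selected_ids` assumes, as an illustration only, that the LLM replies with comma- or newline-separated IDs.

```python
import csv
import io


def build_image_csv(image_chunks):
    """Serialise image chunks into the CSV layout passed to the LLM.

    Assumes each chunk is a dict with hypothetical "image_id" and "summary" keys.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["serial number", "image id", "image summary"])
    for serial, chunk in enumerate(image_chunks, start=1):
        # csv.writer quotes summaries that contain commas, quotes or newlines.
        writer.writerow([serial, chunk["image_id"], chunk["summary"]])
    return buffer.getvalue()


def parse_selected_ids(llm_reply, known_ids):
    """Keep only IDs from the LLM reply that exist among the candidate images."""
    tokens = [token.strip() for token in llm_reply.replace("\n", ",").split(",")]
    return [token for token in tokens if token in known_ids]
```

Checking the reply against the known candidate IDs guards against the LLM returning an ID that was never indexed.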

🗂️ Image Storage

Images are expected to be pre-processed and stored in S3. The corresponding image_id is then used to generate presigned URLs for secure frontend rendering.
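A minimal sketch of the presigned-URL step, assuming a boto3 S3 client. `generate_presigned_url` is boto3's real client method; the `presign_image_urls` helper, the bucket name and the `key_template` mapping from `image_id` to object key are hypothetical, since the README does not specify how keys are laid out. Passing the client in also makes the helper easy to test without AWS credentials.

```python
def presign_image_urls(image_ids, bucket, s3_client, key_template="{image_id}", expires_in=3600):
    """Map each image_id to a time-limited GET URL for its S3 object.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    key_template is a hypothetical image_id -> object-key convention.
    """
    urls = {}
    for image_id in image_ids:
        urls[image_id] = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key_template.format(image_id=image_id)},
            ExpiresIn=expires_in,  # URL lifetime in seconds
        )
    return urls


# Example (requires boto3 and AWS credentials; bucket and key layout are placeholders):
#   import boto3
#   urls = presign_image_urls(["img001"], "my-image-bucket", boto3.client("s3"),
#                             key_template="images/{image_id}.png")
```

Short expiry times keep the URLs safe to hand to a frontend, since each one grants read access to a single object only until it expires.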


🧪 QueryParam Extensions

QueryParam(
    graph_algorithm="pagerank",
    only_need_prompt=False,
    top_k=60,
    response_type="Bullet Points",
    # ... (other QueryParam fields)
)
  • graph_algorithm: Selects the algorithm to guide ranking logic in the retrieval phase.
  • top_k: Defines how many top nodes or relationships to consider.
  • only_need_context / only_need_prompt: Controls which intermediate step to return (useful for debugging or chaining outputs).
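One way to catch a mistyped `graph_algorithm` early is to validate it when the parameters are built. The dataclass below is an illustrative stand-in, not or-lib's `QueryParam`: its fields follow the example above, the allowed names come from the Supported Graph Algorithms table, and the `"pagerank"` default is an assumption.

```python
from dataclasses import dataclass

SUPPORTED_GRAPH_ALGORITHMS = {
    "pagerank",
    "degree_centrality",
    "article_rank",
    "betweenness_centrality",
    "celf_influence",
}


@dataclass
class GraphQueryParam:
    """Hypothetical stand-in for the extended QueryParam (not or-lib's class)."""

    graph_algorithm: str = "pagerank"  # assumed default
    only_need_context: bool = False
    only_need_prompt: bool = False
    top_k: int = 60
    response_type: str = "Bullet Points"

    def __post_init__(self):
        # Fail fast on typos instead of silently falling back at query time.
        if self.graph_algorithm not in SUPPORTED_GRAPH_ALGORITHMS:
            raise ValueError(f"unsupported graph_algorithm: {self.graph_algorithm!r}")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")
```

Validating at construction time surfaces a bad algorithm name before any retrieval work is done.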

✅ Supported Graph Algorithms

| Algorithm | Purpose |
| --- | --- |
| pagerank | Scores nodes based on importance across the graph |
| degree_centrality | Scores nodes by connection count |
| article_rank | Personalized PageRank for localized influence |
| betweenness_centrality | Captures bridge nodes that connect clusters |
| celf_influence | Approximates influence spread using CELF optimization |
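CELF (Cost-Effective Lazy Forward selection) speeds up greedy influence maximization by re-evaluating a node's marginal gain only when it reaches the top of a priority queue, since under submodularity a stale gain is an upper bound. The sketch below is a generic illustration, not or-lib's code: it assumes an independent-cascade diffusion model estimated by Monte Carlo with a propagation probability `p`, and a hypothetical adjacency dict (node to out-neighbours, with every node present as a key) that would be built from the retrieved relations.

```python
import heapq
import random


def simulate_cascade(neighbors, seeds, p, rng):
    """One independent-cascade run; returns how many nodes end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            # Each newly active node gets one chance to activate each neighbour.
            for nbr in neighbors.get(node, ()):
                if nbr not in active and rng.random() < p:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return len(active)


def expected_spread(neighbors, seeds, p, runs, seed):
    # A fixed RNG seed per estimate (common random numbers) keeps comparisons stable.
    rng = random.Random(seed)
    return sum(simulate_cascade(neighbors, seeds, p, rng) for _ in range(runs)) / runs


def celf(neighbors, k, p=0.1, runs=200, seed=0):
    """Lazy-greedy (CELF) selection of k seed nodes; returns (seeds, estimated spread)."""
    # Heap entries: (-gain, node, size of the seed set when the gain was computed).
    heap = [(-expected_spread(neighbors, [n], p, runs, seed), n, 0) for n in neighbors]
    heapq.heapify(heap)
    seeds, spread = [], 0.0
    while heap and len(seeds) < k:
        neg_gain, node, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is current for this seed set, so it is the true best: accept it.
            seeds.append(node)
            spread += -neg_gain
        else:
            # Stale upper bound: recompute the marginal gain and push it back.
            gain = expected_spread(neighbors, seeds + [node], p, runs, seed) - spread
            heapq.heappush(heap, (-gain, node, len(seeds)))
    return seeds, spread
```

With `p=1.0` every cascade is deterministic and the spread equals the number of reachable nodes, which is a convenient way to sanity-check the selection logic.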

📌 Requirements

  • Python 3.10+
  • networkx
  • LightRAG dependencies (faiss, transformers, langchain, etc.)
  • boto3 or any S3-compatible client for presigned URLs

Author

Md Nazish Arman


Download files

Source Distribution

or_lib-1.3.0.tar.gz (203.6 kB)

Built Distribution

or_lib-1.3.0-py3-none-any.whl (226.4 kB)

File details

Details for the file or_lib-1.3.0.tar.gz.

File metadata

  • Download URL: or_lib-1.3.0.tar.gz
  • Size: 203.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for or_lib-1.3.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8171d97f7af233f04fb3cbbd3e0150967699c612602580c4fa7bb626b9f30753 |
| MD5 | 53ba5b1df0ba7f339cf4cba7f91cd334 |
| BLAKE2b-256 | b0930ec9077e5ae37bf9dfbd8488d4955b7c9fd895f420764cf335f0cb474aeb |

File details

Details for the file or_lib-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: or_lib-1.3.0-py3-none-any.whl
  • Size: 226.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for or_lib-1.3.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6889c50f52cd5affcab0f08152fc67a80fe8b82d3d54db79c3b34f0c269c7658 |
| MD5 | 5f29871ec8b69add8fbb3850295b2dea |
| BLAKE2b-256 | 73cf727611be725526edb9823c0ec549bf3893d1a222142d686b376f0fb77706 |
