OptimisedRAG: Simple and Fast Retrieval-Augmented Generation, a modified version of LightRAG
or-lib
or-lib is a modular extension of LightRAG that enriches the Retrieval-Augmented Generation (RAG) pipeline in two ways: graph-based algorithms for faster, higher-quality retrieval, and image-based query support for multimodal reasoning. It builds on LightRAG’s hybrid architecture to improve both retrieval accuracy and user interactivity. Enhanced by Md Nazish Arman.
🔍 Key Enhancements Over LightRAG
1. Graph-Based Retrieval Optimization
Introduces several graph algorithms to rank and filter knowledge graph nodes and relationships for more relevant information retrieval:
- Degree Centrality
- PageRank
- Article Rank (personalized PageRank)
- Betweenness Centrality
- CELF-Based Influence Maximization
These algorithms dynamically identify high-impact entities and relations in the knowledge graph, with a reported 50% improvement in retrieval time and a 30% improvement in retrieval quality.
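As a rough illustration of how a centrality score can rank entities, the sketch below runs PageRank by power iteration over an undirected edge list and returns the top k nodes. It is dependency-free and is not or-lib's implementation; the `(source, target)` edge tuples and the `pagerank_top_k` helper are assumptions made for this example.

```python
from collections import defaultdict


def pagerank_top_k(edges, k, damping=0.85, iterations=50):
    """Rank nodes of an undirected graph with PageRank (power iteration)."""
    # Build an adjacency list; every edge is treated as bidirectional.
    neighbors = defaultdict(set)
    for src, tgt in edges:
        neighbors[src].add(tgt)
        neighbors[tgt].add(src)

    nodes = sorted(neighbors)
    n = len(nodes)
    if n == 0:
        return []
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[node])
            updated[node] = (1.0 - damping) / n + damping * incoming
        scores = updated

    # Highest score first; ties are broken by node name for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical toy graph: "A" has the most connections.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
top = pagerank_top_k(edges, k=2)
```

In this toy graph the best-connected node, `A`, receives the highest score. A production setup would more likely delegate to `networkx`, which the Requirements section lists as a dependency.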
2. Image Query Support
Enhances RAG to handle image-based prompts via pre-indexed image metadata and summaries:
- Extracts and processes image chunks in `_build_query_context()`
- Associates each image with a unique S3 `image_id`
- Returns presigned image URLs for downstream consumption
- Adds visual context understanding into RAG flows
📦 Features
| Feature | Description |
|---|---|
| `GraphAlgorithms` | Modular class with pluggable centrality/influence metrics. Used at query time to score entities. |
| `QueryParam.graph_algorithm` | Users can dynamically select which graph algorithm to apply per query (`pagerank`, `degree_centrality`, etc.). |
| Image Chunk Processing | Enhances query results with structured image summaries and image IDs that can be mapped to S3-hosted images. |
| Presigned URL Integration | Supports image result delivery through S3-backed URL mapping. |
🧠 Graph Algorithm
The `GraphAlgorithms` class (in `algorithms.py`) provides the following methods:
- `compute_degree_centrality(node_datas, edge_datas, k, weighted=False)`
- `compute_pagerank(node_datas, edge_datas, k)`
- `compute_article_rank(node_datas, edge_datas, k)`
- `compute_betweenness_centrality(node_datas, edge_datas, k)`
- `celf_influence_maximization(node_datas, edge_datas, k)`
Each method updates node_datas with rank scores and returns the top k nodes.
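The contract described above (write a score onto each node, then return the top k) can be sketched with degree centrality. The dictionary keys used here (`entity_name`, `src_id`, `tgt_id`, `weight`, `rank`) are assumptions for illustration and may not match the shapes or-lib actually passes around; the function is a stand-in, not the library's method.

```python
def degree_centrality_top_k(node_datas, edge_datas, k, weighted=False):
    """Score nodes by raw (optionally weighted) degree and return the top k."""
    degree = {node["entity_name"]: 0.0 for node in node_datas}
    for edge in edge_datas:
        # Unweighted: each edge counts once. Weighted: use the edge weight.
        amount = float(edge.get("weight", 1.0)) if weighted else 1.0
        for endpoint in (edge["src_id"], edge["tgt_id"]):
            if endpoint in degree:
                degree[endpoint] += amount

    # Write the score back onto each node, mirroring the in-place update.
    for node in node_datas:
        node["rank"] = degree[node["entity_name"]]

    # sorted() is stable, so tied nodes keep their original order.
    return sorted(node_datas, key=lambda node: node["rank"], reverse=True)[:k]


# Hypothetical sample data.
node_datas = [{"entity_name": "A"}, {"entity_name": "B"}, {"entity_name": "C"}]
edge_datas = [
    {"src_id": "A", "tgt_id": "B", "weight": 1.0},
    {"src_id": "B", "tgt_id": "C", "weight": 5.0},
]
top_weighted = degree_centrality_top_k(node_datas, edge_datas, k=2, weighted=True)
```

With `weighted=True` the heavy `B`-`C` edge lifts `C` above `A`; with `weighted=False` both have one edge and tie.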
🧾 Usage Example
```python
from orlib.algorithms import GraphAlgorithms

graph_algo = GraphAlgorithms()
top_nodes = graph_algo.compute_pagerank(node_datas, edge_datas, k=10)
```
To use it in a query:
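```python
# Select the ranking algorithm for this query.
query_param.graph_algorithm = "pagerank"

# kg_query is a coroutine, so this call must run inside an async function.
response, image_ids = await kg_query(
    query,
    knowledge_graph_inst,
    entities_vdb,
    relationships_vdb,
    text_chunks_db,
    query_param,
    global_config,
)
```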
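One way a string such as `"pagerank"` could be routed to the matching method is a small lookup table. The sketch below uses a stand-in class so that it runs on its own; `StubAlgorithms`, `ALGORITHM_METHODS`, and `rank_nodes` are hypothetical names, and how or-lib resolves `query_param.graph_algorithm` internally is not documented here.

```python
class StubAlgorithms:
    """Stand-in for GraphAlgorithms so the sketch runs on its own."""

    def compute_pagerank(self, node_datas, edge_datas, k):
        return node_datas[:k]

    def compute_degree_centrality(self, node_datas, edge_datas, k, weighted=False):
        return node_datas[:k]


# Query-level names mapped to method names (both taken from this README).
ALGORITHM_METHODS = {
    "pagerank": "compute_pagerank",
    "degree_centrality": "compute_degree_centrality",
}


def rank_nodes(algorithms, name, node_datas, edge_datas, k):
    """Look up the scoring method for `name` and apply it."""
    try:
        method_name = ALGORITHM_METHODS[name]
    except KeyError:
        raise ValueError(f"Unsupported graph algorithm: {name!r}") from None
    return getattr(algorithms, method_name)(node_datas, edge_datas, k)


nodes = [{"entity_name": "A"}, {"entity_name": "B"}]
selected = rank_nodes(StubAlgorithms(), "pagerank", nodes, [], k=1)
```

Failing fast on an unknown name keeps a typo in `graph_algorithm` from silently falling back to some default ranking.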
🖼️ Image Support
✨ What It Does:
- Parses uploaded documents and stores image summaries as chunks, along with relevant metadata such as `image_id` (used for S3 mapping).
- During image-related queries, retrieves the relevant image chunks.
- Extracts summaries and metadata for matching image chunks.
- Sends the image summaries in CSV format to the LLM.
- Selects the image IDs whose summaries are most relevant to the query.
- Caches the image IDs returned for previous queries to avoid redundant processing.
- Returns the relevant image IDs.
- Generates presigned URLs for image access.
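The caching step could look something like the sketch below, which keys results by a hash of the normalized query text. `ImageIdCache` and the `select_image_ids` callback (standing in for the LLM-based relevance filter) are hypothetical; this README does not describe how or-lib stores its cache.

```python
import hashlib


class ImageIdCache:
    """In-memory cache of image IDs keyed by a hash of the query text."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(query):
        # Normalize case and whitespace so trivially different queries share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, query, select_image_ids):
        key = self._key(query)
        if key not in self._store:
            # Only call the (expensive) selector on a cache miss.
            self._store[key] = list(select_image_ids(query))
        return self._store[key]


calls = []


def fake_selector(query):
    """Hypothetical selector; records each call so cache hits are visible."""
    calls.append(query)
    return ["img001", "img007"]


cache = ImageIdCache()
first = cache.get_or_compute("Show the wiring diagram", fake_selector)
second = cache.get_or_compute("show  the wiring diagram", fake_selector)
```

The second lookup differs only in case and spacing, so it is served from the cache and the selector runs once.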
🔁 Flow:
- Image metadata is indexed with `image_id` and `content`.
- During a query:
  - Keywords are matched against stored image chunks.
  - Relevant results are structured into an `image_chunk`.
- An `img_prompt` is generated using: `image_csv_data = "serial number,image id,image summary\n1,img001,..."`
- The LLM receives both textual and visual context for improved relevance.
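A minimal sketch of building the `image_csv_data` payload with Python's `csv` module, which also takes care of quoting summaries that contain commas. The `image_id` and `content` keys follow the indexing step above; the `build_image_csv` helper and the sample summaries are hypothetical.

```python
import csv
import io


def build_image_csv(image_chunks):
    """Serialize image chunks as 'serial number,image id,image summary' rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["serial number", "image id", "image summary"])
    for serial, chunk in enumerate(image_chunks, start=1):
        writer.writerow([serial, chunk["image_id"], chunk["content"]])
    return buffer.getvalue()


image_chunks = [
    {"image_id": "img001", "content": "Block diagram of the retrieval pipeline"},
    {"image_id": "img002", "content": "Bar chart, latency by algorithm"},
]
image_csv_data = build_image_csv(image_chunks)
```

Using `csv.writer` rather than string concatenation means a summary such as `Bar chart, latency by algorithm` is quoted instead of being split into two columns.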
🗂️ Image Storage
Images are expected to be pre-processed and stored in S3. The corresponding image_id is then used to generate presigned URLs for secure frontend rendering.
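With `boto3`, presigned URLs come from the S3 client's `generate_presigned_url` method. The sketch below takes the client as an argument so it runs without AWS credentials (a fake client is used here); the bucket name, the `images/<image_id>.png` key layout, and the `presign_images` helper are assumptions, not or-lib's actual configuration.

```python
def presign_images(s3_client, bucket, image_ids, expires_in=3600):
    """Map each image ID to a time-limited GET URL."""
    urls = {}
    for image_id in image_ids:
        urls[image_id] = s3_client.generate_presigned_url(
            "get_object",
            # Assumed key layout; adjust to however images are stored.
            Params={"Bucket": bucket, "Key": f"images/{image_id}.png"},
            ExpiresIn=expires_in,
        )
    return urls


class FakeS3Client:
    """Test double that mimics the one boto3 call used above."""

    def generate_presigned_url(self, ClientMethod, Params=None, ExpiresIn=3600):
        return (
            f"https://{Params['Bucket']}.example.invalid/"
            f"{Params['Key']}?expires={ExpiresIn}"
        )


urls = presign_images(FakeS3Client(), "my-image-bucket", ["img001"])
```

In a real deployment, `boto3.client("s3")` would be passed in place of the fake client, and the expiry would be kept short since anyone holding the URL can fetch the image until it lapses.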
🧪 QueryParam Extensions
```python
QueryParam(
    graph_algorithm="pagerank",
    only_need_prompt=False,
    top_k=60,
    response_type="Bullet Points",
    ...
)
```
- `graph_algorithm`: Selects the algorithm to guide ranking logic in the retrieval phase.
- `top_k`: Defines how many top nodes or relationships to consider.
- `only_need_context` / `only_need_prompt`: Controls which intermediate step to return (useful for debugging or chaining outputs).
✅ Supported Graph Algorithms
| Algorithm | Purpose |
|---|---|
| `pagerank` | Scores nodes based on importance across the graph |
| `degree_centrality` | Scores nodes by connection count |
| `article_rank` | Personalized PageRank for localized influence |
| `betweenness_centrality` | Captures bridge nodes that connect clusters |
| `celf_influence` | Approximates influence spread using CELF optimization |
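CELF (cost-effective lazy forward selection) speeds up greedy seed selection by exploiting submodularity: a node's marginal gain can only shrink as the seed set grows, so stale gains serve as upper bounds and most re-evaluations can be skipped. The sketch below uses one-hop coverage (a seed plus its direct neighbors) as a deterministic stand-in for influence spread; or-lib's actual spread model is not described in this README, and `celf_top_k` is a hypothetical helper.

```python
import heapq
from collections import defaultdict


def celf_top_k(edges, k):
    """Pick k seeds maximizing one-hop coverage using CELF lazy evaluation."""
    neighbors = defaultdict(set)
    for src, tgt in edges:
        neighbors[src].add(tgt)
        neighbors[tgt].add(src)

    def reach(node):
        # Nodes "influenced" by a seed: the seed itself plus direct neighbors.
        return neighbors[node] | {node}

    covered = set()
    seeds = []
    # Max-heap via negated gains; each entry records the round it was computed in.
    heap = [(-len(reach(node)), node, 0) for node in sorted(neighbors)]
    heapq.heapify(heap)

    while heap and len(seeds) < k:
        neg_gain, node, computed_round = heapq.heappop(heap)
        if computed_round == len(seeds):
            # Gain is fresh for the current seed set, so it is the true maximum.
            seeds.append(node)
            covered |= reach(node)
        else:
            # Stale upper bound: recompute the marginal gain and push it back.
            gain = len(reach(node) - covered)
            heapq.heappush(heap, (-gain, node, len(seeds)))
    return seeds


# Hypothetical toy graph with a hub ("A") and a separate pair reached via "E".
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F"), ("D", "E")]
seeds = celf_top_k(edges, k=2)
```

After the hub `A` is chosen, `D` looks strong by raw degree but adds only one new node, whereas `E` adds two, so the lazy re-evaluation promotes `E` as the second seed.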
📌 Requirements
- Python 3.10+
- `networkx`
- LightRAG dependencies (`faiss`, `transformers`, `langchain`, etc.)
- `boto3` or any S3-compatible client for presigned URLs