Layout Aware RAG
Layout-Aware RAG (LA-RAG)
This is the source code repository for the Python package la-rag.
Why Layout-Aware RAG?
The impressive abilities of large language models (LLMs) offer exciting possibilities for large-scale document analysis. However, a significant challenge remains in making text from extensive documents, such as large PDFs, accessible to LLMs due to their limited context window, which restricts the amount of text they can process at a time.
Retrieval Augmented Generation (RAG) systems address this challenge by combining LLMs with advanced retrieval techniques. Common chunking utilities used in LangChain, such as TextSplitter and RecursiveCharacterTextSplitter, break documents into smaller sections that fit within the LLM's context window. However, these methods can cause the system to lose the semantic connections carried by layout features such as sections and subsections, tables, lists, and bullet points. For example, in a bullet-pointed list, the points can all be interrelated, and each point is connected to the paragraph, or to the last sentence of the paragraph, that precedes the list.
Layout-aware RAG considers the layout features of the document and the semantic connections between them.
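The bullet-list example above can be illustrated with a small sketch. This is not the la-rag API: the function names `naive_chunks` and `layout_aware_chunks`, the bullet-detection regex, and the sample text are all hypothetical, chosen only to contrast fixed-size splitting with a grouping rule that keeps a list attached to the paragraph above it.

```python
import re

# Assumption: a "bullet" is a line starting with -, *, • or a number like "1." / "1)".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def naive_chunks(text: str, size: int) -> list[str]:
    """Split text into fixed-size character chunks, ignoring layout."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def layout_aware_chunks(text: str) -> list[str]:
    """Keep each run of bullet lines in the same chunk as the line that introduces it."""
    groups: list[list[str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks
        if BULLET.match(line) and groups:
            # A bullet joins the current group: its lead-in line or the list in progress.
            groups[-1].append(line)
        else:
            # Any other line starts a new chunk.
            groups.append([line])
    return ["\n".join(group) for group in groups]


# Hypothetical sample document.
sample = (
    "Introduction to the report.\n"
    "\n"
    "The system has three requirements:\n"
    "- low latency\n"
    "- high recall\n"
    "- small memory footprint\n"
    "\n"
    "Closing remarks.\n"
)

if __name__ == "__main__":
    print("Fixed-size chunks:")
    for chunk in naive_chunks(sample, 60):
        print(repr(chunk))
    print("Layout-aware chunks:")
    for chunk in layout_aware_chunks(sample):
        print(repr(chunk))
```

With a 60-character chunk size, the fixed-size splitter cuts through the sentence that introduces the list, so the bullets are retrieved without their context. The layout-aware grouping returns the lead-in sentence and its three bullets as one chunk. A real implementation would also need to handle headings, tables, and nested lists, which this sketch does not attempt.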
Important Note
Although I have made the repository public and released the Python package, I am still working on this project. I will add more details on installing la-rag and building projects with it soon. In the meantime, I encourage you to look at tests\sample.py for sample code that shows how to use the package.
If you are interested in collaborating on this project, please email me at muafirathasnikt@gmail.com.
Download files
Source Distribution
Built Distribution
File details
Details for the file la_rag-0.1.4.tar.gz.
File metadata
- Download URL: la_rag-0.1.4.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ce60be21e78f0c6a1b542a243da5d170bcfc9c34eadf59faf345b70767bad74 |
| MD5 | 1aaec161df786d625978ada0c6683a4d |
| BLAKE2b-256 | 7e1680b40d176c93041290f43704c62fa4492f8e7a97fd536b1a3c3147ddac7d |
File details
Details for the file la_rag-0.1.4-py3-none-any.whl.
File metadata
- Download URL: la_rag-0.1.4-py3-none-any.whl
- Upload date:
- Size: 15.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16add094063423266bbbe9b8094cadca1492ab5c06103103420d54377b6ad92d |
| MD5 | aeb3fdc590668563ba3b5bd40ab12550 |
| BLAKE2b-256 | 95be7a55625b164d49293cc35e4d7a2071c2cbfa5c1e2aa83528e9f87b645000 |