Layout Aware RAG

Layout-Aware RAG (LA-RAG)

This is the source code repository for the Python package la-rag.

Why Layout-Aware RAG?

The impressive abilities of large language models (LLMs) offer exciting possibilities for large-scale document analysis. However, a significant challenge remains in making text from extensive documents, such as large PDFs, accessible to LLMs due to their limited context window, which restricts the amount of text they can process at a time.

Retrieval Augmented Generation (RAG) systems address this challenge by combining LLMs with advanced retrieval techniques. Common chunking tools used with LangChain, such as TextSplitter and RecursiveCharacterTextSplitter, break documents into smaller sections that fit within the LLM’s context window. However, these methods can cause the system to lose the semantic connections carried by layout features such as sections and subsections, tables, lists, and bullet points. For example, the points in a bulleted list are often interrelated, and each point is connected to the paragraph, or the last sentence of the paragraph, that precedes the list.

Layout-aware RAG considers the layout features of the document and the semantic connection between them.
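The difference between the two chunking styles can be illustrated with a minimal sketch. This is not the la-rag API; the function names `naive_chunks` and `layout_aware_chunks`, the bullet pattern, and the sample text are all hypothetical and chosen only to show the idea of keeping a bulleted list together with the paragraph that introduces it.

```python
import re

# Hypothetical pattern for list items: "-", "*", "•", or "1." / "1)" markers.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def naive_chunks(text, size):
    """Fixed-size character chunks that ignore layout entirely."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def layout_aware_chunks(text):
    """Split on blank lines, then attach each list to the block before it."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    chunks = []
    for block in blocks:
        lines = block.splitlines()
        is_list = all(BULLET.match(line) for line in lines)
        if is_list and chunks:
            # Keep the list with the paragraph that introduces it.
            chunks[-1] = chunks[-1] + "\n" + block
        else:
            chunks.append(block)
    return chunks


# Illustrative sample text (an assumption, not taken from the package).
sample = (
    "Intro paragraph about the product.\n\n"
    "The device supports the following modes:\n\n"
    "- eco mode\n"
    "- turbo mode\n\n"
    "Closing remarks."
)

chunks = layout_aware_chunks(sample)
for chunk in chunks:
    print(repr(chunk))
```

Here the two bullet points stay in the same chunk as the sentence that introduces them, whereas `naive_chunks(sample, 40)` would cut wherever the character count runs out. A real layout-aware pipeline would also need to handle headings, tables, and nested lists, which this sketch does not attempt.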

Important Note

Although I have made the repository public and released the Python package, I am still working on this project. I will soon add more details on installing la-rag and building projects with it. In the meantime, see tests\sample.py for sample code showing how to use the package.

If you are interested in collaborating on this project, please email me at muafirathasnikt@gmail.com

Download files

Source Distribution

la_rag-0.1.4.tar.gz (14.1 kB view details)

Uploaded Source

Built Distribution

la_rag-0.1.4-py3-none-any.whl (15.0 kB view details)

Uploaded Python 3

File details

Details for the file la_rag-0.1.4.tar.gz.

File metadata

  • Download URL: la_rag-0.1.4.tar.gz
  • Upload date:
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for la_rag-0.1.4.tar.gz
Algorithm Hash digest
SHA256 0ce60be21e78f0c6a1b542a243da5d170bcfc9c34eadf59faf345b70767bad74
MD5 1aaec161df786d625978ada0c6683a4d
BLAKE2b-256 7e1680b40d176c93041290f43704c62fa4492f8e7a97fd536b1a3c3147ddac7d

File details

Details for the file la_rag-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: la_rag-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for la_rag-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 16add094063423266bbbe9b8094cadca1492ab5c06103103420d54377b6ad92d
MD5 aeb3fdc590668563ba3b5bd40ab12550
BLAKE2b-256 95be7a55625b164d49293cc35e4d7a2071c2cbfa5c1e2aa83528e9f87b645000
