Layout-Aware RAG

Layout-Aware RAG (LA-RAG)

This is the source code repository for the Python package la-rag.

Why Layout-Aware RAG?

The impressive abilities of large language models (LLMs) offer exciting possibilities for large-scale document analysis. However, a significant challenge remains: making the text of long documents, such as large PDFs, accessible to LLMs, whose limited context window restricts how much text they can process at once.

Retrieval Augmented Generation (RAG) systems address this challenge by combining LLMs with retrieval techniques. Common chunking tools in LangChain, such as TextSplitter and RecursiveCharacterTextSplitter, break documents into smaller sections that fit within the LLM's context window. However, these methods ignore layout features such as sections and subsections, tables, lists, and bullet points, so the semantic connections those features express are lost. For example, the points in a bulleted list can be interrelated, and each point is connected to the paragraph (often the last sentence of the paragraph) that introduces the list. A splitter that ignores layout can put the introduction and the points into different chunks.
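To make the failure concrete, here is a minimal, dependency-free sketch of layout-blind chunking. `fixed_size_chunks` and the sample text are hypothetical illustrations, not part of la-rag or LangChain. LangChain's RecursiveCharacterTextSplitter is smarter (it prefers to split at separators such as blank lines and newlines), but once a list outgrows one chunk its items can still end up apart from the sentence that introduces them.

```python
# A deliberately layout-blind chunker: consecutive fixed windows of characters.
def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive windows of chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Illustrative (hypothetical) document: an introduction followed by a bulleted list.
DOC = (
    "To reset the device, complete these steps:\n"
    "- Unplug the power cable.\n"
    "- Hold the reset button for ten seconds.\n"
    "- Plug the cable back in.\n"
)

if __name__ == "__main__":
    for n, chunk in enumerate(fixed_size_chunks(DOC, 50)):
        print(f"--- chunk {n} ---")
        print(chunk)
```

With a 50-character window the introduction lands in the first chunk and the "Hold the reset button" step in the second, so a retriever that returns only the second chunk hands the LLM an instruction without its context.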

Layout-aware RAG takes the layout features of a document, and the semantic connections between them, into account.
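One way to act on this idea is to treat a list and its introducing paragraph as a single unit. The sketch below does that for plain text; it is an illustration under stated assumptions, not la-rag's actual algorithm or API. `Block`, `parse_blocks`, `layout_aware_chunks`, the bullet markers, and the `max_items` budget are all hypothetical, and the sketch treats every non-bullet line as its own paragraph. Real documents such as PDFs would first need a layout extraction step to recover headings, lists, and tables.

```python
from dataclasses import dataclass, field

# Assumed bullet markers for this plain-text sketch (each is two characters).
BULLET_PREFIXES = ("- ", "* ", "• ")


@dataclass
class Block:
    """Hypothetical unit: a paragraph line plus the bullet items that follow it."""
    intro: str
    items: list[str] = field(default_factory=list)


def parse_blocks(text: str) -> list[Block]:
    """Group each run of bullet lines with the paragraph line just above it."""
    blocks: list[Block] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(BULLET_PREFIXES) and blocks:
            # Strip the two-character marker and attach the item to its intro.
            blocks[-1].items.append(line[2:].strip())
        else:
            # Non-bullet lines (or a list with no intro) start a new block.
            blocks.append(Block(intro=line))
    return blocks


def layout_aware_chunks(text: str, max_items: int = 2) -> list[str]:
    """Emit chunks that never separate list items from their introduction."""
    if max_items <= 0:
        raise ValueError("max_items must be positive")
    chunks: list[str] = []
    for block in parse_blocks(text):
        if not block.items:
            chunks.append(block.intro)
            continue
        # A long list is split into groups, but every group repeats the intro.
        for i in range(0, len(block.items), max_items):
            group = block.items[i:i + max_items]
            chunks.append("\n".join([block.intro] + [f"- {item}" for item in group]))
    return chunks


DOC = (
    "To reset the device, complete these steps:\n"
    "- Unplug the power cable.\n"
    "- Hold the reset button for ten seconds.\n"
    "- Plug the cable back in.\n"
)

if __name__ == "__main__":
    for chunk in layout_aware_chunks(DOC, max_items=2):
        print(chunk, end="\n\n")
```

Repeating the introduction in every chunk costs a few extra tokens, but it keeps each retrieved chunk self-explanatory, which is the point of respecting the list's layout.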

Important Note

Although I have made the repository public and released the Python package, this project is still a work in progress. I will soon add more details on installation and on building projects with la-rag. In the meantime, see tests\sample.py for sample code showing how to use the package.

If you are interested in collaborating on this project, please email me at muafirathasnikt@gmail.com.

Download files

Source Distribution

la_rag-0.1.5.tar.gz (14.1 kB)

Uploaded Source

Built Distribution

la_rag-0.1.5-py3-none-any.whl (15.0 kB)

Uploaded Python 3

File details

Details for the file la_rag-0.1.5.tar.gz.

File metadata

  • Download URL: la_rag-0.1.5.tar.gz
  • Upload date:
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for la_rag-0.1.5.tar.gz
Algorithm    Hash digest
SHA256       676925ecdbdceec95953eed32e414ccc86dcc47028ee436eb16daa9cfa7389ee
MD5          a133e185eaf4ea2628051bcf3dca0ddb
BLAKE2b-256  112c427b8db3a2322d3c60b588bf63e2675879498bbe08fccf885c065b08bdbc

File details

Details for the file la_rag-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: la_rag-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for la_rag-0.1.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       80b0288aaf288502574dd6b58645c14adbd1ee40e2dfb51a9337997b32689eb1
MD5          7a201b6f195041688206d9d7c168279c
BLAKE2b-256  82a39b37bc5ed4de5a45867557547743fd4883e0d39cd00ff3bf22062caaa513
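To check a downloaded file against the SHA256 digests listed for la-rag 0.1.5, a minimal sketch using only the standard library's hashlib might look like this. The helper names `sha256_of` and `verify` are hypothetical; the digests are copied from the hash tables for the two distribution files.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the la-rag 0.1.5 distribution files.
EXPECTED_SHA256 = {
    "la_rag-0.1.5.tar.gz": "676925ecdbdceec95953eed32e414ccc86dcc47028ee436eb16daa9cfa7389ee",
    "la_rag-0.1.5-py3-none-any.whl": "80b0288aaf288502574dd6b58645c14adbd1ee40e2dfb51a9337997b32689eb1",
}


def sha256_of(path: Path, block_size: int = 65536) -> str:
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

pip can enforce the same check itself in hash-checking mode: pin `la-rag==0.1.5` in a requirements file with a `--hash=sha256:...` entry and install with `pip install --require-hashes -r requirements.txt`.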
