llama-index readers smart_pdf_loader integration

Smart PDF Loader

pip install llama-index-readers-smart-pdf-loader

SmartPDFLoader is a fast PDF reader that understands the layout structure of PDFs, such as nested sections, nested lists, paragraphs, and tables. It uses this layout information to split PDFs into short, self-contained chunks suited to LLM context windows.
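To make the idea of layout-aware chunking concrete, here is a minimal sketch in plain Python. It is not the llmsherpa algorithm: the `Section` class and `layout_chunks` function are hypothetical names invented for illustration. The sketch assumes the PDF has already been parsed into nested sections, and it only shows how each paragraph can carry its chain of section titles so that a chunk still makes sense on its own.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Section:
    """Hypothetical stand-in for a parsed PDF section (not an llmsherpa type)."""

    title: str
    paragraphs: List[str] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


def layout_chunks(section: Section, parents: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one chunk per paragraph, prefixed with its chain of section titles."""
    path = parents + (section.title,)
    header = " > ".join(path)
    for paragraph in section.paragraphs:
        yield f"{header}\n{paragraph}"
    # Recurse into nested sections, passing the title path down.
    for child in section.children:
        yield from layout_chunks(child, path)


doc = Section(
    title="Results",
    paragraphs=["We evaluate on three benchmarks."],
    children=[
        Section(title="Ablations", paragraphs=["Removing pretraining hurts accuracy."])
    ],
)
chunks = list(layout_chunks(doc))
for chunk in chunks:
    print(chunk, end="\n\n")
```

Keeping the title path with each paragraph is one simple way to preserve context that fixed-size text splitting would lose.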

Requirements

Install the llmsherpa library if it is not already present:

pip install llmsherpa

Usage

The following example loads a PDF with SmartPDFLoader:

from llama_index.readers.smart_pdf_loader import SmartPDFLoader

# URL of the llmsherpa parsing service
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
# A URL or a local file path, e.g. /home/downloads/xyz.pdf
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
# Parse the PDF and return a list of documents
documents = pdf_loader.load_data(pdf_url)
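Before indexing, it can help to check what the loader produced. The sketch below tallies documents by a `chunk_type` metadata key. That key name is an assumption about what the loader stores in each document's metadata, and `count_chunk_types` is a hypothetical helper, so inspect `documents[0].metadata` first to confirm what is actually there. The function only relies on each item having a `metadata` mapping.

```python
from collections import Counter
from typing import Any, Iterable


def count_chunk_types(documents: Iterable[Any]) -> Counter:
    """Tally documents by metadata["chunk_type"] (assumed key); "unknown" if absent."""
    counts: Counter = Counter()
    for doc in documents:
        # Tolerate objects with no metadata attribute or an empty one.
        metadata = getattr(doc, "metadata", None) or {}
        counts[metadata.get("chunk_type", "unknown")] += 1
    return counts
```

With the `documents` list from the example above, `print(count_chunk_types(documents))` would show how many chunks fall under each type.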

You can now use the documents with other LlamaIndex components. For example, for retrieval-augmented generation:

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("list all the tasks that work with bart")
print(response)

response = query_engine.query("what is the bart performance score on squad")
print(response)
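To see which chunks an answer drew on, you can look at the retrieved sources. The sketch below assumes the response object exposes a `source_nodes` list whose items have a `score` and a `node` with a `get_content()` method, as LlamaIndex query responses generally do. `format_sources` is a hypothetical helper, not part of the library.

```python
from typing import Any, List


def format_sources(response: Any, max_chars: int = 200) -> List[str]:
    """Return one summary line per retrieved source: rank, score, and a text snippet."""
    lines = []
    for rank, source in enumerate(getattr(response, "source_nodes", []), start=1):
        # Collapse whitespace so multi-line chunks fit on one line.
        snippet = " ".join(source.node.get_content().split())[:max_chars]
        score = f"{source.score:.3f}" if source.score is not None else "n/a"
        lines.append(f"[{rank}] score={score} {snippet}")
    return lines
```

After a query, `print("\n".join(format_sources(response)))` lists the chunks behind the answer, which is a quick way to judge whether the layout-based chunks retrieved are the relevant ones.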

More Examples

SmartPDFLoader is based on LayoutPDFReader from the llmsherpa library. See its documentation for other ways to connect data from your PDFs with LLMs.
