llama-index readers smart_pdf_loader integration

Project description

Smart PDF Loader

pip install llama-index-readers-smart-pdf-loader

SmartPDFLoader is a fast PDF reader that understands the layout structure of PDFs, including nested sections, nested lists, paragraphs, and tables. It uses this layout information to split PDFs into short chunks that work well as LLM context.
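
To make the idea of layout-aware chunking concrete, here is a minimal sketch in plain Python. It is not the algorithm llmsherpa uses; the `Block` type and the `chunk_with_context` function are hypothetical names invented for this illustration. Each paragraph is emitted together with the headings of the sections that enclose it, so every chunk stays meaningful on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical layout block: a heading or a paragraph at a nesting level."""
    kind: str   # "header" or "para"
    level: int  # nesting depth of the block (0 = top level)
    text: str


def chunk_with_context(blocks: List[Block]) -> List[str]:
    """Emit one chunk per paragraph, prefixed with its enclosing section headings."""
    chunks: List[str] = []
    stack: List[Block] = []  # headings that enclose the current position
    for block in blocks:
        if block.kind == "header":
            # A new heading closes every open heading at the same or a deeper level.
            while stack and stack[-1].level >= block.level:
                stack.pop()
            stack.append(block)
        else:
            path = " > ".join(h.text for h in stack)
            chunks.append(f"{path}\n{block.text}" if path else block.text)
    return chunks


blocks = [
    Block("header", 0, "Experiments"),
    Block("header", 1, "Datasets"),
    Block("para", 2, "We evaluate on SQuAD."),
    Block("header", 1, "Results"),
    Block("para", 2, "Scores are reported in Table 2."),
]
for chunk in chunk_with_context(blocks):
    print(chunk)
    print("---")
```

A chunker that ignores layout would return the two paragraphs without their section paths, and a retriever would then have no way to tell which section "Scores are reported in Table 2." belongs to.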

Requirements

Install the llmsherpa library if it is not already present:

pip install llmsherpa

Usage

The following example loads a PDF with SmartPDFLoader:

from llama_index.readers.smart_pdf_loader import SmartPDFLoader

# Endpoint of the llmsherpa parsing service
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
# PDF to parse; a local file path such as /home/downloads/xyz.pdf is also allowed
pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
# load_data returns a list of LlamaIndex documents built from the parsed PDF
documents = pdf_loader.load_data(pdf_url)
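
Because `load_data` accepts either a URL or a local path, several PDFs can be loaded in one pass. The sketch below shows one way to do that. `load_many` is a hypothetical helper, not part of this package, and it assumes only that the loader exposes `load_data(source)` returning a list, as in the example above. A failure on one source (for instance a network error) is recorded instead of aborting the whole batch.

```python
from typing import Any, Dict, Iterable, List, Tuple


def load_many(loader: Any, sources: Iterable[str]) -> Tuple[List[Any], Dict[str, str]]:
    """Load every source with loader.load_data, collecting failures per source."""
    documents: List[Any] = []
    errors: Dict[str, str] = {}
    for source in sources:
        try:
            documents.extend(loader.load_data(source))
        except Exception as exc:  # broad on purpose: parsing and network errors vary
            errors[source] = f"{type(exc).__name__}: {exc}"
    return documents, errors
```

With the loader from the example above, `documents, errors = load_many(pdf_loader, [pdf_url, "/home/downloads/xyz.pdf"])` would return the combined documents plus a mapping from each failed source to its error message.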

You can now use the documents with other LlamaIndex components. For example, for retrieval-augmented generation:

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("list all the tasks that work with bart")
print(response)

response = query_engine.query("what is the bart performance score on squad")
print(response)
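
When checking answers like these, it often helps to see which chunks the query engine retrieved. The helper below is a small sketch for that purpose. `format_sources` is a hypothetical name, and the code assumes the response object exposes `source_nodes`, where each item has a `score` and a `node` whose `get_content()` returns the chunk text; this matches how LlamaIndex responses are structured, but verify it against the version you have installed.

```python
from typing import Any, List


def format_sources(response: Any, max_chars: int = 80) -> List[str]:
    """Return one line per retrieved chunk: similarity score plus a text preview."""
    lines: List[str] = []
    for item in getattr(response, "source_nodes", []):
        # Collapse newlines and repeated spaces so each preview fits on one line.
        text = " ".join(item.node.get_content().split())
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"[{score}] {text[:max_chars]}")
    return lines
```

Calling `print("\n".join(format_sources(response)))` after a query lists the retrieved chunks in the order the response object holds them.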

More Examples

SmartPDFLoader is based on LayoutPDFReader from the llmsherpa library. See the llmsherpa documentation for other ways to use the library to connect data from your PDFs with LLMs.

Download files

Source Distribution

llama_index_readers_smart_pdf_loader-0.5.0.tar.gz (4.3 kB)

Built Distribution

llama_index_readers_smart_pdf_loader-0.5.0-py3-none-any.whl (4.0 kB)

File details

Details for the file llama_index_readers_smart_pdf_loader-0.5.0.tar.gz.

File metadata

  • Download URL: llama_index_readers_smart_pdf_loader-0.5.0.tar.gz
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for llama_index_readers_smart_pdf_loader-0.5.0.tar.gz
Algorithm    Hash digest
SHA256       620e4aedee063779e80461fb6c75cf76284eddec5f218049904d5c4fe65b81d7
MD5          d1f45a1107db13ff7d613dea16b37b78
BLAKE2b-256  3430fac2a8b2887c6fd04e1a2272dec4ed4c0cd84b23aeb96d3c711b14e13827

File details

Details for the file llama_index_readers_smart_pdf_loader-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: llama_index_readers_smart_pdf_loader-0.5.0-py3-none-any.whl
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for llama_index_readers_smart_pdf_loader-0.5.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       97b4d2a2c6f016cf04fb4a2440804851405a6ce53668592fd1cb175aace5c905
MD5          3982f7dcf04158582b32be99c63a18fc
BLAKE2b-256  644fef584194c66777991bd9053826d12cf3592186e121c7282881d9b8462c4a
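
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The following is a minimal sketch; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the path you pass is wherever you saved the file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Pass the path of the downloaded archive or wheel together with the corresponding SHA256 value; a result of `False` means the file differs from the one that was published.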
