llama-index readers marker integration

LlamaIndex Readers Integration: Pdf-Marker

Uses the marker-pdf library to extract the content of a PDF file as Markdown.

From the marker README:

Marker converts PDF to markdown quickly and accurately.

  • Supports a wide range of documents (optimized for books and scientific papers)
  • Supports all languages
  • Removes headers/footers/other artifacts
  • Formats tables and code blocks
  • Extracts and saves images along with the markdown
  • Converts most equations to latex
  • Works on GPU, CPU, or MPS

Usage

Here is an example of how to use the PDFMarkerReader:

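from pathlib import Path

from llama_index.readers.pdf_marker import PDFMarkerReader

# Path to the PDF file to convert
path = Path("/path/to/pdf")
reader = PDFMarkerReader()
# load_data returns a list of LlamaIndex Document objects
documents = reader.load_data(path)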
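
The example above converts a single file. To process a whole folder of PDFs, a loop along the lines of the following sketch could work. `convert_pdf_directory` is a hypothetical helper, not part of this package. It assumes only that `load_data(path)` returns a list of documents exposing a `.text` attribute, as LlamaIndex `Document` objects do, and it writes one Markdown file per PDF.

```python
from pathlib import Path
from typing import Any, Dict, List


def convert_pdf_directory(reader: Any, pdf_dir: Path, out_dir: Path) -> Dict[str, List[Any]]:
    """Run `reader.load_data` on every PDF in `pdf_dir` and save the text as Markdown.

    Hypothetical helper: assumes `reader.load_data(path)` returns a list of
    documents that each expose a `.text` attribute.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, List[Any]] = {}
    # sorted() gives a deterministic order; the glob matches lowercase ".pdf" only
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        documents = reader.load_data(pdf_path)
        results[pdf_path.name] = documents
        # Join the text of all returned documents into one .md file per PDF
        markdown = "\n\n".join(doc.text for doc in documents)
        (out_dir / f"{pdf_path.stem}.md").write_text(markdown, encoding="utf-8")
    return results


# Example usage (requires this package and marker-pdf to be installed):
# from llama_index.readers.pdf_marker import PDFMarkerReader
# results = convert_pdf_directory(PDFMarkerReader(), Path("/path/to/pdfs"), Path("markdown_out"))
```

Because the reader is passed in as an argument, the helper can be exercised with a stand-in object in tests without installing marker-pdf.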

License

The marker-pdf library is licensed under the GPL-3.0 license (see https://github.com/VikParuchuri/marker). This means you may copy, distribute, and modify the software as long as you track changes and dates in source files. Any modifications to GPL-licensed code, and any software that includes it (for example, via compilation), must also be made available under the GPL, along with build and install instructions.

There are also commercial usage restrictions (see https://github.com/VikParuchuri/marker?tab=readme-ov-file#commercial-usage).

Download files

Source Distribution

llama_index_readers_pdf_marker-0.3.0.tar.gz (15.1 kB)

Built Distribution

llama_index_readers_pdf_marker-0.3.0-py3-none-any.whl

File details

Details for the file llama_index_readers_pdf_marker-0.3.0.tar.gz.

File hashes

Hashes for llama_index_readers_pdf_marker-0.3.0.tar.gz

Algorithm      Hash digest
SHA256         e636191fbab0174dc67d101dc91fad4139e32c1aaf419c7b8b0e5d812e864f43
MD5            4bfd02bc1d2fdb10642534cbdcf3f3bd
BLAKE2b-256    945c008250ba8229850e1fbc78d49d9bdc44df66cce099e5f82d622a3e0ab61f
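
To check a downloaded archive against the SHA256 digest listed above, Python's standard hashlib module is enough. The following is a minimal sketch; the helper names `sha256_of` and `verify_sha256` are illustrative, and the download location in the commented example is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream the file so large archives need not fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published value (whitespace and case ignored)."""
    return sha256_of(path) == expected.strip().lower()


# SHA256 published for the 0.3.0 source distribution
SDIST_SHA256 = "e636191fbab0174dc67d101dc91fad4139e32c1aaf419c7b8b0e5d812e864f43"

# Example (assumes the archive was downloaded to the current directory):
# ok = verify_sha256(Path("llama_index_readers_pdf_marker-0.3.0.tar.gz"), SDIST_SHA256)
```

pip can also enforce these digests itself in hash-checking mode, by adding `--hash=sha256:<digest>` to a pinned requirement line and installing with `--require-hashes`.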

File details

Details for the file llama_index_readers_pdf_marker-0.3.0-py3-none-any.whl.

File hashes

Hashes for llama_index_readers_pdf_marker-0.3.0-py3-none-any.whl

Algorithm      Hash digest
SHA256         464c63ebfd0d94a5bbb60e8cacba387c8d2f22615ad5219a9f5f4dec467758bd
MD5            30f1d5b56fdafea5a7d7772850796093
BLAKE2b-256    a30522cbbc64824a5eae6a5ad91d5bfc12026a57d5a4a48200f34573f701b8ea
