llama-index readers marker integration

LlamaIndex Readers Integration: Pdf-Marker

Uses the marker-pdf library to extract the content of a PDF file.

From the original README:

Marker converts PDF to markdown quickly and accurately.

  • Supports a wide range of documents (optimized for books and scientific papers)
  • Supports all languages
  • Removes headers/footers/other artifacts
  • Formats tables and code blocks
  • Extracts and saves images along with the markdown
  • Converts most equations to latex
  • Works on GPU, CPU, or MPS

Usage

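The following example loads a PDF with PDFMarkerReader.

from pathlib import Path

from llama_index.readers.pdf_marker import PDFMarkerReader

# Path to the PDF file to convert
path = Path("/path/to/pdf")
reader = PDFMarkerReader()
# Keep the return value so the extracted content can be used downstream
documents = reader.load_data(path)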
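
To process a whole folder of PDFs, the same call can be wrapped in a small loop. This is a minimal sketch, not part of the package: `load_pdfs` is a hypothetical helper, and it assumes that `load_data` accepts a single `Path` and returns a list of documents, as in the example above.

```python
from pathlib import Path
from typing import Any, List


def load_pdfs(reader: Any, directory: Path) -> List[Any]:
    """Run reader.load_data on every PDF in `directory` and pool the results."""
    documents: List[Any] = []
    # sorted() keeps the processing order stable across runs
    for pdf_path in sorted(directory.glob("*.pdf")):
        # Assumption: load_data returns a list of document objects
        documents.extend(reader.load_data(pdf_path))
    return documents


# Intended use (requires the package and its models to be installed):
#   from llama_index.readers.pdf_marker import PDFMarkerReader
#   documents = load_pdfs(PDFMarkerReader(), Path("/path/to/pdf/folder"))
```

Passing the reader in as an argument keeps the helper independent of how the reader is configured.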

License

The marker-pdf library is licensed under GPL-3.0 (see https://github.com/VikParuchuri/marker). You may copy, distribute, and modify the software as long as you track changes and dates in source files. Any modified version, and any software that includes GPL-licensed code (for example, via a compiler), must also be made available under the GPL, along with build and install instructions.

There are also limitations on commercial usage (see https://github.com/VikParuchuri/marker?tab=readme-ov-file#commercial-usage).

Download files

Source Distribution

llama_index_readers_pdf_marker-0.1.0.tar.gz (15.1 kB)

Built Distribution

llama_index_readers_pdf_marker-0.1.0-py3-none-any.whl

File details

Details for the file llama_index_readers_pdf_marker-0.1.0.tar.gz.

File hashes

Hashes for llama_index_readers_pdf_marker-0.1.0.tar.gz

Algorithm    Hash digest
SHA256       eb3a1e7f218d98be9e0ec67a7e7ec37d13ec4a8ddd3b43470337776844c9380d
MD5          c93dfa5a3964d24f155091771e504a60
BLAKE2b-256  ea4e554e78da08db5a056f06226a927575ce22de11ae8f7bc8a8e3c93a846dbc

File details

Details for the file llama_index_readers_pdf_marker-0.1.0-py3-none-any.whl.

File hashes

Hashes for llama_index_readers_pdf_marker-0.1.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       5cd33c8e0eb66f5b332f4acae5f227bf7ec5d8d9abb606945e78e08ef7453b2e
MD5          8ba1aa78b85903d9c56c596cdcfb5f04
BLAKE2b-256  42319b4b92d9a07af6e69493f4e77c43e76c08158ebbc046f7b900797ad533b8
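
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; `sha256_of` and `matches_published` are hypothetical helper names, and the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables on this page
PUBLISHED_SHA256 = {
    "llama_index_readers_pdf_marker-0.1.0.tar.gz":
        "eb3a1e7f218d98be9e0ec67a7e7ec37d13ec4a8ddd3b43470337776844c9380d",
    "llama_index_readers_pdf_marker-0.1.0-py3-none-any.whl":
        "5cd33c8e0eb66f5b332f4acae5f227bf7ec5d8d9abb606945e78e08ef7453b2e",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = PUBLISHED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

A file whose name is not in the table is treated as unverified rather than as a match.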
