
llama-index readers marker integration


LlamaIndex Readers Integration: Pdf-Marker

Uses the marker-pdf library to extract the content of a PDF file.

From the original README:

Marker converts PDF to markdown quickly and accurately.

  • Supports a wide range of documents (optimized for books and scientific papers)
  • Supports all languages
  • Removes headers/footers/other artifacts
  • Formats tables and code blocks
  • Extracts and saves images along with the markdown
  • Converts most equations to latex
  • Works on GPU, CPU, or MPS

Usage

Here's an example usage of the PDFMarkerReader.

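from pathlib import Path

from llama_index.readers.pdf_marker import PDFMarkerReader

# Replace with the path to a real PDF file
path = Path("/path/to/pdf")
reader = PDFMarkerReader()
# load_data returns the extracted content as LlamaIndex documents
documents = reader.load_data(path)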
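
To process several files at once, the same call can be wrapped in a small helper. The following is only a sketch: `load_pdfs` is a hypothetical name that is not part of this package, and it assumes that `load_data` accepts a single `Path` and returns a list of documents, as in the usage example.

```python
from pathlib import Path
from typing import Any, List


def load_pdfs(reader: Any, folder: Path) -> List[Any]:
    """Call reader.load_data on every *.pdf file in folder.

    Hypothetical helper, not part of the package. It assumes that
    reader.load_data(path) returns a list of documents.
    """
    documents: List[Any] = []
    # Sort the paths so the output order is stable across runs
    for pdf_path in sorted(folder.glob("*.pdf")):
        documents.extend(reader.load_data(pdf_path))
    return documents
```

With the package installed, calling `load_pdfs(PDFMarkerReader(), Path("/path/to/folder"))` should collect the documents from every PDF in that folder. Only files directly inside the folder are matched, not those in subfolders.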

License

The marker-pdf library is licensed under the GPL-3.0 license (see https://github.com/VikParuchuri/marker). You may copy, distribute, and modify the software as long as you track changes and dates in source files. Any modifications to GPL-licensed code, and any software that includes it (via compiler), must also be made available under the GPL, along with build and install instructions.

There are also commercial usage limitations (see https://github.com/VikParuchuri/marker?tab=readme-ov-file#commercial-usage).
