
llama-index readers marker integration

Project description

LlamaIndex Readers Integration: Pdf-Marker

Uses the marker-pdf library to extract the content of a PDF file as Markdown.

From the marker-pdf README:

Marker converts PDF to markdown quickly and accurately.

  • Supports a wide range of documents (optimized for books and scientific papers)
  • Supports all languages
  • Removes headers/footers/other artifacts
  • Formats tables and code blocks
  • Extracts and saves images along with the markdown
  • Converts most equations to LaTeX
  • Works on GPU, CPU, or MPS

Usage

Here is an example of using PDFMarkerReader:

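from pathlib import Path

from llama_index.readers.pdf_marker import PDFMarkerReader

# Path to the PDF file to convert
path = Path("/path/to/pdf")
reader = PDFMarkerReader()

# load_data() returns a list of LlamaIndex Document objects
documents = reader.load_data(path)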
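The reader returns LlamaIndex Document objects. If you want to keep the converted Markdown on disk, the following is a minimal sketch. It assumes that each document's `text` attribute holds the Markdown produced by Marker; `save_markdown` and `HasText` are hypothetical helpers for illustration, not part of this package.

```python
from pathlib import Path
from typing import Iterable, List, Protocol


class HasText(Protocol):
    """Hypothetical structural type: anything with a `text` attribute, such as a LlamaIndex Document."""

    text: str


def save_markdown(documents: Iterable[HasText], out_dir: Path, stem: str) -> List[Path]:
    """Hypothetical helper: write each document's text to <out_dir>/<stem>_<i>.md.

    Assumes `text` contains the Markdown output; returns the paths written, in order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i, doc in enumerate(documents):
        target = out_dir / f"{stem}_{i}.md"
        target.write_text(doc.text, encoding="utf-8")
        written.append(target)
    return written
```

For example, `save_markdown(reader.load_data(path), Path("out"), "paper")` would write `out/paper_0.md` for the first document, `out/paper_1.md` for the second, and so on. Typing the argument structurally keeps the helper independent of LlamaIndex itself.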

License

The marker-pdf library is licensed under the GPL-3.0 license (see https://github.com/VikParuchuri/marker). This means you may copy, distribute, and modify the software as long as you track changes and dates in source files. Any modification of GPL-licensed code, or any software that includes it (e.g. via a compiler), must also be made available under the GPL, along with build and install instructions.

There are also restrictions on commercial usage (see https://github.com/VikParuchuri/marker?tab=readme-ov-file#commercial-usage).


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_index_readers_pdf_marker-0.5.0.tar.gz (15.8 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llama_index_readers_pdf_marker-0.5.0-py3-none-any.whl (15.4 kB)

Uploaded Python 3
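The wheel file name follows the standard wheel naming convention (PEP 427): distribution name, version, an optional build tag, then the Python, ABI, and platform tags, separated by hyphens. The `py3-none-any` suffix is why this wheel is tagged "Python 3": it targets any Python 3 interpreter, has no ABI dependency, and runs on any platform. Below is a minimal sketch of splitting such a name, assuming well-formed input; `WheelName` and `parse_wheel_filename` are illustrative helpers, not part of this package.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    """Components of a wheel file name (illustrative type)."""

    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split '{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl' into its parts.

    Illustrative only: no name normalization or tag validation is performed.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)
```

For real tooling, `packaging.utils.parse_wheel_filename` from the `packaging` library is the more robust choice, since it also validates and normalizes the components.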

File details

Details for the file llama_index_readers_pdf_marker-0.5.0.tar.gz.

File metadata

  • Download URL: llama_index_readers_pdf_marker-0.5.0.tar.gz
  • Upload date:
  • Size: 15.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for llama_index_readers_pdf_marker-0.5.0.tar.gz
Algorithm     Hash digest
SHA256        82f84b8ec8d7d671db0228d208fd5483b8f5d43aa934b5446c781bf25d88945f
MD5           1dd460f22145242179dbe691bf3fe5d3
BLAKE2b-256   22de0bf8d3447da4aeeac30f27b111172a6ceeff63d1b5913970ea6372564fc8
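To check that a downloaded file matches the SHA256 digest published for the source distribution, you can hash it locally and compare. This is a minimal sketch using only the standard library; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for llama_index_readers_pdf_marker-0.5.0.tar.gz
EXPECTED_SHA256 = "82f84b8ec8d7d671db0228d208fd5483b8f5d43aa934b5446c781bf25d88945f"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals `expected` (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()
```

Usage: `matches_published_hash(Path("llama_index_readers_pdf_marker-0.5.0.tar.gz"))`. Reading in chunks keeps memory use constant regardless of file size. When installing with pip, its hash-checking mode (`--require-hashes`) performs an equivalent check at install time.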


File details

Details for the file llama_index_readers_pdf_marker-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: llama_index_readers_pdf_marker-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for llama_index_readers_pdf_marker-0.5.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        8424b18d1b350847a772017e20ddfe34a4c20c55402e8bdca79596375966368d
MD5           36e2feb4626a1d4079c7ade2c20ffa79
BLAKE2b-256   e4688568e75f239c16aa9c707e2273af3b06c3233407bc7ea1f91cede03650a0

