llama-index readers marker integration
Project description
LlamaIndex Readers Integration: Pdf-Marker
Uses the marker-pdf library to extract the content of a PDF file.
From the original README:
Marker converts PDF to markdown quickly and accurately.
- Supports a wide range of documents (optimized for books and scientific papers)
- Supports all languages
- Removes headers/footers/other artifacts
- Formats tables and code blocks
- Extracts and saves images along with the markdown
- Converts most equations to latex
- Works on GPU, CPU, or MPS
Usage
Here's an example usage of the PDFMarkerReader.
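from pathlib import Path

from llama_index.readers.pdf_marker import PDFMarkerReader

# Path to the PDF file to convert
path = Path("/path/to/pdf")
reader = PDFMarkerReader()
# load_data returns the extracted content as LlamaIndex documents
documents = reader.load_data(path)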
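A reader like this is often pointed at a folder rather than a single file. The sketch below assumes that `load_data` takes one `Path` and returns a list of documents, as in the usage above and as is conventional for LlamaIndex readers. `load_pdf_folder` is a hypothetical helper written for illustration and is not part of the package.

```python
from pathlib import Path
from typing import Any, List


def load_pdf_folder(reader: Any, folder: Path) -> List[Any]:
    """Run reader.load_data on each *.pdf file directly inside folder.

    Assumption: reader.load_data(path) returns a list of documents.
    """
    documents: List[Any] = []
    # Sort the paths so files are processed in a deterministic order.
    for pdf_path in sorted(folder.glob("*.pdf")):
        documents.extend(reader.load_data(pdf_path))
    return documents
```

With the `reader` from the example above, `load_pdf_folder(reader, Path("/path/to/folder"))` would collect the documents of every PDF in that folder into one list. Subfolders are not searched; swapping `glob` for `rglob` would change that.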
License
The marker-pdf library is licensed under the GPL-3.0 license (see https://github.com/VikParuchuri/marker), meaning that you may copy, distribute and modify the software as long as you track changes/dates in source files. Any modifications to or software including (via compiler) GPL-licensed code must also be made available under the GPL along with build & install instructions.
There are also commercial usage limitations (see https://github.com/VikParuchuri/marker?tab=readme-ov-file#commercial-usage).
Download files
File details
Details for the file llama_index_readers_pdf_marker-0.3.0.tar.gz.
File metadata
- Download URL: llama_index_readers_pdf_marker-0.3.0.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e636191fbab0174dc67d101dc91fad4139e32c1aaf419c7b8b0e5d812e864f43
MD5 | 4bfd02bc1d2fdb10642534cbdcf3f3bd
BLAKE2b-256 | 945c008250ba8229850e1fbc78d49d9bdc44df66cce099e5f82d622a3e0ab61f
File details
Details for the file llama_index_readers_pdf_marker-0.3.0-py3-none-any.whl.
File metadata
- Download URL: llama_index_readers_pdf_marker-0.3.0-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 464c63ebfd0d94a5bbb60e8cacba387c8d2f22615ad5219a9f5f4dec467758bd
MD5 | 30f1d5b56fdafea5a7d7772850796093
BLAKE2b-256 | a30522cbbc64824a5eae6a5ad91d5bfc12026a57d5a4a48200f34573f701b8ea
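The SHA256 digests listed for the two files can be used to check a download before installing it. The following is a minimal sketch using only Python's standard `hashlib` module. The function names and the `EXPECTED_SHA256` mapping are illustrative, and the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "llama_index_readers_pdf_marker-0.3.0.tar.gz":
        "e636191fbab0174dc67d101dc91fad4139e32c1aaf419c7b8b0e5d812e864f43",
    "llama_index_readers_pdf_marker-0.3.0-py3-none-any.whl":
        "464c63ebfd0d94a5bbb60e8cacba387c8d2f22615ad5219a9f5f4dec467758bd",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For a downloaded file at `path`, `matches_digest(path, EXPECTED_SHA256[path.name])` returns `True` only if the file's contents hash to the listed value.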