llama-index readers marker integration
LlamaIndex Readers Integration: Pdf-Marker
Uses the marker-pdf library to extract the content of a PDF file as markdown.
From the original README:
Marker converts PDF to markdown quickly and accurately.
- Supports a wide range of documents (optimized for books and scientific papers)
- Supports all languages
- Removes headers/footers/other artifacts
- Formats tables and code blocks
- Extracts and saves images along with the markdown
- Converts most equations to LaTeX
- Works on GPU, CPU, or MPS
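Because the output is markdown, the tables that Marker formats can be recovered from a loaded document with ordinary text processing. The following is a minimal sketch that assumes Marker emits GitHub-style pipe tables (a header row followed by a `|---|` delimiter row). `extract_markdown_tables` is a hypothetical helper and is not part of this package.

```python
from typing import List


def _is_delimiter_row(line: str) -> bool:
    """True for a pipe-table delimiter row such as '|---|:--:|'."""
    stripped = line.strip()
    return "-" in stripped and set(stripped) <= set("|:- ")


def extract_markdown_tables(markdown: str) -> List[str]:
    """Return every pipe table found in a markdown string.

    Hypothetical helper: assumes GitHub-style pipe tables, i.e. runs of
    consecutive lines starting with '|' whose second line is a delimiter row.
    """
    tables: List[str] = []
    block: List[str] = []

    def flush() -> None:
        # Keep the run only if it looks like a real table (header + delimiter).
        if len(block) >= 2 and _is_delimiter_row(block[1]):
            tables.append("\n".join(block))
        block.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("|"):
            block.append(line)
        else:
            flush()
    flush()  # handle a table that ends at the last line
    return tables
```

The helper can be applied to the `.text` of a Document returned by `load_data`. Runs of `|` lines without a delimiter row are skipped, so stray pipes in prose are not mistaken for tables.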
Usage
Here's an example of loading a PDF with PDFMarkerReader:
from pathlib import Path

from llama_index.readers.pdf_marker import PDFMarkerReader

path = Path("/path/to/pdf")
reader = PDFMarkerReader()
# load_data returns a list of LlamaIndex Document objects with the extracted content
documents = reader.load_data(path)
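To convert a whole folder of PDFs, the same reader can be called once per file. The following is a minimal sketch that assumes only that `load_data(path)` returns a list of documents, as in the example above. `load_pdf_directory` and `SupportsLoadData` are hypothetical names introduced here, and the directory scan is non-recursive.

```python
from pathlib import Path
from typing import Any, List, Protocol


class SupportsLoadData(Protocol):
    """Any reader exposing load_data(path) -> list, e.g. PDFMarkerReader."""

    def load_data(self, path: Path, /) -> List[Any]: ...


def load_pdf_directory(reader: SupportsLoadData, directory: Path) -> List[Any]:
    """Load every *.pdf directly inside `directory`, in sorted filename order.

    Hypothetical helper: calls reader.load_data(path) once per file and
    concatenates the returned document lists.
    """
    documents: List[Any] = []
    for pdf_path in sorted(directory.glob("*.pdf")):
        documents.extend(reader.load_data(pdf_path))
    return documents
```

Usage, with a placeholder directory: `documents = load_pdf_directory(PDFMarkerReader(), Path("/path/to/pdfs"))`. In this sketch an exception from one PDF aborts the whole loop. If partial results are preferable, wrap the `load_data` call in a `try`/`except` and log the failing path.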
License
The marker-pdf library is licensed under the GPL-3.0 license (see https://github.com/VikParuchuri/marker). This means you may copy, distribute, and modify the software as long as you track changes and dates in source files. Modifications to GPL-licensed code, and software that includes it (for example, by compiling it in), must also be made available under the GPL, along with build and install instructions.
There are also limitations on commercial usage (see https://github.com/VikParuchuri/marker?tab=readme-ov-file#commercial-usage).
Hashes for llama_index_readers_pdf_marker-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | eb3a1e7f218d98be9e0ec67a7e7ec37d13ec4a8ddd3b43470337776844c9380d
MD5 | c93dfa5a3964d24f155091771e504a60
BLAKE2b-256 | ea4e554e78da08db5a056f06226a927575ce22de11ae8f7bc8a8e3c93a846dbc
Hashes for llama_index_readers_pdf_marker-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5cd33c8e0eb66f5b332f4acae5f227bf7ec5d8d9abb606945e78e08ef7453b2e
MD5 | 8ba1aa78b85903d9c56c596cdcfb5f04
BLAKE2b-256 | 42319b4b92d9a07af6e69493f4e77c43e76c08158ebbc046f7b900797ad533b8
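The published digests can be checked against a downloaded file. The following is a minimal sketch that uses only Python's standard `hashlib`. `file_digest` and `verify` are hypothetical helper names, and the algorithm labels mirror the ones in the tables (BLAKE2b-256 means BLAKE2b with a 32-byte digest).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Hex digest of a file, read in 64 KiB chunks to bound memory use.

    `algorithm` is "sha256", "md5", or "blake2b-256" (the labels used above).
    """
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)  # 256-bit BLAKE2b
    else:
        hasher = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return file_digest(path, "sha256") == expected_sha256.strip().lower()
```

For example, assuming the wheel was downloaded into the current directory: `verify(Path("llama_index_readers_pdf_marker-0.1.0-py3-none-any.whl"), "5cd33c8e0eb66f5b332f4acae5f227bf7ec5d8d9abb606945e78e08ef7453b2e")`. pip can enforce the same check itself in hash-checking mode (`--require-hashes`).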