llama-index readers pdfmux integration

Project description

LlamaIndex Readers Integration: pdfmux

pdfmux extracts PDFs into section-aware, LLM-ready chunks. It scores each page's extraction confidence and routes low-confidence or scanned pages to an OCR fallback automatically, so mixed digital/scanned documents come back as clean text without manual pre-processing. Each chunk is returned as a LlamaIndex Document with title, page range, token estimate, and confidence metadata.

Installation

pip install llama-index-readers-pdfmux

Usage

from llama_index.readers.pdfmux import PDFMuxReader

reader = PDFMuxReader(quality="standard")  # "fast" | "standard" | "high"
documents = reader.load_data("report.pdf")

for doc in documents:
    print(doc.metadata["title"], "—", doc.metadata["tokens"], "tokens")

Each returned Document carries metadata: source, title, page_start, page_end, tokens, confidence. Pass extra_info={...} to load_data to merge additional metadata into every document.
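
As a minimal sketch of how the metadata keys above might be used downstream, the following packs chunks into batches under a token budget using the tokens field. The helper name batch_by_tokens and the SimpleNamespace stand-ins are hypothetical and not part of this package. With the real reader you would pass the list returned by load_data instead. It assumes tokens is an integer estimate.

```python
from types import SimpleNamespace


def batch_by_tokens(documents, max_tokens):
    """Greedily group documents so each batch stays within max_tokens.

    Hypothetical helper: assumes doc.metadata["tokens"] is an integer estimate.
    A single document larger than max_tokens ends up in a batch of its own.
    """
    batches, current, used = [], [], 0
    for doc in documents:
        tokens = int(doc.metadata["tokens"])
        # Close the current batch if adding this document would exceed the budget.
        if current and used + tokens > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Stand-ins for the Documents returned by reader.load_data("report.pdf").
docs = [
    SimpleNamespace(metadata={"title": "Intro", "page_start": 1, "page_end": 2, "tokens": 300}),
    SimpleNamespace(metadata={"title": "Methods", "page_start": 3, "page_end": 6, "tokens": 900}),
    SimpleNamespace(metadata={"title": "Results", "page_start": 7, "page_end": 9, "tokens": 500}),
]

for batch in batch_by_tokens(docs, max_tokens=1200):
    titles = [d.metadata["title"] for d in batch]
    print(titles, sum(d.metadata["tokens"] for d in batch))
```

Greedy packing keeps chunks in document order, which is usually what a prompt or embedding batch wants. A different strategy may fit better if order does not matter.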

This loader is designed for RAG/LLM pipelines that ingest a mix of scanned and digital PDFs and need per-page confidence to decide what to trust.
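
One way to act on that confidence, sketched under assumptions: split chunks into a trusted set and a needs-review set by a threshold. The helper split_by_confidence, the 0.8 cut-off, and the assumption that confidence is a float between 0 and 1 are illustrative and not taken from pdfmux's documentation. The SimpleNamespace objects stand in for real Documents.

```python
from types import SimpleNamespace


def split_by_confidence(documents, threshold=0.8):
    """Partition documents into (trusted, review) by metadata["confidence"].

    Hypothetical helper: assumes confidence is a number where higher is better.
    Documents without a confidence value are sent to review.
    """
    trusted, review = [], []
    for doc in documents:
        confidence = float(doc.metadata.get("confidence", 0.0))
        if confidence >= threshold:
            trusted.append(doc)
        else:
            review.append(doc)
    return trusted, review


# Stand-ins for the Documents returned by reader.load_data(...).
docs = [
    SimpleNamespace(metadata={"title": "Summary", "confidence": 0.97}),
    SimpleNamespace(metadata={"title": "Scanned appendix", "confidence": 0.55}),
    SimpleNamespace(metadata={"title": "No score"}),
]

trusted, review = split_by_confidence(docs)
print("index now:", [d.metadata["title"] for d in trusted])
print("check by hand:", [d.metadata["title"] for d in review])
```

Sending unscored chunks to review is a conservative default. Pick the threshold by inspecting confidence values on a sample of your own documents.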

Project details

Release history

0.1.1 (this version): Jun 11, 2026
0.1.0: Mar 30, 2026

Download files

Source Distribution

llama_index_readers_pdfmux-0.1.1.tar.gz (3.6 kB), uploaded Jun 11, 2026, Source

Built Distribution

llama_index_readers_pdfmux-0.1.1-py3-none-any.whl (3.9 kB), uploaded Jun 11, 2026, Python 3

File details

Details for the file llama_index_readers_pdfmux-0.1.1.tar.gz.

File metadata

Download URL: llama_index_readers_pdfmux-0.1.1.tar.gz
Upload date: Jun 11, 2026
Size: 3.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for llama_index_readers_pdfmux-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`879244c431339cc72a0cb08ff7bb5c0f629257060153901fcd1d389ca906435c`
MD5	`defbeab079ad47efcf32e06b62af922e`
BLAKE2b-256	`4861f7459fed14ed22363ccd16ca61d58e711c535457dbe508e2e8e776c76b45`

File details

Details for the file llama_index_readers_pdfmux-0.1.1-py3-none-any.whl.

File metadata

Download URL: llama_index_readers_pdfmux-0.1.1-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 3.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for llama_index_readers_pdfmux-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`333e192a855fc17efc783f5beb918b07cd3cfe36c030a61ff6fbd97198af51d1`
MD5	`6087496bc33c1607ae1673d86982efdd`
BLAKE2b-256	`9ad4700eeb78d2f29b3a0cbe15d43eae4bd11d7919ae5c6342068a8036a82469`
