An integration package connecting PyMuPDF Layout to LangChain. It loads PDF content as Markdown using AI-based, CPU-only layout analysis.

Project description

langchain-pymupdf-layout

Features

📚 Structured data extraction from your documents
🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
🔍 Detect and isolate header and footer patterns on each page

For more detailed information, visit the official PyMuPDF Layout documentation: https://pymupdf.readthedocs.io/en/latest/pymupdf-layout/index.html

Requirements

Python 3.11 or higher
LangChain Core v1.0.0 or higher
PyMuPDF v1.26.6 or higher
PyMuPDF4LLM v0.2.0 or higher
PyMuPDF Layout v1.26.6 or higher
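
The minimum versions above can be checked programmatically. The following is a minimal sketch, not part of this package: `parse_version` and `check_requirements` are hypothetical helpers, and the distribution names (`langchain-core`, `PyMuPDF`, `pymupdf4llm`, `pymupdf-layout`) are assumed to match the PyPI project names. Only the first three numeric components are compared, so pre-release suffixes are ignored.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimums copied from the Requirements list. The distribution names are an
# assumption (PyPI project names), not something this page states.
MINIMUMS = {
    "langchain-core": (1, 0, 0),
    "PyMuPDF": (1, 26, 6),
    "pymupdf4llm": (0, 2, 0),
    "pymupdf-layout": (1, 26, 6),
}


def parse_version(text):
    """Turn '1.26.6' or '1.0.0rc1' into a tuple of three leading integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS, python_min=(3, 11)):
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    if sys.version_info[:2] < python_min:
        problems.append(
            f"Python {python_min[0]}.{python_min[1]} or higher is required"
        )
    for name, minimum in minimums.items():
        try:
            installed = parse_version(version(name))
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if installed < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

An empty result means every listed requirement appears to be satisfied in the current environment.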

Installation

Install the package using pip to start using the Document Loader:

pip install -U langchain-pymupdf-layout

Usage

Use PyMuPDFLayoutLoader in a Python application to load and parse PDFs.

The example below prints the package version, loads a PDF from a URL, and splits the resulting documents into chunks:

from langchain_pymupdf_layout import version

print(version())  # Output: version number

# RecursiveCharacterTextSplitter lives in the langchain-text-splitters package
# (pip install langchain-text-splitters).
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_pymupdf_layout import PyMuPDFLayoutLoader

loader = PyMuPDFLayoutLoader(
    file_path="https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf",
    show_progress=False,
    # See other loader options on https://pymupdf.readthedocs.io/en/latest/pymupdf-layout/index.html#pymupdf-layout-and-parameter-caveats
)

documents = loader.load()

# Split the loaded documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

chunks = text_splitter.split_documents(documents)

print(f"Loaded {len(documents)} document(s)")
print(f"Created {len(chunks)} chunk(s)")

content = chunks[0].page_content
print(f"\ncontent:\n{content}")
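
To make the `chunk_size=1000` and `chunk_overlap=200` settings used above concrete, here is a simplified sliding-window chunker in plain Python. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break on separators such as paragraphs and lines); `sliding_window_chunks` is a hypothetical helper that only illustrates how size and overlap relate: each chunk starts `chunk_size - chunk_overlap` characters after the previous one.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    pieces = sliding_window_chunks(sample)
    print([len(piece) for piece in pieces])
```

With these settings, the last 200 characters of one chunk reappear as the first 200 characters of the next, which helps keep context that straddles a chunk boundary.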

Project details

Release history

0.1.3: Nov 24, 2025
0.1.2: Nov 24, 2025
0.1.1: Nov 21, 2025 (this version)
0.1.0: Nov 19, 2025

Download files

Source Distribution

langchain_pymupdf_layout-0.1.1.tar.gz (10.4 kB)

Uploaded Nov 21, 2025 Source

Built Distribution

langchain_pymupdf_layout-0.1.1-py3-none-any.whl (11.0 kB)

Uploaded Nov 21, 2025 Python 3

File details

Details for the file langchain_pymupdf_layout-0.1.1.tar.gz.

File metadata

Download URL: langchain_pymupdf_layout-0.1.1.tar.gz
Upload date: Nov 21, 2025
Size: 10.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.7

File hashes

Hashes for langchain_pymupdf_layout-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`9d9fca680479f3e454fc80e01461435d3f164c2cacd43f8a6914fcf21ea66633`
MD5	`e4bdc17dd178def9f7b0c97ff2d2ccf1`
BLAKE2b-256	`fb22d9fcba02d2939131bb7826f10a75b25a94093f43e802e0ba7d3bd97145a1`
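
A downloaded archive can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch: `sha256_of_file` and `matches_expected` are hypothetical helper names, and the file path in the usage section assumes the archive was saved to the current directory under its original name.

```python
import hashlib
import hmac

# SHA256 digest copied from the table above for the 0.1.1 source distribution.
EXPECTED_SHA256 = "9d9fca680479f3e454fc80e01461435d3f164c2cacd43f8a6914fcf21ea66633"


def sha256_of_file(path, block_size=65536):
    """Hash a file in blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())


if __name__ == "__main__":
    # Assumed location of the downloaded archive.
    archive = "langchain_pymupdf_layout-0.1.1.tar.gz"
    print("OK" if matches_expected(archive) else "MISMATCH")
```

pip can perform the same kind of check at install time when a requirements file pins hashes with `--hash=sha256:...` and is installed with `--require-hashes`.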

File details

Details for the file langchain_pymupdf_layout-0.1.1-py3-none-any.whl.

File metadata

Download URL: langchain_pymupdf_layout-0.1.1-py3-none-any.whl
Upload date: Nov 21, 2025
Size: 11.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.7

File hashes

Hashes for langchain_pymupdf_layout-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f2e1c81945772cee63264204020a103238c233fb729bd1ba4da869b1dda939a8`
MD5	`e091a2aa0ddd1c651cff10e754da6929`
BLAKE2b-256	`a6c21fe28e29824a2818ed81988ac167935eee2c587fd531a7ee3e841111f93f`
