PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.

PyMuPDF Layout

PyMuPDF Layout is a fast and lightweight layout analysis Python package integrated with PyMuPDF for clean, structured data output from PDFs. It is accurate and, unlike vision-based models, does not need a GPU.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This gives us accuracy at 10× the speed of vision-based tools, using CPU-only resources.

License: PolyForm Noncommercial

Features

  • 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
  • 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
  • 🔍 Detect and isolate header and footer patterns on each page

Usage

PyMuPDF Layout works alongside PyMuPDF4LLM's to_markdown method. Once PyMuPDF Layout is activated, just call to_markdown and PyMuPDF Layout works behind the scenes to analyse documents and deliver improved results.

You can also get a JSON or TXT format of the data with to_json or to_text.

Extract structured data

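# Importing pymupdf.layout activates PyMuPDF Layout; keep it ahead of pymupdf4llm.
import pymupdf.layout
import pymupdf4llm

source = "your.pdf"
doc = pymupdf.open(source)

md = pymupdf4llm.to_markdown(doc)      # Markdown output
json_data = pymupdf4llm.to_json(doc)   # JSON output (name avoids shadowing the json module)
txt = pymupdf4llm.to_text(doc)         # plain-text output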
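
A common next step is to save the three outputs side by side. The following is a minimal sketch rather than part of the package: write_outputs and extract_all are hypothetical helper names, and it assumes that to_markdown, to_json and to_text each return a string.

```python
from pathlib import Path


def write_outputs(outputs: dict[str, str], out_dir: str, stem: str) -> list[Path]:
    """Write each output string to <out_dir>/<stem>.<extension>."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for extension, text in outputs.items():
        path = target / f"{stem}.{extension}"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


def extract_all(pdf_path: str) -> dict[str, str]:
    """Run the three extractors from the usage example on one PDF."""
    # Imported lazily so write_outputs can be used without the packages installed.
    import pymupdf.layout  # same import order as the usage example
    import pymupdf4llm

    doc = pymupdf.open(pdf_path)
    # Assumption: each call returns a str.
    return {
        "md": pymupdf4llm.to_markdown(doc),
        "json": pymupdf4llm.to_json(doc),
        "txt": pymupdf4llm.to_text(doc),
    }
```

Calling write_outputs(extract_all("your.pdf"), "out", "your") would then be expected to produce out/your.md, out/your.json and out/your.txt.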

Try It!

Try PyMuPDF Layout on our PyMuPDF website.

Documentation

See the PyMuPDF Layout documentation page for more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymupdf_layout-1.27.2.2-cp310-abi3-win_amd64.whl (15.8 MB): CPython 3.10+, Windows x86-64

pymupdf_layout-1.27.2.2-cp310-abi3-manylinux_2_28_x86_64.whl (15.8 MB): CPython 3.10+, manylinux (glibc 2.28+), x86-64

pymupdf_layout-1.27.2.2-cp310-abi3-manylinux_2_28_aarch64.whl (15.8 MB): CPython 3.10+, manylinux (glibc 2.28+), ARM64

pymupdf_layout-1.27.2.2-cp310-abi3-macosx_11_0_arm64.whl (15.8 MB): CPython 3.10+, macOS 11.0+, ARM64

pymupdf_layout-1.27.2.2-cp310-abi3-macosx_10_9_x86_64.whl (15.8 MB): CPython 3.10+, macOS 10.9+, x86-64

File details

Details for the file pymupdf_layout-1.27.2.2-cp310-abi3-win_amd64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.2-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 efc66387833f085b9e9a77089c748c88c4c96485772d7dfe0139eaa6efc2f444
MD5 2fe6ace56ff49c1b4181235200887127
BLAKE2b-256 825697fad0cd00869e934f7a130f251b21e3534ec0fcffaa3459286fbf3daf32

See more details on using hashes here.
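
To check a downloaded wheel against a published digest, the file can be hashed locally and compared. This is a minimal sketch using only the Python standard library; sha256_of_file and matches_published_digest are hypothetical helper names, not part of PyMuPDF Layout or pip.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest, ignoring case."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

For example, matches_published_digest("pymupdf_layout-1.27.2.2-cp310-abi3-win_amd64.whl", "<SHA256 value listed above>") should return True for an intact download of that wheel.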

File details

Details for the file pymupdf_layout-1.27.2.2-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.2-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 df503eab9c28cfaadb847970f39093958e7a2ebf79fc47426dbd91b9f9064d6c
MD5 c287d4527d1167d86bcd8effe3a11f3f
BLAKE2b-256 024535c67a1b1956618f69674b9823cc78e96787de37fe22a2b217581a1770a9

File details

Details for the file pymupdf_layout-1.27.2.2-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.2-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d610359e1eb8013124531431f3b8c77818070e7869500b92c9b25bd78ea7ef7f
MD5 287034643b65559d325c32dbf54ece06
BLAKE2b-256 f720487a2b1422999113ecc8b117cf50e72915992d0a7ef247164989396cf8db

File details

Details for the file pymupdf_layout-1.27.2.2-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.2-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bef82a3ff5c05212c806333153cece2b9d972eed173d2352f0c514bb3f1faf54
MD5 f76e18d07dbf36600091fd20cdc75fd3
BLAKE2b-256 ce143ed13138449a002ab6957789019da5951fc8ba07ab8f1faf27a14c274717

File details

Details for the file pymupdf_layout-1.27.2.2-cp310-abi3-macosx_10_9_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.2-cp310-abi3-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 7b8f0d94d5675802c67e4af321214dcfce2de3d963926459dc6fc138607366cd
MD5 4bd48d0254d799e95932d71a07a21a69
BLAKE2b-256 65dd4a9769b17661c1ee1b5bdeac28c832c9c7cc1ef425eb2088b5b5bd982bcc
