PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.

PyMuPDF Layout

PyMuPDF Layout is a fast, lightweight layout analysis Python package integrated with PyMuPDF for clean, structured data output from PDFs. It is accurate and, unlike vision-based models, does not need a GPU.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This gives accurate results 10× faster than vision-based tools, using CPU-only resources.

License: PolyForm Noncommercial

Features

  • 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
  • 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
  • 🔍 Detect and isolate header and footer patterns on each page

Usage

PyMuPDF Layout works alongside PyMuPDF4LLM's to_markdown method. Once PyMuPDF Layout is activated, just call to_markdown and PyMuPDF Layout works behind the scenes to analyse documents and deliver improved results.

You can also get the data as JSON or plain text with to_json or to_text.

Extract structured data

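# Importing pymupdf.layout first activates PyMuPDF Layout.
import pymupdf.layout
import pymupdf4llm

source = "your.pdf"
doc = pymupdf.open(source)

# Each call returns the document in the named format.
md = pymupdf4llm.to_markdown(doc)
json_data = pymupdf4llm.to_json(doc)  # json_data avoids shadowing the stdlib json module
txt = pymupdf4llm.to_text(doc)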
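
The three results above are often written to disk next to the source PDF. The following is a minimal sketch of one way to do that, assuming each call returns a string. The helper name save_outputs and the EXTENSIONS mapping are hypothetical and not part of PyMuPDF Layout or PyMuPDF4LLM.

```python
from pathlib import Path

# Hypothetical mapping from output kind to file extension (an assumption,
# not something defined by PyMuPDF Layout or PyMuPDF4LLM).
EXTENSIONS = {"markdown": ".md", "json": ".json", "text": ".txt"}


def save_outputs(outputs, stem, out_dir="."):
    """Write each output string to <out_dir>/<stem><extension>.

    `outputs` maps a kind ("markdown", "json" or "text") to its content.
    Returns the list of paths written, in the order given.
    """
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for kind, content in outputs.items():
        target = target_dir / f"{stem}{EXTENSIONS[kind]}"
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


# Usage with the calls shown above (needs pymupdf4llm and PyMuPDF Layout
# installed, and assumes each call returns a str):
#
#   import pymupdf.layout
#   import pymupdf4llm
#   doc = pymupdf.open("your.pdf")
#   save_outputs(
#       {
#           "markdown": pymupdf4llm.to_markdown(doc),
#           "json": pymupdf4llm.to_json(doc),
#           "text": pymupdf4llm.to_text(doc),
#       },
#       stem="your",
#   )
```

Keeping the file writing separate from the conversion calls means the helper can be tried with plain strings before any PDF is involved.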

Try It!

Try PyMuPDF Layout on our PyMuPDF website.

Documentation

See the PyMuPDF Layout documentation page for more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymupdf_layout-1.27.2.3-cp310-abi3-win_amd64.whl (15.8 MB)

Uploaded CPython 3.10+, Windows x86-64

pymupdf_layout-1.27.2.3-cp310-abi3-manylinux_2_28_x86_64.whl (15.8 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.28+ x86-64

pymupdf_layout-1.27.2.3-cp310-abi3-manylinux_2_28_aarch64.whl (15.8 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.28+ ARM64

pymupdf_layout-1.27.2.3-cp310-abi3-macosx_11_0_arm64.whl (15.8 MB)

Uploaded CPython 3.10+, macOS 11.0+ ARM64

pymupdf_layout-1.27.2.3-cp310-abi3-macosx_10_9_x86_64.whl (15.8 MB)

Uploaded CPython 3.10+, macOS 10.9+ x86-64
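
The file names listed above follow the standard wheel naming scheme: distribution, version, an optional build tag, then Python tag, ABI tag and platform tag, separated by hyphens. As an illustration, the sketch below splits a file name into those parts. The names WheelName and parse_wheel_name are hypothetical helpers written for this example, not part of any package mentioned here.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # the build tag is optional and usually absent
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


info = parse_wheel_name("pymupdf_layout-1.27.2.3-cp310-abi3-manylinux_2_28_x86_64.whl")
print(info.python_tag, info.abi_tag, info.platform_tag)
# prints: cp310 abi3 manylinux_2_28_x86_64
```

Here cp310 with abi3 indicates a wheel built against CPython's stable ABI starting at version 3.10, which matches the "CPython 3.10+" label in the list.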

File details

Details for the file pymupdf_layout-1.27.2.3-cp310-abi3-win_amd64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.3-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 503b64d9b6b31ea3af79ef85cf7d36950c5048af468cb297684d2953553c62ad
MD5 85d59943ea04a529335b416ed9fe05e1
BLAKE2b-256 bf613b2417d8f2cdfaa0f4749cd9dafa3379cb5cdaddf4233165f1ff81953c30

See more details on using hashes here.
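
A downloaded wheel can be checked against its published SHA256 digest with Python's standard hashlib module. This is a minimal sketch: the helper names sha256_of_file and matches_published_hash are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the wheel was saved in the current directory):
#
#   matches_published_hash(
#       "pymupdf_layout-1.27.2.3-cp310-abi3-win_amd64.whl",
#       "503b64d9b6b31ea3af79ef85cf7d36950c5048af468cb297684d2953553c62ad",
#   )
```

Reading in chunks keeps memory use flat regardless of file size. The same pattern applies to the other files by substituting the file name and its SHA256 value.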

File details

Details for the file pymupdf_layout-1.27.2.3-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.3-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 303b9414216dfaf711ec7d807b6f1e4c3e0a92bbb4569340fcedd9d5593d16ca
MD5 6cb6a0e03c3dbf33d69caa15b13820f6
BLAKE2b-256 32e97ce6eaf97cebd46c3808593282e9eb99a60cddd6183e25a636980d5c7986

File details

Details for the file pymupdf_layout-1.27.2.3-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.3-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 fcf03aa815cbceebdb3263dd6a190de4547c46b1d168928836ec38738afe127d
MD5 de102a92d00b22394dff23b7eb637c0c
BLAKE2b-256 8487bfdcca67346052943a4549814f2009b38f4d15ec025798cdf7dfa5f57c84

File details

Details for the file pymupdf_layout-1.27.2.3-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.3-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5656b09669dcd7c51f539afb6fdaf853602bab4cbc20479ee5ee1a85a4e32b60
MD5 a68c965beeaafeb6edc4fc9c87dba315
BLAKE2b-256 0aba46a7a36474722f9280d885f6eec878561a257d9378e52590b43d32ffb96c

File details

Details for the file pymupdf_layout-1.27.2.3-cp310-abi3-macosx_10_9_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2.3-cp310-abi3-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 75c2ab3c0e8830ac2bc50cfd32d375a30768a2610dac72a02f08265336e0834f
MD5 6cadc2cff5eddfe43e20a7a938e81898
BLAKE2b-256 bcee067726c3ee5574ad5c605d00d7419e264ef509d626a726f99388111f8216
