PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.

Project description

PyMuPDF Layout

PyMuPDF Layout is a lightweight layout analysis Python package integrated with PyMuPDF for clean, structured data output from PDFs. It is fast and accurate and, unlike vision-based models, does not need a GPU.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This gives accuracy at 10× the speed of vision-based tools, using CPU resources only.

License: PolyForm Noncommercial

Features

  • 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
  • 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
  • 🔍 Detect and isolate header and footer patterns on each page

Usage

PyMuPDF Layout works alongside PyMuPDF4LLM's to_markdown method. Once PyMuPDF Layout is activated, just call to_markdown and PyMuPDF Layout will work behind the scenes to analyse documents and deliver improved results.

You can also get the data in JSON or TXT format with to_json or to_text.

Extract structured data

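# Importing pymupdf.layout activates PyMuPDF Layout; keep it before pymupdf4llm
import pymupdf.layout
import pymupdf4llm

source = "your.pdf"  # path to the input PDF
doc = pymupdf.open(source)

md = pymupdf4llm.to_markdown(doc)  # Markdown output
json_text = pymupdf4llm.to_json(doc)  # JSON output; name avoids shadowing the stdlib json module
txt = pymupdf4llm.to_text(doc)  # plain-text output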
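
As a rough sketch of how the three outputs might be persisted, the helper below writes each one to a file that shares the PDF's stem. The function names extract_all and save_outputs, the "out" directory, and the assumption that all three calls return strings are illustrative and not part of PyMuPDF Layout; only to_markdown, to_json and to_text come from the usage above.

```python
from pathlib import Path


def extract_all(pdf_path: str) -> dict[str, str]:
    """Hypothetical wrapper: run the three extractors from the usage snippet."""
    # Imported lazily so save_outputs() works without the packages installed.
    # pymupdf.layout is imported before pymupdf4llm, as in the usage snippet.
    import pymupdf.layout
    import pymupdf4llm

    doc = pymupdf.open(pdf_path)
    try:
        # Assumption: each call returns a str.
        return {
            ".md": pymupdf4llm.to_markdown(doc),
            ".json": pymupdf4llm.to_json(doc),
            ".txt": pymupdf4llm.to_text(doc),
        }
    finally:
        doc.close()


def save_outputs(outputs: dict[str, str], pdf_path: str, out_dir: str = "out") -> list[Path]:
    """Write each output to out_dir/<pdf stem><suffix> and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem
    written = []
    for suffix, text in outputs.items():
        path = target / f"{stem}{suffix}"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

With both packages installed, save_outputs(extract_all("your.pdf"), "your.pdf") would be expected to produce out/your.md, out/your.json and out/your.txt.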

Try It!

Try PyMuPDF Layout on our PyMuPDF website.

Documentation

See the PyMuPDF Layout documentation page for more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymupdf_layout-1.27.1-cp310-abi3-win_amd64.whl (12.3 MB)

CPython 3.10+, Windows x86-64

pymupdf_layout-1.27.1-cp310-abi3-manylinux_2_28_x86_64.whl (12.3 MB)

CPython 3.10+, manylinux: glibc 2.28+ x86-64

pymupdf_layout-1.27.1-cp310-abi3-manylinux_2_28_aarch64.whl (12.3 MB)

CPython 3.10+, manylinux: glibc 2.28+ ARM64

pymupdf_layout-1.27.1-cp310-abi3-macosx_11_0_arm64.whl (12.3 MB)

CPython 3.10+, macOS 11.0+ ARM64

pymupdf_layout-1.27.1-cp310-abi3-macosx_10_9_x86_64.whl (12.3 MB)

CPython 3.10+, macOS 10.9+ x86-64
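
The file names above follow the standard wheel naming scheme: distribution, version, an optional build tag, Python tag, ABI tag and platform tag, separated by hyphens. The sketch below splits a name into those fields; WheelName and parse_wheel_name are illustrative names, not part of any packaging tool.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


info = parse_wheel_name("pymupdf_layout-1.27.1-cp310-abi3-manylinux_2_28_x86_64.whl")
print(info.python_tag, info.abi_tag, info.platform_tag)
```

Here cp310 together with abi3 marks a stable-ABI build for CPython 3.10 and later, which matches the "CPython 3.10+" label in the listing.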

File details

Details for the file pymupdf_layout-1.27.1-cp310-abi3-win_amd64.whl.

File hashes

Hashes for pymupdf_layout-1.27.1-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 e456bcdc8760caadcaecb57e96277de11f9cdde785b04bf535448df4fb1e25fc
MD5 a96916c4e2f2a445f3a94b5903b1fad4
BLAKE2b-256 0c2cf96920afb5a17226e7475f5db9eb7755a019623984caedcb47cb16660bd2

See more details on using hashes here.
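
One way to use these digests is to hash a downloaded wheel locally and compare the result with the published SHA256 value. The following is a minimal sketch using only the standard library; the function names sha256_of and matches_published are illustrative.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare the local digest with a published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the Windows wheel, the call would be matches_published("pymupdf_layout-1.27.1-cp310-abi3-win_amd64.whl", "e456bcdc8760caadcaecb57e96277de11f9cdde785b04bf535448df4fb1e25fc"), assuming the file has been downloaded to the current directory.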

File details

Details for the file pymupdf_layout-1.27.1-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.1-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e9e10d96f06ef5f511523a7caf5b707596a60c5c0a4735f4c5f84728cdb75ffd
MD5 89b4e0a624689286d5e2ce30779cf115
BLAKE2b-256 0db60278408e2f6db993e3c9d4ee7be89f01b0022233298aa20fe54b95a0169c

File details

Details for the file pymupdf_layout-1.27.1-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for pymupdf_layout-1.27.1-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 05d88e0060b5875d21ee53c1dc50a2454aa32e59048b971847782318f47dca11
MD5 69a1e105dc5eb8c6515f368bc4c5a73d
BLAKE2b-256 66e4d11ebba2e950606713495620503d0e1b627c879c708fc488f7167cfb35a3

File details

Details for the file pymupdf_layout-1.27.1-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pymupdf_layout-1.27.1-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 888923391e155cb6a4e0498709c8f4da750858dc273417a7386aeb8b81009b83
MD5 38fdb21f0efe896b1fdf7b7136e7c415
BLAKE2b-256 b40bb83178bb88cc8933f61dac113c89be9be426fd7c3b9deca3b628e96eb99a

File details

Details for the file pymupdf_layout-1.27.1-cp310-abi3-macosx_10_9_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.1-cp310-abi3-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e60b7238f7652a825a884641fc1e9070e3b385905891ae194dfe91e5932b2342
MD5 1e6769ab1bf7ac35d4cbdbf7cb498f10
BLAKE2b-256 b7ac574f537fa0199b31755eb0859d8e6ed8fcc669f7b460eba78611d1d78c8a
