PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.

Project description

PyMuPDF Layout

PyMuPDF Layout is a fast and lightweight layout analysis Python package integrated with PyMuPDF for clean, structured data output from PDFs. It's fast and accurate and, unlike vision-based models, doesn't need GPUs.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This delivers accuracy at 10× the speed of vision-based tools while using CPU-only resources.

License: PolyForm Noncommercial

Features

  • 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
  • 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
  • 🔍 Detect and isolate header and footer patterns on each page

Usage

PyMuPDF Layout works alongside PyMuPDF4LLM's to_markdown method. Once PyMuPDF Layout is activated, just use to_markdown and PyMuPDF Layout will work behind the scenes to analyse documents and deliver improved results.

You can also get a JSON or TXT format of the data with to_json or to_text.

Extract Structured data

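# Importing pymupdf.layout first activates PyMuPDF Layout
import pymupdf.layout
import pymupdf4llm

source = "your.pdf"
doc = pymupdf.open(source)

# Each call renders the whole document in the named format
md = pymupdf4llm.to_markdown(doc)
json_text = pymupdf4llm.to_json(doc)  # json_text avoids shadowing the standard json module
txt = pymupdf4llm.to_text(doc)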
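
As a follow-up to the snippet above, here is a minimal sketch of saving all three outputs next to each other on disk. The helper names extract_all and write_outputs are hypothetical and not part of PyMuPDF Layout or PyMuPDF4LLM. The sketch also assumes that to_markdown, to_json and to_text each return a string.

```python
from pathlib import Path


def extract_all(pdf_path: str) -> dict[str, str]:
    """Return the Markdown, JSON and plain-text renderings of one PDF."""
    # Imports are kept in the same order as the snippet above:
    # pymupdf.layout first, then pymupdf4llm.
    import pymupdf.layout
    import pymupdf4llm

    doc = pymupdf.open(pdf_path)
    try:
        # Assumption: each call returns a str.
        return {
            ".md": pymupdf4llm.to_markdown(doc),
            ".json": pymupdf4llm.to_json(doc),
            ".txt": pymupdf4llm.to_text(doc),
        }
    finally:
        doc.close()


def write_outputs(outputs: dict[str, str], stem: Path) -> list[Path]:
    """Write each rendering to '<stem><suffix>' and return the paths written."""
    written = []
    for suffix, text in outputs.items():
        # Append the suffix rather than replacing one, so a stem such as
        # 'report.v2' becomes 'report.v2.md' and not 'report.md'.
        target = stem.parent / (stem.name + suffix)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


# Example call (requires the packages and a real PDF):
# write_outputs(extract_all("your.pdf"), Path("your"))
```

Keeping the extraction and the file writing in separate functions means the writing step can be reused or tested without a PDF at hand.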

Try It!

Try PyMuPDF Layout on our PyMuPDF website.

Documentation

See the PyMuPDF Layout documentation page for more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymupdf_layout-1.27.2-cp310-abi3-win_amd64.whl (15.8 MB view details)

Uploaded CPython 3.10+Windows x86-64

pymupdf_layout-1.27.2-cp310-abi3-manylinux_2_28_x86_64.whl (15.8 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.28+ x86-64

pymupdf_layout-1.27.2-cp310-abi3-manylinux_2_28_aarch64.whl (15.8 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.28+ ARM64

pymupdf_layout-1.27.2-cp310-abi3-macosx_11_0_arm64.whl (15.8 MB view details)

Uploaded CPython 3.10+macOS 11.0+ ARM64

pymupdf_layout-1.27.2-cp310-abi3-macosx_10_9_x86_64.whl (15.8 MB view details)

Uploaded CPython 3.10+macOS 10.9+ x86-64
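
The file names listed above follow the standard wheel naming scheme, distribution-version-python tag-abi tag-platform tag. As an illustration, here is a small sketch that splits such a name into its fields. The function parse_wheel_name and the WheelTags type are hypothetical helpers. The sketch assumes there is no optional build tag, which holds for every file listed here.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel file name into its five dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Assumption: no optional build tag, so exactly five fields are expected.
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {filename!r}")
    return WheelTags(*parts)


tags = parse_wheel_name("pymupdf_layout-1.27.2-cp310-abi3-win_amd64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

Here cp310 together with abi3 corresponds to the "CPython 3.10+" label shown for each upload, and the platform tag identifies the operating system and CPU architecture.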

File details

Details for the file pymupdf_layout-1.27.2-cp310-abi3-win_amd64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 57f63186ab7170dfe1b6918c2ad5c241fdab34a1ed27a75b020306fa29aaf0ec
MD5 94335213485e80e0fc4f99227f8e3ffa
BLAKE2b-256 1ab4f9f7293162a7ce4ccd490241e1e1bd6f415b05809e8f75704cca47e6e32e

See more details on using hashes here.
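
A downloaded wheel can be checked against the SHA256 digest published for it. The following is a minimal sketch using only the standard library. The function names sha256_of and matches_published_digest are hypothetical, and the local path in the example comment is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (assumes the wheel was downloaded to the current directory):
# matches_published_digest(
#     Path("pymupdf_layout-1.27.2-cp310-abi3-win_amd64.whl"),
#     "57f63186ab7170dfe1b6918c2ad5c241fdab34a1ed27a75b020306fa29aaf0ec",
# )
```

SHA256 is used here rather than MD5 because MD5 is no longer considered collision resistant.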

File details

Details for the file pymupdf_layout-1.27.2-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 962cd43d70b1816df1ac10405bfb3a85545449253f0813a30198829cae64e053
MD5 14fb4202d56e4a65d743e6258a375ac2
BLAKE2b-256 be2de0545dcbe65d0e762004c4266f3f272a508a414a9359f452099bf3283dd1

File details

Details for the file pymupdf_layout-1.27.2-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 96767de7f05695d771171ed39077363a280bae6e9d5de70e484a888ae44e3cab
MD5 39aca6a898e0422528b98153916f57ad
BLAKE2b-256 2ef49a9a984f172979ee850ccade9d20ad8a9ce0194a517e8b07e69c57c73f7c

File details

Details for the file pymupdf_layout-1.27.2-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 447c2f20524a1cca26dbb30611521758fe464ced7fdedb441c884bf40564e92c
MD5 88cea889cf0c19cbf848abbc94f7bb03
BLAKE2b-256 dbb7436d0011e0d7e5a94ef23f3ac1f5198f0584f3dd1a29fb1deae2647bd01e

File details

Details for the file pymupdf_layout-1.27.2-cp310-abi3-macosx_10_9_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.27.2-cp310-abi3-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 bee613779fd1053979b7bfbfb9d2a8937066002fe6981fe9290c5f806d27e44e
MD5 eff853a2ad511d7ffb210561cd61c3f1
BLAKE2b-256 122ec0d80ff460babfb91ae708ff2d75a9bbf074322ce4f4ef1afcd911c2f209
