PyMuPDF Layout turns PDFs into structured data 10× faster than vision-based tools using AI trained on PDF internals, not images. CPU-only. No GPU required.

PyMuPDF Layout

PyMuPDF Layout is a fast, lightweight layout analysis package for Python. It integrates with PyMuPDF to produce clean, structured data from PDFs and, unlike vision-based models, it does not need a GPU.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. Working from the PDF's own data rather than from pixels lets it stay accurate at 10× the speed, using CPU-only resources.

License: PolyForm Noncommercial

Features

  • 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
  • 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
  • 🔍 Detect and isolate header and footer patterns on each page

Usage

PyMuPDF Layout works alongside PyMuPDF4LLM's to_markdown method. Once PyMuPDF Layout is activated, call to_markdown as usual; PyMuPDF Layout works behind the scenes to analyse the document and deliver improved results.

You can also get the data as JSON or plain text with to_json or to_text.

Extract structured data

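import pymupdf.layout  # importing this module activates PyMuPDF Layout
import pymupdf4llm

source = "your.pdf"
doc = pymupdf.open(source)

# Each call returns the whole document in the requested format.
md = pymupdf4llm.to_markdown(doc)
json_out = pymupdf4llm.to_json(doc)  # renamed so the stdlib json module is not shadowed
txt = pymupdf4llm.to_text(doc)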
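
To keep the three outputs for later use, they can be written to disk next to each other. The following is a minimal sketch, not part of PyMuPDF Layout: `save_outputs` and `extract_all` are hypothetical helper names, the `<stem>.<extension>` naming is an arbitrary choice, and the code assumes that each of the three calls returns a string.

```python
from pathlib import Path


def save_outputs(outputs: dict[str, str], stem: str, out_dir: str = ".") -> list[Path]:
    """Write each extracted string to <out_dir>/<stem>.<extension>.

    `outputs` maps a file extension ("md", "json", "txt") to the text
    produced by the matching pymupdf4llm call. Hypothetical helper.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for extension, text in outputs.items():
        path = target / f"{stem}.{extension}"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


def extract_all(source: str) -> dict[str, str]:
    """Run the three extraction calls from the example above on one PDF."""
    # Local imports keep save_outputs usable where PyMuPDF is not installed.
    import pymupdf.layout  # activates PyMuPDF Layout
    import pymupdf4llm

    doc = pymupdf.open(source)
    # Assumption: each call returns a single string for the whole document.
    return {
        "md": pymupdf4llm.to_markdown(doc),
        "json": pymupdf4llm.to_json(doc),
        "txt": pymupdf4llm.to_text(doc),
    }
```

Calling `save_outputs(extract_all("your.pdf"), stem="your")` would then produce `your.md`, `your.json` and `your.txt` in the current directory.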

Try It!

Try PyMuPDF Layout on our PyMuPDF website.

Documentation

See the PyMuPDF Layout documentation page for more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymupdf_layout-1.28.0-cp310-abi3-win_amd64.whl (41.6 MB)
CPython 3.10+, Windows x86-64

pymupdf_layout-1.28.0-cp310-abi3-manylinux_2_28_x86_64.whl (41.6 MB)
CPython 3.10+, manylinux: glibc 2.28+ x86-64

pymupdf_layout-1.28.0-cp310-abi3-manylinux_2_28_aarch64.whl (41.6 MB)
CPython 3.10+, manylinux: glibc 2.28+ ARM64

pymupdf_layout-1.28.0-cp310-abi3-macosx_11_0_arm64.whl (41.6 MB)
CPython 3.10+, macOS 11.0+ ARM64

pymupdf_layout-1.28.0-cp310-abi3-macosx_10_9_x86_64.whl (41.6 MB)
CPython 3.10+, macOS 10.9+ x86-64
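
The wheel file names above follow the standard pattern `{distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. As a rough illustration of how to read them, the sketch below splits a name into those fields. `WheelName` and `parse_wheel_name` are hypothetical names introduced here, and the parser only covers simple, uncompressed tags like the ones in this release.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    # Dashes only separate fields; inside a field they are written as "_".
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)
```

For `pymupdf_layout-1.28.0-cp310-abi3-win_amd64.whl` this yields the Python tag `cp310` and the ABI tag `abi3`. Together they mark a wheel built against CPython's stable ABI starting at 3.10, which is why each file is listed as "CPython 3.10+".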

File details

Details for the file pymupdf_layout-1.28.0-cp310-abi3-win_amd64.whl.

File hashes

Hashes for pymupdf_layout-1.28.0-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 07195ec4ad6317dd70bf1d4eca44d9dd93231d8db3648ab9f4b771557a59ea48
MD5 559799d7a16a8c97167328dacf82f5e6
BLAKE2b-256 b87379af357b1cacb02ea9e054b5b368dee5a0ccd7168c6229e54e1a70e74c53

See more details on using hashes here.
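
A published digest can be compared against a downloaded wheel before installing it. The sketch below uses only Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file is read in chunks so that a roughly 40 MB wheel is not loaded into memory at once.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    # Normalise whitespace and case so a copy-pasted digest still compares equal.
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For the Windows wheel, `matches_published_hash("pymupdf_layout-1.28.0-cp310-abi3-win_amd64.whl", "07195ec4ad6317dd70bf1d4eca44d9dd93231d8db3648ab9f4b771557a59ea48")` should return `True` when the download is intact.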

File details

Details for the file pymupdf_layout-1.28.0-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.28.0-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a6bd191301570a0863d6e04418324e823d0fd7f18b4fe1db2b9e53e715d2f8ff
MD5 ec0aa2b68ceb939d24a5dbc01b8fa3c5
BLAKE2b-256 563adc5ab8573300b0f1b7fb996aa33ce4683c27b190a8e8be6f18d80714183d

File details

Details for the file pymupdf_layout-1.28.0-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for pymupdf_layout-1.28.0-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 35b98e1ce9382709622e03c34b57d1b51a155ceb3d9122f11f8e9b2aa4f1ee55
MD5 388bee799fe7e3d3f172e409f6fba4d1
BLAKE2b-256 0b582607c539540ce261d05d2677c0d839b78a4191d328accc1d0e9385a06799

File details

Details for the file pymupdf_layout-1.28.0-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pymupdf_layout-1.28.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 911aa80b3ddcca2ddbc1df5fc59f652935b392052251397e75963d3de9559dbc
MD5 a3113d9250c3835a9c15f0dd961ae164
BLAKE2b-256 60149a330a7d77cbe05fa9e97d9f2aa11d627a556eec795eee08bcbeedc16ac6

File details

Details for the file pymupdf_layout-1.28.0-cp310-abi3-macosx_10_9_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.28.0-cp310-abi3-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 dc4e2f4b48951633607020d80de0bed00b2450eafd56ecda224f611c19e5f9e8
MD5 49e7c8c6a3a5e6670f708346d89360bd
BLAKE2b-256 19feeb3c960cbc6e5be5acffbd6811f281ba6e176a841b592858e25e28ac0a97
