Commercial extension for PyMuPDF
PyMuPDF Layout
PyMuPDF Layout is a fast, lightweight layout-analysis package for Python, integrated with PyMuPDF, that produces clean, structured data from PDFs. It is accurate and, unlike vision-based models, does not need a GPU.
While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks (GNNs) directly on PDF internals. According to the project, this delivers accuracy at 10× the speed of those image-based tools while using only the CPU.
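PyMuPDF Layout's actual model and input features are not described here. As a rough illustration of what "training on PDF internals" can mean, the sketch below treats each text block as a graph node carrying its bounding box and font size (values of the kind PyMuPDF's `Page.get_text("dict")` reports; the numbers here are made up) and links each block to its nearest neighbours. A GNN could consume a graph like this, but this is an assumption-based toy, not the package's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    """One text block on a page, used as a graph node (hypothetical features)."""
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points
    font_size: float  # carried as a node feature; not used for edges here


def center(block: Block) -> tuple[float, float]:
    """Center point of a block's bounding box."""
    x0, y0, x1, y1 = block.bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def knn_edges(blocks: list[Block], k: int = 2) -> list[tuple[int, int]]:
    """Link each block to its k nearest blocks by center distance.

    Returns undirected edges as sorted (i, j) index pairs with i < j.
    """
    centers = [center(b) for b in blocks]
    edges: set[tuple[int, int]] = set()
    for i, ci in enumerate(centers):
        nearest = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        for _, j in nearest[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


blocks = [
    Block("Annual Report", (72, 40, 300, 60), 18.0),
    Block("Introduction", (72, 90, 200, 104), 14.0),
    Block("This report covers the year.", (72, 110, 520, 160), 10.0),
    Block("Page 1", (280, 800, 320, 812), 8.0),
]
print(knn_edges(blocks, k=1))  # [(0, 1), (0, 2), (2, 3)]
```

A k-nearest-neighbour graph is only one simple way to connect page elements; a real model may use reading order, alignment, or other relations instead.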
Features
- 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
- 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
- 🔍 Detect and isolate header and footer patterns on each page
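PyMuPDF Layout detects headers and footers with its trained model, and that method is not documented here. For intuition only, a classic heuristic is sketched below: lines that recur near the top or bottom of most pages, after replacing digits so running page numbers match, are likely headers or footers. The input format (per-page lists of `(text, y0)` pairs) and all parameter names are hypothetical.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lower-case and replace digit runs, so "Page 3" and "Page 4" compare equal."""
    return re.sub(r"\d+", "#", text.strip().lower())


def find_repeating_lines(
    pages: list[list[tuple[str, float]]],
    page_height: float,
    margin: float = 0.1,
    min_ratio: float = 0.5,
) -> set[str]:
    """Return normalized lines that recur in the top or bottom margin band.

    pages: one list per page of (text, y0) pairs, where y0 is the line's top
    coordinate measured from the top of the page (hypothetical format).
    A line qualifies if it appears in a margin band on at least
    min_ratio of all pages.
    """
    counts: Counter = Counter()
    top_limit = margin * page_height
    bottom_limit = (1 - margin) * page_height
    for lines in pages:
        seen = {
            normalize(text)
            for text, y0 in lines
            if y0 < top_limit or y0 > bottom_limit
        }
        counts.update(seen)  # each text counted at most once per page
    threshold = min_ratio * len(pages)
    return {text for text, n in counts.items() if n >= threshold}


pages = [
    [("ACME Corp Annual Report", 20), ("Revenue grew.", 300), ("Page 1", 820)],
    [("ACME Corp Annual Report", 20), ("Costs fell.", 300), ("Page 2", 820)],
    [("ACME Corp Annual Report", 20), ("Outlook", 300), ("Page 3", 820)],
]
print(sorted(find_repeating_lines(pages, page_height=842)))
# ['acme corp annual report', 'page #']
```

Heuristics like this break on documents with alternating odd/even headers or very few pages, which is one motivation for a learned approach.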
Usage
PyMuPDF Layout works alongside the to_markdown function of PyMuPDF4LLM. Once PyMuPDF Layout is activated, just call to_markdown as usual: PyMuPDF Layout analyzes the document behind the scenes and delivers improved results.
To get JSON or plain-text (TXT) output instead, use to_json or to_text.
Extract structured data
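import pymupdf  # PyMuPDF core library
import pymupdf.layout
pymupdf.layout.activate()  # activate PyMuPDF Layout before using pymupdf4llm
import pymupdf4llm

source = "input.pdf"  # placeholder: path to your own PDF
doc = pymupdf.open(source)
md = pymupdf4llm.to_markdown(doc)  # Markdown output
json_data = pymupdf4llm.to_json(doc)  # JSON output; avoids shadowing the json module
txt = pymupdf4llm.to_text(doc)  # plain-text output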
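Since the Markdown output carries semantic markup for titles and headings, a common next step is splitting it into sections, for example to build chunks for retrieval. The sketch below assumes headings appear as ATX-style lines starting with `#`; it works on any Markdown string, so it is shown on a hard-coded sample rather than real `to_markdown` output. The function name `split_sections` is hypothetical.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_sections(md: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) pairs.

    Text before the first heading is returned under the heading "".
    Note: '#' lines inside fenced code blocks are not special-cased.
    """
    sections: list[tuple[str, str]] = []
    heading, body = "", []
    for line in md.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section if it has a heading or any text.
            if heading or any(ln.strip() for ln in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2), []
        else:
            body.append(line)
    if heading or any(ln.strip() for ln in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


sample = "# Annual Report\n\nIntro text.\n\n## Results\n\nRevenue grew.\n"
for title, text in split_sections(sample):
    print(repr(title), repr(text))
# 'Annual Report' 'Intro text.'
# 'Results' 'Revenue grew.'
```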
Try It!
Try PyMuPDF Layout on our PyMuPDF website.
Download files
Built Distributions
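Each wheel filename below follows the standard layout `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The `cp310-abi3` part means the wheel targets CPython's stable ABI starting at 3.10, which is why every file is tagged "CPython 3.10+". A minimal parser sketch follows; for real tooling, the `packaging` library's `packaging.utils.parse_wheel_filename` handles the full specification.

```python
def parse_wheel_filename(filename: str) -> dict[str, str]:
    """Split a wheel filename into its components.

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    Compressed tag sets such as "py2.py3" are returned unexpanded.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, plat = parts
        build = ""
    elif len(parts) == 6:
        name, version, build, python, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python,
        "abi": abi,
        "platform": plat,
    }


info = parse_wheel_filename(
    "pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl"
)
print(info["python"], info["abi"], info["platform"])
# cp310 abi3 manylinux_2_28_x86_64
```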
File details
Details for the file pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl.
File metadata
- Download URL: pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl
- Upload date:
- Size: 12.7 MB
- Tags: CPython 3.10+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2305aac24fd6e12217afaaea8ec95be297be9b250b6077a3f4e92f7f9beeaf92 |
| MD5 | 630da1cf43357e4f26bac209638d00a8 |
| BLAKE2b-256 | f87a69078bf16669f8361360321ea6bede4cbfede35bf3f4ca5842a7c2387825 |
File details
Details for the file pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 12.7 MB
- Tags: CPython 3.10+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee8e2bfed12d4b6421b27a1f89837ac09d8bc3f783f79670db397ec24614bf3d |
| MD5 | a9a5ca2c64e9adc40434fd4658f02fc1 |
| BLAKE2b-256 | a7bd3e049b359dd0c3a101ae915484b87ff73bfdedfb24a924e0a8e6783b33f3 |
File details
Details for the file pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_aarch64.whl.
File metadata
- Download URL: pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_aarch64.whl
- Upload date:
- Size: 12.7 MB
- Tags: CPython 3.10+, manylinux: glibc 2.28+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0561b9485a6ac1a40bb1e2ec7a1648aa64e4be56dab2f39182b11a69e3e43024 |
| MD5 | f44eb6c413feb22aa0aeda62a1203eb4 |
| BLAKE2b-256 | ae49ad1a5edccc45477493d6a53a41df7620d6147febb897c3dd8354f413e154 |
File details
Details for the file pymupdf_layout-1.26.6-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: pymupdf_layout-1.26.6-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 12.7 MB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1d45f72ec08ef7f644928487e7a067df6df63172d682d0bb05158896d0d9c71 |
| MD5 | 37972f94fca14e3686cc4dfcba8e2457 |
| BLAKE2b-256 | ffd30e52d7d1e2f975843f5354ac3b210a98471b690105efc332d3c285bd794b |
File details
Details for the file pymupdf_layout-1.26.6-cp310-abi3-macosx_10_9_x86_64.whl.
File metadata
- Download URL: pymupdf_layout-1.26.6-cp310-abi3-macosx_10_9_x86_64.whl
- Upload date:
- Size: 12.7 MB
- Tags: CPython 3.10+, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d632f83208db8b24600eb8ac54d3135fab6ab1f251a38fa6061e7470e81b9481 |
| MD5 | 2da97547b4249dd71f14ea7018db7ca8 |
| BLAKE2b-256 | 708631f8d05b36ebf43cca88d5c6415de46eb748e487b618a589671a610be8c8 |
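To check a downloaded wheel against the SHA256 digests listed for each file, a small standard-library sketch like the one below can be used. It reads the file in chunks so large wheels need not fit in memory; on Python 3.11+ `hashlib.file_digest` does the same job. The function names are illustrative.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading the Windows wheel, `matches_published_hash("pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl", "2305aac24fd6e12217afaaea8ec95be297be9b250b6077a3f4e92f7f9beeaf92")` should return `True`. pip can also enforce this automatically in hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).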