Commercial extension for PyMuPDF

Project description

PyMuPDF Layout

PyMuPDF Layout is a lightweight layout analysis Python package, integrated with PyMuPDF, that produces clean, structured data from PDFs. It is fast and accurate and, unlike vision-based models, does not need GPUs.

While other tools train machine learning models on rendered page images, PyMuPDF Layout trains Graph Neural Networks directly on PDF internals. This gives us accuracy at 10× the speed while using only CPU resources.

License: PolyForm Noncommercial

Features

  • 📚 Structured data extraction from your documents in Markdown, JSON or TXT format
  • 🧐 Advanced document page layout understanding, including semantic markup for titles, headings, headers, footers, tables, images and text styling
  • 🔍 Detect and isolate header and footer patterns on each page

Usage

PyMuPDF Layout works alongside PyMuPDF4LLM's to_markdown method. Once PyMuPDF Layout is activated, just call to_markdown and PyMuPDF Layout works behind the scenes to analyse documents and deliver improved results.

You can also get the data in JSON or TXT format with to_json or to_text.

Extract Structured data

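import pymupdf.layout
pymupdf.layout.activate()  # activate Layout before importing pymupdf4llm
import pymupdf4llm

source = "document.pdf"  # placeholder: path to your own PDF
doc = pymupdf.open(source)
md = pymupdf4llm.to_markdown(doc)     # Markdown output
json_text = pymupdf4llm.to_json(doc)  # JSON output (name avoids shadowing the json module)
txt = pymupdf4llm.to_text(doc)        # plain-text output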
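
As a minimal sketch of how the three outputs might be saved next to the source PDF, the helper below wraps the calls shown above. The function names `output_paths` and `export_all` are hypothetical, and the sketch assumes that to_markdown, to_json and to_text each return a string.

```python
from pathlib import Path


def output_paths(source: str) -> dict[str, Path]:
    """Map each output format to a file that sits next to the source PDF."""
    base = Path(source)
    return {ext: base.with_suffix(f".{ext}") for ext in ("md", "json", "txt")}


def export_all(source: str) -> dict[str, Path]:
    """Convert one PDF to Markdown, JSON and text, then write each to disk."""
    # Imported lazily so output_paths() works without the packages installed.
    import pymupdf.layout
    pymupdf.layout.activate()  # same order as in the usage snippet
    import pymupdf4llm

    doc = pymupdf.open(source)
    results = {
        "md": pymupdf4llm.to_markdown(doc),
        "json": pymupdf4llm.to_json(doc),
        "txt": pymupdf4llm.to_text(doc),
    }
    paths = output_paths(source)
    for ext, text in results.items():
        # Assumption: every call above returned a str.
        paths[ext].write_text(text, encoding="utf-8")
    return paths
```

Calling `export_all("report.pdf")` would then write report.md, report.json and report.txt alongside the input file.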

Try It!

Try PyMuPDF Layout on our PyMuPDF website.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl (12.7 MB)

Uploaded for: CPython 3.10+, Windows x86-64

pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl (12.7 MB)

Uploaded for: CPython 3.10+, manylinux (glibc 2.28+) x86-64

pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_aarch64.whl (12.7 MB)

Uploaded for: CPython 3.10+, manylinux (glibc 2.28+) ARM64

pymupdf_layout-1.26.6-cp310-abi3-macosx_11_0_arm64.whl (12.7 MB)

Uploaded for: CPython 3.10+, macOS 11.0+ ARM64

pymupdf_layout-1.26.6-cp310-abi3-macosx_10_9_x86_64.whl (12.7 MB)

Uploaded for: CPython 3.10+, macOS 10.9+ x86-64
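
The wheel names above follow the standard wheel file name convention: distribution, version, an optional build tag, then Python, ABI and platform tags, separated by hyphens. The sketch below splits a file name into those parts. `WheelTags` and `parse_wheel_filename` are illustrative names, not part of any packaging tool.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(dist, version, build, py, abi, plat)


tags = parse_wheel_filename("pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

Here the `cp310` and `abi3` tags together indicate a wheel built against CPython's stable ABI, which is why a single file per platform covers CPython 3.10 and later.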

File details

Details for the file pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl.

File hashes

Hashes for pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 2305aac24fd6e12217afaaea8ec95be297be9b250b6077a3f4e92f7f9beeaf92
MD5 630da1cf43357e4f26bac209638d00a8
BLAKE2b-256 f87a69078bf16669f8361360321ea6bede4cbfede35bf3f4ca5842a7c2387825

See more details on using hashes here.
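
One way to use these digests is to check a downloaded wheel against its listed SHA256 value before installing it. The following is a minimal sketch using only the standard library. `sha256_of` and `matches_sha256` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the Windows wheel, for example, `matches_sha256("pymupdf_layout-1.26.6-cp310-abi3-win_amd64.whl", "2305aac24fd6e12217afaaea8ec95be297be9b250b6077a3f4e92f7f9beeaf92")` should return True for an intact download.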

File details

Details for the file pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ee8e2bfed12d4b6421b27a1f89837ac09d8bc3f783f79670db397ec24614bf3d
MD5 a9a5ca2c64e9adc40434fd4658f02fc1
BLAKE2b-256 a7bd3e049b359dd0c3a101ae915484b87ff73bfdedfb24a924e0a8e6783b33f3

File details

Details for the file pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for pymupdf_layout-1.26.6-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0561b9485a6ac1a40bb1e2ec7a1648aa64e4be56dab2f39182b11a69e3e43024
MD5 f44eb6c413feb22aa0aeda62a1203eb4
BLAKE2b-256 ae49ad1a5edccc45477493d6a53a41df7620d6147febb897c3dd8354f413e154

File details

Details for the file pymupdf_layout-1.26.6-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pymupdf_layout-1.26.6-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f1d45f72ec08ef7f644928487e7a067df6df63172d682d0bb05158896d0d9c71
MD5 37972f94fca14e3686cc4dfcba8e2457
BLAKE2b-256 ffd30e52d7d1e2f975843f5354ac3b210a98471b690105efc332d3c285bd794b

File details

Details for the file pymupdf_layout-1.26.6-cp310-abi3-macosx_10_9_x86_64.whl.

File hashes

Hashes for pymupdf_layout-1.26.6-cp310-abi3-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 d632f83208db8b24600eb8ac54d3135fab6ab1f251a38fa6061e7470e81b9481
MD5 2da97547b4249dd71f14ea7018db7ca8
BLAKE2b-256 708631f8d05b36ebf43cca88d5c6415de46eb748e487b618a589671a610be8c8
