PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Community
Join us on Discord here: #pymupdf
Installation
PyMuPDF requires Python 3.9 or later. Install it using pip:
pip install PyMuPDF
There are no mandatory external dependencies. However, some optional features become available only if additional packages are installed.
You can also try without installing by visiting PyMuPDF.io.
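To confirm that an environment meets the requirements above, a check along the following lines may help. It is only a sketch: the helper name `check_environment` is hypothetical and not part of PyMuPDF, and it assumes the package is importable under the module name `pymupdf`, as in the usage example.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the installation notes.
MIN_PYTHON = (3, 9)


def check_environment():
    """Return (python_ok, pymupdf_installed) for the running interpreter.

    Hypothetical helper for illustration; it only inspects the environment
    and never imports PyMuPDF itself.
    """
    python_ok = sys.version_info >= MIN_PYTHON
    # find_spec returns None when no top-level module of that name is found.
    pymupdf_installed = importlib.util.find_spec("pymupdf") is not None
    return python_ok, pymupdf_installed


if __name__ == "__main__":
    python_ok, pymupdf_installed = check_environment()
    print("Python 3.9 or later:", python_ok)
    print("PyMuPDF installed:", pymupdf_installed)
```

If the second value is false, running the pip command above in the same environment should make the module available.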
Usage
Basic usage is as follows:
import pymupdf # imports the pymupdf library
doc = pymupdf.open("example.pdf") # open a document
for page in doc: # iterate the document pages
    text = page.get_text() # get plain text encoded as UTF-8
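The loop above overwrites `text` on every iteration. A minimal sketch of one way to keep the text of all pages is shown here; the helper names `collect_text` and `extract_pdf_text` are hypothetical and not part of PyMuPDF, and the form-feed page separator is an arbitrary choice.

```python
def collect_text(pages):
    """Join the plain text of page-like objects, with a form feed between pages.

    Works with anything that offers a get_text() method, such as the pages
    yielded when iterating a PyMuPDF document.
    """
    return "\f".join(page.get_text() for page in pages)


def extract_pdf_text(path):
    """Open a document with PyMuPDF and return the text of all its pages."""
    # Imported here so collect_text stays usable without PyMuPDF installed.
    import pymupdf

    # The document is closed automatically when the with-block exits.
    with pymupdf.open(path) as doc:
        return collect_text(doc)
```

Assuming a file named `example.pdf` exists, `extract_pdf_text("example.pdf")` would return a single string covering the whole document.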
Documentation
Full documentation can be found on pymupdf.readthedocs.io.
Optional Features
- fontTools for creating font subsets.
- pymupdf-fonts contains some nice fonts for your text output.
- Tesseract-OCR for optical character recognition in images and document pages.
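To see which of these optional pieces are present on a machine, something like the following sketch could be used. The function name `optional_feature_status` is hypothetical. The import names `fontTools` and `pymupdf_fonts` are the names those packages are normally imported under. Looking for a `tesseract` executable or a `TESSDATA_PREFIX` environment variable is an assumed heuristic, not PyMuPDF's own detection logic.

```python
import importlib.util
import os
import shutil


def optional_feature_status():
    """Report which optional add-ons appear to be available.

    Hypothetical helper for illustration; the Tesseract check is a heuristic.
    """
    return {
        # Python packages: find_spec returns None when the module is absent.
        "fontTools": importlib.util.find_spec("fontTools") is not None,
        "pymupdf-fonts": importlib.util.find_spec("pymupdf_fonts") is not None,
        # Assumption: a Tesseract install is signalled by its executable on
        # PATH or by the TESSDATA_PREFIX environment variable being set.
        "Tesseract-OCR": (
            shutil.which("tesseract") is not None
            or "TESSDATA_PREFIX" in os.environ
        ),
    }


if __name__ == "__main__":
    for name, available in optional_feature_status().items():
        print(f"{name}: {'available' if available else 'not found'}")
```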
About
PyMuPDF adds Python bindings and abstractions to MuPDF, a lightweight PDF, XPS, and eBook viewer, renderer, and toolkit. Both PyMuPDF and MuPDF are maintained and developed by Artifex Software, Inc.
PyMuPDF was originally written by Jorj X. McKie.
License and Copyright
PyMuPDF is available under open-source AGPL and commercial license agreements. If you determine you cannot meet the requirements of the AGPL, please contact Artifex for more information regarding a commercial license.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for PyMuPDF-1.24.13-cp39-abi3-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | ec17914e4a560f4070212a2e84db5cc8b561d85d1ead193605a22f9561b03148
MD5 | a3539e160fe804a10e05f9d85d8ec005
BLAKE2b-256 | 3880f8d8ae555b237574005faef8a181a5c6a1d983e16a982b65ccc56a42faa2
Hashes for PyMuPDF-1.24.13-cp39-abi3-win32.whl
Algorithm | Hash digest
---|---
SHA256 | ab22828d4fc205791ef1332a64893cbfc38cd9c331c5f46ae4537372ffee6fc1
MD5 | cd0adf41fc441fc31f91d27a83622917
BLAKE2b-256 | 07a42e545217436e7717642809c7392bd7d7156ba102e7a47acb22659bfd41de
Hashes for PyMuPDF-1.24.13-cp39-abi3-musllinux_1_2_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 4520558580ac6b5a7164fda29fbc14e39d3114fd803420721500edbf47d04872
MD5 | 21d950e7a3899f5fc1d04ace42969442
BLAKE2b-256 | 6d225aa9e01747518878a54866b4d925abdc663c64c75f5fbc6a9706957a7a30
Hashes for PyMuPDF-1.24.13-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c830610e4fde237fcf0532f1f8c1381453f48c164a5eadd0c6e5fd0bea1ca8e3
MD5 | 5aeec8625b4e15e63d00d52be5bcbdb5
BLAKE2b-256 | 8548e4630eb58f4daed22a078e19db8a709d407d2e19316089675f6ed185f01a
Hashes for PyMuPDF-1.24.13-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | e4c8808e62afbbde0f7b9c4151c4b1a5735911c2d39c34332860df600dba76f8
MD5 | d678b2a5aa456c060c2d059b7bef7559
BLAKE2b-256 | 5b5f916bb534fd498d069d68c7a52289ba78d27823c2d6f8c693889e288e31e4
Hashes for PyMuPDF-1.24.13-cp39-abi3-macosx_11_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | 240d5c43daa9278db50d609162b48f673ab256d7e5c73eea67af517c1fc2d47c
MD5 | 9836c6f11186a35d458434e0cb417752
BLAKE2b-256 | eafeff2bb633c0934ba43c36184b8ed025092e946994dc6b4c764a0079f0ab3c
Hashes for PyMuPDF-1.24.13-cp39-abi3-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c11bb9198af69d490b4b346421db827d875a28fbc760d239e691d4b3ed12b5ad
MD5 | a19f0db90c628c425ca39c49c3ee3e23
BLAKE2b-256 | ce798d31a98ebeb329000406d6c36fb2ad42264d5a4a6915ebabbde332642204