Python bindings to PDFium

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the permissively licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

python3 -m pip install -U pypdfium2

Manual installation

The following steps require the external tools git, ctypesgen and gcc to be installed and available in PATH. Additionally, the Python package wheel is required.

For a source build, more dependencies may be necessary (see DEPS.txt).

Package locally

This will download a pre-built binary for PDFium, generate the bindings and build a wheel.

python3 update.py -p ${platform_name}
python3 setup_${platform_name}.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl
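
The three steps above can also be assembled programmatically. The following is a minimal sketch, not part of PyPDFium2: the helper name `packaging_commands` is hypothetical, and the values for the platform name, version and platform tag must be supplied by the caller, since they are placeholders in the commands above.

```python
import sys


def packaging_commands(platform_name, version, platform_tag, python=sys.executable):
    """Return the three local packaging steps as argument lists, in order.

    Hypothetical helper: it only mirrors the shell commands shown above.
    """
    wheel = f"dist/pypdfium2-{version}-py3-none-{platform_tag}.whl"
    return [
        # 1. Download a pre-built PDFium binary and generate the bindings
        [python, "update.py", "-p", platform_name],
        # 2. Build the platform-specific wheel
        [python, f"setup_{platform_name}.py", "bdist_wheel"],
        # 3. Install the wheel that was just built
        [python, "-m", "pip", "install", "-U", wheel],
    ]
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failing step stops the sequence.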

Source build

If you are using a platform where no pre-compiled package is available, it might be possible to build PDFium from source. However, this is a complex process that can vary depending on the host system, and it may take a long time.

python3 build_pdfium.py
python3 setup_source.py bdist_wheel
pip3 install dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 -i your_file.pdf -o your_output_dir/ --scale 1 --rotation 0 --optimise-mode none
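
To pick a value for --scale, it may help to relate it to a target resolution. The sketch below assumes that the scale factor multiplies the page size in PDF points (1/72 inch), which would make a scale of 1 correspond to 72 DPI. The helper names are hypothetical and not part of PyPDFium2.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def scale_for_dpi(dpi):
    """Scale factor that yields the given resolution (assumption: scale 1 == 72 DPI)."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, scale):
    """Bitmap size in pixels for a page of the given size in points, rounded up."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


# A US Letter page (612 x 792 points) rendered at 144 DPI
print(pixel_size(612, 792, scale_for_dpi(144)))
```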

If you want to render multiple files at once, a bash for-loop may be suitable:

for file in ./*.pdf; do echo "$file" && pypdfium2 -i "$file" -o your_output_dir/ --scale 2; done
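
On systems without bash, the same loop can be written in Python. This is a minimal sketch that uses only the command-line flags shown above; the function name `render_commands` is hypothetical.

```python
from pathlib import Path


def render_commands(pdf_paths, output_dir, scale=2):
    """Build one pypdfium2 command line (as an argument list) per input file."""
    return [
        ["pypdfium2", "-i", str(path), "-o", str(output_dir), "--scale", str(scale)]
        for path in pdf_paths
    ]


# Collect all PDFs in the current directory, in a stable order
pdf_files = sorted(Path(".").glob("*.pdf"))
for command in render_commands(pdf_files, "your_output_dir/"):
    print(" ".join(command))
```

To actually run the commands, replace the `print` call with `subprocess.run(command, check=True)`.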

Dump the table of contents of a PDF:

pypdfium2 --show-toc -i your_file.pdf

To obtain a full list of possible command-line parameters, run:

pypdfium2 --help

CLI documentation: https://pypdfium2.readthedocs.io/en/latest/cli.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF by function:

pdf = pdfium.open_pdf(filename)
# ... work with the PDF
pdfium.close_pdf(pdf)

Open a PDF by context manager:

with pdfium.PdfContext(filename) as pdf:
    pass  # ... work with the PDF

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = 0xFFFFFFFF,
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()
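
The value 0xFFFFFFFF passed as colour is opaque white. The sketch below shows how such a value can be composed, assuming the parameter uses the same 32-bit 8888 ARGB layout that PDFium's FPDFBitmap_FillRect expects. The helper names are hypothetical and not part of PyPDFium2.

```python
def pack_argb(alpha, red, green, blue):
    """Pack four 8-bit channels into one 32-bit ARGB integer."""
    for channel in (alpha, red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value out of range: {channel}")
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def unpack_argb(colour):
    """Split a 32-bit ARGB integer into (alpha, red, green, blue)."""
    return (
        (colour >> 24) & 0xFF,
        (colour >> 16) & 0xFF,
        (colour >> 8) & 0xFF,
        colour & 0xFF,
    )


white = pack_argb(255, 255, 255, 255)
print(hex(white))
```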

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')
    image.close()
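
The general pattern behind such a generator can be sketched with the standard library. This is not the implementation used by PyPDFium2: the thread pool, the `render_one` callback and the zero-padded suffix format are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def render_all(render_one, page_count, max_workers=4):
    """Yield (result, suffix) pairs in page order while rendering concurrently.

    render_one is any callable that takes a zero-based page index.
    """
    digits = len(str(page_count))  # pad suffixes so that file names sort correctly
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, even if pages finish out of order
        for index, result in enumerate(pool.map(render_one, range(page_count))):
            yield result, f"{index + 1:0{digits}d}"


def fake_render(page_index):
    """Stand-in for a real page renderer."""
    return f"<image of page {page_index}>"


for image, suffix in render_all(fake_render, 3):
    print(suffix, image)
```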

Read the table of contents:

with pdfium.PdfContext(filename) as pdf:
    toc = pdfium.get_toc(pdf)
    pdfium.print_toc(toc)
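
A table of contents is essentially a flat list of entries with a nesting level. The sketch below shows one way to format such a list. The `TocEntry` class is hypothetical and is not the type returned by get_toc; consult the support model documentation for the actual structure.

```python
from dataclasses import dataclass


@dataclass
class TocEntry:
    """Hypothetical bookmark record, for illustration only."""
    title: str
    level: int       # nesting depth, 0 = top level
    page_index: int  # zero-based index of the target page


def format_toc(entries, indent="    "):
    """Return one line per entry, indented by nesting level, with 1-based page numbers."""
    return [
        f"{indent * entry.level}{entry.title} -> page {entry.page_index + 1}"
        for entry in entries
    ]


toc = [TocEntry("Introduction", 0, 0), TocEntry("Scope", 1, 1)]
print("\n".join(format_toc(toc)))
```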

Support model documentation: https://pypdfium2.readthedocs.io/en/latest/support_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

# Load the document (no password) and make sure it has at least one page
doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

# Set up a form-fill environment so that form fields can be drawn
form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
pdfium.FORM_OnAfterLoadPage(page, form_fill)

# Page size in points, rounded up to whole pixels (scale factor 1)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

# Create a bitmap (alpha flag 0) and fill it with white
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

# Arguments: bitmap, page, start_x, start_y, size_x, size_y, rotation, flags
render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

# Reinterpret the raw buffer pointer as an array of 4 bytes per pixel
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

# PDFium stores pixels in blue-green-red byte order, hence the "BGRA" raw mode
img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

# Release resources in reverse order of creation
pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.
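
The buffer handling in the rendering example can be tried out without PDFium. The following sketch uses only ctypes: it simulates a raw pixel buffer, casts the pointer to a fixed-size array as the example does, and swaps the channels from BGRA to RGBA by hand. It assumes tightly packed rows (stride equal to width * 4), as the example above does; the function name `bgra_to_rgba` is hypothetical.

```python
import ctypes


def bgra_to_rgba(data):
    """Swap the blue and red channels of tightly packed 4-byte pixels."""
    if len(data) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    out = bytearray(data)
    out[0::4] = data[2::4]  # red goes to the first byte of each pixel
    out[2::4] = data[0::4]  # blue goes to the third byte of each pixel
    return bytes(out)


width, height = 2, 1
n_bytes = width * height * 4

# Simulated bitmap memory: one blue and one red pixel in BGRA order
raw = (ctypes.c_ubyte * n_bytes)(255, 0, 0, 255, 0, 0, 255, 255)

# A C API would hand back a bare pointer; cast it to a sized array to read it
pointer = ctypes.cast(raw, ctypes.POINTER(ctypes.c_ubyte))
array = ctypes.cast(pointer, ctypes.POINTER(ctypes.c_ubyte * n_bytes)).contents

rgba = bgra_to_rgba(bytes(array))
print(list(rgba))
```

If rows might be padded, a more defensive version would query FPDFBitmap_GetStride instead of assuming width * 4 bytes per row.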

Licensing

PDFium and PyPDFium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Documentation and examples are CC-BY-4.0.

Various other BSD- and MIT-style licenses apply to the dependencies of PDFium.

License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary re-distributions.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Currently supported architectures:

  • macOS x86_64 *
  • macOS arm64 *
  • Linux x86_64
  • Linux aarch64 (64-bit ARM) *
  • Linux armv7l (32-bit ARM hard-float, e.g. Raspberry Pi 2)
  • Windows 64bit
  • Windows 32bit *

* Not tested yet

If you have access to a theoretically supported but untested system, please report success or failure on the issues panel.

(If bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 could easily be adapted.)
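
When reporting success or failure, it helps to state which entry of the list above applies to your system. The following sketch maps the values of the standard library's platform module to the list entries. The mapping is an assumption based on commonly reported values of `platform.machine()`, and a 32-bit Python running on 64-bit Windows may report a different value than expected.

```python
import platform

# Keys: (platform.system(), platform.machine()); values: entries of the list above
SUPPORTED = {
    ("Darwin", "x86_64"): "macOS x86_64",
    ("Darwin", "arm64"): "macOS arm64",
    ("Linux", "x86_64"): "Linux x86_64",
    ("Linux", "aarch64"): "Linux aarch64",
    ("Linux", "armv7l"): "Linux armv7l",
    ("Windows", "AMD64"): "Windows 64bit",
    ("Windows", "x86"): "Windows 32bit",
}


def describe_platform(system=None, machine=None):
    """Return the matching list entry, or None if the host is not listed."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return SUPPORTED.get((system, machine))


print(describe_platform() or "not in the list of supported architectures")
```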

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs.
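
As a quick illustration of the naming scheme, a wheel file name can be split into its components. This is a minimal sketch based on the wheel file name convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag); the names `WheelName` and `parse_wheel_name` are hypothetical, and real tooling should rely on a dedicated packaging library instead.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected number of components: {filename}")
    return WheelName(*parts)


print(parse_wheel_name("pypdfium2-0.7.0-py3-none-manylinux_2_17_x86_64.whl"))
```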

PyPDFium2 contains scripts to automate the release process:

  • To build wheels for all platforms, run ./release.sh. This will download binaries and header files, write finished Python wheels to dist/, and run check-wheel-contents.
  • To clean up after a release, run ./clean.sh. This will remove downloaded files and build artifacts.

Testing

Run pytest -sv on the tests directory.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*

  • If all went well, upload to the real PyPI:
    twine upload dist/*

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging most likely need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker.

Issues related to build configuration should be discussed at pdfium-binaries.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ASCII file paths on Windows

On Windows, PDFium is currently unable to open documents whose file names contain multi-byte, non-ASCII characters. This issue has been confirmed upstream, but has not been fixed yet.
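
Until this is fixed, one possible workaround is to copy the affected document to a temporary file with an ASCII-only name before opening it. The sketch below is an untested suggestion rather than an officially recommended procedure; both helper names are hypothetical. Note that the temporary directory itself may contain non-ASCII characters (for example, in a Windows user name), in which case a different directory should be passed.

```python
import os
import shutil
import tempfile


def needs_ascii_copy(path):
    """Return True if the path contains any non-ASCII character."""
    return not str(path).isascii()


def ascii_safe_copy(path, directory=None):
    """Copy the file to a temporary file with an ASCII-only name.

    The caller is responsible for deleting the copy afterwards.
    """
    fd, temp_path = tempfile.mkstemp(suffix=".pdf", prefix="pdfium_", dir=directory)
    os.close(fd)  # only the name is needed; copyfile reopens the file
    shutil.copyfile(path, temp_path)
    return temp_path


print(needs_ascii_copy("report.pdf"), needs_ascii_copy("bericht_ä.pdf"))
```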

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform-specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.

This version: 0.7.0

Download files

Source distribution:

  • pypdfium2-0.7.0.tar.gz (283.0 kB)

Built distributions:

  • pypdfium2-0.7.0-py3-none-win_arm64.whl (2.4 MB): Python 3, Windows ARM64
  • pypdfium2-0.7.0-py3-none-win_amd64.whl (2.5 MB): Python 3, Windows x86-64
  • pypdfium2-0.7.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-0.7.0-py3-none-manylinux_2_17_x86_64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-0.7.0-py3-none-manylinux_2_17_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-0.7.0-py3-none-manylinux_2_17_aarch64.whl (2.6 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-0.7.0-py3-none-macosx_10_11_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64
  • pypdfium2-0.7.0-py3-none-macosx_10_11_arm64.whl (2.6 MB): Python 3, macOS 10.11+ ARM64
