
Python bindings to PDFium


PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

python3 -m pip install -U pypdfium2

Manual installation

The following steps require git, ctypesgen and gcc to be installed and available in PATH.

Package locally

This will download a pre-built binary for PDFium, generate the bindings and build a wheel.

python3 update.py -p ${platform_name}
python3 setup_${platform_name}.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl
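
The ${version} and ${platform_tag} placeholders depend on what the build step produced. As a minimal sketch (not part of PyPDFium2), the following hypothetical helpers look up the most recently modified wheel in dist/ and assemble the matching install command. The names newest_wheel and pip_install_command are made up for illustration.

```python
from pathlib import Path
from typing import List, Optional


def newest_wheel(dist_dir: str = "dist", pattern: str = "pypdfium2-*.whl") -> Optional[Path]:
    """Return the most recently modified wheel in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


def pip_install_command(wheel: Path) -> List[str]:
    """Build the argument list for the install step shown above."""
    return ["python3", "-m", "pip", "install", "-U", str(wheel)]
```

The returned list can be passed to subprocess.run(..., check=True) once a wheel has been found.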

Source build

If you are using an architecture where no pre-compiled package is available, it is possible to build PDFium from source. However, this is a complex process that can vary depending on the host system, and it may take a long time.

Please make sure you have all build dependencies installed (see DEPS.txt).

python3 build.py
python3 setup_source.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Examples

Using the command-line interface

pypdfium2 -i your_file.pdf -o your_output_dir/ --scale 1 --rotation 0 --optimise-mode none

If you want to render multiple files at once, a bash for-loop may be suitable:

for file in ./*.pdf; do echo "$file" && pypdfium2 -i "$file" -o your_output_dir/ --scale 2; done
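
On systems without bash, the same loop could be driven from Python. This is only a sketch: it assumes the pypdfium2 executable is on PATH and uses nothing beyond the -i, -o and --scale options shown above. The helper names build_command and render_all are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(pdf: Path, out_dir: Path, scale: int = 2) -> List[str]:
    """Assemble one CLI call; the trailing slash mirrors the shell example."""
    return ["pypdfium2", "-i", str(pdf), "-o", f"{out_dir}/", "--scale", str(scale)]


def render_all(in_dir: Path, out_dir: Path, scale: int = 2) -> None:
    """Run the CLI once per PDF file found directly inside in_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(in_dir.glob("*.pdf")):
        print(pdf)
        # check=True raises CalledProcessError if the CLI exits with an error
        subprocess.run(build_command(pdf, out_dir, scale), check=True)
```

Calling render_all(Path("."), Path("your_output_dir")) would then correspond to the for-loop above.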

To obtain a list of possible command-line parameters, run

pypdfium2 --help

CLI documentation: https://pypdfium2.readthedocs.io/en/latest/cli.html

Using the support model

Render a single page:

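import pypdfium2 as pdfium

filename = "your_file.pdf"  # path to the input document

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        background_colour = 0xFFFFFFFF,
        render_annotations = True,
        optimise_mode = pdfium.OptimiseMode.none,
    )

# render_page() returns a PIL image, which can be saved in any format PIL supports
pil_image.save("out.png")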

Render multiple pages concurrently (in this case, the whole document):

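import pypdfium2 as pdfium

filename = "your_file.pdf"  # path to the input document

# render_pdf() yields one (image, suffix) pair per page
for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')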
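
To collect the pages in a separate output directory, the loop above can be wrapped in a small helper. The following is a sketch under the assumption that the renderer yields (image, suffix) pairs as shown, where each image has a save() method. The name save_pages is hypothetical and not part of the support model.

```python
from pathlib import Path
from typing import Any, Iterable, List, Tuple


def save_pages(pages: Iterable[Tuple[Any, str]], out_dir: str, stem: str = "out") -> List[Path]:
    """Save each (image, suffix) pair as <out_dir>/<stem>_<suffix>.png and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for image, suffix in pages:
        path = target / f"{stem}_{suffix}.png"
        image.save(str(path))  # any object with a PIL-style save() works here
        written.append(path)
    return written
```

With the support model, this would be called as save_pages(pdfium.render_pdf(filename), "your_output_dir").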

Support model documentation: https://pypdfium2.readthedocs.io/en/latest/support_api.html

Using the PDFium API

import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your_file.pdf"  # path to the input document

doc = pdfium.FPDF_LoadDocument(filename, None) # load document (filename, password string)
page_count = pdfium.FPDF_GetPageCount(doc)     # get page count
assert page_count >= 1

page   = pdfium.FPDF_LoadPage(doc, 0)                # load the first page
width  = int(pdfium.FPDF_GetPageWidthF(page)  + 0.5) # get page width
height = int(pdfium.FPDF_GetPageHeightF(page) + 0.5) # get page height

# render to bitmap
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)
pdfium.FPDF_RenderPageBitmap(
    bitmap, page, 0, 0, width, height, 0, 
    pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT
)

# retrieve data from bitmap
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

if bitmap is not None:
    pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDF_CloseDocument(doc)
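
The buffer handling in the example relies on two things: the bitmap holds width * height pixels of 4 bytes each, and the bytes are stored in BGRA order (PIL does the channel swap through the "raw", "BGRA" arguments). The sketch below reproduces both steps with plain ctypes on a made-up two-pixel buffer, without calling PDFium. The helpers view_pixels and bgra_to_rgba are hypothetical and assume tightly packed rows.

```python
import ctypes


def view_pixels(pointer, width: int, height: int, channels: int = 4):
    """Reinterpret a pointer to pixel data as a fixed-size byte array (no copy)."""
    array_type = ctypes.c_ubyte * (width * height * channels)
    return ctypes.cast(pointer, ctypes.POINTER(array_type)).contents


def bgra_to_rgba(data: bytes) -> bytes:
    """Swap the blue and red channels of tightly packed 4-byte pixels."""
    out = bytearray(data)
    out[0::4], out[2::4] = data[2::4], data[0::4]
    return bytes(out)


# Stand-in for the bitmap buffer: one blue and one red pixel in BGRA order.
raw = (ctypes.c_ubyte * 8)(255, 0, 0, 255, 0, 0, 255, 255)
pixels = view_pixels(raw, width=2, height=1)
rgba = bgra_to_rgba(bytes(pixels))
```

Because the cast does not copy, the resulting array is only valid while the underlying buffer is alive, which is why the example above saves the image before destroying the bitmap.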

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. When in doubt, consult the inline documentation in the PDFium source code.

Licensing

PyPDFium2 source code itself is Apache-2.0 licensed. The auto-generated bindings file contains BSD-3-Clause code.

Documentation and examples are CC-BY-4.0.

PDFium is available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other BSD- and MIT-style licenses apply to the dependencies of PDFium.

License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary re-distributions.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation, which simplified regular updates. However, its wheel was still not platform-specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Currently supported architectures:

  • macOS x86_64 *
  • macOS arm64 *
  • Linux x86_64
  • Linux aarch64 (64-bit ARM) *
  • Linux armv7l (32-bit ARM hard-float, e.g. Raspberry Pi 2)
  • Windows 64-bit
  • Windows 32-bit *

* Not tested yet
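
To check which of these builds would apply to a given machine, the values reported by Python's platform module can be mapped to the platform tags of the wheels published for release 0.4.1. This is a rough sketch: the table and the wheel_tag helper are hypothetical, and the machine strings are the ones the platform module typically reports on each system.

```python
import platform
from typing import Optional

# (system, machine) as reported by the platform module -> wheel platform tag.
# Caveat: a 32-bit Python on 64-bit Windows may still report "AMD64" here,
# although it needs the win32 wheel.
WHEEL_TAGS = {
    ("Darwin", "x86_64"): "macosx_10_11_x86_64",
    ("Darwin", "arm64"): "macosx_10_11_arm64",
    ("Linux", "x86_64"): "manylinux_2_17_x86_64",
    ("Linux", "aarch64"): "manylinux_2_17_aarch64",
    ("Linux", "armv7l"): "manylinux_2_17_armv7l",
    ("Windows", "AMD64"): "win_amd64",
    ("Windows", "x86"): "win32",
}


def wheel_tag(system: Optional[str] = None, machine: Optional[str] = None) -> Optional[str]:
    """Return the matching platform tag, or None if the platform is not in the table."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return WHEEL_TAGS.get((system, machine))
```

A result of None suggests that no pre-built wheel matches, in which case the source build may be the only option.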

If you have access to a theoretically supported but untested system, please report success or failure on the issues panel.

(If bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 could easily be adapted.)

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs.
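
As a short illustration of that convention, a wheel file name consists of distribution, version, an optional build tag, Python tag, ABI tag and platform tag, joined by hyphens. The parser below is a minimal sketch written for this README, not a replacement for a packaging library. WheelName and parse_wheel_name are hypothetical names.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its components, ignoring an optional build tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(*parts)
```

For example, pypdfium2-0.4.1-py3-none-win_amd64.whl splits into the distribution pypdfium2, version 0.4.1, Python tag py3, ABI tag none and platform tag win_amd64.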

PyPDFium2 contains scripts to automate the release process:

  • To build wheels for all platforms, run ./release.sh. This will download binaries and header files, write finished Python wheels to dist/, and run check-wheel-contents.
  • To clean up after a release, run ./clean.sh. This will remove downloaded files and build artifacts.

Testing

Run pytest -sv on the tests directory.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging most likely need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker.

Issues related to build configuration, however, should be discussed at pdfium-binaries.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ASCII file paths on Windows

On Windows, PDFium is currently unable to open documents whose file names contain multi-byte, non-ASCII characters. This issue is confirmed upstream, but has not been addressed yet.
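
Until this is fixed upstream, one possible workaround is to copy an affected document to a location with an ASCII-only name before opening it. The following is an untested sketch of that idea, not something PyPDFium2 provides. The helpers is_ascii and ascii_safe_copy are hypothetical, and the caller is assumed to supply a temporary directory whose own path is ASCII-only.

```python
import shutil
from pathlib import Path


def is_ascii(path: str) -> bool:
    """Return True if the path consists of ASCII characters only."""
    try:
        path.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def ascii_safe_copy(path: str, tmp_dir: str) -> str:
    """Return path unchanged if it is ASCII-only; otherwise copy the file to
    tmp_dir under an ASCII name and return the path of the copy."""
    if is_ascii(path):
        return path
    target = Path(tmp_dir) / "input.pdf"
    shutil.copyfile(path, target)
    return str(target)
```

The returned path can then be passed on in place of the original file name. The copy should be removed once the document has been closed.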

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.


Built Distributions


  • pypdfium2-0.4.1-py3-none-win_amd64.whl (2.5 MB): Python 3, Windows x86-64
  • pypdfium2-0.4.1-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-0.4.1-py3-none-manylinux_2_17_x86_64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-0.4.1-py3-none-manylinux_2_17_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-0.4.1-py3-none-manylinux_2_17_aarch64.whl (2.6 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-0.4.1-py3-none-macosx_10_11_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64
  • pypdfium2-0.4.1-py3-none-macosx_10_11_arm64.whl (2.8 MB): Python 3, macOS 10.11+ ARM64
