Python bindings to PDFium

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

python3 -m pip install -U pypdfium2

Manual installation

# download binaries / header files and generate bindings
python3 update.py

# build the package that corresponds to your platform
python3 setup_${platform_name}.py bdist_wheel

# optionally, run check-wheel-contents on the package to confirm its validity
check-wheel-contents dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

# install the package
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

# remove downloaded files and build artifacts
bash clean.sh

Documentation

API documentation for PDFium is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python.

Examples

Using the command-line interface

pypdfium2 -i your_file.pdf -o your_output_dir/ --scale 1 --rotation 0 --optimise-mode none

If you want to render multiple files at once, a bash for-loop may be suitable:

for file in ./*.pdf; do echo "$file" && pypdfium2 -i "$file" -o your_output_dir/ --scale 2; done
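The same loop can also be driven from Python, for example on systems without bash. The following is a minimal sketch that only uses the `-i`, `-o` and `--scale` options shown above; the helper names `build_render_commands` and `render_all` are hypothetical and not part of PyPDFium2.

```python
import subprocess
from pathlib import Path


def build_render_commands(input_dir, output_dir, scale=2):
    """Return one pypdfium2 command (as an argument list) per PDF in input_dir."""
    commands = []
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        commands.append([
            "pypdfium2",
            "-i", str(pdf_path),
            "-o", str(output_dir),
            "--scale", str(scale),
        ])
    return commands


def render_all(input_dir, output_dir, scale=2):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_render_commands(input_dir, output_dir, scale):
        print(command[2])  # echo the input file, like the bash loop does
        subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.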

To obtain a list of possible command-line parameters, run

pypdfium2 --help

Using the support model

import pypdfium2 as pdfium

filename = "your_file.pdf"  # path to the PDF to render

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        background_colour = 0xFFFFFFFF,
        render_annotations = True,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
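When a target resolution is given in DPI, it has to be converted to a scale factor first. The sketch below assumes that `scale` multiplies the page size in PDF points (1/72 inch by default), so that `scale = 1` corresponds to 72 DPI; this is an assumption about `render_page`, not something stated above. The helper `scale_for_dpi` is hypothetical, and whether `render_page` accepts non-integer scale values would need to be checked.

```python
PDF_POINTS_PER_INCH = 72  # default size of one PDF user-space unit: 1/72 inch


def scale_for_dpi(dpi):
    """Return the scale factor that yields roughly `dpi` pixels per inch.

    Assumes that scale 1 renders one pixel per PDF point (72 DPI).
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / PDF_POINTS_PER_INCH
```

Under that assumption, a 144 DPI rendering would use a scale of 2.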

Using the PDFium API

import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your_file.pdf"  # path to the PDF to open

doc = pdfium.FPDF_LoadDocument(filename, None) # load document (filename, password string)
page_count = pdfium.FPDF_GetPageCount(doc)     # get page count
assert page_count >= 1

page   = pdfium.FPDF_LoadPage(doc, 0)                # load the first page
width  = int(pdfium.FPDF_GetPageWidthF(page)  + 0.5) # get page width
height = int(pdfium.FPDF_GetPageHeightF(page) + 0.5) # get page height

# render to bitmap
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)
pdfium.FPDF_RenderPageBitmap(
    bitmap, page, 0, 0, width, height, 0, 
    pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT
)

# retrieve data from bitmap
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

if bitmap is not None:
    pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDF_CloseDocument(doc)
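The size arithmetic in the example above can be isolated into a small helper, which is handy when rendering at a scale other than 1. This is a minimal sketch in plain Python: `bitmap_geometry` is a hypothetical name, and it assumes, like the example, 4 bytes per pixel and tightly packed rows (no padding at the end of a row).

```python
def bitmap_geometry(width_pt, height_pt, scale=1, bytes_per_pixel=4):
    """Return (width_px, height_px, buffer_size) for a page measured in points.

    Uses the same round-half-up conversion as the example: int(x + 0.5).
    """
    width_px = int(width_pt * scale + 0.5)
    height_px = int(height_pt * scale + 0.5)
    buffer_size = width_px * height_px * bytes_per_pixel
    return width_px, height_px, buffer_size


# An A4 page is about 595.276 x 841.89 points.
print(bitmap_geometry(595.276, 841.89))
```

The returned pixel dimensions would be passed to `FPDFBitmap_Create()` and `FPDF_RenderPageBitmap()`, and the buffer size to the `ctypes.c_ubyte` array type.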

Licensing

PyPDFium2 source code itself is Apache-2.0 licensed. The auto-generated bindings file contains BSD-3-Clause code.

Documentation and examples are CC-BY-4.0.

PDFium is available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other BSD- and MIT-style licenses apply to the dependencies of PDFium.

License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary re-distributions.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation, which simplified regular updates. However, its wheel was still not platform-specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Currently supported architectures:

  • macOS x86_64 *
  • macOS arm64 *
  • Linux x86_64
  • Linux aarch64 (64-bit ARM) *
  • Linux armv7l (32-bit ARM hard-float, e.g. Raspberry Pi 2)
  • Windows 64-bit
  • Windows 32-bit *

* Not tested yet

If you have access to a theoretically supported but untested system, please report success or failure on the issues panel.

(If bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 could easily be adapted.)

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs.
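As an illustration of that naming scheme, the sketch below splits a wheel file name into its components, following the `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` convention of the wheel specification. The `WheelName` type and `parse_wheel_name` function are hypothetical helpers, not part of PyPDFium2.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # optional build tag, usually absent
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components in %r" % filename)
    return WheelName(dist, version, build, python_tag, abi_tag, platform_tag)


print(parse_wheel_name("pypdfium2-0.1.0-py3-none-manylinux_2_17_x86_64.whl"))
```

For PyPDFium2 wheels, the Python tag is `py3` and the ABI tag is `none`, so only the platform tag differs between the builds.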

PyPDFium2 contains scripts to automate the release process:

  • To build wheels for all platforms, run ./release.sh. This will download binaries and header files, write finished Python wheels to dist/, and run check-wheel-contents.
  • To clean up after a release, run ./clean.sh. This will remove downloaded files and build artifacts.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging most likely need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If the cause of an issue turns out to lie in PDFium itself, the problem needs to be reported at the PDFium bug tracker.

Issues related to build configuration should be discussed at pdfium-binaries instead.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ASCII file paths on Windows

On Windows, PDFium currently cannot open documents whose file names contain multi-byte, non-ASCII characters. This bug was reported in March 2017, but the PDFium development team has not given it much attention so far. The cause of the issue is known and the structure of a fix has been proposed, but it has not been applied yet.

This issue cannot reasonably be worked around in PyPDFium2, for the following reasons:

  • Using FPDF_LoadMemDocument() rather than FPDF_LoadDocument() is not possible due to issues with concurrent access to the same file. Moreover, it would be less efficient as the whole document has to be loaded into memory. This makes it impractical for large files.

  • FPDF_LoadCustomDocument() is not a solution, since mapping the complex file reading callback to Python is hardly feasible. Furthermore, there would likely be the same problem with concurrent access.

  • Creating a tempfile with a compatible name would be possible, but cannot be done in PyPDFium2 itself: For faster rendering, you usually set up a multiprocessing pool or a concurrent.futures executor. This means each process has to initialise its own PdfContext. If an automatic tempfile workaround were implemented in PdfContext, each process would create its own temporary copy of the file, which would be highly inefficient. The tempfile should be created only once for all pages, not for each page separately. Therefore, this workaround can only be applied downstream. It could be done somewhat like this:

    import sys
    
    if sys.platform.startswith('win32') and not filename.isascii():
        # create a temporary copy and remap the file name
        # (str.isascii() requires at least Python 3.7)
        ...
    

    This workaround is currently used for the command-line interface of PyPDFium2 (see __main__.py).
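
A more complete version of the tempfile workaround might look like the sketch below. It is not the code from `__main__.py`: the helper names, the fixed copy name `pypdfium2_input.pdf` and the `force` parameter (which triggers the copy on any platform, for testing) are all assumptions made for illustration.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def ascii_safe_path(filename, temp_dir, force=False):
    """Return a path that PDFium should be able to open.

    On Windows, a file whose path contains non-ASCII characters is copied
    into temp_dir under an ASCII-only name. Otherwise the original path is
    returned unchanged. Note that temp_dir itself must be an ASCII path.
    (str.isascii() requires at least Python 3.7.)
    """
    needs_copy = sys.platform.startswith("win32") and not filename.isascii()
    if not (needs_copy or force):
        return filename
    target = Path(temp_dir) / "pypdfium2_input.pdf"
    shutil.copy(filename, target)
    return str(target)


def with_ascii_safe_copy(filename, action, force=False):
    """Call action(path) once; the temporary copy is removed afterwards."""
    with tempfile.TemporaryDirectory() as temp_dir:
        return action(ascii_safe_path(filename, temp_dir, force=force))
```

The copy is created once, before any worker processes are started, and the remapped path is then handed to every process that opens the document.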


This version

0.1.0

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • pypdfium2-0.1.0-py3-none-win_amd64.whl (2.6 MB): Python 3, Windows x86-64
  • pypdfium2-0.1.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-0.1.0-py3-none-manylinux_2_17_x86_64.whl (2.8 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-0.1.0-py3-none-manylinux_2_17_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-0.1.0-py3-none-manylinux_2_17_aarch64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-0.1.0-py3-none-macosx_11_0_arm64.whl (2.8 MB): Python 3, macOS 11.0+ ARM64
  • pypdfium2-0.1.0-py3-none-macosx_10_10_x86_64.whl (2.8 MB): Python 3, macOS 10.10+ x86-64

