Python bindings to PDFium

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

python3 -m pip install -U pypdfium2

Manual installation

The following steps require git, ctypesgen and gcc to be installed and available in PATH.

Package locally

This will download a pre-built binary for PDFium, generate the bindings and build a wheel.

python3 update.py -p ${platform_name}
python3 setup_${platform_name}.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Source build

If you are using an architecture for which no pre-compiled package is available, you can build PDFium from source. However, this is a complex process that varies with the host system and may take a long time.

python3 build.py
python3 setup_source.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Examples

Using the command-line interface

pypdfium2 -i your_file.pdf -o your_output_dir/ --scale 1 --rotation 0 --optimise-mode none

If you want to render multiple files at once, a bash for-loop may be suitable:

for file in ./*.pdf; do echo "$file" && pypdfium2 -i "$file" -o your_output_dir/ --scale 2; done

To obtain a list of possible command-line parameters, run:

pypdfium2 --help

CLI documentation: https://pypdfium2.readthedocs.io/en/latest/cli.html
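
The batch job from the bash loop can also be driven from Python. The following is a minimal sketch that relies only on the -i, -o and --scale options shown above; the helper names build_render_commands and render_all are hypothetical and not part of PyPDFium2.

```python
import subprocess
from pathlib import Path


def build_render_commands(input_dir, output_dir, scale=2):
    """Return one pypdfium2 argument list per PDF file in input_dir, sorted by name."""
    commands = []
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        commands.append([
            "pypdfium2",
            "-i", str(pdf_path),
            "-o", str(output_dir),
            "--scale", str(scale),
        ])
    return commands


def render_all(input_dir, output_dir, scale=2):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_render_commands(input_dir, output_dir, scale):
        print(command[2])  # echo the input file, like the bash loop does
        subprocess.run(command, check=True)
```

Calling render_all(".", "your_output_dir/") would process every PDF in the current directory. Passing the arguments as a list avoids shell quoting problems with unusual file names.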

Using the support model

import pypdfium2 as pdfium

filename = "your_file.pdf"  # placeholder: path to the input document

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        background_colour = 0xFFFFFFFF,
        render_annotations = True,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")

Support model documentation: https://pypdfium2.readthedocs.io/en/latest/api.html
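
To render every page of a document rather than only the first, the call above can be wrapped in a loop. This is a sketch under assumptions: it presumes that the object yielded by PdfContext is the document handle accepted by FPDF_GetPageCount(), and the helper names page_output_path and render_all_pages are hypothetical. The arguments to render_page() are copied from the example above.

```python
from pathlib import Path


def page_output_path(output_dir, stem, page_index, page_count):
    """Build a zero-padded, 1-based file name such as 'report_03.png'."""
    digits = len(str(page_count))
    return Path(output_dir) / f"{stem}_{page_index + 1:0{digits}d}.png"


def render_all_pages(filename, output_dir, scale=1):
    """Render every page of one document to PNG files in output_dir."""
    # Imported lazily so that the naming helper works without pypdfium2.
    import pypdfium2 as pdfium

    with pdfium.PdfContext(filename) as pdf:
        # Assumption: `pdf` is the handle that FPDF_GetPageCount() expects.
        page_count = pdfium.FPDF_GetPageCount(pdf)
        for page_index in range(page_count):
            pil_image = pdfium.render_page(
                pdf,
                page_index = page_index,
                scale = scale,
                rotation = 0,
                background_colour = 0xFFFFFFFF,
                render_annotations = True,
                optimise_mode = pdfium.OptimiseMode.none,
            )
            stem = Path(filename).stem
            pil_image.save(page_output_path(output_dir, stem, page_index, page_count))
```

Zero-padding the page number keeps the output files in page order when they are sorted by name.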

Using the PDFium API

import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your_file.pdf"  # placeholder: path to the input document

doc = pdfium.FPDF_LoadDocument(filename, None) # load document (filename, password string)
page_count = pdfium.FPDF_GetPageCount(doc)     # get page count
assert page_count >= 1

page   = pdfium.FPDF_LoadPage(doc, 0)                # load the first page
width  = int(pdfium.FPDF_GetPageWidthF(page)  + 0.5) # get page width
height = int(pdfium.FPDF_GetPageHeightF(page) + 0.5) # get page height

# render to bitmap
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)
pdfium.FPDF_RenderPageBitmap(
    bitmap, page, 0, 0, width, height, 0, 
    pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT
)

# retrieve data from bitmap
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

if bitmap is not None:
    pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDF_CloseDocument(doc)

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, Foxit's PDFium and the open-source PDFium can differ in minor ways. When in doubt, consult the inline documentation in the PDFium source code.
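
The bitmap size used in the example follows from simple arithmetic: PDFium reports page dimensions in points (1/72 inch), the example rounds them to whole pixels, and the buffer holds four bytes per pixel. The sketch below spells this out and adds a scale factor; the helper names are hypothetical, and it assumes rows are stored without padding, as the buffer cast in the example also does.

```python
def scale_for_dpi(dpi):
    """Scale factor that maps PDF points (1/72 inch) to pixels at the given DPI."""
    return dpi / 72


def bitmap_geometry(width_pt, height_pt, scale=1.0, bytes_per_pixel=4):
    """Return (width_px, height_px, buffer_size) for a page size given in points."""
    # Round to the nearest pixel, like int(value + 0.5) in the example.
    width_px = int(width_pt * scale + 0.5)
    height_px = int(height_pt * scale + 0.5)
    # Assumption: no row padding, so the stride is width_px * bytes_per_pixel.
    return width_px, height_px, width_px * height_px * bytes_per_pixel
```

For an A4 page (about 595.276 x 841.89 points), bitmap_geometry(595.276, 841.89) gives a 595 x 842 pixel bitmap. To render at a higher resolution, the scaled width and height would be passed to FPDFBitmap_Create(), FPDFBitmap_FillRect() and FPDF_RenderPageBitmap() in place of the unscaled ones.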

Licensing

PyPDFium2 source code itself is Apache-2.0 licensed. The auto-generated bindings file contains BSD-3-Clause code.

Documentation and examples are CC-BY-4.0.

PDFium is available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other BSD- and MIT-style licenses apply to the dependencies of PDFium.

License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary re-distributions.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation, which simplified regular updates. However, its wheel was still not platform-specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Currently supported architectures:

  • macOS x86_64 *
  • macOS arm64 *
  • Linux x86_64
  • Linux aarch64 (64-bit ARM) *
  • Linux armv7l (32-bit ARM hard-float, e.g. Raspberry Pi 2)
  • Windows 64-bit
  • Windows 32-bit *

* Not tested yet

If you have access to a theoretically supported but untested system, please report success or failure on the issues panel.

(If bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 can be adapted easily.)
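
When reporting success or failure, it helps to state which entry of the list above the system corresponds to. The following sketch derives that from the standard library's sysconfig.get_platform(); the function name match_supported and the mapping table are hypothetical and simply mirror the list, they are not part of PyPDFium2.

```python
import sysconfig

# (system, architecture) pairs mirroring the list of supported architectures.
SUPPORTED = {
    ("macosx", "x86_64"): "macOS x86_64",
    ("macosx", "arm64"): "macOS arm64",
    ("linux", "x86_64"): "Linux x86_64",
    ("linux", "aarch64"): "Linux aarch64",
    ("linux", "armv7l"): "Linux armv7l",
}


def match_supported(platform_string=None):
    """Return the matching list entry, or None if the platform is not listed."""
    if platform_string is None:
        platform_string = sysconfig.get_platform()
    if platform_string == "win-amd64":
        return "Windows 64-bit"
    if platform_string == "win32":
        return "Windows 32-bit"
    # e.g. "linux-x86_64" or "macosx-11.0-arm64": system first, architecture last
    system, _, rest = platform_string.partition("-")
    architecture = rest.rsplit("-", 1)[-1]
    return SUPPORTED.get((system, architecture))
```

sysconfig.get_platform() describes the running interpreter rather than the hardware, so a 32-bit Python on 64-bit Windows is matched to the 32-bit entry, which is the wheel that interpreter would need.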

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs.
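
As an illustration of that convention, a wheel file name can be split into its components. This is a simplified sketch (the name parse_wheel_name is hypothetical); for real tooling, a maintained parser from the packaging ecosystem is the safer choice.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split '{distribution}-{version}[-{build}]-{python}-{abi}-{platform}.whl'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    elif len(parts) != 5:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(*parts)
```

For example, parse_wheel_name("pypdfium2-0.3.0-py3-none-win_amd64.whl") yields the Python tag py3, the ABI tag none and the platform tag win_amd64: the wheel works with any Python 3 interpreter but only on 64-bit Windows.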

PyPDFium2 contains scripts to automate the release process:

  • To build wheels for all platforms, run ./release.sh. This will download binaries and header files, write finished Python wheels to dist/, and run check-wheel-contents.
  • To clean up after a release, run ./clean.sh. This will remove downloaded files and build artifacts.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging most likely need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If the cause of an issue is determined to be in PDFium, the problem needs to be reported at the PDFium bug tracker.

Issues related to build configuration should instead be discussed at pdfium-binaries.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ASCII file paths on Windows

On Windows, PDFium currently cannot open documents whose file names contain multi-byte, non-ASCII characters. This bug was reported in March 2017, but the PDFium development team has so far given it little attention. The cause of the issue is known and the structure of a fix has been proposed, but it has not been applied yet.

The following approaches have been considered to work around this limitation in PyPDFium2:

  • Using FPDF_LoadMemDocument() rather than FPDF_LoadDocument() is not possible due to issues with concurrent access to the same file. Moreover, it would be less efficient as the whole document has to be loaded into memory. This makes it impractical for large files.

  • FPDF_LoadCustomDocument() is not a solution, since mapping the complex file reading callback to Python is hardly feasible. Furthermore, there would likely be the same problem with concurrent access.

  • Creating a tempfile with a compatible name would be possible, but cannot be done in PdfContext itself. For faster rendering, you usually set up a multiprocessing pool or a concurrent.futures executor, so each process has to initialise its own PdfContext. If PdfContext applied the tempfile workaround automatically, each process would create its own temporary copy of the file, which would be highly inefficient: the tempfile should be created only once for all pages, not for each page separately. The workaround could be done somewhat like this:

    import sys
    
    if sys.platform.startswith('win32') and not filename.isascii():
        # create a temporary copy and remap the file name
        # (str.isascii() requires at least Python 3.7)
        ...
    

    This concept is currently used for the render_pdf() support model of PyPDFium2 (see _helpers.py).
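
A fuller version of the snippet above might look as follows. This is only a sketch of the idea, not the code in _helpers.py: the function names and the fixed target name input.pdf are hypothetical, and the platform is a parameter so that the logic can be exercised on any system. Note that the temporary directory itself must have an ASCII-only path for the copy to help.

```python
import shutil
import sys
from pathlib import Path


def needs_ascii_copy(filename, platform=sys.platform):
    """True if PDFium on this platform is expected to fail on the file name."""
    # str.isascii() requires at least Python 3.7
    return platform.startswith("win32") and not str(filename).isascii()


def ascii_safe_path(filename, tmp_dir, platform=sys.platform):
    """Return a path that can be opened, copying the file to tmp_dir if needed."""
    if not needs_ascii_copy(filename, platform):
        return Path(filename)
    target = Path(tmp_dir) / "input.pdf"  # hypothetical ASCII-only name
    shutil.copy(filename, target)
    return target
```

The caller would create the directory once with tempfile.TemporaryDirectory(), call ascii_safe_path() before starting any worker processes, and hand the returned path to every worker, so that the copy is made only once per document.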

This version

0.3.0

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • pypdfium2-0.3.0-py3-none-win_amd64.whl (2.5 MB): Python 3, Windows x86-64
  • pypdfium2-0.3.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-0.3.0-py3-none-manylinux_2_17_x86_64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-0.3.0-py3-none-manylinux_2_17_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-0.3.0-py3-none-manylinux_2_17_aarch64.whl (2.6 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-0.3.0-py3-none-macosx_10_11_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64
  • pypdfium2-0.3.0-py3-none-macosx_10_11_arm64.whl (2.8 MB): Python 3, macOS 10.11+ ARM64
