Python bindings to PDFium

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

python3 -m pip install -U pypdfium2

Manual installation

The following steps require the external tools git, ctypesgen and gcc to be installed and available in PATH. Additionally, the Python package wheel is required.

For a source build, more dependencies may be necessary (see DEPS.txt).

Package locally

This will download a pre-built binary for PDFium, generate the bindings and build a wheel.

python3 update.py -p ${platform_name}
python3 setup_${platform_name}.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Source build

If you are using a platform where no pre-compiled package is available, it might be possible to build PDFium from source. However, this is a complex process that can vary depending on the host system, and it may take a long time.

python3 build_pdfium.py
python3 setup_source.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 -i your_file.pdf -o your_output_dir/ --scale 1 --rotation 0 --optimise-mode none

If you want to render multiple files at once, a bash for-loop may be suitable:

for file in ./*.pdf; do echo "$file" && pypdfium2 -i "$file" -o your_output_dir/ --scale 2; done
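
Where bash is not available (for instance in a plain Windows shell), the same loop could be driven from Python. The following is a minimal sketch that only uses the -i, -o and --scale options shown above. The helper names build_render_commands and render_all are hypothetical and not part of PyPDFium2.

```python
import subprocess
from pathlib import Path


def build_render_commands(input_dir, output_dir, scale=2):
    """Return one pypdfium2 command line (as an argv list) per PDF in input_dir."""
    commands = []
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        commands.append([
            "pypdfium2",
            "-i", str(pdf_path),
            "-o", str(output_dir),
            "--scale", str(scale),
        ])
    return commands


def render_all(input_dir, output_dir, scale=2):
    """Run the commands one after another, stopping at the first failure."""
    for argv in build_render_commands(input_dir, output_dir, scale):
        print(argv[2])  # echo the input file, like the bash loop does
        subprocess.run(argv, check=True)
```

Calling render_all(".", "your_output_dir/") should then behave much like the bash loop. Building the argument lists separately makes it possible to inspect the commands before anything is executed.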

Dump the table of contents of a PDF:

pypdfium2 --show-toc -i your_file.pdf

To obtain a full list of possible command-line parameters, run

pypdfium2 --help

CLI documentation: https://pypdfium2.readthedocs.io/en/latest/cli.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF by function:

pdf = pdfium.open_pdf(filename)
# ... work with the PDF
pdfium.close_pdf(pdf)

Open a PDF by context manager:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the PDF
    pass

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = 0xFFFFFFFF,
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()
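
To pick a value for scale, it can help to estimate the resulting image size. The sketch below assumes that scale = 1 maps one PDF point (1/72 inch) to one pixel, which is how the raw PDFium example in this document sizes its bitmap. The helper names are hypothetical, and the exact rounding inside render_page may differ.

```python
import math

POINTS_PER_INCH = 72  # PDF page sizes are measured in points of 1/72 inch


def rendered_size(width_pt, height_pt, scale=1):
    """Estimate the bitmap size in pixels for a page size given in points."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


def scale_for_dpi(dpi):
    """Scale factor that yields roughly the requested resolution (assumption: scale 1 = 72 dpi)."""
    return dpi / POINTS_PER_INCH
```

Under that assumption, a US Letter page (612 x 792 points) rendered with scale = 2 would come out at about 1224 x 1584 pixels, and scale_for_dpi(300) gives a factor slightly above 4.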

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')
    image.close()

Read the table of contents:

with pdfium.PdfContext(filename) as pdf:
    toc = pdfium.get_toc(pdf)
    pdfium.print_toc(toc)

Support model documentation: https://pypdfium2.readthedocs.io/en/latest/support_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page   = pdfium.FPDF_LoadPage(doc, 0)
pdfium.FORM_OnAfterLoadPage(page, form_fill)

width  = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can be minor differences between Foxit and open-source PDFium. If in doubt, consult the inline documentation in the PDFium source code.
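
The buffer handling in the example above can be tried out without PDFium. The sketch below uses a small ctypes array as a stand-in for the memory that FPDFBitmap_GetBuffer points to (an assumption made for illustration), reinterprets the pointer as a fixed-size byte array, and swaps the blue and red channels by hand. In the real example, Pillow does this swap through its "BGRA" raw mode.

```python
import ctypes

width, height = 2, 1          # hypothetical tiny bitmap
n_bytes = width * height * 4  # 4 bytes per pixel, stored as B, G, R, A

# Stand-in for PDFium-owned memory: one blue pixel, then one red pixel.
backing = (ctypes.c_ubyte * n_bytes)(255, 0, 0, 255, 0, 0, 255, 255)
void_ptr = ctypes.cast(backing, ctypes.c_void_p)  # untyped pointer to the pixel data

# Reinterpret the untyped pointer as a pointer to an array of n_bytes bytes.
array_ptr = ctypes.cast(void_ptr, ctypes.POINTER(ctypes.c_ubyte * n_bytes))
pixels = bytes(array_ptr.contents)  # copy the data out of the C buffer

# Swap the B and R channel of every pixel to obtain RGBA byte order.
rgba = bytearray(pixels)
rgba[0::4], rgba[2::4] = pixels[2::4], pixels[0::4]
```

Copying the data with bytes() before destroying the bitmap avoids reading memory that has already been freed.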

Licensing

PDFium and PyPDFium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary redistributions.

Documentation and examples of PyPDFium2 are CC-BY-4.0 licensed.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms for which binary wheels are available. Unfortunately, some wheels are not tested. If you have access to a theoretically supported but untested system, please report success or failure in the issue or discussion panel.

(In case bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 can be adapted easily.)

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
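
As an illustration of that naming convention, the sketch below splits a wheel file name of the form used in the install instructions (distribution, version, Python tag, ABI tag, platform tag) into its parts. It is a simplified, hypothetical helper that does not handle the optional build tag.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split a wheel file name without a build tag into its five components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    return WheelName(*parts)
```

For a name such as pypdfium2-0.8.1-py3-none-manylinux_2_17_x86_64.whl, the Python tag py3 and the ABI tag none indicate that the wheel does not depend on a particular interpreter version or ABI, while the platform tag ties it to the bundled PDFium binary.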

PyPDFium2 contains scripts to automate the release process:

  • To build the wheels, run ./release.sh. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run ./clean.sh. This will remove downloaded files and build artifacts.

Testing

Run pytest -sv on the tests directory.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings generator, issues that are not related to packaging most likely need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If the cause of an issue is determined to be in PDFium, the problem needs to be reported at the PDFium bug tracker.

Issues related to build configuration should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ASCII file paths on Windows

On Windows, PDFium is currently not able to open documents whose file names contain multi-byte, non-ASCII characters. This issue is confirmed upstream, but has not been addressed yet.
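
Until this is fixed, one possible workaround (an untested assumption, not a documented feature of PyPDFium2) is to copy the document to a temporary ASCII-only path before opening it. The helper name ascii_safe_path is hypothetical, and the sketch requires Python 3.7 or later for str.isascii().

```python
import shutil
import tempfile
from pathlib import Path


def ascii_safe_path(path):
    """Return (usable_path, temp_dir); temp_dir is None if no copy was needed."""
    path = Path(path)
    if str(path).isascii():
        return path, None
    # Copy the file under a fixed ASCII name inside a fresh temporary directory.
    temp_dir = tempfile.TemporaryDirectory(prefix="pdf_ascii_")
    safe_copy = Path(temp_dir.name) / "document.pdf"
    shutil.copyfile(path, safe_copy)
    return safe_copy, temp_dir
```

The returned path can then be passed to the opening function as a string. When temp_dir is not None, call temp_dir.cleanup() after closing the PDF. Note that the temporary directory itself may contain non-ASCII characters, for example if the Windows user name does, in which case this approach would not help.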

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform-specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.
