Python bindings to PDFium

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the permissively licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

python3 -m pip install -U pypdfium2

Manual installation

The following steps require the external tools git, ctypesgen and gcc to be installed and available in PATH. Additionally, the Python package wheel is required.

For a source build, additional dependencies may be necessary (see DEPS.txt).

Package locally

This will download a pre-built binary for PDFium, generate the bindings and build a wheel.

python3 update_pdfium.py -p ${platform_name}
python3 setup_${platform_name}.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl
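The wheel file name depends on the version and platform tag, so it can be convenient to look it up instead of typing it. The following is a minimal sketch, not part of PyPDFium2: the helper name newest_wheel is hypothetical, and it only assumes that the build step wrote its wheel into dist/ as shown above.

```python
from pathlib import Path


def newest_wheel(dist_dir="dist", distribution="pypdfium2"):
    """Return the most recently modified wheel of `distribution` in `dist_dir`."""
    wheels = list(Path(dist_dir).glob(f"{distribution}-*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no {distribution} wheel found in {dist_dir}")
    # Pick the wheel that was written last (highest modification time)
    return max(wheels, key=lambda path: path.stat().st_mtime)
```

The returned path can then be passed to python3 -m pip install -U in place of the hand-written file name.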

Source build

If you are using a platform where no pre-compiled package is available, it might be possible to build PDFium from source. However, this is a complex process that can vary depending on the host system, and it may take a long time.

python3 build_pdfium.py
python3 setup_source.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 -i your_file.pdf -o your_output_dir/ --scale 1 --rotation 0 --optimise-mode none

If you want to render multiple files at once, a bash for-loop may be suitable:

for file in ./*.pdf; do echo "$file" && pypdfium2 -i "$file" -o your_output_dir/ --scale 2; done
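On systems without bash, the same loop can be sketched in Python. This is an illustration only: the helper names build_command and render_all are hypothetical, and the sketch uses just the -i, -o and --scale options shown above.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, output_dir, scale=2):
    """Assemble the pypdfium2 command line for a single input file."""
    return ["pypdfium2", "-i", str(pdf_path), "-o", str(output_dir), "--scale", str(scale)]


def render_all(input_dir, output_dir, scale=2):
    """Run the pypdfium2 CLI once for every PDF file in `input_dir`."""
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        print(pdf_path)
        # check=True raises CalledProcessError if the CLI exits with an error
        subprocess.run(build_command(pdf_path, output_dir, scale), check=True)
```

Calling render_all(".", "your_output_dir/") would correspond to the bash loop above.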

Dump the table of contents of a PDF:

pypdfium2 --show-toc -i your_file.pdf

To obtain a full list of possible command-line parameters, run

pypdfium2 --help

CLI documentation: https://pypdfium2.readthedocs.io/en/latest/cli.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF by function:

pdf = pdfium.open_pdf(filename)
# ... work with the PDF
pdfium.close_pdf(pdf)

Open a PDF by context manager:

with pdfium.PdfContext(filename) as pdf:
    ...  # work with the PDF

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = 0xFFFFFFFF,
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()
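To pick a value for scale, it may help to think in terms of resolution. PDF page sizes are measured in points (1/72 inch). The sketch below assumes that scale is a plain multiplier on the page size in points, so that scale = 1 yields one pixel per point; this is an assumption, and the helper names are hypothetical.

```python
import math

POINTS_PER_INCH = 72  # PDF unit: 1 point = 1/72 inch


def scale_for_dpi(dpi):
    """Scale factor giving `dpi` pixels per inch (assumes scale 1 == 72 dpi)."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, scale):
    """Bitmap size in pixels for a page of the given size in points."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)
```

For instance, under this assumption a US Letter page (612 x 792 points) rendered with scale_for_dpi(144) would come out at 1224 x 1584 pixels.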

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')
    image.close()
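If the images should go into a separate directory, the loop above can be wrapped in a small helper. This is a sketch under the assumption that the iterable yields (image, suffix) pairs as in the example; the name save_pages is hypothetical and not part of PyPDFium2.

```python
from pathlib import Path


def save_pages(pages, output_dir, prefix="out"):
    """Save every (image, suffix) pair as PNG in `output_dir`; return the paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image, suffix in pages:
        target = output_dir / f"{prefix}_{suffix}.png"
        try:
            image.save(target)
        finally:
            # Close the image even if saving failed
            image.close()
        written.append(target)
    return written
```

Usage would then look like save_pages(pdfium.render_pdf(filename), "your_output_dir").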

Read the table of contents:

with pdfium.PdfContext(filename) as pdf:
    toc = pdfium.get_toc(pdf)
    pdfium.print_toc(toc)

Support model documentation: https://pypdfium2.readthedocs.io/en/latest/support_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

# Load the document (second argument: password, None if there is none)
doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

# Set up a form fill environment so that form fields can be drawn
form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page   = pdfium.FPDF_LoadPage(doc, 0)
pdfium.FORM_OnAfterLoadPage(page, form_fill)

# Page size is given in points (1/72 inch); here, one point maps to one pixel
width  = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

# Create a bitmap without alpha channel (third argument) and fill it with white
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

# Arguments: bitmap, page, start_x, start_y, size_x, size_y, rotation, flags
render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

# Reinterpret the bitmap memory as a byte array with 4 bytes per pixel
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

# PDFium stores pixels in BGRA order; let PIL convert them to RGBA
img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

# Release resources in reverse order of creation
pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
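The ctypes.cast step in the example above may be the least obvious part. The following stand-alone sketch reproduces it without PDFium: a hand-made byte array stands in for the bitmap memory, and the channel swap that PIL performs via the "BGRA" raw mode is written out explicitly. The helper bgra_to_rgba is hypothetical and only meant as an illustration.

```python
import ctypes

width, height = 2, 1
n_bytes = width * height * 4  # 4 bytes per pixel

# Stand-in for the bitmap memory: two BGRA pixels (pure blue, then pure red)
pixels = (ctypes.c_ubyte * n_bytes)(255, 0, 0, 255, 0, 0, 255, 255)

# Emulate an untyped pointer to the first byte of the bitmap memory
cbuffer = ctypes.cast(pixels, ctypes.c_void_p)

# Reinterpret the pointer as a pointer to a fixed-size byte array
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * n_bytes))
data = bytes(buffer.contents)


def bgra_to_rgba(data):
    """Swap the blue and red channel of every 4-byte pixel."""
    out = bytearray(data)
    out[0::4], out[2::4] = data[2::4], data[0::4]
    return bytes(out)


rgba = bgra_to_rgba(data)
```

The array length passed to ctypes.POINTER must match the real buffer size, which is why the example computes it as width * height * 4.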

For more examples of using the raw API, take a look at the support model source code and the examples directory.

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and PyPDFium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary redistributions.

Documentation and examples of PyPDFium2 are CC-BY-4.0 licensed.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

(In case bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 can be adapted easily.)

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
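As a quick illustration of the naming convention, a wheel file name consists of distribution, version, an optional build tag, Python tag, ABI tag and platform tag, separated by hyphens. The parser below is a simplified sketch for illustration (the names WheelName and parse_wheel_name are hypothetical); for real use, a packaging library is the safer choice.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)
```

For example, parse_wheel_name("pypdfium2-0.10.0-py3-none-win_amd64.whl") yields the Python tag py3, the ABI tag none and the platform tag win_amd64.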

PyPDFium2 contains scripts to automate the release process:

  • To build the wheels, run make release. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artifacts.

Testing

Run pytest on the tests directory.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging most likely need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker.

Issues related to build configuration should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ASCII file paths on Windows

On Windows, PDFium is currently unable to open documents whose file names contain multi-byte, non-ASCII characters. This issue is confirmed upstream, but has not been addressed yet.
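Until this is fixed, one possible workaround is to copy an affected document to a path that contains only ASCII characters before opening it. The sketch below is an assumption about how this could be done, not an official recommendation; the helper name ascii_safe_copy is hypothetical, and the temporary directory itself may still contain non-ASCII characters (for example, in a Windows user name).

```python
import shutil
import tempfile
from pathlib import Path


def ascii_safe_copy(path):
    """Return `path` if it is pure ASCII, else a copy under an ASCII file name."""
    path = Path(path)
    if str(path).isascii():
        return path
    # Copy the file into a fresh temporary directory under a fixed ASCII name
    tmp_dir = Path(tempfile.mkdtemp(prefix="pdf_ascii_"))
    target = tmp_dir / "document.pdf"
    shutil.copyfile(path, target)
    return target
```

The caller is responsible for removing the temporary copy once the document has been closed.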

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.

Download files

Source Distribution

  • pypdfium2-0.10.0.tar.gz (323.0 kB): Source

Built Distributions

  • pypdfium2-0.10.0-py3-none-win_arm64.whl (2.4 MB): Python 3, Windows ARM64
  • pypdfium2-0.10.0-py3-none-win_amd64.whl (2.5 MB): Python 3, Windows x86-64
  • pypdfium2-0.10.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-0.10.0-py3-none-manylinux_2_17_x86_64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-0.10.0-py3-none-manylinux_2_17_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-0.10.0-py3-none-manylinux_2_17_aarch64.whl (2.6 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-0.10.0-py3-none-macosx_11_0_arm64.whl (2.6 MB): Python 3, macOS 11.0+ ARM64
  • pypdfium2-0.10.0-py3-none-macosx_10_11_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64
