Python bindings to PDFium

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

python3 -m pip install -U pypdfium2
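
To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of PyPDFium2.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if pypdfium2 is installed, otherwise None.
print(installed_version("pypdfium2"))
```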

Manual installation

The following steps require the external tools git, ctypesgen and gcc to be installed and available in PATH. Additionally, the Python package wheel is required.

For a source build, additional dependencies may be necessary (see DEPS.txt).

Package locally

This will download a pre-built binary for PDFium, generate the bindings and build a wheel.

python3 update.py -p ${platform_name}
python3 setup_${platform_name}.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Source build

If you are using a platform where no pre-compiled package is available, it might be possible to build PDFium from source. However, this is a complex process that can vary depending on the host system, and it may take a long time.

python3 build.py
python3 setup_source.py bdist_wheel
python3 -m pip install -U dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 -i your_file.pdf -o your_output_dir/ --scale 1 --rotation 0 --optimise-mode none

If you want to render multiple files at once, a bash for-loop may be suitable:

for file in ./*.pdf; do echo "$file" && pypdfium2 -i "$file" -o your_output_dir/ --scale 2; done
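
On systems without bash, the same loop could be written in Python. This is only a sketch: it calls the `pypdfium2` command with the `-i`, `-o` and `--scale` options shown above, and the helper names `build_render_command` and `render_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_render_command(pdf_path, output_dir, scale=2):
    """Build the argument list for one pypdfium2 command-line call."""
    return [
        "pypdfium2",
        "-i", str(pdf_path),
        "-o", str(output_dir),
        "--scale", str(scale),
    ]


def render_all(input_dir, output_dir, scale=2):
    """Run the command-line interface once per PDF found in input_dir."""
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        print(pdf_path)
        # check=True raises CalledProcessError if rendering a file fails
        subprocess.run(build_render_command(pdf_path, output_dir, scale), check=True)
```

Calling `render_all(".", "your_output_dir/")` would then process every PDF in the current directory, stopping at the first file that fails to render.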

Dump the table of contents of a PDF:

pypdfium2 --show-toc -i your_file.pdf

To obtain a full list of possible command-line parameters, run:

pypdfium2 --help

CLI documentation: https://pypdfium2.readthedocs.io/en/latest/cli.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF by function:

pdf = pdfium.open_pdf(filename)
# ... work with the PDF
pdfium.close_pdf(pdf)

Open a PDF by context manager:

with pdfium.PdfContext(filename) as pdf:
    pass  # ... work with the PDF

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = 0xFFFFFFFF,
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')
    image.close()

Read the table of contents:

with pdfium.PdfContext(filename) as pdf:
    toc = pdfium.get_toc(pdf)
    pdfium.print_toc(toc)

Support model documentation: https://pypdfium2.readthedocs.io/en/latest/support_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

doc = pdfium.FPDF_LoadDocument(filename, None) # load document (filename, password string)
page_count = pdfium.FPDF_GetPageCount(doc)     # get page count
assert page_count >= 1

page   = pdfium.FPDF_LoadPage(doc, 0)                # load the first page
width  = math.ceil(pdfium.FPDF_GetPageWidthF(page))  # get page width
height = math.ceil(pdfium.FPDF_GetPageHeightF(page)) # get page height

# render to bitmap
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)
pdfium.FPDF_RenderPageBitmap(
    bitmap, page, 0, 0, width, height, 0, 
    pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT
)

# retrieve data from bitmap
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDF_CloseDocument(doc)
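
The example above uses the page size in PDF points (1/72 inch) directly as the bitmap size, so one point becomes one pixel. To render at a different resolution, the pixel dimensions passed to FPDFBitmap_Create and FPDF_RenderPageBitmap would have to be scaled accordingly. The following sketch shows only the arithmetic; the helper names `pixel_size` and `buffer_size` are hypothetical and not part of PyPDFium2 or PDFium.

```python
import math

POINTS_PER_INCH = 72  # PDF page sizes are measured in points (1/72 inch)


def pixel_size(width_pt, height_pt, dpi=72):
    """Convert a page size in points to a bitmap size in pixels at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


def buffer_size(width_px, height_px, bytes_per_pixel=4):
    """Number of bytes in a tightly packed bitmap (4 bytes per pixel, as in the example above)."""
    return width_px * height_px * bytes_per_pixel


# A US Letter page (612 x 792 points) rendered at 144 DPI
width, height = pixel_size(612, 792, dpi=144)
print(width, height, buffer_size(width, height))
```

The resulting `width` and `height` would replace the unscaled values in the calls above, including the size of the ctypes array used to read the buffer.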

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and PyPDFium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Documentation and examples are CC-BY-4.0.

Various other BSD- and MIT-style licenses apply to the dependencies of PDFium.

License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary re-distributions.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Currently supported architectures:

  • macOS x86_64 *
  • macOS arm64 *
  • Linux x86_64
  • Linux aarch64 (64-bit ARM) *
  • Linux armv7l (32-bit ARM hard-float, e.g. Raspberry Pi 2)
  • Windows 64-bit
  • Windows 32-bit *

* Not tested yet

If you have access to a theoretically supported but untested system, please report success or failure on the issues panel.

(If bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 can be adapted easily.)

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs.
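
As an illustration of the naming scheme, a wheel file name can be split into its components. This is a simplified sketch of the format described in the packaging specification (distribution, version, optional build tag, Python tag, ABI tag, platform tag); the names `WheelName` and `parse_wheel_name` are hypothetical, and the sketch does not validate the individual fields.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


print(parse_wheel_name("pypdfium2-0.5.0-py3-none-manylinux_2_17_x86_64.whl"))
```

Here `py3` and `none` indicate that the wheel does not depend on a specific Python 3 version or ABI, while the platform tag ties it to the bundled PDFium binary.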

PyPDFium2 contains scripts to automate the release process:

  • To build wheels for all platforms, run ./release.sh. This will download binaries and header files, write finished Python wheels to dist/, and run check-wheel-contents.
  • To clean up after a release, run ./clean.sh. This will remove downloaded files and build artifacts.

Testing

Run pytest -sv on the tests directory.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging most likely need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker.

Issues related to build configuration should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ASCII file paths on Windows

On Windows, PDFium is currently unable to open documents whose file names contain multi-byte, non-ASCII characters. This issue is confirmed upstream, but has not been addressed yet.
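
Until this is fixed, callers may want to detect affected paths before handing them to PDFium. The following sketch checks the whole path string rather than only the file name, which is a conservative assumption; the helper names `has_non_ascii` and `may_fail_on_windows` are hypothetical and not part of PyPDFium2.

```python
import sys


def has_non_ascii(path) -> bool:
    """Return True if the path contains any character outside the ASCII range."""
    return not str(path).isascii()  # str.isascii() requires Python 3.7+


def may_fail_on_windows(path, platform=sys.platform) -> bool:
    """Return True if the path is likely affected by the limitation described above."""
    return platform == "win32" and has_non_ascii(path)
```

A possible workaround, not verified here, would be to copy such a file to a temporary ASCII-only path before opening it.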

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform-specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.
