Python bindings to PDFium

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, ctypesgen and wheel are needed. Furthermore, it is essential that you provide a recent enough version of pip (>= 21.3). For more information on dependencies, please refer to dependencies.md.
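The pip requirement above can be checked programmatically before starting a manual installation. The following is a minimal sketch, not part of PyPDFium2; the helper names version_tuple and pip_is_recent_enough are hypothetical, and only the leading numeric part of each version component is compared.

```python
import re

MIN_PIP = (21, 3)  # minimum pip version stated in the requirements above


def version_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def pip_is_recent_enough(version, minimum=MIN_PIP):
    """Tell whether `version` is at least `minimum` (hypothetical helper)."""
    return version_tuple(version) >= minimum
```

The version string could be taken from pip.__version__ or from the output of pip3 --version.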

Package locally

To get pre-compiled binaries, generate bindings and install PyPDFium2, you may run

pip3 install . -v

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can do the following:

python3 build_pdfium.py --getdeps
python3 setup_source.py bdist_wheel
pip3 install dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

${version} and ${platform_tag} are placeholders that need to be replaced with the values that correspond to your platform (e.g. pypdfium2-0.11.0-py3-none-linux.whl).
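Instead of filling in the placeholders by hand, the wheel that was just built can be looked up in dist/. This is a small sketch under the assumption that the build wrote its output to dist/ as shown above; find_wheels is a hypothetical helper, not something PyPDFium2 provides.

```python
from pathlib import Path


def find_wheels(dist_dir="dist", project="pypdfium2"):
    """Return the wheel files for `project` found in `dist_dir`, sorted by name."""
    return sorted(Path(dist_dir).glob(f"{project}-*.whl"))
```

Each path returned by find_wheels() can then be passed to pip3 install.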

If the build fails, you can try python3 build_pdfium.py --getdeps -p to prefer system-provided build tools over the toolchain PDFium ships with. This may help because the bundled toolchain is limited to a curated set of platforms, as PDFium targets cross-compilation for "non-standard" architectures. (Make sure you have installed all packages from the Native Build section of dependencies.md, in addition to the default requirements.)

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 2 --optimise-mode none

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 --help. Help for an individual subcommand can be accessed in the same way (pypdfium2 any_subcommand --help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/cli.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF by function:

pdf, loader_data = pdfium.open_pdf_auto(filename)
# ... work with the PDF
pdfium.close_pdf(pdf, loader_data)

Open a PDF by context manager:

with pdfium.PdfContext(filename) as pdf:
    ...  # work with the PDF

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = 0xFFFFFFFF,
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()
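When choosing a value for scale, it can help to think in terms of output resolution. The sketch below assumes that scale = 1 renders one PDF point (1/72 inch) to one pixel, which matches the raw API example where the bitmap size is taken directly from the page size in points. The helpers scale_for_dpi and pixel_size are hypothetical and not part of the support model.

```python
import math

POINTS_PER_INCH = 72  # size of a PDF point: 1/72 inch


def scale_for_dpi(dpi):
    """Scale factor that yields `dpi` pixels per inch (assumes scale 1 == 72 dpi)."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, scale):
    """Bitmap size in pixels for a page of the given size in points, rounded up."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)
```

For instance, scale_for_dpi(144) gives the factor 2 used in the command-line example above.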

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')
    image.close()

Read the table of contents:

with pdfium.PdfContext(filename) as pdf:
    toc = pdfium.get_toc(pdf)
    pdfium.print_toc(toc)

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/support_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page   = pdfium.FPDF_LoadPage(doc, 0)
pdfium.FORM_OnAfterLoadPage(page, form_fill)

width  = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
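Two numeric details of the example above are easy to get wrong: the fill colour passed to FPDFBitmap_FillRect and the size of the pixel buffer. The following sketch assumes that the colour is a 32-bit value in ARGB order (as the PDFium headers describe it) and that each pixel occupies 4 bytes with no row padding, as in the example. Both helper names are hypothetical.

```python
def pack_argb(red, green, blue, alpha=255):
    """Pack four 8-bit channels into one 32-bit 0xAARRGGBB integer."""
    for channel in (red, green, blue, alpha):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value out of range: {channel}")
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def bgra_buffer_size(width, height):
    """Number of bytes in a 4-bytes-per-pixel bitmap without row padding."""
    return width * height * 4
```

pack_argb(255, 255, 255) yields 0xFFFFFFFF, the opaque white used to fill the bitmap above.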

For more examples of using the raw API, take a look at the support model source code and the examples directory.

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and PyPDFium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary redistributions.

Documentation and examples of PyPDFium2 are CC-BY-4.0 licensed.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

(In case bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 can be adapted easily.)

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
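As a quick illustration of the naming convention, a wheel file name can be split into its components. This is a simplified sketch of the format described by the wheel specification (distribution, version, optional build tag, Python tag, ABI tag, platform tag); parse_wheel_name is a hypothetical helper and does not validate the individual tags.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None  # the build tag is optional
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)
```

Applied to a PyPDFium2 wheel, the tags py3 and none indicate that the package works with any Python 3 interpreter and any ABI, while the platform tag identifies the bundled PDFium binary.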

PyPDFium2 contains scripts to automate the release process:

  • To build the wheels, run make release. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artifacts.

Testing

Run pytest on the tests directory.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker.

Issues related to pre-compiled binaries should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ascii file paths on Windows

On Windows, the FPDF_LoadDocument() method of PDFium currently is not able to open documents with file paths containing multi-byte, non-ascii characters (see Bug 682). The widestring branch includes a patch that would fix the issue in PDFium, but upstream has not merged it yet.

The support model of PyPDFium2 implements a workaround using FPDF_LoadCustomDocument() to be able to process non-ascii filepaths on Windows anyway.
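For code that uses the raw API directly, a check along the following lines could decide whether a path is affected by this limitation. It is only an illustrative sketch and not the support model's actual implementation; needs_custom_loader is a hypothetical name, and the platform argument is expected to be a value like sys.platform.

```python
def needs_custom_loader(path, platform):
    """Tell whether `path` contains non-ASCII characters on a Windows platform."""
    return platform.startswith("win") and not path.isascii()
```

A caller would pass sys.platform as the second argument and fall back to the support model (or its own FPDF_LoadCustomDocument() handling) when the function returns True.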

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.
