Skip to main content

Python bindings to PDFium

Project description

PyPDFium2

PyPDFium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, ctypesgen and wheel are needed. Furthermore, it is essential that you provide a recent enough version of pip (>= 21.3). For more information, please refer to dependencies.md.

Package locally

To get pre-compiled binaries, generate bindings and install PyPDFium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can do the following:

make build
pip3 install dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

${version} and ${platform_tag} are placeholders that need to be replaced with the values that correspond to your platform (e. g. pypdfium2-0.11.0-py3-none-linux.whl).

In case building failed, you could try

python3 platform_setup/build_pdfium.py -p
python3 platform_setup/setup_source.py bdist_wheel
pip3 install dist/pypdfium2-${version}-py3-none-${platform_tag}.whl

to prefer the use of system-provided build tools over the toolchain PDFium ships with. The problem is that the toolchain is limited to a curated set of platforms, as PDFium target cross-compilation for "non-standard" architectures. (Make sure you installed all packages from the Native Build section of dependencies.md, in addition to the default requirements.)

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Individual help for each subcommand is available can be accessed in the same way (pypdfium any_subcommand help)

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/cli.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument:

doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()

Open a PDF using the context manager PdfContext:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the pdf

Open a PDF using the function open_pdf_auto():

pdf, loader_data = pdfium.open_pdf_auto(filename)
# ... work with the pdf
pdfium.close_pdf(pdf, loader_data)

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = 0xFFFFFFFF,
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')
    image.close()

Read the table of contents:

doc = pdfium.PdfDocument(filepath)
for item in doc.get_toc():
    print(
        '    ' * item.level +
        "{} -> {}  # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/support_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page   = pdfium.FPDF_LoadPage(doc, 0)
pdfium.FORM_OnAfterLoadPage(page, form_fill)

width  = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)

For more examples of using the raw API, take a look at the support model source code and the examples directory.

Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and PyPDFium2 are available by the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary redistributions.

Documentation and examples of PyPDFium2 are CC-BY-4.0 licensed.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

(In case bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 can be adapted easily.)

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.

PyPDFium2 contains scripts to automate the release process:

  • To build the wheels, run make release. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artifacts.

Testing

Run pytest on the tests directory.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If the cause of an issue could be determined to be in PDFium, the problem needs to be reported at the PDFium bug tracker.

Issues related to pre-compiled binaries should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Non-ascii file paths on Windows

On Windows, the FPDF_LoadDocument() method of PDFium currently is not able to open documents with file paths containing multi-byte, non-ascii characters (see Bug 682). The widestring branch includes a patch that would fix the issue in PDFium, but upstream has not merged it yet.

The support model of PyPDFium2 implements a workaround using FPDF_LoadCustomDocument() to be able to process non-ascii filepaths on Windows anyway.

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

PyPDFium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pypdfium2-0.13.1.tar.gz (350.2 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pypdfium2-0.13.1-py3-none-win_arm64.whl (2.4 MB view details)

Uploaded Python 3Windows ARM64

pypdfium2-0.13.1-py3-none-win_amd64.whl (2.6 MB view details)

Uploaded Python 3Windows x86-64

pypdfium2-0.13.1-py3-none-win32.whl (2.5 MB view details)

Uploaded Python 3Windows x86

pypdfium2-0.13.1-py3-none-manylinux_2_17_x86_64.whl (2.7 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

pypdfium2-0.13.1-py3-none-manylinux_2_17_armv7l.whl (2.5 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARMv7l

pypdfium2-0.13.1-py3-none-manylinux_2_17_aarch64.whl (2.6 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

pypdfium2-0.13.1-py3-none-macosx_11_0_arm64.whl (2.6 MB view details)

Uploaded Python 3macOS 11.0+ ARM64

pypdfium2-0.13.1-py3-none-macosx_10_11_x86_64.whl (2.8 MB view details)

Uploaded Python 3macOS 10.11+ x86-64

File details

Details for the file pypdfium2-0.13.1.tar.gz.

File metadata

  • Download URL: pypdfium2-0.13.1.tar.gz
  • Upload date:
  • Size: 350.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1.tar.gz
Algorithm Hash digest
SHA256 9408085c9957bff53ca10e626087c8cf882780b3a815f7e2e9f81d51c7eeeae5
MD5 bc939c86f0df436f5cae866565a9083a
BLAKE2b-256 47b0b8686faeeb2adde5a388cb59c0a36b13b07de8a5fd154df05a656e4724ca

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-win_arm64.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-win_arm64.whl
  • Upload date:
  • Size: 2.4 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 efedb4015f4e1fa7809a2ae0e43b6b87a3817af71ce153ac196397c75dc0a1a9
MD5 20cff7f91a77a775731188206c9cc4c3
BLAKE2b-256 e214b8593720dc4a5d204ecaa5f819f4fa97728815e9a648aa0d80910d1edb0a

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-win_amd64.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-win_amd64.whl
  • Upload date:
  • Size: 2.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 f7aac20fc1a6b54ede5a77d3b97c6ecc68d1bf8a4aeacbc5c2b4cca1196ba1a6
MD5 1f71feb26787833f9e0613c3fb319a9c
BLAKE2b-256 e3f66b929526c444364c9f6df4c9e23812f7a30adedffdce7684efadcba82ac8

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-win32.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-win32.whl
  • Upload date:
  • Size: 2.5 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-win32.whl
Algorithm Hash digest
SHA256 208f126e22c6a81be5a11ffecdbc91f86cef60a641f4e18af898a18b3efa33e4
MD5 11dcd6d43f101db75d07c56dbca8dcd5
BLAKE2b-256 7bf019ecb6c149a1286042681ed7b7b33bf4dd8077f0061b3ef94df0b4b13c7e

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-manylinux_2_17_x86_64.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-manylinux_2_17_x86_64.whl
  • Upload date:
  • Size: 2.7 MB
  • Tags: Python 3, manylinux: glibc 2.17+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 2a5ed0750f9cb338f2d14e52530f265eead3fd8696cd7d746cc6954f5b2d5bcc
MD5 48c72d15a38360668f044884694d32c6
BLAKE2b-256 738b1d79101afb9c5d4cb48c695bc299b63aa92bfe61a169cf594aafa94959e9

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-manylinux_2_17_armv7l.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-manylinux_2_17_armv7l.whl
  • Upload date:
  • Size: 2.5 MB
  • Tags: Python 3, manylinux: glibc 2.17+ ARMv7l
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-manylinux_2_17_armv7l.whl
Algorithm Hash digest
SHA256 4d02841fd7c3c0be885351aa9a12be92266be99596594e474b31bd4cdfc719bf
MD5 e5b26251046b7583289c96b1b33ae859
BLAKE2b-256 240f0bf8b9cd5e8e8436d1bfbd9837344e3ae3ec064f0438b03a5b1b5a26a22b

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-manylinux_2_17_aarch64.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-manylinux_2_17_aarch64.whl
  • Upload date:
  • Size: 2.6 MB
  • Tags: Python 3, manylinux: glibc 2.17+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-manylinux_2_17_aarch64.whl
Algorithm Hash digest
SHA256 383c441aec337288ae6ab254c0c5a43535ce0d030c8b667f65e86fd04940b5ba
MD5 d1422e7b9a76cfdbce11f2d4afea5a4d
BLAKE2b-256 12bf5990d0a9dcd46d3f94663e03dbd40ea5660dccceff4f48509567a79e4dd8

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-macosx_11_0_arm64.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 2.6 MB
  • Tags: Python 3, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7c1f907f5fac7bd68a876ff2bbd69fbab82bdf3fec30e076447f38761f36b59d
MD5 196da4d43b9836897c6e33623f3a9d20
BLAKE2b-256 c352a21e786a9c7ac44d1578180a6e4b2cc90bfe8826fcb5ee7a143773eb271c

See more details on using hashes here.

File details

Details for the file pypdfium2-0.13.1-py3-none-macosx_10_11_x86_64.whl.

File metadata

  • Download URL: pypdfium2-0.13.1-py3-none-macosx_10_11_x86_64.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: Python 3, macOS 10.11+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-0.13.1-py3-none-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 41ba9af81ca17d1a3b77a66d209e35557219bd2c217f1046d5bfe12689712ec8
MD5 87333c6989b93b1b3268b123335308e8
BLAKE2b-256 d247a7bbe6f49fa548048ab2f28398873cf97d796e137fc13cdf9232fd9dca4b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page