Python bindings to PDFium

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. For Python setup and runtime dependencies, please refer to setup.cfg. It is recommended to install ctypesgen from the latest sources (git master).

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can try the following:

make build

Depending on the operating system, additional dependencies may need to be installed beforehand.

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 --help. Help for an individual subcommand can be accessed in the same way (pypdfium2 subcommand --help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html
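
If the command-line interface needs to be driven from a Python script, the invocations above can be assembled as argument lists for subprocess. The following is a minimal sketch that relies only on the render subcommand and the -o and --scale options shown above. The helper names build_render_command and render are hypothetical and not part of pypdfium2.

```python
import subprocess


def build_render_command(inputs, output_dir, scale=None):
    """Assemble the argument list for `pypdfium2 render` (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file is required")
    command = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        command += ["--scale", str(scale)]
    return command


def render(inputs, output_dir, scale=None):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_render_command(inputs, output_dir, scale), check=True)
```

Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.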

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument (supports file paths, bytes, and byte buffers):

pdf = pdfium.PdfDocument(filepath)
print(pdf)
# Work with the helper class
print(pdf.raw)
# Work with the raw PDFium object handle
pdf.close()
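
Since the document has to be closed explicitly, it can help to tie close() to a with block so that it also runs when an exception is raised in between. A minimal sketch using contextlib.closing from the standard library; count_pages is a hypothetical helper, and the opener is passed in as an argument (for example pdfium.PdfDocument) so that the sketch works with any object offering close() and len():

```python
from contextlib import closing


def count_pages(open_document, filepath):
    """Return the number of pages, closing the document in all cases.

    open_document is any callable that returns an object with close()
    and __len__(), for example pdfium.PdfDocument.
    """
    with closing(open_document(filepath)) as pdf:
        return len(pdf)
```

With pypdfium2 imported as above, this would be called as count_pages(pdfium.PdfDocument, filepath).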

Render a single page:

pdf = pdfium.PdfDocument(filepath)
page = pdf.get_page(0)

pil_image = page.render_topil(
    scale = 1,
    rotation = 0,
    crop = (0, 0, 0, 0),
    colour = (255, 255, 255, 255),
    annotations = True,
    greyscale = False,
    optimise_mode = pdfium.OptimiseMode.NONE,
)
pil_image.save("out.png")

# Release the image, the page and the document
for g in (pil_image, page, pdf):
    g.close()

Render multiple pages concurrently:

pdf = pdfium.PdfDocument(filepath)

n_pages = len(pdf)
page_indices = list(range(n_pages))
renderer = pdf.render_topil(
    page_indices = page_indices,
)

# Pad all indices to the same width so that the output files sort in page order
n_digits = len(str(n_pages))
for image, index in zip(renderer, page_indices):
    image.save('out_%s.jpg' % str(index).zfill(n_digits))
    image.close()

pdf.close()
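
Output files only sort in page order if every index is padded to the same width. The following sketch shows one way to derive that width from the number of digits in the page count. The helper output_name and its prefix and suffix parameters are hypothetical and not part of pypdfium2.

```python
def output_name(index, n_pages, prefix="out_", suffix=".jpg"):
    """Build a file name whose index is zero-padded to a common width."""
    n_digits = len(str(n_pages))  # e.g. 12 pages -> 2 digits
    return "%s%s%s" % (prefix, str(index).zfill(n_digits), suffix)


names = [output_name(i, 12) for i in (0, 9, 11)]
print(names)  # ['out_00.jpg', 'out_09.jpg', 'out_11.jpg']
```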

Read the table of contents:

pdf = pdfium.PdfDocument(filepath)

for item in pdf.get_toc():
    print(
        "    " * item.level +
        "[%s] " % ("-" if item.is_closed else "+") +
        "%s -> %s  # %s %s" % (
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )

pdf.close()

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
import os.path
from PIL import Image
import pypdfium2 as pdfium

filepath = os.path.abspath("tests/resources/render.pdf")

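# Load the document; the second argument is the password (None if not encrypted)
doc = pdfium.FPDF_LoadDocument(filepath, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

# Set up a form fill environment so that form fields can be drawn
form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

# Load the first page and round its size up to whole pixels
page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

# Create a bitmap and fill it with white
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

# Render the page content, then draw the form fields on top
render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

# View the bitmap's pixel buffer as a byte array (4 bytes per pixel)
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

# Convert the BGRA pixel data to a PIL image and save it
img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

# Release the bitmap and the page
pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

# Exit the form fill environment before closing the document
pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)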

For more examples of using the raw API, take a look at the support model source code.
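
The ctypes.cast step in the rendering example can be tried in isolation, without PDFium. The sketch below uses a small ctypes array as a stand-in for the bitmap memory and assumes, like the example, that rows are stored without padding, so the buffer holds width * height * 4 bytes. All names are illustrative.

```python
import ctypes

width, height = 4, 2
n_bytes = width * height * 4  # four bytes per pixel

# Stand-in for the memory the library would own: every byte set to 0xFF
backing = (ctypes.c_ubyte * n_bytes)(*([0xFF] * n_bytes))
# Stand-in for the untyped pointer a C function would return
cbuffer = ctypes.cast(backing, ctypes.c_void_p)

# Reinterpret the untyped pointer as a pointer to a fixed-size byte array
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * n_bytes))
data = bytes(buffer.contents)  # copies the bytes out of the shared memory

# The cast does not copy: a write to the backing memory is visible through it
backing[0] = 0
first_byte = buffer.contents[0]
```

Because the cast only creates a view, the pixel data has to be copied or consumed before the underlying bitmap is destroyed.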

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDFs.
  • Extract-URLs uses pypdfium2 to extract URLs from PDF documents.
  • py-pdf/benchmarks compares pypdfium2's text extraction capabilities with other libraries.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
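
As a rough illustration of the naming scheme, a wheel file name consists of the distribution name, the version, an optional build tag, and the Python, ABI and platform tags, separated by hyphens. Several platform tags may be joined with dots. The sketch below splits a file name accordingly. parse_wheel_name is a hypothetical helper written for this example; real tooling should prefer an established parser.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its components (illustrative only)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        # A compressed tag set lists several platforms separated by dots
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_name(
    "pypdfium2-2.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
```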

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artefacts.

Testing

Run make test.

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue is determined to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages, however, should be discussed at pdfium-binaries.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
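
An application that depends on pypdfium2 could detect the affected releases at start-up and fail with a clear message. The following is a minimal sketch; the names AFFECTED_VERSIONS and is_affected are hypothetical, and the check does not verify that the interpreter is actually CPython.

```python
import sys

# Releases named in the limitation above
AFFECTED_VERSIONS = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=None):
    """Return True if the given (or running) interpreter version is affected."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) in AFFECTED_VERSIONS


if is_affected():
    raise RuntimeError("pypdfium2 cannot be used with this Python release")
```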

Thanks to

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.

Project details


This version

2.8.0
