pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. For Python setup and runtime dependencies, please refer to setup.cfg. It is recommended to install ctypesgen from the latest sources (git master).

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can try the following:

make build

Depending on the operating system, additional dependencies may need to be installed beforehand.

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3
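
The --scale value controls the output resolution. As far as the pypdfium2 documentation describes it, the scale factor is the number of pixels per PDF canvas unit, and one canvas unit is usually 1/72 inch, so --scale 3 would correspond to roughly 216 DPI. A minimal sketch of that arithmetic follows; the function names and the A4 page size are illustrative assumptions and not part of pypdfium2:

```python
import math

# Assumption: one PDF canvas unit is 1/72 inch (the usual default).
UNITS_PER_INCH = 72


def scale_to_dpi(scale):
    """Resolution in DPI that a given scale factor corresponds to."""
    return scale * UNITS_PER_INCH


def dpi_to_scale(dpi):
    """Scale factor needed to reach a given resolution in DPI."""
    return dpi / UNITS_PER_INCH


def pixel_size(width_units, height_units, scale):
    """Bitmap size in pixels for a page given in canvas units, rounded up."""
    return math.ceil(width_units * scale), math.ceil(height_units * scale)


print(scale_to_dpi(3))
print(pixel_size(595, 842, 3))  # an A4 page is about 595 x 842 units
```

To target a specific resolution instead, dpi_to_scale(300) gives the factor to pass as --scale.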

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for an individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html
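
When the command-line interface is driven from a script, it can help to assemble the argument list in one place. The sketch below only uses the render subcommand and the -o and --scale options shown above; build_render_command and render are hypothetical helper names, not part of pypdfium2:

```python
import shutil
import subprocess


def build_render_command(inputs, output_dir, scale=None):
    """Return the argument list for a `pypdfium2 render` call."""
    if not inputs:
        raise ValueError("at least one input file is required")
    command = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        command += ["--scale", str(scale)]
    return command


def render(inputs, output_dir, scale=None):
    """Run the command, failing early if the executable is not in PATH."""
    if shutil.which("pypdfium2") is None:
        raise FileNotFoundError("pypdfium2 executable not found in PATH")
    subprocess.run(build_render_command(inputs, output_dir, scale), check=True)


# Only build and show the command here; call render() to actually execute it.
print(build_render_command(["doc_1.pdf", "doc_2.pdf"], "output_dir/", scale=3))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.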

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument (supports file paths, bytes, and byte buffers):

pdf = pdfium.PdfDocument(filepath)
print(pdf)
# Work with the helper class
print(pdf.raw)
# Work with the raw PDFium object handle
pdf.close()
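
Because the document has to be closed explicitly, an exception raised between opening and closing would skip the close() call. One way to guard against that is contextlib.closing from the standard library, which works with any object that has a close() method. The sketch below demonstrates the pattern with a stand-in class (FakeDocument is hypothetical, used so the example runs without a PDF file):

```python
from contextlib import closing


class FakeDocument:
    """Stand-in for an object that must be closed, such as a PDF document."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


doc = FakeDocument()
try:
    with closing(doc) as pdf:
        # Any error raised while working with the document ...
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# ... still results in close() having been called.
print(doc.closed)  # True
```

With pypdfium2, the same pattern would presumably read `with closing(pdfium.PdfDocument(filepath)) as pdf:`.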

Render a single page:

pdf = pdfium.PdfDocument(filepath)
page = pdf.get_page(0)

pil_image = page.render_topil(
    scale = 1,
    rotation = 0,
    crop = (0, 0, 0, 0),
    colour = (255, 255, 255, 255),
    annotations = True,
    greyscale = False,
    optimise_mode = pdfium.OptimiseMode.NONE,
)
pil_image.save("out.png")

for g in (pil_image, page, pdf):
    g.close()

Render multiple pages concurrently:

pdf = pdfium.PdfDocument(filepath)

n_pages = len(pdf)
n_digits = len(str(n_pages))
page_indices = list(range(n_pages))
renderer = pdf.render_topil(
    page_indices = page_indices,
)

for image, index in zip(renderer, page_indices):
    # Zero-pad the index to the width of the page count so files sort in page order
    image.save("out_%s.jpg" % str(index).zfill(n_digits))
    image.close()

pdf.close()
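
When many pages are written to disk, zero-padding the page index to the number of digits in the page count keeps the files in page order under a plain lexicographic sort. A minimal sketch of such a naming scheme; page_filename is a hypothetical helper, not part of pypdfium2:

```python
def page_filename(index, n_pages, prefix="out_", suffix=".jpg"):
    """Build a file name whose index is zero-padded to the page count width."""
    n_digits = len(str(n_pages))
    return "%s%0*d%s" % (prefix, n_digits, index, suffix)


names = [page_filename(i, 12) for i in range(12)]
print(names[0], names[11])  # out_00.jpg out_11.jpg

# Lexicographic order now matches page order.
assert names == sorted(names)
```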

Read the table of contents:

pdf = pdfium.PdfDocument(filepath)

for item in pdf.get_toc():
    print(
        "    " * item.level +
        "[%s] " % ("-" if item.is_closed else "+") +
        "%s -> %s  # %s %s" % (
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )

pdf.close()
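
The formatting logic of the loop above can be tried without a PDF file. The sketch below uses a hypothetical TocItem named tuple that carries only some of the attributes used above (title, level, is_closed, page_index); it is a stand-in for illustration, not the class returned by get_toc():

```python
from collections import namedtuple

# Hypothetical stand-in with a subset of the attributes of a TOC entry.
TocItem = namedtuple("TocItem", "title level is_closed page_index")


def format_toc_line(item, indent="    "):
    """Indent by nesting level, mark closed/open state, show 1-based page."""
    marker = "-" if item.is_closed else "+"
    return "%s[%s] %s -> %s" % (
        indent * item.level,
        marker,
        item.title,
        item.page_index + 1,
    )


items = [
    TocItem("Introduction", 0, False, 0),
    TocItem("Background", 1, True, 1),
]
for item in items:
    print(format_toc_line(item))
```

Note that page_index is zero-based, which is why 1 is added for display.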

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
import os.path
from PIL import Image
import pypdfium2 as pdfium

filepath = os.path.abspath("tests/resources/render.pdf")

doc = pdfium.FPDF_LoadDocument(filepath, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
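
The buffer handling in the example above relies on two ideas: reinterpreting an untyped pointer as a fixed-size byte array (width * height * 4 bytes), and reading the pixels in BGRA channel order. The following sketch reproduces both steps with the standard library only. The two-pixel backing array stands in for memory that PDFium would own, and bgra_to_rgba is a hypothetical helper that does by hand what Pillow's raw decoder does in the example:

```python
import ctypes


def bgra_to_rgba(data):
    """Swap the blue and red channels of tightly packed 4-byte pixels."""
    out = bytearray(data)
    out[0::4] = data[2::4]  # red goes to the first byte of each pixel
    out[2::4] = data[0::4]  # blue goes to the third byte of each pixel
    return bytes(out)


width, height = 2, 1
n_bytes = width * height * 4

# Stand-in for library-owned memory: two BGRA pixels, pure blue then pure red.
backing = (ctypes.c_ubyte * n_bytes)(255, 0, 0, 255, 0, 0, 255, 255)

# An untyped pointer, similar to what a C function returning void* yields.
void_ptr = ctypes.cast(backing, ctypes.c_void_p)

# Reinterpret it as a pointer to an array of exactly n_bytes unsigned bytes.
array_ptr = ctypes.cast(void_ptr, ctypes.POINTER(ctypes.c_ubyte * n_bytes))

raw = bytes(array_ptr.contents)  # copies the bytes out of the buffer
rgba = bgra_to_rgba(raw)
print(list(rgba))  # [0, 0, 255, 255, 255, 0, 0, 255]
```

Copying the bytes out before the owning object is released matters in real code as well: once the bitmap is destroyed, a pointer into its buffer is no longer valid.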

For more examples of using the raw API, take a look at the support model source code.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDFs.
  • Extract-URLs use pypdfium2 to extract URLs from PDF documents.
  • py-pdf/benchmarks compares pypdfium2's text extraction capabilities with other libraries.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
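
As a rough illustration of the naming convention, a wheel file name consists of the distribution name, the version, an optional build tag, and the Python, ABI and platform tags, separated by hyphens; several compatible tags may be joined with dots. The sketch below splits a name along those lines. parse_wheel_filename and WheelName are hypothetical names for this example, and the parser skips the validation a real tool would need:

```python
from collections import namedtuple

WheelName = namedtuple(
    "WheelName",
    "distribution version build python_tags abi_tags platform_tags",
)


def parse_wheel_filename(filename):
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected number of components: %r" % filename)
    # Each tag field may hold several dot-separated alternatives.
    return WheelName(
        distribution, version, build,
        python.split("."), abi.split("."), platform.split("."),
    )


info = parse_wheel_filename(
    "pypdfium2-2.6.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info.python_tags, info.abi_tags)
print(info.platform_tags)
```

The tags py3 and none indicate a wheel that works with any Python 3 interpreter and does not depend on a particular ABI, so only the platform tag differs between builds. For production use, the third-party packaging library provides a more thorough parser.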

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artefacts.

Testing

Run make test.

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
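
A script that depends on pypdfium2 could check for the two affected interpreter releases up front and fail with a clear message. A minimal sketch, assuming only the version numbers stated above; is_affected is a hypothetical helper and does not verify that the interpreter is actually CPython:

```python
import sys

# Interpreter releases named above as incompatible.
AFFECTED_VERSIONS = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=None):
    """Return True if the (major, minor, micro) version is a known-bad one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) in AFFECTED_VERSIONS


if is_affected():
    raise RuntimeError(
        "This Python release is affected by a ctypes regression; "
        "please use a different patch release."
    )
print("Interpreter version is not one of the known-bad releases.")
```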

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.

Download files

Source Distribution

  • pypdfium2-2.6.0.tar.gz (629.5 kB): Source

Built Distributions

  • pypdfium2-2.6.0-py3-none-win_arm64.whl (2.5 MB): Python 3, Windows ARM64
  • pypdfium2-2.6.0-py3-none-win_amd64.whl (2.6 MB): Python 3, Windows x86-64
  • pypdfium2-2.6.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-2.6.0-py3-none-musllinux_1_2_x86_64.whl (2.8 MB): Python 3, musllinux (musl 1.2+) x86-64
  • pypdfium2-2.6.0-py3-none-musllinux_1_2_i686.whl (2.8 MB): Python 3, musllinux (musl 1.2+) i686
  • pypdfium2-2.6.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB): Python 3, manylinux (glibc 2.17+) x86-64
  • pypdfium2-2.6.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (2.8 MB): Python 3, manylinux (glibc 2.17+) i686
  • pypdfium2-2.6.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.5 MB): Python 3, manylinux (glibc 2.17+) ARMv7l
  • pypdfium2-2.6.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.7 MB): Python 3, manylinux (glibc 2.17+) ARM64
  • pypdfium2-2.6.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl (3.0 MB): Python 3, macOS 11.0+ ARM64, macOS 12.0+ ARM64
  • pypdfium2-2.6.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64, macOS 11.0+ x86-64, macOS 12.0+ x86-64
