Python bindings to PDFium

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2
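To confirm which version ended up installed, the package metadata can be queried from the standard library without importing pypdfium2 itself. This is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if pypdfium2 is not installed
print(installed_version("pypdfium2"))
```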

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. For Python setup and runtime dependencies, please refer to setup.cfg. It is recommended to install ctypesgen from the latest sources (git master).

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available, you can try the following:

make build

Depending on the operating system, additional dependencies may need to be installed beforehand.

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for an individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html
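When the command-line interface is driven from a script, it can help to assemble the argument list in one place. The sketch below only uses the subcommand and options shown above (render, -o, --scale); the function names are hypothetical, and it assumes the pypdfium2 executable is available in PATH.

```python
import subprocess


def build_render_command(inputs, output_dir, scale=None):
    """Assemble the argument list for `pypdfium2 render`."""
    cmd = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        cmd += ["--scale", str(scale)]
    return cmd


def render(inputs, output_dir, scale=None):
    """Run the command and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_render_command(inputs, output_dir, scale), check=True)


print(build_render_command(["doc_1.pdf", "doc_2.pdf"], "output_dir/", scale=3))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.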

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument (supports file paths, bytes, and byte buffers):

pdf = pdfium.PdfDocument(filepath)
print(pdf)
# Work with the helper class
print(pdf.raw)
# Work with the raw PDFium object handle
pdf.close()
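If an exception is raised between opening and closing, the explicit close() call above is skipped. One way to guard against that is contextlib.closing, which works with any object that has a close() method. This is a sketch rather than a documented pypdfium2 pattern; count_pages is a hypothetical helper, and the import is placed inside the function only to keep the example self-contained.

```python
from contextlib import closing


def count_pages(filepath):
    """Open a PDF, return its page count, and close it even if an error occurs."""
    # Imported lazily so that defining the function does not require the package
    import pypdfium2 as pdfium

    with closing(pdfium.PdfDocument(filepath)) as pdf:
        return len(pdf)
```

Calling count_pages("document.pdf") then returns the number of pages and leaves no open document handle behind.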

Render a single page:

pdf = pdfium.PdfDocument(filepath)
page = pdf.get_page(0)

pil_image = page.render_topil(
    scale = 1,
    rotation = 0,
    crop = (0, 0, 0, 0),
    colour = (255, 255, 255, 255),
    annotations = True,
    greyscale = False,
    optimise_mode = pdfium.OptimiseMode.NONE,
)
pil_image.save("out.png")

for g in (pil_image, page, pdf):
    g.close()
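The scale parameter is easier to choose when thought of as a resolution. The sketch below assumes that scale = 1 renders one PDF point (1/72 inch) to one pixel, so that a target DPI translates to dpi / 72. The helper names are made up, and rounding the bitmap size up with math.ceil mirrors the raw API example in this README rather than a confirmed detail of the support model.

```python
import math

POINTS_PER_INCH = 72  # one PDF point is 1/72 inch


def scale_for_dpi(dpi):
    """Scale factor that would yield the given resolution (assumes scale 1 == 72 DPI)."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, scale):
    """Approximate bitmap size in pixels for a page size given in points."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


print(scale_for_dpi(216))        # 3.0
print(pixel_size(595, 842, 3))   # (1785, 2526), roughly an A4 page at scale 3
```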

Render multiple pages concurrently:

pdf = pdfium.PdfDocument(filepath)

n_pages = len(pdf)
n_digits = len(str(n_pages))
page_indices = list(range(n_pages))
renderer = pdf.render_topil(
    page_indices = page_indices,
)

# Zero-pad the index to the number of digits in the page count
for image, index in zip(renderer, page_indices):
    image.save('out_%s.jpg' % str(index).zfill(n_digits))
    image.close()

pdf.close()
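Output file names sort correctly in a file manager only if the page index is zero-padded to a fixed width. A small sketch of one way to derive that width from the page count; output_name and its prefix and suffix parameters are hypothetical.

```python
def output_name(index, n_pages, prefix="out_", suffix=".jpg"):
    """Build a file name whose index is padded to the number of digits in n_pages."""
    n_digits = len(str(n_pages))
    return "%s%s%s" % (prefix, str(index).zfill(n_digits), suffix)


print(output_name(7, 120))   # out_007.jpg
print(output_name(7, 9))     # out_7.jpg
```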

Read the table of contents:

pdf = pdfium.PdfDocument(filepath)

for item in pdf.get_toc():
    print(
        "    " * item.level +
        "[%s] " % ("-" if item.is_closed else "+") +
        "%s -> %s  # %s %s" % (
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )

pdf.close()

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
import os.path
from PIL import Image
import pypdfium2 as pdfium

filepath = os.path.abspath("tests/resources/render.pdf")

doc = pdfium.FPDF_LoadDocument(filepath, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
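The least obvious step above is turning the pointer returned by FPDFBitmap_GetBuffer into something Python can read. The following sketch reproduces that step with the standard library only: a small ctypes array stands in for the bitmap memory (an assumption made so the example runs without PDFium), and the channel swap shows what the "BGRA" raw mode asks Pillow to do. bgra_to_rgba is a hypothetical helper.

```python
import ctypes

width, height = 2, 1
n_bytes = width * height * 4  # 4 bytes per pixel, as in the example above

# Stand-in for PDFium's bitmap memory: one blue and one red pixel in BGRA order
backing = (ctypes.c_ubyte * n_bytes)(255, 0, 0, 255, 0, 0, 255, 255)
cbuffer = ctypes.cast(backing, ctypes.c_void_p)  # untyped pointer, like a C API returns

# Same cast as in the rendering example: pointer to a fixed-size byte array
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * n_bytes))
bgra = bytes(buffer.contents)  # copy the pixel data out of the C memory


def bgra_to_rgba(data):
    """Swap the blue and red channels of tightly packed 4-byte pixels."""
    out = bytearray(data)
    out[0::4], out[2::4] = data[2::4], data[0::4]
    return bytes(out)


rgba = bgra_to_rgba(bgra)
print(list(rgba))  # [0, 0, 255, 255, 255, 0, 0, 255]
```

Copying with bytes() also makes the pixel data independent of the bitmap, so it stays valid after the bitmap has been destroyed.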

For more examples of using the raw API, take a look at the support model source code.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDFs.
  • Extract-URLs uses pypdfium2 to extract URLs from PDF documents.
  • py-pdf/benchmarks compares pypdfium2's text extraction capabilities with other libraries.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
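As a quick illustration of the naming convention, a wheel file name can be split into its components with plain string operations. This sketch follows the general {distribution}-{version}(-{build})-{python}-{abi}-{platform}.whl layout from the Python packaging specifications; parse_wheel_name is a hypothetical helper and does not validate the individual tags.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    stem = filename[: -len(".whl")]
    *head, python_tag, abi_tag, platform_tag = stem.split("-")
    if len(head) not in (2, 3):
        raise ValueError("unexpected number of components: %r" % filename)
    return {
        "distribution": head[0],
        "version": head[1],
        "build": head[2] if len(head) == 3 else None,
        "python": python_tag,
        "abi": abi_tag,
        # Several compatible platform tags may be compressed with dots
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_name(
    "pypdfium2-2.2.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["platforms"])  # ['manylinux_2_17_x86_64', 'manylinux2014_x86_64']
```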

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artefacts.

Testing

Run make test.

Publishing

The release process is automated using a CI workflow that pushes to GitHub, TestPyPI and PyPI. To do a release, first run make packaging locally to check that everything works as expected. If all went well, upload changes to the version file and push a new tag to trigger the Release workflow. Always make sure the information in src/pypdfium2/version.py matches the tag!

git tag -a A.B.C
git push --tags

Once a new version is released, update the stable branch to point at the commit of the latest tag.
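A mismatch between the tag and the version file is easy to catch with a small check before pushing. The sketch below assumes tags of the plain form A.B.C, as in the commands above, and that the version is available as three integers; how version.py actually stores its values is not shown here, so the tuple argument is a stand-in and both function names are hypothetical.

```python
import re

TAG_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Turn a tag such as '2.2.0' into a tuple of integers, e.g. (2, 2, 0)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError("tag is not of the form A.B.C: %r" % tag)
    return tuple(int(group) for group in match.groups())


def tag_matches_version(tag, version):
    """Check a tag against a (major, minor, patch) sequence."""
    return parse_tag(tag) == tuple(version)


print(tag_matches_version("2.2.0", (2, 2, 0)))  # True
print(tag_matches_version("2.2.1", (2, 2, 0)))  # False
```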

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages, however, should be discussed at pdfium-binaries.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
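Applications that depend on pypdfium2 may want to detect the affected interpreters early and fail with a clear message. A minimal sketch using only the standard library; the function name and the AFFECTED_RELEASES constant are made up for this example.

```python
import platform
import sys

# CPython releases named above as incompatible
AFFECTED_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=None, implementation=None):
    """Return True if the interpreter is one of the incompatible CPython releases."""
    if version_info is None:
        version_info = sys.version_info
    if implementation is None:
        implementation = platform.python_implementation()
    return implementation == "CPython" and tuple(version_info[:3]) in AFFECTED_RELEASES


if is_affected():
    print("This CPython release is known to be incompatible with pypdfium2.")
```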

Thanks to

Fun facts

If you are on Linux, have a recent version of LibreOffice installed, and insist on saving as much disk space as possible, you can remove the PDFium binary shipped with pypdfium2 and create a symbolic link to the one provided by LibreOffice. This is not recommended, but the following proof-of-concept steps demonstrate that it is possible. (With this strategy, certain newer functions such as FPDF_ImportNPagesToOne() are likely to be unavailable, since the PDFium build of LibreOffice may be a bit older.)

# Find out where the pypdfium2 installation is located
python3 -m pip show pypdfium2 |grep Location

# Now go to the path you happen to determine
# If pypdfium2 was installed locally (without root privileges), the path will look somewhat like this
cd ~/.local/lib/python3.8/site-packages/

# Descend into the pypdfium2 directory
cd pypdfium2/

# Delete the current PDFium binary
rm pdfium

# Create a symbolic link to the PDFium binary of LibreOffice
# The path might differ depending on the distribution - this is what applies for Ubuntu 20.04
ln -s /usr/lib/libreoffice/program/libpdfiumlo.so pdfium
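The first step, finding the installation directory, can also be done from Python without parsing the output of pip. A sketch using importlib; package_dir is a hypothetical helper, and the binary name pdfium is taken from the shell steps above.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an installed top-level package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


directory = package_dir("pypdfium2")
if directory is None:
    print("pypdfium2 is not installed")
else:
    binary = directory / "pdfium"
    print(directory, "(symlink)" if binary.is_symlink() else "(regular file or missing)")
```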

Sadly, mainstream Linux distributors have not created a separate package for PDFium, so it gets installed again with every single program that uses it.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.
