
Python bindings to PDFium


pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2
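To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is hypothetical and not part of pypdfium2.

```python
from importlib import metadata


def installed_version(dist_name="pypdfium2"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("pypdfium2 is not installed")
else:
    print("pypdfium2", version)
```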

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. For Python setup and runtime dependencies, please refer to setup.cfg. It is recommended to install ctypesgen from the latest sources (git master).

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can try the following:

make build

Depending on the operating system, additional dependencies may need to be installed beforehand.

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/
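If the command-line interface is to be driven from a script, the argument list can be assembled in Python and passed to subprocess. This is only a sketch: build_render_command is a hypothetical helper, and it uses nothing but the render subcommand and the -o and --scale options shown above.

```python
import shlex


def build_render_command(inputs, output_dir, scale=None):
    """Assemble the argument list for `pypdfium2 render` (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file is required")
    command = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        command += ["--scale", str(scale)]
    return command


command = build_render_command(["doc_1.pdf", "doc_2.pdf"], "output_dir/", scale=3)
print(shlex.join(command))
# To actually run it (requires pypdfium2 to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.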

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for an individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument (supports file paths, bytes, and byte buffers):

pdf = pdfium.PdfDocument(filepath)
print(pdf)
# Work with the helper class
print(pdf.raw)
# Work with the raw PDFium object handle
pdf.close()

Render a single page:

pdf = pdfium.PdfDocument(filepath)
page = pdf.get_page(0)

pil_image = page.render_topil(
    scale = 1,
    rotation = 0,
    crop = (0, 0, 0, 0),
    colour = (255, 255, 255, 255),
    annotations = True,
    greyscale = False,
    optimise_mode = pdfium.OptimiseMode.NONE,
)
pil_image.save("out.png")

# Release the resources once they are no longer needed
for g in (pil_image, page, pdf):
    g.close()

Render multiple pages concurrently:

pdf = pdfium.PdfDocument(filepath)

n_pages = len(pdf)
page_indices = list(range(n_pages))
renderer = pdf.render_topil(
    page_indices = page_indices,
)

# Pad page numbers to the number of digits in the page count
n_digits = len(str(n_pages))
for image, index in zip(renderer, page_indices):
    image.save('out_%s.jpg' % str(index).zfill(n_digits))
    image.close()

pdf.close()
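The zero-padding of page numbers in output file names can be looked at in isolation. The sketch below assumes that the intended padding width is the number of digits in the page count, so that file names sort correctly; page_filename is a hypothetical helper and does not require pypdfium2.

```python
def page_filename(index, n_pages, prefix="out_", suffix=".jpg"):
    """Build a file name whose page number is padded to the width of n_pages."""
    n_digits = len(str(n_pages))
    return prefix + str(index).zfill(n_digits) + suffix


# File names for the first three pages of a 120-page document
for index in range(3):
    print(page_filename(index, 120))
```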

Read the table of contents:

pdf = pdfium.PdfDocument(filepath)

for item in pdf.get_toc():
    print(
        '    ' * item.level +
        '[{}] '.format('-' if item.is_closed else '+') +
        '{} -> {}  # {} {}'.format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )

pdf.close()

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
import os.path
from PIL import Image
import pypdfium2 as pdfium

filepath = os.path.abspath("tests/resources/render.pdf")

doc = pdfium.FPDF_LoadDocument(filepath, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
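The buffer handling in the example above relies on two ideas: casting a raw pointer to a fixed-size ctypes array of width * height * 4 bytes, and interpreting the bytes in BGRA order. The following sketch illustrates both with plain ctypes and a hand-made two-pixel buffer instead of a PDFium bitmap; bgra_to_rgba is a hypothetical helper, and in real code the pointer would come from FPDFBitmap_GetBuffer.

```python
import ctypes

width, height = 2, 1
n_bytes = width * height * 4  # four bytes (B, G, R, A) per pixel

# Stand-in for the memory PDFium would own: one blue and one green pixel
raw = (ctypes.c_ubyte * n_bytes)(255, 0, 0, 255, 0, 255, 0, 255)
void_ptr = ctypes.cast(raw, ctypes.c_void_p)

# Reinterpret the untyped pointer as a pointer to a fixed-size byte array
array_ptr = ctypes.cast(void_ptr, ctypes.POINTER(ctypes.c_ubyte * n_bytes))
bgra = bytes(array_ptr.contents)


def bgra_to_rgba(data):
    """Swap the blue and red channel of every 4-byte pixel."""
    out = bytearray(data)
    out[0::4], out[2::4] = out[2::4], out[0::4]
    return bytes(out)


rgba = bgra_to_rgba(bgra)
print(list(rgba))
```

Image.frombuffer performs the same channel reordering internally when given the "BGRA" raw mode, so the manual swap is only needed when working without PIL.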

For more examples of using the raw API, take a look at the support model source code.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDFs.
  • Extract-URLs uses pypdfium2 to extract URLs from PDF documents.
  • py-pdf/benchmarks compares pypdfium2's text extraction capabilities with other libraries.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
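As a rough illustration of the naming convention, a wheel file name consists of the distribution name, version, an optional build tag, Python tag, ABI tag and platform tag, separated by hyphens; several compatible tags may be joined with dots. The parser below is a simplified sketch (parse_wheel_filename is a hypothetical helper, not part of pypdfium2 or the packaging tools), applied to the name of one of the pypdfium2 wheels.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components (simplified sketch)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    else:
        raise ValueError("unexpected number of components in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        # Each tag field may hold several dot-separated tags
        "python_tags": python_tag.split("."),
        "abi_tags": abi_tag.split("."),
        "platform_tags": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "pypdfium2-2.0.0b1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["platform_tags"])
```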

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artefacts.

Testing

Run make test.

Publishing

The release process is automated using a CI workflow that pushes to GitHub, TestPyPI and PyPI. To do a release, first run make packaging locally to check that everything works as expected. If all went well, upload changes to the version file and push a new tag to trigger the Release workflow. Always make sure the information in src/pypdfium2/version.py matches the tag!

git tag -a A.B.C
git push --tags

Once a new version is released, update the stable branch to point at the commit of the latest tag.
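The requirement that the version file and the tag agree could be guarded with a small check before pushing. This is a sketch under assumptions: the tag is taken to be the plain version string (A.B.C), the helper names are hypothetical, and reading the version out of src/pypdfium2/version.py is left to the caller because the layout of that file is not described here.

```python
import subprocess


def latest_git_tag(repo_dir="."):
    """Return the most recent tag reachable from HEAD (requires git)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def check_release(tag, version):
    """Raise ValueError if the tag does not match the package version."""
    if tag.strip() != version.strip():
        raise ValueError("tag %r does not match version %r" % (tag, version))


# Example with literal values; in practice, pass latest_git_tag() and the
# version string obtained from the version file.
check_release("2.0.0b1", "2.0.0b1")
```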

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
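A script that depends on pypdfium2 could detect the affected interpreter releases up front and fail with a clear message. The following is a minimal sketch using only the standard library; is_affected is a hypothetical helper, not something pypdfium2 provides.

```python
import platform
import sys

# CPython releases named above as incompatible
AFFECTED_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected(version=None, implementation=None):
    """Return True if the given (or current) interpreter is an affected release."""
    if version is None:
        version = sys.version_info
    if implementation is None:
        implementation = platform.python_implementation()
    return implementation == "CPython" and tuple(version[:3]) in AFFECTED_RELEASES


if is_affected():
    raise RuntimeError("pypdfium2 cannot be used with CPython 3.7.6 or 3.8.1")
```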

Thanks to

Fun facts

If you are on Linux, have a recent version of LibreOffice installed, and insist on saving as much disk space as possible, you can remove the PDFium binary shipped with pypdfium2 and create a symbolic link to the one provided by LibreOffice. This is not recommended, but the following proof-of-concept steps demonstrate that it is possible. (With this strategy, certain newer functions such as FPDF_ImportNPagesToOne() are likely to be unavailable, since the PDFium build of LibreOffice may be somewhat older.)

# Find out where the pypdfium2 installation is located
python3 -m pip show pypdfium2 |grep Location

# Now go to the path you happen to determine
# If pypdfium2 was installed locally (without root privileges), the path will look somewhat like this
cd ~/.local/lib/python3.8/site-packages/

# Descend into the pypdfium2 directory
cd pypdfium2/

# Delete the current PDFium binary
rm pdfium

# Create a symbolic link to the PDFium binary of LibreOffice
# The path might differ depending on the distribution - this is what applies for Ubuntu 20.04
ln -s /usr/lib/libreoffice/program/libpdfiumlo.so pdfium
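As an alternative to parsing the output of pip show, the package directory can be located from Python without importing the package. This is a sketch using only the standard library; package_directory is a hypothetical helper.

```python
import importlib.util
from pathlib import Path


def package_directory(name="pypdfium2"):
    """Return the directory of an installed package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


directory = package_directory()
if directory is None:
    print("pypdfium2 is not installed")
else:
    print(directory)
```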

Sadly, mainstream Linux distributions do not provide a dedicated package for PDFium, so it is installed separately with every single program that uses it.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.


