pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2
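
After installing, one way to confirm which version is present is to query the package metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of pypdfium2.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pypdfium2")
    print(version if version is not None else "pypdfium2 is not installed")
```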

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. For Python setup and runtime dependencies, please refer to setup.cfg. It is recommended to install ctypesgen from the latest sources (git master).

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can try the following:

make build

Depending on the operating system, additional dependencies may need to be installed beforehand.

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for an individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html
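
The render command can also be driven from a script. The sketch below only assembles the arguments shown in the examples above (render, -o, --scale); the function names `build_render_command` and `render` are hypothetical helpers, and the pypdfium2 executable is assumed to be available in PATH.

```python
import subprocess


def build_render_command(inputs, output_dir, scale=None):
    """Assemble an argument list for `pypdfium2 render` (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file is required")
    command = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        command += ["--scale", str(scale)]
    return command


def render(inputs, output_dir, scale=None):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_render_command(inputs, output_dir, scale), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.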

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument (supports file paths, bytes, and byte buffers):

pdf = pdfium.PdfDocument(filepath)
print(pdf)
# Work with the helper class
print(pdf.raw)
# Work with the raw PDFium object handle
pdf.close()
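
Because the objects in these examples are closed explicitly, an exception raised in between would skip the close() call. One possible way to guard against that is contextlib.closing from the standard library. The sketch below uses a hypothetical stand-in class, `FakeDocument`, so that it runs without a PDF file; it is not part of pypdfium2.

```python
from contextlib import closing


class FakeDocument:
    """Hypothetical stand-in for any object that exposes close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def use_and_close(resource, action):
    """Run action(resource) and close the resource even if action raises."""
    with closing(resource) as res:
        return action(res)


doc = FakeDocument()
result = use_and_close(doc, lambda d: "processed")
```

With the real library, wrapping pdfium.PdfDocument(filepath) in closing() would presumably follow the same pattern, assuming the helper class only needs close() to be called, as in the examples here.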

Render a single page:

pdf = pdfium.PdfDocument(filepath)
page = pdf.get_page(0)

pil_image = page.render_topil(
    scale = 1,
    rotation = 0,
    crop = (0, 0, 0, 0),
    colour = (255, 255, 255, 255),
    annotations = True,
    greyscale = False,
    optimise_mode = pdfium.OptimiseMode.NONE,
)
pil_image.save("out.png")

# Release resources explicitly once they are no longer needed
for g in (pil_image, page, pdf):
    g.close()
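
To choose a value for scale, it can help to think in terms of resolution. The sketch below assumes that scale is the number of pixels per PDF unit and that one PDF unit is 1/72 inch (the default in the PDF format), so that scale = 1 corresponds to 72 DPI. The helper names are hypothetical, and the rounding mirrors the math.ceil() used in the raw API example rather than anything confirmed about the helper class.

```python
import math

POINTS_PER_INCH = 72  # assumption: default PDF user space unit of 1/72 inch


def dpi_to_scale(dpi):
    """Convert a target resolution in DPI to a scale factor."""
    return dpi / POINTS_PER_INCH


def expected_pixel_size(width_pt, height_pt, scale):
    """Estimate the bitmap size in pixels for a page size given in PDF units."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


# A US Letter page (612 x 792 units) rendered at 216 DPI
scale = dpi_to_scale(216)
size = expected_pixel_size(612, 792, scale)
```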

Render multiple pages concurrently:

pdf = pdfium.PdfDocument(filepath)

n_pages = len(pdf)
page_indices = list(range(n_pages))
renderer = pdf.render_topil(
    page_indices = page_indices,
)

# Pad the index to the number of digits in the page count so files sort in page order
n_digits = len(str(n_pages))
for image, index in zip(renderer, page_indices):
    image.save('out_%s.jpg' % str(index).zfill(n_digits))
    image.close()

pdf.close()
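
Zero-padding the page index keeps the output files in page order when they are sorted by name. A minimal sketch of the file naming on its own, with a hypothetical helper `page_filename` that is not part of pypdfium2:

```python
def page_filename(index, n_pages, prefix="out_", suffix=".jpg"):
    """Build a file name whose index is padded to the width of the page count."""
    n_digits = len(str(n_pages))
    return prefix + str(index).zfill(n_digits) + suffix


names = [page_filename(i, 120) for i in (0, 7, 119)]
```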

Read the table of contents:

pdf = pdfium.PdfDocument(filepath)

for item in pdf.get_toc():
    print(
        '    ' * item.level +
        '[{}] '.format('-' if item.is_closed else '+') +
        '{} -> {}  # {} {}'.format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )

pdf.close()
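
The line formatting in the loop above can be factored into a function and tried without a PDF. In this sketch, `TocItem` is a hypothetical stand-in that only carries the attribute names used above; the sample values for view_mode and view_pos are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in with the attributes accessed in the example above
TocItem = namedtuple(
    "TocItem", "level title is_closed page_index view_mode view_pos"
)


def format_toc_line(item, indent="    "):
    """Format one table of contents entry, indented by its nesting level."""
    state = "-" if item.is_closed else "+"
    return "{}[{}] {} -> {}  # {} {}".format(
        indent * item.level,
        state,
        item.title,
        item.page_index + 1,  # page indices are zero-based, page numbers are not
        item.view_mode,
        item.view_pos,
    )


sample = TocItem(1, "Intro", False, 0, 1, [0.0, 792.0])
line = format_toc_line(sample)
```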

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
import os.path
from PIL import Image
import pypdfium2 as pdfium

filepath = os.path.abspath("tests/resources/render.pdf")

# Load the document (the second argument is the password, None if there is none)
doc = pdfium.FPDF_LoadDocument(filepath, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

# Set up a form fill environment so that form fields can be drawn
form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

# Load the first page and round its size up to whole pixels
page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

# Create a bitmap and fill it with white
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

# Render the page content, then draw form fields on top
render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

# Expose the bitmap memory as a ctypes array of width * height * 4 bytes
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

# Build a PIL image from the buffer, converting from BGRA to RGBA channel order
img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

# Release the PDFium resources
pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
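
The buffer handling in the example above can be tried in isolation. The following sketch uses a small ctypes array as a stand-in for the memory returned by FPDFBitmap_GetBuffer() and shows the same cast to a fixed-size byte array, followed by a manual BGRA to RGBA swap. The helper names are hypothetical; in the example above, PIL performs the channel conversion itself.

```python
import ctypes


def view_bitmap_buffer(first_byte_ptr, width, height, n_channels=4):
    """Reinterpret a pointer to the first byte as a fixed-size byte array."""
    n_bytes = width * height * n_channels
    array_type = ctypes.c_ubyte * n_bytes
    return ctypes.cast(first_byte_ptr, ctypes.POINTER(array_type)).contents


def bgra_to_rgba(data):
    """Return a copy of 4-channel pixel data with the B and R channels swapped."""
    out = bytearray(data)
    out[0::4], out[2::4] = out[2::4], out[0::4]
    return bytes(out)


# Stand-in for bitmap memory: two BGRA pixels (opaque blue, opaque red)
pixels = (ctypes.c_ubyte * 8)(255, 0, 0, 255, 0, 0, 255, 255)
pointer = ctypes.cast(pixels, ctypes.c_void_p)

view = view_bitmap_buffer(pointer, width=2, height=1)
rgba = bgra_to_rgba(view)
```

The array returned by the cast refers to the original memory rather than copying it, which is why the bitmap in the example above is destroyed only after the image has been saved.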

For more examples of using the raw API, take a look at the support model source code.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDFs.
  • Extract-URLs uses pypdfium2 to extract URLs from PDF documents.
  • py-pdf/benchmarks compares pypdfium2's text extraction capabilities with other libraries.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
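
As a rough illustration of the naming scheme, the sketch below splits a wheel file name into the components defined by the wheel format: name, version, optional build tag, Python tag, ABI tag and platform tag. The function `parse_wheel_filename` is a hypothetical helper, not something shipped with pypdfium2, and it does not validate the individual tags.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        # Each tag may be a dot-separated set of alternatives
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "pypdfium2-2.0.0b2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
```

Here the platform component carries two tags, which indicates that the same wheel is declared compatible with both.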

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artefacts.

Testing

Run make test.

Publishing

The release process is automated using a CI workflow that pushes to GitHub, TestPyPI and PyPI. To do a release, first run make packaging locally to check that everything works as expected. If all went well, upload changes to the version file and push a new tag to trigger the Release workflow. Always make sure the information in src/pypdfium2/version.py matches the tag!

git tag -a A.B.C
git push --tags

Once a new version is released, update the stable branch to point at the commit of the latest tag.

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
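
A script that depends on pypdfium2 could check for the affected interpreters up front. This is a minimal sketch based only on the two release numbers named above; the function `is_affected_interpreter` is hypothetical and not part of pypdfium2.

```python
import platform
import sys

# CPython releases named above as incompatible
AFFECTED_CPYTHON_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected_interpreter(implementation, version_info):
    """Return True for the CPython releases listed as incompatible."""
    release = tuple(version_info[:3])
    return implementation == "CPython" and release in AFFECTED_CPYTHON_RELEASES


if is_affected_interpreter(platform.python_implementation(), sys.version_info):
    print("This CPython release is listed as incompatible with pypdfium2.")
```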

Fun facts

If you are on Linux, have a recent version of LibreOffice installed, and insist on saving as much disk space as possible, you can remove the PDFium binary shipped with pypdfium2 and create a symbolic link to the one provided by LibreOffice. This is not recommended, but the following proof-of-concept steps demonstrate that it is possible. (If using this strategy, it is likely that certain newer methods such as FPDF_ImportNPagesToOne() will not be available yet, since the PDFium build of LibreOffice may be a bit older.)

# Find out where the pypdfium2 installation is located
python3 -m pip show pypdfium2 |grep Location

# Now go to the path you happen to determine
# If pypdfium2 was installed locally (without root privileges), the path will look somewhat like this
cd ~/.local/lib/python3.8/site-packages/

# Descend into the pypdfium2 directory
cd pypdfium2/

# Delete the current PDFium binary
rm pdfium

# Create a symbolic link to the PDFium binary of LibreOffice
# The path might differ depending on the distribution - this is what applies for Ubuntu 20.04
ln -s /usr/lib/libreoffice/program/libpdfiumlo.so pdfium
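
The first step, locating the package directory, can also be done from Python instead of parsing the output of pip show. This is a small sketch using the standard library; `find_package_dir` is a hypothetical helper, and it returns None when the package cannot be found.

```python
import importlib.util
import os


def find_package_dir(package_name):
    """Return the directory of an importable top-level package, or None."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.abspath(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    print(find_package_dir("pypdfium2"))
```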

Sadly, mainstream Linux distributors have not created a dedicated package for PDFium, which causes it to be installed separately with every single program that uses it.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.
