
Python bindings to PDFium

Project description

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, setuptools-scm, wheel, build, and ctypesgen are needed. Also make sure that your pip version is up to date. For more information, please refer to dependencies.md.
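Before running the steps below, the prerequisites can be checked from Python. This is a minimal sketch using only the standard library; it assumes that the import names `setuptools`, `setuptools_scm`, `wheel`, `build` and `ctypesgen` correspond to the packages listed above, and it does not check version numbers (including pip's).

```python
import importlib.util
import shutil

# Prerequisites named in the manual-installation instructions.
# Assumption: setuptools-scm is importable as "setuptools_scm".
SYSTEM_TOOLS = ("git", "gcc")
PYTHON_MODULES = ("setuptools", "setuptools_scm", "wheel", "build", "ctypesgen")


def missing_prerequisites(tools=SYSTEM_TOOLS, modules=PYTHON_MODULES):
    """Return (tools not found in PATH, modules that cannot be located)."""
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    missing_modules = [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_prerequisites()
    if tools or modules:
        print("Missing tools:", ", ".join(tools) or "none")
        print("Missing Python modules:", ", ".join(modules) or "none")
    else:
        print("All prerequisites found.")
```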

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can try the following:

make build

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3
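PDF page sizes are given in points (1/72 inch). The following sketch shows how a scale factor would translate into output pixels, assuming that `--scale` multiplies the page size in points (so that scale 1 corresponds to 72 DPI and scale 3 to 216 DPI) and that partial pixels are rounded up, as the raw API example below does with `math.ceil`. The helper names are hypothetical.

```python
import math

POINTS_PER_INCH = 72  # PDF user space unit: 1 point = 1/72 inch


def output_size(width_pt, height_pt, scale):
    """Pixel size of a page rendered at `scale` (assumed: pixels = points * scale)."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


def scale_for_dpi(dpi):
    """Scale factor that would give the requested resolution under the same assumption."""
    return dpi / POINTS_PER_INCH


# An A4 page is roughly 595 x 842 points.
print(output_size(595, 842, 3))  # (1785, 2526)
print(scale_for_dpi(300))
```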

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for an individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument:

doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()
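Because close() has to be called explicitly here, an exception raised while working with the document would skip it. A minimal sketch using the standard library's `contextlib.closing`, which calls `close()` when the block is left, even on error; it works with any object that has a `close()` method, such as the `PdfDocument` above. `DummyDocument` is a hypothetical stand-in so the snippet runs without pypdfium2 or a PDF file.

```python
from contextlib import closing


class DummyDocument:
    """Hypothetical stand-in for pdfium.PdfDocument; only close() matters here."""

    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def close(self):
        self.closed = True


doc = DummyDocument("document.pdf")
try:
    with closing(doc):
        # ... use methods provided by the helper class
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

print(doc.closed)  # True: close() ran despite the exception
```

With pypdfium2 installed, the same pattern would read `with closing(pdfium.PdfDocument(filename)) as doc: ...`.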

Open a PDF using the context manager PdfContext:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the pdf
    pass

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page_topil(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = (255, 255, 255, 255),
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf_topil(filename):
    image.save('out_%s.png' % suffix)
    image.close()

Read the table of contents:

doc = pdfium.PdfDocument(filename)
for item in doc.get_toc():
    print(
        '    ' * item.level +
        "{} -> {}  # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()
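The indentation logic of the loop above can be tried without a PDF. In this sketch, `TocItem` is a hypothetical stand-in exposing only the attributes the loop reads, and the sample titles, view modes and view positions are made up; real items would come from `doc.get_toc()`.

```python
from collections import namedtuple

# Hypothetical stand-in with the attributes used by the loop above.
TocItem = namedtuple("TocItem", "level title page_index view_mode view_pos")


def format_toc(items, indent="    "):
    """Render TOC items as indented lines, converting 0-based page indices to page numbers."""
    lines = []
    for item in items:
        lines.append(
            indent * item.level
            + "{} -> {}  # {} {}".format(
                item.title, item.page_index + 1, item.view_mode, item.view_pos
            )
        )
    return "\n".join(lines)


# Made-up sample data: a top-level entry and one nested entry.
sample = [
    TocItem(0, "Introduction", 0, "XYZ", (0, 792, 0)),
    TocItem(1, "Background", 2, "XYZ", (0, 500, 0)),
]
print(format_toc(sample))
```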

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
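The example above passes the fill colour as a single integer and converts the bitmap buffer with Pillow's BGRA raw mode. The following pure-Python sketch illustrates both conversions, assuming the 8888 ARGB (`0xAARRGGBB`) colour layout documented for `FPDFBitmap_FillRect` in PDFium's `fpdfview.h` and a tightly packed buffer of `width * height * 4` bytes, as in the example. The helper names are hypothetical; in practice Pillow does the byte reordering much faster.

```python
def argb(r, g, b, a=255):
    """Pack a colour into the 0xAARRGGBB integer used by FPDFBitmap_FillRect."""
    for component in (r, g, b, a):
        if not 0 <= component <= 255:
            raise ValueError("colour components must be in range 0..255")
    return (a << 24) | (r << 16) | (g << 8) | b


def bgra_to_rgba(buffer):
    """Reorder BGRA pixel bytes to RGBA, as Pillow's "BGRA" raw decoder does."""
    if len(buffer) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    out = bytearray(len(buffer))
    out[0::4] = buffer[2::4]  # R comes from the third byte of each pixel
    out[1::4] = buffer[1::4]  # G stays in place
    out[2::4] = buffer[0::4]  # B comes from the first byte
    out[3::4] = buffer[3::4]  # A stays in place
    return bytes(out)


print(hex(argb(255, 255, 255)))  # 0xffffffff (opaque white, as in the example)
print(bgra_to_rgba(bytes([0, 0, 255, 255])))  # b'\xff\x00\x00\xff' (one opaque red pixel)
```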

For more examples of using the raw API, take a look at the support model source code and the examples directory.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your option.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDF documents.
  • The Extract-URLs project extracts URLs from PDFs using pypdfium2.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
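As an illustration of those conventions, a wheel file name such as the ones listed under Built Distributions can be split into its fields. This is a simplified sketch of the PEP 427 file name layout with PEP 425 compressed tag sets (alternatives joined by dots); it performs no validation, and for real use the `packaging` library provides `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename):
    """Split '{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl' into fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    # Compressed tag sets list alternatives separated by '.'
    return WheelName(
        dist, version, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )


name = parse_wheel_filename("pypdfium2-1.10.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl")
print(name.platform_tags)  # ('macosx_11_0_arm64', 'macosx_12_0_arm64')
```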

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artefacts.

Testing

Run make test.

Publishing

The release process is automated using a CI workflow that pushes to GitHub, TestPyPI and PyPI. To do a release, first run make packaging locally to check that everything works as expected. If all went well, push the changes to the version file, then push a new tag to trigger the Release workflow. Always make sure the information in src/pypdfium2/_version.py matches the tag!

git tag -a A.B.C
git push --tags
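A check along these lines could be run before pushing the tag. It is a sketch under assumptions: how the version string is read from src/pypdfium2/_version.py is not shown here, so it is passed in by hand, and the hypothetical helper `latest_tag()` simply asks git for the most recent tag reachable from HEAD.

```python
import re
import subprocess

TAG_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")  # the A.B.C tag format used above


def parse_tag(tag):
    """Return (A, B, C) as integers for a string like '1.10.0'."""
    match = TAG_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError("%r does not have the form A.B.C" % tag)
    return tuple(int(part) for part in match.groups())


def tag_matches_version(tag, version):
    """True if the git tag and the package version denote the same release."""
    return parse_tag(tag) == parse_tag(version)


def latest_tag():
    """Most recent tag reachable from HEAD (requires git and a repository checkout)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Inside the repository, `tag_matches_version(latest_tag(), "1.10.0")` would then confirm that the newest tag and the intended version agree.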

Once a new version is released, update the stable branch to point at the commit of the latest tag.

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
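Code that must not run on the affected interpreters could check for them explicitly. A minimal sketch; the helper name and warning text are hypothetical, and the check is limited to CPython because the regression concerns that interpreter.

```python
import platform
import sys
import warnings

# CPython releases listed above as incompatible with pypdfium2.
AFFECTED_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=None, implementation=None):
    """True if the interpreter (default: the running one) is an affected CPython release."""
    if version_info is None:
        version_info = sys.version_info
    if implementation is None:
        implementation = platform.python_implementation()
    return implementation == "CPython" and tuple(version_info[:3]) in AFFECTED_RELEASES


if is_affected():
    warnings.warn(
        "pypdfium2 does not work with CPython %d.%d.%d; please use another release"
        % tuple(sys.version_info[:3]),
        RuntimeWarning,
    )
```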

Thanks to

Fun facts

If you are on Linux, have a recent version of LibreOffice installed, and want to save as much disk space as possible, you can remove the PDFium binary shipped with pypdfium2 and create a symbolic link to the one provided by LibreOffice. This is not recommended, but the following proof-of-concept steps demonstrate that it is possible. (With this strategy, some newer functions such as FPDF_ImportNPagesToOne() will likely be unavailable, since LibreOffice's PDFium build may be somewhat older.)

# Find out where the pypdfium2 installation is located
python3 -m pip show pypdfium2 |grep Location

# Now go to the path you happen to determine
# If pypdfium2 was installed locally (without root privileges), the path will look somewhat like this
cd ~/.local/lib/python3.8/site-packages/

# Descend into the pypdfium2 directory
cd pypdfium2/

# Delete the current PDFium binary
rm pdfium

# Create a symbolic link to the PDFium binary of LibreOffice
# The path might differ depending on the distribution - this is what applies for Ubuntu 20.04
ln -s /usr/lib/libreoffice/program/libpdfiumlo.so pdfium
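The first step (finding the installation directory) can also be done from Python instead of filtering pip show output. A minimal sketch using `importlib.util.find_spec`, which locates a package without importing it; the hypothetical helper returns None if the package is not installed. The rm and ln -s steps stay as shown above.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Directory of an installed package, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at its __init__.py
    return Path(spec.origin).parent


print(package_dir("pypdfium2"))
```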

Sadly, mainstream Linux distributions do not provide a dedicated PDFium package, so it ends up being installed separately with every single program that uses it.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform-specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • pypdfium2-1.10.0.tar.gz (620.9 kB)

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • pypdfium2-1.10.0-py3-none-win_arm64.whl (2.5 MB): Python 3, Windows ARM64
  • pypdfium2-1.10.0-py3-none-win_amd64.whl (2.6 MB): Python 3, Windows x86-64
  • pypdfium2-1.10.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-1.10.0-py3-none-musllinux_1_2_x86_64.whl (2.9 MB): Python 3, musllinux: musl 1.2+ x86-64
  • pypdfium2-1.10.0-py3-none-musllinux_1_2_i686.whl (2.9 MB): Python 3, musllinux: musl 1.2+ i686
  • pypdfium2-1.10.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-1.10.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (2.8 MB): Python 3, manylinux: glibc 2.17+ i686
  • pypdfium2-1.10.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-1.10.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-1.10.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl (2.6 MB): Python 3, macOS 11.0+ ARM64, macOS 12.0+ ARM64
  • pypdfium2-1.10.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64, macOS 11.0+ x86-64, macOS 12.0+ x86-64

File details

Details for the file pypdfium2-1.10.0.tar.gz.

File metadata

  • Download URL: pypdfium2-1.10.0.tar.gz
  • Size: 620.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.10.0.tar.gz
Algorithm Hash digest
SHA256 7f888812f7453c0e6105d035ccc426feaccc2b42616ac8a822ec71b14b5eb033
MD5 0d7653a7b0baf1562287d4dbb5d3eaf9
BLAKE2b-256 9a653a3e7d016a28e86b5a339fb4f371bc3fe7d70d1ef3f086560311a8cbe3bb

See more details on using hashes here.

File details

Details for the file pypdfium2-1.10.0-py3-none-win_arm64.whl.

File metadata

  • Download URL: pypdfium2-1.10.0-py3-none-win_arm64.whl
  • Size: 2.5 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.10.0-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 01613ba43e52bd229a689186459eb812b67ceb3894287b5fba9dc20040d93831
MD5 01d65f3b7fc57a6688b81d5f8b45ccb4
BLAKE2b-256 ef6c2b645908a4c4d98d988490419b196fb916e8a20c7e6203e48fc2eb1ccca7

File details

Details for the file pypdfium2-1.10.0-py3-none-win_amd64.whl.

File metadata

  • Download URL: pypdfium2-1.10.0-py3-none-win_amd64.whl
  • Size: 2.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.10.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 d9a54adfb63b5b0f90ba913fca2cfb9037c35082e92b7fa599285d7395c48963
MD5 726b3cc2a6794466e5adfac670af4ab2
BLAKE2b-256 cce8422e057f320ac329420516eefac0c23a53998dcb84015085c09ed4e24464

File details

Details for the file pypdfium2-1.10.0-py3-none-win32.whl.

File metadata

  • Download URL: pypdfium2-1.10.0-py3-none-win32.whl
  • Size: 2.5 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.10.0-py3-none-win32.whl
Algorithm Hash digest
SHA256 1e781b54a71b0a0d3c932e4f44a4b91d2dfc39ed103c78100db0a3328a3a063a
MD5 107f52478438d1cb90e46b1a9a94c80e
BLAKE2b-256 4f02d1b67c46cacb2d770b546d85c4336fa24bc9dc8413b2b7abfe9eed9d512d

File details

Details for the file pypdfium2-1.10.0-py3-none-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 446d9196a9475c1e62d58913e2595d27e4e64c83ce2c600f9a344f073346db15
MD5 f78b58512464232bdb5ab5b3f7424222
BLAKE2b-256 70e39fb0639948a093728a39d58058a3bc210848831aca795e6e8c8b934f5479

File details

Details for the file pypdfium2-1.10.0-py3-none-musllinux_1_2_i686.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 1657e37a08582e27074c1a601950a81506ae1cf4652bf68e5b341769f368fe96
MD5 cad5e39436ee31248391f42a5bb1f32d
BLAKE2b-256 46538cbeb5b799f6d24be6bfbb33cfbceecd89289b59fb03f2f33c8aba4f89e5

File details

Details for the file pypdfium2-1.10.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 23344a32245009df8cc6606762fa740b42897869e03bc275b97f2b12131ecd78
MD5 12e3383a3974edc1aa05f83642d52303
BLAKE2b-256 6d6be66d669f46dd46143f9a2b8b92e12dccdc1a6255825212e154b95b589ce0

File details

Details for the file pypdfium2-1.10.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 32f13b25327307e781d1240fcbca790da9f257f1ce8e63430d588499a34ddc82
MD5 81f9460ef0fba04d8855a81e57e22b18
BLAKE2b-256 1cf0b7ea69d8909d81e63cc1f761f70e88027087cb9e3ed495c661cdb5ab3d70

File details

Details for the file pypdfium2-1.10.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 4ccbf9df0a91189390f19efe413323c219777c10bf95f7079859cc571048b148
MD5 347a5d7d8174d34872ba0b85c35158f6
BLAKE2b-256 5dab8650cccb8ea1553130bb7fce2fa294444a6a84a5647eb2ce6d50c42f3e6c

File details

Details for the file pypdfium2-1.10.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 fd839cb3f6b162df108abe56f7668b610868747ee26539b85dc83f7e0cc1e785
MD5 063176f5dec630d126936112dc8c56a4
BLAKE2b-256 56f90751de43ed279aef2b7e3f39df5ecedcc656812df56b1e7283f9a31a44a0

File details

Details for the file pypdfium2-1.10.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 8e1fcd905be6525c31e65af2436c0d7cb45e7590bd1f5f139877398dd85f6665
MD5 227978c9ed1111c1aab225ba83170ece
BLAKE2b-256 790671c0fa20f0ea34af7ce100796613da340b50e268addbb41efc387a7510bc

File details

Details for the file pypdfium2-1.10.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl.

File hashes

Hashes for pypdfium2-1.10.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 a23b749ac73da94de52f6ed4c953ec5f693548c91190cb29267b90b8b46b9850
MD5 29288ec5779da44905f6aad904c33825
BLAKE2b-256 868e7b2a7d32679a08f941b3e1d7334baca2bb4e0c0c54d691de9e630fc58d99
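Downloaded files can be checked against the SHA256 digests listed above. A minimal sketch using `hashlib`; the helper names are hypothetical, and the commented usage line assumes the source distribution was saved to the current directory. pip can also enforce hashes itself, via `--require-hashes` and `--hash=sha256:...` entries in a requirements file.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest equals the published (case-insensitive) hash."""
    return sha256_of(path) == expected_hex.lower()


# Example: check the source distribution against its published hash.
# verify("pypdfium2-1.10.0.tar.gz", "7f888812f7453c0e6105d035ccc426feaccc2b42616ac8a822ec71b14b5eb033")
```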
