Python bindings to PDFium

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, setuptools-scm, wheel, build, and ctypesgen are needed. Also make sure that your pip version is up to date. For more information, please refer to dependencies.md.

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can do the following:

make build

In case building failed, you could try

python3 setupsrc/pl_setup/build_pdfium.py --nativebuild --check-deps
PYP_TARGET_PLATFORM="sourcebuild" python3 -m pip install . -v --no-build-isolation

to prefer the use of system-provided build tools over the toolchain PDFium ships with. The problem is that the toolchain is limited to a curated set of platforms, as PDFium targets cross-compilation for "non-standard" architectures. (Make sure you have installed all packages from the Nativebuild Extras section of dependencies.md, in addition to the default requirements.)

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/
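If the command-line interface needs to be driven from a script, the render invocation can be assembled as an argument list. The following is a minimal sketch: `build_render_command` is a hypothetical helper, and it uses only the `render` subcommand and the `-o` and `--scale` options shown above.

```python
import shlex


def build_render_command(inputs, output_dir, scale=None):
    """Assemble the argument list for a `pypdfium2 render` call.

    Hypothetical helper: it only arranges the options shown in the
    examples above and does not validate them.
    """
    cmd = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        cmd += ["--scale", str(scale)]
    return cmd


cmd = build_render_command(["doc_1.pdf", "doc_2.pdf"], "output_dir/", scale=3)
# Show the command as it would be typed in a shell
print(shlex.join(cmd))
```

To actually run the command, the list could be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with unusual file names.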

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Individual help for each subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument:

doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()

Open a PDF using the context manager PdfContext:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the pdf
    pass  # placeholder so that the block is valid Python

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page_topil(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = (255, 255, 255, 255),
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()
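To choose a value for `scale`, it can help to relate it to a target resolution. The sketch below assumes that `scale` is the number of pixels rendered per PDF point (1/72 inch), so that a scale of 1 corresponds to 72 DPI, and that pixel dimensions are rounded up, as the raw API example in this document does with `math.ceil`. The helper names are hypothetical and not part of pypdfium2.

```python
import math

POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def scale_for_dpi(dpi):
    """Return the scale factor for a target resolution.

    Assumption: scale 1 renders one PDF point to one pixel.
    """
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, scale):
    """Estimate the bitmap size in pixels for a page given in points."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


# A US Letter page (612 x 792 points) rendered with scale 3
print(pixel_size(612, 792, 3))  # (1836, 2376)
# Scale factor for 144 DPI
print(scale_for_dpi(144))  # 2.0
```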

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf_topil(filename):
    image.save('out_%s.png' % suffix)
    image.close()

Read the table of contents:

doc = pdfium.PdfDocument(filepath)
for item in doc.get_toc():
    print(
        '    ' * item.level +
        "{} -> {}  # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()
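The formatting logic of the loop above can be tried without a PDF at hand. In this sketch, `TocItem` is a hypothetical stand-in for the items yielded by `get_toc()`: only the attribute names come from the example above, while the sample values (including the view mode strings) are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a table of contents entry; the real item type
# is provided by pypdfium2 and may carry further attributes.
TocItem = namedtuple("TocItem", "level title page_index view_mode view_pos")


def format_toc_line(item):
    """Indent by nesting level and show the one-based page number."""
    indent = "    " * item.level
    return indent + "{} -> {}  # {} {}".format(
        item.title,
        item.page_index + 1,  # page_index is zero-based
        item.view_mode,
        item.view_pos,
    )


# Made-up sample entries
sample = [
    TocItem(0, "Introduction", 0, "XYZ", [0.0, 792.0, 0.0]),
    TocItem(1, "Background", 1, "XYZ", [0.0, 500.0, 0.0]),
]
for item in sample:
    print(format_toc_line(item))
```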

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
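In the example above, the bitmap buffer holds four bytes per pixel in BGRA order, and Pillow performs the channel reordering through its "raw" decoder. For illustration, the same reordering can be sketched with the standard library only. This assumes a tightly packed buffer of width * height * 4 bytes without row padding, as in the example; `bgra_to_rgba` is a hypothetical helper.

```python
def bgra_to_rgba(data, width, height):
    """Swap the blue and red channels of a tightly packed BGRA buffer."""
    expected = width * height * 4
    if len(data) != expected:
        raise ValueError(
            "expected %d bytes, got %d" % (expected, len(data))
        )
    out = bytearray(data)
    out[0::4] = data[2::4]  # red moves to the first byte of each pixel
    out[2::4] = data[0::4]  # blue moves to the third byte of each pixel
    return bytes(out)


# One opaque pixel stored as B=1, G=2, R=3, A=255
pixel = bytes([1, 2, 3, 255])
print(list(bgra_to_rgba(pixel, 1, 1)))  # [3, 2, 1, 255]
```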

For more examples of using the raw API, take a look at the support model source code and the examples directory.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDF documents.
  • The Extract-URLs project extracts URLs from PDFs using pypdfium2.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
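As a brief illustration of the naming scheme, a wheel file name consists of the distribution name, version, an optional build tag, and the Python, ABI and platform tags, separated by hyphens; several platform tags may be combined with dots. The following sketch splits a file name accordingly. `parse_wheel_filename` is a hypothetical helper written for this example; for real use, a dedicated library such as `packaging` is likely the better choice.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        # Compressed tag sets are separated by dots
        "platform_tags": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "pypdfium2-1.7.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["platform_tags"])
```

The tags py3 and none indicate that the wheel works with any Python 3 interpreter and does not depend on a specific Python ABI, while the platform tags identify the systems the bundled binary targets.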

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artifacts.

Testing

Run make test.

Publishing

Starting from version 1.3.0, the release process is automated by a CI workflow that pushes to GitHub, TestPyPI and PyPI. To do a release, first run make packaging locally to check that everything works as expected. Then add, commit and push any changes to the version file. Finally, add and push a tag to trigger the Release workflow, and monitor its progress using the GitHub Actions panel:

git tag -a A.B.C
git push --tags

Always make sure the information in src/pypdfium2/_version.py matches the tag!
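A check of this kind could be scripted before tagging. The sketch below compares a tag of the form A.B.C with a version given as a tuple of three integers. How the version is actually stored in `_version.py` is not shown here, so the tuple representation and the helper name `tag_matches_version` are assumptions.

```python
import re


def tag_matches_version(tag, version):
    """Return True if a tag like "1.7.0" equals a (major, minor, patch) tuple.

    Assumption: the version is available as three integers; adapt this
    to however the version file really exposes it.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        return False
    return tuple(int(part) for part in match.groups()) == tuple(version)


print(tag_matches_version("1.7.0", (1, 7, 0)))  # True
print(tag_matches_version("1.7.0", (1, 6, 0)))  # False
```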

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If the cause of an issue is determined to be in PDFium, the problem needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
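An application that depends on pypdfium2 might want to detect these interpreter releases early. The following is a minimal sketch using only the standard library; `is_affected` is a hypothetical helper, and the version list is taken from the paragraph above.

```python
import platform
import sys

# CPython releases named above as incompatible
AFFECTED_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=None, implementation=None):
    """Return True if the interpreter is one of the affected CPython releases."""
    if version_info is None:
        version_info = sys.version_info
    if implementation is None:
        implementation = platform.python_implementation()
    return (
        implementation == "CPython"
        and tuple(version_info[:3]) in AFFECTED_RELEASES
    )


if is_affected():
    print("This CPython release is known to be incompatible with pypdfium2.")
```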

Fun facts

If you are on Linux, have a recent version of LibreOffice installed, and insist on saving as much disk space as possible, you can remove the PDFium binary shipped with pypdfium2 and create a symbolic link to the one provided by LibreOffice. This is not recommended, but the following proof-of-concept steps demonstrate that it is possible. (With this strategy, it is likely that certain newer functions such as FPDF_ImportNPagesToOne() will not be available yet, since the PDFium build of LibreOffice may be a bit older.)

# Find out where the pypdfium2 installation is located
python3 -m pip show pypdfium2 |grep Location

# Now go to the path you happen to determine
# If pypdfium2 was installed locally (without root privileges), the path will look somewhat like this
cd ~/.local/lib/python3.8/site-packages/

# Descend into the pypdfium2 directory
cd pypdfium2/

# Delete the current PDFium binary
rm pdfium

# Create a symbolic link to the PDFium binary of LibreOffice
# The path might differ depending on the distribution - this is what applies for Ubuntu 20.04
ln -s /usr/lib/libreoffice/program/libpdfiumlo.so pdfium

Sadly, mainstream Linux distributors have not created a dedicated package for PDFium, which causes it to be installed separately with every single program that uses it.

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.

This version

1.7.0

Download files

Download the file for your platform.

Source Distribution

  • pypdfium2-1.7.0.tar.gz (357.1 kB): Source

Built Distributions

  • pypdfium2-1.7.0-py3-none-win_arm64.whl (2.5 MB): Python 3, Windows ARM64
  • pypdfium2-1.7.0-py3-none-win_amd64.whl (2.6 MB): Python 3, Windows x86-64
  • pypdfium2-1.7.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-1.7.0-py3-none-musllinux_1_2_x86_64.whl (2.8 MB): Python 3, musllinux: musl 1.2+ x86-64
  • pypdfium2-1.7.0-py3-none-musllinux_1_2_i686.whl (2.9 MB): Python 3, musllinux: musl 1.2+ i686
  • pypdfium2-1.7.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-1.7.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (2.8 MB): Python 3, manylinux: glibc 2.17+ i686
  • pypdfium2-1.7.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-1.7.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-1.7.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl (2.6 MB): Python 3, macOS 11.0+ ARM64, macOS 12.0+ ARM64
  • pypdfium2-1.7.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64, macOS 11.0+ x86-64, macOS 12.0+ x86-64
