Python bindings to PDFium

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberally licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, setuptools-scm, wheel, build, and ctypesgen are needed. Also make sure that your pip version is up to date. For more information, please refer to dependencies.md.

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This falls back to building PDFium from source if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can try the following:

make build

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for an individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html
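
As a rough guide to choosing a value for --scale, the following sketch estimates the output size of a rendered page. It assumes that a scale of 1 maps one PDF point (1/72 inch) to one pixel, which is how the raw API example further down sizes its bitmap. The helper name and the A4 page dimensions are illustrative only and are not part of pypdfium2.

```python
import math

POINTS_PER_INCH = 72  # PDF user space unit: 1 point = 1/72 inch


def estimate_render_size(width_pt, height_pt, scale):
    """Return (width_px, height_px, dpi) for a page rendered at `scale`.

    Assumption: scale=1 renders one PDF point as one pixel.
    """
    width_px = math.ceil(width_pt * scale)
    height_px = math.ceil(height_pt * scale)
    dpi = POINTS_PER_INCH * scale
    return width_px, height_px, dpi


# An A4 page (595 x 842 points) at the scale used in the command above
print(estimate_render_size(595, 842, 3))
```

Under that assumption, --scale 3 corresponds to roughly 216 DPI, so memory use grows with the square of the scale factor.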

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument:

doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()

Open a PDF using the context manager PdfContext:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the pdf

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page_topil(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = (255, 255, 255, 255),
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf_topil(filename):
    image.save('out_%s.png' % suffix)
    image.close()

Read the table of contents:

doc = pdfium.PdfDocument(filepath)
for item in doc.get_toc():
    print(
        '    ' * item.level +
        "{} -> {}  # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()
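
To see what the formatting in the loop above produces without opening a PDF, here is a small sketch that uses made-up entries. TocItem is a hypothetical stand-in that only models the attributes used above; it is not the class returned by get_toc(), and the view_mode and view_pos values are invented.

```python
from collections import namedtuple

# Hypothetical stand-in for the items yielded by get_toc()
TocItem = namedtuple("TocItem", "level title page_index view_mode view_pos")


def format_toc_item(item):
    """Indent by nesting level and show a 1-based page number."""
    return "    " * item.level + "{} -> {}  # {} {}".format(
        item.title,
        item.page_index + 1,  # page_index is 0-based
        item.view_mode,
        item.view_pos,
    )


# Invented sample data for illustration
items = [
    TocItem(0, "Introduction", 0, 1, [0, 842]),
    TocItem(1, "Background", 2, 1, [0, 500]),
]
lines = [format_toc_item(item) for item in items]
print("\n".join(lines))
```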

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)

For more examples of using the raw API, take a look at the support model source code and the examples directory.
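
The cast to c_ubyte * (width * height * 4) in the example above relies on the bitmap using four bytes per pixel, and the "BGRA" raw mode tells Pillow to reorder the channels. The following pure-Python sketch illustrates that reordering on a made-up two-pixel buffer. It assumes a tightly packed buffer without row padding; the function name is illustrative and not part of pypdfium2 or PDFium.

```python
def bgra_to_rgba(data, width, height):
    """Swap the blue and red channels of a packed 4-byte-per-pixel buffer."""
    expected = width * height * 4
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    out = bytearray(data)
    out[0::4] = data[2::4]  # red is the third byte of each BGRA pixel
    out[2::4] = data[0::4]  # blue is the first byte of each BGRA pixel
    return bytes(out)


# Made-up 2x1 bitmap in BGRA order: one opaque blue pixel, one opaque red pixel
sample = bytes([255, 0, 0, 255, 0, 0, 255, 255])
print(list(bgra_to_rgba(sample, 2, 1)))
```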

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDF documents.
  • The Extract-URLs project extracts URLs from PDFs using pypdfium2.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
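
As an illustration of the naming convention, the sketch below splits a wheel file name into its components, following the name-version[-build]-python-abi-platform.whl layout from the packaging specifications. It is a simplified parser without validation, and the function name is made up. The file name is one of the wheels from the 1.8.0 release.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components (simplified, no validation)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of components: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        # A dot separates several platform tags (a compressed tag set)
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "pypdfium2-1.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["python"], info["abi"], info["platforms"])
```

The py3-none part indicates that the wheel is not tied to a particular interpreter version or ABI; only the platform tag differs between the binary wheels.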

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artefacts.

Testing

Run make test.

Publishing

The release process is automated using a CI workflow that pushes to GitHub, TestPyPI and PyPI. To do a release, first run make packaging locally to check that everything works as expected. If all went well, upload changes to the version file and push a new tag to trigger the Release workflow. Always make sure the information in src/pypdfium2/_version.py matches the tag!

git tag -a A.B.C
git push --tags

Once a new version is released, update the stable branch to point at the commit of the latest tag.
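
A check along the following lines could help catch a mismatch between the tag and the version file before pushing. This is only a sketch: it makes no assumption about how _version.py stores the version, so the caller passes the three numbers in, and the function name is hypothetical.

```python
def tag_matches_version(tag, major, minor, patch):
    """Return True if a tag of the form A.B.C equals the given version parts."""
    return tag.strip() == "%d.%d.%d" % (major, minor, patch)


print(tag_matches_version("1.8.0", 1, 8, 0))
print(tag_matches_version("1.8.1", 1, 8, 0))
```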

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue turns out to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
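
A script that depends on pypdfium2 could detect the affected interpreters up front. The following is a minimal sketch using only the standard library; the function and constant names are illustrative and not part of pypdfium2.

```python
import sys

# The two releases named in the limitation above
AFFECTED_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=None, implementation=None):
    """Return True for the CPython releases listed as incompatible."""
    if version_info is None:
        version_info = sys.version_info
    if implementation is None:
        implementation = sys.implementation.name
    return implementation == "cpython" and tuple(version_info[:3]) in AFFECTED_RELEASES


if is_affected():
    print("Warning: this CPython release is listed as incompatible with pypdfium2")
```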

Thanks to

Fun facts

If you are on Linux, have a recent version of LibreOffice installed, and insist on saving as much disk space as possible, you can remove the PDFium binary shipped with pypdfium2 and create a symbolic link to the one provided by LibreOffice. This is not recommended, but the following proof-of-concept steps demonstrate that it is possible. (With this strategy, certain newer functions such as FPDF_ImportNPagesToOne() are likely to be unavailable, since the PDFium build of LibreOffice may be a bit older.)

# Find out where the pypdfium2 installation is located
python3 -m pip show pypdfium2 | grep Location

# Now go to the path you happen to determine
# If pypdfium2 was installed locally (without root privileges), the path will look somewhat like this
cd ~/.local/lib/python3.8/site-packages/

# Descend into the pypdfium2 directory
cd pypdfium2/

# Delete the current PDFium binary
rm pdfium

# Create a symbolic link to the PDFium binary of LibreOffice
# The path might differ depending on the distribution - this is what applies for Ubuntu 20.04
ln -s /usr/lib/libreoffice/program/libpdfiumlo.so pdfium
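
As an alternative to parsing the output of pip show, the package directory can also be looked up from Python. This is a sketch using only the standard library; the function name is made up, and it returns None if the package cannot be found.

```python
import importlib.util
from pathlib import Path


def find_package_dir(package_name="pypdfium2"):
    """Return the directory a package would be imported from, or None."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at its __init__.py
    return Path(spec.origin).parent


print(find_package_dir())
```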

Sadly, mainstream Linux distributors have not created a dedicated package for PDFium, so it is installed separately with every single program that uses it.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.

This version

1.8.0

Download files

Source Distribution

  • pypdfium2-1.8.0.tar.gz (622.5 kB; Source)

Built Distributions

  • pypdfium2-1.8.0-py3-none-win_arm64.whl (2.5 MB; Python 3, Windows ARM64)
  • pypdfium2-1.8.0-py3-none-win_amd64.whl (2.6 MB; Python 3, Windows x86-64)
  • pypdfium2-1.8.0-py3-none-win32.whl (2.5 MB; Python 3, Windows x86)
  • pypdfium2-1.8.0-py3-none-musllinux_1_2_x86_64.whl (2.8 MB; Python 3, musllinux: musl 1.2+ x86-64)
  • pypdfium2-1.8.0-py3-none-musllinux_1_2_i686.whl (2.9 MB; Python 3, musllinux: musl 1.2+ i686)
  • pypdfium2-1.8.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB; Python 3, manylinux: glibc 2.17+ x86-64)
  • pypdfium2-1.8.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (2.8 MB; Python 3, manylinux: glibc 2.17+ i686)
  • pypdfium2-1.8.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.5 MB; Python 3, manylinux: glibc 2.17+ ARMv7l)
  • pypdfium2-1.8.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.7 MB; Python 3, manylinux: glibc 2.17+ ARM64)
  • pypdfium2-1.8.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl (2.6 MB; Python 3, macOS 11.0+ ARM64, macOS 12.0+ ARM64)
  • pypdfium2-1.8.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl (2.8 MB; Python 3, macOS 10.11+ x86-64, macOS 11.0+ x86-64, macOS 12.0+ x86-64)
