Python bindings to PDFium

Project description

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, setuptools-scm, wheel, build, and ctypesgen are needed. Also make sure that your pip version is up to date. For more information, please refer to dependencies.md.
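The prerequisites listed above can be checked before starting a build. The following is a minimal sketch using only the standard library; it is not part of pypdfium2, and the import names (`setuptools_scm`, `ctypesgen`, etc.) are assumed to correspond to the distributions listed above.

```python
import importlib.util
import shutil

# System tools that must be available in PATH
REQUIRED_TOOLS = ["git", "gcc"]
# Assumed import names of the Python dependencies listed above
REQUIRED_MODULES = ["setuptools", "setuptools_scm", "wheel", "build", "ctypesgen"]


def missing_prerequisites(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return the names of tools and modules that could not be found."""
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec() returns None for a top-level module that is not installed
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All build prerequisites found.")
```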

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can do the following:

make build

In case building failed, you could try

python3 platform_setup/build_pdfium.py --nativebuild --check-deps
PYP_TARGET_PLATFORM="sourcebuild" python3 -m pip install . -v --no-build-isolation

to prefer system-provided build tools over the toolchain PDFium ships with. The reason is that this toolchain is limited to a curated set of platforms, as PDFium targets cross-compilation for "non-standard" architectures. (Make sure you have installed all packages from the Nativebuild Extras section of dependencies.md, in addition to the default requirements.)
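If you prefer to drive this fallback build from Python (for example from a CI script), the two commands above can be wrapped with `subprocess`. This is a sketch that only mirrors the commands and the `PYP_TARGET_PLATFORM` variable shown above; the helper names are hypothetical, and it assumes it is run from the repository root.

```python
import os
import subprocess
import sys


def sourcebuild_steps(python=sys.executable):
    """Return (argv, extra_env) pairs mirroring the fallback build commands."""
    return [
        # Build PDFium with system-provided tools instead of the bundled toolchain
        ([python, "platform_setup/build_pdfium.py", "--nativebuild", "--check-deps"], {}),
        # Install pypdfium2 against the freshly built PDFium binary
        (
            [python, "-m", "pip", "install", ".", "-v", "--no-build-isolation"],
            {"PYP_TARGET_PLATFORM": "sourcebuild"},
        ),
    ]


def run_steps(steps):
    """Run each step with the extra environment variables; stop on the first failure."""
    for argv, extra_env in steps:
        env = {**os.environ, **extra_env}
        subprocess.run(argv, env=env, check=True)


# Usage, from the repository root:
# run_steps(sourcebuild_steps())
```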

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/
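To rasterise every PDF in a folder, the command above can be assembled programmatically. The following sketch only uses the `render` subcommand and the `-o`/`--scale` options shown above; the helper names are hypothetical, and creating the output directory beforehand is a precaution rather than a documented requirement of the CLI.

```python
import shutil
import subprocess
from pathlib import Path


def render_command(pdf_dir, output_dir, scale=None):
    """Build a single `pypdfium2 render` command covering every PDF in pdf_dir."""
    pdfs = sorted(str(p) for p in Path(pdf_dir).glob("*.pdf"))
    if not pdfs:
        raise FileNotFoundError(f"no PDF files in {pdf_dir}")
    argv = ["pypdfium2", "render", *pdfs, "-o", str(output_dir)]
    if scale is not None:
        argv += ["--scale", str(scale)]
    return argv


def render_directory(pdf_dir, output_dir, scale=None):
    """Run the command built above; requires the pypdfium2 CLI in PATH."""
    if shutil.which("pypdfium2") is None:
        raise RuntimeError("the pypdfium2 command is not in PATH")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(render_command(pdf_dir, output_dir, scale), check=True)
```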

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for each individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument:

doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()

Open a PDF using the context manager PdfContext:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the raw PDFium document handle, e.g.
    page_count = pdfium.FPDF_GetPageCount(pdf)

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page_topil(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = (255, 255, 255, 255),
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()
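PDF page dimensions are measured in points (1/72 inch), so scale = 1 corresponds to 72 DPI and the output size grows linearly with scale. The helpers below are a sketch for choosing a scale from a target resolution and predicting the resulting image size. They are not part of pypdfium2; the rounding with math.ceil (as in the raw API example further down) and the width/height swap for rotations of 90 and 270 degrees are assumptions about how the renderer sizes its output.

```python
import math

POINTS_PER_INCH = 72  # PDF user space unit: 1 pt = 1/72 inch


def scale_for_dpi(dpi):
    """Scale factor that renders a page at the given resolution."""
    return dpi / POINTS_PER_INCH


def output_size(width_pt, height_pt, scale=1, rotation=0):
    """Predict the pixel size of a rendered page (assumed rounding: ceil)."""
    if rotation not in (0, 90, 180, 270):
        raise ValueError("rotation must be 0, 90, 180 or 270 degrees")
    width = math.ceil(width_pt * scale)
    height = math.ceil(height_pt * scale)
    # Quarter turns swap the page's width and height
    if rotation in (90, 270):
        width, height = height, width
    return width, height
```

For an A4 page of about 595 x 842 pt, output_size(595, 842, scale=scale_for_dpi(300)) gives (2480, 3509).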

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf_topil(filename):
    image.save('out_%s.png' % suffix)
    image.close()

Read the table of contents:

doc = pdfium.PdfDocument(filepath)
for item in doc.get_toc():
    print(
        '    ' * item.level +
        "{} -> {}  # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()
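The indentation logic above can be tried without opening a PDF. The sketch below uses a hypothetical TocItem named tuple that only mimics the attributes accessed in the example (level, title, page_index, view_mode, view_pos); it is not pypdfium2's own item type, and representing a bookmark without a target page as page_index = None is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in carrying the attribute names used in the example above
TocItem = namedtuple("TocItem", "level title page_index view_mode view_pos")


def format_toc(items, indent="    "):
    """Render outline items as indented text, one line per bookmark."""
    lines = []
    for item in items:
        # page_index is zero-based; add 1 for human-readable page numbers
        page = "?" if item.page_index is None else item.page_index + 1
        lines.append(f"{indent * item.level}{item.title} -> {page}")
    return "\n".join(lines)
```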

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
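Two pixel-format details in the example above are easy to get wrong: FPDFBitmap_FillRect() takes its colour as a 32-bit ARGB integer (so 0xFFFFFFFF is opaque white), and the bitmap buffer stores each pixel as blue, green, red, then alpha (or unused) byte, which is why Image.frombuffer() is given the "BGRA" raw mode. The pure-Python helpers below are an illustrative sketch, not part of pypdfium2, that make both conversions explicit; for real images, letting PIL perform the swap as above is much faster.

```python
def argb(red, green, blue, alpha=255):
    """Pack 8-bit channels into the 0xAARRGGBB integer FPDFBitmap_FillRect expects."""
    for channel in (red, green, blue, alpha):
        if not 0 <= channel <= 255:
            raise ValueError("channel values must be in range 0..255")
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def bgra_to_rgba(buffer):
    """Reorder a BGRA pixel buffer to RGBA by swapping bytes 0 and 2 of each pixel."""
    if len(buffer) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    rgba = bytearray(buffer)
    # Extended-slice assignment swaps the blue and red channels of every pixel
    rgba[0::4], rgba[2::4] = buffer[2::4], buffer[0::4]
    return bytes(rgba)
```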

For more examples of using the raw API, take a look at the support model source code and the examples directory.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. When in doubt, refer to the inline documentation in PDFium's source code.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your option.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDF documents.
  • The Extract-URLs project extracts URLs from PDFs using pypdfium2.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.
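ctypesgen turns C header declarations into ctypes code. The snippet below is a hand-written sketch of the kind of declaration such a generator produces (a loaded shared library plus argtypes/restype for each function), using the C library's strlen() as a stand-in because it is available on most Unix-like systems; the actual generated pypdfium2 bindings are considerably more elaborate.

```python
import ctypes
import ctypes.util

# Load a shared library; generated bindings do the same for the PDFium binary
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call the C function; char * parameters expect bytes, so encode str first."""
    return strlen(text.encode("utf-8"))
```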

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
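Per the wheel specification (PEP 427), a wheel file name has the form {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl, and the platform part may be a compressed tag set of several tags separated by dots, as in the manylinux and macOS wheels of this release. The following is a minimal standard-library sketch for splitting such names; the packaging library's parse_wheel_filename() is the more thorough option.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platforms: Tuple[str, ...]


def parse_wheel_name(filename):
    """Split a wheel file name into its components (tag values are not validated)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    # A compressed platform tag set lists every platform the wheel is compatible with
    return WheelName(name, version, build, python, abi, tuple(plat.split(".")))
```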

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make packaging. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artifacts.

Testing

Run make test.

Publishing

Starting from version 1.3.0, the release process is automated using a CI workflow that pushes to GitHub, TestPyPI and PyPI. To do a release, first run make packaging locally to check that everything works as expected. Then add, commit and push any changes to the version file. Finally, add and push a tag to trigger the Release workflow, and monitor its progress using the GitHub Actions panel:

git tag -a A.B.C
git push --tags

Always make sure the information in src/pypdfium2/_version.py matches the tag!
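A pre-release check along these lines can catch a mismatch before the tag is pushed. This is a hypothetical helper, not one of the repository's scripts: it assumes the version string has already been read from src/pypdfium2/_version.py (whose layout is not described here) and that tags use the plain A.B.C form shown above.

```python
import re
import subprocess

TAG_PATTERN = re.compile(r"\d+\.\d+\.\d+")  # A.B.C, as in `git tag -a A.B.C`


def latest_tag():
    """Return the most recent tag reachable from HEAD (requires git)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def release_tag_problems(tag, version):
    """List the reasons why tag is not a valid release tag for version."""
    problems = []
    if not TAG_PATTERN.fullmatch(tag):
        problems.append(f"tag {tag!r} is not of the form A.B.C")
    if tag != version:
        problems.append(f"tag {tag!r} does not match version {version!r}")
    return problems


# Usage: release_tag_problems(latest_tag(), "1.6.0") should return []
```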

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If the cause of an issue could be determined to be in PDFium, the problem needs to be reported at the PDFium bug tracker. For discussion and general questions, also consider joining the PDFium mailing list.

Issues related to pre-compiled packages should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
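Applications that depend on pypdfium2 could fail early with a clear message instead of running into broken string handling. A minimal sketch of such a guard follows; the helper name and error message are illustrative, not part of pypdfium2.

```python
import sys

# CPython releases whose regression broke ctypesgen-generated string handling
AFFECTED_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected(version_info=sys.version_info):
    """True if the given interpreter version is one of the incompatible releases."""
    return tuple(version_info[:3]) in AFFECTED_RELEASES


if is_affected():
    raise RuntimeError(
        "pypdfium2 cannot be used with CPython %d.%d.%d; "
        "please use a different release" % tuple(sys.version_info[:3])
    )
```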

Fun facts

If you are on Linux, have a recent version of LibreOffice installed, and want to save as much disk space as possible, you can remove the PDFium binary shipped with pypdfium2 and create a symbolic link to the one provided by LibreOffice. This is not recommended, but the following proof-of-concept steps demonstrate that it is possible. (With this strategy, certain newer functions such as FPDF_ImportNPagesToOne() will likely be unavailable, since LibreOffice's PDFium build may be somewhat older.)

# Find out where the pypdfium2 installation is located
python3 -m pip show pypdfium2 |grep Location

# Now go to the path you happen to determine
# If pypdfium2 was installed locally (without root privileges), the path will look somewhat like this
cd ~/.local/lib/python3.8/site-packages/

# Descend into the pypdfium2 directory
cd pypdfium2/

# Delete the current PDFium binary
rm pdfium

# Create a symbolic link to the PDFium binary of LibreOffice
# The path might differ depending on the distribution - this is what applies for Ubuntu 20.04
ln -s /usr/lib/libreoffice/program/libpdfiumlo.so pdfium
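Whether a particular PDFium build exports the functions you rely on (such as FPDF_ImportNPagesToOne(), mentioned above) can be checked with ctypes before switching binaries. This is a sketch, not a pypdfium2 feature; the LibreOffice path in the usage comment is the Ubuntu 20.04 example from above and may differ on your system.

```python
import ctypes


def available_symbols(library_path, names):
    """Map each function name to whether the shared library exports it."""
    library = ctypes.CDLL(library_path)
    # hasattr() is False when the symbol lookup raises AttributeError
    return {name: hasattr(library, name) for name in names}


# Usage:
# available_symbols("/usr/lib/libreoffice/program/libpdfiumlo.so",
#                   ["FPDF_LoadDocument", "FPDF_ImportNPagesToOne"])
```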

Sadly, mainstream Linux distributions have not created a dedicated package for PDFium, so it ends up being installed separately with every single program that uses it.

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it still did not provide platform-specific wheels.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pypdfium2-1.6.0.tar.gz (356.8 kB)

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • pypdfium2-1.6.0-py3-none-win_arm64.whl (2.4 MB): Python 3, Windows ARM64
  • pypdfium2-1.6.0-py3-none-win_amd64.whl (2.6 MB): Python 3, Windows x86-64
  • pypdfium2-1.6.0-py3-none-win32.whl (2.5 MB): Python 3, Windows x86
  • pypdfium2-1.6.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • pypdfium2-1.6.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (2.8 MB): Python 3, manylinux: glibc 2.17+ i686
  • pypdfium2-1.6.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • pypdfium2-1.6.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.7 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • pypdfium2-1.6.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl (2.6 MB): Python 3, macOS 11.0+ ARM64, macOS 12.0+ ARM64
  • pypdfium2-1.6.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl (2.8 MB): Python 3, macOS 10.11+ x86-64, macOS 11.0+ x86-64, macOS 12.0+ x86-64

File details

Details for the file pypdfium2-1.6.0.tar.gz.

File metadata

  • Download URL: pypdfium2-1.6.0.tar.gz
  • Size: 356.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.6.0.tar.gz
Algorithm Hash digest
SHA256 8a45013788c5f0097bd87b900404e861bddc5bd0013529dba7fc594e943ea44c
MD5 6f52cc06d6b6ed6975ba5c3690d01e0e
BLAKE2b-256 25e3702630cb6a140397c11cfb2827c4b4c87d1d0eecf8b6dea726cb54b5ddc6

See more details on using hashes here.

File details

Details for the file pypdfium2-1.6.0-py3-none-win_arm64.whl.

File metadata

  • Download URL: pypdfium2-1.6.0-py3-none-win_arm64.whl
  • Size: 2.4 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.6.0-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 1a90030f158e856032f85a07e9a592bee498def21eda2f448c749867e839fa63
MD5 5ed7a1b1bfd45fc3cb5b34ca2f083082
BLAKE2b-256 483baee44415fd53eea461682386018bffff2563c3fecc304b3471605c0948e8

File details

Details for the file pypdfium2-1.6.0-py3-none-win_amd64.whl.

File metadata

  • Download URL: pypdfium2-1.6.0-py3-none-win_amd64.whl
  • Size: 2.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.6.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 33749fa616ef54db2efe7e7e0ac49878338d25c554939a44662798cd0c1a3606
MD5 d4fff01071f2623e8d549e120a9d8c10
BLAKE2b-256 43aff8c19b745a6471c42e751dd7d95fb4e6cafb9eba8217f78bdcee022b042d

File details

Details for the file pypdfium2-1.6.0-py3-none-win32.whl.

File metadata

  • Download URL: pypdfium2-1.6.0-py3-none-win32.whl
  • Size: 2.5 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for pypdfium2-1.6.0-py3-none-win32.whl
Algorithm Hash digest
SHA256 a2ff20c9f38394672a6e62e2bd271380d07d30195054515913049419861d6689
MD5 77f9fd6a2fdecfba8d529e5894a916ad
BLAKE2b-256 1d80211bf55079b27b2161e6fef9cd21df2d808359500d8ddd6268304758c7f4

File details

Details for the file pypdfium2-1.6.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pypdfium2-1.6.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3d4410d21f30b56aa4c865e1f6698b909b865e6531af31ba4ffabced01f74e62
MD5 e3b34877317f0425425120b8c4f91cda
BLAKE2b-256 9044116718e65b272988517a9e5a12dfc0bd36bdd84209e864b30a6733470606

File details

Details for the file pypdfium2-1.6.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for pypdfium2-1.6.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 6c6d28e5dfc0631d3679bd31cab0f213a168235e6f2bb51279f97a43366664a1
MD5 a951ff1459759a735f21b756aadbd277
BLAKE2b-256 153fcf26fc458f4d8c16b0db55778f1d68daf87bc2846055507e846ac2442acd

File details

Details for the file pypdfium2-1.6.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for pypdfium2-1.6.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 b17e77fc7bae07dd6c1701c562fda6b5e18b8207ee1954afd08cf3b845ee32a2
MD5 43a7dcb4aa4417b3a75039c6c3d9e6f7
BLAKE2b-256 7e26c8308443fb60e5a0a522cfcfa718cc1b4ff679928d86a4469c03d911e1b8

File details

Details for the file pypdfium2-1.6.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for pypdfium2-1.6.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 8f5ac5d5e767cfdbfcb4497f5386907857b1f44896c58fe6a7ef0e74f6cd5948
MD5 32028045e298f1961b3f182f826abb6a
BLAKE2b-256 79bcc4d10a06f89108c36cf9ef596c8634cdb75f6dd72d4085ac7468dfaa1577

File details

Details for the file pypdfium2-1.6.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl.

File hashes

Hashes for pypdfium2-1.6.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 bb0f4935bba8ac861d6398fdf79279ddfaf83b257fcee5513dd72c8a2a46a98d
MD5 7094398d23d22ddbc302670b990c145d
BLAKE2b-256 71b71a179760a9e192e5292da9b293f1f84d449e7ab4456423e113140b08feda

File details

Details for the file pypdfium2-1.6.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl.

File hashes

Hashes for pypdfium2-1.6.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 c628b558e058ec652fa3d3bf58783f62821f1b1a07ac74fc9fca2b849f18c3f6
MD5 c8b5b775dcec70ff4507afdddac8dd01
BLAKE2b-256 c15840b6f3a6b5bfbc99594758b89d82dffb93b0696ce05d01bcb083e6693109
