Skip to main content

Python bindings to PDFium

Project description

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, setuptools-scm wheel, build, and ctypesgen are needed. Also make sure that your pip version is up-to-date. For more information, please refer to dependencies.md.

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can do the following:

make build

In case building failed, you could try

python3 platform_setup/build_pdfium.py --nativebuild --check-deps
PYP_TARGET_PLATFORM="sourcebuild" python3 -m pip install . -v --no-build-isolation

to prefer the use of system-provided build tools over the toolchain PDFium ships with. The problem is that the toolchain is limited to a curated set of platforms, as PDFium target cross-compilation for "non-standard" architectures. (Make sure you installed all packages from the Native Build section of dependencies.md, in addition to the default requirements.)

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Individual help for each subcommand is available can be accessed in the same way (pypdfium any_subcommand help)

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument:

doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()

Open a PDF using the context manager PdfContext:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the pdf

Open a PDF using the function open_pdf_auto():

pdf, loader_data = pdfium.open_pdf_auto(filename)
# ... work with the pdf
pdfium.close_pdf(pdf, loader_data)

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page_topil(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = (255, 255, 255, 255),
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf_topil(filename):
    image.save(f'out_{suffix}.png')
    image.close()

Read the table of contents:

doc = pdfium.PdfDocument(filepath)
for item in doc.get_toc():
    print(
        '    ' * item.level +
        "{} -> {}  # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

render_args = [bitmap, page, 0, 0, width, height, 0,  pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)

For more examples of using the raw API, take a look at the support model source code and the examples directory.

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available by the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

In Use

  • The doctr OCR library uses pypdfium2 to rasterise PDF documents.
  • The Extract-URLs project extracts URLs from PDFs using pypdfium2.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make release. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artifacts.

Testing

Run make test.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If the cause of an issue could be determined to be in PDFium, the problem needs to be reported at the PDFium bug tracker.

Issues related to pre-compiled binaries should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.

Problems with FPDF_LoadMemDocument()

The FPDF_LoadMemDocument() function to open PDF documents from bytes behaves weirdly. Tests fail and the output files look quite broken if all file opening is directed through this function. It is hard to determine which external component involved is causing these issues. The recommended way for document access are the support models PdfDocument or PdfContext, which use FPDF_LoadCustomDocument() for bytes or byte buffers, and FPDF_LoadDocument() for file paths.

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pypdfium2-1.1.0.tar.gz (354.3 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pypdfium2-1.1.0-py3-none-win_arm64.whl (2.4 MB view details)

Uploaded Python 3Windows ARM64

pypdfium2-1.1.0-py3-none-win_amd64.whl (2.6 MB view details)

Uploaded Python 3Windows x86-64

pypdfium2-1.1.0-py3-none-win32.whl (2.5 MB view details)

Uploaded Python 3Windows x86

pypdfium2-1.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ x86-64

pypdfium2-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (2.8 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ i686

pypdfium2-1.1.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.5 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARMv7l

pypdfium2-1.1.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.7 MB view details)

Uploaded Python 3manylinux: glibc 2.17+ ARM64

pypdfium2-1.1.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl (2.6 MB view details)

Uploaded Python 3macOS 11.0+ ARM64macOS 12.0+ ARM64

pypdfium2-1.1.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl (2.8 MB view details)

Uploaded Python 3macOS 10.11+ x86-64macOS 11.0+ x86-64macOS 12.0+ x86-64

File details

Details for the file pypdfium2-1.1.0.tar.gz.

File metadata

  • Download URL: pypdfium2-1.1.0.tar.gz
  • Upload date:
  • Size: 354.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-1.1.0.tar.gz
Algorithm Hash digest
SHA256 ca4d1e60184a9f98c9404ca85137250a28b4da727aeedfa2c13424b20079c6a2
MD5 89f3cdc74a726613072c3e167431e0f5
BLAKE2b-256 64f902c549d1c0ddb2f4101b2e1946aa3b041ba859bb506b6a4227b3022cc8cd

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-win_arm64.whl.

File metadata

  • Download URL: pypdfium2-1.1.0-py3-none-win_arm64.whl
  • Upload date:
  • Size: 2.4 MB
  • Tags: Python 3, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-1.1.0-py3-none-win_arm64.whl
Algorithm Hash digest
SHA256 420b3667cd0645dba961a6e60c7cbab8de7d3c77152cd659043b2a03fbf7a4cd
MD5 60ea1a229cf9dfac4928144a36273ca5
BLAKE2b-256 c2262733c919a36e78a0e0cbdfbfd20e14caf365482857547738c01f16fee861

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-win_amd64.whl.

File metadata

  • Download URL: pypdfium2-1.1.0-py3-none-win_amd64.whl
  • Upload date:
  • Size: 2.6 MB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-1.1.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 196100dd345761060c6387b0cfd98c64088f0873a6a8c6eb99f52a778ebce486
MD5 58fd604ec108f84d8cafa612c6e2c86e
BLAKE2b-256 979258fb2ded6d8f6d0cd1473d224defef04571fc45c80045237b7b20f3b1306

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-win32.whl.

File metadata

  • Download URL: pypdfium2-1.1.0-py3-none-win32.whl
  • Upload date:
  • Size: 2.5 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-1.1.0-py3-none-win32.whl
Algorithm Hash digest
SHA256 0e5f27af5958e6359f785986cdd58cd4c92b65ebfc0f362bd9d3fc74a0ec5788
MD5 2acaa605ea511a44fe22a25821cdc20b
BLAKE2b-256 b42384c22bf44c23661b64bdd35e6a8ef1fb1f9f9254a002e7f3e8cee04a2db1

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pypdfium2-1.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c772a40c8def0495b56dd7b66c68b9da5aeebc2050adcf99cc2d34ea2951f062
MD5 3661d100318688084f420804eeda2764
BLAKE2b-256 22f6cb1fa3ac641944f99b2b94f8648bd80b31d20cb7b8bd657652bdddf57081

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl.

File metadata

  • Download URL: pypdfium2-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: Python 3, manylinux: glibc 2.17+ i686
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 084f5c567dacc959209668cbf47fd39f31bd208549713a19d49d7b859dbf4e43
MD5 c80412763bd19332843bb9259586c434
BLAKE2b-256 911733446e13ee22fcaebe663640d21dfe4dbfcddc0cdfa4ec8437f251a48046

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File metadata

File hashes

Hashes for pypdfium2-1.1.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 3af94bdb47aa558064b12991bbf09b589ca26cc683f0b341b474ca048b45b8c3
MD5 076c050ff87d89e4c67be5bce1956811
BLAKE2b-256 2af41050cec01ca7b57a46397dbd35b1cf0a3e55001e22abdd39a32c977d15cd

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for pypdfium2-1.1.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 4ad5d7adc1e84e07657d6458f830a47b0abeeeed3048d6578a27470ca84c2459
MD5 11b07ca0883dd420da77e3d8157da362
BLAKE2b-256 7df2f8ffd5b4f20f5c254c45fea833acc1f76293caf054854cd5fbb8b5c1194b

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl.

File metadata

  • Download URL: pypdfium2-1.1.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl
  • Upload date:
  • Size: 2.6 MB
  • Tags: Python 3, macOS 11.0+ ARM64, macOS 12.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.10

File hashes

Hashes for pypdfium2-1.1.0-py3-none-macosx_11_0_arm64.macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 2704335e4255b23305bbd79b209c39a17197fe5702afcc30e5f9eb3f4248e006
MD5 ff86919df15742471686b11a874fc245
BLAKE2b-256 d6b163be1e60edd112ae690bb5e58ee1784dd0430735889576207d3c1d146e1f

See more details on using hashes here.

File details

Details for the file pypdfium2-1.1.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl.

File metadata

File hashes

Hashes for pypdfium2-1.1.0-py3-none-macosx_10_11_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 1f1086452b4730dc28cd51749c72e4b9f393e5b5049e8e29d6e46ccb3ad4fbc8
MD5 f51c5883305542de03d6011240d3454f
BLAKE2b-256 f0fb0d76af00c5cbb56fb6868bc14d54659d8a1844d249267bb14110bf54fc8c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page