Python bindings to PDFium

pypdfium2

pypdfium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.

Install/Update

Install from PyPI

pip3 install --no-build-isolation -U pypdfium2

Manual installation

The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, setuptools-scm, wheel, build, and ctypesgen are needed. Also make sure that your pip version is up to date. For more information, please refer to dependencies.md.

Package locally

To get pre-compiled binaries, generate bindings and install pypdfium2, you may run

make install

in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.

Source build

If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can do the following:

make build

If the build fails, you can try

python3 platform_setup/build_pdfium.py --nativebuild --check-deps
PYP_TARGET_PLATFORM="sourcebuild" python3 -m pip install . -v --no-build-isolation

to prefer system-provided build tools over the toolchain PDFium ships with. That toolchain is limited to a curated set of platforms, as PDFium targets cross-compilation for "non-standard" architectures. (Make sure you have installed all packages from the Native Build section of dependencies.md, in addition to the default requirements.)

Examples

Using the command-line interface

Rasterise a PDF document:

pypdfium2 render document.pdf -o output_dir/ --scale 3

You may also rasterise multiple files at once:

pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/

Show the table of contents for a PDF:

pypdfium2 toc document.pdf

To obtain a list of subcommands, run pypdfium2 help. Help for each individual subcommand can be accessed in the same way (pypdfium2 any_subcommand help).

CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html
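
If you need to drive the command-line interface from a script, the render command above can be wrapped with subprocess. This is a minimal sketch, not part of pypdfium2: the helper names build_render_command and render are hypothetical, and only the render subcommand and the -o and --scale options shown above are used.

```python
import subprocess


def build_render_command(inputs, output_dir, scale=None):
    """Assemble the argument list for `pypdfium2 render` (hypothetical helper)."""
    cmd = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        # --scale is passed exactly as in the command-line example above
        cmd += ["--scale", str(scale)]
    return cmd


def render(inputs, output_dir, scale=None):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_render_command(inputs, output_dir, scale), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.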

Using the support model

Import pypdfium2:

import pypdfium2 as pdfium

Open a PDF using the helper class PdfDocument:

doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()

Open a PDF using the context manager PdfContext:

with pdfium.PdfContext(filename) as pdf:
    # ... work with the pdf
    pass

Open a PDF using the function open_pdf_auto():

pdf, loader_data = pdfium.open_pdf_auto(filename)
# ... work with the pdf
pdfium.close_pdf(pdf, loader_data)

Render a single page:

with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page_topil(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = (255, 255, 255, 255),
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )

pil_image.save("out.png")
pil_image.close()

Render multiple pages concurrently:

for image, suffix in pdfium.render_pdf_topil(filename):
    image.save(f'out_{suffix}.png')
    image.close()

Read the table of contents:

doc = pdfium.PdfDocument(filepath)
for item in doc.get_toc():
    print(
        '    ' * item.level +
        "{} -> {}  # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()
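
To illustrate the indentation logic of the loop above without opening a PDF, here is a small sketch that formats stand-in items. TocItem is a hypothetical namedtuple that only mimics the level, title and page_index attributes used above; it is not the class returned by get_toc(), and the view_mode and view_pos fields are left out for brevity.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the attributes used for formatting
TocItem = namedtuple("TocItem", "level title page_index")


def format_toc(items):
    """Indent each entry by four spaces per nesting level and show 1-based page numbers."""
    return [
        "    " * item.level + "{} -> {}".format(item.title, item.page_index + 1)
        for item in items
    ]


if __name__ == "__main__":
    sample = [TocItem(0, "Introduction", 0), TocItem(1, "Scope", 2)]
    print("\n".join(format_toc(sample)))
```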

Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html

Using the PDFium API

Rendering the first page of a PDF document:

import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

# Open the document (the second argument is the password) and check it has pages
doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

# Set up a form fill environment so that form fields can be drawn
form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

# Load the first page and round its size up to whole pixels
page = pdfium.FPDF_LoadPage(doc, 0)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

# Create a bitmap without alpha channel and fill it with white
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

# Arguments: bitmap, page, start x, start y, width, height, rotation, flags
render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

# Access the pixel data as an array of 4 bytes per pixel
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))

# PDFium stores pixels in BGRA order; let PIL decode them into an RGBA image
img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

# Release resources in reverse order of acquisition
pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)

pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)

For more examples of using the raw API, take a look at the support model source code and the examples directory.
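
The buffer arithmetic in the rendering example can be checked in isolation. The following sketch uses only the standard library and does not call PDFium. The function names are hypothetical, and the scale parameter is an assumption: the raw example above renders at one pixel per PDF point, which corresponds to scale 1 here.

```python
import math

BYTES_PER_PIXEL = 4  # one byte each for blue, green, red and alpha


def bitmap_size(page_width, page_height, scale=1.0):
    """Return (width, height, buffer length in bytes) for a page size given in points."""
    width = math.ceil(page_width * scale)
    height = math.ceil(page_height * scale)
    return width, height, width * height * BYTES_PER_PIXEL


def bgra_to_rgba(data):
    """Swap the blue and red channels of tightly packed 4-byte pixels."""
    out = bytearray(data)
    out[0::4] = data[2::4]  # red moves to the first byte of each pixel
    out[2::4] = data[0::4]  # blue moves to the third byte of each pixel
    return bytes(out)


if __name__ == "__main__":
    # A US Letter page measures 612 x 792 points
    print(bitmap_size(612, 792))
```

In the example above, PIL performs the channel swap itself through the "BGRA" raw mode, so bgra_to_rgba only illustrates what that conversion does.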

Documentation for the PDFium API is available. pypdfium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.

Licensing

PDFium and pypdfium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.

Various other open-source licenses apply to the dependencies of PDFium. License texts for PDFium and its dependencies are included in the file LicenseRef-PdfiumThirdParty.txt, which is also shipped with binary redistributions.

Documentation and examples of pypdfium2 are CC-BY-4.0 licensed.

Development

PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.

Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.

For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
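
As a rough illustration of the naming scheme, the sketch below splits a wheel file name into its components, following the general layout name-version[-build]-python-abi-platform.whl from the packaging specification. It is a simplified parser written for this README, not a pypdfium2 function, and it does not validate the individual tags.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its tags (simplified, hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # An optional build tag sits between the version and the Python tag
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        # Several platform tags may be compressed into one, separated by dots
        "platforms": platform_tag.split("."),
    }


if __name__ == "__main__":
    print(parse_wheel_filename("pypdfium2-1.0.0-py3-none-win_amd64.whl"))
```

The tags py3 and none express that the wheel is not tied to a specific Python minor version or ABI, so only the platform tag differs between builds.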

pypdfium2 contains scripts to automate the release process:

  • To build the wheels, run make release. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
  • To clean up after a release, run make clean. This will remove downloaded files and build artifacts.

Testing

Run make test.

Publishing the wheels

  • You may want to upload to TestPyPI first to ensure everything works as expected:
    twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
    
  • If all went well, upload to the real PyPI:
    twine upload dist/*
    

Issues

Since pypdfium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the pypdfium2 issues panel is always a good place to start if you have any problems, questions or suggestions.

If an issue is determined to be caused by PDFium itself, it needs to be reported at the PDFium bug tracker.

Issues related to pre-compiled binaries should be discussed at pdfium-binaries, though.

If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.

Known limitations

Incompatibility with CPython 3.7.6 and 3.8.1

pypdfium2 cannot be used with releases 3.7.6 and 3.8.1 of the CPython interpreter due to a regression that broke ctypesgen-created string handling code.
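
Applications that depend on pypdfium2 may want to detect these interpreter releases early and fail with a clear message. The following is a minimal sketch using only the standard library; the function names are hypothetical and not part of pypdfium2.

```python
import platform
import sys

# CPython releases named above as incompatible
AFFECTED_CPYTHON_RELEASES = {(3, 7, 6), (3, 8, 1)}


def is_affected(version=None, implementation=None):
    """Return True if the given (or running) interpreter is an affected CPython release."""
    version = tuple(version or sys.version_info[:3])
    implementation = implementation or platform.python_implementation()
    return implementation == "CPython" and version[:3] in AFFECTED_CPYTHON_RELEASES


def ensure_supported_interpreter():
    """Raise RuntimeError before importing pypdfium2 on an affected interpreter."""
    if is_affected():
        raise RuntimeError(
            "CPython {}.{}.{} is incompatible with pypdfium2".format(*sys.version_info[:3])
        )
```

Calling ensure_supported_interpreter() before import pypdfium2 reports the problem explicitly instead of failing later inside the generated bindings.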

Problems with FPDF_LoadMemDocument()

The FPDF_LoadMemDocument() function, which opens PDF documents from bytes, behaves unreliably: tests fail and the output files look quite broken if all file opening is routed through it. It is hard to determine which external component is causing these issues. The recommended way to access documents is through the support model helpers PdfDocument or PdfContext, which use FPDF_LoadCustomDocument() for bytes or byte buffers, and FPDF_LoadDocument() for file paths.
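
For background, FPDF_LoadCustomDocument() does not take the data directly. It reads the document through a block-reading callback supplied by the caller. The sketch below shows what such a callback can look like with plain ctypes. It assumes the callback shape declared for FPDF_FILEACCESS.m_GetBlock in PDFium's public header; it is not the support model's actual implementation, and the names GetBlockFunc and make_get_block are hypothetical.

```python
import ctypes

# Assumed C signature of the callback:
# int m_GetBlock(void* param, unsigned long position, unsigned char* pBuf, unsigned long size)
GetBlockFunc = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.c_void_p,
    ctypes.c_ulong,
    ctypes.POINTER(ctypes.c_ubyte),
    ctypes.c_ulong,
)


def make_get_block(data):
    """Return a ctypes callback that serves blocks of an in-memory bytes object."""

    def get_block(_param, position, p_buf, size):
        if position + size > len(data):
            return 0  # requested range is out of bounds: report failure
        # Copy the requested slice into the buffer provided by the caller
        ctypes.memmove(p_buf, data[position:position + size], size)
        return 1  # a non-zero value reports success

    return GetBlockFunc(get_block)
```

The callback object must be kept alive for as long as the library may call it, otherwise ctypes can release it while it is still referenced from C code.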

Thanks

Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.

History

pypdfium2 is the successor of pypdfium and pypdfium-reboot.

The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.

pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.

pypdfium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, pypdfium2 includes facilities to build PDFium from source, to extend platform compatibility.
