Python bindings to PDFium
PyPDFium2
PyPDFium2 is a Python 3 binding to PDFium, the liberal-licensed PDF rendering library authored by Foxit and maintained by Google.
Install/Update
Install from PyPI
pip3 install -U pypdfium2
Manual installation
The following steps require the system tools git and gcc to be installed and available in PATH. In addition, the Python dependencies setuptools, setuptools-scm, wheel, build, and ctypesgen are needed. Also make sure that your pip version is up-to-date.
For more information, please refer to dependencies.md.
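Before starting, it can help to confirm that the prerequisites are in place. The following is a minimal sketch using only the standard library. The helper name `missing_prerequisites` is hypothetical, and the import names are assumptions derived from the package names listed above (for instance, `setuptools-scm` is assumed to import as `setuptools_scm`).

```python
import importlib.util
import shutil

# System tools and Python modules named in the paragraph above.
# The import names are assumptions derived from the package names.
SYSTEM_TOOLS = ("git", "gcc")
PYTHON_MODULES = ("setuptools", "setuptools_scm", "wheel", "build", "ctypesgen")


def missing_prerequisites(tools=SYSTEM_TOOLS, modules=PYTHON_MODULES):
    """Return (missing_tools, missing_modules) as two lists of names."""
    # shutil.which() returns None if the executable is not found in PATH
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec() returns None if a top-level module cannot be located
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_prerequisites()
    print("Missing system tools:", ", ".join(tools) or "none")
    print("Missing Python modules:", ", ".join(modules) or "none")
```

An empty result for both lists only means the names were found; it does not check versions.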
Package locally
To get pre-compiled binaries, generate bindings and install PyPDFium2, you may run
make install
in the directory you downloaded the repository to. This will resort to building PDFium if no pre-compiled binaries are available for your platform.
Source build
If you wish to perform a source build regardless of whether PDFium binaries are available or not, you can do the following:
make build
If the build fails, you can try
python3 platform_setup/build_pdfium.py -p --check-deps
PYP_TARGET_PLATFORM="sourcebuild" python3 -m pip install . -v --no-build-isolation
to prefer the use of system-provided build tools over the toolchain PDFium ships with. The problem is that the bundled toolchain is limited to a curated set of platforms, as PDFium targets cross-compilation for "non-standard" architectures. (Make sure you installed all packages from the Native Build section of dependencies.md, in addition to the default requirements.)
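The `VAR=value command` form used above is POSIX shell syntax. As a rough, shell-independent sketch, the same pip invocation can be launched from Python with the variable set in the child environment. The helper names are hypothetical; the variable name, its value and the pip arguments are copied from the command above.

```python
import os
import subprocess
import sys


def sourcebuild_install_command(base_env=None):
    """Build the pip command and the environment for a source-build install."""
    # Copy the environment so the current process is left untouched
    env = dict(os.environ if base_env is None else base_env)
    env["PYP_TARGET_PLATFORM"] = "sourcebuild"
    cmd = [sys.executable, "-m", "pip", "install", ".", "-v", "--no-build-isolation"]
    return cmd, env


def run_sourcebuild_install(repo_dir="."):
    """Run the install from `repo_dir`; raises CalledProcessError on failure."""
    cmd, env = sourcebuild_install_command()
    subprocess.run(cmd, env=env, cwd=repo_dir, check=True)
```

Calling `run_sourcebuild_install()` from the repository root, after the PDFium build step has succeeded, would correspond to the second command above.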
Examples
Using the command-line interface
Rasterise a PDF document:
pypdfium2 render document.pdf -o output_dir/ --scale 3
You may also rasterise multiple files at once:
pypdfium2 render doc_1.pdf doc_2.pdf doc_3.pdf -o output_dir/
Show the table of contents for a PDF:
pypdfium2 toc document.pdf
To obtain a list of subcommands, run pypdfium2 help.
Individual help for each subcommand can be accessed in the same way (pypdfium2 any_subcommand help).
CLI documentation: https://pypdfium2.readthedocs.io/en/stable/shell_api.html
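When batch jobs are driven from a script, the render command can be assembled programmatically. This is a minimal sketch: the helpers `build_render_command` and `render` are hypothetical, they use only the `render` subcommand and the `-o` and `--scale` options shown above, and they assume the `pypdfium2` executable is available in PATH.

```python
import subprocess


def build_render_command(inputs, output_dir, scale=None):
    """Return the argument list for `pypdfium2 render`."""
    if not inputs:
        raise ValueError("at least one input file is required")
    cmd = ["pypdfium2", "render", *inputs, "-o", output_dir]
    if scale is not None:
        cmd += ["--scale", str(scale)]
    return cmd


def render(inputs, output_dir, scale=None):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_render_command(inputs, output_dir, scale), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.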
Using the support model
Import pypdfium2:
import pypdfium2 as pdfium
Open a PDF using the helper class PdfDocument:
doc = pdfium.PdfDocument(filename)
# ... use methods provided by the helper class
pdf = doc.raw
# ... work with the actual PDFium document handle
doc.close()
Open a PDF using the context manager PdfContext:
with pdfium.PdfContext(filename) as pdf:
    # ... work with the pdf
    pass
Open a PDF using the function open_pdf_auto():
pdf, loader_data = pdfium.open_pdf_auto(filename)
# ... work with the pdf
pdfium.close_pdf(pdf, loader_data)
Render a single page:
with pdfium.PdfContext(filename) as pdf:
    pil_image = pdfium.render_page(
        pdf,
        page_index = 0,
        scale = 1,
        rotation = 0,
        colour = 0xFFFFFFFF,
        annotations = True,
        greyscale = False,
        optimise_mode = pdfium.OptimiseMode.none,
    )
    pil_image.save("out.png")
    pil_image.close()
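To estimate the output size for a given scale, recall that PDF page dimensions are measured in points (1/72 inch). The sketch below assumes that `scale` is the number of pixels rendered per point, so that `scale = 1` corresponds to 72 DPI, and that sizes are rounded up as in the raw API example in this document. The helper names are hypothetical.

```python
import math

POINTS_PER_INCH = 72  # PDF page sizes are expressed in points


def rendered_size(width_pt, height_pt, scale=1):
    """Pixel size of a page, assuming `scale` pixels per PDF point."""
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


def scale_for_dpi(dpi):
    """Scale factor that yields the given resolution under the same assumption."""
    return dpi / POINTS_PER_INCH


# An A4 page is roughly 595 x 842 points
print(rendered_size(595, 842, scale=3))  # (1785, 2526)
```

Under this assumption, a target of 300 DPI would call for `scale_for_dpi(300)`, a factor slightly above 4.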
Render multiple pages concurrently:
for image, suffix in pdfium.render_pdf(filename):
    image.save(f'out_{suffix}.png')
    image.close()
Read the table of contents:
doc = pdfium.PdfDocument(filepath)
for item in doc.get_toc():
    print(
        ' ' * item.level +
        "{} -> {} # {} {}".format(
            item.title,
            item.page_index + 1,
            item.view_mode,
            item.view_pos,
        )
    )
doc.close()
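To see what this loop prints without a PDF at hand, the same formatting can be applied to stand-in items. The `TocItem` tuple and its sample values are hypothetical placeholders that only mirror the attribute names used above; they are not the objects that get_toc() actually returns. The four-space default indent is likewise an assumption chosen for readability.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the attributes used in the loop above
TocItem = namedtuple("TocItem", "level title page_index view_mode view_pos")


def format_toc_line(item, indent="    "):
    """Indent by nesting level and show a 1-based page number."""
    return indent * item.level + "{} -> {} # {} {}".format(
        item.title,
        item.page_index + 1,  # page_index is 0-based
        item.view_mode,
        item.view_pos,
    )


sample = [
    TocItem(0, "Introduction", 0, 1, [0.0, 792.0]),
    TocItem(1, "Scope", 1, 1, [0.0, 500.0]),
]
for item in sample:
    print(format_toc_line(item))
```

Nested entries appear indented under their parent, and page numbers are shifted by one because page indices start at zero.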
Support model documentation: https://pypdfium2.readthedocs.io/en/stable/python_api.html
Using the PDFium API
Rendering the first page of a PDF document:
import math
import ctypes
from PIL import Image
import pypdfium2 as pdfium

filename = "your/path/to/document.pdf"

# Load the document and make sure it has at least one page
doc = pdfium.FPDF_LoadDocument(filename, None)
page_count = pdfium.FPDF_GetPageCount(doc)
assert page_count >= 1

# Set up a form fill environment so that form fields can be drawn
form_config = pdfium.FPDF_FORMFILLINFO(2)
form_fill = pdfium.FPDFDOC_InitFormFillEnvironment(doc, form_config)

# Load the first page (index 0) and get its size, rounded up to whole pixels
page = pdfium.FPDF_LoadPage(doc, 0)
pdfium.FORM_OnAfterLoadPage(page, form_fill)
width = math.ceil(pdfium.FPDF_GetPageWidthF(page))
height = math.ceil(pdfium.FPDF_GetPageHeightF(page))

# Create a bitmap and fill it with a white background
bitmap = pdfium.FPDFBitmap_Create(width, height, 0)
pdfium.FPDFBitmap_FillRect(bitmap, 0, 0, width, height, 0xFFFFFFFF)

# Render the page content, then draw the form fields on top
render_args = [bitmap, page, 0, 0, width, height, 0, pdfium.FPDF_LCD_TEXT | pdfium.FPDF_ANNOT]
pdfium.FPDF_RenderPageBitmap(*render_args)
pdfium.FPDF_FFLDraw(form_fill, *render_args)

# Wrap the bitmap buffer (4 bytes per pixel) in a PIL image and save it
cbuffer = pdfium.FPDFBitmap_GetBuffer(bitmap)
buffer = ctypes.cast(cbuffer, ctypes.POINTER(ctypes.c_ubyte * (width * height * 4)))
img = Image.frombuffer("RGBA", (width, height), buffer.contents, "raw", "BGRA", 0, 1)
img.save("out.png")

# Release resources in reverse order of creation
pdfium.FPDFBitmap_Destroy(bitmap)
pdfium.FPDF_ClosePage(page)
pdfium.FPDFDOC_ExitFormFillEnvironment(form_fill)
pdfium.FPDF_CloseDocument(doc)
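The buffer cast above relies on the bitmap holding four bytes per pixel, and the "raw", "BGRA" arguments tell Pillow that those bytes are stored in blue, green, red, alpha order. The following standard-library sketch illustrates both points on a tiny hand-made buffer. It assumes rows are tightly packed (a stride of `width * 4` bytes), as the example above does, and the helper names are hypothetical.

```python
def buffer_size(width, height, channels=4):
    """Number of bytes in a tightly packed bitmap."""
    return width * height * channels


def bgra_to_rgba(data):
    """Return a copy of `data` with the B and R channels swapped per pixel."""
    if len(data) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    out = bytearray(data)
    # Every 4th byte starting at 0 is blue, every 4th byte starting at 2 is red
    out[0::4], out[2::4] = data[2::4], data[0::4]
    return bytes(out)


# One opaque blue pixel followed by one opaque red pixel, in BGRA order
bgra = bytes([255, 0, 0, 255, 0, 0, 255, 255])
assert len(bgra) == buffer_size(2, 1)
print(list(bgra_to_rgba(bgra)))  # [0, 0, 255, 255, 255, 0, 0, 255]
```

In the example above Pillow performs this channel reordering itself, so no manual conversion is needed there.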
For more examples of using the raw API, take a look at the support model source code and the examples directory.
Documentation for the PDFium API is available. PyPDFium2 transparently maps all PDFium classes, enums and functions to Python. However, there can sometimes be minor differences between Foxit and open-source PDFium. In case of doubt, take a look at the inline source code documentation of PDFium.
Licensing
PDFium and PyPDFium2 are available under the terms and conditions of either Apache 2.0 or BSD-3-Clause, at your choice.
Various other open-source licenses apply to the dependencies of PDFium.
License texts for PDFium and its dependencies are included in the file LICENSE-PDFium.txt, which is also shipped with binary redistributions.
Documentation and examples of PyPDFium2 are CC-BY-4.0 licensed.
Development
PDFium builds are retrieved from bblanchon/pdfium-binaries. Python bindings are auto-generated with ctypesgen.
Please see #3 for a list of platforms where binary wheels are available. Some wheels are not tested, unfortunately. If you have access to a theoretically supported but untested system, please report success or failure on the issue or discussion panel.
(In case bblanchon/pdfium-binaries adds support for more architectures, PyPDFium2 can be adapted easily.)
For wheel naming conventions, please see Python Packaging: Platform compatibility tags and the various referenced PEPs. This thread may also provide helpful information.
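As a small illustration of the naming convention, a wheel file name can be split into its components. The sketch below assumes the name carries no optional build tag, so that it has exactly five dash-separated fields, and that compressed platform tags are separated by dots. The helper name is hypothetical.

```python
def parse_wheel_filename(filename):
    """Split '{name}-{version}-{python}-{abi}-{platform}.whl' into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected exactly five dash-separated fields")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        # A compressed tag set lists several platforms separated by dots
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "pypdfium2-0.14.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["platforms"])  # ['manylinux_2_17_x86_64', 'manylinux2014_x86_64']
```

The `py3-none` part indicates that the wheel does not depend on a specific Python ABI, while the platform tags tie it to the bundled binary.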
PyPDFium2 contains scripts to automate the release process:
- To build the wheels, run make release. This will download binaries and header files, write finished Python binary distributions to dist/, and run some checks.
- To clean up after a release, run make clean. This will remove downloaded files and build artifacts.
Testing
Run make test.
Publishing the wheels
- You may want to upload to TestPyPI first to ensure everything works as expected:
twine upload --verbose --repository-url https://test.pypi.org/legacy/ dist/*
- If all went well, upload to the real PyPI:
twine upload dist/*
Issues
Since PyPDFium2 is built using upstream binaries and an automatic bindings creator, issues that are not related to packaging or support model code probably need to be addressed upstream. However, the PyPDFium2 issues panel is always a good place to start if you have any problems, questions or suggestions.
If the cause of an issue is determined to be in PDFium, the problem needs to be reported at the PDFium bug tracker.
Issues related to pre-compiled binaries should be discussed at pdfium-binaries, though.
If your issue is caused by the bindings generator, refer to the ctypesgen bug tracker.
Known limitations
Non-ASCII file paths on Windows
On Windows, the FPDF_LoadDocument() function of PDFium is currently not able to open documents whose file paths contain multi-byte, non-ASCII characters (see Bug 682).
There is a patch that might fix the issue, but it has not been merged yet.
The support model of PyPDFium2 implements a workaround using FPDF_LoadCustomDocument() to be able to process non-ASCII file paths on Windows anyway.
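The decision of whether a given path is affected can be sketched with the standard library alone. This is only an illustration of the condition described above; the helper name is hypothetical and this is not the code PyPDFium2 actually uses.

```python
import sys


def needs_path_workaround(path, platform=sys.platform):
    """True if `path` contains non-ASCII characters and the platform is Windows."""
    # sys.platform is "win32" on Windows, regardless of bitness
    return platform.startswith("win") and not str(path).isascii()


print(needs_path_workaround("C:/docs/übung.pdf", platform="win32"))   # True
print(needs_path_workaround("C:/docs/report.pdf", platform="win32"))  # False
```

For affected paths, the file would be opened from Python and its data handed to PDFium, rather than passing the path string itself.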
Thanks
Patches to PDFium and DepotTools originate from the pdfium-binaries repository. Many thanks to @bblanchon and @BoLaMN.
History
PyPDFium2 is the successor of pypdfium and pypdfium-reboot.
The initial pypdfium was packaged manually and did not get regular updates. There were no platform-specific wheels, but only a single wheel that contained binaries for 64-bit Linux, Windows and macOS.
pypdfium-reboot then added a script to automate binary deployment and bindings generation to simplify regular updates. However, it was still not platform specific.
PyPDFium2 is a full rewrite of pypdfium-reboot to build platform-specific wheels. It also adds a basic support model and a command-line interface on top of the PDFium C API to simplify common use cases. Moreover, PyPDFium2 includes facilities to build PDFium from source, to extend platform compatibility.