A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.

pdf2image

A Python (3.5+) module that wraps pdftoppm and pdftocairo to convert PDFs to PIL Image objects.

How to install

pip install pdf2image

Windows

Windows users will have to install poppler for Windows, then add the bin/ folder to PATH.

Mac

Mac users will have to install poppler for Mac.

Linux

Most distros ship with pdftoppm and pdftocairo. If they are not installed, use your package manager to install poppler-utils.

Platform-independent (Using `conda`)

Install poppler: conda install -c conda-forge poppler
Install pdf2image: pip install pdf2image
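Whichever route you take, pdf2image needs the Poppler command line tools to be reachable. As a quick sanity check, the sketch below uses the standard library's shutil.which to look for them, either on PATH or in a single directory (the same kind of directory you would pass as poppler_path). The helper name missing_poppler_tools is hypothetical and not part of pdf2image, and including pdfinfo in the list is an assumption based on the PDFInfoNotInstalledError exception the library exposes.

```python
import shutil

# Poppler executables to look for. pdftoppm and pdftocairo are named in this
# README; pdfinfo is assumed from the PDFInfoNotInstalledError exception.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm", "pdftocairo")


def missing_poppler_tools(poppler_path=None):
    """Hypothetical helper: return the Poppler tools that cannot be found.

    If poppler_path is given, only that directory is searched;
    otherwise shutil.which searches the PATH environment variable.
    """
    return [
        tool
        for tool in POPPLER_TOOLS
        if shutil.which(tool, path=poppler_path) is None
    ]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Missing Poppler tools:", ", ".join(missing))
    else:
        print("All Poppler tools were found on PATH.")
```

If anything is reported missing on Windows, the usual fix is the one described above: add Poppler's bin/ folder to PATH, or pass that folder as poppler_path.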

How does it work?

from pdf2image import convert_from_path, convert_from_bytes

from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)

Then simply do:

images = convert_from_path('/home/belval/example.pdf')

with open('/home/belval/example.pdf', 'rb') as f:
    images = convert_from_bytes(f.read())

Or, better yet:

import tempfile

with tempfile.TemporaryDirectory() as path:
    images_from_path = convert_from_path('/home/belval/example.pdf', output_folder=path)
    # Do something here

images will be a list of PIL Image objects, one for each page of the PDF document.

Here are the definitions:

convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False)

convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', jpegopt=None, thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None, paths_only=False)
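The dpi parameter (200 by default in both signatures) determines the pixel size of each page. PDF page sizes are expressed in points (1 point = 1/72 inch), so the rendered size is roughly points × dpi / 72. The helper below, page_size_in_pixels, is a hypothetical illustration of that arithmetic; the rounding is an assumption, and pdftoppm's own rounding may differ by a pixel.

```python
def page_size_in_pixels(width_pt, height_pt, dpi=200):
    """Approximate rendered size (width, height) of a PDF page in pixels.

    PDF dimensions are in points (1 pt = 1/72 inch), so multiplying by
    dpi / 72 converts them to pixels. Rounding to nearest is an assumption.
    """
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(page_size_in_pixels(612, 792))          # (1700, 2200) at the default dpi=200
print(page_size_in_pixels(612, 792, dpi=72))  # (612, 792)
```

This also gives a feel for memory use: doubling dpi roughly quadruples the number of pixels per page.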

Need help?

Use the Mattermost chat to ask questions on the helpdesk and get direct support.

What's new?

- jpegopt parameter allows for tuning of the output JPEG when using fmt="jpeg" (-jpegopt in pdftoppm CLI) (Thank you @abieler)
- pdfinfo_from_path and pdfinfo_from_bytes, which expose the output of the pdfinfo CLI
- paths_only parameter will return image paths instead of Image objects, to prevent OOM when converting a big PDF
- size parameter allows you to define the shape of the resulting images (-scale-to in pdftoppm CLI)
  - size=400 will fit the image to a 400x400 box, preserving aspect ratio
  - size=(400, None) will make the image 400 pixels wide, preserving aspect ratio
  - size=(500, 500) will resize the image to 500x500 pixels, not preserving aspect ratio
- grayscale parameter allows you to convert images to grayscale (-gray in pdftoppm CLI)
- single_file parameter allows you to convert the first PDF page only, without adding digits at the end of the output_file
- Allow the user to specify poppler's installation path with poppler_path
- Fixed a bug where PNG buffers with a non-terminating I-E-N-D sequence would throw an exception
- Fixed a bug that left file descriptors open when using convert_from_bytes() (Thank you @FabianUken)
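To make the three size forms concrete, here is a rough sketch of the resulting dimensions for a page that would otherwise render at page_w x page_h pixels. expected_size is a hypothetical helper that only mirrors the size descriptions in the list above; it is not how pdf2image or pdftoppm compute sizes, and their rounding may differ.

```python
def expected_size(page_w, page_h, size):
    """Approximate (width, height) produced by the size parameter.

    Only the three documented forms are handled:
      size=N          fit inside an N x N box, keeping aspect ratio
      size=(W, None)  W pixels wide, keeping aspect ratio
      size=(W, H)     exactly W x H, ignoring aspect ratio
    """
    if isinstance(size, (int, float)):
        # Scale so that the longer side fits the box.
        scale = size / max(page_w, page_h)
        return round(page_w * scale), round(page_h * scale)
    width, height = size
    if width is not None and height is not None:
        return width, height
    if width is not None:
        # Height follows from the page's aspect ratio.
        return width, round(page_h * width / page_w)
    raise ValueError("unsupported size: %r" % (size,))


# A portrait page that would render at 600 x 800 pixels:
print(expected_size(600, 800, 400))          # (300, 400)
print(expected_size(600, 800, (400, None)))  # (400, 533)
print(expected_size(600, 800, (500, 500)))   # (500, 500)
```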

Performance tips

Using an output folder is significantly faster if you are using an SSD; otherwise, I/O usually becomes the bottleneck.
Using multiple threads can give you some gains, but avoid more than 4, as this will cause an I/O bottleneck (even on my NVMe SSD!).
If I/O is your bottleneck, using the JPEG format can lead to significant gains.
The PNG format is fairly slow because of its compression.
If you want to know the best settings (most settings will be fine anyway), you can clone the project and run python tests.py to get timings.
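Assuming thread_count works by splitting the requested page range across several pdftoppm processes, each thread adds its own stream of image writes, which is consistent with the I/O ceiling described above. The sketch below shows one way to divide an inclusive page range among thread_count workers; split_pages is a hypothetical helper that approximates, but is not copied from, pdf2image's internal logic.

```python
def split_pages(first_page, last_page, thread_count):
    """Hypothetical helper: split the inclusive range first_page..last_page
    into at most thread_count contiguous (first, last) chunks whose sizes
    differ by at most one page."""
    page_count = last_page - first_page + 1
    if page_count < 1:
        raise ValueError("last_page must be >= first_page")
    # Never start more workers than there are pages.
    thread_count = max(1, min(thread_count, page_count))
    base, remainder = divmod(page_count, thread_count)
    chunks = []
    current = first_page
    for i in range(thread_count):
        # The first `remainder` chunks each take one extra page.
        size = base + (1 if i < remainder else 0)
        chunks.append((current, current + size - 1))
        current += size
    return chunks


print(split_pages(1, 10, 4))  # [(1, 3), (4, 6), (7, 8), (9, 10)]
```

Each (first, last) pair maps naturally onto pdftoppm's -f and -l options, so every worker renders a disjoint slice of the document.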

Limitations / known issues

A relatively big PDF will use up all your memory and cause the process to be killed (unless you use an output folder).
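Another way to keep memory bounded is to convert a large PDF a few pages at a time, using the first_page and last_page parameters from the signatures above. page_batches and convert_in_batches are hypothetical helpers; convert is any callable that behaves like convert_from_path, and the page count is assumed to be known in advance (for example from the output of pdfinfo_from_path).

```python
def page_batches(page_count, batch_size):
    """Yield inclusive (first_page, last_page) pairs covering 1..page_count."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


def convert_in_batches(convert, pdf_path, page_count, batch_size=10):
    """Yield converted pages one batch at a time.

    `convert` is assumed to accept first_page/last_page keyword arguments
    (like convert_from_path) and to return a list of pages. The next batch
    is only converted once the current one has been consumed.
    """
    for first, last in page_batches(page_count, batch_size):
        yield from convert(pdf_path, first_page=first, last_page=last)


print(list(page_batches(25, 10)))  # [(1, 10), (11, 20), (21, 25)]

# With pdf2image installed, something like this would process a big PDF
# while holding at most batch_size pages in memory (if pages are not kept):
#   from pdf2image import convert_from_path
#   for image in convert_in_batches(convert_from_path, '/home/belval/example.pdf', 250):
#       image.save('page.png')
```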

Release history

- 1.17.0 (Jan 7, 2024)
- 1.16.3 (Feb 26, 2023)
- 1.16.2 (Dec 31, 2022)
- 1.16.0 (Jun 23, 2021)
- 1.15.1 (May 12, 2021)
- 1.15.0 (May 12, 2021)
- 1.14.0 (Aug 23, 2020)
- 1.13.1 (Apr 30, 2020)
- 1.13.0 (Apr 30, 2020; yanked)
- 1.12.1 (Feb 17, 2020)
- 1.11.0 (Dec 19, 2019; this version)
- 1.10.0 (Nov 4, 2019)
- 1.9.0 (Sep 21, 2019)
- 1.8.0 (Sep 15, 2019)
- 1.7.1 (Sep 3, 2019)
- 1.7.0 (Aug 27, 2019)
- 1.6.0 (Jul 3, 2019)
- 1.5.4 (Apr 30, 2019)
- 1.5.3 (Apr 28, 2019)
- 1.5.2 (Apr 27, 2019)
- 1.5.1 (Mar 24, 2019)
- 1.5.0 (Mar 23, 2019)
- 1.4.2 (Feb 28, 2019)
- 1.4.1 (Jan 29, 2019)
- 1.4.0 (Jan 9, 2019)
- 1.3.1 (Dec 29, 2018)
- 1.3.0 (Dec 26, 2018)
- 1.2.1 (Dec 19, 2018)
- 1.2.0 (Dec 17, 2018)
- 1.1.0 (Nov 20, 2018)
- 1.0.0 (Sep 13, 2018)
- 0.1.14 (Jun 10, 2018)
- 0.1.13 (May 29, 2018)
- 0.1.12 (May 29, 2018)
- 0.1.11 (May 2, 2018)
- 0.1.10 (Mar 25, 2018)
- 0.1.9 (Mar 20, 2018)
- 0.1.7 (Feb 3, 2018)
- 0.1.6 (Nov 14, 2017)
- 0.1.5 (Oct 24, 2017)
- 0.1.4 (Jun 4, 2017)
- 0.1.3 (Jun 4, 2017)
- 0.1.2 (Jun 4, 2017)
- 0.1.1 (Jun 4, 2017)
- 0.1.0 (Jun 4, 2017)

Download files

Source Distribution

pdf2image-1.11.0.tar.gz (8.1 kB)

Uploaded Dec 19, 2019

File details

Details for the file pdf2image-1.11.0.tar.gz.

File metadata

Download URL: pdf2image-1.11.0.tar.gz
Upload date: Dec 19, 2019
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/28.8.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.5.7

File hashes

Hashes for pdf2image-1.11.0.tar.gz
Algorithm	Hash digest
SHA256	`787f6dd77dc02786913fd4ee5766bb9241fe807e3c6ee90e3cff18bcf2f23555`
MD5	`30c5e895866df7ffd604345551b40bb6`
BLAKE2b-256	`55bea08351b2c2b7c0896062a739018b069774167ffe8c78265daef63b6e060e`
