pdf2images

Convert PDF file to image files ROBUSTLY.

Example

$ pdf2images -h
usage: pdf2images [-h] [--max-size MAX_SIZE] pdf_file output_dir

positional arguments:
  pdf_file
  output_dir

optional arguments:
  -h, --help           show this help message and exit
  --max-size MAX_SIZE  max size of either side of the image
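
The same command can also be driven from a script. The following is only a minimal sketch, assuming the `pdf2images` executable is on your PATH after installation. The helper names `build_command` and `convert` are hypothetical and not part of the package; the only arguments used are the ones listed in the help text above.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(pdf_file: str, output_dir: str,
                  max_size: Optional[int] = None) -> List[str]:
    """Assemble the argument list for the pdf2images CLI (hypothetical helper)."""
    cmd = ["pdf2images"]
    if max_size is not None:
        # --max-size is the only optional flag documented in the help text.
        cmd += ["--max-size", str(max_size)]
    cmd += [pdf_file, output_dir]
    return cmd


def convert(pdf_file: str, output_dir: str,
            max_size: Optional[int] = None) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    # Creating the directory up front is a precaution, not a documented requirement.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(pdf_file, output_dir, max_size), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert(...) to actually run it.
    print(build_command("paper.pdf", "thumbs", max_size=512))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.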

Why another "pdf-to-image" package

Once in a while, I need to convert a PDF file (usually slides or an academic paper) into image files (thumbnails) so that readers can get a quick look without downloading the PDF.

However, I found that none of the existing pdf-to-image solutions can robustly process every PDF file, since many PDF files are in a non-standard format or rely on extensions. Each of them breaks in some cases.

On the bright side, for any plausible case, there is almost always one of them that can process it successfully.

So I combined them (a.k.a. an ensemble) to make the conversion work across most cases.
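
The idea of combining converters can be sketched as a simple fallback chain: try each backend in turn and return the first usable result. This is an illustration of the general technique, not the package's actual implementation. The `Backend` type and `convert_with_fallback` are hypothetical names, and the two backends in the demo are stand-ins rather than real converters.

```python
from typing import Callable, List, Sequence, Tuple

# A backend takes (pdf_path, output_dir) and returns the image paths it wrote.
Backend = Callable[[str, str], List[str]]


def convert_with_fallback(pdf_path: str, output_dir: str,
                          backends: Sequence[Tuple[str, Backend]]) -> List[str]:
    """Try each named backend in order and return the first non-empty result."""
    errors = []
    for name, backend in backends:
        try:
            images = backend(pdf_path, output_dir)
        except Exception as exc:  # any failure just means "try the next one"
            errors.append(f"{name}: {exc}")
            continue
        if images:
            return images
        errors.append(f"{name}: produced no images")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


if __name__ == "__main__":
    def always_fails(pdf_path: str, output_dir: str) -> List[str]:
        raise ValueError("unsupported PDF feature")

    def always_works(pdf_path: str, output_dir: str) -> List[str]:
        return [f"{output_dir}/page-0.png"]

    print(convert_with_fallback("paper.pdf", "thumbs",
                                [("first", always_fails), ("second", always_works)]))
```

Collecting the per-backend error messages makes it easier to see why a particular file failed everywhere, which is the case such a chain cannot rescue.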

Installation

As mentioned above, pdf2images combines multiple PDF manipulation libraries, including wand, preview-generator, xpdf and qpdf.

wand and preview-generator are Python packages that are installed automatically along with pdf2images. However, you have to install xpdf and qpdf manually.

On Ubuntu:

sudo apt install -y qpdf xpdf libimage-exiftool-perl

On Arch Linux:

sudo pacman -S --noconfirm qpdf xpdf perl-image-exiftool

On macOS:

brew install freetype imagemagick qpdf xpdf exiftool libmagic ghostscript

The installation of pdf2images is quite simple:

pip install pdf2images
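
After installing, it can be useful to confirm that the external tools are reachable on your PATH. The snippet below is a small sketch using only the standard library. `missing_tools` is a hypothetical helper, and the executable names in the demo (`qpdf`, `pdftoppm`) are assumptions about what the system packages provide; which executables pdf2images actually invokes is not stated here, so adjust the list for your platform.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; change them to match your installation.
    missing = missing_tools(["qpdf", "pdftoppm"])
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all external tools found")
```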

Robustness

This package has successfully processed hundreds of thousands of arXiv papers (to generate thumbnails).

Gallery

The following images were converted from a set of slides from the Deep Learning Book:

page-0 page-1 page-2 page-3

Development

pip3 install -r requirements.dev.txt
pre-commit install
