pdf2images

Convert PDF file to image files ROBUSTLY.

Example

$ ./pdf2images.py -h
usage: pdf2images.py [-h] [--max-size MAX_SIZE] pdf_file output_dir

positional arguments:
  pdf_file
  output_dir

optional arguments:
  -h, --help           show this help message and exit
  --max-size MAX_SIZE  max size of either side of the image
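
The help text describes `--max-size` as the maximum size of either side of the image. One plausible reading, sketched below, is that the longer side is scaled down to `MAX_SIZE` while the aspect ratio is preserved. The function name `fit_within` and the choice not to upscale smaller images are assumptions made for illustration, not behavior confirmed by pdf2images.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_size: int) -> Tuple[int, int]:
    """Scale (width, height) so that neither side exceeds max_size.

    Hypothetical helper: assumes the aspect ratio is kept and that
    images already small enough are left untouched.
    """
    longest = max(width, height)
    if longest <= max_size:
        return width, height
    scale = max_size / longest
    # Keep at least one pixel per side for extremely thin pages.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under this reading, a 2000x1000 page rendered with a max size of 500 would come out as 500x250.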

Why another "pdf-to-image" package

Once in a while, I need to convert a PDF file (usually slides or an academic paper) into image files (thumbnails) so that readers can get a quick look at it without downloading the PDF.

However, I found that none of the existing pdf2image solutions can robustly process every PDF file, since many PDF files are in non-standard formats or rely on extensions. Each solution breaks in some cases.

On the bright side, for any plausible case, there is almost always at least one of them that can process it successfully.

So I combined them (i.e. built an ensemble) to make conversion work across most cases.
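
The ensemble idea can be sketched as a fallback chain: try each converter in turn and return the first result that succeeds. This is a minimal illustration, not the actual pdf2images implementation; `convert_with_fallback` and the `Backend` callable type are hypothetical names, and each backend is assumed to create its own output files and return the paths it wrote.

```python
from pathlib import Path
from typing import Callable, List, Sequence

# Assumed backend shape: takes (pdf_path, output_dir), returns written image paths.
Backend = Callable[[Path, Path], List[Path]]


def convert_with_fallback(pdf_path, output_dir, backends: Sequence[Backend]) -> List[Path]:
    """Return the images from the first backend that succeeds."""
    pdf_path, output_dir = Path(pdf_path), Path(output_dir)
    errors = []
    for backend in backends:
        name = getattr(backend, "__name__", repr(backend))
        try:
            images = backend(pdf_path, output_dir)
        except Exception as exc:  # any failure means: move on to the next backend
            errors.append(f"{name}: {exc}")
            continue
        if images:
            return images
        errors.append(f"{name}: produced no images")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Ordering the backends from fastest to slowest keeps the common case cheap, while the collected error messages make it easier to see why a particular PDF defeated every converter.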

Installation

As mentioned above, pdf2images combines multiple PDF manipulation libraries: wand, preview-generator, xpdf and qpdf.

wand and preview-generator are Python packages that are installed automatically along with pdf2images. However, you have to install xpdf and qpdf manually.

On Ubuntu:

sudo apt install -y qpdf xpdf

On Arch Linux:

sudo pacman -S qpdf xpdf

The installation of pdf2images is quite simple:

pip install pdf2images
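
Since xpdf and qpdf are installed outside of pip, it can help to check that their command-line tools are reachable before converting anything. The sketch below uses only the standard library; the helper name `missing_tools` is hypothetical, and the executable names passed to it (`qpdf`, `pdftoppm`) are assumptions that may differ depending on how your distribution packages these tools.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your system.
    missing = missing_tools(["qpdf", "pdftoppm"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```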

Robustness

This package has successfully processed hundreds of thousands of arXiv papers to generate thumbnails.
