Convert PDF file to image files ROBUSTLY.
pdf2images
Example
$ pdf2images -h
usage: pdf2images [-h] [--max-size MAX_SIZE] pdf_file output_dir

positional arguments:
  pdf_file
  output_dir

optional arguments:
  -h, --help           show this help message and exit
  --max-size MAX_SIZE  max size of either side of the image
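If you would rather drive the command from a Python script, a minimal sketch could look like the one below. It relies only on the options shown in the help text above. The helper names build_command and convert are hypothetical and not part of pdf2images, and creating the output directory beforehand is a precaution, not a documented requirement.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(pdf_file: str, output_dir: str,
                  max_size: Optional[int] = None) -> List[str]:
    """Assemble the argument list for the pdf2images CLI (hypothetical helper)."""
    cmd = ["pdf2images"]
    if max_size is not None:
        # --max-size limits the longer side of each generated image.
        cmd += ["--max-size", str(max_size)]
    cmd += [pdf_file, output_dir]
    return cmd


def convert(pdf_file: str, output_dir: str,
            max_size: Optional[int] = None) -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    # Assumption: creating the directory first is harmless even if the
    # tool would create it on its own.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(pdf_file, output_dir, max_size), check=True)
```

For example, convert("paper.pdf", "thumbnails", max_size=512) runs the same command as typing pdf2images --max-size 512 paper.pdf thumbnails in a shell.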
Why another "pdf-to-image" package
Once in a while, I need to convert a PDF file (usually slides or an academic paper) into image files (thumbnails) so that readers can get a quick glance at it without downloading the PDF.
However, I found that none of the existing pdf-to-image solutions can robustly process all PDF files, since many PDF files are in a non-standard format or rely on extensions. Each solution breaks on some files.
On the bright side, for any plausible case, there is almost always at least one of them that can process it successfully.
So I combined (a.k.a. ensembled) them to make the conversion work across most cases.
Installation
As mentioned above, we combined multiple PDF manipulation libraries. Here is the list of the libraries used:
- wand, an ImageMagick Python wrapper
- pdftotext, a command line tool provided by xpdf
- preview-generator
- qpdf
wand and preview-generator are Python packages that are installed automatically along with pdf2images. However, you have to install xpdf and qpdf manually.
On Ubuntu:
sudo apt install -y qpdf xpdf libimage-exiftool-perl
On Arch Linux:
sudo pacman -S --noconfirm qpdf xpdf perl-image-exiftool
On macOS:
brew install freetype imagemagick qpdf xpdf exiftool libmagic ghostscript
The installation of pdf2images is quite simple:
pip install pdf2images
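Since xpdf and qpdf are installed separately, it can be handy to check that their command line tools are on your PATH before converting anything. The sketch below uses only the Python standard library. The helper names are hypothetical, and it assumes that the executables are called qpdf and pdftotext, as in the list above.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumption: these are the executable names installed by qpdf and xpdf.
REQUIRED_TOOLS = ("qpdf", "pdftotext")


def find_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """List the tools that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools found.")
```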
Robustness
This package has successfully processed hundreds of thousands of arXiv papers (for generating thumbnails).
Gallery
The following images were converted from slides of the Deep Learning Book.
Development
pip3 install -r requirements.dev.txt
pre-commit install