A Python binding to poppler-cpp
python-poppler
python-poppler is a Python binding to the poppler-cpp library. It lets you read, render, or modify PDF documents. More specifically, it currently lets you:
- read and modify document metadata;
- list and read embedded documents;
- list the fonts used by the document;
- search or extract text on a given page of the document;
- render a page to a raw image;
- get information about transition effects between pages;
- read the table of contents of the document.
How to build
This package is currently distributed as source only and is tested on Linux only. It requires poppler 0.62 or higher (0.87 is recommended). I will provide a Windows build once I figure out how to compile poppler for Windows.
You need poppler-cpp with headers, Python (3.7 or 3.8) with headers, and CMake. On Arch Linux, install the poppler package. On Ubuntu, install libpoppler-cpp-dev.
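The requirements above can be checked before starting a build. The following is a minimal sketch, not part of python-poppler: the helper names (parse_version, classify_poppler, python_is_tested, cmake_is_available) are hypothetical, and the poppler version string is assumed to be supplied by you, for example as reported by your package manager.

```python
import shutil
import sys

MINIMUM_POPPLER = (0, 62)      # lowest poppler version stated as supported
RECOMMENDED_POPPLER = (0, 87)  # poppler version stated as recommended


def parse_version(text):
    """Turn a dotted version string such as '0.87.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def classify_poppler(version_text):
    """Return 'too old', 'supported', or 'recommended' for a poppler version."""
    version = parse_version(version_text)
    if version < MINIMUM_POPPLER:
        return "too old"
    if version < RECOMMENDED_POPPLER:
        return "supported"
    return "recommended"


def python_is_tested(version_info=sys.version_info):
    """True when the interpreter is Python 3.7 or 3.8, as named above."""
    return tuple(version_info[:2]) in {(3, 7), (3, 8)}


def cmake_is_available():
    """True when a cmake executable can be found on PATH."""
    return shutil.which("cmake") is not None
```

A result of "supported" means the version meets the 0.62 minimum but is older than the recommended 0.87. Other Python versions may still work; the check only reflects what is stated as tested here.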
The whole build process is handled by the setup.py file.
For instance, to install in the current environment:
$ python setup.py install
This compiles the binary packages and installs the library.
Tests are run using tox:
$ tox
Usage
The package is installed as poppler. It follows the interface of poppler-cpp. Therefore, you can refer to the documentation of the C++ library.
Example:
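from poppler import load_from_file, PageRenderer

# Open the document and take its first page (index 0)
pdf_document = load_from_file("sample.pdf")
page_1 = pdf_document.create_page(0)

# Extract the text of the page
page_1_text = page_1.text()

# Render the page to a raw image and access its data
renderer = PageRenderer()
image = renderer.render_page(page_1)
image_data = image.data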
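The same two calls, create_page() and text(), can be repeated to collect the text of several pages. This is a minimal sketch under stated assumptions: extract_all_text is a hypothetical helper, not part of the package, and the number of pages is passed in by the caller because the example above does not show how to query it.

```python
def extract_all_text(document, page_count):
    """Return a list with the text of pages 0 .. page_count - 1.

    `document` is any object offering create_page(index), whose result
    offers text(), as in the example above. `page_count` is supplied by
    the caller (an assumption of this sketch).
    """
    texts = []
    for index in range(page_count):
        page = document.create_page(index)  # pages are addressed by index
        texts.append(page.text())
    return texts
```

With the package installed, extract_all_text(load_from_file("sample.pdf"), 1) would return a one-element list holding the text of the first page.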