A Python binding to poppler-cpp
python-poppler
python-poppler is a Python binding to the poppler-cpp library. It lets you read, render, or modify PDF documents. More specifically, it currently lets you:
- read and modify document metadata;
- list and read embedded documents;
- list the fonts used by the document;
- search or extract text on a given page of the document;
- render a page to a raw image;
- get information about transition effects between pages;
- read the table of contents of the document.
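As an illustration of the text-extraction item above, here is a minimal sketch that collects the text of every page. It relies on create_page and text(), which appear in this project's usage example, and it assumes that the document object exposes its page count as a pages attribute. The helper name extract_text_by_page is hypothetical and is not part of the package.

```python
def extract_text_by_page(document):
    """Return a list holding the extracted text of each page, in page order.

    Assumption: `document.pages` is the number of pages in the document.
    `create_page(index)` and `text()` are used as in the usage example.
    """
    return [document.create_page(index).text() for index in range(document.pages)]


# Possible usage, assuming a file named sample.pdf exists:
#
#     from poppler import load_from_file
#     texts = extract_text_by_page(load_from_file("sample.pdf"))
#     print(texts[0])
```

The helper only touches three members of the document, so it can be exercised with a small stand-in object when no PDF is at hand.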
Documentation
https://cbrunet.github.io/python-poppler/
The documentation is currently a work in progress. It covers installation of the package, compilation from source, and usage.
Meanwhile, because the package follows the interface of poppler-cpp, you can refer to the documentation of the C++ library.
Usage
The package is installed as poppler.
Example:
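from poppler import load_from_file, PageRenderer
# Load the document and create its first page (page indices start at 0).
pdf_document = load_from_file("sample.pdf")
page_1 = pdf_document.create_page(0)
# Extract the text of the page.
page_1_text = page_1.text()
# Render the page to a raw image and access its pixel data.
renderer = PageRenderer()
image = renderer.render_page(page_1)
image_data = image.data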
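The raw image data is a flat byte buffer, so it usually needs to be interpreted before it is useful. The sketch below shows one way to do that, assuming 4 bytes per pixel stored in B, G, R, A order (which is what a 32-bit ARGB format looks like on a little-endian machine). That layout is an assumption, not something stated above, and the helper name bgra_to_rgb_rows is hypothetical.

```python
def bgra_to_rgb_rows(data, width, height, bytes_per_row=None):
    """Convert a raw 32-bit BGRA buffer into rows of (r, g, b) tuples.

    Assumption: each pixel takes 4 bytes in B, G, R, A order.
    `bytes_per_row` accounts for possible row padding and defaults
    to `width * 4` (no padding).
    """
    if bytes_per_row is None:
        bytes_per_row = width * 4
    # The last row does not need trailing padding, only its pixels.
    needed = (height - 1) * bytes_per_row + width * 4 if height > 0 else 0
    if len(data) < needed:
        raise ValueError("buffer is too small for the given dimensions")

    rows = []
    for y in range(height):
        row_start = y * bytes_per_row
        row = []
        for x in range(width):
            offset = row_start + 4 * x
            blue, green, red, _alpha = data[offset:offset + 4]
            row.append((red, green, blue))
        rows.append(row)
    return rows
```

With python-poppler, a call might look like bgra_to_rgb_rows(image.data, image.width, image.height, image.bytes_per_row), assuming the image object exposes width, height, and bytes_per_row. Check the image format before relying on this byte order.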
Contributing
Contributions are welcome.
Please use the GitHub issue tracker to report bugs or request features. You can also submit pull requests.
Code is formatted using black. Ensure that everything is well formatted. You can use tox -e lint to lint your code.
Please ensure that all tests pass by running tox.
Please provide unit tests covering the new feature, or proving that a bug is corrected, when possible.