Handle PDF pagestreams with PikePDF and split them by outline
Project description
PageStream
Some scanners and PDF merge tools produce a single merged PDF that keeps the original document titles in its outline. The Dutch government uses software like this to respond to FOIA requests.
This module was created to split such merged PDFs by outline. As an investigative journalism platform, we encounter different kinds of pagestreams in the wild. This module is intended as the place where we collect functionality for working with these streams.
Example usage:
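# Assumed import path; adjust it if the class lives in a submodule.
from pagestream import PDFPageStream

stream = PDFPageStream("/path/to.pdf")
if stream.can_extract_by_outline():
    stream.extract_to("/output/path")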
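The idea of splitting by outline can be illustrated without any PDF library. The following is a minimal sketch, not this module's actual implementation: it assumes the outline has already been read into a list of `(title, start_page)` pairs with 0-based page numbers, and the helper name `outline_page_ranges` is hypothetical.

```python
from typing import List, Tuple


def outline_page_ranges(
    entries: List[Tuple[str, int]], page_count: int
) -> List[Tuple[str, int, int]]:
    """Turn outline entries into half-open page ranges (title, start, end).

    Hypothetical helper: `entries` holds (title, start_page) pairs with
    0-based page numbers, `page_count` is the total number of pages.
    """
    for title, start in entries:
        if not 0 <= start < page_count:
            raise ValueError(f"outline entry {title!r} points outside the PDF")

    # Each document runs from its own start page up to the next entry's start.
    ordered = sorted(entries, key=lambda entry: entry[1])
    ranges = []
    for index, (title, start) in enumerate(ordered):
        is_last = index + 1 == len(ordered)
        end = page_count if is_last else ordered[index + 1][1]
        if end > start:  # skip entries that would yield an empty range
            ranges.append((title, start, end))
    return ranges


ranges = outline_page_ranges([("Letter", 0), ("Report", 2), ("Annex", 5)], 7)
for title, start, end in ranges:
    print(f"{title}: pages {start}..{end - 1}")
```

Each resulting range could then be copied into its own output file with a PDF library such as pikepdf.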
Download files
Source Distribution
pagestream-0.2.0.tar.gz (3.1 kB)
Built Distribution
Hashes for pagestream-0.2.0-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 1224b1cb104129676a1be5f06ba9516bfeacf5ca29cc96ba4f7452fc741d950c |
MD5 | 442c58c9d876b1d255c2a56048a97fd8 |
BLAKE2b-256 | 75d8bfdef693593fcb8f1ab8b0b764f8dc0846811921b817483ea415d2f4fcac |