PDFMiner wrapper for PDF extraction
pdf-wrangler
PDFMiner wrapper used to simplify PDF extraction and other PDF utilities.
Document class
The Document class represents a PDF document. It provides access to the raw text by page, the PDF metadata, and the images on each page as PDFMiner LTImage objects.
Example Usage
from pdf_wrangler import Document
pdf_document = Document('path/to/pdf.pdf')
# to access pdf metadata
pdf_document.get_metadata()
# to access full pdf text
pdf_document.get_text()
# print text by pdf page
for page in pdf_document.pages:
    print(page.get_text())
# to access pdf images by page
page_1_images = pdf_document.pages[0].images
# get first image bytes representation
page_1_images[0].stream.get_data()
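The image access shown above can be extended to save every image in a document to disk. The following is a minimal sketch, not part of pdf-wrangler: dump_images is a hypothetical helper, and it assumes only the attributes used in the example (pages, images, and stream.get_data()). The bytes are written exactly as returned, so depending on how the PDF encodes an image they may need further decoding before an image viewer can open them.

```python
from pathlib import Path


def dump_images(document, out_dir):
    """Write every image in `document` to `out_dir` and return the paths written.

    Hypothetical helper: `document` is assumed to expose `.pages`, where each
    page has an `.images` list whose items provide `.stream.get_data()`.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for page_number, page in enumerate(document.pages, start=1):
        for image_number, image in enumerate(page.images, start=1):
            # The actual image format depends on the PDF, so a neutral
            # .bin extension is used instead of guessing .jpg or .png.
            target = out_path / f"page{page_number}_image{image_number}.bin"
            target.write_bytes(image.stream.get_data())
            written.append(target)
    return written


# Assumed usage with pdf-wrangler installed:
# from pdf_wrangler import Document
# dump_images(Document('path/to/pdf.pdf'), 'extracted_images')
```

The helper relies on duck typing rather than importing pdf_wrangler, so it works with any object that has the same shape as the Document shown above.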
Installation
To install, run:
pip install pdf-wrangler