
PDFMiner Wrapper for extractions


pdf-wrangler

A PDFMiner wrapper that simplifies PDF extraction and provides other PDF utilities.

Document class

The Document class represents a PDF document. It provides access to the document's raw text (for the whole document or page by page), its PDF metadata, and its images as PDFMiner LTImage objects.
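Because the text is exposed per page, page-level searches are easy to build on top of it. The sketch below assumes only what the example usage shows, namely that each page object has a get_text() method returning a string. find_pages is a hypothetical helper, not part of pdf-wrangler.

```python
from typing import Iterable, List, Protocol


class HasText(Protocol):
    """Anything with a get_text() method, e.g. a pdf-wrangler page (assumed)."""

    def get_text(self) -> str: ...


def find_pages(pages: Iterable[HasText], term: str, case_sensitive: bool = False) -> List[int]:
    """Return the 1-based numbers of the pages whose text contains `term`.

    Hypothetical helper; it is not part of pdf-wrangler.
    """
    needle = term if case_sensitive else term.casefold()
    hits = []
    for number, page in enumerate(pages, start=1):
        text = page.get_text() or ""  # treat a None result as empty (assumption)
        haystack = text if case_sensitive else text.casefold()
        if needle in haystack:
            hits.append(number)
    return hits


# Usage with a real document (requires pdf-wrangler and a PDF file):
#   from pdf_wrangler import Document
#   print(find_pages(Document('path/to/pdf.pdf').pages, 'invoice'))
```

Matching is case-insensitive by default via str.casefold(); pass case_sensitive=True for exact matches.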

Example Usage

from pdf_wrangler import Document

pdf_document = Document('path/to/pdf.pdf')

# PDF metadata
metadata = pdf_document.get_metadata()

# full text of the document
full_text = pdf_document.get_text()

# print the text page by page
for page in pdf_document.pages:
    print(page.get_text())

# images on the first page (PDFMiner LTImage objects)
page_1_images = pdf_document.pages[0].images

# raw bytes of the first image on page 1, if the page has any images
if page_1_images:
    image_bytes = page_1_images[0].stream.get_data()
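The bytes returned by stream.get_data() are not always a ready-to-open image file. In pdfminer.six, JPEG (DCTDecode) and JPEG 2000 (JPXDecode) streams are generally passed through undecoded, while other images usually come back as raw pixel samples that need the image's width, height and color space to rebuild. A minimal sketch that guesses a file extension from the leading magic bytes and writes the data to disk; guess_image_extension and save_image_bytes are hypothetical helpers, and anything unrecognised is saved as .bin.

```python
from pathlib import Path

# Leading "magic" bytes of encoded formats that PDF image streams may carry
# through undecoded. Anything else is treated as raw data.
_SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),                    # JPEG (DCTDecode)
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", ".jp2"),  # JPEG 2000 file format (JPXDecode)
    (b"\xffO\xffQ", ".j2k"),                      # bare JPEG 2000 codestream
]


def guess_image_extension(data: bytes) -> str:
    """Return a file extension for `data` based on its magic bytes, or '.bin'."""
    for signature, extension in _SIGNATURES:
        if data.startswith(signature):
            return extension
    return ".bin"


def save_image_bytes(data: bytes, stem: str, directory: str = ".") -> Path:
    """Write `data` to <directory>/<stem><ext> and return the path written."""
    path = Path(directory) / (stem + guess_image_extension(data))
    path.write_bytes(data)
    return path


# Usage (requires pdf-wrangler and a PDF with at least one image on page 1):
#   data = pdf_document.pages[0].images[0].stream.get_data()
#   print(save_image_bytes(data, "page1_image1"))
```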

Installation

To install, run:

pip install pdf-wrangler
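To confirm the install and see which release you have (0.0.31 is the one listed on this page), the standard library's importlib.metadata can be queried. A minimal sketch; installed_version is a hypothetical helper name, not part of pdf-wrangler.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pdf-wrangler") -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pdf-wrangler {found}" if found else "pdf-wrangler is not installed")
```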


Download files


Source Distribution

pdf_wrangler-0.0.31.tar.gz (3.9 kB)

Built Distribution

pdf_wrangler-0.0.31-py3-none-any.whl (4.4 kB)

File details

Details for the file pdf_wrangler-0.0.31.tar.gz.

File metadata

  • Download URL: pdf_wrangler-0.0.31.tar.gz
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for pdf_wrangler-0.0.31.tar.gz:

  • SHA256: 0d501cb9a19b2b988d565408c4807a2614d96e6fb44d319196e7ea5f35fcbffa
  • MD5: b7236e3c3d6a10a37ee2e24d5d18113d
  • BLAKE2b-256: 3d400a96008339f3e970546fb3d9bce4c5c3e62530859c7c3d53d9b57dc25e62


File details

Details for the file pdf_wrangler-0.0.31-py3-none-any.whl.

File metadata

  • Download URL: pdf_wrangler-0.0.31-py3-none-any.whl
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for pdf_wrangler-0.0.31-py3-none-any.whl:

  • SHA256: 3e06b18378569ccb49de4b0cce41531558decf1a38ae2f4171bedf27e529471d
  • MD5: 1450163b6e26cff8e0581ba8bdafb4ac
  • BLAKE2b-256: bfb1d665c8b639e1b45096d37f6a70241ec42cba4fddebc49d8e62c25c05fc99
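A downloaded file can be checked against the published SHA256 digests with hashlib. A minimal sketch; the helper names sha256_of and verify_download are hypothetical, and the digests are the ones listed on this page for release 0.0.31.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for release 0.0.31.
EXPECTED_SHA256 = {
    "pdf_wrangler-0.0.31.tar.gz": "0d501cb9a19b2b988d565408c4807a2614d96e6fb44d319196e7ea5f35fcbffa",
    "pdf_wrangler-0.0.31-py3-none-any.whl": "3e06b18378569ccb49de4b0cce41531558decf1a38ae2f4171bedf27e529471d",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if `path` matches the published SHA256 for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published hash for {path.name}")
    return sha256_of(path) == expected


# Usage: verify_download(Path("pdf_wrangler-0.0.31-py3-none-any.whl"))
```

pip can also enforce these digests itself in hash-checking mode, using a requirements line of the form pdf-wrangler==0.0.31 --hash=sha256:<digest> and installing with pip install --require-hashes -r requirements.txt. In that mode every requirement, including dependencies, must be pinned with a hash.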

