Find and extract content in PDFs converted to XML

# PDFCutter

There are better ways than storing data in a PDF.
**pdfcutter** is for when you need to get it out again.

Works on the XML output of `pdftohtml`, which is part of `poppler-utils`.
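For orientation, the conversion step could look roughly like the sketch below. It assumes `pdftohtml` is on the `PATH` and relies only on its `-xml` flag; the helper names `pdftohtml_command` and `convert_to_xml` are hypothetical and not part of pdfcutter, which may well run this conversion for you.

```python
import shutil
import subprocess
from pathlib import Path


def pdftohtml_command(pdf_path, output_stem):
    """Build the argument list for an XML conversion (hypothetical helper)."""
    # `-xml` asks pdftohtml for XML output instead of HTML.
    return ["pdftohtml", "-xml", str(pdf_path), str(output_stem)]


def convert_to_xml(pdf_path):
    """Run pdftohtml and return the expected XML path (hypothetical helper).

    Assumption: pdftohtml writes its result to `<output_stem>.xml`.
    """
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install poppler-utils first")
    stem = Path(pdf_path).with_suffix("")
    subprocess.run(pdftohtml_command(pdf_path, stem), check=True)
    return stem.with_suffix(".xml")
```

Calling `convert_to_xml("./some.pdf")` would then be expected to leave `./some.xml` next to the PDF.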


```python
import pdfcutter

cutter = pdfcutter.PDFCutter(filename='./some.pdf')

# Select the label text on page 1, then read the text to its right.
name_label = cutter.filter(page=1, search='Name:')
name = cutter.filter(page=1).strictly_right_of(name_label).text()
```
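To make the idea of a positional query concrete, here is a minimal sketch of a "right of" lookup done by hand on pdftohtml-style XML with the standard library. It is not pdfcutter's implementation: the sample document, the helper names `box` and `texts_right_of`, and the exact overlap rule are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape pdftohtml -xml produces: each <text>
# element carries its position and size as attributes.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="45" height="14" font="0">Name:</text>
<text top="100" left="120" width="80" height="14" font="0">Ada Lovelace</text>
<text top="140" left="120" width="60" height="14" font="0">Other row</text>
</page>
</pdf2xml>"""


def box(el):
    """Return (left, top, right, bottom) of a <text> element."""
    left, top = int(el.get("left")), int(el.get("top"))
    return left, top, left + int(el.get("width")), top + int(el.get("height"))


def texts_right_of(page, label):
    """Elements that start at or past the label's right edge and overlap it vertically."""
    _, l_top, l_right, l_bottom = box(label)
    hits = []
    for el in page.iter("text"):
        left, top, _, bottom = box(el)
        if left >= l_right and top < l_bottom and bottom > l_top:
            hits.append(el)
    return hits


root = ET.fromstring(SAMPLE_XML)
page = root.find("page[@number='1']")
label = next(el for el in page.iter("text") if "Name:" in "".join(el.itertext()))
value = " ".join("".join(el.itertext()).strip() for el in texts_right_of(page, label))
print(value)
```

The second row is skipped because it does not overlap the label vertically, which is the kind of filtering a positional query saves you from writing yourself.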
