Find and extract content in PDFs converted to XML
# PDFCutter
There are better ways than storing data in a PDF.
**pdfcutter** is for when you need to get it out again.
It works on the XML output of `pdftohtml`, which is part of `poppler-utils`.
```python
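import pdfcutter

# Load the PDF to search
cutter = pdfcutter.PDFCutter(filename='./some.pdf')
# Find the "Name:" label on page 1
name_label = cutter.filter(page=1, search='Name:')
# Take the text strictly to the right of that label on the same page
name = cutter.filter(page=1).strictly_right_of(name_label).text()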
```
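To give a feel for what "right of a label" means on the XML that `pdftohtml -xml` produces, here is a minimal standalone sketch using only the standard library. It is not pdfcutter's implementation: the function `text_right_of` is a hypothetical helper, and `SAMPLE_XML` is a hand-written fragment that approximates the `<page>`/`<text>` layout of that output (each `<text>` element carries `top`, `left`, `width` and `height` attributes).

```python
import xml.etree.ElementTree as ET

# Hand-written sample approximating `pdftohtml -xml` output (an assumption,
# not taken from a real PDF).
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<text top="100" left="50" width="45" height="14" font="0">Name:</text>
<text top="100" left="120" width="80" height="14" font="0">Ada Lovelace</text>
<text top="140" left="50" width="45" height="14" font="0">City:</text>
<text top="140" left="120" width="60" height="14" font="0">London</text>
</page>
</pdf2xml>"""


def text_right_of(xml_string, page, label):
    """Return the text of boxes that sit to the right of `label` on `page`.

    Hypothetical helper: a box counts as "right of" the label when it
    overlaps the label vertically and starts at or beyond its right edge.
    """
    root = ET.fromstring(xml_string)
    page_el = root.find(f"./page[@number='{page}']")
    if page_el is None:
        return ""

    boxes = page_el.findall("text")
    label_el = next(
        (b for b in boxes if "".join(b.itertext()).strip() == label), None
    )
    if label_el is None:
        return ""

    # Bounding box of the label
    top = int(label_el.get("top"))
    bottom = top + int(label_el.get("height"))
    right = int(label_el.get("left")) + int(label_el.get("width"))

    hits = []
    for box in boxes:
        if box is label_el:
            continue
        b_top = int(box.get("top"))
        b_bottom = b_top + int(box.get("height"))
        b_left = int(box.get("left"))
        overlaps_vertically = b_top < bottom and b_bottom > top
        if overlaps_vertically and b_left >= right:
            hits.append((b_left, "".join(box.itertext()).strip()))

    # Join matches in left-to-right reading order
    return " ".join(text for _, text in sorted(hits))


name = text_right_of(SAMPLE_XML, page=1, label="Name:")
```

With the sample above, `name` holds the value printed next to the `Name:` label. Real documents usually need looser matching (labels split across boxes, slight vertical offsets), which is the kind of work a library like pdfcutter is meant to take off your hands.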