Extracting text and data from PDFs
Project description
{pdfdata}
Python package for extracting text and data from PDFs.
Installation
pip install pdfdata
Usage
from pdfdata import parse_pdf, pdf_doc_extract_span_list, pdf_doc_extract_span_df, pdf_text_to_jsonnl
from pprint import pprint
# parse the PDF into a dictionary, then extract its text spans as a list
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_list(pdf_parsed)
pprint(res, depth=3)
# reuse the parsed PDF: extract spans via the _df (data frame) variant and print the first result
res = pdf_doc_extract_span_df(pdf_parsed)
pprint(res[0])
# write the PDF text to a JSON Lines (.jsonnl) file
pdf_text_to_jsonnl('pdfs/0641-20.pdf', '0641-20.jsonnl')
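This page does not document the structure of the dictionary returned by `parse_pdf`, so the span-list example is hard to interpret on its own. As a rough illustration only, assuming a PyMuPDF-style layout (a hypothetical top-level `pages` list; each page holds `blocks`, each block holds `lines`, each line holds `spans` with a `text` key), flattening the nested dictionary into a span list could look like this. `flatten_spans` and the sample data are hypothetical and not part of pdfdata.

```python
from typing import Any, Dict, List


def flatten_spans(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a nested page -> block -> line -> span dictionary into a list.

    Hypothetical helper: assumes a PyMuPDF-style layout under a "pages" key,
    which may differ from what pdfdata's parse_pdf actually returns.
    """
    spans: List[Dict[str, Any]] = []
    for page_no, page in enumerate(doc.get("pages", [])):
        for block_no, block in enumerate(page.get("blocks", [])):
            # Blocks without "lines" (e.g. image blocks) contribute no spans.
            for line_no, line in enumerate(block.get("lines", [])):
                for span in line.get("spans", []):
                    # Copy the span and tag it with its position in the document.
                    spans.append({**span, "page": page_no, "block": block_no, "line": line_no})
    return spans


# Made-up sample in the assumed layout.
sample = {
    "pages": [
        {"blocks": [
            {"lines": [
                {"spans": [{"text": "Hello", "size": 12.0}, {"text": " world", "size": 12.0}]},
            ]},
            {"type": 1},  # an image-like block with no text lines
        ]},
        {"blocks": [{"lines": [{"spans": [{"text": "Page two", "size": 10.0}]}]}]},
    ]
}

for s in flatten_spans(sample):
    print(s["page"], s["block"], s["line"], repr(s["text"]))
```

Because every flat record shares the same keys, a list like this can be passed straight to a table library, for example `pandas.DataFrame(spans)`; that is one plausible reading of the `_df` helper's name, not a documented behavior.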
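`pdf_text_to_jsonnl` writes its result to a file whose record layout is not described here. Assuming `.jsonnl` follows the JSON Lines convention (one JSON value per line), the output can be read back with the standard library alone. `read_json_lines` is a hypothetical helper, and the sample records below are made up; no particular fields are assumed for the real output.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List, Union


def read_json_lines(path: Union[str, Path]) -> List[Any]:
    """Read a JSON Lines file: one JSON value per non-blank line.

    Assumes the .jsonnl output of pdf_text_to_jsonnl follows this convention.
    """
    records: List[Any] = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
    return records


# Demo: write a tiny made-up file so the sketch runs without pdfdata installed.
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = Path(tmpdir) / "sample.jsonnl"
    sample_path.write_text(
        '{"page": 0, "text": "Hello"}\n\n{"page": 1, "text": "World"}\n',
        encoding="utf-8",
    )
    records = read_json_lines(sample_path)

print(records)
```

If the convention holds, `read_json_lines('0641-20.jsonnl')` would return one Python object per line of the file written in the usage example.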
DevNotes
build
python -m build
TestPyPI upload
python -m twine upload --repository testpypi dist/* --skip-existing
Download files
Source Distribution
pdfdata-0.1.1.tar.gz (3.5 kB)