
Data Object Layer for PDF data


pdfdol


To install: pip install pdfdol

Documentation

Examples

Pdf "Stores"

Get a dict-like object to list and read the PDFs of a folder as text:

>>> from pdfdol import PdfFilesReader
>>> from pdfdol.tests import get_test_pdf_folder
>>> folder_path = get_test_pdf_folder()
>>> pdfs = PdfFilesReader(folder_path)
>>> sorted(pdfs)
['sample_pdf_1', 'sample_pdf_2']
>>> assert pdfs['sample_pdf_2'] == [
...     'Page 1\nThis is a sample text for testing Python PDF tools.'
... ]
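To make the "store" idea concrete, here is a minimal standard-library sketch of a read-only, dict-like view over a folder of PDFs. It is not pdfdol's implementation: the FolderPagesReader class and its extract_pages parameter are hypothetical, and text extraction is delegated to a caller-supplied function (in practice, a PDF library would do that part). The demo uses plain-text stand-in files so it runs without any PDF dependency.

```python
import os
import tempfile
from collections.abc import Mapping
from typing import Callable, Iterator, List


class FolderPagesReader(Mapping):
    """Hypothetical read-only, dict-like view of the PDFs in a folder.

    Keys are file names without the '.pdf' extension; values are lists of
    page texts produced by a caller-supplied extract_pages function.
    """

    def __init__(self, folder: str, extract_pages: Callable[[str], List[str]]):
        self.folder = folder
        self.extract_pages = extract_pages

    def _path(self, key: str) -> str:
        return os.path.join(self.folder, key + ".pdf")

    def __iter__(self) -> Iterator[str]:
        # List only .pdf files and strip the extension to form keys
        for name in os.listdir(self.folder):
            if name.endswith(".pdf"):
                yield name[: -len(".pdf")]

    def __len__(self) -> int:
        return sum(1 for _ in self)

    def __getitem__(self, key: str) -> List[str]:
        path = self._path(key)
        if not os.path.isfile(path):
            raise KeyError(key)
        # Extraction is delegated, so no PDF library is assumed here
        return self.extract_pages(path)


# Demo with stand-in files whose "pages" are separated by form feeds
demo_folder = tempfile.mkdtemp()
for name, text in [("a.pdf", "p1"), ("b.pdf", "p1\fp2"), ("notes.txt", "x")]:
    with open(os.path.join(demo_folder, name), "w") as f:
        f.write(text)


def fake_extract(path: str) -> List[str]:
    with open(path) as f:
        return f.read().split("\f")


reader = FolderPagesReader(demo_folder, fake_extract)
print(sorted(reader))  # ['a', 'b']
print(reader["b"])  # ['p1', 'p2']
```

Because the class derives from collections.abc.Mapping, methods such as keys(), items(), get() and the in operator come for free once __getitem__, __iter__ and __len__ are defined.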

Note that the values of a PdfFilesReader are lists of pages (one string per page). If you need a single string per PDF (all the pages joined together), you can add a decoder like so:

from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
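Conceptually, adding a decoder wraps the store so that each value is transformed at the moment it is read. The sketch below models that behavior with a hypothetical DecodedMapping class built on the standard library; it is an illustration of the idea, not how dol's add_decoder is actually implemented.

```python
from collections.abc import Mapping
from typing import Any, Callable, Iterator


class DecodedMapping(Mapping):
    """Hypothetical read-only wrapper that applies decoder to each value on access."""

    def __init__(self, store: Mapping, decoder: Callable[[Any], Any]):
        self.store = store
        self.decoder = decoder

    def __iter__(self) -> Iterator:
        return iter(self.store)  # keys pass through unchanged

    def __len__(self) -> int:
        return len(self.store)

    def __getitem__(self, key):
        # Decoding happens lazily, only when a value is read
        return self.decoder(self.store[key])


page_separator = "---------------------"
pages_store = {"doc": ["Page 1 text", "Page 2 text"]}
joined = DecodedMapping(pages_store, decoder=page_separator.join)
print(joined["doc"])
```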

If you need this at the class level, wrap the class instead:

from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
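Applying the decoder to a class means every instance decodes its values. A minimal sketch of that idea, assuming a class whose values are read through __getitem__: the hypothetical with_value_decoder helper builds a subclass that overrides __getitem__. This illustrates the pattern only and is not dol's mechanism; PagesDict is a stand-in for a reader class.

```python
from typing import Any, Callable, Type


def with_value_decoder(cls: Type, decoder: Callable[[Any], Any]) -> Type:
    """Hypothetical helper: return a subclass of cls whose __getitem__ decodes values."""

    class Decoded(cls):
        def __getitem__(self, key):
            # Read the raw value via the base class, then decode it
            return decoder(super().__getitem__(key))

    Decoded.__name__ = Decoded.__qualname__ = f"Decoded{cls.__name__}"
    return Decoded


class PagesDict(dict):
    """Stand-in for a reader class whose values are lists of pages."""


# Every instance of the new class now returns joined strings.
# Caveat: for dict subclasses, .get() and .values() bypass __getitem__.
JoinedPages = with_value_decoder(PagesDict, decoder="\n".join)
store = JoinedPages(doc=["Page 1", "Page 2"])
print(store["doc"])
```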

If you need to concatenate several PDFs, there are many ways to do so. Here's one:

from dol import Files
from pdfdol import concat_pdfs

s = Files('~/Downloads/cosmograph_documentation_pdfs/')
concat_pdfs(s, key_order=sorted)
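The key_order=sorted argument controls the sequence in which the files are joined. The sketch below isolates that ordering logic. The concat_in_order function and its merge parameter are hypothetical, and the byte-joining used in the example is only a placeholder: real PDF files must be merged with a PDF library, not concatenated byte-wise.

```python
from collections.abc import Mapping
from typing import Callable, Iterable, List, Optional

KeyOrder = Callable[[Iterable[str]], Iterable[str]]


def concat_in_order(
    store: Mapping,
    merge: Callable[[List[bytes]], bytes],
    key_order: Optional[KeyOrder] = None,
) -> bytes:
    """Hypothetical sketch: merge a store's values in the order given by key_order.

    key_order receives the store's keys and returns them in the desired order
    (e.g. sorted); merge stands in for a real PDF-merging function.
    """
    keys = key_order(store) if key_order is not None else list(store)
    return merge([store[k] for k in keys])


parts = {"02_intro": b"B", "01_cover": b"A", "03_api": b"C"}
# b"".join is a stand-in; real PDFs need a proper merge
print(concat_in_order(parts, merge=b"".join, key_order=sorted))  # b'ABC'
```

Without key_order, the store's own iteration order is used, which for a folder listing may not be the order you want; passing sorted makes it deterministic.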

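Get a PDF from various sources

The get_pdf function takes a source and a src_kind (such as "url" or "html") telling it how to interpret that source, and returns the PDF data. If egress is given a file path, the PDF is saved there and the path is returned instead.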

Example with a URL

pdf_data = get_pdf("https://pypi.org", src_kind="url")
print("Got PDF data of length:", len(pdf_data))

Example with HTML content

html_content = "<html><body><h1>Hello, PDF!</h1></body></html>"
pdf_data = get_pdf(html_content, src_kind="html")
print("Got PDF data of length:", len(pdf_data))
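One plausible way to structure dispatch on src_kind is a registry mapping each kind to a converter function. The sketch below is hypothetical (to_pdf and the converters dict are illustrative names, not pdfdol's code), and its converters are placeholders that tag the input rather than render a real PDF.

```python
from typing import Callable, Dict

Converter = Callable[[str], bytes]


def to_pdf(src: str, src_kind: str, converters: Dict[str, Converter]) -> bytes:
    """Hypothetical sketch: route src to the converter registered for src_kind."""
    try:
        convert = converters[src_kind]
    except KeyError:
        raise ValueError(
            f"Unknown src_kind {src_kind!r}; expected one of {sorted(converters)}"
        ) from None
    return convert(src)


# Placeholder converters: they only tag the input, they do not render a PDF
converters: Dict[str, Converter] = {
    "url": lambda url: b"%PDF-placeholder url=" + url.encode(),
    "html": lambda html: b"%PDF-placeholder html=" + html.encode(),
}

pdf_data = to_pdf("<h1>Hello, PDF!</h1>", src_kind="html", converters=converters)
print("Got PDF data of length:", len(pdf_data))
```

A registry keeps each source type isolated, so supporting a new kind means registering one more function rather than growing an if/elif chain.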

Example saving to a file

filepath = get_pdf("https://pypi.org", egress="output.pdf", src_kind="url")
print("PDF saved to:", filepath)
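The egress argument decides what happens to the PDF once it has been produced. A minimal sketch of that output step, assuming the PDF is already available as bytes: the apply_egress function is hypothetical and only models the two behaviors shown in the examples (return the data, or save to a path and return the path).

```python
import os
import tempfile
from typing import Optional, Union


def apply_egress(pdf_bytes: bytes, egress: Optional[str] = None) -> Union[bytes, str]:
    """Hypothetical sketch of output handling for PDF bytes already produced.

    With no egress, the bytes are returned; with a file path, the bytes are
    written to that path and the path is returned.
    """
    if egress is None:
        return pdf_bytes
    with open(egress, "wb") as f:
        f.write(pdf_bytes)
    return egress


demo_path = os.path.join(tempfile.mkdtemp(), "output.pdf")
saved_to = apply_egress(b"%PDF-1.4 placeholder", egress=demo_path)
print("PDF saved to:", saved_to)
```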



Download files


Source Distribution

pdfdol-0.1.23.tar.gz (18.1 kB)

Uploaded Source

Built Distribution


pdfdol-0.1.23-py3-none-any.whl (20.4 kB)

Uploaded Python 3

File details

Details for the file pdfdol-0.1.23.tar.gz.

File metadata

  • Download URL: pdfdol-0.1.23.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for pdfdol-0.1.23.tar.gz
  • SHA256: 3ff15b4b27e0c52b627f05cca7464dda0de6b0ea140431c4c3c4aeab6c53ab42
  • MD5: 3df574b1230c4db77e67620181526724
  • BLAKE2b-256: a789da229b38f4ef38bc0fdc3d687d4a59da7adbba8668399aa8a4c1442dbcdf


File details

Details for the file pdfdol-0.1.23-py3-none-any.whl.

File metadata

  • Download URL: pdfdol-0.1.23-py3-none-any.whl
  • Upload date:
  • Size: 20.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for pdfdol-0.1.23-py3-none-any.whl
  • SHA256: ca86390909ff4223e3a8de131dc323beb6eea6bd24c9264f49a287200a27cb6d
  • MD5: e4eb64b99d49d4edf5f724ea8b4999d9
  • BLAKE2b-256: d35f97098a25e77eecaa64534d344673b6a0a6ad0f96d86a51f1f51bc72b919b
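To check that a downloaded file matches the published SHA256 digest for the wheel, a minimal standard-library sketch could look like this (sha256_of_file and matches_published_hash are illustrative names, not part of pdfdol). pip's hash-checking mode (--require-hashes) provides the same guarantee at install time.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    # Normalize the expected digest: strip whitespace, compare in lowercase
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Published SHA256 for pdfdol-0.1.23-py3-none-any.whl
WHEEL_SHA256 = "ca86390909ff4223e3a8de131dc323beb6eea6bd24c9264f49a287200a27cb6d"
# After downloading the wheel:
# matches_published_hash("pdfdol-0.1.23-py3-none-any.whl", WHEEL_SHA256)

# Self-contained demo on a small temporary file
demo_file = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_file, "wb") as f:
    f.write(b"abc")
print(sha256_of_file(demo_file))
```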

