Data Object Layer for PDF data

pdfdol

To install: pip install pdfdol

Documentation

Examples

Get a dict-like object to list and read the PDFs in a folder as text:

>>> from pdfdol import PdfFilesReader
>>> from pdfdol.tests import get_test_pdf_folder
>>> folder_path = get_test_pdf_folder()
>>> pdfs = PdfFilesReader(folder_path)
>>> sorted(pdfs)
['sample_pdf_1', 'sample_pdf_2']
>>> assert pdfs['sample_pdf_2'] == [
...     'Page 1\nThis is a sample text for testing Python PDF tools.'
... ]
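Because the reader is dict-like, the usual mapping operations (iteration, `.items()`, `len`) should apply to it. The sketch below illustrates that with a plain `dict` standing in for a `PdfFilesReader`, so it runs without pdfdol or any PDF files. The name `pdfs_stand_in` and the contents of `sample_pdf_1` are made up for illustration.

```python
# Stand-in for a PdfFilesReader (an assumption, not the real object):
# keys are file names without extension, values are lists of page texts.
pdfs_stand_in = {
    'sample_pdf_1': ['Page 1\nFirst page text.', 'Page 2\nSecond page text.'],
    'sample_pdf_2': ['Page 1\nThis is a sample text for testing Python PDF tools.'],
}

# Number of pages in each document.
page_counts = {name: len(pages) for name, pages in pdfs_stand_in.items()}

# Names of the documents that have at least one page containing a given word.
matches = sorted(
    name
    for name, pages in pdfs_stand_in.items()
    if any('sample' in page for page in pages)
)
```

With a real `PdfFilesReader`, the same comprehensions should work unchanged, since they rely only on the mapping interface.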

Note that the values of a PdfFilesReader are lists of pages. If you need strings instead (i.e. all the pages joined together), you can add a decoder like so:

from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
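To make the effect of that decoder concrete, here is what `page_separator.join` does to a single value, using a made-up list of pages (the `pages` list is hypothetical, not taken from the test PDFs):

```python
page_separator = '---------------------'

# Hypothetical value, shaped like what a PdfFilesReader returns for one PDF.
pages = ['Page 1\nFirst page.', 'Page 2\nSecond page.']

# str.join places the separator between consecutive pages only.
text = page_separator.join(pages)

# A single-page document comes back unchanged, with no separator at all.
single = page_separator.join(['Page 1\nOnly page.'])
```

Note that `join` adds no newlines around the separator. If you want the separator on its own line, a separator such as `'\n---------------------\n'` may be a better choice.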

If you need this at the level of the class, so that every instance returns joined strings, wrap the class instead of the instance:

from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
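Conceptually, adding a decoder means passing every value through a function on its way out of the store. The following is a minimal sketch of that idea using only the standard library. It is not how `dol` implements `add_decoder`, and the class name `DecodedValues` is hypothetical.

```python
from collections.abc import Mapping


class DecodedValues(Mapping):
    """Read-only view of a mapping whose values pass through `decoder` on read."""

    def __init__(self, store, decoder):
        self._store = store
        self._decoder = decoder

    def __getitem__(self, key):
        # Fetch the raw value, then decode it before returning.
        return self._decoder(self._store[key])

    def __iter__(self):
        # Keys are unaffected by the decoder.
        return iter(self._store)

    def __len__(self):
        return len(self._store)


# Usage with a plain dict standing in for a PdfFilesReader (an assumption).
raw = {'doc': ['Page 1', 'Page 2']}
joined = DecodedValues(raw, decoder=' | '.join)
```

Here `joined['doc']` gives a single string while `raw['doc']` still gives the list of pages, which mirrors the instance-level versus class-level choice above: the underlying data is untouched and only the read path changes.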

If you need to concatenate a bunch of PDFs together, you can do so in many ways. Here's one:

from dol import Files
from pdfdol import concat_pdfs

s = Files('~/Downloads/cosmograph_documentation_pdfs/')
concat_pdfs(s, key_order=sorted)
