Data Object Layer for PDF data

Project description

pdfdol

To install: pip install pdfdol

Documentation

Examples

Get a dict-like object that lists the PDFs in a folder and reads them as text:

>>> from pdfdol import PdfFilesReader
>>> from pdfdol.tests import get_test_pdf_folder
>>> folder_path = get_test_pdf_folder()
>>> pdfs = PdfFilesReader(folder_path)
>>> sorted(pdfs)
['sample_pdf_1', 'sample_pdf_2']
>>> assert pdfs['sample_pdf_2'] == [
...     'Page 1\nThis is a sample text for testing Python PDF tools.'
... ]
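
To make the "dict-like" behaviour concrete, here is a minimal standard-library sketch of a read-only mapping over a folder of PDFs. It is not pdfdol's implementation: the class name `FolderPdfReader` and the injectable `read_pages` function are hypothetical, and the actual text extraction is left to whatever PDF library you plug in.

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Callable, Iterator, List


class FolderPdfReader(Mapping):
    """Hypothetical sketch: keys are file names without '.pdf', values are lists of page texts."""

    def __init__(self, folder: str, read_pages: Callable[[bytes], List[str]]):
        self._folder = Path(folder)
        # Assumed contract: takes the bytes of one PDF, returns one string per page.
        self._read_pages = read_pages

    def __iter__(self) -> Iterator[str]:
        # Only files ending in '.pdf' become keys; the extension is dropped.
        for path in self._folder.glob('*.pdf'):
            yield path.stem

    def __len__(self) -> int:
        return sum(1 for _ in self)

    def __getitem__(self, key: str) -> List[str]:
        path = self._folder / f'{key}.pdf'
        if not path.is_file():
            raise KeyError(key)
        return self._read_pages(path.read_bytes())
```

Because the class derives from `Mapping`, it also gets `keys()`, `values()`, `items()`, `in`, and `get()` without extra code, which is what makes the `sorted(pdfs)` and `pdfs[key]` idioms above work on such an object.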

Note that the values of a PdfFilesReader are lists of pages. If you need strings instead (i.e. all the pages joined together), you can add a decoder like so:

from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
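
For intuition about what a value decoder does, here is a small sketch that needs neither dol nor pdfdol. The `DecodedValues` class is hypothetical and only illustrates the idea (same keys, values passed through a function on read); it makes no claim about how `add_decoder` is implemented.

```python
from collections.abc import Mapping


class DecodedValues(Mapping):
    """Hypothetical wrapper: same keys as `store`, values passed through `decoder`."""

    def __init__(self, store, decoder):
        self._store = store
        self._decoder = decoder

    def __getitem__(self, key):
        # Decode lazily, each time a value is read.
        return self._decoder(self._store[key])

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


page_separator = '---------------------'
# A plain dict stands in for a PdfFilesReader: name -> list of page texts.
pages_by_name = {'report': ['Page 1 text', 'Page 2 text']}
joined = DecodedValues(pages_by_name, decoder=page_separator.join)
print(joined['report'])  # Page 1 text---------------------Page 2 text
```

Passing the bound method `page_separator.join` works because it takes a single iterable of strings, which is exactly the shape of a list of pages.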

If you need this at the class level, wrap the class instead of the instance:

from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...

If you need to concatenate a bunch of PDFs together, you can do so in many ways. Here's one:

from dol import Files
from pdfdol import concat_pdfs

s = Files('~/Downloads/cosmograph_documentation_pdfs/')
concat_pdfs(s, key_order=sorted)
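
One thing to watch with `key_order=sorted` is that plain string sorting puts `'doc_10.pdf'` before `'doc_2.pdf'`. The sketch below shows a natural-order alternative. It assumes `key_order` accepts any callable that takes the keys and returns them in the desired order (as passing `sorted` suggests); `natural_order` is a hypothetical helper, not part of pdfdol.

```python
import re


def natural_order(keys):
    """Return keys sorted so that embedded integers compare numerically."""

    def sort_key(key):
        # re.split with a capturing group alternates text and digit runs:
        # even indices are text (possibly empty), odd indices are digits.
        parts = re.split(r'(\d+)', key)
        return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

    return sorted(keys, key=sort_key)


names = ['doc_10.pdf', 'doc_2.pdf', 'doc_1.pdf']
print(sorted(names))         # ['doc_1.pdf', 'doc_10.pdf', 'doc_2.pdf']
print(natural_order(names))  # ['doc_1.pdf', 'doc_2.pdf', 'doc_10.pdf']
```

Under that assumption, `concat_pdfs(s, key_order=natural_order)` would merge numbered files in their numeric order.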

Download files

Source Distribution

pdfdol-0.1.11.tar.gz (6.6 kB)

Uploaded Source

Built Distribution

pdfdol-0.1.11-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file pdfdol-0.1.11.tar.gz.

File metadata

  • Download URL: pdfdol-0.1.11.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for pdfdol-0.1.11.tar.gz
Algorithm Hash digest
SHA256 6cee1702eab25f6cc1ba9909c5785b0694850def05a68e954d141864355aad7f
MD5 02c0c5f9f4e030ebbeba7a60b03c7932
BLAKE2b-256 5f11c199841803301fdd295e6a8aabb5eb93aff600808210f7d009f32a81e67e

File details

Details for the file pdfdol-0.1.11-py3-none-any.whl.

File metadata

  • Download URL: pdfdol-0.1.11-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for pdfdol-0.1.11-py3-none-any.whl
Algorithm Hash digest
SHA256 cb90fc79a15c447770de72348f9f3d1bd1515f819d82620c8cd1fe8012af0b37
MD5 bc2e9a2f20a7d727dd36dfff59fd317a
BLAKE2b-256 24f3daa12c2238cf82b13f1d23e0e9b28618f86e9e3c5d74e07a6a9927d24600
