Data Object Layer for PDF data

Project description

pdfdol

To install: pip install pdfdol

Examples

Get a dict-like object to list and read the pdfs of a folder, as text:

>>> from pdfdol import PdfFilesReader
>>> from pdfdol.tests import get_test_pdf_folder
>>> folder_path = get_test_pdf_folder()
>>> pdfs = PdfFilesReader(folder_path)
>>> sorted(pdfs)
['sample_pdf_1', 'sample_pdf_2']
>>> assert pdfs['sample_pdf_2'] == [
...     'Page 1\nThis is a sample text for testing Python PDF tools.'
... ]

Note that the values of a PdfFilesReader are lists of pages, one string per page. If you need a single string per PDF (i.e. all the pages joined together), you can add a decoder like so:

from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
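
To make the effect of the decoder concrete, here is a minimal standard-library sketch of the idea: a read-only mapping view that passes every value through a decoder function. It does not use or reproduce the internals of dol or pdfdol; `DecodedValues` and `fake_pdfs` are hypothetical names, and the plain dict only stands in for a PdfFilesReader.

```python
from collections.abc import Mapping


class DecodedValues(Mapping):
    """Read-only view of a mapping whose values are passed through `decoder`.

    Hypothetical helper for illustration only; not part of dol or pdfdol.
    """

    def __init__(self, store, decoder):
        self._store = store
        self._decoder = decoder

    def __getitem__(self, key):
        # Fetch the raw value (a list of page strings) and decode it on read.
        return self._decoder(self._store[key])

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


page_separator = '---------------------'

# Stand-in for a PdfFilesReader: keys are names, values are lists of page texts.
fake_pdfs = {'doc': ['Page 1\nfirst page', 'Page 2\nsecond page']}

joined = DecodedValues(fake_pdfs, decoder=page_separator.join)
text = joined['doc']  # one string: the pages joined by the separator
```

Since `str.join` inserts the separator verbatim, the pages are joined with no surrounding newlines. If you want the separator on its own line, a separator such as `'\n---------------------\n'` may be a better fit.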

If you want every instance to return joined strings, apply the decoder to the class instead of to an instance:

from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
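
The class-level variant can be pictured as a small class factory: given a mapping class and a decoder, it returns a subclass whose reads are decoded. The sketch below uses only the standard library and is not how dol implements `add_decoder`; `with_value_decoder`, `PageStore`, and `TextStore` are hypothetical names chosen for illustration.

```python
def with_value_decoder(cls, decoder):
    """Return a subclass of `cls` whose __getitem__ applies `decoder` to values.

    Hypothetical helper for illustration only; not part of dol or pdfdol.
    """

    class Decoded(cls):
        def __getitem__(self, key):
            # Delegate to the parent class, then decode the raw value.
            return decoder(super().__getitem__(key))

    Decoded.__name__ = f'Decoded{cls.__name__}'
    return Decoded


class PageStore(dict):
    """Stand-in for PdfFilesReader: maps names to lists of page texts."""


page_separator = '---------------------'
TextStore = with_value_decoder(PageStore, page_separator.join)

# Every instance of the new class now returns joined strings.
store = TextStore({'doc': ['first page', 'second page']})
doc_text = store['doc']
```

Wrapping the class once is convenient when you create several readers for different folders, since each instance then decodes its values without further wrapping.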

Project details

Release history

0.1.9

Jun 11, 2024

0.1.8

Jun 7, 2024

0.1.7

Jun 7, 2024

0.1.6

Jun 3, 2024

0.1.5

May 23, 2024

0.1.4

May 23, 2024

0.1.3

Apr 3, 2024

0.1.2 (this version)

Jan 2, 2024

0.1.1

Jan 2, 2024

0.1.0

Jan 2, 2024

Download files

Source Distribution

pdfdol-0.1.2.tar.gz (3.0 kB)

Uploaded Jan 2, 2024 Source

Built Distribution

pdfdol-0.1.2-py3-none-any.whl (4.1 kB)

Uploaded Jan 2, 2024 Python 3

Hashes for pdfdol-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`abce6f3535e3fd70ff56c3c91d3aaa6d0d5baad5b468e76517d2da1d76d60a7c`
MD5	`349e54743166856776a3a84b78305c97`
BLAKE2b-256	`5d8c37bb8416bedcf8fa1ba6cb2c413a600a79e68de8ca5666d07fef7243d957`

Hashes for pdfdol-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d8bc6cd10f26bb2d0bf63bf42adf5d0a071a4c5bc1ef6d47a9ec3f5e152b0d8`
MD5	`418e9109f65f58c7152bbb1a8c40c74c`
BLAKE2b-256	`0451e228fd5e4627561199e74ebe890bb6d5087fcab241712cfe6553dfb27acb`