Data Object Layer for PDF data
pdfdol
To install: pip install pdfdol
Examples
Get a dict-like object to list and read the pdfs of a folder, as text:
>>> from pdfdol import PdfFilesReader
>>> from pdfdol.tests import get_test_pdf_folder
>>> folder_path = get_test_pdf_folder()
>>> pdfs = PdfFilesReader(folder_path)
>>> sorted(pdfs)
['sample_pdf_1', 'sample_pdf_2']
>>> assert pdfs['sample_pdf_2'] == [
... 'Page 1\nThis is a sample text for testing Python PDF tools.'
... ]
Note that the values of a PdfFilesReader are lists of pages, with one string per page.
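To make "dict-like" concrete, here is a minimal sketch of a read-only mapping from file names to lists of pages. It is not pdfdol's implementation: the class name FolderPagesReader is hypothetical, plain .txt files stand in for PDFs, and a form feed character stands in for a page break.

```python
import tempfile
from collections.abc import Mapping
from pathlib import Path


class FolderPagesReader(Mapping):
    """Hypothetical read-only mapping: file stem -> list of page strings.

    Illustration only. Text files stand in for PDFs and form feed
    characters stand in for page breaks.
    """

    def __init__(self, folder, suffix=".txt"):
        self.folder = Path(folder)
        self.suffix = suffix

    def __iter__(self):
        # Keys are file names without their extension
        for path in sorted(self.folder.glob(f"*{self.suffix}")):
            yield path.stem

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, key):
        path = self.folder / f"{key}{self.suffix}"
        if not path.is_file():
            raise KeyError(key)
        # One string per "page"
        return path.read_text().split("\f")


def demo():
    """Build a throwaway folder and read it back through the mapping."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "doc_a.txt").write_text("Page 1\fPage 2")
        Path(folder, "doc_b.txt").write_text("Only page")
        reader = FolderPagesReader(folder)
        return list(reader), reader["doc_a"], len(reader)
```

Because the class derives from Mapping, methods such as keys(), items(), get() and the in operator come for free from the three methods defined above.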
If you need each value as a single string (i.e. all the pages joined together), you can add a decoder like so:
from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
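The decoder here is just the bound method page_separator.join, a function that takes a list of strings and returns one string. A small standalone sketch of what it does to a value, using made-up page texts rather than output from pdfdol:

```python
page_separator = '---------------------'

# A value shaped like the ones described above: one string per page
# (the page texts are made up for illustration)
pages = ['Page 1\nFirst page text.', 'Page 2\nSecond page text.']

# str.join bound to the separator: list of strings in, one string out
decode = page_separator.join
text = decode(pages)

# Splitting on the separator recovers the pages, provided the
# separator never occurs inside a page's own text
recovered = text.split(page_separator)
```

Note that join inserts the separator exactly as given, with no surrounding newlines, so a separator such as '\n---------------------\n' may read better if the result is meant for display.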
If you need this at the level of the class, so that every instance returns strings, wrap the class instead:
from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
If you need to concatenate a bunch of PDFs together, you can do so in many ways. Here's one:
from dol import Files
from pdfdol import concat_pdfs
s = Files('~/Downloads/cosmograph_documentation_pdfs/')
concat_pdfs(s, key_order=sorted)
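One thing to watch with key_order=sorted is that plain string sorting places '10' before '2'. The sketch below assumes key_order is a callable that receives the keys and returns them in the desired order, which is what passing sorted suggests but is not confirmed here. The helpers ordered_keys and natural_sorted, and the file names, are hypothetical.

```python
import re


def ordered_keys(store, key_order=sorted):
    """Return the keys of store in the order produced by key_order.

    Hypothetical helper, used only to show the effect of the ordering.
    """
    return list(key_order(store))


def natural_sorted(keys):
    """Sort keys so embedded numbers compare numerically (2 before 10)."""

    def natural_key(key):
        # Split into digit and non-digit runs; digit runs become ints
        return [
            int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', key)
        ]

    return sorted(keys, key=natural_key)


# Stand-in store: keys are file names, values are placeholder bytes
store = {'10_appendix.pdf': b'', '2_setup.pdf': b'', '1_intro.pdf': b''}

lexicographic = ordered_keys(store)  # default: sorted
natural = ordered_keys(store, key_order=natural_sorted)
```

Here lexicographic is ['10_appendix.pdf', '1_intro.pdf', '2_setup.pdf'], while natural is ['1_intro.pdf', '2_setup.pdf', '10_appendix.pdf']. If concat_pdfs accepts any such callable, passing key_order=natural_sorted would be one way to get the second ordering.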