Data Object Layer for PDF data
pdfdol
To install: pip install pdfdol
Examples
Get a dict-like object to list and read the pdfs of a folder, as text:
>>> from pdfdol import PdfFilesReader
>>> from pdfdol.tests import get_test_pdf_folder
>>> folder_path = get_test_pdf_folder()
>>> pdfs = PdfFilesReader(folder_path)
>>> sorted(pdfs)
['sample_pdf_1', 'sample_pdf_2']
>>> assert pdfs['sample_pdf_2'] == [
... 'Page 1\nThis is a sample text for testing Python PDF tools.'
... ]
Note that the values of a PdfFilesReader are lists of pages: one string of text per page.
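Because each value is a list of page texts, ordinary list operations apply to it. The sketch below uses a plain dict as a stand-in for a PdfFilesReader, assuming the reader behaves like a standard mapping, so it runs without any PDF files. The names `pdfs_stand_in` and `page_counts` are hypothetical, and the pages of `sample_pdf_1` are made up; only the value of `sample_pdf_2` comes from the example above.

```python
# A plain dict standing in for a PdfFilesReader:
# keys are PDF names, values are lists with one string per page.
pdfs_stand_in = {
    "sample_pdf_1": ["(made-up page 1)", "(made-up page 2)"],  # hypothetical content
    "sample_pdf_2": ["Page 1\nThis is a sample text for testing Python PDF tools."],
}


def page_counts(store):
    """Map each key to the number of pages in its value."""
    return {key: len(pages) for key, pages in store.items()}


counts = page_counts(pdfs_stand_in)
first_page = pdfs_stand_in["sample_pdf_2"][0]  # text of the first page only
```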
If you need a single string per PDF (i.e. all the pages joined together), you can add a decoder like so:
from dol import add_decoder
page_separator = '---------------------'
pdfs = add_decoder(pdfs, decoder=page_separator.join)
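The decoder passed above is simply the bound method `page_separator.join`, a callable that takes a list of strings and returns one string. A small illustration in plain Python, with made-up page texts, of what it does to a single value:

```python
page_separator = '---------------------'
decoder = page_separator.join  # callable: list of str -> str

pages = ['first page text', 'second page text']  # hypothetical pages
joined = decoder(pages)  # the separator goes between pages, not around them
print(joined)
```

Any other callable that accepts a list of page strings could presumably be used in the same place, for example `'\n'.join`.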
If you need this at the level of the class, rather than for a single instance, wrap the class itself:
from dol import add_decoder
page_separator = '---------------------'
FilesReader = add_decoder(PdfFilesReader, decoder=page_separator.join)
# and then
pdfs = FilesReader(folder_path)
# ...
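Conceptually, adding a decoder wraps a store so that every value read from it is passed through a function first. The following is only a rough sketch of that idea using the standard library. `DecodedStore` is a hypothetical name, and this is not how dol implements `add_decoder`.

```python
from collections.abc import Mapping


class DecodedStore(Mapping):
    """Read-only view of a mapping whose values go through `decoder` on read."""

    def __init__(self, store, decoder):
        self._store = store
        self._decoder = decoder

    def __getitem__(self, key):
        # Decode lazily, only when a value is actually requested.
        return self._decoder(self._store[key])

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


raw = {'doc': ['page one', 'page two']}  # stand-in for a store of page lists
decoded = DecodedStore(raw, decoder='\n'.join)
```

The keys are untouched by the wrapper; only the values change on the way out, which matches what the examples above rely on.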
If you need to concatenate a bunch of PDFs together, you can do so in many ways. Here's one:
from dol import Files
from pdfdol import concat_pdfs
s = Files('~/Downloads/cosmograph_documentation_pdfs/')
concat_pdfs(s, key_order=sorted)