Pythonic API for parsing PDF files
- Info:
See the tutorials & documentation for more information.
See GitHub for the latest source.
About
- pdfreader is a Pythonic API for:
extracting texts, images and other data from PDF documents (plain or protected)
accessing different objects within PDF documents
- pdfreader is NOT a tool (though maybe one day it will become one!):
to create or update PDF files
to split PDF files into pages or other pieces
to convert PDFs to any other format
Nevertheless, it can be used as part of such tools.
Features
Extracts text (plain text and formatted text objects)
Extracts PDF form data (plain strings and formatted text objects)
Supports all PDF encodings, CMaps, and predefined CMaps
Extracts images and image masks as Pillow/PIL Images
Supports encrypted and password-protected PDF documents
Allows browsing any document objects and resources, and extracting any data you need (fonts, annotations, metadata, multimedia, etc.)
Follows the PDF 1.7 specification
Lazy object access allows processing huge PDF documents quite fast
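As a rough illustration of the text-extraction feature, the sketch below reads the plain strings from the first page of a document. It assumes the `SimplePDFViewer` class, its `render()` method, and the `canvas.strings` attribute as described in the project's tutorial; the helper name `first_page_strings` and the file name in the usage note are hypothetical, so check the documentation for the version you have installed.

```python
def first_page_strings(pdf_path, password=None):
    """Return the plain strings found on the first page of a PDF.

    Hypothetical helper: it assumes the SimplePDFViewer API from the
    pdfreader tutorial (render() fills viewer.canvas for the current page).
    """
    # Imported lazily so this module still loads if pdfreader is missing.
    from pdfreader import SimplePDFViewer

    with open(pdf_path, "rb") as fd:
        # A password is only needed for protected documents.
        if password is None:
            viewer = SimplePDFViewer(fd)
        else:
            viewer = SimplePDFViewer(fd, password=password)
        viewer.render()  # parse the current (first) page
        # Copy the strings before the file is closed.
        return list(viewer.canvas.strings)
```

Calling `first_page_strings("example.pdf")` would then return a list of strings, which can be joined or processed further as needed.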
Installation
pdfreader can be installed with pip:
$ python -m pip install pdfreader
Or easy_install from setuptools:
$ python -m easy_install pdfreader
You can also download the project source and do:
$ python setup.py install
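After installing, it can be handy to confirm which version ended up in the environment. The following is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is an assumption made for illustration and is not part of pdfreader.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pdfreader"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    print(installed_version())
```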
Tutorial and Documentation
Support, Bugs & Feature Requests
pdfreader uses GitHub issues to keep track of bugs, feature requests, etc.
References
Donation
If this project is helpful, you can treat me to coffee :-)
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file pdfreader-0.1.15.tar.gz.
File metadata
- Download URL: pdfreader-0.1.15.tar.gz
- Upload date:
- Size: 2.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.19
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 2ee1252cc5f21a2f8cadb458decd85c1313271abb5bac1e4363a3e0e17e2dd87 |
MD5 | 16934adfd2b9d1bdc86b965c40b6eb44 |
BLAKE2b-256 | d1a730d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf |
File details
Details for the file pdfreader-0.1.15-py3-none-any.whl.
File metadata
- Download URL: pdfreader-0.1.15-py3-none-any.whl
- Upload date:
- Size: 135.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.19
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | bfd0d29c0d70a81b767b42b6959dd588a8290086c8c72a828739c1e2bda07eba |
MD5 | b3f22d940a73cf4bf1b5c34e5680c544 |
BLAKE2b-256 | 4966b26fe92f2088f763e043d285ebeb0bb9353348bb9c7acb8ea0b237fd5342 |
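The SHA256 digests listed above can be used to check that a downloaded file was not corrupted or altered. The sketch below is one way to do that with the standard library's `hashlib`; the function names and the local file path are assumptions for illustration only.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("pdfreader-0.1.15.tar.gz", "2ee1252c...")` with the full digest from the table should return `True` for an intact download of the source distribution.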