Pythonic API for parsing PDF files

Project description

Info:

See the tutorials & documentation for more information.

Author & Maintainer:

Maksym Polshcha <maxp@sterch.net>

See GitHub for the latest source.

About

pdfreader is a Pythonic API for:
  • extracting texts, images and other data from PDF documents (plain or protected)

  • accessing different objects within PDF documents

pdfreader is NOT a tool (though maybe one day it will become one!):
  • to create or update PDF files

  • to split PDF files into pages or other pieces

  • to convert PDFs to any other format

Nevertheless, it can be used as part of such tools.

See Tutorials & Documentation.

Features

  • Extracts text (plain text and formatted text objects)

  • Extracts PDF form data (plain strings and formatted text objects)

  • Supports all PDF encodings, CMap, predefined cmaps.

  • Extracts images and image masks as Pillow/PIL Images

  • Supports encrypted and password-protected PDF documents

  • Lets you browse any document objects and resources and extract any data you need (fonts, annotations, metadata, multimedia, etc.)

  • Follows PDF-1.7 specification

  • Lazy object access makes it possible to process huge PDF documents quite fast
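
As an illustration of the text-extraction feature, here is a minimal sketch. It assumes that `SimplePDFViewer` can be imported from `pdfreader`, that iterating over a viewer yields one rendered canvas per page, and that each canvas exposes its decoded strings as `canvas.strings`, as the project tutorial describes. The helper names `join_page_strings` and `extract_page_texts` are hypothetical and not part of pdfreader; check the documentation before relying on the details.

```python
from typing import Iterable, List


def join_page_strings(strings: Iterable[str]) -> str:
    """Concatenate the decoded strings of one page into a single text blob."""
    return "".join(strings)


def extract_page_texts(path: str) -> List[str]:
    """Return one text blob per page of the PDF stored at `path`.

    Hypothetical helper: it assumes the SimplePDFViewer interface
    described in the pdfreader tutorial.
    """
    # Imported lazily so that join_page_strings works without pdfreader.
    from pdfreader import SimplePDFViewer

    with open(path, "rb") as fd:
        viewer = SimplePDFViewer(fd)
        # Assumption: iterating the viewer renders each page in turn
        # and yields its canvas.
        return [join_page_strings(canvas.strings) for canvas in viewer]
```

Calling `extract_page_texts("example.pdf")` on a local file (the file name is a placeholder) would then give a list with one string per page. Joining with an empty separator keeps the strings exactly as they were decoded; a different separator may read better for some documents.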

Installation

pdfreader can be installed with pip:

$ python -m pip install pdfreader

Or easy_install from setuptools:

$ python -m easy_install pdfreader

You can also download the project source and do:

$ python setup.py install
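
After any of the installation methods above, one way to confirm that the package is visible to the current interpreter is to query its installed version through the standard library. This is a small sketch, not part of pdfreader itself; the function name `installed_version` is hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "pdfreader") -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
```

A return value of `None` usually means the package was installed into a different environment than the one running the check.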

Tutorial and Documentation

Tutorial, real-life examples and documentation

Support, Bugs & Feature Requests

pdfreader uses GitHub issues to keep track of bugs, feature requests, etc.

Donation

If this project is helpful, you can treat me to coffee :-)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdfreader-0.1.15.tar.gz (2.7 MB)

Uploaded Source

Built Distribution

pdfreader-0.1.15-py3-none-any.whl (135.6 kB)

Uploaded Python 3

File details

Details for the file pdfreader-0.1.15.tar.gz.

File metadata

  • Download URL: pdfreader-0.1.15.tar.gz
  • Upload date:
  • Size: 2.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for pdfreader-0.1.15.tar.gz
Algorithm    Hash digest
SHA256       2ee1252cc5f21a2f8cadb458decd85c1313271abb5bac1e4363a3e0e17e2dd87
MD5          16934adfd2b9d1bdc86b965c40b6eb44
BLAKE2b-256  d1a730d10f94d700b6779de8c079d77cec21b7de497d9f4339b64f580cc4afaf
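
A published digest is only useful if it is compared against the file that was actually downloaded. The sketch below shows one way to do that for the SHA256 value using Python's standard `hashlib` module. The function names `sha256_of_file` and `matches_published_hash` are hypothetical helpers written for this example, not part of pdfreader or pip.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    # Normalise whitespace and case before comparing.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, `matches_published_hash("pdfreader-0.1.15.tar.gz", "2ee1252cc5f21a2f8cadb458decd85c1313271abb5bac1e4363a3e0e17e2dd87")` should return `True` when the download is intact, assuming the archive sits in the current directory.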

File details

Details for the file pdfreader-0.1.15-py3-none-any.whl.

File metadata

  • Download URL: pdfreader-0.1.15-py3-none-any.whl
  • Upload date:
  • Size: 135.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for pdfreader-0.1.15-py3-none-any.whl
Algorithm    Hash digest
SHA256       bfd0d29c0d70a81b767b42b6959dd588a8290086c8c72a828739c1e2bda07eba
MD5          b3f22d940a73cf4bf1b5c34e5680c544
BLAKE2b-256  4966b26fe92f2088f763e043d285ebeb0bb9353348bb9c7acb8ea0b237fd5342
