
Pythonic API for parsing PDF files

Project description

Info: See the tutorials & documentation for more information.
Author & Maintainer: Maksym Polshcha

See GitHub for the latest source.


pdfreader is a Pythonic API for:
  • extracting texts, images and other data from PDF documents
  • accessing different objects within PDF documents
pdfreader is NOT a tool:
  • to create or update PDF files
  • to split PDF files into pages or other pieces
  • to convert PDFs to any other format

Nevertheless, it can be used as part of such tools.
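The sketch below illustrates the text-extraction use case. It follows the pattern in the project's tutorials: open the file in binary mode, wrap it in `SimplePDFViewer`, move to a page with `navigate()`, call `render()`, and join the decoded strings from `viewer.canvas.strings`. The helper name `page_text` and the file name in the usage line are hypothetical. Check the pdfreader calls against the documentation for your installed version.

```python
def page_text(path: str, page_number: int = 1) -> str:
    """Return the plain text of one page (1-based) of the PDF at `path`.

    Hypothetical helper; the pdfreader calls follow the project's tutorials.
    """
    # Imported lazily so this module loads even where pdfreader is not installed.
    from pdfreader import SimplePDFViewer

    # Keep the file open while rendering: objects are read from it on demand.
    with open(path, "rb") as fd:
        viewer = SimplePDFViewer(fd)
        viewer.navigate(page_number)  # page numbers start from 1
        viewer.render()               # decode the page's content stream
        return "".join(viewer.canvas.strings)
```

Usage: `print(page_text("example.pdf", 2))` prints the text of the second page.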

See Tutorials & Documentation.


  • Extracts texts (plain text and formatted text objects)
  • Extracts PDF form data (pure strings and formatted text objects)
  • Supports all PDF encodings, CMaps and predefined CMaps
  • Extracts images and image masks as Pillow/PIL images
  • Allows browsing any document objects and resources and extracting any data you need (fonts, annotations, metadata, multimedia, etc.)
  • Follows the PDF 1.7 specification
  • Lazy object access makes processing huge PDF documents quite fast
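The general idea behind lazy object access can be shown with a small, generic sketch. This is not pdfreader's actual implementation. The hypothetical `LazyObjectTable` stores only the byte offset of each indirect object, as a PDF cross-reference table does. It parses an object the first time it is requested and caches the result. Parsing is simulated by a loader callback, and all class and parameter names here are illustrative assumptions.

```python
from typing import Callable, Dict, Tuple

ObjRef = Tuple[int, int]  # (object number, generation number)


class LazyObjectTable:
    """Hypothetical sketch: resolve indirect objects on first access only."""

    def __init__(self, offsets: Dict[ObjRef, int], load: Callable[[int], object]):
        self._offsets = offsets  # cross-reference data: ref -> byte offset
        self._load = load        # parses the object stored at a given offset
        self._cache: Dict[ObjRef, object] = {}

    def resolve(self, ref: ObjRef) -> object:
        # Parse only when first requested; later lookups hit the cache.
        if ref not in self._cache:
            self._cache[ref] = self._load(self._offsets[ref])
        return self._cache[ref]

    @property
    def loaded_count(self) -> int:
        # Number of objects actually parsed so far.
        return len(self._cache)
```

If the same reference is resolved twice, the loader runs only once. Objects that are never requested are never parsed. As a result, the cost of opening a large document grows with the parts you actually visit rather than with the file size.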


pdfreader can be installed with pip:

$ python -m pip install pdfreader

Or with easy_install from setuptools:

$ python -m easy_install pdfreader

You can also download the project source and run:

$ python setup.py install
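After installing by any of these routes, you can check which version Python sees with the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical. It returns `None` instead of raising an error when the distribution is missing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pdfreader") -> Optional[str]:
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None
```

Usage: `print(installed_version())` shows the pdfreader version, or `None` if the install did not reach the interpreter you are running.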

Support, Bugs & Feature Requests

pdfreader uses GitHub issues to keep track of bugs, feature requests, etc.


If this project is helpful, you can treat me to coffee :-)


Download files


Files for pdfreader, version 0.1.5:
  • pdfreader-0.1.5.tar.gz (2.8 MB), source distribution
