Skip to main content

Simple PDF text extraction

Project description

pdftotext

PyPI Status Travis Status Azure Status AppVeyor status Coverage Status Downloads

Simple PDF text extraction

import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))

OS Dependencies

These instructions assume you're using Python 3 on a recent OS. Package names may differ for Python 2 or for an older OS.

Debian, Ubuntu, and friends

sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

Fedora, Red Hat, and friends

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel

macOS

brew install pkg-config poppler python

Windows

Currently tested only when using conda:

  • Install the Microsoft Visual C++ Build Tools
  • Install poppler through conda:
    conda install -c conda-forge poppler
    

Install

pip install pdftotext

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdftotext-2.1.6.tar.gz (99.5 kB view details)

Uploaded Source

File details

Details for the file pdftotext-2.1.6.tar.gz.

File metadata

  • Download URL: pdftotext-2.1.6.tar.gz
  • Upload date:
  • Size: 99.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.9

File hashes

Hashes for pdftotext-2.1.6.tar.gz
Algorithm Hash digest
SHA256 caf8ddbaeaf0a5897f07655a71747242addab2e695e84c5d47f2ea92dfe2a594
MD5 b91afa4d5168bd477b4e6e1aef22e08c
BLAKE2b-256 aa8d172052e9d618f6029dee0f8c34143c70b967889d4c4b2f1848c93269c066

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page