pdftotext

Simple PDF text extraction

import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
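Because the loaded PDF supports `len()`, iteration, and indexing over page strings, ordinary Python sequence code works on it. The following is a minimal sketch of searching pages for a term. The helper name `pages_containing` is hypothetical and not part of pdftotext, and a plain list of strings stands in for a loaded PDF so the snippet runs without a file.

```python
def pages_containing(pages, term):
    """Return the zero-based indexes of pages whose text contains `term`.

    `pages` is any iterable of page strings, such as a loaded pdftotext.PDF.
    The match is case-insensitive. (Hypothetical helper, not a pdftotext API.)
    """
    needle = term.lower()
    return [i for i, text in enumerate(pages) if needle in text.lower()]


# Stand-in for a loaded PDF: one string per page.
sample_pages = ["Lorem ipsum dolor", "sit amet", "IPSUM again"]
print(pages_containing(sample_pages, "ipsum"))  # [0, 2]
```

With a real document, the `pdf` object from the example above could be passed in place of `sample_pages`.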

OS Dependencies

Debian, Ubuntu, and friends:

sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python3-dev

Fedora, Red Hat, and friends:

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel redhat-rpm-config

macOS:

brew install pkg-config poppler

Conda users may also need libgcc:

conda install libgcc

Install

pip install pdftotext
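After installing, it can be useful to confirm that the module is visible to the interpreter before running extraction code. This is a small sketch using only the standard library. The function name `is_installed` is an assumption made for illustration.

```python
import importlib.util


def is_installed(module_name="pdftotext"):
    """Return True if `module_name` can be found by the current interpreter.

    Hypothetical helper: it only locates the module, it does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


if is_installed():
    print("pdftotext is available")
else:
    print("pdftotext was not found; check the OS dependencies and pip install step")
```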

Download files

Source Distribution

pdftotext-2.1.2.tar.gz (113.3 kB)

File details

Details for the file pdftotext-2.1.2.tar.gz.

File metadata

  • Download URL: pdftotext-2.1.2.tar.gz
  • Upload date:
  • Size: 113.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.5

File hashes

Hashes for pdftotext-2.1.2.tar.gz
Algorithm Hash digest
SHA256 c8bdc47b08baa17b8e03ba1f960fc6335b183d2644eaf7300e088516758a6090
MD5 8dfdefaafd94b7f4a3073bb35fdc5c4f
BLAKE2b-256 a6a7c202adb0bcd3adc3030b0c5f7f0e21f62a721913e93296e6c4ddc305cbd3
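
The SHA256 digest listed above can be checked against a downloaded copy of the archive. Below is a minimal sketch using the standard `hashlib` module. The function names `sha256_of` and `matches_expected` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for pdftotext-2.1.2.tar.gz.
EXPECTED_SHA256 = "c8bdc47b08baa17b8e03ba1f960fc6335b183d2644eaf7300e088516758a6090"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("pdftotext-2.1.2.tar.gz")` would return `True` only if the downloaded file's digest equals the one listed.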
