# pdftotext

[![PyPI Status](https://img.shields.io/pypi/v/pdftotext.svg)](https://pypi.python.org/pypi/pdftotext)
[![Build Status](https://travis-ci.org/jalan/pdftotext.svg?branch=master)](https://travis-ci.org/jalan/pdftotext)
[![Coverage Status](https://coveralls.io/repos/github/jalan/pdftotext/badge.svg?branch=master)](https://coveralls.io/github/jalan/pdftotext?branch=master)

Simple PDF text extraction, built on the poppler-cpp library.

```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected, pass the password as the second argument
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages (indexing is zero-based)
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```
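Since a `PDF` object supports `len()`, iteration, and indexing, its pages can be handled like a list of strings. The sketch below wraps the calls above in a small loader and adds a case-insensitive keyword search over the pages. The helper names `load_pages` and `find_pages` are hypothetical and not part of pdftotext. The loader uses only the `PDF(f)` and `PDF(f, password)` forms shown above.

```python
from typing import List, Optional


def load_pages(path: str, password: Optional[str] = None) -> List[str]:
    """Return the text of each page of the PDF at `path` (hypothetical helper)."""
    import pdftotext  # imported lazily so find_pages works without pdftotext installed

    with open(path, "rb") as f:
        # Pass the password only when one is given, matching the usage above.
        pdf = pdftotext.PDF(f, password) if password else pdftotext.PDF(f)
    return list(pdf)


def find_pages(pages: List[str], needle: str) -> List[int]:
    """Return 1-based page numbers whose text contains `needle`, ignoring case."""
    target = needle.lower()
    return [number for number, text in enumerate(pages, start=1) if target in text.lower()]
```

For example, `find_pages(load_pages("lorem_ipsum.pdf"), "lorem")` lists the pages that mention "lorem". Keeping the search separate from the loader lets it run on any list of strings.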


## OS Dependencies

Debian, Ubuntu, and friends:

```
sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python-dev
```

For Python 3, install `python3-dev` instead of `python-dev`.

Fedora, Red Hat, and friends:

```
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config
```

For Python 3, install `python3-devel` instead of `python-devel`.

macOS:

```
brew install pkg-config poppler
```
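Every platform's list installs pkg-config alongside the poppler development files. Assuming the build locates poppler-cpp through pkg-config, a quick pre-flight check before running pip is to ask pkg-config for the poppler-cpp version. The function name `poppler_cpp_version` is hypothetical. The sketch returns `None` when either pkg-config or poppler-cpp is missing.

```python
import shutil
import subprocess
from typing import Optional


def poppler_cpp_version() -> Optional[str]:
    """Return the poppler-cpp version reported by pkg-config, or None if unavailable."""
    if shutil.which("pkg-config") is None:
        return None  # pkg-config itself is not on PATH
    result = subprocess.run(
        ["pkg-config", "--modversion", "poppler-cpp"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None  # pkg-config runs but cannot find the poppler-cpp .pc file
    return result.stdout.strip()
```

The same check works directly in a terminal with `pkg-config --modversion poppler-cpp`.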


## Install

```
pip install pdftotext
```
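A missing compiler or missing headers usually makes `pip install` fail outright. A shared library that cannot be found, however, may only surface when the extension is imported. The following minimal smoke test checks that `pdftotext` imports and exposes the `PDF` class used above. The helper name `pdftotext_is_usable` is hypothetical.

```python
import importlib


def pdftotext_is_usable() -> bool:
    """Return True if the compiled pdftotext extension imports and exposes PDF."""
    try:
        module = importlib.import_module("pdftotext")
    except ImportError:
        # Not installed, or the extension failed to load (e.g. poppler library not found).
        return False
    return hasattr(module, "PDF")
```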

## Source distribution

`pdftotext-2.1.1.tar.gz` (112.9 kB), uploaded via `twine/1.12.1 pkginfo/1.4.2 requests/2.18.4 setuptools/35.0.1 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/2.7.15`.

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `e3ad11efe0aa22cbfc46aa1296b2ea5a52ad208b778288311f2801adef178ccb` |
| MD5 | `be525c7a29ce6b1fad1bd8285ba906b2` |
| BLAKE2b-256 | `213560094dbadd9de2035873390b1cac25e01da605844eba6a07a53a82fa4adc` |
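To confirm that a downloaded `pdftotext-2.1.1.tar.gz` matches the published SHA256 digest, the file can be hashed with Python's standard `hashlib`. This is a minimal sketch. The function names are hypothetical, and the expected digest is the SHA256 value listed for this archive.

```python
import hashlib

# SHA256 digest published for pdftotext-2.1.1.tar.gz.
EXPECTED_SHA256 = "e3ad11efe0aa22cbfc46aa1296b2ea5a52ad208b778288311f2801adef178ccb"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 with the expected hex digest, ignoring case."""
    return sha256_of(path) == expected.lower()
```

pip can also enforce digests itself in hash-checking mode (`--require-hashes` with a requirements file).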
