# pdftotext
Simple PDF text extraction
```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```
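Since a loaded PDF supports `len()`, iteration, and indexing, and each page is a plain string, ordinary sequence code should work on it. Here is a minimal sketch of a page search; `find_pages` is a hypothetical helper, not part of pdftotext, and the list of strings stands in for a loaded `pdf` object:

```python
def find_pages(pages, needle):
    """Return zero-based indexes of the pages whose text contains needle."""
    needle = needle.lower()
    # Case-insensitive substring match against each page's text.
    return [i for i, text in enumerate(pages) if needle in text.lower()]


# Stand-in for a loaded document: any sequence of page strings will do.
pages = ["Lorem ipsum dolor", "sit amet", "Ipsum again"]
print(find_pages(pages, "ipsum"))  # prints [0, 2]
```

With the library installed, passing the `pdf` object in place of `pages` should behave the same way.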
## OS Dependencies
Debian, Ubuntu, and friends (use `python3-dev` in place of `python-dev` when building against Python 3):
```
sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python-dev
```
Fedora, Red Hat, and friends (use `python3-devel` in place of `python-devel` when building against Python 3):
```
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config
```
macOS:
```
brew install pkg-config poppler
```
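Before running the install step, it can help to confirm that the poppler development files are visible to `pkg-config`. The sketch below assumes the build locates poppler through the `poppler-cpp` pkg-config module, which the package lists above suggest but this README does not state. `poppler_cpp_available` is a hypothetical helper name.

```python
import shutil
import subprocess


def poppler_cpp_available():
    """Return True if pkg-config can locate the poppler-cpp module."""
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        # pkg-config itself is not on PATH, so nothing can be looked up.
        return False
    # `pkg-config --exists NAME` exits with status 0 when NAME is found.
    result = subprocess.run([pkg_config, "--exists", "poppler-cpp"])
    return result.returncode == 0


print("poppler-cpp found:", poppler_cpp_available())
```

If this prints `False`, the packages listed for your platform are probably not installed yet.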
## Install
```
pip install pdftotext
```