PDF parser and analyzer

These details have not been verified by PyPI

Project links

Homepage

Project description

pdfminer.six

We fathom PDF

Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a tool for extracting information from PDF documents, with a focus on getting and analyzing text data. Pdfminer.six extracts the text of a page directly from the source code of the PDF. It can also be used to get the exact location, font or color of the text.

It is built in a modular way, so each component of pdfminer.six can be replaced easily. You can implement your own interpreter or rendering device that uses the power of pdfminer.six for purposes other than text analysis.

Check out the full documentation on Read the Docs.

Features

Written entirely in Python.
Parse, analyze, and convert PDF documents.
Extract content as text, images, HTML or hOCR.
Support for PDF-1.7 specification (well, almost).
Support for CJK languages and vertical writing.
Support for various font types (Type1, TrueType, Type3, and CID).
Support for extracting embedded images (JPG, PNG, TIFF, JBIG2, bitmaps).
Support for decoding various compressions (ASCIIHexDecode, ASCII85Decode, LZWDecode, FlateDecode, RunLengthDecode, CCITTFaxDecode).
Support for RC4 and AES encryption.
Support for AcroForm interactive form extraction.
Table of contents extraction.
Tagged contents extraction.
Automatic layout analysis.

How to use

Install Python 3.10 or newer.
Install pdfminer.six.
```
pip install pdfminer.six
```
(Optionally) install extra dependencies for extracting images.
```
pip install 'pdfminer.six[image]'
```
Use the command-line interface to extract text from a PDF.
```
pdf2txt.py example.pdf
```

Or use it with Python.

```
from pdfminer.high_level import extract_text

text = extract_text("example.pdf")
print(text)
```

Contributing

We welcome contributions! Whether you want to fix a bug, add a feature, or improve documentation, your help is appreciated.

Please note that as a community-maintained project with limited maintainer availability, the best way to get an issue resolved is to submit a pull request yourself.

To get started:

Read CONTRIBUTING.md for setup instructions and coding standards
Check out the open issues to find something to work on
Join the discussion on Gitter if you have questions

Acknowledgement

This repository includes code from pyHanko; the original license has been included here.

Project details

Release history

20260107: Jan 7, 2026
20251230: Dec 30, 2025
20251229: Dec 29, 2025 (this version)
20251228: Dec 28, 2025
20251227: Dec 27, 2025
20251107: Nov 7, 2025
20250506: May 6, 2025
20250416: Apr 16, 2025
20250327: Mar 27, 2025
20250324: Mar 24, 2025
20240706: Jul 6, 2024
20231228: Dec 28, 2023
20221105: Nov 5, 2022
20220524: May 24, 2022
20220506: May 6, 2022
20220319: Mar 19, 2022
20211012: Oct 12, 2021
20201018: Oct 18, 2020
20200726: Jul 26, 2020
20200720: Jul 20, 2020
20200517: May 17, 2020
20200402: Apr 1, 2020
20200401: Apr 1, 2020
20200124: Jan 24, 2020
20200121: Jan 21, 2020
20200104: Jan 4, 2020
20191110: Nov 10, 2019
20191107: Nov 7, 2019
20191020: Oct 20, 2019
20181108: Nov 8, 2018
20170720: Jul 20, 2017
20170419: Apr 20, 2017
20170418: Apr 18, 2017
20160614: Jun 14, 2016
20160202: Feb 2, 2016
20151013: Oct 13, 2015
20140915: Sep 15, 2014

Download files

Source Distribution

pdfminer_six-20251229.tar.gz (7.5 MB)

Uploaded Dec 29, 2025 Source

Built Distribution

pdfminer_six-20251229-py3-none-any.whl (5.6 MB)

Uploaded Dec 29, 2025 Python 3

File details

Details for the file pdfminer_six-20251229.tar.gz.

File metadata

Download URL: pdfminer_six-20251229.tar.gz
Upload date: Dec 29, 2025
Size: 7.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pdfminer_six-20251229.tar.gz
Algorithm	Hash digest
SHA256	`60c6d8745de92e02a06cff7edc82125aed70b88f139cde72c3a22a45d044c6c6`
MD5	`81d4e731cc78c792e850fd523441638b`
BLAKE2b-256	`9dd0c62b1802f1eebe04648e720b3a04ced58b4a7f1387923cf5f1b175ded86b`
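
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names and the chunk size are assumptions for this example, and the expected digest is copied from the table for the source distribution.

```python
import hashlib

# SHA256 digest listed above for pdfminer_six-20251229.tar.gz.
EXPECTED_SHA256 = "60c6d8745de92e02a06cff7edc82125aed70b88f139cde72c3a22a45d044c6c6"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("pdfminer_six-20251229.tar.gz")` should return `True` for an intact download of that file.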


File details

Details for the file pdfminer_six-20251229-py3-none-any.whl.

File metadata

Download URL: pdfminer_six-20251229-py3-none-any.whl
Upload date: Dec 29, 2025
Size: 5.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pdfminer_six-20251229-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3d1856f8d013ad733dc4e55c8c68553eaf5e17ced3d3e2122a2f59a2a3e9c675`
MD5	`f42a9f17b74c044a1f52b7120bad9e20`
BLAKE2b-256	`f07c0f0189516525329e000f0c1fa81ee5c4d1000e718e731ff6a87c084793c6`


pdfminer.six 20251229
