PDF parser and analyzer
Project description
pdfminer.rtl
This is a fork of pdfminer.six that attempts to add RTL support with python-bidi. This version is experimental and probably buggy. Please don't rely on it for critical projects.
Check out the full original documentation on Read the Docs.
Features
- RTL (right-to-left) text support via python-bidi (added in this fork).
- Written entirely in Python.
- Parse, analyze, and convert PDF documents.
- Extract content as text, images, HTML or hOCR.
- PDF-1.7 specification support (well, almost).
- CJK languages and vertical writing scripts support.
- Various font types (Type1, TrueType, Type3, and CID) support.
- Support for extracting images (JPG, JBIG2, Bitmaps).
- Support for various compressions (ASCIIHexDecode, ASCII85Decode, LZWDecode, FlateDecode, RunLengthDecode, CCITTFaxDecode)
- Support for RC4 and AES encryption.
- Support for AcroForm interactive form extraction.
- Table of contents extraction.
- Tagged contents extraction.
- Automatic layout analysis.
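To make a few of the filter names in the list above concrete, here is a minimal standard-library sketch of what ASCIIHexDecode, ASCII85Decode and FlateDecode do to a stream. This is not pdfminer.rtl's own code; the function names are hypothetical, and the handling of the end-of-data markers (`>` and `~>`) is simplified for illustration.

```python
import base64
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode hex digits; whitespace is ignored and '>' ends the data."""
    body = data.split(b">", 1)[0]
    digits = b"".join(body.split())
    if len(digits) % 2:
        digits += b"0"  # an odd trailing digit is treated as if followed by 0
    return binascii.unhexlify(digits)


def ascii85_decode(data: bytes) -> bytes:
    """Decode base-85 text; '~>' ends the data."""
    body = data.split(b"~>", 1)[0]
    return base64.a85decode(body)


def flate_decode(data: bytes) -> bytes:
    """Inflate zlib/deflate-compressed data."""
    return zlib.decompress(data)


# Round-trip a small payload through each filter.
payload = b"hello world"
print(ascii_hex_decode(binascii.hexlify(payload) + b">"))
print(ascii85_decode(base64.a85encode(payload) + b"~>"))
print(flate_decode(zlib.compress(payload)))
```

Real PDF streams can chain several filters and carry decode parameters (for example predictors), which this sketch leaves out.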
How to use
- Install Python 3.8 or newer.
- Install pdfminer.rtl.
    pip install pdfminer.rtl
- (Optionally) install extra dependencies for extracting images.
    pip install 'pdfminer.rtl[image]'
- Use the command-line interface to extract text from a PDF.
    pdf2txt.py example.pdf
- Or use it with Python.
    from pdfminer.high_level import extract_text
    text = extract_text("example.pdf")
    print(text)
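Since the RTL support is experimental, it can help to check which extracted lines are actually right-to-left before inspecting them by eye. The sketch below uses only the standard library's `unicodedata` module; `rtl_ratio` and `rtl_lines` are hypothetical helper names, not part of pdfminer.rtl, and the 0.5 threshold is an arbitrary assumption.

```python
import unicodedata
from typing import List

# Bidirectional classes for strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def rtl_ratio(text: str) -> float:
    """Return the share of strongly directional characters that are RTL."""
    classes = [unicodedata.bidirectional(ch) for ch in text]
    strong = [c for c in classes if c == "L" or c in RTL_CLASSES]
    if not strong:
        return 0.0  # no letters at all (digits, punctuation, whitespace)
    return sum(c in RTL_CLASSES for c in strong) / len(strong)


def rtl_lines(text: str, threshold: float = 0.5) -> List[str]:
    """Return the lines whose strong characters are mostly RTL."""
    return [line for line in text.splitlines() if rtl_ratio(line) > threshold]


# Intended use with the extraction call shown above (needs a real PDF):
#   from pdfminer.high_level import extract_text
#   for line in rtl_lines(extract_text("example.pdf")):
#       print(line)
```

This only classifies the output; it does not reorder characters, which is the job of the bidirectional algorithm implemented by python-bidi.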
Acknowledgement
This repository includes code from pyHanko; the original license has been included here. Thanks also to all the other contributors of the original project; see here.
Hashes for pdfminer.rtl-1.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cf493bb81ee21d06d3e42d59e632c7bbfbbeb76138e6d57539450f6d5d40b81f
MD5 | 775b936b674a2efa376e4bf50489ee35
BLAKE2b-256 | 6fe8ec13b13fb448dd1ee75d81b0f8950b409f93a501afc9e2f33de7ca331b92