PDF parser and analyzer
pdfminer.rtl
This is a fork of pdfminer.six that attempts to add right-to-left (RTL) text support using python-bidi. This version is experimental and probably buggy. Please don't rely on it for critical projects.
For the full documentation of the original project, pdfminer.six, see Read the Docs.
Features
- Added RTL (right-to-left) text support (experimental).
- Written entirely in Python.
- Parse, analyze, and convert PDF documents.
- Extract content as text, images, HTML or hOCR.
- PDF-1.7 specification support (well, almost).
- CJK languages and vertical writing scripts support.
- Various font types (Type1, TrueType, Type3, and CID) support.
- Support for extracting images (JPG, JBIG2, bitmaps).
- Support for various compressions (ASCIIHexDecode, ASCII85Decode, LZWDecode, FlateDecode, RunLengthDecode, CCITTFaxDecode).
- Support for RC4 and AES encryption.
- Support for AcroForm interactive form extraction.
- Table of contents extraction.
- Tagged contents extraction.
- Automatic layout analysis.
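As a rough illustration of the layout analysis feature, the sketch below walks the layout tree and yields the text of each text container per page. It assumes this fork keeps the `extract_pages` function and the `LTTextContainer` class from pdfminer.six unchanged; the helper name `iter_text_boxes` is hypothetical and not part of the package.

```python
from typing import Iterator, Tuple


def iter_text_boxes(pdf_path: str) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, text) for each text container found on each page."""
    # Imported lazily, so defining this helper does not require the package.
    # Assumption: the fork exposes the same modules as pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    # extract_pages runs layout analysis and yields one layout object per page.
    for page_number, page_layout in enumerate(extract_pages(pdf_path), start=1):
        for element in page_layout:
            # Keep only elements that carry text (text boxes and similar).
            if isinstance(element, LTTextContainer):
                yield page_number, element.get_text()
```

Calling `for page, text in iter_text_boxes("example.pdf"): print(page, text)` would print each text block with its page number. How RTL text is ordered within each block depends on the fork's behavior, which this sketch does not verify.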
How to use
- Install Python 3.8 or newer.
- Install pdfminer.rtl.
    pip install pdfminer.rtl
- (Optionally) install extra dependencies for extracting images.
    pip install 'pdfminer.rtl[image]'
- Use the command-line interface to extract text from a PDF.
    pdf2txt.py example.pdf
- Or use it with Python.
    from pdfminer.high_level import extract_text
    text = extract_text("example.pdf")
    print(text)
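The Python usage above handles a single file. A minimal sketch for several files might look like the following; the helper name `extract_many` and its `extractor` parameter are hypothetical and only for illustration, while `extract_text` is the same function imported in the step above.

```python
from typing import Callable, Dict, Iterable, Optional


def extract_many(
    paths: Iterable[str],
    extractor: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Return a mapping of file path to extracted text."""
    if extractor is None:
        # Default to the library function shown in the usage steps.
        # Imported lazily so a custom extractor can be used without it.
        from pdfminer.high_level import extract_text
        extractor = extract_text

    results: Dict[str, str] = {}
    for path in paths:
        # Each file is processed independently, in the order given.
        results[str(path)] = extractor(str(path))
    return results
```

For example, `extract_many(["a.pdf", "b.pdf"])` would return a dictionary with one entry per file. Passing a custom `extractor` makes it possible to swap in different extraction settings or to test the loop without real PDF files.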
Acknowledgement
This repository includes code from pyHanko; the original license has been included here. Thanks also to all the other contributors of the original project; see here.
Hashes for pdfminer.rtl-0.0.2.dev6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f85653ac166d44dc3b1b68d91a110a99cefe35d2f0a4e06f9bf31534a03050a3
MD5 | a3358ac9c9a5a6673083116cd660de20
BLAKE2b-256 | 73418effec6a2de86f1c29379d233464888b1fe2e7e11f3363661f175a474ce5