Lightweight command-line and Python API toolkit for PDF text extraction, encryption, permissions, and more.
PdfHandlerETC
PdfHandlerETC is a lightweight command-line and Python toolkit for handling common PDF tasks including text extraction, encryption, decryption, permissions inspection, word counting, page resizing, and file merging.
This project is released under the CC0 1.0 Public Domain Dedication.
Features
- Extract text from PDFs by page or range
- Encrypt and decrypt PDFs with customizable permissions
- Count words across entire documents or selected pages
- Inspect encryption status and permissions
- Resize page dimensions
- Merge two PDFs with optional visual separators (blank page or black bar)
- Detect duplicate PDFs based on text content
- Includes both a Python API and command-line interface (CLI)
Installation
Install from PyPI:
pip install pdfhandleretc
Command-Line Usage
After installation, run the CLI as a module with python -m pdfhandler:
python -m pdfhandler wordcount document.pdf --pages "1, 3" > document_text.txt
python -m pdfhandler encrypt document.pdf --output secure.pdf
python -m pdfhandler decrypt secure.pdf --in-place
python -m pdfhandler permissions secure.pdf
python -m pdfhandler resize document.pdf 612 792 --output resized.pdf
python -m pdfhandler dupe-check file1.pdf file2.pdf
python -m pdfhandler merge intro.pdf appendix.pdf merged.pdf --add-separator black
python -m pdfhandler extract document.pdf --pages "1-3, 5"
Use --help for details:
python -m pdfhandler --help
python -m pdfhandler extract --help
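The --pages option takes strings such as "1-3, 5" or "1, 3". The sketch below shows one way a page specification of that shape could be expanded into individual page numbers. It is an illustration only: the function name parse_page_spec is hypothetical, and the assumptions that pages are 1-based and that ranges are inclusive are not confirmed by the package documentation.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5" into sorted, de-duplicated page numbers.

    Hypothetical helper, not part of PdfHandlerETC. Assumes 1-based pages
    and inclusive ranges.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1, , 3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_page_spec("1-3, 5"))  # [1, 2, 3, 5]
```

Returning a sorted list without duplicates means overlapping entries such as "1-3, 2" yield each page once; a real implementation might instead preserve the order the user typed.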
Python Usage
from pdfhandler import PdfHandler
handler = PdfHandler("example.pdf")
# Extract text
text = handler.get_pdf_text("1-2, 4")
print(text)
# Word count
print("Words:", handler.word_count("1-3"))
# Encrypt the file
handler.encrypt(output="example-encrypted.pdf")
# Show permissions
handler.print_permissions()
# Resize pages
handler.resize(width=612, height=792, output_path="resized.pdf")
# Merge with a visual separator (black bar or blank page)
PdfHandler.merge_pdfs(
    "intro.pdf",
    "appendix.pdf",
    "merged.pdf",
    add_separator=True,
    separator_type="black",  # or "blank"
)
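Duplicate detection is described as being based on text content. As a rough sketch of that idea, two extracted texts can be compared by hashing them after normalizing case and whitespace. This is not necessarily how the package's dupe-check works: text_fingerprint and same_text are hypothetical names, and the normalization rules are an assumption.

```python
import hashlib
import re


def text_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the text with case and whitespace normalized."""
    # Collapse every run of whitespace to one space, then trim and lowercase.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def same_text(text_a: str, text_b: str) -> bool:
    """True when both texts are identical after normalization."""
    return text_fingerprint(text_a) == text_fingerprint(text_b)


print(same_text("Hello  World\n", "hello world"))  # True
print(same_text("Hello World", "Goodbye World"))   # False
```

The input strings could come from calls like handler.get_pdf_text("1-2, 4") on two different files. Exact-hash comparison only catches identical text; near-duplicates would need a similarity measure instead.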
License
This project is licensed under the CC0 1.0 Universal public domain dedication. You may use, modify, and distribute it freely without attribution or restriction.
Dependencies
- pdfminer.six - for text extraction
- pikepdf - for encryption and PDF manipulation
- colorama - for cross-platform terminal colors
File details
Details for the file pdfhandleretc-0.1.4.tar.gz.
File metadata
- Download URL: pdfhandleretc-0.1.4.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba86576a54a2e04295e7119a33df612145bbead2dc3fa6d3cf0d6f081e19f8fd |
| MD5 | 185fe866f67bb7f85ca15b1ae8e9bea3 |
| BLAKE2b-256 | 28346711d22816035f974ff1848d975bfdbd78e523159910e764aefb529025e3 |
File details
Details for the file pdfhandleretc-0.1.4-py3-none-any.whl.
File metadata
- Download URL: pdfhandleretc-0.1.4-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37af66d56b8e64960225d8180a55b2cab6c99d50fdcc54b14b1a32ae9813e6cc |
| MD5 | df8cae3835fee047088dc96f7c9f5356 |
| BLAKE2b-256 | 7e1b90dcc5059574103eb550c92434a2e9fd51a85fcee51474ec299edf9a0a82 |
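The SHA256 digests listed under File hashes can be used to check a downloaded file before installing it. A minimal sketch using only the standard library follows; the helper names sha256_of and matches_published are hypothetical, and the local file path is whatever location the file was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published(Path("pdfhandleretc-0.1.4.tar.gz"), "ba86576a54a2e04295e7119a33df612145bbead2dc3fa6d3cf0d6f081e19f8fd") should return True for an unmodified copy of the source distribution saved in the current directory.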