
Lightweight command-line and Python API toolkit for PDF text extraction, encryption, permissions, and more.


PdfHandlerETC

PdfHandlerETC is a lightweight command-line and Python toolkit for common PDF tasks: text extraction, encryption and decryption, permissions inspection, word counting, page resizing, file merging, and duplicate detection.

This project is released under the CC0 1.0 Public Domain Dedication.

Features

  • Extract text from PDFs by page or range
  • Encrypt and decrypt PDFs with customizable permissions
  • Count words across entire documents or selected pages
  • Inspect encryption status and permissions
  • Resize page dimensions
  • Merge two PDFs with optional visual separators (blank page or black bar)
  • Detect duplicate PDFs based on text content
  • Includes both a Python API and command-line interface (CLI)
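As a rough illustration of what text-based duplicate detection can look like, the sketch below normalizes extracted text and compares SHA-256 fingerprints. It is not the package's implementation: the function names text_fingerprint and looks_like_duplicate are hypothetical, and the normalization rules (collapsing whitespace, ignoring case) are assumptions chosen for the example.

```python
import hashlib
import re


def text_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of text with whitespace and case normalized."""
    # Collapse whitespace runs so layout differences do not change the hash.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def looks_like_duplicate(text_a: str, text_b: str) -> bool:
    """Treat two documents as duplicates when their normalized text matches."""
    return text_fingerprint(text_a) == text_fingerprint(text_b)
```

The two inputs would be the text already extracted from each PDF. Comparing fingerprints instead of full strings keeps the check cheap when many files are involved.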

Installation

Install from PyPI:

pip install pdfhandleretc

Command-Line Usage

After installation, run the CLI as a module with python -m pdfhandler:

python -m pdfhandler wordcount document.pdf --pages "1, 3"
python -m pdfhandler encrypt document.pdf --output secure.pdf
python -m pdfhandler decrypt secure.pdf --in-place
python -m pdfhandler permissions secure.pdf
python -m pdfhandler resize document.pdf 612 792 --output resized.pdf
python -m pdfhandler dupe-check file1.pdf file2.pdf
python -m pdfhandler merge intro.pdf appendix.pdf merged.pdf --add-separator black
python -m pdfhandler extract document.pdf --pages "1-3, 5" > document_text.txt
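The --pages values above ("1, 3" and "1-3, 5") mix single pages and ranges. A minimal sketch of how such a spec could be expanded into page numbers follows. The function parse_page_spec is hypothetical and not part of the package, and it assumes pages are 1-based and ranges are inclusive.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5" into sorted page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # Inclusive range such as "1-3" (assumed semantics).
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: page numbering starts at 1.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)
```

With these assumptions, parse_page_spec("1-3, 5") yields [1, 2, 3, 5]. The package's own parser may treat whitespace, ordering, or out-of-range pages differently.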

Use --help for details:

python -m pdfhandler --help
python -m pdfhandler extract --help

Python Usage

from pdfhandler import PdfHandler

handler = PdfHandler("example.pdf")

# Extract text
text = handler.get_pdf_text("1-2, 4")
print(text)

# Word count
print("Words:", handler.word_count("1-3"))

# Encrypt the file
handler.encrypt(output="example-encrypted.pdf")

# Show permissions
handler.print_permissions()

# Resize pages
handler.resize(width=612, height=792, output_path="resized.pdf")

# Merge with a visual separator (black bar or blank page)
PdfHandler.merge_pdfs(
    "intro.pdf",
    "appendix.pdf",
    "merged.pdf",
    add_separator=True,
    separator_type="black"  # or "blank"
)
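The resize example uses 612 and 792, which match US Letter (8.5 x 11 inches) expressed in PDF points at 72 points per inch. Assuming resize expects points, which the README does not state explicitly, a small helper like the following can convert other paper sizes. The helper names are hypothetical.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    """Convert inches to PDF points (1 point = 1/72 inch)."""
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# US Letter is 8.5 x 11 inches.
letter = (inches_to_points(8.5), inches_to_points(11))

# A4 is 210 x 297 mm; rounded to whole points.
a4 = (round(mm_to_points(210)), round(mm_to_points(297)))
```

Here letter is (612.0, 792.0) and a4 is (595, 842), so A4 output would use 595 and 842 in place of 612 and 792 under the same assumption.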

License

This project is dedicated to the public domain under CC0 1.0 Universal. You may use, modify, and distribute it freely without attribution or restriction.

Dependencies

  • pdfminer.six - for text extraction
  • pikepdf - for encryption and PDF manipulation
  • colorama - for cross-platform terminal colors
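pip normally installs these dependencies along with the package. If an environment looks incomplete, a quick check like the sketch below reports which of them can be imported. The helper check_dependencies is hypothetical. The mapping reflects that pdfminer.six is imported as pdfminer, while pikepdf and colorama are imported under their own names.

```python
from importlib.util import find_spec

# Distribution name -> top-level import name.
DEPENDENCY_MODULES = {
    "pdfminer.six": "pdfminer",
    "pikepdf": "pikepdf",
    "colorama": "colorama",
}


def check_dependencies() -> dict[str, bool]:
    """Report whether each dependency's module can be located for import."""
    return {
        dist: find_spec(module) is not None
        for dist, module in DEPENDENCY_MODULES.items()
    }


if __name__ == "__main__":
    for dist, available in check_dependencies().items():
        print(f"{dist}: {'ok' if available else 'missing'}")
```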
