Lightweight command-line and Python API toolkit for PDF text extraction, encryption, permissions, and more.

PdfHandlerETC

PdfHandlerETC is a lightweight command-line and Python toolkit for common PDF tasks, including text extraction, encryption, decryption, permissions inspection, word counting, page resizing, and file merging.

This project is released under the CC0 1.0 Public Domain Dedication.

Features

  • Extract text from PDFs by page or range
  • Encrypt and decrypt PDFs with customizable permissions
  • Count words across entire documents or selected pages
  • Inspect encryption status and permissions
  • Resize page dimensions
  • Merge two PDFs with optional visual separators (blank page or black bar)
  • Detect duplicate PDFs based on text content
  • Includes both a Python API and command-line interface (CLI)
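
To illustrate what duplicate detection based on text content can look like, here is a minimal sketch. It is not PdfHandlerETC's implementation: the function names text_fingerprint and looks_like_duplicate are hypothetical, and the normalization (collapsing whitespace, lowercasing) is an assumption about what "same text" should mean.

```python
import hashlib
import re


def text_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of whitespace- and case-normalized text."""
    # Assumption: layout differences (line breaks, repeated spaces) and
    # letter case should not make two documents count as different.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def looks_like_duplicate(text_a: str, text_b: str) -> bool:
    """Treat two documents as duplicates when their normalized text matches."""
    return text_fingerprint(text_a) == text_fingerprint(text_b)


if __name__ == "__main__":
    print(looks_like_duplicate("Hello   World\n", "hello world"))
```

The text passed in would come from whatever extraction step you use, for example the output of the extract command.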

Installation

Install from PyPI:

pip install pdfhandleretc

Or from source:

git clone https://github.com/yourusername/pdfhandleretc.git
cd pdfhandleretc
pip install .

Command-Line Usage

After installation, you can use the pdfhandler CLI tool:

pdfhandler extract document.pdf --pages "1-3, 5"
pdfhandler wordcount document.pdf --pages "1, 3"
pdfhandler encrypt document.pdf --output secure.pdf
pdfhandler decrypt secure.pdf --in-place
pdfhandler permissions secure.pdf
pdfhandler resize document.pdf 612 792 --output resized.pdf
pdfhandler dupe-check file1.pdf file2.pdf
pdfhandler merge intro.pdf appendix.pdf merged.pdf --add-separator black

Use --help for details:

pdfhandler --help
pdfhandler extract --help
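
The --pages values above combine single pages and ranges, as in "1-3, 5". The sketch below shows one way such a string could be expanded into page numbers. The helper parse_page_spec is hypothetical and is not taken from the package; it assumes pages are numbered from 1 and that ranges are inclusive, which the examples suggest but do not confirm.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5" into sorted, unique page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1, ,3"
        if "-" in part:
            # Inclusive range, e.g. "1-3" covers pages 1, 2 and 3.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: page numbering is 1-based.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_spec("1-3, 5"))  # prints [1, 2, 3, 5]
```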

Python Usage

from pdfhandler import PdfHandler, PdfHandlerUtils

handler = PdfHandler("example.pdf")

# Extract text
text = handler.get_pdf_text("1-2, 4")
print(text)

# Word count
print("Words:", handler.word_count("1-3"))

# Encrypt the file
handler.encrypt(output="example-encrypted.pdf")

# Show permissions
handler.print_permissions()

# Resize pages
handler.resize(width=612, height=792, output_path="resized.pdf")

# Merge with a visual separator (black bar or blank page)
PdfHandlerUtils.merge_pdfs(
    "intro.pdf",
    "appendix.pdf",
    "merged.pdf",
    add_separator="black"  # or "blank"
)
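
The values 612 and 792 in the resize example match US Letter (8.5 x 11 inches) expressed in PDF points, at 72 points per inch. Assuming resize takes its width and height in points (the README does not state the unit), a small conversion helper like the following could compute other sizes. Both function names are hypothetical.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to PDF points (1 inch = 72 points)."""
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


if __name__ == "__main__":
    # US Letter, 8.5 x 11 inches
    print(inches_to_points(8.5), inches_to_points(11))
    # A4, 210 x 297 mm, rounded to whole points
    print(round(mm_to_points(210)), round(mm_to_points(297)))
```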

License

This project is licensed under the CC0 1.0 Universal public domain dedication. You may use, modify, and distribute it freely without attribution or restriction.

Dependencies

  • pdfminer.six - for text extraction
  • pikepdf - for encryption and PDF manipulation
  • colorama - for cross-platform terminal colors
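
If an import fails after installation, a quick check like this can show which of the dependencies above cannot be found. It is a sketch using only the standard library; the helper name missing_modules is hypothetical. Note that pdfminer.six is imported under the module name pdfminer.

```python
from importlib.util import find_spec

# Distribution name on PyPI -> top-level module name used in imports.
REQUIRED = {
    "pdfminer.six": "pdfminer",
    "pikepdf": "pikepdf",
    "colorama": "colorama",
}


def missing_modules(module_names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED.values())
    print("Missing modules:", missing or "none")
```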

Download files

Source Distribution

pdfhandleretc-0.1.0.tar.gz (12.8 kB)

Uploaded Source

Built Distribution

pdfhandleretc-0.1.0-py3-none-any.whl (12.5 kB)

Uploaded Python 3

File details

Details for the file pdfhandleretc-0.1.0.tar.gz.

File metadata

  • Download URL: pdfhandleretc-0.1.0.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for pdfhandleretc-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5d9c10a955c45f95c65b9f506219f2185ef41846cfdb7ca32b4945f5ea52848a
MD5 567671f15c487534547838df8f78f2a9
BLAKE2b-256 fc10197a47e2701ecc476e0687e87c2377402b4892e9ee6f81fa15a176a445e3
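
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

For the source distribution, this would be called as matches_published_hash("pdfhandleretc-0.1.0.tar.gz", "5d9c10a955c45f95c65b9f506219f2185ef41846cfdb7ca32b4945f5ea52848a").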

File details

Details for the file pdfhandleretc-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pdfhandleretc-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for pdfhandleretc-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 73a5f0c1c22ac8ec2a3f7d9c2c4ebc1dd9b1493aa5c8d61bc6593143943cb0f7
MD5 7cb499209c68ba2755993dd775639c36
BLAKE2b-256 d39cef0a1603725ab917f278b45f479445a3df5dacec989facec019843912ef4
