PDF Ghost

PDF Ghost is a Python library designed for performing a wide range of operations on PDF files, including merging, splitting, rotating, compressing, watermarking, converting, encrypting/decrypting, extracting text/images, adding page numbers, batch processing, and comparing PDFs. It also supports generating PDFs from Markdown or LaTeX files.

Features

  • Merge PDFs: Combine multiple PDFs into a single file.
  • Split PDFs: Split a PDF into smaller files based on page ranges.
  • Remove Pages: Remove specific pages by index, or remove a number of pages from the start or end of a PDF.
  • Rotate Pages: Rotate all or specific pages in a PDF.
  • Insert Pages: Insert pages from other PDFs at specific positions in a PDF.
  • Rearrange Pages: Rearrange the pages of a PDF, or merge several PDFs and then rearrange all of their pages.
  • Compress PDFs: Reduce the file size of a PDF by optimizing images and removing unnecessary metadata.
  • Watermarking: Add text or image watermarks to PDFs, or remove them.
  • Image to PDF: Convert images to a PDF file.
  • PDF to Image: Convert pages of a PDF file to images.
  • Encrypt/Decrypt PDFs: Add password protection to PDFs and decrypt them with the correct password.
  • Extract Text/Images: Extract text or images from a PDF.
  • Add Page Numbers: Insert page numbers at the bottom or top of each page.
  • Convert PDFs to HTML: Convert PDFs into structured HTML files.
  • Generate PDFs from Markdown/LaTeX: Convert Markdown or LaTeX files into well-formatted PDFs.
  • Compare PDFs: Identify differences between two PDF files.
  • PDF Signing: Add digital signatures to PDFs using cryptographic certificates.
  • Batch Processing: Apply operations (merge, split, rotate, etc.) on multiple PDFs at once.

Installation

Python Requirements

  • Python 3.7+

Install via pip

pip install pdfghost

External Dependencies

For Markdown-to-PDF and LaTeX-to-PDF conversion, the following external tools are required:

  1. Pandoc: For converting Markdown to PDF.
  2. BasicTeX: A lightweight LaTeX distribution for converting LaTeX to PDF (TeX Live on Linux or MiKTeX on Windows serves the same purpose).

Installing Pandoc

macOS

If you have Homebrew installed, run:

brew install pandoc

Linux (Debian/Ubuntu)

sudo apt-get update
sudo apt-get install pandoc

Windows

Download the Pandoc installer from the official Pandoc website and follow the installation instructions.

Installing BasicTeX

macOS

  1. Download the BasicTeX installer from the MacTeX website.
  2. Install it by following the on-screen instructions.
  3. Add the following to your .bashrc or .zshrc file, adjusting the year to match your installed TeX Live version:
    export PATH="/usr/local/texlive/2023/bin/universal-darwin:$PATH"

Linux (Debian/Ubuntu)

  1. Install the texlive package:
    sudo apt-get update
    sudo apt-get install texlive

Windows

  1. Download and install MiKTeX (a lightweight LaTeX distribution) from the official MiKTeX website.
  2. Follow the installation instructions.
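
Once the external tools are installed, it can help to confirm that they are reachable before calling the Markdown or LaTeX converters. The sketch below uses only the standard library. The helper name missing_tools and the executable names pandoc and pdflatex are assumptions for illustration; they are not part of PDF Ghost, and the library may look for different executables.

```python
import shutil

def missing_tools(tools=("pandoc", "pdflatex")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]

# Report which of the assumed executables are not on PATH.
missing = missing_tools()
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All external tools found.")
```

If a tool is reported missing right after installation, opening a new terminal so that the updated PATH is picked up is usually the first thing to try.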

Usage

Merge PDFs

from pdfghost import merge_pdfs

merge_pdfs("output.pdf", "file1.pdf", "file2.pdf")

Split PDF

from pdfghost import split_pdf

split_pdf("input.pdf", "output_folder", split_range=(0, 2))

Remove Specific Pages

from pdfghost import remove_pages

# Remove pages with indices 0, 2, and 4 (0-based)
remove_pages("input.pdf", "output.pdf", pages_to_remove=[0, 2, 4])
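
Because the indices above are 0-based, page numbers read from a PDF viewer (which normally start at 1) need to be shifted by one. A minimal sketch of that conversion follows; to_zero_based is a hypothetical helper, not a PDF Ghost function.

```python
def to_zero_based(page_numbers):
    """Convert 1-based page numbers (as shown in a viewer) to 0-based indices."""
    indices = []
    for number in page_numbers:
        if number < 1:
            raise ValueError(f"Page numbers start at 1, got {number}")
        indices.append(number - 1)
    return indices

# Pages 1, 3, and 5 in a viewer correspond to indices 0, 2, and 4.
print(to_zero_based([1, 3, 5]))  # → [0, 2, 4]
```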

Remove Pages from Start

from pdfghost import remove_pages_from_start

# Remove the first 3 pages
remove_pages_from_start("input.pdf", "output.pdf", num_pages=3)

Remove Pages from End

from pdfghost import remove_pages_from_end

# Remove the last 2 pages
remove_pages_from_end("input.pdf", "output.pdf", num_pages=2)

Rotate Pages

from pdfghost import rotate_pdf

# Rotate all pages by 90 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=90)

# Rotate specific pages by 180 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=180, pages_to_rotate=[0, 2])
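
PDF page rotation is stored as a multiple of 90 degrees, so it may be worth validating an angle before passing it to a rotation function. The sketch below is one way to do that; normalize_rotation is a hypothetical helper, and how PDF Ghost itself treats other angles is not documented here.

```python
def normalize_rotation(angle):
    """Map an angle in degrees to 0, 90, 180, or 270; reject non-multiples of 90."""
    if angle % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {angle}")
    return angle % 360

print(normalize_rotation(450))  # → 90
print(normalize_rotation(-90))  # → 270
```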

Insert Pages

from pdfghost import insert_pages

# Insert pages at specific positions
insertions = [
    (1, "insert1.pdf"),  # Insert pages from insert1.pdf at position 1
    (4, "insert2.pdf"),  # Insert pages from insert2.pdf at position 4
]
insert_pages("input.pdf", "output.pdf", insertions)

Rearrange Pages

from pdfghost import rearrange_pdf

# Rearrange pages in a PDF
page_order = [2, 0, 1]  # New order: Page 3, Page 1, Page 2
rearrange_pdf("input.pdf", "output.pdf", page_order)
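
To make the meaning of page_order concrete, the following sketch applies the same kind of 0-based order list to a plain Python list standing in for pages. It assumes that each entry names the source page that should appear at that output position, as the comment in the example suggests. reorder is a hypothetical helper, not a PDF Ghost function.

```python
def reorder(pages, page_order):
    """Return pages arranged so that output position i holds pages[page_order[i]]."""
    for index in page_order:
        if not 0 <= index < len(pages):
            raise IndexError(f"page index {index} out of range for {len(pages)} pages")
    return [pages[index] for index in page_order]

pages = ["Page 1", "Page 2", "Page 3"]
print(reorder(pages, [2, 0, 1]))  # → ['Page 3', 'Page 1', 'Page 2']
```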

Merge and Rearrange Pages

from pdfghost import merge_and_rearrange

# Merge multiple PDFs and rearrange their pages
page_order = [
    (0, 0),  # Page 1 from file1.pdf
    (1, 0),  # Page 1 from file2.pdf
    (0, 1),  # Page 2 from file1.pdf
]
merge_and_rearrange("output.pdf", page_order, "file1.pdf", "file2.pdf")
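
The (file index, page index) pairs can be read as lookups into the list of input files, in the order the files are passed. The sketch below illustrates that reading with nested lists in place of real PDFs; pick_pages is a hypothetical helper, and both indices are assumed to be 0-based, as in the comments above.

```python
def pick_pages(documents, page_order):
    """Build a page list from (file_index, page_index) pairs, both 0-based."""
    return [documents[file_index][page_index] for file_index, page_index in page_order]

documents = [
    ["file1 p1", "file1 p2"],  # stands in for file1.pdf
    ["file2 p1"],              # stands in for file2.pdf
]
page_order = [(0, 0), (1, 0), (0, 1)]
print(pick_pages(documents, page_order))
# → ['file1 p1', 'file2 p1', 'file1 p2']
```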

Compress PDF

from pdfghost import compress_pdf

# Compress a PDF with medium compression
compress_pdf("input.pdf", "output.pdf", power=3)

# Compress a PDF with maximum compression
compress_pdf("input.pdf", "output.pdf", power=5)

Add Text Watermark

from pdfghost import add_text_watermark

# Add a text watermark to all pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential")

# Add a text watermark to specific pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential", pages_to_watermark=[0, 2])

Add Image Watermark

from pdfghost import add_image_watermark

# Add an image watermark to all pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png")

# Add an image watermark to specific pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png", pages_to_watermark=[1])

Remove Watermark

from pdfghost import remove_watermark

# Remove watermarks from all pages
remove_watermark("input.pdf", "output.pdf")

# Remove watermarks from specific pages
remove_watermark("input.pdf", "output.pdf", pages_to_clean=[0, 2])

Convert PDF to Images

from pdfghost import pdf_to_images

# Convert each page of a PDF into PNG images
pdf_to_images("input.pdf", "output_folder", format="png")

# Convert each page of a PDF into JPG images
pdf_to_images("input.pdf", "output_folder", format="jpg")

Convert Images to PDF

from pdfghost import images_to_pdf

# Convert multiple image files into a single PDF
images_to_pdf("output.pdf", "image1.png", "image2.jpg")

Encrypt PDF

from pdfghost import encrypt_pdf

# Encrypt a PDF with a password
encrypt_pdf("input.pdf", "output.pdf", password="mypassword")

Decrypt PDF

from pdfghost import decrypt_pdf

# Decrypt a PDF with a password
decrypt_pdf("input.pdf", "output.pdf", password="mypassword")

Extract Text

from pdfghost import extract_text

# Extract text from a PDF and save it as a .txt file
extract_text("input.pdf", "output.txt", format="txt")

# Extract text from a PDF and save it as a .csv file
extract_text("input.pdf", "output.csv", format="csv")

Extract Images

from pdfghost import extract_images

# Extract all images from a PDF and save them as separate image files
extract_images("input.pdf", "output_folder")

Add Page Numbers

from pdfghost import add_page_numbers

# Add page numbers at the bottom of each page
add_page_numbers("input.pdf", "output.pdf", position="bottom")

# Add page numbers at the top of each page
add_page_numbers("input.pdf", "output.pdf", position="top")

Convert PDF to HTML

from pdfghost import pdf_to_html

# Convert a PDF into a structured HTML file
pdf_to_html("input.pdf", "output.html")

Convert Markdown to PDF

from pdfghost import markdown_to_pdf

# Convert a Markdown file into a PDF
markdown_to_pdf("input.md", "output.pdf")

Convert LaTeX to PDF

from pdfghost import latex_to_pdf

# Convert a LaTeX file into a PDF
latex_to_pdf("input.tex", "output.pdf")

Compare PDFs

from pdfghost import compare_pdfs

# Compare two PDFs and generate a summary of differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="summary")
print(result)

# Compare two PDFs with side-by-side output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="side_by_side")
print(result)

# Compare two PDFs with highlighted differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="highlight_differences")
print(result)

# Compare two PDFs with version control-style output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="version_control")
print(result)

# Compare two PDFs with annotations
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="annotations")
print(result)

Sign PDFs

from pdfghost import sign_pdf

# Sign a PDF with a cryptographic certificate
sign_pdf("input.pdf", "signed.pdf", "certificate.pfx", password="mypassword")

Batch Processing

from pdfghost import batch_process, rotate_pdf

# Rotate all PDFs in a folder by 90 degrees
batch_process("input_folder", "output_folder", rotate_pdf, rotation=90)
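
The call above passes a function plus its keyword arguments, which suggests a pattern of applying one operation to every PDF in a folder. The sketch below shows that pattern using only the standard library. It is not PDF Ghost's implementation: batch_apply and copy_file are hypothetical names, and the assumption is that each operation takes an input path and an output path as its first two arguments.

```python
from pathlib import Path
import shutil
import tempfile

def batch_apply(input_folder, output_folder, operation, **kwargs):
    """Call operation(input_path, output_path, **kwargs) for each .pdf in input_folder."""
    input_dir = Path(input_folder)
    output_dir = Path(output_folder)
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        target = output_dir / pdf_path.name
        operation(str(pdf_path), str(target), **kwargs)
        processed.append(target.name)
    return processed

def copy_file(input_path, output_path):
    """Stand-in operation so the sketch runs without any PDF library."""
    shutil.copyfile(input_path, output_path)

# Demonstrate on a temporary folder containing two PDFs and one other file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "in"
    source.mkdir()
    (source / "a.pdf").write_bytes(b"%PDF-1.4\n")
    (source / "b.pdf").write_bytes(b"%PDF-1.4\n")
    (source / "notes.txt").write_text("ignored")
    print(batch_apply(source, Path(tmp) / "out", copy_file))  # → ['a.pdf', 'b.pdf']
```

Passing the operation as a callable keeps the folder traversal separate from the PDF logic, so the same loop works for rotating, compressing, or any other function with the same argument order.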

Testing

To run unit tests, first install the development dependencies, and then use:

python -m unittest discover tests/

Contributing

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/your-feature).
  3. Commit your changes (git commit -am 'Add new feature').
  4. Push to the branch (git push origin feature/your-feature).
  5. Open a new Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
