PDF Ghost
PDF Ghost is a Python library designed for performing a wide range of operations on PDF files, including merging,
splitting, rotating, compressing, watermarking, converting, encrypting/decrypting, extracting text/images, adding page
numbers, batch processing, and comparing PDFs. It also supports generating PDFs from Markdown or LaTeX files.
Features
- Merge PDFs: Combine multiple PDFs into a single file.
- Split PDFs: Split a PDF into smaller files based on page ranges.
- Remove Pages: Remove specific pages by index, or remove a number of pages from the start or end of a PDF.
- Rotate Pages: Rotate all or specific pages in a PDF.
- Insert Pages: Insert pages from other PDFs at specific positions in a PDF.
- Rearrange Pages: Rearrange the pages of a PDF file, or merge several PDF files and then rearrange all their pages.
- Compress PDFs: Reduce the file size of a PDF by optimizing images and removing unnecessary metadata.
- Watermarking: Add text or image watermarks to PDFs, or remove them.
- Image to PDF: Convert images to a PDF file.
- PDF to Image: Convert pages of a PDF file to images.
- Encrypt/Decrypt PDFs: Add password protection to PDFs and decrypt them with the correct password.
- Extract Text/Images: Extract text or images from a PDF.
- Add Page Numbers: Insert page numbers at the bottom or top of each page.
- Convert PDFs to HTML: Convert PDFs into structured HTML files.
- Generate PDFs from Markdown/LaTeX: Convert Markdown or LaTeX files into well-formatted PDFs.
- Compare PDFs: Identify differences between two PDF files.
- PDF Signing: Add digital signatures to PDFs using cryptographic certificates.
- Batch Processing: Apply operations (merge, split, rotate, etc.) on multiple PDFs at once.
Installation
Python Requirements
- Python 3.7+
Install via pip
pip install pdfghost
External Dependencies
For Markdown-to-PDF and LaTeX-to-PDF conversion, the following external tools are required:
- Pandoc: For converting Markdown to PDF.
- BasicTeX: A lightweight LaTeX distribution for converting LaTeX to PDF.
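Before calling the Markdown or LaTeX converters, it can help to confirm that the external tools are reachable on your PATH. The sketch below uses only the standard library. The executable names `pandoc` and `pdflatex` are assumptions about what the conversion relies on, and `missing_tools` is a hypothetical helper, not part of pdfghost.

```python
import shutil


def missing_tools(tools=("pandoc", "pdflatex")):
    """Return the names from `tools` that cannot be found on PATH.

    The default names are assumptions; adjust them to match your setup.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools found.")
```

If the list is not empty, install the missing tools as described in the sections that follow.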
Installing Pandoc
MacOS
If you have Homebrew installed, run:
brew install pandoc
Linux (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install pandoc
Windows
Download the Pandoc installer from the official Pandoc website and follow the installation instructions.
Installing BasicTeX
MacOS
- Download BasicTeX from here.
- Install it by following the on-screen instructions.
- Add the following to your .bashrc or .zshrc file:
export PATH="/usr/local/texlive/2023/bin/universal-darwin:$PATH"
Linux (Debian/Ubuntu)
- Install texlive (a full LaTeX distribution):
sudo apt-get update
sudo apt-get install texlive
Windows
- Download and install MiKTeX (a lightweight LaTeX distribution) from here.
- Follow the installation instructions.
Usage
Merge PDFs
from pdfghost import merge_pdfs
merge_pdfs("output.pdf", "file1.pdf", "file2.pdf")
Split PDF
from pdfghost import split_pdf
split_pdf("input.pdf", "output_folder", split_range=(0, 2))
Remove Specific Pages
from pdfghost import remove_pages
# Remove pages with indices 0, 2, and 4 (0-based)
remove_pages("input.pdf", "output.pdf", pages_to_remove=[0, 2, 4])
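Because the indices are 0-based, page numbers read from a PDF viewer (which usually start at 1) need converting first. The following is a minimal sketch of that conversion; `to_zero_based` is a hypothetical helper and not part of pdfghost.

```python
def to_zero_based(page_numbers):
    """Convert 1-based page numbers to sorted, de-duplicated 0-based indices."""
    indices = set()
    for number in page_numbers:
        if number < 1:
            raise ValueError(f"page numbers start at 1, got {number}")
        indices.add(number - 1)
    return sorted(indices)


pages_to_remove = to_zero_based([1, 3, 5])
print(pages_to_remove)  # [0, 2, 4]
```

The resulting list can then be passed as the pages_to_remove argument shown above.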
Remove Pages from Start
from pdfghost import remove_pages_from_start
# Remove the first 3 pages
remove_pages_from_start("input.pdf", "output.pdf", num_pages=3)
Remove Pages from End
from pdfghost import remove_pages_from_end
# Remove the last 2 pages
remove_pages_from_end("input.pdf", "output.pdf", num_pages=2)
Rotate Pages
from pdfghost import rotate_pdf
# Rotate all pages by 90 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=90)
# Rotate specific pages by 180 degrees
rotate_pdf("input.pdf", "output.pdf", rotation=180, pages_to_rotate=[0, 2])
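PDF page rotation is stored in multiples of 90 degrees, so it can be worth normalizing a user-supplied angle before passing it on. The sketch below is a hypothetical helper, not part of pdfghost. Whether rotate_pdf itself validates the angle, or accepts values outside the 0 to 270 range, is not stated in this README.

```python
def normalize_rotation(degrees):
    """Map a rotation angle onto 0, 90, 180 or 270 degrees.

    Raises ValueError if the angle is not a multiple of 90.
    """
    if degrees % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {degrees}")
    return degrees % 360


print(normalize_rotation(-90))  # 270
print(normalize_rotation(450))  # 90
```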
Insert Pages
from pdfghost import insert_pages
# Insert pages at specific positions
insertions = [
    (1, "insert1.pdf"),  # Insert pages from insert1.pdf at position 1
    (4, "insert2.pdf"),  # Insert pages from insert2.pdf at position 4
]
insert_pages("input.pdf", "output.pdf", insertions)
Rearrange Pages
from pdfghost import rearrange_pdf
# Rearrange pages in a PDF
page_order = [2, 0, 1] # New order: Page 3, Page 1, Page 2
rearrange_pdf("input.pdf", "output.pdf", page_order)
Merge and Rearrange Pages
from pdfghost import merge_and_rearrange
# Merge multiple PDFs and rearrange their pages
page_order = [
    (0, 0),  # Page 1 from file1.pdf
    (1, 0),  # Page 1 from file2.pdf
    (0, 1),  # Page 2 from file1.pdf
]
merge_and_rearrange("output.pdf", page_order, "file1.pdf", "file2.pdf")
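For longer documents, writing the page order by hand gets tedious. Going by the comments in the example above, each tuple reads as (file index, page index), both 0-based. Under that assumption, the sketch below builds an order that alternates between files page by page. `interleave_pages` is a hypothetical helper, and the page counts are supplied by the caller rather than read from the files.

```python
from itertools import zip_longest


def interleave_pages(page_counts):
    """Build (file_index, page_index) pairs that alternate between files.

    page_counts[i] is the number of pages in the i-th file. Files that run
    out of pages are skipped for the remaining rounds.
    """
    per_file = [
        [(file_index, page_index) for page_index in range(count)]
        for file_index, count in enumerate(page_counts)
    ]
    order = []
    for group in zip_longest(*per_file):
        order.extend(pair for pair in group if pair is not None)
    return order


# file1.pdf has 2 pages, file2.pdf has 1 page
print(interleave_pages([2, 1]))  # [(0, 0), (1, 0), (0, 1)]
```

For two files with 2 and 1 pages, this reproduces the page_order used in the example above.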
Compress PDF
from pdfghost import compress_pdf
# Compress a PDF with medium compression
compress_pdf("input.pdf", "output.pdf", power=3)
# Compress a PDF with maximum compression
compress_pdf("input.pdf", "output.pdf", power=5)
Add Text Watermark
from pdfghost import add_text_watermark
# Add a text watermark to all pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential")
# Add a text watermark to specific pages
add_text_watermark("input.pdf", "output.pdf", text="Confidential", pages_to_watermark=[0, 2])
Add Image Watermark
from pdfghost import add_image_watermark
# Add an image watermark to all pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png")
# Add an image watermark to specific pages
add_image_watermark("input.pdf", "output.pdf", image_path="watermark.png", pages_to_watermark=[1])
Remove Watermark
from pdfghost import remove_watermark
# Remove watermarks from all pages
remove_watermark("input.pdf", "output.pdf")
# Remove watermarks from specific pages
remove_watermark("input.pdf", "output.pdf", pages_to_clean=[0, 2])
Convert PDF to Images
from pdfghost import pdf_to_images
# Convert each page of a PDF into PNG images
pdf_to_images("input.pdf", "output_folder", format="png")
# Convert each page of a PDF into JPG images
pdf_to_images("input.pdf", "output_folder", format="jpg")
Convert Images to PDF
from pdfghost import images_to_pdf
# Convert multiple image files into a single PDF
images_to_pdf("output.pdf", "image1.png", "image2.jpg")
Encrypt PDF
from pdfghost import encrypt_pdf
# Encrypt a PDF with a password
encrypt_pdf("input.pdf", "output.pdf", password="mypassword")
Decrypt PDF
from pdfghost import decrypt_pdf
# Decrypt a PDF with a password
decrypt_pdf("input.pdf", "output.pdf", password="mypassword")
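Hard-coding a password in a script is convenient for a demo but risky in practice. A minimal sketch of one alternative is to read it from an environment variable and fall back to an interactive prompt. The variable name `PDF_PASSWORD` and the helper `get_pdf_password` are assumptions for illustration, not part of pdfghost.

```python
import os
from getpass import getpass


def get_pdf_password(env_var="PDF_PASSWORD"):
    """Return the password from the environment, or prompt for it if unset."""
    password = os.environ.get(env_var)
    if password:
        return password
    # getpass hides the typed characters on an interactive terminal
    return getpass("PDF password: ")
```

The returned value can be passed as the password argument of encrypt_pdf or decrypt_pdf.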
Extract Text
from pdfghost import extract_text
# Extract text from a PDF and save it as a .txt file
extract_text("input.pdf", "output.txt", format="txt")
# Extract text from a PDF and save it as a .csv file
extract_text("input.pdf", "output.csv", format="csv")
Extract Images
from pdfghost import extract_images
# Extract all images from a PDF and save them as separate image files
extract_images("input.pdf", "output_folder")
Add Page Numbers
from pdfghost import add_page_numbers
# Add page numbers at the bottom of each page
add_page_numbers("input.pdf", "output.pdf", position="bottom")
# Add page numbers at the top of each page
add_page_numbers("input.pdf", "output.pdf", position="top")
Convert PDF to HTML
from pdfghost import pdf_to_html
# Convert a PDF into a structured HTML file
pdf_to_html("input.pdf", "output.html")
Convert Markdown to PDF
from pdfghost import markdown_to_pdf
# Convert a Markdown file into a PDF
markdown_to_pdf("input.md", "output.pdf")
Convert LaTeX to PDF
from pdfghost import latex_to_pdf
# Convert a LaTeX file into a PDF
latex_to_pdf("input.tex", "output.pdf")
Compare PDFs
from pdfghost import compare_pdfs
# Compare two PDFs and generate a summary of differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="summary")
print(result)
# Compare two PDFs with side-by-side output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="side_by_side")
print(result)
# Compare two PDFs with highlighted differences
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="highlight_differences")
print(result)
# Compare two PDFs with version control-style output
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="version_control")
print(result)
# Compare two PDFs with annotations
result = compare_pdfs("file1.pdf", "file2.pdf", output_type="annotations")
print(result)
Sign PDFs
from pdfghost import sign_pdf
# Sign a PDF with a cryptographic certificate
sign_pdf("input.pdf", "signed.pdf", "certificate.pfx", password="mypassword")
Batch Processing
from pdfghost import batch_process, rotate_pdf
# Rotate all PDFs in a folder by 90 degrees
batch_process("input_folder", "output_folder", rotate_pdf, rotation=90)
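To make the idea concrete, the sketch below shows one way such a folder-wide operation could work. It is not pdfghost's implementation. It assumes the operation takes an input path, an output path, and keyword arguments, as rotate_pdf does in the examples above. `apply_to_folder` is a hypothetical name.

```python
from pathlib import Path


def apply_to_folder(input_folder, output_folder, operation, **kwargs):
    """Call operation(input_path, output_path, **kwargs) for each PDF.

    Only files directly inside input_folder with a .pdf suffix are processed.
    Returns the list of output paths, in sorted order.
    """
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for pdf_path in sorted(Path(input_folder).glob("*.pdf")):
        target = out_dir / pdf_path.name
        operation(str(pdf_path), str(target), **kwargs)
        processed.append(target)
    return processed
```

Passing the operation as a callable keeps the loop independent of any particular PDF function, so the same helper works for rotating, compressing, or watermarking.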
Testing
To run unit tests, first install the development dependencies, and then use:
python -m unittest discover tests/
Contributing
- Fork the repository.
- Create your feature branch (git checkout -b feature/your-feature).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature/your-feature).
- Open a new Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file pdf_ghost-0.1.0.tar.gz.
File metadata
- Download URL: pdf_ghost-0.1.0.tar.gz
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ceceb9f43e8e33b2aecb69803f89b3f26c8615bff15f21475f5bf09e97a4ce7 |
| MD5 | ce745a6e69a9f6afcda96dc36be05830 |
| BLAKE2b-256 | 2ec3bec197b4f3c570ef6b791b5d2365a93f1eb4707119aac93f3f529ee41e17 |
File details
Details for the file pdf_ghost-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pdf_ghost-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fc21e63e33ff408ccf0971a933188c8c9042751e5358ee20c6357311a8dd2b7 |
| MD5 | d85286fdab655e018e44f37e055628b5 |
| BLAKE2b-256 | 5a1b534cf2884820dd08f0a4baf4114acba20f4c61c0b0c79137f65651d09217 |