A utility package for PDF processing, including splitting, merging, and page counting

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

HJIMI PDF Processor

HJIMI PDF Processor is a PDF toolkit, built on PyPDF2, for splitting PDFs (by file size, page count, or first-level bookmarks), merging multiple PDFs into one, and reading page counts.

Main Features

PDF File Splitting
- Split by file size
- Split by page count
- Split by bookmarks (first-level bookmarks only)
PDF File Merging
- Merge multiple PDF files into a single document
- Preserve original page content and formatting
- Error handling and logging
PDF File Information
- Get the total page count
- Sanitize filenames (replace illegal characters)

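Highlights

Easy to use: all functions are static methods on the PDFProcessor class, so no instance is needed
Flexible configuration: the split size (in KB) and the number of pages per split are configurable
Error handling: read and merge errors are caught and reported with error messages
File safety: temporary files created during processing are cleaned up automatically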

Requirements

Python 3.8 or higher
PyPDF2 3.0.0 or higher

Installation

pip install hjimi-pdf-processor

Usage Examples

1. Get PDF Page Count

from pdf_processor import PDFProcessor

# Get the page count of a single file (returns None if the file cannot be read)
page_count = PDFProcessor.get_pdf_page_count("document.pdf")
if page_count is not None:
    print(f"PDF pages: {page_count}")

# Get page counts for multiple files
pdf_files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
for file in pdf_files:
    count = PDFProcessor.get_pdf_page_count(file)
    if count is None:
        print(f"{file}: could not be read")
    else:
        print(f"{file} pages: {count}")

2. Split PDF Files

from pdf_processor import PDFProcessor

# Split into parts of at most 10 pages each
PDFProcessor.split_pdf_by_pages("large_doc.pdf", pages_per_split=10)

# Split by file size (max_size_kb is in KB; 1024 KB is about 1 MB)
PDFProcessor.split_pdf_by_size("large_doc.pdf", max_size_kb=1024)

# Split at each first-level bookmark
PDFProcessor.split_pdf_by_bookmarks("book.pdf")

3. Merge PDF Files

from pdf_processor import PDFProcessor

# Merge multiple PDF files into merged_document.pdf
pdf_files = ["chapter1.pdf", "chapter2.pdf", "chapter3.pdf"]
PDFProcessor.merge_pdfs(pdf_files, "merged_document.pdf")
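If the input list comes from a directory listing, plain alphabetical sorting puts chapter10.pdf before chapter2.pdf. Assuming merge_pdfs merges files in list order (the description does not state this explicitly), a natural-sort step like the one below could fix the order first. `natural_key` and `order_for_merge` are hypothetical helpers for illustration, not part of the package.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    # re.split with a capturing group keeps the digit runs, so
    # "chapter10.pdf" -> ["chapter", "10", ".pdf"]; digit runs become ints
    return [int(part) if part.isdecimal() else part.lower()
            for part in re.split(r"(\d+)", name)]


def order_for_merge(paths: List[str]) -> List[str]:
    """Return paths in natural order: chapter1, chapter2, ..., chapter10."""
    return sorted(paths, key=natural_key)


print(order_for_merge(["chapter10.pdf", "chapter2.pdf", "chapter1.pdf"]))
# ['chapter1.pdf', 'chapter2.pdf', 'chapter10.pdf']
```

The sorted list could then be passed on as `PDFProcessor.merge_pdfs(order_for_merge(pdf_files), "merged_document.pdf")`.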

Use Cases

File Splitting
- Split large PDF files for easier transfer (for example, by email)
- Split textbooks or documents into chapters using bookmarks
- Split documents into batches of a fixed page count for printing
File Merging
- Merge multiple scanned documents
- Combine sections of a report or article
- Combine multiple PDF files into a single document
File Processing
- Retrieve page counts for many PDF files in a batch
- Sanitize PDF filenames
- Keep PDF files under a size limit

API Documentation

PDFProcessor Class Methods

1. sanitize_filename(filename: str) -> str

Replaces illegal characters in a filename with underscores.

Parameters:
- filename: Original filename
Returns: The sanitized filename
Usage: Preparing filenames that contain special characters so they can be safely written to disk
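The exact set of characters sanitize_filename treats as illegal is not documented. A minimal sketch, assuming it follows the Windows filename rules (the characters < > : " / \ | ? * plus ASCII control characters), could look like this; `sanitize_filename_sketch` is a hypothetical name, not the package's implementation.

```python
import re

# ASSUMPTION: "illegal" means the characters Windows forbids in filenames
# (< > : " / \ | ? *) plus ASCII control characters 0x00-0x1F.
# The package's real sanitize_filename may use a different set.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename_sketch(filename: str) -> str:
    """Replace every illegal character with an underscore."""
    return _ILLEGAL_CHARS.sub("_", filename)


print(sanitize_filename_sketch("Chapter 1: Intro/Overview?"))
# Chapter 1_ Intro_Overview_
```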

2. get_pdf_page_count(file_path: str) -> int

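Gets the total page count of a PDF file.

Parameters:
- file_path: PDF file path
Returns: Total number of pages, or None if an error occurs (so despite the int annotation, the effective return type is Optional[int])
Exception Handling: File reading errors are caught and printed rather than raised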
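Because get_pdf_page_count signals failure with None rather than an exception, callers should test `is None` explicitly; a truthiness check such as `if count:` would conflate None with 0. The hypothetical helper below sketches this pattern and takes the counting function as a parameter, so it could be called with `PDFProcessor.get_pdf_page_count` or tested with a stand-in.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def summarize_page_counts(
    paths: Iterable[str],
    count_pages: Callable[[str], Optional[int]],
) -> Tuple[Dict[str, int], List[str]]:
    """Separate readable files (with their page counts) from unreadable ones.

    count_pages is expected to behave like get_pdf_page_count:
    it returns an int on success and None on failure.
    """
    counts: Dict[str, int] = {}
    failed: List[str] = []
    for path in paths:
        n = count_pages(path)
        if n is None:  # explicit None check; `if not n` would also reject 0
            failed.append(path)
        else:
            counts[path] = n
    return counts, failed


# Stand-in counter for demonstration: dict.get returns None for unknown keys
counts, failed = summarize_page_counts(
    ["a.pdf", "b.pdf", "missing.pdf"], {"a.pdf": 12, "b.pdf": 3}.get
)
print(counts, failed)  # {'a.pdf': 12, 'b.pdf': 3} ['missing.pdf']
```

With the package installed, the call would be `summarize_page_counts(pdf_files, PDFProcessor.get_pdf_page_count)`.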

3. split_pdf_by_size(input_file: str, max_size_kb: int) -> None

Splits a PDF file into parts based on a maximum file size.

Parameters:
- input_file: Input PDF file path
- max_size_kb: Maximum size of each output part, in KB
Output Format: original_filename_part_number.pdf
Features: Removes temporary files automatically and prints progress while running
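How split_pdf_by_size decides where one part ends is not documented; the mention of temporary files suggests it may write candidate parts to disk and measure them. As a simpler illustration of the idea, the sketch below greedily packs consecutive pages into groups using an estimated size per page. The per-page estimate is an assumption: pages in a real PDF often share fonts and images, so the actual size of a part can differ from the sum of its page estimates. `group_pages_by_size` is a hypothetical name.

```python
from typing import List, Sequence


def group_pages_by_size(
    page_sizes_kb: Sequence[float], max_size_kb: float
) -> List[List[int]]:
    """Greedily pack consecutive 0-based page indices into groups whose
    estimated total size stays within max_size_kb.

    A page larger than max_size_kb on its own ends up alone in a group,
    because a single page cannot be split any further.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0.0
    for index, size in enumerate(page_sizes_kb):
        # Close the current group if adding this page would exceed the limit
        if current and current_size + size > max_size_kb:
            groups.append(current)
            current, current_size = [], 0.0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


print(group_pages_by_size([300, 400, 500, 200, 1500, 100], max_size_kb=1024))
# [[0, 1], [2, 3], [4], [5]]
```

Greedy packing keeps pages in their original order, which matters for a split; it does not minimize the number of parts in every case, but it never reorders content.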

4. split_pdf_by_pages(input_file: str, pages_per_split: int) -> None

Splits a PDF file into parts with a fixed number of pages each.

Parameters:
- input_file: Input PDF file path
- pages_per_split: Number of pages in each output part
Output Format: original_filename_part_number.pdf
Features: Prints split progress and the page range of each part
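The part boundaries for a page-count split follow from simple arithmetic: a document of N pages split into parts of k pages yields ceil(N / k) parts, with the last part holding the remainder. The hypothetical helper below computes those 1-based page ranges; it is a sketch of the arithmetic, not the package's code.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, pages_per_split: int) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges, one per part."""
    if pages_per_split < 1:
        raise ValueError("pages_per_split must be at least 1")
    return [
        (start + 1, min(start + pages_per_split, total_pages))
        for start in range(0, total_pages, pages_per_split)
    ]


print(page_ranges(25, 10))  # [(1, 10), (11, 20), (21, 25)]
```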

5. split_pdf_by_bookmarks(input_file: str) -> None

Splits a PDF file at its first-level bookmarks.

Parameters:
- input_file: Input PDF file path
Output Format: original_filename_part_number_bookmark_name.pdf
Limitations: Only first-level bookmarks are used for splitting
Features: Illegal characters in bookmark names are replaced automatically when building output filenames
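A sketch of how first-level bookmarks could map to page ranges, assuming each section runs from its bookmark's page to the page before the next bookmark. The input is a list of (title, 0-based start page) pairs; in PyPDF2 3.x such pairs could be derived from `PdfReader.outline` (nested lists in it hold lower-level entries) together with `PdfReader.get_destination_page_number`. `bookmark_ranges` is a hypothetical name, and in this sketch pages before the first bookmark (a cover, for example) are not assigned to any section; the package may handle them differently.

```python
from typing import List, Sequence, Tuple


def bookmark_ranges(
    bookmarks: Sequence[Tuple[str, int]], total_pages: int
) -> List[Tuple[str, int, int]]:
    """Map (title, 0-based start page) pairs to (title, first, last) ranges.

    Each section ends on the page before the next bookmark's start page;
    the last section runs to the end of the document.
    """
    ordered = sorted(bookmarks, key=lambda b: b[1])
    ranges: List[Tuple[str, int, int]] = []
    for i, (title, start) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1][1] - 1
        else:
            end = total_pages - 1
        # Skip a bookmark whose section would be empty (two bookmarks on one page)
        if end >= start:
            ranges.append((title, start, end))
    return ranges


print(bookmark_ranges([("Preface", 0), ("Chapter 1", 3), ("Chapter 2", 10)], 15))
# [('Preface', 0, 2), ('Chapter 1', 3, 9), ('Chapter 2', 10, 14)]
```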

6. merge_pdfs(pdf_files: List[str], output_path: str) -> None

Merges multiple PDF files into a single output file.

Parameters:
- pdf_files: List of PDF file paths
- output_path: Output file path
Features:
- Preserves original page content and formatting
- If one input file fails, the remaining files are still merged
- Logs details of each error
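Continuing a merge when one input fails amounts to isolating each input in its own try/except. The sketch below shows that pattern in a library-agnostic way: `merge_with_skips` and its `load_pages` callable are hypothetical, and with PyPDF2 the loader might be something like `lambda p: list(PdfReader(p).pages)`, with the collected pages then written out through `PdfWriter.add_page`. Each file's pages are loaded completely before being appended, so a file that fails while loading contributes no partial set of pages.

```python
import logging
from typing import Callable, List, Sequence, Tuple, TypeVar

Page = TypeVar("Page")
logger = logging.getLogger("merge_sketch")


def merge_with_skips(
    paths: Sequence[str],
    load_pages: Callable[[str], List[Page]],
) -> Tuple[List[Page], List[str]]:
    """Concatenate pages from every loadable file, in the given order.

    A file whose loader raises is logged and skipped, so one bad input
    does not abort the whole merge.
    """
    merged: List[Page] = []
    skipped: List[str] = []
    for path in paths:
        try:
            pages = load_pages(path)
        except Exception:
            # Log the traceback and move on to the next file
            logger.exception("Skipping %s", path)
            skipped.append(path)
            continue
        merged.extend(pages)
    return merged, skipped


# Stand-in loader: dict.__getitem__ raises KeyError for "bad.pdf"
pages_by_file = {"a.pdf": ["a-p1"], "b.pdf": ["b-p1", "b-p2"]}
merged, skipped = merge_with_skips(["a.pdf", "bad.pdf", "b.pdf"], pages_by_file.__getitem__)
print(merged, skipped)  # ['a-p1', 'b-p1', 'b-p2'] ['bad.pdf']
```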

Notes

File Operations
- Ensure there is enough free disk space for the output files
- Keep backups of the original files
- Watch for conflicts with existing files that have the same names as the outputs
Performance Considerations
- Processing large files may take a while
- Try operations on small files first
- Monitor memory usage when processing large files
Limitations
- Encrypted PDF files are not supported
- Bookmark splitting only supports first-level bookmarks
- Some nonstandard PDF files may not be compatible
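For the filename-conflict note, one defensive option when choosing output_path for merge_pdfs is to pick a name that does not exist yet. The helper below is a hypothetical sketch, not part of the package; it appends _1, _2, and so on before the extension, and its existence check is injectable so it can be tried without touching the disk.

```python
from pathlib import Path
from typing import Callable


def unique_output_path(
    path: str, exists: Callable[[Path], bool] = Path.exists
) -> Path:
    """Return `path` if it is free, otherwise the first free variant
    name_1.ext, name_2.ext, ... in the same directory."""
    original = Path(path)
    candidate = original
    counter = 1
    while exists(candidate):
        candidate = original.with_name(f"{original.stem}_{counter}{original.suffix}")
        counter += 1
    return candidate


# Simulate an existing merged_document.pdf with a set instead of the disk
print(unique_output_path("merged_document.pdf", {Path("merged_document.pdf")}.__contains__))
# merged_document_1.pdf
```

The check and the later write are separate steps, so another process could still create the same file in between; for single-user scripts this is usually acceptable.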

License

MIT License

Contact

Author: wenquanshan
Email: wenquanshan@sximi.com
Project Homepage: https://github.com/zidanewenqsh/pdf_processor
Issue Tracking: https://github.com/zidanewenqsh/pdf_processor/issues

Contributing

We welcome issue reports and feature suggestions. To contribute code:

1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Ensure all tests pass
5. Submit a pull request

Changelog

v0.1.0

Initial release
Implemented basic PDF splitting and merging functions
Added file information retrieval features

Release history

0.1.2: Feb 11, 2025
0.1.1: Feb 11, 2025
0.1.0: Feb 11, 2025 (this version)

Download files

Source Distribution

hjimi_pdf_processor-0.1.0.tar.gz (5.9 kB), uploaded Feb 11, 2025

Built Distribution

hjimi_pdf_processor-0.1.0-py3-none-any.whl (6.3 kB), uploaded Feb 11, 2025, Python 3

File details

Details for the file hjimi_pdf_processor-0.1.0.tar.gz.

File metadata

Download URL: hjimi_pdf_processor-0.1.0.tar.gz
Upload date: Feb 11, 2025
Size: 5.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for hjimi_pdf_processor-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3843ca175a10affd41f50ea6aff979e2ee1a3b90bda25bf29656934c5ad2d3fb`
MD5	`100d28f9d5b5cd0032f0286eaf0dc71e`
BLAKE2b-256	`a2f86774e56f4a087b8f298f62d2f91a44ba4eca5a31fc88e32381d52131822e`

File details

Details for the file hjimi_pdf_processor-0.1.0-py3-none-any.whl.

File metadata

Download URL: hjimi_pdf_processor-0.1.0-py3-none-any.whl
Upload date: Feb 11, 2025
Size: 6.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for hjimi_pdf_processor-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3025fd506997d8e09ac29600f40b7a78dba67175b4773773ec5a9ca203edec99`
MD5	`e0d6a6ab3453af54c0cfdcb1f505ac7b`
BLAKE2b-256	`00d237ed0fd51f2b8f25433b5c0744cd24e562e6508cf0bcb045e520c7e845b3`
