PdfSnipper
A package to help manage PDF pages, images, and their conversions during different NLP, CV, or other tasks to avoid repetitive code blocks and provide a simple function call for operations.
Installation
To install PdfSnipper, use:
pip install -i https://test.pypi.org/simple/ pdf-snip
Dependencies
If you encounter an error involving poppler-utils, install Poppler for your platform:
- For Google Colab:
  !apt-get install -y poppler-utils
- For Ubuntu/Debian:
  sudo apt install poppler-utils
- For Windows:
  Download the latest release from here. After installing it (for example under Program Files), add its bin directory to the PATH environment variable:
  import os
  os.environ['PATH'] += os.pathsep + r'C:\path\to\poppler\bin'
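To confirm that Poppler is reachable before converting pages to images, one option is to look for `pdftoppm`, one of the command-line tools shipped with poppler-utils. The following is a minimal sketch using only the standard library. `poppler_available` is a hypothetical helper name, not part of PdfSnipper, and the sketch assumes the error is caused by the Poppler binaries being absent from PATH.

```python
import shutil


def poppler_available() -> bool:
    """Return True if Poppler's pdftoppm executable can be found on PATH."""
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which("pdftoppm") is not None


if not poppler_available():
    print("Poppler not found on PATH; install poppler-utils for your platform.")
```

On Windows, run the check after extending `os.environ['PATH']`, since `shutil.which` reads the current value of PATH.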
Features
1. Remove First N Pages
Removes the first N pages from all PDFs in a folder.
remove_first_pages(input_folder: str, output_folder: str, pages_to_remove: int)
Arguments
- input_folder: Path to the folder containing PDFs.
- output_folder: Path to save modified PDFs.
- pages_to_remove: Number of pages to remove from the start.
Usage
from PDFSNIPPER import remove_first_pages
remove_first_pages('/content/input', '/content/output', 2)
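Every function in this package takes an input folder and an output folder, so it can help to validate both before calling. The sketch below is one possible pre-flight check. `prepare_folders` is a hypothetical helper, not part of PdfSnipper, and whether the package creates a missing output folder on its own is not documented here, so the helper creates it as a precaution.

```python
from pathlib import Path


def prepare_folders(input_folder: str, output_folder: str) -> list[Path]:
    """Check the input folder, create the output folder, and list the PDFs found."""
    src = Path(input_folder)
    if not src.is_dir():
        raise FileNotFoundError(f"Input folder not found: {src}")
    # Assumption: creating the output folder up front is harmless for the package.
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    # Match the .pdf extension case-insensitively and return a stable order.
    return sorted(
        p for p in src.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Example (paths are the ones used in this README):
# pdfs = prepare_folders('/content/input', '/content/output')
# print(f"{len(pdfs)} PDF(s) will be processed")
```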
2. Remove Last N Pages
Removes the last N pages from all PDFs in a folder.
remove_last_pages(input_folder: str, output_folder: str, pages_to_remove: int)
Arguments
- input_folder: Path to the folder containing PDFs.
- output_folder: Path to save modified PDFs.
- pages_to_remove: Number of pages to remove from the end.
Usage
from PDFSNIPPER import remove_last_pages
remove_last_pages('/content/input', '/content/output', 3)
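To reason about which pages survive a trim from either end, the arithmetic can be sketched in plain Python. This illustrates the page-index logic only and is not PdfSnipper's implementation. `kept_page_indices` is a hypothetical name, and returning an empty list when a PDF has fewer pages than the number removed is an assumption, since that case is not documented here.

```python
def kept_page_indices(
    total_pages: int, remove_first: int = 0, remove_last: int = 0
) -> list[int]:
    """Return the 0-indexed pages left after trimming pages from each end."""
    if remove_first < 0 or remove_last < 0:
        raise ValueError("page counts to remove must be non-negative")
    # Clamp so that removing more pages than exist yields an empty result.
    start = min(remove_first, total_pages)
    stop = max(start, total_pages - remove_last)
    return list(range(start, stop))


print(kept_page_indices(10, remove_first=2))  # [2, 3, 4, 5, 6, 7, 8, 9]
print(kept_page_indices(10, remove_last=3))   # [0, 1, 2, 3, 4, 5, 6]
```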
3. Remove Pages Outside a Specified Range
Keeps only the pages within a specified range [start_page, end_page] inclusive, removing all others.
remove_pages_outside_range(input_folder: str, output_folder: str, start_page: int, end_page: int)
Arguments
- input_folder: Path to the folder containing PDFs.
- output_folder: Path to save modified PDFs.
- start_page: First page to keep (0-indexed).
- end_page: Last page to keep (0-indexed).
Usage
from PDFSNIPPER import remove_pages_outside_range
remove_pages_outside_range('/content/input', '/content/output', 2, 5)
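Because start_page and end_page are 0-indexed and inclusive, page numbers as shown in a PDF viewer (which start at 1) need to be shifted by one. The conversion can be sketched as follows. `to_zero_indexed_range` is a hypothetical helper, not part of PdfSnipper.

```python
def to_zero_indexed_range(first_page: int, last_page: int) -> tuple[int, int]:
    """Convert a 1-based inclusive page range to a 0-indexed inclusive range."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return first_page - 1, last_page - 1


# Pages 3 to 6 in a viewer correspond to the arguments used in the call above.
start_page, end_page = to_zero_indexed_range(3, 6)
print(start_page, end_page)         # 2 5
print(end_page - start_page + 1)    # 4 pages kept
```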
4. Save Specific Pages
Saves only specific pages from PDFs into a new folder.
save_specific_pages(input_folder: str, output_folder: str, pages_to_save: list)
Arguments
- input_folder: Path to the folder containing PDFs.
- output_folder: Path to save modified PDFs.
- pages_to_save: List of page numbers (0-indexed) to keep.
Usage
from PDFSNIPPER import save_specific_pages
save_specific_pages('/content/input', '/content/output', [0, 2, 3])
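How the package treats duplicate or out-of-range entries in pages_to_save is not documented here, so one cautious approach is to clean the list before passing it in. A minimal sketch, assuming the page count of the PDF is already known; `clean_page_list` is a hypothetical helper, not part of PdfSnipper.

```python
def clean_page_list(pages: list[int], total_pages: int) -> list[int]:
    """Return sorted, unique 0-indexed pages that exist in a PDF of total_pages."""
    # A set drops duplicates; the range check drops negative and too-large indices.
    return sorted({p for p in pages if 0 <= p < total_pages})


print(clean_page_list([3, 0, 2, 2, 9], total_pages=5))  # [0, 2, 3]
```

The cleaned list can then be passed as the pages_to_save argument.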
5. Save Pages as Images
Saves specific pages as PNG images in a new folder.
save_pages_as_images(input_folder: str, output_folder: str, pages_to_save: list)
Arguments
- input_folder: Path to the folder containing PDFs.
- output_folder: Path to save PNG images.
- pages_to_save: List of page numbers (0-indexed) to save as images.
Usage
from PDFSNIPPER import save_pages_as_images
save_pages_as_images('/content/input', '/content/output', [0, 2, 4])
6. Split PDF
Splits every PDF in a folder into individual single-page PDF files.
split_pdf(input_folder: str, output_folder: str)
Arguments
- input_folder: Path to the folder containing PDFs.
- output_folder: Path to save split PDFs.
Usage
from PDFSNIPPER import split_pdf
split_pdf('/content/input', '/content/output')
File details
Details for the file pdf_snip-0.0.3.tar.gz.
File metadata
- Download URL: pdf_snip-0.0.3.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f3ba665b7b10c50196060bf104614fad0f9f46eb70f021434b5b7efe64187e9c |
| MD5 | 0b710782dc671d404b1b668614467bdb |
| BLAKE2b-256 | 87bba6210234efd6b6bd285c0be95b05fe7eea6d559ad565d3228fbc36266b54 |
File details
Details for the file pdf_snip-0.0.3-py3-none-any.whl.
File metadata
- Download URL: pdf_snip-0.0.3-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 474f6cd44574b9369c2893c1c02720d2faf23d94368fc8f72de66a0599f58b38 |
| MD5 | dee389d370f80b010a7345b583af60cd |
| BLAKE2b-256 | f1b494f79e7983902579b1e8e2ea8403d850080526ed0cff862cdc2ebecb63ba |