
PdfSnipper

A package for managing PDF pages, images, and the conversions between them in NLP, CV, and other workflows, so that common operations take a single function call instead of repetitive code blocks.


Installation

To install PdfSnipper from TestPyPI, run:

pip install -i https://test.pypi.org/simple/ pdf-snip

Dependencies

If you get an error involving poppler-utils, install Poppler for your platform:

  1. For Google Colab:

    !apt-get install -y poppler-utils
    
  2. For Ubuntu/Debian:

    sudo apt install poppler-utils
    
  3. For Windows:
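    Download the latest Poppler release for Windows and extract it, for example under C:\Program Files. Then add its bin folder to the PATH environment variable. From Python, the following sets it for the current process only:

    import os
    # Adds Poppler's bin folder to PATH for this Python process only
    os.environ['PATH'] += os.pathsep + r'C:\path\to\poppler\bin'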
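To confirm that Poppler is reachable after installation, you can look for one of its command-line tools, such as pdftoppm, on PATH. This is a minimal stdlib sketch; the helper name find_pdftoppm is hypothetical and not part of PdfSnipper.

```python
import os
import shutil
from typing import Optional


def find_pdftoppm(extra_bin_dir: Optional[str] = None) -> Optional[str]:
    """Return the full path of Poppler's pdftoppm tool, or None if it is not on PATH."""
    if extra_bin_dir:
        # Append the folder to PATH for this Python process only
        os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + extra_bin_dir
    # shutil.which searches PATH (and PATHEXT on Windows) the way a shell would
    return shutil.which("pdftoppm")


location = find_pdftoppm()
print(location or "pdftoppm not found: install Poppler or fix PATH")
```

On Windows you would pass your Poppler bin folder, for example find_pdftoppm(r'C:\path\to\poppler\bin').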
    

Features

1. Remove First N Pages

Removes the first N pages from all PDFs in a folder.

remove_first_pages(input_folder: str, output_folder: str, pages_to_remove: int)

Arguments

  • input_folder: Path to the folder containing PDFs.
  • output_folder: Path to save modified PDFs.
  • pages_to_remove: Number of pages to remove from the start.

Usage

from PDFSNIPPER import remove_first_pages
remove_first_pages('/content/input', '/content/output', 2)
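As a rough illustration of the page arithmetic (not PdfSnipper's own code), the plain-Python sketch below computes which 0-based page indices survive when the first N pages are removed. The helper name is hypothetical, and returning an empty list when N is at least the page count is an assumption; the package's behavior in that case is not documented here.

```python
def pages_left_after_removing_first(page_count: int, pages_to_remove: int) -> list[int]:
    """Hypothetical helper: 0-based indices kept when the first N pages are dropped."""
    if pages_to_remove < 0:
        raise ValueError("pages_to_remove must be non-negative")
    # Keep everything from index N onward; range() is empty when N >= page_count
    return list(range(pages_to_remove, page_count))


# A 5-page PDF with its first 2 pages removed keeps indices 2, 3 and 4
print(pages_left_after_removing_first(5, 2))  # [2, 3, 4]
```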

2. Remove Last N Pages

Removes the last N pages from all PDFs in a folder.

remove_last_pages(input_folder: str, output_folder: str, pages_to_remove: int)

Arguments

  • input_folder: Path to the folder containing PDFs.
  • output_folder: Path to save modified PDFs.
  • pages_to_remove: Number of pages to remove from the end.

Usage

from PDFSNIPPER import remove_last_pages
remove_last_pages('/content/input', '/content/output', 3)
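The mirror-image computation for removing trailing pages is sketched below in plain Python. The helper name is hypothetical and not part of PdfSnipper; as above, treating N larger than the page count as "remove everything" is an assumption.

```python
def pages_left_after_removing_last(page_count: int, pages_to_remove: int) -> list[int]:
    """Hypothetical helper: 0-based indices kept when the last N pages are dropped."""
    if pages_to_remove < 0:
        raise ValueError("pages_to_remove must be non-negative")
    # Keep indices 0 .. page_count - N - 1; max() guards against a negative stop
    return list(range(max(page_count - pages_to_remove, 0)))


# A 10-page PDF with its last 3 pages removed keeps indices 0 to 6
print(pages_left_after_removing_last(10, 3))  # [0, 1, 2, 3, 4, 5, 6]
```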

3. Remove Pages Outside a Specified Range

Keeps only the pages in the inclusive range [start_page, end_page] and removes all others.

remove_pages_outside_range(input_folder: str, output_folder: str, start_page: int, end_page: int)

Arguments

  • input_folder: Path to the folder containing PDFs.
  • output_folder: Path to save modified PDFs.
  • start_page: First page to keep (0-indexed).
  • end_page: Last page to keep (0-indexed).

Usage

from PDFSNIPPER import remove_pages_outside_range
remove_pages_outside_range('/content/input', '/content/output', 2, 5)
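Because both bounds are 0-indexed and inclusive, the call above keeps indices 2 to 5, which a PDF viewer shows as pages 3 to 6. The sketch below spells this out; the helper is hypothetical and not part of PdfSnipper, and clamping end_page to the last existing page is an assumption.

```python
def pages_in_range(page_count: int, start_page: int, end_page: int) -> list[int]:
    """Hypothetical helper: 0-based indices in the inclusive range [start_page, end_page]."""
    if not 0 <= start_page <= end_page:
        raise ValueError("expected 0 <= start_page <= end_page")
    # +1 makes end_page inclusive; min() clamps it to the last page that exists
    return list(range(start_page, min(end_page, page_count - 1) + 1))


kept = pages_in_range(10, 2, 5)
print(kept)                     # [2, 3, 4, 5]
print([i + 1 for i in kept])    # [3, 4, 5, 6] as numbered in a PDF viewer
```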

4. Save Specific Pages

Saves only the specified pages of each PDF to the output folder.

save_specific_pages(input_folder: str, output_folder: str, pages_to_save: list)

Arguments

  • input_folder: Path to the folder containing PDFs.
  • output_folder: Path to save modified PDFs.
  • pages_to_save: List of page numbers (0-indexed) to keep.

Usage

from PDFSNIPPER import save_specific_pages
save_specific_pages('/content/input', '/content/output', [0, 2, 3])
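Page numbers shown in a PDF viewer start at 1, while pages_to_save is 0-indexed. A small conversion helper avoids off-by-one mistakes; the name to_zero_based is hypothetical and not part of PdfSnipper.

```python
def to_zero_based(viewer_pages: list[int]) -> list[int]:
    """Convert 1-based page numbers (as shown in a PDF viewer) to 0-based indices."""
    if any(p < 1 for p in viewer_pages):
        raise ValueError("viewer page numbers start at 1")
    return [p - 1 for p in viewer_pages]


print(to_zero_based([1, 3, 4]))  # [0, 2, 3]
```

So [0, 2, 3] in the usage example above corresponds to viewer pages 1, 3, and 4.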

5. Save Pages as Images

Saves the specified pages of each PDF as PNG images in the output folder.

save_pages_as_images(input_folder: str, output_folder: str, pages_to_save: list)

Arguments

  • input_folder: Path to the folder containing PDFs.
  • output_folder: Path to save PNG images.
  • pages_to_save: List of page numbers (0-indexed) to save as images.

Usage

from PDFSNIPPER import save_pages_as_images
save_pages_as_images('/content/input', '/content/output', [0, 2, 4])
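The file names PdfSnipper gives to the images are not documented here, so one way to sanity-check a run is to inspect every file in the output folder. This stdlib-only sketch (the helper name is hypothetical) flags files whose first 8 bytes are not the standard PNG signature.

```python
from pathlib import Path

# Every PNG file starts with these 8 bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def non_png_files(folder: str) -> list[str]:
    """Return the names of files in folder that do not start with the PNG signature."""
    bad = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                if fh.read(8) != PNG_SIGNATURE:
                    bad.append(path.name)
    return bad
```

For example, non_png_files('/content/output') should return an empty list after a successful run, assuming the output folder contains nothing but the generated images.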

6. Split PDF

Splits each PDF in the input folder into single-page PDF files.

split_pdf(input_folder: str, output_folder: str)

Arguments

  • input_folder: Path to the folder containing PDFs.
  • output_folder: Path to save split PDFs.

Usage

from PDFSNIPPER import split_pdf
split_pdf('/content/input', '/content/output')
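Splitting produces one file per page, so a folder whose PDFs contain P pages in total yields P output files. The sketch below illustrates this with a hypothetical naming scheme (<stem>_page_<n>.pdf); the names PdfSnipper actually uses may differ.

```python
from pathlib import Path


def planned_split_outputs(pdf_path: str, page_count: int, output_folder: str) -> list[Path]:
    """Hypothetical naming: one <stem>_page_<n>.pdf per page, with n starting at 1."""
    stem = Path(pdf_path).stem
    return [Path(output_folder) / f"{stem}_page_{i + 1}.pdf" for i in range(page_count)]


for out in planned_split_outputs("/content/input/report.pdf", 3, "/content/output"):
    print(out)
```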



Download files


Source Distribution

pdf_snip-0.0.3.tar.gz (5.1 kB)

Uploaded Source

Built Distribution


pdf_snip-0.0.3-py3-none-any.whl (5.3 kB)

Uploaded Python 3

File details

Details for the file pdf_snip-0.0.3.tar.gz.

File metadata

  • Download URL: pdf_snip-0.0.3.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for pdf_snip-0.0.3.tar.gz
Algorithm Hash digest
SHA256 f3ba665b7b10c50196060bf104614fad0f9f46eb70f021434b5b7efe64187e9c
MD5 0b710782dc671d404b1b668614467bdb
BLAKE2b-256 87bba6210234efd6b6bd285c0be95b05fe7eea6d559ad565d3228fbc36266b54


File details

Details for the file pdf_snip-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: pdf_snip-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.1

File hashes

Hashes for pdf_snip-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 474f6cd44574b9369c2893c1c02720d2faf23d94368fc8f72de66a0599f58b38
MD5 dee389d370f80b010a7345b583af60cd
BLAKE2b-256 f1b494f79e7983902579b1e8e2ea8403d850080526ed0cff862cdc2ebecb63ba
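To check a downloaded file against the SHA256 digests listed above, a short stdlib sketch is enough (the helper name sha256_of is hypothetical).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel keeps calling read() until end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published above for pdf_snip-0.0.3-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = "474f6cd44574b9369c2893c1c02720d2faf23d94368fc8f72de66a0599f58b38"
```

Calling sha256_of on the downloaded wheel should return a value equal to EXPECTED_WHEEL_SHA256. pip can also enforce hashes itself through its hash-checking mode (--require-hashes).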

