scanprep – Prepare scanned PDF documents

Small utility to prepare scanned documents. Supports separating PDF files by separator pages and removing blank pages.

Scanprep prepares scanned documents for further processing with existing tools (like the great OCRmyPDF) or directly for archival. It can split multiple documents that were scanned in a single batch into separate files. It can also remove blank pages from the output, which is especially helpful when using a duplex scanner.

For document separation, separator pages need to be inserted between the different documents before scanning. These pages tell the program where to split. You can either use the included separator page or create your own. The separator page simply needs to have a barcode that encodes the text SCANPREP_SEP (you can use any barcode type supported by zbar).
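To illustrate the separator check described above, here is a minimal sketch in Python. It assumes pyzbar (the zbar binding mentioned under Installation) and compares each decoded barcode payload against SCANPREP_SEP. The helper names is_separator_payload and page_is_separator are hypothetical and not taken from scanprep itself; the exact matching rule scanprep applies may differ.

```python
SEPARATOR_TEXT = "SCANPREP_SEP"  # the payload named in the text above


def is_separator_payload(payloads):
    """Return True if any decoded barcode payload equals the separator text.

    `payloads` is an iterable of bytes objects, as produced by a barcode decoder.
    """
    return any(
        payload.decode("utf-8", errors="replace").strip() == SEPARATOR_TEXT
        for payload in payloads
    )


def page_is_separator(image):
    """Check a page image (for example a PIL image) for the separator barcode.

    Assumption: pyzbar and the zbar shared library are installed.
    """
    # Imported lazily so is_separator_payload stays usable without zbar.
    from pyzbar.pyzbar import decode

    return is_separator_payload(symbol.data for symbol in decode(image))
```

Because any barcode type supported by zbar works, the sketch only looks at the decoded data and ignores the symbol type.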

Installation

Via Snap

You can install scanprep from the Snap Store:

snap install scanprep

scanprep -h

Via PyPI

You can install scanprep using pip (consider doing that in a venv):

pip3 install scanprep

# If you see an error like "ImportError: Unable to find zbar shared library", you need to install zbar yourself. See: https://pypi.org/project/pyzbar/
scanprep -h

From source

To install scanprep from source, clone this repository and install the dependencies:

git clone https://github.com/baltpeter/scanprep.git
cd scanprep
pip3 install -r requirements.txt # You may want to do this in a venv.
# You may also need to install the zbar shared library. See: https://pypi.org/project/pyzbar/

python3 scanprep/scanprep.py -h

Usage

In the simplest case, run scanprep <filename.pdf>. This processes the input file and writes the results to your current working directory. To specify a different output directory, use scanprep <filename.pdf> <output_directory>.
The output files are named 0-<filename.pdf>, 1-<filename.pdf>, and so on.
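The splitting and naming scheme can be pictured with a small sketch. It is not scanprep's actual code: split_pages and output_paths are hypothetical helpers, pages are stand-in values, and the sketch assumes that separator and blank pages are dropped and that a group with no remaining pages produces no file.

```python
from pathlib import Path


def split_pages(pages, is_separator, is_blank=lambda page: False):
    """Group pages into documents, starting a new group at each separator page.

    Assumption: separator and blank pages are dropped, and empty groups are skipped.
    """
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []
        elif not is_blank(page):
            current.append(page)
    if current:
        documents.append(current)
    return documents


def output_paths(input_pdf, count, output_dir="."):
    """Build the names 0-<filename.pdf>, 1-<filename.pdf>, ... inside output_dir."""
    name = Path(input_pdf).name  # assumption: only the file name is reused
    return [Path(output_dir) / f"{index}-{name}" for index in range(count)]


# Stand-in pages: "SEP" marks a separator page, "" marks a blank page.
pages = ["a1", "a2", "SEP", "", "b1"]
documents = split_pages(pages, lambda p: p == "SEP", lambda p: p == "")
paths = output_paths("scans/batch.pdf", len(documents), "out")
```

Here documents holds two groups (the pages before and after the separator), and paths holds out/0-batch.pdf and out/1-batch.pdf.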

By default, both page separation and blank page removal will be performed. To turn them off, use --no-page-separation or --no-blank-removal, respectively.
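If you want to call scanprep from a script, for example to process a whole folder of scans, a thin wrapper along these lines may help. It only uses the arguments and flags documented above; build_command and run_scanprep are hypothetical names, and the sketch assumes the scanprep executable is on your PATH.

```python
import subprocess


def build_command(input_pdf, output_dir=None, page_separation=True, blank_removal=True):
    """Assemble the scanprep command line from the documented arguments."""
    command = ["scanprep"]
    if not page_separation:
        command.append("--no-page-separation")
    if not blank_removal:
        command.append("--no-blank-removal")
    command.append(str(input_pdf))
    if output_dir is not None:
        command.append(str(output_dir))
    return command


def run_scanprep(input_pdf, output_dir=None, **options):
    """Run scanprep; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(input_pdf, output_dir, **options), check=True)
```

For instance, build_command("batch.pdf", "out", blank_removal=False) yields the arguments for scanprep --no-blank-removal batch.pdf out.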

Use scanprep -h to show the help:

usage: scanprep [-h] [--page-separation] [--blank-removal] input_pdf [output_dir]

positional arguments:
  input_pdf             The PDF document to process.
  output_dir            The directory where the output documents will be saved. (defaults to the
                        current directory)

optional arguments:
  -h, --help            show this help message and exit
  --page-separation, --no-page-separation
                        Do (or do not) split document into separate files by the included
                        separator pages. (default yes)
  --blank-removal, --no-blank-removal
                        Do (or do not) remove empty pages from the output. (default yes)
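The paired --flag / --no-flag options in the help text look like what Python's argparse.BooleanOptionalAction (Python 3.9+) produces. The following is a sketch of a parser with the same interface, written under that assumption; it is not necessarily how scanprep defines its arguments, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the interface shown in the help output above."""
    parser = argparse.ArgumentParser(prog="scanprep")
    parser.add_argument("input_pdf", help="The PDF document to process.")
    parser.add_argument(
        "output_dir",
        nargs="?",
        default=".",  # assumption: "." stands for the current directory
        help="The directory where the output documents will be saved.",
    )
    # BooleanOptionalAction adds the matching --no-... variant automatically.
    parser.add_argument(
        "--page-separation",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Do (or do not) split document into separate files.",
    )
    parser.add_argument(
        "--blank-removal",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Do (or do not) remove empty pages from the output.",
    )
    return parser


args = build_parser().parse_args(["batch.pdf", "--no-blank-removal"])
```

With these arguments, args.page_separation keeps its default of True while args.blank_removal is False.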

License

Scanprep is licensed under the MIT license. See the LICENSE file for details. Issues and pull requests are welcome!
