scanprep – Prepare scanned PDF documents

Small utility to prepare scanned documents. Supports separating PDF files by separator pages and removing blank pages.

Scanprep can be used to prepare scanned documents for further processing with existing tools (like the great OCRmyPDF) or directly for archival. It splits multiple documents that were scanned in a single batch into separate files. It can also remove blank pages from the output, which is especially helpful when using a duplex scanner.
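To illustrate what blank page removal involves, here is a minimal sketch of one common approach: count the dark pixels on a grayscale page and treat the page as blank if their share stays below a small threshold. This is not necessarily how scanprep decides; the function names `is_blank` and `drop_blank_pages`, the flat pixel list, and both threshold values are assumptions made for this example.

```python
from typing import List, Sequence


def is_blank(
    pixels: Sequence[int],
    dark_threshold: int = 128,      # assumed cut-off between "dark" and "light"
    max_dark_ratio: float = 0.005,  # assumed tolerance for dust and scanner noise
) -> bool:
    """Return True if a grayscale page (0 = black, 255 = white) is almost entirely light."""
    if not pixels:
        return True
    dark = sum(1 for value in pixels if value < dark_threshold)
    return dark / len(pixels) <= max_dark_ratio


def drop_blank_pages(pages: Sequence[Sequence[int]]) -> List[Sequence[int]]:
    """Keep only the pages that are not considered blank."""
    return [page for page in pages if not is_blank(page)]


if __name__ == "__main__":
    white_page = [255] * 1000
    text_page = [255] * 900 + [0] * 100
    print(len(drop_blank_pages([white_page, text_page])))
```

A tolerance above zero matters in practice because the empty back side of a duplex scan is rarely perfectly white.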

For document separation, separator pages need to be inserted between the different documents before scanning. These pages tell the program where to split. You can either use the included separator page or create your own. The separator page simply needs to have a barcode that encodes the text SCANPREP_SEP (you can use any barcode type supported by zbar).
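The splitting step described above can be sketched as follows. This is a simplified illustration rather than scanprep's actual code: each page is modeled as a hypothetical `(name, barcode_texts)` tuple, where `barcode_texts` stands in for the strings a barcode library such as pyzbar would decode from the page image. The function name `split_on_separators` is made up for this example, and dropping the separator pages from the output is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

SEPARATOR_TEXT = "SCANPREP_SEP"  # the text a separator page's barcode must encode

# Hypothetical page model: (page name, texts decoded from barcodes on that page).
Page = Tuple[str, List[str]]


def is_separator(page: Page) -> bool:
    """A page counts as a separator if any of its barcodes encodes SEPARATOR_TEXT."""
    _, barcode_texts = page
    return SEPARATOR_TEXT in barcode_texts


def split_on_separators(
    pages: Iterable[Page],
    separator_check: Callable[[Page], bool] = is_separator,
) -> List[List[Page]]:
    """Group pages into documents, starting a new document at each separator page.

    Separator pages are not included in the output (an assumption of this sketch).
    Consecutive separators do not produce empty documents.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if separator_check(page):
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


if __name__ == "__main__":
    batch = [
        ("invoice-1", []),
        ("invoice-2", []),
        ("separator", [SEPARATOR_TEXT]),
        ("letter-1", []),
    ]
    for number, document in enumerate(split_on_separators(batch), start=1):
        print(number, [name for name, _ in document])
```

Passing the separator check as a parameter keeps the grouping logic independent of how barcodes are actually read.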

Installation

From source

To install scanprep from source, simply clone this repository and install the dependencies:

git clone https://github.com/baltpeter/scanprep.git
cd scanprep
pip install -r requirements.txt # You may want to do this in a venv.
# You may also need to install the zbar shared library. See: https://pypi.org/project/pyzbar/

python3 scanprep.py -h

Usage

usage: scanprep.py [-h] [--page-separation] [--blank-removal] input_pdf [output_dir]

positional arguments:
  input_pdf             The PDF document to process.
  output_dir            The directory where the output documents will be saved. (defaults to the
                        current directory)

optional arguments:
  -h, --help            show this help message and exit
  --page-separation, --no-page-separation
                        Do (or do not) split document into separate files by the included
                        separator pages. (default yes)
  --blank-removal, --no-blank-removal
                        Do (or do not) remove empty pages from the output. (default yes)
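If you want to call scanprep from another Python script, one option is to build the command line from the options listed above and run it with `subprocess`. The sketch below assumes the from-source setup, with `scanprep.py` in the current working directory and invoked through `python3`; the helper name `build_scanprep_command` is hypothetical and not part of scanprep.

```python
import subprocess
from typing import List, Optional


def build_scanprep_command(
    input_pdf: str,
    output_dir: Optional[str] = None,
    page_separation: bool = True,
    blank_removal: bool = True,
) -> List[str]:
    """Build an argument list for scanprep.py from the documented options."""
    command = ["python3", "scanprep.py"]
    # Both features default to "yes", so a flag is only needed to switch one off.
    if not page_separation:
        command.append("--no-page-separation")
    if not blank_removal:
        command.append("--no-blank-removal")
    command.append(input_pdf)
    # Without output_dir, scanprep writes to the current directory.
    if output_dir is not None:
        command.append(output_dir)
    return command


def run_scanprep(input_pdf: str, output_dir: Optional[str] = None, **options: bool) -> None:
    """Run scanprep and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_scanprep_command(input_pdf, output_dir, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_scanprep(...) to actually execute it.
    print(build_scanprep_command("batch.pdf", "out", blank_removal=False))
```

Passing the arguments as a list instead of a single shell string avoids quoting problems with file names that contain spaces.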

License

Scanprep is licensed under the MIT license; see the LICENSE file for details. Issues and pull requests are welcome!
