Python script to make documents look like they were scanned.

Project description

look-like-scanned

  • Local, Private, Secure, Open-Source and Transparent!

  • Converts every page of a given PDF file into an image-based page and applies mild, random skew and brightness effects to simulate the appearance of scanned documents.

  • The resulting pages are then combined back into an Output PDF file.

  • Granular CLI options to combine / convert image files into PDF as well.

  • Supports conversion of multi-page TIFF files and password-protected PDF files.

  • Importable as a Python Library. Use the core scanning engine directly in Python projects.

  • Output PDF files are saved in the same folder as the input, with an "_output" suffix (for example, "File_Name_output.pdf").
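
The page-level effect described in the list above can be sketched with Pillow alone. This is a minimal illustration, not the project's actual implementation: the helper name fake_scan, the angle range, the brightness range, and the white fill color are all assumptions chosen for the example.

```python
import io
import random

from PIL import Image, ImageEnhance


def fake_scan(page, max_angle=1.0, max_brightness_shift=0.05, rng=None):
    """Apply a mild random tilt and brightness change to one page image.

    Hypothetical helper: the ranges used here are illustrative guesses.
    """
    rng = rng or random.Random()
    page = page.convert("RGB")
    # Tilt by a small random angle; exposed corners are filled with white.
    angle = rng.uniform(-max_angle, max_angle)
    tilted = page.rotate(angle, resample=Image.BICUBIC, fillcolor=(255, 255, 255))
    # A brightness factor of 1.0 leaves the image unchanged, so stay near it.
    factor = 1.0 + rng.uniform(-max_brightness_shift, max_brightness_shift)
    return ImageEnhance.Brightness(tilted).enhance(factor)


# Two blank pages stand in for rendered PDF pages.
pages = [Image.new("RGB", (600, 800), "white") for _ in range(2)]
scanned = [fake_scan(p, rng=random.Random(i)) for i, p in enumerate(pages)]

# Pillow can write a list of images as one multi-page PDF.
buffer = io.BytesIO()
scanned[0].save(buffer, format="PDF", save_all=True, append_images=scanned[1:])
pdf_bytes = buffer.getvalue()
```

Passing a seeded random.Random makes the tilt reproducible, which is convenient when comparing outputs.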

Installation

Install from the Python Package Index (PyPI)

pip install look-like-scanned

To handle HEIC/HEIF file types, install with optional dependencies:

pip install look-like-scanned[heif]

Or, to install the latest version from GitHub:

git clone https://github.com/navchandar/look-like-scanned.git
cd look-like-scanned
pip install poetry
poetry install --all-extras
pip install .

Verify Installation:

# Print help message and usage options available
scanner -h

CLI Usage

This package uses PIL and pypdfium2 to convert and manipulate image and PDF objects.

A command-line interface (CLI) is built on top of this for easy usage.

# Convert all pdf files in folder to scanned pdf
scanner -i .\tests
scanner -i .\tests -f "pdf"

# Convert all pdf files in folder to scanned pdf, set contrast, sharpness and brightness factors
scanner -i .\tests -c 2 -sh 10 -br 2

# Convert all pdf files in folder to scanned without askew
scanner -i .\tests -a no

# Convert specific pdf file in folder to scanned pdf
scanner -i .\tests -f "test.pdf"

# Convert all jpg, jpeg, png, webp files in folder to one pdf file
scanner -i .\tests -f "image"

# Convert all image files in folder in the order of file names
scanner -i .\tests -f "image" -s "name"

# Combine all png files in folder into one pdf file at 100% quality
scanner -i .\tests -f "png" -q 100

# Convert specific jpg file in folder to a pdf file at 75% quality
scanner -i .\tests -f "JPG_Test.jpg" -q 75

# Convert all PDF files including sub folders
scanner -i .\tests -f "pdf" -r yes

# Convert all Images including sub folders into one PDF
scanner -i .\tests -f "image" -r yes

# Convert all PDF files including sub folders and save in black & white format
scanner -i .\tests -f "pdf" -r yes -b yes

# Convert all png files including sub folders to black & white and make them a little blurry
scanner -i .\tests -f "png" -r yes -b yes -l yes

# Convert all pdf files with a slight amount of noise (grain)
scanner -i .\tests -f "pdf" -n 2

# Convert all pdf files with depth of field (uneven blur)
scanner -i .\tests -f "pdf" -v yes

# Add noise, uneven blur, and make it look like a photocopy
scanner -i .\tests -f "pdf" -n 20 -v yes -b yes

# Convert specific image with heavy noise
scanner -i .\tests -f "test.jpg" -n 50

# Simulate a "Bad/Old Scanner" (Low quality, high noise, blur, and high contrast)
scanner -i .\tests -q 75 -n 20 -l yes -c 1.5 -sh 0.8

# Simulate a "High-Quality/Modern Scanner" (High quality, no noise, sharpened, slightly brighter)
scanner -i .\tests -q 95 -n 0 -a yes -sh 1.5 -br 1.1

# Target specific image formats only (e.g., just HEIC files from an iPhone)
scanner -i .\photos -f "heic"

# Convert specific locked PDF with password and save output without password
scanner -i .\secure_docs -f "Locked_doc.pdf" -p p@ss123

# Interactive Mode: Process encrypted PDFs by entering passwords when prompted
scanner -i .\secure_docs -f "pdf"

Library Usage

Import DocumentScanner into your own Python scripts for programmatic document processing.

from pathlib import Path
from scanner import DocumentScanner

# Initialize the scanner with custom settings
# Note: Use Python booleans (True/False) instead of CLI strings ("yes"/"no")
scanner = DocumentScanner(
    file_quality=90,
    askew=True,
    black_and_white=False,
    noise=15,
    brightness=1.2
)

# Option 1: Convert a single PDF
scanner.process_pdf(Path("path/to/document.pdf"))

# Option 2: Combine multiple images into one "scanned" PDF
image_list = [Path("page1.jpg"), Path("page2.png")]
scanner.process_images_to_one_pdf(image_list)

# Option 3: Find all PDFs in the folder and subfolders
scanner = DocumentScanner(recurse=True)
scanner.process_folder(Path("./my_docs"), file_type="pdf")
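
When settings arrive as CLI-style strings (for example from a config file), they need converting before being passed to the class. The sketch below shows one way to do that. The helpers yes_no_to_bool and cli_options_to_kwargs are hypothetical and not part of the package; the keyword names are the ones mentioned in this document.

```python
def yes_no_to_bool(value):
    """Translate a CLI-style "yes"/"no" string into a Python boolean."""
    text = str(value).strip().lower()
    if text == "yes":
        return True
    if text == "no":
        return False
    raise ValueError(f'expected "yes" or "no", got {value!r}')


def cli_options_to_kwargs(options):
    """Build keyword arguments from CLI-style values (hypothetical helper)."""
    # Options that the CLI expresses as "yes"/"no" strings.
    boolean_keys = {"askew", "black_and_white", "blur", "recurse"}
    return {
        key: yes_no_to_bool(val) if key in boolean_keys else val
        for key, val in options.items()
    }


kwargs = cli_options_to_kwargs(
    {"askew": "yes", "black_and_white": "no", "noise": 15}
)
```

The resulting dictionary could then be unpacked as DocumentScanner(**kwargs), assuming the constructor accepts these keyword names as shown in the example above.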

Arguments

These are the command-line arguments accepted:

  • -i, --input_folder : Specifies the input folder to read files from and convert. The default value is the current directory.

    • Example: -i /path/to/files or -i C:\files\documents
  • -f, --file_type_or_name : Specifies the file types to process or the file name to convert. The default value is "pdf" to convert all pdf files in the given input folder.

    • Example: -f pdf or -f image.jpg or -f image
  • -q, --file_quality : Specifies the quality of the converted output files. The value must be between 50 and 100. The default value is 95.

    • Example: -q 90
  • -a, --askew : Controls whether to make the output documents slightly askew or slightly tilted. Accepted values are "yes" or "no". The default value is "yes".

    • Example: -a yes or --askew no
  • -b, --black_and_white : Controls whether to save output documents in black and white format (to make it look like a photocopy). Accepted values are "yes" or "no". The default value is "no".

    • Example: -b yes or --black_and_white no
  • -l, --blur : Controls whether to make the output a little bit blurry. Accepted values are "yes" or "no". The default value is "no".

    • Example: -l yes or --blur no
  • -v, --variation : Controls whether to apply a variable blur effect (depth of field simulation) to the image. This simulates a scanner lid that wasn't closed perfectly flat, causing one part of the document to be slightly out of focus. Accepted values are "yes" or "no". The default value is "no".

    • Example: -v yes or --variation no
  • -n, --noise : Controls the amount of salt-and-pepper noise added to the image to simulate dust or scanner sensor imperfections. The value must be an integer between 0 and 100. A value of 0 means no noise, while 50 is significantly noisy. The default value is 0.

    • Example: -n 10 or --noise 50
  • -c, --contrast : Controls contrast factor of the image. A factor of 0.0 gives a solid gray image. A factor of 1.0 gives the original image. Greater values increase the contrast of the image. The default value is 1.

    • Example: -c 2
  • -sh, --sharpness : Controls sharpness factor of the image. A factor of 0.0 gives a blurred image. A factor of 1.0 gives the original image. Greater values increase the sharpness of the image. The default value is 1.

    • Example: -sh 2
  • -br, --brightness : Controls brightness factor of the image. A factor of 0.0 gives a black image. A factor of 1.0 gives the original image. Greater values increase the brightness of the image. The default value is 1.

    • Example: -br 2
  • -r, --recurse : Controls whether matching files in subdirectories are also processed. Accepted values are "yes" or "no". The default value is "yes".

    • Example: -r yes or --recurse no
  • -s, --sort_by : Allows scripts to sort the files based on name, creation time or modified time. Accepted values are "name", "ctime", "mtime", "none". The default value is "name". If "none" is selected, then the default order of files returned by the OS is used for document conversion.

    • Example: -s name or --sort_by none
  • -p, --password : Password for decrypting locked PDF files. By default, if omitted, the script will pause and prompt you for a password whenever it encounters a locked file. Use this flag if all your PDF files share the same password. If files have different passwords, omit this and enter them one-by-one when prompted.

    • Example: -p p@ss123 or `--passw
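
The noise and factor arguments above can be illustrated with a short Pillow sketch. The factor semantics (0.0 for the degenerate image, 1.0 for the original) are those of Pillow's ImageEnhance classes. The noise function is only an assumption about how such an effect might work: add_salt_and_pepper is a hypothetical helper, and treating the value as a percentage of pixels is a guess, not the package's documented behavior.

```python
import random

from PIL import Image, ImageEnhance


def add_salt_and_pepper(image, percent, rng=None):
    """Set roughly `percent`% of pixels to pure black or pure white.

    Hypothetical helper; the percent-of-pixels mapping is an assumption.
    """
    rng = rng or random.Random()
    noisy = image.convert("L")  # grayscale keeps the sketch short
    pixels = noisy.load()
    width, height = noisy.size
    count = int(width * height * percent / 100)
    for _ in range(count):
        x, y = rng.randrange(width), rng.randrange(height)
        pixels[x, y] = rng.choice((0, 255))  # pepper or salt
    return noisy


page = Image.new("L", (100, 100), 128)  # uniform mid-gray test page

clean = add_salt_and_pepper(page, 0)  # 0 adds no noise
noisy = add_salt_and_pepper(page, 10, rng=random.Random(0))

# ImageEnhance factors: 0.0 is the degenerate image, 1.0 is the original.
black = ImageEnhance.Brightness(page).enhance(0.0)
unchanged = ImageEnhance.Contrast(page).enhance(1.0)
```

Because pixels are picked at random with replacement, the number of altered pixels is at most, not exactly, the requested share.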

❗❗ Note: ❗❗

  • The supported file types are: ".pdf", ".jpg", ".jpeg", ".png", ".webp", ".tiff", ".tif", ".jp2", ".bmp"

  • The output PDF file size will be bigger than the input file because the pages are stored in image format.

  • When using the DocumentScanner class directly as a library, arguments like askew, blur, and black_and_white expect actual Python booleans (True / False) rather than the strings "yes" / "no" used in the CLI.

  • Bookmarks / Links / Metadata will be removed when saving the output file.

  • Transparency will be removed from png files when converting to pdf.

  • Password protected PDF files are also supported since v1.1.

  • Youtube: How to Insert a Signature on a PDF File
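
The note on transparency can be pictured as compositing the image onto an opaque background before it is written to PDF. The sketch below shows that technique with Pillow. The helper flatten_transparency and the white background are assumptions for illustration; the package's own handling may differ.

```python
from PIL import Image


def flatten_transparency(image, background=(255, 255, 255)):
    """Composite an image with an alpha channel onto an opaque background."""
    rgba = image.convert("RGBA")
    canvas = Image.new("RGBA", rgba.size, background + (255,))
    # alpha_composite blends each pixel according to its alpha value.
    return Image.alpha_composite(canvas, rgba).convert("RGB")


transparent = Image.new("RGBA", (10, 10), (255, 0, 0, 0))  # fully transparent
opaque = Image.new("RGBA", (10, 10), (0, 0, 255, 255))     # fully opaque blue

flat_transparent = flatten_transparency(transparent)
flat_opaque = flatten_transparency(opaque)
```

Fully transparent pixels take the background color, while fully opaque pixels keep their own.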

License

MIT license

Testing

Run tests with detailed output:

# Install package with latest changes
pip install -e .

# Run all tests
poetry run pytest -v

# Run specific tests
poetry run pytest -k="cli"


Download files

Source Distribution

look_like_scanned-1.2.0.tar.gz (26.0 kB)

Uploaded Source

Built Distribution

look_like_scanned-1.2.0-py3-none-any.whl (19.6 kB)

Uploaded Python 3

File details

Details for the file look_like_scanned-1.2.0.tar.gz.

File metadata

  • Download URL: look_like_scanned-1.2.0.tar.gz
  • Size: 26.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for look_like_scanned-1.2.0.tar.gz
Algorithm Hash digest
SHA256 d63481d1ff2af0ce6397a0830e5d42f03d8fc4910a423d48fe73613f1005568e
MD5 6ac1f75b0d76a1e4cff0226bd832764b
BLAKE2b-256 4341212fd998cb3716744eb937f3e5bc1fc5586a2de3d82f601c9bddd9da9672

File details

Details for the file look_like_scanned-1.2.0-py3-none-any.whl.

File hashes

Hashes for look_like_scanned-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4928bc09abc42cab76e5341b7b3dae947ce4ed29f4c4f5d16a01670da0990395
MD5 0e453f47443e4750959fdc2afa7d8de3
BLAKE2b-256 79ccace76d93596cb376f764475b3f80436821576f3561aae55b3cb3e920a490
