Skip to main content

No project description provided

Project description

What's that

When you want to scan documents on both sides but your automatic document feeder (ADF) only scan one side, then this project may help you.

If you just want to merge two PDFs in the correct order once and have pdftk installed on your machine, the command is pdftk A=first.pdf B=second.pdf shuffle A Bend-1 output collated.pdf (adapt names to your situation). This project does it automagically for you.

How does it work

Watch PDFs files created in SOURCE_DIRECTORY. The first one will be used as is. The second one will be used with its page in reverse order (because we flip the document and start scanning from the end). The resulting PDF file will be created in DESTINATION_DIRECTORY.

How common problems are solved:

  • Merging a new PDF with an old one: There is a timeout COLLATE_TIMEOUT that runs from the moment the first PDF is done writing (event IN_CLOSE_WRITE). If a new PDF is created (event IN_CREATE) before the timeout ends, then this new PDF is understood to be the second one. Otherwise (timeout passed), the new PDF becomes the first one and the previous one is evicted with a timeout warning.
  • Merging incompatible PDFs: The number of pages should be equal for PDFs to be merged. If it's not the case, the second PDF replaces the first with a warning.

Limitations:

  • Depends on inotify, so it can be used only on Linux (Docker can solve that).
  • Cannot distinguish between PDFs coming from your scanner and those created differently (eg. copied or temporary file). Set SOURCE_DIRECTORY to a directory where your scanner is the only one to write to, with no subdirectory. Also, don't set DESTINATION_DIRECTORY to the same directory.

Installation and configuration

The Docker image is available as cranium/pdfcollate.

Also available is a Python package you can download with pip install pdfcollate.

My usage of the project:

  • I have a NAS with two SAMBA directories: one for single-sided scans (/Scans), and the other for two-sided scans (/DuplexScans).
  • My NAS docker-compose uses the project's Dockerfile and sets two volumes /DuplexScans:/files and /Scans:/output.
  • My scanner has the two SAMBA directories as possible scan destinations. When I want to scan both sides: I put the document in the ADF, select scan to duplex directory (this scans one side), then retrieve the document from the tray, put it on its flip side, and select scan to duplex directory again. Once the scan is done, PDFCollate finds both documents and creates the collated document in the destination directory.

Environment variables used by the Python script:

  • SOURCE_DIRECTORY: Directory watched for new PDF files
  • DESTINATION_DIRECTORY: Where the collated PDF will be created
  • COLLATE_TIMEOUT: How much time before we consider two PDFs to be unrelated.
  • OUTPUT_NAME_SUFFIX: Added to the output PDF name between the document name and .pdf

Why

Necessity is the mother of innovation. And I needed to scan both sides without too much hassle.

TODOs (don't hesitate to make a PR!)

  • Upgrade alpine: Stuck at alpine:3.8 because it has the pdftk binary.
  • Document utilisation: Can be used as pure Python, as a Docker image, or in a docker-compose file
  • Add CLI arguments for configuration: For improved flexibility
  • Add tests: Making sure we do the right thing in every case.
  • Remove old files: Once the merge is successful, we can remove the two old PDFs.

Done

  • Make it a Python package: Would enable one-off use. Eg: python3 -m pdfcollate
  • Publish image to Docker registry: Easier installation and docker-compose integration~~
  • Improve file permissions: We should copy the input file permissions to the output files.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdfcollate-0.1.4.tar.gz (7.7 kB view details)

Uploaded Source

Built Distribution

pdfcollate-0.1.4-py3.10.egg (10.3 kB view details)

Uploaded Source

File details

Details for the file pdfcollate-0.1.4.tar.gz.

File metadata

  • Download URL: pdfcollate-0.1.4.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for pdfcollate-0.1.4.tar.gz
Algorithm Hash digest
SHA256 cf3fe32eb7660426a77360414e107dcc00b5accaef8c2d716a1c7eb1c6b1a9b2
MD5 bb83310ab9f1238ab685856d4dafd06d
BLAKE2b-256 69fe3615ec4cc7ad95ea551ae63b25e02f826fb1ae53a79afd3822dcd4014f2b

See more details on using hashes here.

File details

Details for the file pdfcollate-0.1.4-py3.10.egg.

File metadata

  • Download URL: pdfcollate-0.1.4-py3.10.egg
  • Upload date:
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for pdfcollate-0.1.4-py3.10.egg
Algorithm Hash digest
SHA256 81f0edee496b43765cc8d7ebb450655e128e8ef6b35743dfe63ca4970957c73a
MD5 dadc07e84a676c562fcaa27327553f94
BLAKE2b-256 1ffd9eee40ca4719a1e9afae714e55d4461155cd9a0c83e1629808af0677daec

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page