Project description
What's that
When you want to scan documents on both sides but your automatic document feeder (ADF) only scans one side, this project may help you.
If you just want to merge two PDFs in the correct order once and have pdftk installed on your machine, the command is:
pdftk A=first.pdf B=second.pdf shuffle A Bend-1 output collated.pdf
(adapt the file names to your situation). This project does it automagically for you.
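For a one-off merge driven from Python, the same pdftk invocation can be wrapped with subprocess. This is only a minimal sketch, not the project's own code: the function names build_shuffle_command and collate_once are hypothetical, and it assumes the pdftk binary is on your PATH.

```python
import subprocess


def build_shuffle_command(front_pdf, back_pdf, output_pdf):
    """Build the pdftk argument list quoted above (hypothetical helper).

    A is the front-side scan, B the back-side scan. "Bend-1" tells pdftk
    to take B's pages from the last one down to the first.
    """
    return [
        "pdftk",
        f"A={front_pdf}",
        f"B={back_pdf}",
        "shuffle", "A", "Bend-1",
        "output", output_pdf,
    ]


def collate_once(front_pdf, back_pdf, output_pdf):
    """Run pdftk and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_shuffle_command(front_pdf, back_pdf, output_pdf), check=True)
```

Calling collate_once("first.pdf", "second.pdf", "collated.pdf") would run the same command as the one-liner above.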
How does it work
The script watches for PDF files created in SOURCE_DIRECTORY. The first one is used as is. The second one is used with its pages in reverse order (because we flip the document and start scanning from the end). The resulting PDF file is created in DESTINATION_DIRECTORY.
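To make the page ordering concrete, here is a small sketch that computes which page of which scan ends up where in the collated document. It is an illustration only: the function name collated_order and the labels "A" (first scan) and "B" (second scan) are assumptions, and both scans are assumed to have the same number of pages.

```python
def collated_order(page_count):
    """Return the (scan, page) sequence of the collated document.

    "A" is the first scan (front sides, in reading order).
    "B" is the second scan (back sides, scanned from the last sheet
    to the first because the stack was flipped).
    Page numbers are 1-based.
    """
    order = []
    for sheet in range(1, page_count + 1):
        order.append(("A", sheet))                   # front of this sheet
        order.append(("B", page_count - sheet + 1))  # its back, counted from the end of B
    return order


print(collated_order(3))
# [('A', 1), ('B', 3), ('A', 2), ('B', 2), ('A', 3), ('B', 1)]
```

For a three-sheet document, the back of sheet 1 is the last page of the second scan, which is why the second PDF has to be read in reverse.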
How common problems are solved:
- Merging a new PDF with an old one: There is a timeout, COLLATE_TIMEOUT, that runs from the moment the first PDF is done writing (event IN_CLOSE_WRITE). If a new PDF is created (event IN_CREATE) before the timeout ends, then this new PDF is understood to be the second one. Otherwise (timeout passed), the new PDF becomes the first one and the previous one is evicted with a timeout warning.
- Merging incompatible PDFs: The number of pages should be equal for PDFs to be merged. If that is not the case, the second PDF replaces the first with a warning.
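The two rules above can be sketched as a small state machine. This is not the project's actual implementation: the class PairingState, its offer method, and the use of plain numbers for timestamps are all hypothetical, and the sketch leaves out the inotify handling itself. It only models the decision made once the creation time and page count of a new PDF are known.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pending:
    path: str
    pages: int
    closed_at: float  # when the file finished writing (IN_CLOSE_WRITE)


class PairingState:
    """Decides whether a new PDF completes a pair (hypothetical sketch)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.first: Optional[Pending] = None
        self.warnings: List[str] = []

    def offer(self, path: str, pages: int, created_at: float,
              closed_at: float) -> Optional[Tuple[str, str]]:
        """Return (first, second) when a pair is ready to merge, else None."""
        candidate = Pending(path, pages, closed_at)
        first = self.first
        if first is None:
            self.first = candidate
            return None
        if created_at - first.closed_at > self.timeout:
            # Created after the timeout: the old PDF is evicted.
            self.warnings.append(f"timeout: evicting {first.path}")
            self.first = candidate
            return None
        if pages != first.pages:
            # Page counts differ: the second PDF replaces the first.
            self.warnings.append(f"page mismatch: {path} replaces {first.path}")
            self.first = candidate
            return None
        self.first = None
        return (first.path, path)
```

Note that the timeout is measured from the end of the first write to the creation of the second file, so a long second scan does not count against it.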
Limitations:
- Depends on inotify, so it can be used only on Linux (Docker can solve that).
- Cannot distinguish between PDFs coming from your scanner and those created differently (e.g. copied or temporary files). Set SOURCE_DIRECTORY to a directory where your scanner is the only one to write to, with no subdirectory. Also, don't set DESTINATION_DIRECTORY to the same directory.
Installation and configuration
The Docker image is available as cranium/pdfcollate.
Also available is a Python package you can install with pip install pdfcollate.
My usage of the project:
- I have a NAS with two SAMBA directories: one for single-sided scans (/Scans), and the other for two-sided scans (/DuplexScans).
- My NAS docker-compose uses the project's Dockerfile and sets two volumes: /DuplexScans:/files and /Scans:/output.
- My scanner has the two SAMBA directories as possible scan destinations. When I want to scan both sides: I put the document in the ADF, select scan to duplex directory (this scans one side), then retrieve the document from the tray, put it back flipped over, and select scan to duplex directory again. Once the scan is done, PDFCollate finds both documents and creates the collated document in the destination directory.
Environment variables used by the Python script:
- SOURCE_DIRECTORY: Directory watched for new PDF files.
- DESTINATION_DIRECTORY: Where the collated PDF will be created.
- COLLATE_TIMEOUT: How much time before we consider two PDFs to be unrelated.
- OUTPUT_NAME_SUFFIX: Added to the output PDF name between the document name and .pdf.
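As an illustration of how these variables might be consumed, here is a minimal sketch. It is not the package's code: the function names load_config and output_name are hypothetical, the default values are invented for the example, and treating COLLATE_TIMEOUT as a number of seconds is an assumption.

```python
import os
from pathlib import Path


def load_config(env=None):
    """Read the four settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    source = Path(env["SOURCE_DIRECTORY"])
    destination = Path(env["DESTINATION_DIRECTORY"])
    if source == destination:
        # Writing output into the watched directory would trigger new events.
        raise ValueError("DESTINATION_DIRECTORY must differ from SOURCE_DIRECTORY")
    return {
        "source": source,
        "destination": destination,
        # Assumption: the timeout is in seconds; 600 is a made-up default.
        "timeout": float(env.get("COLLATE_TIMEOUT", "600")),
        # Assumption: an empty suffix when the variable is unset.
        "suffix": env.get("OUTPUT_NAME_SUFFIX", ""),
    }


def output_name(document_name, suffix):
    """Insert the suffix between the document name and .pdf."""
    return Path(document_name).stem + suffix + ".pdf"
```

With OUTPUT_NAME_SUFFIX set to _collated, output_name("scan.pdf", "_collated") gives scan_collated.pdf.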
Why
Necessity is the mother of innovation. And I needed to scan both sides without too much hassle.
TODOs (don't hesitate to make a PR!)
- Upgrade alpine: Stuck at alpine:3.8 because it has the pdftk binary.
- Document utilisation: Can be used as pure Python, as a Docker image, or in a docker-compose file
- Add CLI arguments for configuration: For improved flexibility
- Add tests: Making sure we do the right thing in every case.
- Remove old files: Once the merge is successful, we can remove the two old PDFs.
Done
- Make it a Python package: Would enable one-off use. E.g.: python3 -m pdfcollate
- Publish image to Docker registry: Easier installation and docker-compose integration
- Improve file permissions: We should copy the input file permissions to the output files.