Document mover: moves scans once the files are stable and merges PDFs

document-mover

A Python-based automated PDF merging and file movement workflow, particularly useful for scanning operations.

Features

PDF Merger (pdf-merger)

  • Merge two PDFs with alternating page order (page1, page2, page3, page4, etc.)
  • Delete source files after successful merge (optional)
  • Remove empty pages from the merged PDF (optional)
  • Command-line interface for easy usage
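
As a rough illustration of the alternating merge described above, here is a minimal sketch. It assumes "alternating" means taking one page from each input in turn, and it uses pypdf's PdfReader and PdfWriter. The function names interleave and merge_alternating are hypothetical and are not taken from the package.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_MISSING = object()  # sentinel marking the exhausted side of the shorter input


def interleave(first: Iterable[T], second: Iterable[T]) -> Iterator[T]:
    """Yield items alternately from both inputs; leftovers of the longer one follow."""
    for a, b in zip_longest(first, second, fillvalue=_MISSING):
        if a is not _MISSING:
            yield a
        if b is not _MISSING:
            yield b


def merge_alternating(first_pdf: Path, second_pdf: Path, output_pdf: Path) -> None:
    """Write output_pdf with pages taken alternately from the two inputs."""
    # Imported here so interleave() stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    pages = interleave(PdfReader(first_pdf).pages, PdfReader(second_pdf).pages)
    for page in pages:
        writer.add_page(page)
    with output_pdf.open("wb") as handle:
        writer.write(handle)
```

Keeping the ordering logic in a plain generator makes it easy to check without any PDF files; how the real tool handles inputs of unequal length is not stated here, so appending the leftover pages is only one possible choice.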

Consecutive PDF Merger (process_pdf_merge)

  • Auto-detect consecutive PDFs based on filename numbering
  • Merge pairs of consecutively numbered files automatically
  • Process entire folders with batch operations
  • Dry-run mode to preview operations before executing
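
The detection of consecutive files could look roughly like the sketch below. It assumes that "consecutive" means two PDF names sharing a prefix and ending in the numbers n and n + 1; the actual matching rules of process_pdf_merge may differ, and find_consecutive_pairs is a hypothetical name.

```python
import re
from pathlib import Path

# Split a file stem into a prefix and its trailing number, e.g. "scan_0012".
_TRAILING_NUMBER = re.compile(r"^(?P<stem>.*?)(?P<num>\d+)$")


def find_consecutive_pairs(names: list[str]) -> list[tuple[str, str]]:
    """Pair PDF names whose stems share a prefix and end in n and n + 1."""
    numbered: dict[tuple[str, int], str] = {}
    for name in names:
        path = Path(name)
        match = _TRAILING_NUMBER.match(path.stem)
        if path.suffix.lower() == ".pdf" and match:
            numbered[(match["stem"], int(match["num"]))] = name

    pairs: list[tuple[str, str]] = []
    used: set[tuple[str, int]] = set()
    for key in sorted(numbered):
        prefix, number = key
        partner = (prefix, number + 1)
        # Each file joins at most one pair; skip files without a successor.
        if key in used or partner not in numbered:
            continue
        pairs.append((numbered[key], numbered[partner]))
        used.update({key, partner})
    return pairs
```

A dry run would then amount to printing the returned pairs instead of merging them.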

Document Mover (document-mover)

  • File stability checking - ensures files are fully written before processing
  • Selective file movement - separate handling for regular and dual-sided files
  • Dual-sided file pairing - automatically pairs and merges dual-sided scans
  • Atomic operations - safe file movements with proper permission handling
  • Comprehensive logging - detailed tracking of all operations
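
One common way to implement a stability check like the one listed above is to compare a file's size and modification time before and after a short wait. The sketch below assumes that approach; the package may use a different test, and the names snapshot and is_stable are hypothetical.

```python
import time
from pathlib import Path


def snapshot(path: Path) -> tuple[int, int]:
    """Return (size in bytes, modification time in nanoseconds) for path."""
    stat = path.stat()
    return stat.st_size, stat.st_mtime_ns


def is_stable(path: Path, wait_seconds: float = 10.0) -> bool:
    """Report whether path is unchanged after waiting wait_seconds."""
    try:
        before = snapshot(path)
        time.sleep(wait_seconds)
        return snapshot(path) == before
    except FileNotFoundError:
        # A file that is missing or vanished mid-check is not safe to process.
        return False
```

The default of 10 seconds mirrors the documented default of --stability-wait, but whether the tool applies the wait in exactly this way is an assumption.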

Installation

pdm install

Usage

PDF Merger

Merge two PDF files:

pdm run pdf-merger file1.pdf file2.pdf output.pdf

With options:

pdm run pdf-merger file1.pdf file2.pdf output.pdf --delete-source --remove-empty-pages --verbose

Options:

  • --delete-source: Delete source PDF files after successful merge
  • --remove-empty-pages: Remove empty pages from the merged PDF
  • --verbose: Enable verbose logging

Consecutive PDF Merger

Process a folder and merge consecutive PDFs:

pdm run process_pdf_merge /path/to/folder

With options:

pdm run process_pdf_merge /path/to/folder --dest-folder /path/to/dest --dry-run --verbose

Options:

  • --dest-folder: Directory for merged PDF output (defaults to source folder)
  • --dry-run: Preview operations without executing
  • --verbose: Enable verbose logging

Document Mover

Move processed documents and automatically merge dual-sided PDFs:

pdm run document-mover --source-dir /path/to/source --dest-dir /path/to/dest

With options:

pdm run document-mover --source-dir /path/to/source --dest-dir /path/to/dest --dual-side-prefix "dual-side" --stability-wait 5 --max-age 30 --verbose

Options:

  • --source-dir: Source directory containing files to process (required)
  • --dest-dir: Destination directory for processed files (required)
  • --dual-side-prefix: Prefix for identifying dual-sided files (default: "double-sided")
  • --user-id: User ID for file ownership (default: current user)
  • --group-id: Group ID for file ownership (default: current group)
  • --stability-wait: Seconds to wait before checking file stability (default: 10)
  • --max-age: Maximum file age in minutes before forcing move (default: 10)
  • --dry-run: Preview operations without moving files
  • --verbose: Enable verbose logging

Configuration

Ruff Formatting

The project is configured to use ruff with a line length of 120 characters. This is set in pyproject.toml:

[tool.ruff]
line-length = 120

The VS Code editor is also configured to display a ruler at column 120 for consistency.

Project Structure

document-mover/
├── src/
│   └── document_mover/
│       ├── __init__.py
│       ├── pdf_merger.py          # PDF merging utilities
│       ├── document_mover.py      # Main file movement system
│       └── file_lock.py           # File locking utilities
├── tests/
│   └── __init__.py
├── pyproject.toml                 # Project configuration & dependencies
└── README.md

Requirements

  • Python 3.12+
  • pypdf >= 6.4.1
  • ruff >= 0.14.8
  • mypy >= 1.19.0
  • pytest >= 9.0.2
  • pytest-cov >= 7.0.0

Development

Running Tests

pdm run pytest

Code Quality

Format code with ruff:

pdm run ruff format .

Type checking with mypy:

pdm run mypy src/

Use Cases

Scanning Workflow

  1. Scan dual-sided documents as separate files (page 1, page 2, etc.)
  2. Run document-mover to automatically pair and merge dual-sided files
  3. Process results to final destination with proper organization

PDF Organization

  1. Use consecutive PDF merger to automatically merge numbered PDF sets
  2. Delete source files after successful merge to save space
  3. Remove empty pages for cleaner final documents

License

MIT
