Merges PDF files into one using a YAML configuration file.
A simple, easy-to-script tool for merging multiple PDF files into one document using a YAML configuration file.
The program is written in Python, using the PyPDF4 library.
Purpose
There are many good utilities for splitting and merging PDF files. For instance, if you prefer one with a GUI, PDF Arranger is a good choice. However, I had slightly different requirements:
* Composing several documents from a similar set of files, but each time with slight modifications.
* Creating a structure of bookmarks, so that it is easier to navigate the larger document.
* Original PDF features such as hyperlinks and the page orientation should remain intact.
Features
Simple YAML configuration file structure
Adding PDF metadata
Creating bookmarks, also with nested structures
Checking the PDF version of the input files and setting the output version to the minimum required, i.e. the highest version found among the inputs
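The version handling can be pictured with a short sketch. It assumes versions are plain strings such as '1.6' and that a configured value other than 'auto' overrides the inputs, as the comments in the sample configuration describe. The function names are hypothetical and this is not the tool's actual code.

```python
from typing import Iterable, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a PDF version string such as '1.6' into (1, 6) for comparison."""
    return tuple(int(part) for part in text.split("."))


def choose_output_version(input_versions: Iterable[str],
                          configured: Optional[str] = None) -> str:
    """Pick the version to write into the output header.

    Assumption: a configured value other than 'auto' wins; otherwise the
    highest version among the inputs is used.
    """
    if configured is not None and configured != "auto":
        return configured
    return max(input_versions, key=parse_version)


print(choose_output_version(["1.4", "1.7", "1.5"]))       # 1.7
print(choose_output_version(["1.4", "1.7"], "auto"))      # 1.7
print(choose_output_version(["1.4", "1.7"], "1.6"))       # 1.6
```

Comparing parsed tuples rather than raw strings keeps the ordering numeric instead of alphabetical.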
Installation
Prerequisites
This program requires Python 3.6+.
Windows
Download and install from https://www.python.org/downloads/.
macOS
The easiest option is to install Python with Homebrew:
brew install python
Linux
Install Python using your distribution’s package manager. Usually the package is named python3.
Virtual environment
It is not strictly required but strongly advised to create a virtual environment for installing Python packages for a specific purpose.
Create one in any preferred location using
virtualenv -p python3 pdfconfig
The last argument is the name of the environment directory and can be changed as preferred. Then activate the new environment:
source pdfconfig/bin/activate
Installing this tool
pip install pdf-config
Usage
Configuration file
A configuration file is set up in YAML syntax with the following components:
metadata:  # Optional
  title: My document title
  author: ME
  # Additionally supported:
  # creator, keywords, producer, subject
# Hard-coded version to set in the header. Set to 'auto' or leave out
# entirely for using the maximum version of all input documents.
version: '1.6'
paths:  # Optional
  # List of paths to look up any files that do not contain a path
  # specification. The current directory is checked first, then the following
  # directories are checked in that order.
  - ~/my-pdfs  # User home directory can be referred to.
  - $ADDITIONAL_PDF_PATH  # Environment variables are also supported.
contents:  # The only required element
  # Each list entry can contain any of the following:
  # bookmark: The bookmark title
  # document: The name (and path) of the input file.
  # contents: An additional list of contents. Any bookmarks in this sub-structure
  #           are placed under this bookmark, if present.
  - bookmark: First
    document: first.pdf
  - bookmark: Second
    contents:
      - bookmark: Second doc 1
        document: sd1.pdf
      - bookmark: Second doc 2
        document: sd2.pdf
  - bookmark: Third
    document: ~/pdfpath/third.pdf  # Relative and absolute paths are supported.
    contents:
      - bookmark: Third doc 1
        document: $PDF_T1  # Environment variables are also expanded.
      - bookmark: Third doc 2
        document: $PDF_T2
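The lookup rules noted in the comments of the configuration can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: the helper names expand and find_document are hypothetical, and returning None for a missing file is an assumption.

```python
import os
from typing import List, Optional


def expand(text: str) -> str:
    """Expand environment variables and a leading '~'."""
    return os.path.expanduser(os.path.expandvars(text))


def find_document(name: str, search_paths: List[str]) -> Optional[str]:
    """Return the first existing file for `name`, or None if there is none."""
    expanded = expand(name)
    if os.path.dirname(expanded):
        # A path specification is present, so no lookup takes place.
        return expanded if os.path.isfile(expanded) else None
    # Bare file name: current directory first, then the listed paths in order.
    for directory in [os.curdir] + [expand(p) for p in search_paths]:
        candidate = os.path.join(directory, expanded)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_document("first.pdf", ["~/my-pdfs", "$ADDITIONAL_PDF_PATH"]) would try the current directory, then ~/my-pdfs, then the directory named by the environment variable.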
The order of the metadata, paths, and contents keys in the configuration file is not relevant. For the example configuration, the resulting PDF bookmark structure is
|-First
|
|-Second
| |-Second doc 1
| |-Second doc 2
|
|-Third
  |-Third doc 1
  |-Third doc 2
Second points to the same page as Second doc 1, whereas Third and Third doc 1 point to different pages, since Third inserts pages on its own.
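The page targets described here can be reproduced with a small recursive walk over the contents list. The sketch below is an illustration only: the function name bookmark_pages is hypothetical, the page counts are invented, and the file names are simplified stand-ins for the ones in the configuration. It assumes that an entry's own document is placed before its nested contents.

```python
from typing import Dict, List, Tuple

Bookmark = Tuple[int, str, int]  # (nesting depth, title, zero-based target page)


def bookmark_pages(contents: List[dict], page_counts: Dict[str, int],
                   start: int = 0, depth: int = 0) -> Tuple[List[Bookmark], int]:
    """Walk `contents` in document order; return (bookmarks, next free page)."""
    bookmarks: List[Bookmark] = []
    page = start
    for entry in contents:
        target = page  # a bookmark points at the next page to be written
        if "document" in entry:
            page += page_counts[entry["document"]]
        # Nested bookmarks go one level deeper only if this entry has a bookmark.
        child_depth = depth + 1 if "bookmark" in entry else depth
        nested, page = bookmark_pages(entry.get("contents", []), page_counts,
                                      page, child_depth)
        if "bookmark" in entry:
            bookmarks.append((depth, entry["bookmark"], target))
        bookmarks.extend(nested)
    return bookmarks, page


contents = [
    {"bookmark": "First", "document": "first.pdf"},
    {"bookmark": "Second", "contents": [
        {"bookmark": "Second doc 1", "document": "sd1.pdf"},
        {"bookmark": "Second doc 2", "document": "sd2.pdf"},
    ]},
    {"bookmark": "Third", "document": "third.pdf", "contents": [
        {"bookmark": "Third doc 1", "document": "t1.pdf"},
        {"bookmark": "Third doc 2", "document": "t2.pdf"},
    ]},
]
# Hypothetical page counts, for illustration only.
page_counts = {"first.pdf": 2, "sd1.pdf": 3, "sd2.pdf": 1,
               "third.pdf": 4, "t1.pdf": 2, "t2.pdf": 5}

marks, total = bookmark_pages(contents, page_counts)
for depth, title, page in marks:
    print("  " * depth + f"{title} -> page {page + 1}")
```

With these page counts, Second and Second doc 1 both land on page 3, while Third lands on page 7 and Third doc 1 on page 11.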
Start
With the configuration stored in sample.yaml and the PDF files in place, start the merging process by running
pdfconfig sample.yaml
This will merge the listed PDF files into sample.pdf. To change the output name, append it to the command, e.g.
pdfconfig sample.yaml path/to/output.pdf
On Windows, use pdfconfig.exe.
For more explanation, run
pdfconfig -- --help
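Because the tool is driven entirely by a configuration file and two command-line arguments, it can be scripted from Python to produce several variants of a document. The following is a sketch under stated assumptions: render_config, build_command, and merge are hypothetical helpers, the configuration is written by hand as YAML text using only the metadata and contents keys described above, and pdfconfig is assumed to be installed and on the PATH.

```python
import json
import subprocess
from typing import List, Tuple


def render_config(title: str, entries: List[Tuple[str, str]]) -> str:
    """Return YAML text with a title and a flat list of (bookmark, document)."""
    # json.dumps produces double-quoted strings, which YAML also accepts.
    lines = ["metadata:", f"  title: {json.dumps(title)}", "contents:"]
    for bookmark, document in entries:
        lines.append(f"  - bookmark: {json.dumps(bookmark)}")
        lines.append(f"    document: {json.dumps(document)}")
    return "\n".join(lines) + "\n"


def build_command(config_path: str, output_path: str) -> List[str]:
    """Command line in the documented form: pdfconfig CONFIG OUTPUT."""
    return ["pdfconfig", config_path, output_path]


def merge(config_path: str, output_path: str) -> None:
    """Run the merge; raises CalledProcessError if pdfconfig fails."""
    subprocess.run(build_command(config_path, output_path), check=True)


print(render_config("Report A", [("First", "first.pdf")]))
```

A script would write the rendered text to a file such as report-a.yaml and then call merge("report-a.yaml", "report-a.pdf"), repeating this with a different title or entry list for each variant.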