Extract pages from PDF documents

Extract specific pages from PDF documents.

## What is it?

A Python package that extracts pages from PDF documents and writes them to a new PDF.

## How do I install it?

Via pip:

pip install pdfpages


Or if you just want the git repo:

git clone git@github.com:philbooth/pdfpages.git


## How do I run it on the command line?

pdfpages -o out.pdf in.pdf


The -o option is used to specify the output path and the final argument is the path to the input document. You can specify multiple input documents by listing further paths at the end of the command:

pdfpages -o out.pdf in1.pdf in2.pdf
pdfpages -o out.pdf in/*.pdf


Without other arguments, the default behaviour is to extract the first page from each input document and write the result to the output PDF.

If you want to extract specific pages, use the -p option. For instance, to extract just the second page from each input document, you would run:

pdfpages -p 2 -o out.pdf in/*.pdf


Or to extract the second and third pages from each document:

pdfpages -p 2 3 -o out.pdf in/*.pdf


You can also use the -f and -c options to specify ranges of page numbers. For instance, to extract the first hundred pages from each document:

pdfpages -f 1 -c 100 -o out.pdf in/*.pdf


Or to extract the second hundred pages:

pdfpages -f 101 -c 100 -o out.pdf in/*.pdf


You can exclude specific pages from these ranges with the -e option. For example, to exclude the third page from the first five pages of each input document:

pdfpages -f 1 -c 5 -e 3 -o out.pdf in/*.pdf
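
To make the range arithmetic concrete, here is a minimal Python sketch of how the -f, -c and -e values in the examples above could map to a list of page numbers. It assumes -f is the 1-based first page, -c is the number of pages to take, and -e lists pages to drop. The function name resolve_pages is hypothetical and is not part of the package; the package's own logic may differ in details such as handling pages beyond the end of a document.

```python
def resolve_pages(first, count, exclude=()):
    """Return the 1-based page numbers in a first/count range, minus exclusions.

    Hypothetical helper for illustration only; not part of pdfpages.
    """
    excluded = set(exclude)
    return [page for page in range(first, first + count) if page not in excluded]


# Equivalent of: -f 1 -c 5 -e 3
print(resolve_pages(1, 5, exclude=[3]))   # [1, 2, 4, 5]

# Equivalent of: -f 101 -c 100 (the second hundred pages)
pages = resolve_pages(101, 100)
print(pages[0], pages[-1], len(pages))    # 101 200 100
```

A list built this way could be converted to a tuple and used as the pages argument when calling the package from Python.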


Finally, you can see the usage information at any time using the -h option:

python pdfpages.py -h


## How do I call it from Python code?

import pdfpages

pdfpages.extract(in_files, out_file, pages, exclude_pages)

- in_files: A tuple containing files opened for binary reading (mode "rb").
- out_file: A file opened for binary writing (mode "wb").
- pages: A tuple containing page numbers to extract (integers).
- exclude_pages: An optional tuple containing page numbers to exclude from extraction (integers). Defaults to an empty tuple.
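
Because extract takes open file objects rather than paths, the caller is responsible for opening and closing them. The sketch below shows one way to do that with contextlib.ExitStack, assuming the positional signature shown above. The wrapper name extract_from_paths and its extract parameter are hypothetical additions for illustration (the parameter only exists so the wrapper can be tried out without the package installed).

```python
from contextlib import ExitStack


def extract_from_paths(in_paths, out_path, pages, exclude_pages=(), extract=None):
    """Open the given paths in the documented modes and call pdfpages.extract.

    Hypothetical convenience wrapper; not part of pdfpages itself.
    """
    if extract is None:
        # Imported lazily so the wrapper can be exercised with a stand-in function.
        import pdfpages
        extract = pdfpages.extract

    with ExitStack() as stack:
        # Input files are opened for binary reading, as the documentation requires.
        in_files = tuple(stack.enter_context(open(path, "rb")) for path in in_paths)
        # The output file is opened for binary writing.
        out_file = stack.enter_context(open(out_path, "wb"))
        extract(in_files, out_file, tuple(pages), tuple(exclude_pages))
    # Every file is closed here, even if extract raised an exception.
```

With the package installed, a call such as `extract_from_paths(["in1.pdf", "in2.pdf"], "out.pdf", pages=(2, 3))` would correspond to the command-line example that extracts the second and third pages from each document.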

