Extract pages from PDF documents

Project description

Extract specific pages from PDF documents.

What is it?

A Python package that extracts pages from PDF documents and writes them to a fresh PDF.

How do I install it?

Via pip:

pip install pdfpages

Or if you just want the git repo:

git clone git@github.com:philbooth/pdfpages.git

How do I run it on the command line?

pdfpages -o out.pdf in.pdf

The -o option is used to specify the output path and the final argument is the path to the input document. You can specify multiple input documents by listing further paths at the end of the command:

pdfpages -o out.pdf in1.pdf in2.pdf
pdfpages -o out.pdf in/*.pdf

Without other arguments, the default behaviour is to extract the first page from each input document and write the result to the output PDF.

If you want to extract specific pages, you can use the -p option. For instance, to extract just the second page from each input document you would run:

pdfpages -p 2 -o out.pdf in/*.pdf

Or to extract the second and third pages from each document:

pdfpages -p 2 3 -o out.pdf in/*.pdf

You can also use the -f and -c options to specify ranges of page numbers. For instance, to extract the first hundred pages from each document:

pdfpages -f 1 -c 100 -o out.pdf in/*.pdf

Or to extract the second hundred pages:

pdfpages -f 101 -c 100 -o out.pdf in/*.pdf

You can exclude specific pages from these ranges with the -e option. For example, to exclude the third page from the first five pages of each input document:

pdfpages -f 1 -c 5 -e 3 -o out.pdf in/*.pdf
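The range options can be read as a small computation over page numbers. The sketch below is an illustration of the semantics implied by the examples above, not pdfpages' actual implementation. The function name page_numbers and its parameters are hypothetical, and it assumes -f is the first page, -c is a count of consecutive pages, and -e removes pages from the result.

```python
def page_numbers(first, count, exclude=()):
    """Return the page numbers selected by a first page, a count and exclusions.

    Hypothetical helper: mirrors how the -f, -c and -e examples read,
    with 1-based page numbers as used on the command line.
    """
    excluded = set(exclude)
    # range() stops before first + count, so exactly `count` pages are considered.
    return tuple(
        page for page in range(first, first + count) if page not in excluded
    )


if __name__ == "__main__":
    # Equivalent of: -f 1 -c 5 -e 3
    print(page_numbers(1, 5, exclude=(3,)))
```

Under these assumptions, -f 101 -c 100 covers pages 101 to 200 inclusive, which matches the "second hundred pages" example.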

Finally, you can see the usage information at any time using the -h option:

python pdfpages.py -h

How do I call it from Python code?

import pdfpages

pdfpages.extract(in_files, out_file, pages, exclude_pages)
  • in_files: A tuple containing files opened for binary reading (mode "rb").
  • out_file: A file opened for binary writing (mode "wb").
  • pages: A tuple containing page numbers to extract (integers).
  • exclude_pages: An optional tuple containing page numbers to exclude from extraction (integers). Defaults to an empty tuple.
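As a minimal sketch of how the parameters above fit together, the wrapper below opens the files in the required modes and passes them to pdfpages.extract. The wrapper name extract_pages and its extract argument are hypothetical and not part of the package. The extract argument exists only so the wrapper can be tried out with a stand-in function when pdfpages is not installed.

```python
from contextlib import ExitStack


def extract_pages(in_paths, out_path, pages, exclude_pages=(), extract=None):
    """Open the given paths and hand the file objects to pdfpages.extract.

    Hypothetical convenience wrapper, not part of pdfpages itself.
    """
    if extract is None:
        # Imported lazily so the wrapper can be exercised without the package.
        import pdfpages
        extract = pdfpages.extract

    # ExitStack closes every opened file, even if extraction raises.
    with ExitStack() as stack:
        # Input files are opened for binary reading (mode "rb").
        in_files = tuple(stack.enter_context(open(p, "rb")) for p in in_paths)
        # The output file is opened for binary writing (mode "wb").
        out_file = stack.enter_context(open(out_path, "wb"))
        # Arguments are passed in the order shown in the signature above.
        extract(in_files, out_file, tuple(pages), tuple(exclude_pages))
```

A call such as extract_pages(["in1.pdf", "in2.pdf"], "out.pdf", pages=(2, 3)) would then correspond to pdfpages -p 2 3 -o out.pdf in1.pdf in2.pdf, assuming the Python API numbers pages the same way as the command line.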

What license is it released under?

MIT

Project details

Version 0.1.0. Source distribution: pdfpages-0.1.0.tar.gz (3.2 kB), uploaded Sep 10, 2017.