Extract pages from PDF documents

Extract specific pages from PDF documents.

What is it?

A Python package that extracts pages from PDF documents and writes them to a fresh PDF.

How do I install it?

Via pip:

pip install pdfpages

Or if you just want the git repo:

git clone git@github.com:philbooth/pdfpages.git

How do I run it on the command line?

pdfpages -o out.pdf in.pdf

The -o option is used to specify the output path and the final argument is the path to the input document. You can specify multiple input documents by listing further paths at the end of the command:

pdfpages -o out.pdf in1.pdf in2.pdf
pdfpages -o out.pdf in/*.pdf
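
When the package is driven from Python rather than from a shell, nothing expands a pattern such as in/*.pdf for you. The following is a minimal sketch of collecting the same paths with the standard library's glob module. The helper name collect_pdfs is hypothetical and not part of pdfpages. The explicit sort is there because glob.glob does not promise any particular order, whereas a shell typically expands patterns in sorted order.

```python
import glob
import os


def collect_pdfs(directory):
    """Return the paths of all .pdf files in `directory`, sorted by name.

    Hypothetical helper, not part of pdfpages.
    """
    # Escape the directory so that special characters in it are taken literally.
    pattern = os.path.join(glob.escape(directory), "*.pdf")
    # glob.glob makes no ordering promise, so sort for a reproducible result.
    return sorted(glob.glob(pattern))
```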

Without other arguments, the default behaviour is to extract the first page from each input document and write the result to the output PDF.

If you want to extract specific pages, you can use the -p option. For instance, to extract just the second page from each input document you would run:

pdfpages -p 2 -o out.pdf in/*.pdf

Or to extract the second and third pages from each document:

pdfpages -p 2 3 -o out.pdf in/*.pdf

You can also use the -f and -c options to specify ranges of page numbers. For instance, to extract the first hundred pages from each document:

pdfpages -f 1 -c 100 -o out.pdf in/*.pdf

Or to extract the second hundred pages:

pdfpages -f 101 -c 100 -o out.pdf in/*.pdf

You can exclude specific pages from these ranges with the -e option. For example, to exclude the third page from the first five pages of each input document:

pdfpages -f 1 -c 5 -e 3 -o out.pdf in/*.pdf
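
The range and exclusion options above amount to simple arithmetic on page numbers. The sketch below builds the same selection in Python, assuming page numbers are 1-based and that -f names the first page and -c the number of pages, as the examples suggest. The function expand_range is a hypothetical helper, not part of pdfpages, and it is not meant to mirror how the tool computes the selection internally.

```python
def expand_range(first, count, exclude=()):
    """Return `count` consecutive page numbers starting at `first`, minus exclusions."""
    excluded = set(exclude)
    return tuple(
        page
        for page in range(first, first + count)
        if page not in excluded
    )


# Equivalent of "-f 1 -c 5 -e 3": pages 1 to 5 without page 3.
selection = expand_range(1, 5, exclude=(3,))
print(selection)  # (1, 2, 4, 5)
```

Under the same assumptions, expand_range(101, 100) yields pages 101 to 200, matching the "second hundred pages" example.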

Finally, you can see the usage information at any time using the -h option:

python pdfpages.py -h

How do I call it from Python code?

import pdfpages

pdfpages.extract(in_files, out_file, pages, exclude_pages)
  • in_files: A tuple containing files opened for binary reading (mode "rb").

  • out_file: A file opened for binary writing (mode "wb").

  • pages: A tuple containing page numbers to extract (integers).

  • exclude_pages: An optional tuple containing page numbers to exclude from extraction (integers). Defaults to an empty tuple.
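
A minimal sketch of calling extract with the arguments described above. Only pdfpages.extract and its four parameters come from this documentation. The wrapper extract_to_path is hypothetical, and it assumes page numbers are 1-based in the same way as on the command line, which has not been verified here. ExitStack from the standard library ensures every opened file is closed even if extraction fails.

```python
from contextlib import ExitStack


def extract_to_path(in_paths, out_path, pages, exclude_pages=()):
    """Open the given paths and pass the file objects to pdfpages.extract.

    Hypothetical convenience wrapper, not part of pdfpages.
    """
    # Deferred so this module can be imported even where pdfpages is absent.
    import pdfpages

    with ExitStack() as stack:
        # in_files must be a tuple of files opened in "rb" mode.
        in_files = tuple(
            stack.enter_context(open(path, "rb")) for path in in_paths
        )
        # out_file must be a file opened in "wb" mode.
        out_file = stack.enter_context(open(out_path, "wb"))
        pdfpages.extract(in_files, out_file, tuple(pages), tuple(exclude_pages))
```

Under the assumption above, extract_to_path(["in1.pdf", "in2.pdf"], "out.pdf", pages=(2, 3)) would correspond to running pdfpages -p 2 3 -o out.pdf in1.pdf in2.pdf.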

What license is it released under?

MIT

Download files

Source Distribution

pdfpages-0.1.0.tar.gz (3.2 kB)

File hashes

Hashes for pdfpages-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       efe90a81891d67f10ea0d0dc8dab328422c0b8231792a1c049c71ec439e6f053
MD5          4c1232b83a519a5bfeb6a3be66e28bd3
BLAKE2b-256  79ee44ae1743687fa3d92bd5990c4f130743cdce865b332252bcae4892bf8076
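
One way to check a downloaded archive against the SHA256 digest listed above is sketched below using the standard library's hashlib. The function names and the local file path are hypothetical; only the digest value is taken from the table.

```python
import hashlib

# SHA256 digest listed above for pdfpages-0.1.0.tar.gz.
EXPECTED_SHA256 = "efe90a81891d67f10ea0d0dc8dab328422c0b8231792a1c049c71ec439e6f053"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected
```

For example, matches_expected("pdfpages-0.1.0.tar.gz") should return True for an intact download saved under that (assumed) local name.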
