pdfoutliner

Command line tool that generates pdftk-style bookmark files in a user-friendly way and (optionally) outputs a PDF file with the specified outline.

Why

pdftk requires a TOC file like this:

BookmarkBegin
BookmarkTitle: PDF Reference (Version 1.5)
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: Contents
BookmarkLevel: 2
BookmarkPageNumber: 3

With this script, you only need a TOC file that looks like this to create a PDF file with a structured (nested) outline:

PDF Reference (Version 1.5) 1
  Contents 3

or perhaps better, this:

1 PDF Reference (Version 1.5) 1
1.1 Contents 3
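To make the relationship between the two formats concrete, here is a minimal Python sketch that renders outline entries as pdftk bookmark blocks. The `Entry` type and `to_pdftk_bookmarks` function are hypothetical names chosen for illustration; they are not taken from pdfoutliner's source, and the real implementation may differ.

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """One outline entry (hypothetical helper type, not pdfoutliner's API)."""
    level: int  # 1 = top level, 2 = one step nested, and so on
    title: str
    page: int   # page number to jump to


def to_pdftk_bookmarks(entries: Iterable[Entry]) -> str:
    """Render entries in the four-line pdftk bookmark block format."""
    out = []
    for entry in entries:
        out.append("BookmarkBegin\n")
        out.append(f"BookmarkTitle: {entry.title}\n")
        out.append(f"BookmarkLevel: {entry.level}\n")
        out.append(f"BookmarkPageNumber: {entry.page}\n")
    return "".join(out)


# The two-entry example from this section.
toc = [
    Entry(1, "PDF Reference (Version 1.5)", 1),
    Entry(2, "Contents", 3),
]
print(to_pdftk_bookmarks(toc), end="")
```

Running it prints the same eight-line bookmark listing that pdftk expects for this example.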

Installation

pip3 install pdfoutliner

Sample Usage

With PDF I/O:

pdfoutliner TOC --inpdf in.pdf -s START

where

  • START is the page of the PDF file on which the document's page 1 falls, and
  • TOC is the path to a table of contents file.

See section TOC Format for details on the syntax.
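As a rough illustration of what START means, the sketch below maps a page number printed in the TOC to a page of the PDF file. The function name `pdf_page` is hypothetical, and the arithmetic is an assumption based on the description of START above, not a quote from pdfoutliner's code.

```python
def pdf_page(printed_page: int, start: int = 1) -> int:
    """Map a printed page number to a 1-based page of the PDF file.

    Assumes printed page 1 sits on PDF page `start`, so every printed
    page is shifted by `start - 1`.
    """
    return printed_page + start - 1


# If the front matter takes 14 pages, printed page 1 is PDF page 15,
# and a TOC entry pointing at printed page 3 lands on PDF page 17.
print(pdf_page(3, start=15))
```

With the default START of 1, printed and PDF page numbers coincide.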

Writing a pdftk bookmark file only:

pdfoutliner TOC
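A bookmark file written this way can be applied to a PDF later with pdftk's update_info operation. The Python sketch below only builds and runs that pdftk command; the function names and the file names in the comment are placeholders, and it assumes pdftk is on your PATH.

```python
import subprocess
from typing import List


def pdftk_update_command(in_pdf: str, bookmarks: str, out_pdf: str) -> List[str]:
    """Build the pdftk command that merges a bookmark file into a PDF."""
    return ["pdftk", in_pdf, "update_info", bookmarks, "output", out_pdf]


def apply_bookmarks(in_pdf: str, bookmarks: str, out_pdf: str) -> None:
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(pdftk_update_command(in_pdf, bookmarks, out_pdf), check=True)


# Example call (placeholder file names):
# apply_bookmarks("in.pdf", "bookmarks.txt", "in_outlined.pdf")
```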

For more options, see section Additional Options, or use

pdfoutliner -h

TOC Format

The default table of contents format is

1 Heading 1
1.2 Subheading 3
1.2.3 Subsubheading 5

Each line has a numbering (not necessarily numerical), a title, and a page number, separated by space characters.

The script will infer that "1 Heading" is level 1, "1.2 Subheading" is level 2, and so on.
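The three fields of a line can be pulled apart as sketched below. This is an illustrative parser, not pdfoutliner's own: `TocLine` and `parse_toc_line` are hypothetical names, and the sketch assumes the first space-separated token is the numbering, the last one is an integer page number, and everything in between is the title.

```python
from typing import NamedTuple


class TocLine(NamedTuple):
    """Fields of one TOC line (hypothetical helper type)."""
    numbering: str
    title: str
    page: int


def parse_toc_line(line: str) -> TocLine:
    """Split 'NUMBERING TITLE PAGE' on whitespace.

    Runs of spaces inside the title are collapsed to single spaces.
    """
    parts = line.split()
    if len(parts) < 3:
        raise ValueError(f"expected 'NUMBERING TITLE PAGE', got {line!r}")
    numbering, *title_words, page = parts
    return TocLine(numbering, " ".join(title_words), int(page))


print(parse_toc_line("1 PDF Reference (Version 1.5) 1"))
```

Because the page is converted with int(), a non-numeric page such as a roman numeral raises ValueError in this sketch.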

Alternatively, you can specify the structure by indentation, or keep the PDF flat.

Specifying structure by subheading numbering

This is the default option. As mentioned, the format is

1 Heading 1
1.1 Subheading 3
1.1.1 Subsubheading 5

The script infers the structure from the numbering.

If your TOC file looks like

1. Heading 1
1.1. Subheading 3
1.1.1. Subsubheading 5

i.e., has a trailing dot after each numbering, specify the heading style with --style 1.2. (the trailing dot is part of the option value).
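One plausible way to infer the level from a numbering in either style is to count its dot-separated parts. The sketch below does that; `heading_level` is a hypothetical name, and the counting rule is an assumption consistent with the examples above rather than pdfoutliner's documented algorithm.

```python
def heading_level(numbering: str, style: str = "1.2") -> int:
    """Return the outline level implied by a numbering such as '1.1.1'.

    style is '1.2' (no trailing dot) or '1.2.' (trailing dot), mirroring
    the two values accepted by --style.
    """
    if style not in ("1.2", "1.2."):
        raise ValueError(f"unknown style: {style!r}")
    if style == "1.2.":
        if not numbering.endswith("."):
            raise ValueError(f"expected a trailing dot in {numbering!r}")
        numbering = numbering[:-1]  # drop the trailing dot before counting
    # 'A' has one part (level 1), 'A.1' has two (level 2), and so on.
    return numbering.count(".") + 1


print(heading_level("1.1.1"))           # no trailing dot
print(heading_level("1.1.1.", "1.2."))  # trailing dot style
```

Both calls report level 3. Since only dots are counted, the parts do not have to be numeric.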

Specifying structure by indentation

You can also specify the structure of the outline by indentation with -d/--indentation, followed by an escaped regex for 1 unit of indentation.

For example, suppose your TOC looks like

Heading 1
  Subheading 3
    Subsubheading 5

Here the unit of indentation is 2 spaces, so use

pdfoutliner TOC -d \\s\\s

The script will then infer the structure from the indentation of each line.
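In a POSIX-style shell, the unquoted argument \\s\\s reaches the program as the regex \s\s. The sketch below shows one way such a unit regex could be turned into a level by counting how many times it matches back to back at the start of a line. `indentation_level` is a hypothetical name and this is an illustration under that assumption, not pdfoutliner's actual code.

```python
import re


def indentation_level(line: str, unit: str) -> int:
    """Return 1 + the number of consecutive `unit` matches at line start."""
    unit_re = re.compile(unit)
    pos = 0
    depth = 0
    while True:
        match = unit_re.match(line, pos)
        # Stop when the unit no longer matches, or matches nothing
        # (an empty match would otherwise loop forever).
        if match is None or match.end() == pos:
            break
        pos = match.end()
        depth += 1
    return depth + 1


for toc_line in ["Heading 1", "  Subheading 3", "    Subsubheading 5"]:
    print(indentation_level(toc_line, r"\s\s"), toc_line.strip())
```

For the three lines of the example TOC this yields levels 1, 2 and 3.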

Keeping PDF flat

Use -k/--keepflat and the script will ignore any numbering or indentation. The output PDF will have a flat, unstructured outline. For example, with a TOC like the following, all three entries end up at the same level:

Heading 1
Subheading 3
Subsubheading 5

Additional Options

usage: pdfoutliner [-h] [-o OUTMARKS] [-d INDENTATION] [-k]
                      [--style {1.2,1.2.}] [--outpdf OUTPDF] [--inpdf INPDF]
                      [-s START]
                      toc

optional arguments:
  -h, --help            show this help message and exit

bookmark I/O:
  toc                   path to TOC file
  -o OUTMARKS, --outmarks OUTMARKS
                        name for pdftk bookmarks file. default is original toc
                        name + "_outlined"

bookmark structure:
  if both -d and -k are specified, -d will take precedence over -k

  -d INDENTATION, --indentation INDENTATION
                        escaped regex for 1 unit of indentation
  -k, --keepflat        keep outline flat
  --style {1.2,1.2.}    heading style. with or without a trailing dot. default
                        "1.2", i.e., no trailing dot

PDF I/O:
  --outpdf OUTPDF       path to output PDF file. default is input pdf name +
                        "_outlined.pdf" in input PDF's directory
  --inpdf INPDF         path to input PDF file
  -s START, --start START
                        page in the pdf document where page 1 is. default 1
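The precedence rule in the help text (-d wins over -k, and numbering is the default) can be summarized in a few lines of Python. `structure_mode` and its return values are hypothetical names used only to illustrate the rule.

```python
from typing import Optional


def structure_mode(indentation: Optional[str], keepflat: bool) -> str:
    """Pick how the outline structure is determined.

    indentation: the regex given to -d, or None if -d was not passed.
    keepflat:    True if -k was passed.
    """
    if indentation is not None:
        return "indentation"  # -d takes precedence over -k
    if keepflat:
        return "flat"
    return "numbering"        # default: infer levels from the numbering


print(structure_mode(r"\s\s", keepflat=True))
```

The example passes both options and prints "indentation", matching the note under "bookmark structure".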

Dependencies

  • pdftk
    • on macOS 10.11+, use the build here
  • (optional) Tabula
    • for extracting a usable TOC from PDF files (along with some additional regex golfing)
