automatically create bookmarks in a PDF file

pdf_scout

This CLI tool automatically generates PDF bookmarks (also known as an 'outline' or a 'table of contents') for computer-generated PDF documents.

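You can install it via pip (the --user flag installs it for the current user rather than system-wide), run it on a PDF, and uninstall it later: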

pip install --user pdf_scout
pdf_scout ./my_document.pdf

pip uninstall pdf_scout

This project is a work in progress and will likely generate suitable bookmarks only for documents that meet the following requirements:

  • Single column of text (not multiple columns)
  • Font size of header text > font size of body text
  • Header text is justified or left-aligned
  • Paragraph spacing for headers > body text paragraph spacing
  • Consistent left margins on every page
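To make these requirements concrete, here is a minimal sketch of how header lines could be told apart from body lines when a document satisfies them. It is an illustration only and is not pdf_scout's actual implementation: the Line record, the guess_headers function, the "most common value is the body value" rule, and the 1-point margin tolerance are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical record for one line of text extracted from a PDF."""
    text: str
    font_size: float  # in points
    left: float       # x-coordinate of the line's left edge
    gap_above: float  # vertical space between this line and the previous one


def most_common(values):
    """Return the value that occurs most often."""
    return Counter(values).most_common(1)[0][0]


def guess_headers(lines, margin_tolerance=1.0):
    """Return the lines that look like headers.

    Assumes body text dominates the document, so the most common font
    size, left margin and line gap are taken to be the body values.
    """
    if not lines:
        return []
    body_size = most_common(l.font_size for l in lines)
    body_left = most_common(l.left for l in lines)
    body_gap = most_common(l.gap_above for l in lines)
    return [
        l for l in lines
        if l.font_size > body_size                         # larger font than body
        and l.gap_above > body_gap                         # more spacing than body
        and abs(l.left - body_left) <= margin_tolerance    # same left margin
    ]


sample = [
    Line("1. Introduction", 14.0, 72.0, 24.0),
    Line("Body text a", 10.0, 72.0, 12.0),
    Line("Body text b", 10.0, 72.0, 12.0),
    Line("Body text c", 10.0, 72.0, 12.0),
    Line("2. Background", 14.0, 72.0, 24.0),
    Line("Body text d", 10.0, 72.0, 12.0),
]

for header in guess_headers(sample):
    print(header.text)
```

A document with multiple columns or inconsistent left margins would break the single "body left margin" assumption in this sketch, which is one way to see why such documents are listed as unsuitable.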

Supported document types

pdf_scout has been tested on and expressly supports the following classes of documents:

It may support other types of documents as well. If a particular class of document isn't supported or does not work well, please open an issue and I will consider adding support for it.

Development

This project manages its dependencies with poetry and supports only Python ^3.9 (in poetry's caret notation: 3.9 or later, below 4.0). After installing poetry and entering the project folder, run the following to install the dependencies:

poetry install

To open a shell inside the project's virtualenv, with the dependencies available, run:

poetry shell

To run a script directly, run:

poetry run python ./pdf_scout/app.py <INPUT_FILE_PATH>

Tests

The project has snapshot tests. Input PDFs are not provided at the moment, so you will have to populate the /pdf folder manually from the relevant sources (consider using Clerkent to download the unreported versions of judgments). The first command below runs the tests; the second updates the stored snapshots:

poetry run pytest
poetry run pytest --snapshot-update

Static type-checking

poetry run mypy pdf_scout/app.py

Tips

  • Processing a large PDF can take some time. To iterate faster when debugging a particular behaviour, extract the problematic part of the PDF into a separate file and run pdf_scout on that file instead.
