automatically create bookmarks in a PDF file

pdf_scout

This CLI tool automatically generates PDF bookmarks (also known as an 'outline' or a 'table of contents') for computer-generated PDF documents.

You can install it for your user account via pip, run it on a PDF file, and uninstall it when you no longer need it:

pip install --user pdf_scout
pdf_scout ./my_document.pdf

pip uninstall pdf_scout

This project is a work in progress and will likely only generate suitable bookmarks for documents that conform to the following requirements:

  • Single column of text (not multiple columns)
  • Font size of header text > font size of body text
  • Header text is justified or left-aligned
  • Paragraph spacing for headers > body text paragraph spacing
  • Consistent left margins on every page
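
The requirements above suggest a simple heuristic. The following is only a minimal sketch of that idea, not pdf_scout's actual implementation: the `Line` type, its fields, and `guess_headers` are hypothetical names, and the sketch assumes the text lines and their layout metrics have already been extracted from the PDF by some other means.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    # Hypothetical record for one extracted line of text.
    text: str
    font_size: float  # in points
    left: float       # x-coordinate of the line's left edge
    gap_above: float  # vertical space between this line and the previous one


def guess_headers(lines: list[Line]) -> list[tuple[int, str]]:
    """Return (level, text) pairs for lines that look like headers."""
    if not lines:
        return []
    # Treat the most common font size, left margin, and gap as the "body" values.
    body_size = Counter(l.font_size for l in lines).most_common(1)[0][0]
    body_left = Counter(l.left for l in lines).most_common(1)[0][0]
    body_gap = Counter(l.gap_above for l in lines).most_common(1)[0][0]

    candidates = [
        l for l in lines
        if l.font_size > body_size            # header font is larger than body font
        and l.gap_above > body_gap            # header has more spacing than body text
        and abs(l.left - body_left) < 1.0     # header starts at the usual left margin
    ]
    # Larger fonts map to shallower levels: the largest size becomes level 1.
    sizes = sorted({l.font_size for l in candidates}, reverse=True)
    return [(sizes.index(l.font_size) + 1, l.text) for l in candidates]
```

A sketch like this also shows why the requirements matter: a centred title or a multi-column layout breaks the left-margin check, and headers set in the body font size are never picked up.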

Supported document types

pdf_scout has been tested on and expressly supports the following classes of documents:

It may support other types of documents as well. If a particular class of document isn't supported or does not work well, please open an issue and I will consider adding support for it.

Development

This project manages its dependencies using poetry and only supports Python ^3.9 (that is, version 3.9 or later within the 3.x series). After installing poetry and entering the project folder, run the following to install the dependencies:

poetry install

To open a virtualenv in the project folder with the dependencies, run:

poetry shell

To run the app directly on an input file, run:

poetry run python ./pdf_scout/app.py <INPUT_FILE_PATH>

Tests

The project uses snapshot tests. Input PDFs are not provided at the moment, so you will have to populate the /pdf folder manually from the relevant sources (you may want to consider using Clerkent to download the unreported versions of judgments). The first command below runs the tests, and the second updates the stored snapshots:

poetry run pytest
poetry run pytest --snapshot-update

Static type-checking

poetry run mypy pdf_scout/app.py

Tips

  • Processing a large PDF can take some time. To iterate faster when debugging a specific behaviour, extract the problematic pages of the PDF into a separate file.
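
One way to cut a few pages out of a large PDF is sketched below. It assumes the third-party pypdf library is installed separately (it is not stated to be a pdf_scout dependency), and `page_indices` and `extract_pages` are hypothetical helper names used only for illustration.

```python
def page_indices(first: int, last: int, total: int) -> range:
    """Convert a 1-based inclusive page range into 0-based indices, clamped to the document."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return range(first - 1, min(last, total))


def extract_pages(src: str, dst: str, first: int, last: int) -> None:
    """Copy pages first..last (1-based, inclusive) from src into a new PDF at dst."""
    # pypdf is imported here so that page_indices stays usable without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for i in page_indices(first, last, len(reader.pages)):
        writer.add_page(reader.pages[i])
    with open(dst, "wb") as fh:
        writer.write(fh)
```

For example, `extract_pages("./my_document.pdf", "./excerpt.pdf", 40, 45)` would write pages 40 to 45 to a new file, which can then be passed to pdf_scout on its own.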

