automatically create bookmarks in a PDF file
pdf_scout
This CLI tool automatically generates PDF bookmarks (also known as an 'outline' or a 'table of contents') for computer-generated PDF documents.
You can install it for your user account via pip, run it on a PDF file, and uninstall it with the following commands:
pip install --user pdf_scout
pdf_scout ./my_document.pdf
pip uninstall pdf_scout
This project is a work in progress and will likely only generate suitable bookmarks for documents that conform to the following requirements:
- Single column of text (not multiple columns)
- Font size of header text > font size of body text
- Header text is justified or left-aligned
- Paragraph spacing for headers > body text paragraph spacing
- Consistent left margins on every page
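The requirements above suggest how header detection could work in principle. The following is a minimal sketch under stated assumptions, not pdf_scout's actual implementation: the `Line` structure, the `find_headers` function, and the `margin_tolerance` parameter are all hypothetical names introduced for illustration, and the input is assumed to be lines of text already extracted from the PDF along with their font size, left offset, and spacing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    """One extracted line of text (hypothetical structure, not pdf_scout's own)."""
    text: str
    font_size: float  # in points
    left_x: float     # distance from the left page edge, in points
    gap_above: float  # vertical space separating it from the previous line
    page: int


def find_headers(lines: list[Line], margin_tolerance: float = 1.0) -> list[tuple[int, Line]]:
    """Return (level, line) pairs for lines that look like headers."""
    if not lines:
        return []
    # Treat the most common font size, left offset and gap as the body-text values.
    body_size = Counter(l.font_size for l in lines).most_common(1)[0][0]
    body_left = Counter(l.left_x for l in lines).most_common(1)[0][0]
    body_gap = Counter(l.gap_above for l in lines).most_common(1)[0][0]

    candidates = [
        l for l in lines
        if l.font_size > body_size                         # larger font than body text
        and l.gap_above > body_gap                         # more spacing than body paragraphs
        and abs(l.left_x - body_left) <= margin_tolerance  # starts at the left margin
    ]
    # Larger fonts become shallower outline levels: the largest size is level 1.
    sizes = sorted({l.font_size for l in candidates}, reverse=True)
    return [(sizes.index(l.font_size) + 1, l) for l in candidates]


sample = [
    Line("Introduction", 16.0, 72.0, 24.0, 1),
    Line("This judgment concerns ...", 12.0, 72.0, 6.0, 1),
    Line("and continues here.", 12.0, 72.0, 6.0, 1),
    Line("Background facts", 14.0, 72.0, 18.0, 1),
    Line("The parties met in ...", 12.0, 72.0, 6.0, 2),
    Line("A centred caption", 14.0, 200.0, 18.0, 2),
]

for level, line in find_headers(sample):
    print("  " * (level - 1) + f"{line.text} (page {line.page})")
```

A sketch like this also shows why the requirements matter: in a multi-column layout or with inconsistent margins, the "most common left offset" no longer identifies the body text, and a centred line such as the caption in the sample is skipped entirely.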
Supported document types
pdf_scout has been tested on and expressly supports the following classes of documents:
- Singapore State Court and Supreme Court Judgments (unreported)
- Singapore Law Reports
It may support other types of documents as well. If a particular class of document isn't supported or does not work well, please open an issue and I will consider adding support for it.
Development
This project manages its dependencies using poetry and supports only Python ^3.9 (that is, 3.9 or later, below 4.0). After installing poetry and entering the project folder, run the following to install the dependencies:
poetry install
To open a virtualenv in the project folder with the dependencies, run:
poetry shell
To run the script directly without first opening the virtualenv shell, run:
poetry run python ./pdf_scout/app.py <INPUT_FILE_PATH>
Tests
There are snapshot tests. Input PDFs are not provided at the moment, so you will have to populate the /pdf folder manually from the relevant sources (you may want to consider using Clerkent to download the unreported versions of judgments). The first command below runs the tests; the second updates the stored snapshots:
poetry run pytest
poetry run pytest --snapshot-update
Static type-checking
poetry run mypy pdf_scout/app.py