Add navigable bookmarks to a PDF based on its heading structure.

Maintainers

aatayev

bmrk

A simple CLI tool for adding structured bookmarks to PDFs.

bmrk analyses a PDF's text and font metadata to detect its heading structure, then writes a bookmarked copy for easier navigation in any PDF viewer.

Installation
- From source
- With OCR support
  - OCR in a dev environment
Usage
How it works
Code structure
Limitations
Development
Contributing
License

Installation

pip install bmrk

For an isolated install that keeps bmrk available globally without polluting your Python environment:

# pipx
pipx install bmrk

# uv
uv tool install bmrk

To run bmrk once without installing it:

# pipx
pipx run bmrk paper.pdf paper_bookmarked.pdf

# uvx (uv's ephemeral tool runner)
uvx bmrk paper.pdf paper_bookmarked.pdf

From source

pip install git+https://github.com/AnvarAtayev/bmrk.git

With OCR support

For scanned PDFs that lack a text layer, install the optional OCR extra:

pip install "bmrk[ocr]"
# or
pipx install "bmrk[ocr]"
# or
uv tool install "bmrk[ocr]"

This pulls in ocrmypdf, which itself requires Tesseract and Ghostscript to be installed on your system:

# macOS
brew install tesseract ghostscript

# Debian/Ubuntu
sudo apt install tesseract-ocr ghostscript

# Windows -- download installers from:
#   https://github.com/UB-Mannheim/tesseract/wiki
#   https://www.ghostscript.com/releases/gsdnld.html

Then pass --ocr to bmrk:

bmrk scanned.pdf scanned_bookmarked.pdf --ocr

OCR in a dev environment

# 1. Clone the repo and sync all extras
git clone https://github.com/AnvarAtayev/bmrk.git
cd bmrk
uv sync --extra dev --extra ocr

# 2. Install system deps (macOS example)
brew install tesseract ghostscript

# 3. Run
uv run bmrk scanned.pdf out.pdf --ocr

Usage

bmrk [OPTIONS] <INPUT>.pdf [<OUTPUT>.pdf]

Basic

bmrk paper.pdf paper_bookmarked.pdf

Options

Flag	Default	Description
`--threshold RATIO` / `-t`	`1.05`	Font-size ratio above which text is treated as a heading. Raise to `1.15` for noisy PDFs; lower to `1.01` to catch bold same-size section titles.
`--verbose` / `-v`	off	Print detected headings and progress info.
`--dry-run` / `-n`	off	Detect and print headings only; do not write an output file. Useful for tuning `--threshold`.
`--ocr`	off	Run OCR before detection. Requires `bmrk[ocr]`.
`--export-headings FILE`	--	Write detected heading structure to FILE (TSV). Edit and feed back in with `--import-headings`.
`--import-headings FILE`	--	Use headings from FILE instead of running detection. Enables manual adjustments.
`--cover-pages N`	`0`	Skip the first N pages when detecting headings (e.g. cover page).
`--max-depth N` / `-d`	`3`	Maximum heading depth to include (1 = chapters only, 2 = + sections, 3 = + subsections).

Inspect before writing

bmrk paper.pdf --dry-run --verbose

Manual heading adjustments

If the auto-detected bookmarks are not quite right, you can export the heading structure, edit it by hand, and import the corrected version back in.

Step 1 -- Export the detected headings

bmrk paper.pdf --export-headings headings.tsv

When OUTPUT is omitted, bmrk runs detection and exports the heading list without writing a PDF.

Step 2 -- Edit the TSV file

Open headings.tsv in any text editor or spreadsheet app. The format is tab-separated with three columns:

# bmrk heading export
# level	page	title
1	1	Introduction
2	3	Background
2	7	Methods
1	12	Results
3	14	Statistical Analysis

- level -- heading depth (1 = top-level chapter, 2 = section, 3 = subsection, ...).
- page -- 1-based page number where the heading appears.
- title -- the bookmark text shown in the PDF viewer.

Lines starting with # are comments and are ignored on import.
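
As a rough illustration of the format, the export can be read back with a few lines of Python. This is a sketch, not bmrk's own importer: the Heading class and parse_headings function are hypothetical names, and skipping blank lines is an assumption beyond the comment rule stated above.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    """Illustrative record only; bmrk's own HeadingEntry may look different."""
    level: int
    page: int
    title: str


def parse_headings(text: str) -> list[Heading]:
    """Parse the three-column TSV, ignoring '#' comment lines."""
    headings = []
    for line in text.splitlines():
        # Assumption: blank lines are skipped as well as comments.
        if not line.strip() or line.startswith("#"):
            continue
        # Split on the first two tabs only, so the title keeps any later tabs.
        level, page, title = line.split("\t", 2)
        headings.append(Heading(int(level), int(page), title.strip()))
    return headings


sample = (
    "# bmrk heading export\n"
    "# level\tpage\ttitle\n"
    "1\t1\tIntroduction\n"
    "2\t3\tBackground\n"
)
for heading in parse_headings(sample):
    print(heading.level, heading.page, heading.title)
```

A line with fewer than three columns or a non-numeric level raises ValueError here, which is one simple way to catch a malformed edit before importing.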

Common edits:

- Remove a heading -- delete the line entirely.
- Add a missing heading -- insert a new line with the correct level, page, and title.
- Fix a title -- change the text in the third column.
- Change nesting -- adjust the level number (e.g. change 2 to 1 to promote a section to a chapter).
- Reorder headings -- rearrange lines; bookmarks are inserted in the order they appear in the file.

Step 3 -- Import and produce the bookmarked PDF

bmrk paper.pdf paper_bookmarked.pdf --import-headings headings.tsv

This skips detection entirely and uses your edited headings to write the bookmarked PDF.

Tune for a noisy PDF

# More conservative -- only large headings
bmrk paper.pdf out.pdf --threshold 1.15

# More aggressive -- catches bold same-size section titles
bmrk paper.pdf out.pdf --threshold 1.01

Handle a cover page

# Skip page 1 (the cover) when detecting headings
bmrk report.pdf report_bookmarked.pdf --cover-pages 1

How it works

bmrk reads every text span in the PDF along with its font size and style, then uses three signals to find headings:

- Font size -- text whose font size exceeds the body font size by more than the --threshold ratio (default 1.05) is treated as a heading. The largest size becomes H1, the next largest H2, and so on.
- Numbered prefixes -- lines such as "1 Introduction" or "2.3 Methods" are treated as headings, with depth inferred from the numbering.
- Bold/italic at body size -- some documents style section headings in bold or italic without changing the font size. These are picked up as the lowest heading level.
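
The three signals above could be combined roughly as in the sketch below. It is not taken from bmrk's detector.py: the function names are hypothetical, the order in which the signals are consulted is an assumption, and the regular expression is deliberately naive (a body line such as "2024 was a good year" would also match).

```python
import re
from typing import Optional

# A leading section number such as "1", "2.3" or "4.1.2", followed by text.
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")


def numbered_depth(text: str) -> Optional[int]:
    """Return the depth implied by a numbered prefix, or None if there is none."""
    match = NUMBERED.match(text)
    return match.group(1).count(".") + 1 if match else None


def size_levels(sizes, body_size, threshold=1.05):
    """Map each font size above body_size * threshold to a level (largest = 1)."""
    heading_sizes = sorted({s for s in sizes if s / body_size > threshold}, reverse=True)
    return {size: level for level, size in enumerate(heading_sizes, start=1)}


def heading_level(text, size, is_bold, levels):
    """Combine the three signals; the precedence used here is an assumption."""
    depth = numbered_depth(text)
    if depth is not None:
        return depth                # numbered prefix decides the depth
    if size in levels:
        return levels[size]         # larger-than-body font size
    if is_bold:
        return len(levels) + 1      # bold at body size: lowest level
    return None                     # ordinary body text


levels = size_levels([18.0, 14.0, 10.0, 10.2], body_size=10.0)
print(levels)
print(heading_level("2.3 Methods", 10.0, False, levels))
```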

After detection, bmrk cleans up the results (removes running page headers, deduplicates, merges chapter labels like Chapter 1 with the title that follows) and writes the final bookmark outline into the output PDF.
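
A minimal sketch of part of that clean-up pass is shown below, assuming headings are plain (level, page, title) tuples in document order. The clean_up function, the "Chapter N" pattern, and the ": " separator used when merging are all illustrative assumptions, and removal of running page headers is left out.

```python
import re

# Assumption: a chapter label is a bare "Chapter <number>" line.
CHAPTER_LABEL = re.compile(r"^Chapter\s+\d+$", re.IGNORECASE)


def clean_up(headings, max_depth=3):
    """Merge chapter labels, drop adjacent duplicates, then filter by depth."""
    merged = []
    i = 0
    while i < len(headings):
        level, page, title = headings[i]
        # Merge a label such as "Chapter 1" with the title that follows it.
        if CHAPTER_LABEL.match(title) and i + 1 < len(headings):
            next_title = headings[i + 1][2]
            merged.append((level, page, f"{title}: {next_title}"))
            i += 2
            continue
        merged.append((level, page, title))
        i += 1

    cleaned = []
    for entry in merged:
        # Skip a heading whose title repeats the one kept just before it.
        if cleaned and cleaned[-1][2] == entry[2]:
            continue
        # Keep only headings within the requested depth.
        if entry[0] <= max_depth:
            cleaned.append(entry)
    return cleaned


detected = [
    (1, 1, "Chapter 1"),
    (1, 1, "Introduction"),
    (2, 3, "Background"),
    (2, 3, "Background"),
    (4, 5, "Deep detail"),
]
print(clean_up(detected))
```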

flowchart LR
    A[PDF] --> B[Extract spans]
    B --> C[Pre-process]
    C --> D[Detect headings]
    D --> E[Clean up]
    E --> F[Write bookmarks]

    C -.- C1["Skip cover/TOC pages
    Exclude headers/footers
    Estimate body font size"]
    D -.- D1["1. Font size > body size
    2. Numbered prefixes
    3. Bold/italic at body size"]
    E -.- E1["Remove running headers
    Deduplicate adjacent titles
    Merge chapter labels
    Filter by max depth"]
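
The "Estimate body font size" step in the pre-processing stage can be pictured with the sketch below. How bmrk actually estimates the body size is not stated here; this version assumes spans are (page, font_size, text) tuples and picks the size that covers the most characters, after skipping the first cover_pages pages. The function name is hypothetical.

```python
from collections import Counter


def estimate_body_size(spans, cover_pages=0):
    """Return the font size that accounts for the most characters.

    spans: iterable of (page, font_size, text) with 1-based page numbers.
    """
    weights = Counter()
    for page, size, text in spans:
        if page <= cover_pages:
            continue  # ignore cover pages, in the spirit of --cover-pages N
        # Weight by character count so long paragraphs outvote short headings.
        weights[round(size, 1)] += len(text)
    if not weights:
        raise ValueError("no text spans found")
    return weights.most_common(1)[0][0]


spans = [
    (1, 24.0, "A Large Cover Title"),
    (2, 18.0, "Introduction"),
    (2, 10.0, "Body text " * 20),
    (3, 10.0, "More body text"),
]
print(estimate_body_size(spans, cover_pages=1))
```

Weighting by character count rather than span count is a design choice: a page with many short headings would otherwise skew a simple frequency count.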

Code structure

src/bmrk/
├── cli.py        # Typer CLI entry point
├── detector.py   # Heading detection logic and HeadingEntry dataclass
├── bookmarker.py # PDF bookmark writing

Limitations

- Scanned/image PDFs -- bmrk cannot detect headings in PDFs without selectable text. Run OCR first with bmrk --ocr (requires bmrk[ocr]).
- Existing bookmarks -- bmrk replaces any existing outline; it does not merge with pre-existing bookmarks.

Development

uv sync --extra dev

# Lint
uv run ruff check src/

# Test
uv run pytest

Contributing

Contributions are welcome. Bug reports and feature requests can be filed via GitHub Issues; code changes should be submitted as a pull request against main.

Before opening a pull request, run the linter and the test suite to confirm nothing is broken:

uv sync --extra dev
uv run ruff check src/
uv run pytest

License

MIT

Project details

Release history

0.1.1 -- Mar 1, 2026

0.1.0 -- Mar 1, 2026 (this version)

Download files

Source Distribution

bmrk-0.1.0.tar.gz (31.0 kB)

Uploaded Mar 1, 2026 (Source)

Built Distribution

bmrk-0.1.0-py3-none-any.whl (19.6 kB)

Uploaded Mar 1, 2026 (Python 3)

File details

Details for the file bmrk-0.1.0.tar.gz.

File metadata

Download URL: bmrk-0.1.0.tar.gz
Upload date: Mar 1, 2026
Size: 31.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bmrk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`64816dc71a70c38a2885e3c6f80a1e4179cd26f78a088fc1d8d02d3104bd3b0a`
MD5	`abec117ac54cf715a4f503c4d51f8931`
BLAKE2b-256	`04d12fee7fc0be4e0faadd2b8ab60a7ef9bf2a86184836976e9b06638aa6a09d`

Provenance

The following attestation bundles were made for bmrk-0.1.0.tar.gz:

Publisher: release.yml on AnvarAtayev/bmrk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bmrk-0.1.0.tar.gz
- Subject digest: 64816dc71a70c38a2885e3c6f80a1e4179cd26f78a088fc1d8d02d3104bd3b0a
- Sigstore transparency entry: 1006317765
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: AnvarAtayev/bmrk@47ee3cd8b8f0eddd0bfd0db935a4e8dce24be5d6
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/AnvarAtayev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@47ee3cd8b8f0eddd0bfd0db935a4e8dce24be5d6
- Trigger Event: push

File details

Details for the file bmrk-0.1.0-py3-none-any.whl.

File metadata

Download URL: bmrk-0.1.0-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 19.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bmrk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7c8e21b952ac7c6b793e916578e07e56b3b2e0612432e0968212e9018bb2d16d`
MD5	`5e82b2e2026672624e00bb5afae1d947`
BLAKE2b-256	`05e1109b43b721c0773da3f233dc3570be0b3e91dd8192baafda47e778e5ead6`

Provenance

The following attestation bundles were made for bmrk-0.1.0-py3-none-any.whl:

Publisher: release.yml on AnvarAtayev/bmrk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bmrk-0.1.0-py3-none-any.whl
- Subject digest: 7c8e21b952ac7c6b793e916578e07e56b3b2e0612432e0968212e9018bb2d16d
- Sigstore transparency entry: 1006317766
- Sigstore integration time: Mar 1, 2026
Source repository:
- Permalink: AnvarAtayev/bmrk@47ee3cd8b8f0eddd0bfd0db935a4e8dce24be5d6
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/AnvarAtayev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@47ee3cd8b8f0eddd0bfd0db935a4e8dce24be5d6
- Trigger Event: push
