Convert Notion HTML exports to PDF via a local LaTeX pipeline

Notion HTML → PDF (LaTeX pipeline)

Convert a Notion HTML export into a printable PDF with correct heading hierarchy, math, tables, images, and a clickable table of contents.

Designed for large course notes exported from Notion with KaTeX formulas, nested toggles, and simple-table blocks.

Quick start

Requirements

Tool	Purpose
Python 3.10+	CLI and HTML/LaTeX processing
Pandoc 3.x	HTML → LaTeX
pdflatex (TeX Live or MacTeX)	PDF build

All processing runs on your machine — nothing is uploaded.

Install (CLI)

From a clone of this repo:

cd /path/to/Notion2Tex
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e .
notion2tex --check          # verify pandoc + pdflatex

From PyPI (when published):

pip install notion2tex

Install Pandoc: https://pandoc.org/installing.html

Install TeX (includes pdflatex): https://www.tug.org/texlive/ (or MacTeX on macOS). A minimal TeX Live install is enough; if compilation fails on a missing .sty file, run tlmgr install <package> (e.g. tlmgr install soul ulem float).
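The internals of notion2tex --check are not shown in this README. As a minimal sketch, assuming the check only needs to confirm that both external tools are on PATH, the same test can be done with the standard library. The function name find_tools is hypothetical and not part of the package.

```python
import shutil

# The two external programs listed under Requirements.
REQUIRED_TOOLS = ("pandoc", "pdflatex")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A None entry means the tool has to be installed (or added to PATH) before a conversion can succeed.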

Convert

Pass the .zip file you get when exporting from Notion (HTML format). The ZIP contains the page .html and an asset folder with the same name:

notion2tex "/path/to/Export.zip"

The ZIP is extracted to a folder with the same name (e.g. Export.zip → Export/), then the pipeline runs on the main page inside it.
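The extraction step can be pictured roughly as follows. This is a sketch built on the standard zipfile module, not the code in zip_export.py. The function name extract_export and the rule "the main page is the shallowest .html file" are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path, extract_dir=None):
    """Extract a Notion export ZIP and return (folder, .html files found)."""
    zip_path = Path(zip_path)
    # Default target: Export.zip -> Export/ next to the archive.
    target = Path(extract_dir) if extract_dir else zip_path.with_suffix("")
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Assumption: pages closer to the archive root come first, so the
    # first entry is treated as the main page.
    pages = sorted(
        target.rglob("*.html"),
        key=lambda p: (len(p.relative_to(target).parts), p.name),
    )
    return target, pages
```

Passing extract_dir mirrors the --extract-dir option; the archive contents keep their relative layout either way, so image paths stay valid.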

You can also pass a single .html file, provided it already sits next to its asset folder:

notion2tex "/path/to/export/Page Name.html"

Or use the wrapper script (after pip install -e .):

chmod +x n2t.sh   # once
./n2t.sh Export.zip

Output (for a page Automata.html inside Export/):

File	Description
`Automata.html`	Original Notion export (unchanged)
`Automata.tex`	LaTeX source
`Automata.pdf`	Final PDF
`Automata.log`	pdflatex log (if PDF was built)

Intermediate files (_clean.html, .aux, .toc, .out, …) are removed automatically after a successful run.

Files are written next to the HTML inside the extracted export folder.

Options:

notion2tex --help
notion2tex Export.zip --tex-only       # LaTeX only, no pdflatex
notion2tex Export.zip -v               # show compiler output
notion2tex Export.zip --no-color       # plain output (no colors or progress bars)
notion2tex Export.zip --extract-dir ./work   # custom extraction folder

Setup (development)

python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Pipeline overview

flowchart LR
  A[Notion export .zip] --> Z[Extract ZIP]
  Z --> B[clean_html.py]
  B --> C["*_clean.html"]
  C --> D[Pandoc]
  D --> E["*.tex"]
  E --> F[fix_latex.py]
  F --> G[table_latex.py]
  G --> H["*.tex fixed"]
  H --> I[pdflatex x2]
  I --> J["*.pdf"]

1. clean_html.py: fixes Notion-specific HTML so Pandoc behaves predictably.
2. Pandoc: converts the cleaned HTML to a standalone LaTeX document.
3. fix_latex.py: post-processes the LaTeX (math, sections, TOC, figures, tables); table rewriting is delegated to table_latex.py.
4. pdflatex (twice): builds the PDF, then refreshes the table of contents and page numbers.

notion2tex (or n2t.sh) runs all four steps in order.
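To make the file flow concrete, here is a small sketch of the names each stage produces and of a Pandoc invocation for the HTML → LaTeX step. The helper names stage_files and pandoc_command are hypothetical, and the exact Pandoc options used by pipeline.py are not documented here; the flags below are standard Pandoc options chosen as a plausible minimum.

```python
from pathlib import Path


def stage_files(html_path):
    """File names produced at each stage for one exported page."""
    html = Path(html_path)
    return {
        "clean_html": html.with_name(html.stem + "_clean.html"),
        "tex": html.with_suffix(".tex"),
        "pdf": html.with_suffix(".pdf"),
    }


def pandoc_command(html_path):
    """Pandoc invocation: cleaned HTML in, standalone LaTeX file out."""
    files = stage_files(html_path)
    return [
        "pandoc",
        "--from=html",
        "--to=latex",
        "--standalone",  # emit a full document, not a fragment
        "--output", str(files["tex"]),
        str(files["clean_html"]),
    ]
```

The command list can be handed to subprocess.run(..., check=True) once the cleaned HTML exists; the LaTeX post-processing then runs on the resulting .tex before pdflatex is called.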

Exporting from Notion

1. Open the Notion page (or workspace export).
2. Export as HTML (with subpages if needed). Notion delivers a .zip file.
3. Run notion2tex Export.zip. The tool extracts the archive and keeps paths intact (Page.html + Page/ asset folder).
4. Do not rename or move files inside the export before converting; image paths in the HTML are relative to the .html file.

Project structure

.
├── automata.html          # Example input: raw Notion HTML export
├── automata_clean.html    # Generated: cleaned HTML
├── automata.tex           # Generated: LaTeX
├── automata.pdf           # Generated: PDF
├── n2t.sh                 # Thin wrapper → notion2tex CLI
├── notion2tex/            # Installable Python package
│   ├── clean_html.py      # Step 1: HTML preprocessing
│   ├── fix_latex.py       # Step 3: LaTeX post-processing
│   ├── table_latex.py     # Table conversion (used by fix_latex)
│   ├── zip_export.py      # Extract Notion .zip, find main .html
│   ├── pipeline.py        # Full build orchestration
│   └── cli.py             # `notion2tex` command
├── pyproject.toml
└── .venv/                 # Optional virtual environment

`clean_html.py`

Prepares Notion HTML before Pandoc:

Step	What it does
Toggles → headings	Nested `<details>` become `<h1>`–`<h6>` (deepest first)
Table repair	Removes invalid `<div>` wrappers inside `<table>` so Pandoc emits real tables
Math	KaTeX `<annotation>` → MathML (inline) or `$$...$$` (display)
SVG removal	Drops SVG icons/images that break `pdflatex`
Emoji removal	Strips emoji characters

python -c "from notion2tex.clean_html import clean_html_for_pandoc; clean_html_for_pandoc('automata.html', 'automata_clean.html')"
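The toggle → heading rule can be illustrated with a short standalone sketch. It is not the code in clean_html.py: the package works "deepest first", while this version uses a single streaming pass with html.parser, which is a different technique chosen for brevity. It assumes the outermost toggle has nesting depth 0, so the level is min(1 + nesting_depth, 6). The class and function names are hypothetical, and comments and doctype declarations are dropped.

```python
from html.parser import HTMLParser


class ToggleToHeadings(HTMLParser):
    """Rewrite <details><summary>…</summary>…</details> as <hN>…</hN>…"""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0   # number of <details> currently open
        self.levels = []  # heading levels of open <summary> tags

    def handle_starttag(self, tag, attrs):
        if tag == "details":
            self.depth += 1
        elif tag == "summary":
            nesting_depth = max(self.depth - 1, 0)
            level = min(1 + nesting_depth, 6)
            self.levels.append(level)
            self.out.append(f"<h{level}>")
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "details":
            self.depth = max(self.depth - 1, 0)
        elif tag == "summary":
            if self.levels:
                self.out.append(f"</h{self.levels.pop()}>")
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def toggles_to_headings(html):
    parser = ToggleToHeadings()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Toggles nested more than six levels deep all map to h6, which matches the depth cap listed under Customization.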

`fix_latex.py`

Fixes Pandoc/Notion artifacts in the .tex file:

Area	Fix
Structure	Section numbering `1.` / `1.1.` / `1.1.1.`; unnumbered cover page
TOC	Inserts `\tableofcontents` after the cover; front matter in roman numerals, body from page 1 in arabic
Figures	`[H]` placement so images stay in document order
Math	Escaped `\$...\$`, `\textbackslash`, `gather*` / `cases`, Unicode symbols
Titles	Corrupted `\section{...}` with KaTeX / bookmarks
Captions	Removes empty `\caption{}` / spurious “Figure N”
Tables	Delegates to `table_latex.py`

python -c "from notion2tex.fix_latex import fix_latex; fix_latex('automata.tex')"
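Two of the fixes in the table (figure placement and empty captions) are simple enough to sketch with regular expressions. The functions below are illustrative only and are not taken from fix_latex.py; their names are hypothetical, and the [H] specifier assumes the float package is loaded in the preamble.

```python
import re


def pin_figures(tex):
    """Force [H] placement on every figure environment."""
    # Matches \begin{figure} with or without an existing [..] specifier.
    pattern = r"\\begin\{figure\}(?:\[[^\]]*\])?"
    return re.sub(pattern, lambda m: r"\begin{figure}[H]", tex)


def drop_empty_captions(tex):
    """Delete caption commands whose argument is empty or whitespace."""
    return re.sub(r"[ \t]*\\caption\{\s*\}[ \t]*\n?", "", tex)
```

Using a function as the replacement in pin_figures avoids having to escape backslashes in the replacement template.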

`table_latex.py`

Rebuilds Pandoc longtable environments:

- Replaces awkward p{} + minipage columns with tabular / tabularx + booktabs
- Uses \shortstack for multi-line cells
- Skips the Notion cover metadata table (website / status)
- Uses plain l columns for compact transition tables and X columns for wide text
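As an illustration of the target output, the sketch below builds a compact booktabs table with plain l columns and wraps multi-line cells in \shortstack. It is not the code in table_latex.py: the function names are hypothetical, cell text is assumed to be LaTeX-escaped already, and the tabularx / X-column case is left out.

```python
def latex_cell(text):
    """Return the cell text, wrapped in a shortstack if it spans several lines."""
    lines = text.split("\n")
    if len(lines) == 1:
        return text
    return r"\shortstack[l]{" + r" \\ ".join(lines) + "}"


def rows_to_tabular(header, rows):
    """Render a header and data rows as a booktabs tabular with l columns."""
    colspec = "l" * len(header)
    out = [r"\begin{tabular}{" + colspec + "}", r"\toprule"]
    out.append(" & ".join(latex_cell(c) for c in header) + r" \\")
    out.append(r"\midrule")
    for row in rows:
        out.append(" & ".join(latex_cell(c) for c in row) + r" \\")
    out.append(r"\bottomrule")
    out.append(r"\end{tabular}")
    return "\n".join(out)
```

For example, rows_to_tabular(["State", "a"], [["q0", "q1\nq2"]]) yields a two-column table whose second cell stacks q1 above q2.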

Manual build (step by step)

notion2tex automata.html --tex-only
cd "$(dirname automata.html)"   # if you used an absolute path
rm -f automata.aux automata.toc automata.out
pdflatex -interaction=nonstopmode automata.tex
pdflatex -interaction=nonstopmode automata.tex

Or run the full pipeline in one step: notion2tex automata.html.

The second pdflatex pass is required for a correct table of contents and page numbers.
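The same manual build can be scripted. The sketch below deletes stale auxiliary files and then runs pdflatex twice from the folder that holds the .tex file. The function names are hypothetical and this is not how pipeline.py is necessarily written; it only mirrors the shell commands above.

```python
import subprocess
from pathlib import Path

AUX_SUFFIXES = (".aux", ".toc", ".out")


def remove_aux_files(tex_path):
    """Delete stale auxiliary files next to the .tex file; return their names."""
    tex = Path(tex_path)
    removed = []
    for suffix in AUX_SUFFIXES:
        aux = tex.with_suffix(suffix)
        if aux.exists():
            aux.unlink()
            removed.append(aux.name)
    return removed


def build_pdf(tex_path, passes=2):
    """Run pdflatex `passes` times in the .tex file's folder."""
    tex = Path(tex_path).resolve()
    remove_aux_files(tex)
    for _ in range(passes):
        # check=True raises CalledProcessError if pdflatex exits non-zero.
        subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", tex.name],
            cwd=tex.parent,
            check=True,
        )
    return tex.with_suffix(".pdf")
```

Running from the .tex file's folder keeps relative image paths valid, which is the reason for the cd in the shell version.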

Troubleshooting

`Missing \begin{document}` with hex garbage in `.aux`

The auxiliary file is corrupted (often after interrupting pdflatex):

rm -f automata.aux automata.toc automata.out
pdflatex -interaction=nonstopmode automata.tex
pdflatex -interaction=nonstopmode automata.tex

`Package array Error` near `\end{tabularx}`

Usually a malformed table column spec from an older build. Re-run the full pipeline with notion2tex so table_latex.py regenerates tables.

Tables appear as separate text blocks (not columns)

The source HTML still has Notion <div> inside <tbody>. Re-run clean_html.py (table repair runs before math replacement).

Course properties table missing fields (username, password, …)

Notion2Tex shows every property row present in the HTML export. During the HTML cleaning step, the log lists the field names found, for example: Normalized properties table (4 fields): Sito web, Username, Password, Status.

If Username or Password is missing from that list, the field is not in the export file: Notion often omits Password-type database properties from HTML exports. Use Text properties instead, or add the fields and re-export, then confirm they appear in the raw .html before running notion2tex again.

Empty or wrong table of contents

Run pdflatex twice. Delete .toc / .aux first if you changed section structure.

Missing images in PDF

Check that image folders from the Notion export sit next to the HTML file with the same relative paths as in the export.
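A quick way to find the broken references is to list every local img path in the HTML that does not resolve to a file. The following sketch uses only the standard library; the function name missing_images is hypothetical, and the percent-decoding step is an assumption in case the src values are URL-encoded.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class _ImageSources(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(html_path):
    """Return the local img src values that do not exist relative to the HTML."""
    html = Path(html_path)
    parser = _ImageSources()
    parser.feed(html.read_text(encoding="utf-8"))
    parser.close()
    missing = []
    for src in parser.sources:
        parsed = urlparse(src)
        if parsed.scheme or parsed.netloc:
            continue  # remote or data: URLs are not checked
        if not (html.parent / unquote(parsed.path)).exists():
            missing.append(src)
    return missing
```

An empty list means every local image reference resolves; any entries point at files that were moved, renamed, or never extracted.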

`File ...sty not found`

Install the missing package with tlmgr install <package>, or install a full TeX distribution (TeX Live / MacTeX). pdflatex needs packages such as hyperref, booktabs, tabularx, and float.

Customization

Goal	Where to change
TOC depth (section levels)	`fix_latex.py` → `_add_table_of_contents()` (`tocdepth`)
First numbered section marker	`fix_latex.py` → `_add_table_of_contents()` (`marker`)
Cover page title	`fix_latex.py` → `_unnumbered_cover_section()`
Toggle → heading depth cap	`clean_html.py` → `h_level = min(1 + nesting_depth, 6)`
Property tables (cover metadata)	`properties.py`, `table_latex.py` → `_rebuild_key_value_table()`


This version

0.1.0

May 21, 2026

