A Python port of Turndown — HTML to Markdown converter
turndown-py
A Python port of Turndown (v7.2.4), an HTML to Markdown converter originally written in JavaScript by Dom Christie.
Project Status
Conversion fidelity: 203 tests passing, including 147 golden test cases ported from upstream (all matching the JS output), 39 internal/unit tests, and 14 API tests.
- ✅ `ruff check`: clean
- ✅ `ruff format`: clean
- ✅ `pytest tests/`: 203/203 passed 🎉

Additional tests in `tests/test_internals.py`:
- Pickle serialization: `TurndownService` round-trips through `pickle.dumps`/`pickle.loads` (required for Spark UDFs)
- Thread safety: 8 concurrent threads converting different HTML inputs produce correct, isolated results
- Statelessness: repeated calls with the same input yield identical output
- Reference link isolation: per-call `references` state prevents cross-call contamination
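The properties listed above come down to keeping per-call state out of the service object. The sketch below is not turndown-py's implementation; `ToyService`, its `link_style` option, and `convert` are hypothetical stand-ins that illustrate the pattern under that assumption. Only configuration lives on the instance, and the reference-link list is created inside each call, so the object pickles cleanly (as the Spark UDF use case requires) and concurrent calls cannot see each other's references.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from typing import List


class ToyService:
    """Hypothetical converter: options live on the instance, per-call state does not."""

    def __init__(self, link_style: str = "referenced") -> None:
        # Plain configuration only (unused here; it stands in for real options),
        # so pickle can serialize the instance.
        self.link_style = link_style

    def convert(self, urls: List[str]) -> str:
        # Fresh per-call state: never stored on self, never shared between calls.
        references: List[str] = []
        body: List[str] = []
        for i, url in enumerate(urls, start=1):
            body.append("[link {}][{}]".format(i, i))
            references.append("[{}]: {}".format(i, url))
        return "\n".join(body + [""] + references)


service = ToyService()

# Pickle round-trip: the restored copy keeps the same options.
clone = pickle.loads(pickle.dumps(service))
assert clone.link_style == service.link_style

# Statelessness: the same input gives the same output on every call.
assert service.convert(["https://a.example"]) == service.convert(["https://a.example"])

# Thread safety: 8 workers with different inputs each get only their own reference.
inputs = [["https://site{}.example".format(n)] for n in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(service.convert, inputs))
for n, out in enumerate(results):
    assert out == "[link 1][1]\n\n[1]: https://site{}.example".format(n)
```

If `references` were an attribute reset at the start of each call instead, two threads sharing one instance could interleave and leak links into each other's output, which is the contamination the reference-isolation test guards against.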
Installation
```bash
pip install turndown-py
```
Requires Python 3.7+.
Development
```bash
# Clone and enter the repo
git clone https://github.com/M-Intelligence/turndown-py
cd turndown-py

# Create virtualenv and install dev dependencies
uv sync

# Run tests
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .
```
Package Structure
| Python Module | Ported From | Description |
|---|---|---|
| `html_parser.py` | `html-parser.js` | Pure-Python DOM tree via `html.parser.HTMLParser`, with a `DomNode` adapter providing JS-like `childNodes`, `textContent`, `outerHTML`, `cloneNode()`, `getAttribute()`, and sibling traversal |
| `utilities.py` | `utilities.js` | Block/void element sets, `escapeMarkdown()`, `extend()`, `repeat()`, newline-trimming helpers |
| `collapse_whitespace.py` | `collapse-whitespace.js` | DFS whitespace collapse preserving `<pre>` subtrees |
| `node.py` | `node.js` | `annotate_node()`: `isBlock`, `isCode`, `isBlank`, `flankingWhitespace` |
| `rules.py` | `rules.js` | Three-bucket rule system: `forNode()`, `keep()`, `remove()`, `add()` |
| `commonmark_rules.py` | `commonmark-rules.js` | All 15 CommonMark rules: paragraphs, headings, lists, code blocks, links, emphasis, images |
| `root_node.py` | `root-node.js` | Input normalization: parse HTML + collapse whitespace |
| `service.py` | `turndown.js` | `TurndownService` with full public API, pickle support, thread safety |
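The `html_parser.py` row describes building a DOM tree on top of the standard library's `html.parser.HTMLParser`. Below is a minimal sketch of that idea, not the module itself: `MiniNode`, `MiniTreeBuilder`, and `parse` are hypothetical names, and only a few JS-like members (`childNodes`, `parentNode`, `nextSibling`, `textContent`, `getAttribute()`) are modeled. The parser callbacks drive a stack of open elements; void elements such as `<br>` are attached but never pushed, since they have no end tag.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# HTML void elements: they never have an end tag, so they are never pushed.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class MiniNode:
    """Hypothetical JS-like node: element (nodeType 1) or text (nodeType 3)."""

    def __init__(self, node_type: int, name: str, data: str = "") -> None:
        self.nodeType = node_type
        # Elements use upper-case names as in the browser DOM; text is "#text".
        self.nodeName = name.upper() if node_type == 1 else name
        self.data = data
        self.attrs: Dict[str, str] = {}
        self.childNodes: List["MiniNode"] = []
        self.parentNode: Optional["MiniNode"] = None

    def appendChild(self, child: "MiniNode") -> None:
        child.parentNode = self
        self.childNodes.append(child)

    def getAttribute(self, name: str) -> Optional[str]:
        return self.attrs.get(name)

    @property
    def nextSibling(self) -> Optional["MiniNode"]:
        if self.parentNode is None:
            return None
        siblings = self.parentNode.childNodes
        i = siblings.index(self)  # identity comparison (no __eq__ override)
        return siblings[i + 1] if i + 1 < len(siblings) else None

    @property
    def textContent(self) -> str:
        if self.nodeType == 3:
            return self.data
        return "".join(child.textContent for child in self.childNodes)


class MiniTreeBuilder(HTMLParser):
    """Builds a MiniNode tree from HTMLParser callbacks using an open-element stack."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.root = MiniNode(1, "body")
        self.stack = [self.root]

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        node = MiniNode(1, tag)
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        node.attrs = {k: v if v is not None else "" for k, v in attrs}
        self.stack[-1].appendChild(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag: str) -> None:
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].nodeName == tag.upper():
                del self.stack[i:]
                break

    def handle_data(self, data: str) -> None:
        self.stack[-1].appendChild(MiniNode(3, "#text", data))


def parse(html: str) -> MiniNode:
    builder = MiniTreeBuilder()
    builder.feed(html)
    builder.close()  # flush any buffered trailing text
    return builder.root


root = parse("<h1>Hello <em>world</em></h1><p>a<br>b</p>")
h1 = root.childNodes[0]
print(h1.nodeName, repr(h1.textContent))  # H1 'Hello world'
print(h1.nextSibling.nodeName)  # P
```

A real adapter also has to cover `outerHTML`, `cloneNode()`, and the parser's handling of malformed markup; this sketch only shows the stack-based tree construction.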
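The `rules.py` row mentions a three-bucket rule system. As we understand upstream Turndown's `rules.js`, `forNode()` returns a blank rule for blank nodes, otherwise checks rules registered with `add()`, then `keep()`, then `remove()`, and falls back to a default rule; `add()` places each new rule at the front. The sketch below models only that lookup order. `Node`, `Rule`, `RuleBook`, and `for_node` are hypothetical names rather than turndown-py's classes or signatures, and replacement functions are omitted.

```python
from typing import Callable, List, Sequence, Union


class Node:
    """Hypothetical minimal node: an element name and a blank flag."""

    def __init__(self, name: str, is_blank: bool = False) -> None:
        self.nodeName = name.upper()
        self.isBlank = is_blank


# A filter is a tag name, a collection of tag names, or a predicate on the node.
Filter = Union[str, Sequence[str], Callable[[Node], bool]]


class Rule:
    def __init__(self, name: str, filter_: Filter) -> None:
        self.name = name
        self.filter = filter_

    def matches(self, node: Node) -> bool:
        tag = node.nodeName.lower()
        if isinstance(self.filter, str):  # check str first: str is also a Sequence
            return self.filter == tag
        if callable(self.filter):
            return bool(self.filter(node))
        return tag in self.filter


class RuleBook:
    """Three buckets (added, keep, remove) plus blank and default fallbacks."""

    def __init__(self) -> None:
        self.added: List[Rule] = []
        self.kept: List[Rule] = []
        self.removed: List[Rule] = []
        self.blank_rule = Rule("blank", lambda node: True)
        self.default_rule = Rule("default", lambda node: True)

    def add(self, name: str, filter_: Filter) -> None:
        # Inserted at the front, so a later add() overrides an earlier rule.
        self.added.insert(0, Rule(name, filter_))

    def keep(self, filter_: Filter) -> None:
        self.kept.append(Rule("keep", filter_))

    def remove(self, filter_: Filter) -> None:
        self.removed.append(Rule("remove", filter_))

    def for_node(self, node: Node) -> Rule:
        if node.isBlank:
            return self.blank_rule
        # Buckets are searched in precedence order; the first match wins.
        for bucket in (self.added, self.kept, self.removed):
            for rule in bucket:
                if rule.matches(node):
                    return rule
        return self.default_rule


rules = RuleBook()
rules.add("emphasis", ["em", "i"])
rules.keep(["del", "ins"])
rules.remove("script")
rules.remove("em")  # shadowed: the added "emphasis" rule is found first
print([rules.for_node(Node(t)).name for t in ("em", "del", "script", "span")])
# ['emphasis', 'keep', 'remove', 'default']
```

Because the added bucket is searched first, a `keep()` or `remove()` call cannot override a built-in CommonMark rule for the same tag in this model.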
Usage
```python
from turndown import TurndownService

ts = TurndownService()
markdown = ts.turndown('<h1>Hello <em>world</em></h1>')
# => 'Hello _world_\n============='
```
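The expected output reflects upstream Turndown's defaults: setext headings and `_` as the emphasis delimiter. With the setext style, `<h1>` and `<h2>` are underlined with `=` and `-` respectively, one character per character of content, deeper levels fall back to ATX `#` prefixes, and the final string is trimmed of surrounding newlines. The sketch below reproduces that arithmetic for the 13-character content `Hello _world_`; `heading`, `finish`, and the `style` parameter are hypothetical helpers for illustration, not turndown-py's API.

```python
def heading(content: str, level: int, style: str = "setext") -> str:
    """Hypothetical heading rule: setext for levels 1-2, ATX otherwise."""
    if style == "setext" and level < 3:
        # Underline length matches the converted content length.
        underline = ("=" if level == 1 else "-") * len(content)
        return "\n\n" + content + "\n" + underline + "\n\n"
    return "\n\n" + "#" * level + " " + content + "\n\n"


def finish(output: str) -> str:
    # Trim leading newlines/tabs and trailing whitespace from the final result.
    return output.lstrip("\t\r\n").rstrip()


print(repr(finish(heading("Hello _world_", 1))))
# 'Hello _world_\n============='
print(repr(finish(heading("Hello _world_", 1, style="atx"))))
# '# Hello _world_'
```

One design detail to keep in mind: JavaScript's `content.length` counts UTF-16 code units, while Python's `len()` counts code points, so underline lengths in this sketch would differ from a JS implementation for characters outside the Basic Multilingual Plane, such as emoji.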
License
Licensed under GPL-3.0-or-later. The original turndown.js MIT license is retained in `LICENSE_MIT`; attribution is in `CREDITS`.
File details
Details for the file turndown_py-2026.5.20.tar.gz.
File metadata
- Download URL: turndown_py-2026.5.20.tar.gz
- Size: 78.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.15 (`uv publish` on macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16828a3cbfd2fe74f76be787db80cb41446d79a38b12923913931e6be90e3e1c |
| MD5 | b1e119efc241df815be1ccf9fd57058b |
| BLAKE2b-256 | 9a8b94fbd2de1557466a00c0e343aa8ac184b697ee3f0044742ae9297ab023c9 |
File details
Details for the file turndown_py-2026.5.20-py3-none-any.whl.
File metadata
- Download URL: turndown_py-2026.5.20-py3-none-any.whl
- Size: 27.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.15 (`uv publish` on macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e039cddea8e9d9353bc52e638e3d11794672be03d10367ef3fc774a8e1ded31c |
| MD5 | 0900f21415d96e551eba87bccb5c372b |
| BLAKE2b-256 | c863878c560f4ee7bc9faeb6447e04a9fc802d9b501e53287e9f46f7caeef155 |