
A Python port of Turndown — HTML to Markdown converter

turndown-py

A Python port of Turndown (v7.2.4), an HTML to Markdown converter originally written in JavaScript by Dom Christie.

Project Status

Conversion fidelity: 203 tests passing, including 147 golden test cases ported from upstream (all matching the JS output), 39 internal/unit tests, and 14 API tests.

  • ruff check — clean
  • ruff format — clean
  • pytest tests/ — 203/203 passed

🎉 Additional tests in tests/test_internals.py:

  • Pickle serialization: TurndownService round-trips through pickle.dumps/loads (required for Spark UDFs)
  • Thread safety — 8 concurrent threads converting different HTML inputs produce correct, isolated results
  • Statelessness — repeated calls with same input yield identical output
  • Reference link isolation — per-call references state prevents cross-call contamination
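
The thread-safety and statelessness checks listed above can be sketched roughly as follows. This is not the code in tests/test_internals.py; `convert` is a hypothetical stand-in so the snippet runs without the package installed, and `check_thread_isolation` and `check_stateless` are illustrative names. In practice one would presumably pass `TurndownService().turndown` as `convert_fn`.

```python
from concurrent.futures import ThreadPoolExecutor
import re


def convert(html: str) -> str:
    """Hypothetical stand-in for TurndownService().turndown (not the real converter)."""
    # Only rewrites <em>...</em>; enough to give each input a distinct output.
    return re.sub(r"<em>(.*?)</em>", r"_\1_", html)


def check_thread_isolation(convert_fn, inputs, workers=8):
    """Return True if concurrent results match sequential ones, in order."""
    expected = [convert_fn(html) for html in inputs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `expected`.
        actual = list(pool.map(convert_fn, inputs))
    return actual == expected


def check_stateless(convert_fn, html, repeats=5):
    """Return True if repeated calls on the same input give identical output."""
    outputs = {convert_fn(html) for _ in range(repeats)}
    return len(outputs) == 1


inputs = [f"<em>doc {i}</em>" for i in range(8)]
print(check_thread_isolation(convert, inputs))
print(check_stateless(convert, inputs[0]))
```

Comparing against a sequential baseline keeps the check independent of what the converter actually outputs, so the same helpers would apply to any callable that maps HTML to Markdown.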

Installation

pip install turndown-py

Requires Python 3.7+.

Development

# Clone and enter the repo
git clone https://github.com/M-Intelligence/turndown-py
cd turndown-py

# Create virtualenv and install dev dependencies
uv sync

# Run tests
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

Package Structure

| Python Module | Ported From | Description |
| --- | --- | --- |
| html_parser.py | html-parser.js | Pure-Python DOM tree via html.parser.HTMLParser with DomNode adapter providing JS-like childNodes, textContent, outerHTML, cloneNode(), getAttribute(), sibling traversal |
| utilities.py | utilities.js | Block/void element sets, escapeMarkdown(), extend(), repeat(), newline trimming helpers |
| collapse_whitespace.py | collapse-whitespace.js | DFS whitespace collapse preserving <pre> subtrees |
| node.py | node.js | annotate_node(): isBlock, isCode, isBlank, flankingWhitespace |
| rules.py | rules.js | Three-bucket rule system: forNode(), keep(), remove(), add() |
| commonmark_rules.py | commonmark-rules.js | All 15 CommonMark rules: paragraphs, headings, lists, code blocks, links, emphasis, images |
| root_node.py | root-node.js | Input normalization: parse HTML + collapse whitespace |
| service.py | turndown.js | TurndownService with full public API, pickle support, thread safety |
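
To illustrate the idea behind html_parser.py, here is a minimal sketch of building a DOM-like tree on top of the standard library's html.parser.HTMLParser. It is not the package's implementation: `MiniNode`, `TreeBuilder`, `parse`, and the snake_case attribute names are hypothetical, and the void-element set is deliberately incomplete.

```python
from html.parser import HTMLParser

# Void elements never receive a closing tag, so they must not stay "open".
# (Illustrative subset only.)
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class MiniNode:
    """Hypothetical, simplified node; not the package's DomNode."""

    def __init__(self, name, attrs=None, text=""):
        self.name = name              # tag name, or "#text" for text nodes
        self.attrs = dict(attrs or [])
        self.text = text
        self.parent = None
        self.child_nodes = []

    def append(self, child):
        child.parent = self
        self.child_nodes.append(child)

    @property
    def text_content(self):
        # Concatenate the text of all descendant text nodes, depth first.
        if self.name == "#text":
            return self.text
        return "".join(c.text_content for c in self.child_nodes)

    @property
    def next_sibling(self):
        if self.parent is None:
            return None
        siblings = self.parent.child_nodes
        i = siblings.index(self)
        return siblings[i + 1] if i + 1 < len(siblings) else None


class TreeBuilder(HTMLParser):
    """Turns HTMLParser's event stream into a MiniNode tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = MiniNode("#document")
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = MiniNode(tag, attrs)
        self._current.append(node)
        if tag not in VOID_ELEMENTS:
            self._current = node      # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name, if any;
        # stray closing tags are ignored.
        node = self._current
        while node is not None and node.name != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self._current = node.parent

    def handle_data(self, data):
        self._current.append(MiniNode("#text", text=data))


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


root = parse('<p>Hello <em>world</em><br>!</p>')
p = root.child_nodes[0]
print([child.name for child in p.child_nodes], repr(p.text_content))
```

A node type with parent links and an ordered child list is enough to support the sibling traversal and text extraction that a rule-based converter needs, which is presumably why the port wraps the event-based parser in an adapter.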

Usage

from turndown import TurndownService

ts = TurndownService()
markdown = ts.turndown('<h1>Hello <em>world</em></h1>')
# => 'Hello _world_\n============='
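
The output above is a setext-style heading, which appears to be the default heading style inherited from upstream Turndown: the heading text is followed by an underline as long as the text itself. A minimal sketch of that formatting step, assuming this behavior, follows; `setext_heading` is a hypothetical helper, not part of the package's API.

```python
def setext_heading(content: str, level: int) -> str:
    """Format a level-1 or level-2 heading in setext style (illustrative only)."""
    if level not in (1, 2):
        raise ValueError("setext headings only exist for levels 1 and 2")
    # "=" underlines level-1 headings, "-" underlines level-2 headings.
    underline_char = "=" if level == 1 else "-"
    return f"{content}\n{underline_char * len(content)}"


print(repr(setext_heading("Hello _world_", 1)))
```

Note that the underline length follows the converted Markdown text, including the underscores added for emphasis, not the original HTML.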

License

GPL-3.0-or-later. The original turndown.js MIT license is retained in LICENSE_MIT. Attribution in CREDITS.
