
Rust implementation of Python markdownify with a Python API


markdownify-rs

Rust implementation of Python markdownify with output parity as the primary goal.

Python bindings

Build and install locally with maturin (uv):

uv venv
uv pip install maturin
.venv/bin/maturin develop --features python

Build via pip (PEP 517):

uv pip install .

Usage:

from markdownify_rs import markdownify

print(markdownify("<b>Hello</b>"))

Batch usage (parallelized in Rust):

from markdownify_rs import markdownify_batch

outputs = markdownify_batch(["<b>Hello</b>", "<i>World</i>"])

Notes:

  • code_language_callback is not yet supported in the Python bindings.

CLI:

markdownify-rs input.html
cat input.html | markdownify-rs

Parity hacks (scraper vs. BeautifulSoup)

These are explicit, ad hoc behaviors added on top of scraper/html5ever to match python-markdownify (BeautifulSoup + html.parser) output. They are intentionally quirky and may be replaced with more “correct” behavior once parity is stable.

  • <br> parser quirk: With BeautifulSoup’s html.parser, if a non‑self‑closing <br> appears before a self‑closing <br/>, the later <br/> can be treated like an opening <br> whose contents run until that implicit <br> is closed (usually when its parent closes). We emulate this by removing the content between that <br/> and the closing tag that ends the implicit <br> (ignoring <br> tags inside comments/scripts), which matches python-markdownify’s output.
  • Leading whitespace reconstruction: html.parser preserves whitespace‑only text nodes that html5ever drops (notably between <html> children and at the start of <body>). We reconstruct the normalized leading whitespace prefix (using the same “single space vs. single newline” rules as BeautifulSoup’s endData) and merge it with the converter output, carrying it across non‑block tags and empty custom elements whose contents are only comments/whitespace.
  • Table header inference: For tables whose header row is effectively empty, we avoid forcing a “---” separator to match python-markdownify behavior.
  • Top-level <td>/<th> wrapping: If input is a bare <td>/<th>, we wrap it in a <table><tr>…</tr></table> fragment to align with python-markdownify output.
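Two of these hacks are simple enough to sketch in isolation. The sketch below is not the project's Rust code. It assumes BeautifulSoup's endData rule is "a text node made only of ASCII whitespace collapses to a newline if it contains one, otherwise to a single space" (the <pre>/<textarea> exemption is not modeled). It treats "bare <td>/<th>" as "the fragment's first tag is <td> or <th>". Both helper names are hypothetical.

```python
import re

# Assumption: mirrors BeautifulSoup's ASCII_SPACES (space, LF, TAB, FF, CR).
ASCII_SPACES = " \n\t\f\r"

# Hypothetical pattern: the fragment's first tag is <td ...> or <th ...>.
# The [\s>/] guard keeps tags such as <thead> from matching.
_BARE_CELL = re.compile(r"^\s*<t[dh][\s>/]", re.IGNORECASE)


def collapse_whitespace_only_text(text: str) -> str:
    """Collapse a whitespace-only text node the way bs4's endData does.

    A node made entirely of ASCII whitespace becomes a single newline if it
    contains a newline, otherwise a single space. Text containing any other
    character is returned unchanged. <pre>/<textarea> handling is not modeled.
    """
    if text and all(ch in ASCII_SPACES for ch in text):
        return "\n" if "\n" in text else " "
    return text


def wrap_bare_table_cell(html: str) -> str:
    """Wrap a fragment that starts with a bare <td>/<th> in <table><tr>."""
    if _BARE_CELL.match(html):
        return f"<table><tr>{html}</tr></table>"
    return html
```

For example, wrap_bare_table_cell("<td>x</td>") gives "<table><tr><td>x</td></tr></table>", while "<thead>" fragments pass through unchanged.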

Benchmarks

Datasets

  • Michigan Statutes (JSONL, 241 HTML documents).
    • Total HTML bytes: 101,029,525 (~96.35 MiB).
    • Largest document: 8,034,686 bytes (~7.66 MiB).
    • Source file size: 102,856,616 bytes (~98.10 MiB).
  • Law websites (CSV, 3,136 HTML documents).
    • Total HTML bytes: 111,747,114 (~106.57 MiB).
    • Largest document: 1,381,380 bytes (~1.32 MiB).
    • Source file size: 148,486,852 bytes (~141.61 MiB).
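The dataset figures above are byte totals with MiB shown as 2**20 bytes (101,029,525 / 2**20 ≈ 96.35). A minimal sketch of how such stats could be computed for the JSONL dataset follows. It assumes each line is a JSON object holding the HTML under a key named "html" (a hypothetical field name) and that sizes are UTF-8 encoded lengths. It is not taken from scripts/bench_python.py.

```python
import json
from typing import Dict, Iterable, Iterator

MIB = 1024 * 1024  # the README's MiB figures match 2**20-byte units


def iter_jsonl_html(path: str, field: str = "html") -> Iterator[str]:
    """Yield the HTML string from each non-empty JSONL line.

    Assumption: each record stores its HTML under `field` ("html" is a
    hypothetical key; adjust it to the real dataset).
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)[field]


def dataset_stats(docs: Iterable[str]) -> Dict[str, float]:
    """Count documents, total UTF-8 bytes, and the largest document."""
    sizes = [len(doc.encode("utf-8")) for doc in docs]
    total = sum(sizes)
    largest = max(sizes, default=0)
    return {
        "count": len(sizes),
        "total_bytes": total,
        "total_mib": total / MIB,
        "largest_bytes": largest,
        "largest_mib": largest / MIB,
    }
```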

Run

# Michigan Statutes (JSONL)
MARKDOWNIFY_BENCH_PATH=/path/to/mi_statutes.jsonl .venv/bin/python scripts/bench_python.py --module markdownify_rs --dist-name markdownify-rs --label markdownify_rs
MARKDOWNIFY_BENCH_PATH=/path/to/mi_statutes.jsonl .venv/bin/python scripts/bench_python.py --module markdownify --dist-name markdownify --label markdownify

# Law websites (CSV)
.venv/bin/python scripts/bench_python.py --format csv --path /path/to/deleted_pages.csv --module markdownify_rs --dist-name markdownify-rs --label markdownify_rs
.venv/bin/python scripts/bench_python.py --format csv --path /path/to/deleted_pages.csv --module markdownify --dist-name markdownify --label markdownify
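The results below report three scenarios: convert_all, convert_all_batch and convert_largest. The exact measurement logic of scripts/bench_python.py is not shown here, so the following is only a hypothetical harness illustrating what each scenario name suggests. It covers sequential per-document calls, one batch call over all documents, and a single call on the largest document. The function names are illustrative.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence


def time_call(fn: Callable[[], object]) -> float:
    """Return wall-clock seconds for a single call of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_scenarios(
    convert: Callable[[str], str],
    docs: Sequence[str],
    convert_batch: Optional[Callable[[List[str]], List[str]]] = None,
) -> Dict[str, float]:
    """Time the benchmark scenarios for one converter (hypothetical harness)."""
    largest = max(docs, key=lambda d: len(d.encode("utf-8")))
    timings = {
        # Sequential: one converter call per document.
        "convert_all": time_call(lambda: [convert(d) for d in docs]),
        # Single call on the largest document by UTF-8 byte size.
        "convert_largest": time_call(lambda: convert(largest)),
    }
    if convert_batch is not None:
        # One call that receives every document at once.
        timings["convert_all_batch"] = time_call(lambda: convert_batch(list(docs)))
    return timings
```

With the packages installed, the comparison would roughly be run_scenarios(markdownify_rs.markdownify, docs, markdownify_rs.markdownify_batch) versus run_scenarios(markdownify.markdownify, docs). Python markdownify has no batch function, which matches the "n/a" entries in the summaries.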

Python binding comparison (both run through Python, 2026-01-28, Apple M3, macOS 14.6 / Darwin 24.6.0, Python 3.13.0)

Michigan Statutes (JSONL)

  • markdownify_rs convert_all (241 docs): time 2.266594 s, throughput 42.508 MiB/s
  • markdownify_rs convert_all_batch (241 docs): time 0.538012 s, throughput 179.084 MiB/s
  • markdownify_rs convert_largest (8,034,686 bytes): time 187.941 ms, throughput 40.771 MiB/s
  • markdownify convert_all (241 docs): time 29.654787 s, throughput 3.249 MiB/s
  • markdownify convert_largest (8,034,686 bytes): time 4.496880 s, throughput 1.704 MiB/s

Speedup summary (wall-clock time, lower is better)

| Scenario | markdownify_rs time | markdownify_rs batch time | markdownify time | Speedup (rs vs py) | Speedup (batch vs py) | Batch vs rs |
| --- | --- | --- | --- | --- | --- | --- |
| convert_all | 2.266594 s | 0.538012 s | 29.654787 s | 13.08x (+1208.34%) | 55.12x (+5411.92%) | 4.21x (+321.29%) |
| convert_largest | 187.941 ms | n/a | 4.496880 s | 23.93x (+2292.71%) | n/a | n/a |
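The throughput and speedup figures can be reproduced from the raw timings. A minimal sketch follows, assuming MiB means 2**20 bytes (which agrees with the reported numbers). It also assumes a speedup is the slower time divided by the faster one, with the percentage given as (ratio - 1) * 100. The function names are illustrative only.

```python
from typing import Tuple

MIB = 1024 * 1024  # 2**20 bytes; the reported MiB figures use this unit


def throughput_mib_s(total_bytes: int, seconds: float) -> float:
    """Mebibytes (MiB) of input HTML converted per second."""
    return total_bytes / MIB / seconds


def speedup(baseline_s: float, candidate_s: float) -> Tuple[float, float]:
    """Return (ratio, percent) for candidate vs. baseline wall-clock time.

    ratio = baseline / candidate and percent = (ratio - 1) * 100.
    Both times must be in the same unit (e.g. seconds).
    """
    ratio = baseline_s / candidate_s
    return ratio, (ratio - 1.0) * 100.0
```

For instance, speedup(29.654787, 2.266594) gives a ratio that rounds to 13.08 and a percentage that rounds to 1208.34, matching the convert_all row. Mixed units such as 187.941 ms need converting to 0.187941 s first.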

Law websites (CSV)

  • markdownify_rs convert_all (3,136 docs): time 2.596691 s, throughput 41.041 MiB/s
  • markdownify_rs convert_all_batch (3,136 docs): time 0.672013 s, throughput 158.584 MiB/s
  • markdownify_rs convert_largest (1,381,380 bytes): time 54.482 ms, throughput 24.180 MiB/s
  • markdownify convert_all (3,136 docs): time 17.680570 s, throughput 6.028 MiB/s
  • markdownify convert_largest (1,381,380 bytes): time 280.459 ms, throughput 4.697 MiB/s

Speedup summary (wall-clock time, lower is better)

| Scenario | markdownify_rs time | markdownify_rs batch time | markdownify time | Speedup (rs vs py) | Speedup (batch vs py) | Batch vs rs |
| --- | --- | --- | --- | --- | --- | --- |
| convert_all | 2.596691 s | 0.672013 s | 17.680570 s | 6.81x (+580.89%) | 26.31x (+2530.99%) | 3.86x (+286.40%) |
| convert_largest | 54.482 ms | n/a | 280.459 ms | 5.15x (+414.77%) | n/a | n/a |



Download files

Download the file for your platform.

Source Distribution

markdownify_rs-0.1.0.tar.gz (40.7 kB)

Uploaded Source

Built Distributions


markdownify_rs-0.1.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

markdownify_rs-0.1.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

markdownify_rs-0.1.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

markdownify_rs-0.1.0-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)

Uploaded PyPy, manylinux: glibc 2.17+ ARM64

markdownify_rs-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.17+ x86-64

markdownify_rs-0.1.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.2 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.17+ ARM64

markdownify_rs-0.1.0-cp38-abi3-macosx_11_0_arm64.whl (1.2 MB)

Uploaded CPython 3.8+, macOS 11.0+ ARM64

File details

Details for the file markdownify_rs-0.1.0.tar.gz.

File metadata

  • Download URL: markdownify_rs-0.1.0.tar.gz
  • Upload date:
  • Size: 40.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.5

File hashes

Hashes for markdownify_rs-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 78de70f0d58090f17d5199febc58808f35e70e2212efff2e94f6280b157c2405 |
| MD5 | 07142dfd5aee57fe88b51c8c38016600 |
| BLAKE2b-256 | 123bd173a71213b81352382948cdac606adb94338a7da7d8f7c2d332f7defa7a |


File details

Details for the file markdownify_rs-0.1.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for markdownify_rs-0.1.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | e6085aaad716c6943217a3b697e5c39501f4e27515300b93d9a932f26e49f551 |
| MD5 | 3a87dd0ee33ccb4da4a9143c71373b43 |
| BLAKE2b-256 | 8b538923e44fb834461f06cdd392fad57d128382c65d3702b2a9f9a872cb0f49 |


File details

Details for the file markdownify_rs-0.1.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for markdownify_rs-0.1.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | e1fb61e129319e9640693f0fa88c3937cefe5f479eab6949dc25a54cea0fe670 |
| MD5 | 0b41cf95344c42295cc62cd4020696e0 |
| BLAKE2b-256 | e9e2fee387b65e3a8349d1c40589243ddfc5dfac301b9d9a15155358799571c5 |


File details

Details for the file markdownify_rs-0.1.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for markdownify_rs-0.1.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f1d80e1956b4bc30a526597c316b9f66236c3fe56b0ecfbff950954d6c71b73b |
| MD5 | a46c69f57b63b2618bddf5c1da8ae97f |
| BLAKE2b-256 | 4474c791e7e657e8369790607b0f5a6d071a119f58de63dc8070dec76d343213 |


File details

Details for the file markdownify_rs-0.1.0-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for markdownify_rs-0.1.0-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3df4d4213876a701479cd3cd9aa5f4c140cc98fcc21d312fbe9c417d33c084e7 |
| MD5 | 2f4a3ef4dcab0e6881863c522890af74 |
| BLAKE2b-256 | 65e7f99929a5e2190cca88cc3300a2880b9a36e1b46a372399d89c28d3d2f4a4 |


File details

Details for the file markdownify_rs-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for markdownify_rs-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4279f729909ef76f7b14e24de150c417df6b027f0cb60b9a02e6b7e32ee41f00 |
| MD5 | e1a1d94338a32f3dc5e908fc236d716b |
| BLAKE2b-256 | acbbec89aa7860656819b3b7e96b974a658d3187ab5e57e3bc08ee8e361d9832 |


File details

Details for the file markdownify_rs-0.1.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for markdownify_rs-0.1.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | be624ae96f69d783089a459be8c32829326436b5b1c474933a4b957958c3d99d |
| MD5 | e07822a69de61b285793e6a36438034c |
| BLAKE2b-256 | d27eeb5228de1b72e45f12c8c635e0ec10ee9883aa7419afe7227efb90d59822 |


File details

Details for the file markdownify_rs-0.1.0-cp38-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for markdownify_rs-0.1.0-cp38-abi3-macosx_11_0_arm64.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b33f57576be743bba41ac2befc72eb0aa24faf59365498a640f3cd466c4e9fba |
| MD5 | 981d091de07bb7db0b0d21facc1ca8d2 |
| BLAKE2b-256 | 8e1158eb9502da2b9bf8b1e59d039f85db2ec4c8b975d40af8c190c8be0e6489 |

