Rust implementation of Python markdownify with a Python API
markdownify-rs
Rust implementation of Python markdownify with output parity as the primary goal.
Python bindings
Build and install locally with maturin (uv):
uv venv
uv pip install maturin
.venv/bin/maturin develop --features python
Build via pip (PEP 517):
uv pip install .
Usage:
from markdownify_rs import markdownify
print(markdownify("<b>Hello</b>"))
Batch usage (parallelized in Rust):
from markdownify_rs import markdownify_batch
outputs = markdownify_batch(["<b>Hello</b>", "<i>World</i>"])
Markdown-adjacent utilities (submodule):
from markdownify_rs.markdown_utils import (
split_into_chunks,
split_into_chunks_batch,
coalesce_small_chunks,
strip_links_with_substring,
remove_large_tables,
split_on_dividers,
link_percentage,
strip_html_and_contents,
strip_data_uri_images,
)
chunks = split_into_chunks(text, how="sections")
chunks_batch = split_into_chunks_batch([text1, text2], how="sections")
cleaned = strip_links_with_substring(text, "javascript")
Notes:
code_language_callback is not yet supported in the Python bindings.
CLI:
markdownify-rs input.html
cat input.html | markdownify-rs
Parity hacks (scraper vs. BeautifulSoup)
These are explicit, ad hoc behaviors added on top of scraper/html5ever to match
python-markdownify (BeautifulSoup + html.parser) output. They are intentionally
quirky and may be replaced with more “correct” behavior once parity is stable.
- <br> parser quirk: With BeautifulSoup’s html.parser, if a non‑self‑closing <br> appears before a self‑closing <br/>, the later <br/> can be treated like an opening <br> whose contents run until that implicit <br> is closed (usually when its parent closes). We emulate this by removing the content between that <br/> and the closing tag that ends the implicit <br> (ignoring <br> tags inside comments/scripts), which matches python-markdownify’s output.
- Leading whitespace reconstruction: html.parser preserves whitespace‑only text nodes that html5ever drops (notably between <html> children and at the start of <body>). We reconstruct the normalized leading whitespace prefix (using the same “single space vs. single newline” rules as BeautifulSoup’s endData) and merge it with the converter output, carrying it across non‑block tags and empty custom elements whose contents are only comments/whitespace.
- Table header inference: For tables whose header row is effectively empty, we avoid forcing a “---” separator to match python-markdownify behavior.
- Top-level <td>/<th> wrapping: If input is a bare <td>/<th>, we wrap it in a <table><tr>…</tr></table> fragment to align with python-markdownify output.
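Two of the behaviors above are simple enough to sketch in a few lines of Python. This is an illustration only, not the crate's actual Rust code: the function names `collapse_whitespace_only` and `wrap_bare_cell` are hypothetical, and the whitespace rule is a paraphrase of how BeautifulSoup's endData appears to normalize whitespace-only strings (a single newline if the run contains one, otherwise a single space).

```python
import re

# ASCII whitespace characters considered when deciding whether a text node
# is "whitespace-only" (assumed to mirror BeautifulSoup's behavior).
ASCII_SPACES = " \n\t\x0c\r"


def collapse_whitespace_only(text: str) -> str:
    """Collapse a whitespace-only string to a single newline or a single space.

    Strings containing any non-whitespace character are returned unchanged.
    """
    if text and all(ch in ASCII_SPACES for ch in text):
        return "\n" if "\n" in text else " "
    return text


# Matches input whose first tag is <td ...> or <th ...> (but not <thead>, etc.).
_BARE_CELL = re.compile(r"\s*<(?:td|th)\b", re.IGNORECASE)


def wrap_bare_cell(html: str) -> str:
    """Wrap a bare <td>/<th> fragment in <table><tr>...</tr></table>."""
    if _BARE_CELL.match(html):
        return f"<table><tr>{html}</tr></table>"
    return html


print(repr(collapse_whitespace_only("  \n  ")))  # '\n'
print(repr(collapse_whitespace_only("   ")))     # ' '
print(wrap_bare_cell("<td>cell</td>"))           # <table><tr><td>cell</td></tr></table>
```

The real implementation works on the parsed tree rather than on raw strings, so treat the regular expression here as a stand-in for a check on the first parsed element.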
Benchmarks
Datasets
- Michigan Statutes (JSONL, 241 HTML documents).
- Total HTML bytes: 101,029,525 (~96.35 MiB).
- Largest document: 8,034,686 bytes (~7.66 MiB).
- Source file size: 102,856,616 bytes (~98.10 MiB).
- Law websites (CSV, 3,136 HTML documents).
- Total HTML bytes: 111,747,114 (~106.57 MiB).
- Largest document: 1,381,380 bytes (~1.32 MiB).
- Source file size: 148,486,852 bytes (~141.61 MiB).
Run
# Michigan Statutes (JSONL)
MARKDOWNIFY_BENCH_PATH=/path/to/mi_statutes.jsonl .venv/bin/python scripts/bench_python.py --module markdownify_rs --dist-name markdownify-rs --label markdownify_rs
MARKDOWNIFY_BENCH_PATH=/path/to/mi_statutes.jsonl .venv/bin/python scripts/bench_python.py --module markdownify --dist-name markdownify --label markdownify
# Law websites (CSV)
.venv/bin/python scripts/bench_python.py --format csv --path /path/to/deleted_pages.csv --module markdownify_rs --dist-name markdownify-rs --label markdownify_rs
.venv/bin/python scripts/bench_python.py --format csv --path /path/to/deleted_pages.csv --module markdownify --dist-name markdownify --label markdownify
Python binding comparison (both run through Python, 2026-01-28, Apple M3, macOS 14.6 / Darwin 24.6.0, Python 3.13.0)
Michigan Statutes (JSONL)
- markdownify_rs convert_all (241 docs): time 2.266594 s, throughput 42.508 MiB/s
- markdownify_rs convert_all_batch (241 docs): time 0.538012 s, throughput 179.084 MiB/s
- markdownify_rs convert_largest (8,034,686 bytes): time 187.941 ms, throughput 40.771 MiB/s
- markdownify convert_all (241 docs): time 29.654787 s, throughput 3.249 MiB/s
- markdownify convert_largest (8,034,686 bytes): time 4.496880 s, throughput 1.704 MiB/s
Speedup summary (wall-clock time, lower is better)
| Scenario | markdownify_rs time | markdownify_rs batch time | markdownify time | Speedup (rs vs py) | Speedup (batch vs py) | Batch vs rs |
|---|---|---|---|---|---|---|
| convert_all | 2.266594 s | 0.538012 s | 29.654787 s | 13.08x (+1208.34%) | 55.12x (+5411.92%) | 4.21x (+321.29%) |
| convert_largest | 187.941 ms | n/a | 4.496880 s | 23.93x (+2292.71%) | n/a | n/a |
Law websites (CSV)
- markdownify_rs convert_all (3,136 docs): time 2.596691 s, throughput 41.041 MiB/s
- markdownify_rs convert_all_batch (3,136 docs): time 0.672013 s, throughput 158.584 MiB/s
- markdownify_rs convert_largest (1,381,380 bytes): time 54.482 ms, throughput 24.180 MiB/s
- markdownify convert_all (3,136 docs): time 17.680570 s, throughput 6.028 MiB/s
- markdownify convert_largest (1,381,380 bytes): time 280.459 ms, throughput 4.697 MiB/s
Speedup summary (wall-clock time, lower is better)
| Scenario | markdownify_rs time | markdownify_rs batch time | markdownify time | Speedup (rs vs py) | Speedup (batch vs py) | Batch vs rs |
|---|---|---|---|---|---|---|
| convert_all | 2.596691 s | 0.672013 s | 17.680570 s | 6.81x (+580.89%) | 26.31x (+2530.99%) | 3.86x (+286.40%) |
| convert_largest | 54.482 ms | n/a | 280.459 ms | 5.15x (+414.77%) | n/a | n/a |
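The benchmark output does not spell out its formulas, but the reported figures are consistent with throughput = total bytes / 2^20 / seconds, speedup = baseline time / candidate time, and percentage = (speedup - 1) × 100. A minimal sketch that reproduces the Michigan Statutes convert_all row under those assumptions (the helper names are illustrative and not taken from scripts/bench_python.py):

```python
MIB = 1024 * 1024  # bytes per MiB


def throughput_mib_s(total_bytes: int, seconds: float) -> float:
    """Throughput in MiB/s for a run that processed total_bytes in seconds."""
    return total_bytes / MIB / seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def percent_faster(ratio: float) -> float:
    """Express a speedup ratio as a percentage improvement."""
    return (ratio - 1.0) * 100.0


# Michigan Statutes, convert_all: figures copied from the results above.
total_bytes = 101_029_525
rs_seconds = 2.266594
py_seconds = 29.654787

ratio = speedup(py_seconds, rs_seconds)
print(f"{throughput_mib_s(total_bytes, rs_seconds):.3f} MiB/s")  # 42.508 MiB/s
print(f"{ratio:.2f}x (+{percent_faster(ratio):.2f}%)")           # 13.08x (+1208.34%)
```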