A bounded-time, Pandoc-leaning Markdown parser with GFM, Extra/kramdown, math, fenced divs, and XHTML output.

Project description

xhtmlmd

A Rust Markdown parser and XHTML renderer.

The parser is tree-oriented. It preserves the structure and attributes needed for XHTML output, but it does not try to round-trip source text. The dialect follows CommonMark for core syntax and GFM for the GFM extensions, with Pandoc-leaning choices where extension families disagree.

xhtmlmd is largely implemented using AI, except for the tests. The tests are largely adapted from cmark-gfm, PHP Markdown Extra, kramdown, and Mistlefoot. Credit for xhtmlmd really belongs to the authors of these tests, and of the CommonMark docs, which is where the hard work was done.

Implemented syntax

  • Core block syntax: paragraphs, ATX/setext headings, thematic breaks, block quotes, ordered/unordered lists, indented code, raw HTML, link reference definitions.
  • GFM: pipe tables with alignment, task lists, ~~x~~ strikethrough, angle and bare autolinks, plus opt-in tag filtering (the tagfilter option).
  • Code: backtick/tilde fenced code blocks, info strings, and Pandoc-style code attributes.
  • HTML-in-Markdown: block containers opened with markdown="1"; the control attribute is stripped, indented code blocks are disabled inside the container, and fenced code is the code-block syntax there.
  • Math: four modes. brackets recognizes \(...\) and \[...\]; dollars recognizes those plus $...$ and $$...$$, using Pandoc's non-space/digit dollar rules; on preserves the \(...\) and \[...\] delimiters in the output for client-side renderers such as KaTeX; off disables math parsing. brackets is the default.
  • Attributes and inline spans: Pandoc/kramdown-style {#id .class key="value"}, block IALs {: ...}, span IALs, ALDs such as {:note: #id .class} with references, superscript ^x^, subscript ~x~, and highlight ==x==.
  • Definition lists: PHP Markdown Extra/Pandoc-style Term followed by : definition or ~ definition.
  • Footnotes: [^id] references to defined [^id]: definitions with indented continuation blocks.
  • Abbreviations: *[HTML]: Hyper Text Markup Language definitions render matching text as <abbr>.
  • Fenced divs: Pandoc/Quarto/Djot-style ::: containers with attributes or a single class word.
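
As a rough illustration of how several of the constructs listed above look in source text, the following sketch builds a small sample document as a Python string. The document content is invented for illustration, and the exact XHTML produced for each construct is not shown here because it depends on the renderer.

```python
import textwrap

# A made-up sample exercising syntax named in the list above:
# heading attributes, task lists, strikethrough, highlight, sub/superscript,
# a pipe table with alignment, a definition list, a footnote,
# an abbreviation definition, and a fenced div with a single class word.
SAMPLE = textwrap.dedent("""\
    # Release notes {#notes .summary}

    - [x] ~~Old parser~~ replaced
    - [ ] ==Highlight== H~2~O and x^2^

    | Feature | Status |
    |:--------|-------:|
    | Tables  | done   |

    Term
    : A definition with a footnote.[^1]

    [^1]: The footnote text.

    *[HTML]: Hyper Text Markup Language

    ::: note
    HTML output inside a fenced div.
    :::
    """)

print(SAMPLE)
```

Passing SAMPLE to to_xhtml (see Usage) should exercise each of these features in one call.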

Usage

Install via pip to get both the Python API and the native xhtmlmd CLI:

pip install xhtmlmd

The CLI reads Markdown from stdin or from an optional file path and writes an XHTML fragment to stdout:

echo '# Hello' | xhtmlmd
xhtmlmd input.md > out.xhtml
xhtmlmd --math=on input.md > out.xhtml
xhtmlmd --math=dollars input.md > out.xhtml

Python API:

from xhtmlmd import to_xhtml

html = to_xhtml(r"\(x^2\)")
html_for_katex = to_xhtml(r"\(x^2\)", math="on")
html_with_dollars = to_xhtml("$x$", math="dollars")

Callbacks

Python callers can override rendered nodes with callbacks. Each callback receives a node dict and the default XHTML for that node. Return None to keep the default, or return replacement XHTML.

Callback names:

  • Blocks: paragraph, heading, block_quote, list, definition_list, code_block, html_block, html_container, thematic_break, table, div, math_block
  • Inlines: text, soft_break, hard_break, emph, strong, strike, superscript, subscript, highlight, code, link, image, autolink, abbr, html_inline, math_inline, footnote_ref, span

For example, a code_block callback can hand Python blocks to a syntax highlighter:

from fastpylight import highlight
from xhtmlmd import to_xhtml

def highlight_code(node, default_html):
    if node["lang"] != "python": return None
    return highlight(node["text"], node["lang"]) + "\n"

html = to_xhtml(markdown, callbacks={"code_block": highlight_code})
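
A callback does not have to inspect the node at all. The sketch below only reuses the default XHTML it is given and wraps it in a container element. The function name wrap_table and the table-scroll class are made up for this example, and the node dict passed in the demonstration is a stand-in rather than a real xhtmlmd node.

```python
def wrap_table(node, default_html):
    # Keep the renderer's own table markup and place it inside a wrapper
    # element, for instance so that wide tables can scroll horizontally.
    inner = default_html.rstrip("\n")
    return f'<div class="table-scroll">\n{inner}\n</div>\n'

# Stand-in arguments to show the shape of the call.
print(wrap_table({"type": "table"}, "<table></table>\n"))
```

With xhtmlmd installed, this would be registered as to_xhtml(markdown, callbacks={"table": wrap_table}).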

Callbacks can also render bracket math as MathML:

from math_core import LatexToMathML
from xhtmlmd import to_xhtml

mathml = LatexToMathML()

def render_math(node, default_html):
    html = mathml.convert_with_local_counter(node["tex"], displaystyle=node["type"] == "math_block")
    return html + ("\n" if node["type"] == "math_block" else "")

html = to_xhtml(markdown, callbacks={"math_inline": render_math, "math_block": render_math})
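
Converters such as the one above can raise on malformed TeX. How xhtmlmd treats an exception raised inside a callback is not stated here, so one cautious option is to catch it in the callback and return None, which keeps the default XHTML. The helper name fall_back_on_error and the logger name are assumptions made for this sketch.

```python
import logging

log = logging.getLogger("xhtmlmd.callbacks")  # hypothetical logger name

def fall_back_on_error(render):
    """Wrap a callback so that any exception keeps the default XHTML."""
    def wrapper(node, default_html):
        try:
            return render(node, default_html)
        except Exception:
            log.warning("callback failed for %s node", node.get("type"), exc_info=True)
            return None  # None asks the renderer to keep its default output
    return wrapper

def always_fails(node, default_html):
    raise ValueError("bad TeX")

# The wrapped callback swallows the error and returns None.
print(fall_back_on_error(always_fails)({"type": "math_inline", "tex": r"\frac"}, "<span/>"))
```

It would then be passed as callbacks={"math_inline": fall_back_on_error(render_math)}.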

Command-line usage (the xhtmlmd script is installed with the package):

xhtmlmd input.md > out.xhtml
cat input.md | xhtmlmd --math=dollars

Parsing strategy

The parser uses the two-phase strategy described in the CommonMark parsing-strategy appendix: first build the block tree and collect link reference definitions, then parse raw inline text with the completed reference table. It tracks visual columns and byte offsets for each line and builds blocks with an arena-backed open-container stack. The stack has typed nodes for block quotes, lists, paragraphs/setext candidates, fenced and indented code, raw HTML, GFM table candidates, math, footnote definitions, definition lists, fenced divs, and markdown-in-HTML containers. Inlines are scanned into atoms, bracket openers, and delimiter runs; links/images/spans resolve through the bracket stack, while emphasis/strong/strikethrough resolve through the delimiter stack. Inputs that can otherwise explode have explicit bounds: inline nesting, block/container nesting, link label length, and link parenthesis nesting.
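
The reason for two phases is easiest to see with a reference link that is used before it is defined. The following is a deliberately tiny Python sketch of the idea, not the crate's Rust implementation: it handles only one-line definitions and full [text][label] links, and it does no escaping. All names in it are made up for illustration.

```python
import re

# Toy patterns: a one-line definition "[label]: destination"
# and a full reference link "[text][label]".
REF_DEF = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$")
REF_LINK = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")

def normalize(label):
    # Labels match case-insensitively with internal whitespace collapsed.
    return " ".join(label.split()).casefold()

def two_phase(source):
    refs, paragraphs = {}, []
    # Phase 1: block pass. Collect definitions, keep other lines as raw inline text.
    for line in source.splitlines():
        match = REF_DEF.match(line)
        if match:
            refs.setdefault(normalize(match.group(1)), match.group(2))  # first definition wins
        elif line.strip():
            paragraphs.append(line)

    # Phase 2: inline pass, run only once the reference table is complete.
    def resolve(match):
        url = refs.get(normalize(match.group(2)))
        if url is None:
            return match.group(0)  # unknown label: leave the text untouched
        return f'<a href="{url}">{match.group(1)}</a>'

    return [f"<p>{REF_LINK.sub(resolve, text)}</p>" for text in paragraphs]

doc = "See [the spec][CM].\n\n[cm]: https://spec.commonmark.org/\n"
print(two_phase(doc))
```

A single-pass parser would reach [the spec][CM] before it had seen the definition on the last line; deferring inline parsing avoids that.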

The link parser uses raw reference-label scanning, bounded parenthesis nesting, bounded link labels, URI escaping for rendered href/src attributes, and a plain-text fast path for inputs with no possible inline constructs. This keeps adversarial inputs such as deeply nested brackets, long blockquote runs, repeated ![[](), and unclosed comments in predictable time.
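
Bounded parenthesis nesting can be sketched as a scanner that counts depth and gives up once a limit is exceeded, so the work per destination stays linear in its length. This is an illustrative Python sketch rather than the crate's code, and the limit of 32 is an assumed value; the actual bound used by xhtmlmd is not given here.

```python
MAX_PAREN_DEPTH = 32  # hypothetical limit, chosen only for this sketch

def scan_destination(text, start=0, max_depth=MAX_PAREN_DEPTH):
    """Return the index just past a bare link destination, or None if rejected."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # a backslash escape never opens or closes a parenthesis
            continue
        if ch == "(":
            depth += 1
            if depth > max_depth:
                return None  # stop instead of tracking unbounded nesting
        elif ch == ")":
            if depth == 0:
                break  # this ")" belongs to the enclosing link syntax
            depth -= 1
        elif ch.isspace():
            break
        i += 1
    # Unbalanced parentheses mean the text is not a valid destination.
    return i if depth == 0 else None

print(scan_destination("a(b)c) rest"))
print(scan_destination("(" * 100 + ")" * 100))
```

The first call stops at the unmatched closing parenthesis, and the second is rejected as soon as the depth passes the limit.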

Raw HTML is preserved by default. Supported raw HTML container tags such as div, section, table, svg, math, and custom elements stay open across blank lines until their matching close tag, with same-tag nesting counted; void and self-closing tags do not open balanced containers. Markdown inside raw HTML remains raw unless the open tag that starts the Markdown block uses markdown="1"; this crate does not recursively look for markdown controls inside otherwise-raw HTML. Options::default().tagfilter is false; enabling it applies GFM-style filtering for tags such as script, style, xmp, and textarea. This is compatibility and extra protection, not a replacement for sanitizing untrusted rendered HTML.
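
The "same-tag nesting counted" rule can be pictured as a depth counter over opening and closing tags of one name. The sketch below is a simplified Python illustration under stated assumptions: it uses a regular expression instead of a real tag scanner, ignores void elements and attribute values that contain ">", and the function name is invented.

```python
import re

def find_container_end(lines, tag):
    """Return the index of the line with the matching close tag, or None."""
    # Group 1 is "/" for a close tag, group 2 is "/" for a self-closing tag.
    tag_re = re.compile(rf"<(/?){re.escape(tag)}(?=[\s>/])[^>]*?(/?)>", re.IGNORECASE)
    depth = 0
    for index, line in enumerate(lines):
        for match in tag_re.finditer(line):
            is_close, is_self_closing = match.group(1), match.group(2)
            if is_close:
                depth -= 1
                if depth <= 0:
                    return index  # the container opened first is now closed
            elif not is_self_closing:
                depth += 1  # self-closing tags do not open a container
    return None

block = ["<div>", "", "<div>inner</div>", "", "</div>", "after"]
print(find_container_end(block, "div"))
```

Blank lines do not end the container; only the close tag that brings the depth back to zero does.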

Tests

maturin develop && pytest -q

The spec-conformance suite is tests/test_conformance.py: it renders the fixtures under tests/source/ and compares normalized HTML trees. Run just that file with pytest tests/test_conformance.py -v to see per-example ids.
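
Comparing normalized trees rather than raw strings means that differences such as attribute order or insignificant whitespace do not cause failures. How the test suite normalizes is not described here; the following is one possible approach using the standard library's XML canonicalization, with an invented helper name. Stripping text whitespace is lossy for inline content, so this is a sketch of the idea rather than a drop-in comparator.

```python
import xml.etree.ElementTree as ET

def normalized(fragment):
    # Wrap the fragment so it has a single root, then canonicalize:
    # attributes are sorted, empty elements get explicit end tags, and
    # whitespace around text is stripped.
    return ET.canonicalize(f"<root>{fragment}</root>", strip_text=True)

expected = '<p class="a" id="b">hi</p>\n<hr/>\n'
actual = '<p id="b" class="a">hi</p><hr></hr>'
print(normalized(expected) == normalized(actual))
```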

Download files

Source Distribution

xhtmlmd-0.1.5.tar.gz (126.3 kB)

Uploaded Source

Built Distributions

xhtmlmd-0.1.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (424.7 kB)

Uploaded CPython 3.13, manylinux: glibc 2.17+ x86-64

xhtmlmd-0.1.5-cp313-cp313-macosx_11_0_arm64.whl (382.5 kB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

xhtmlmd-0.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (424.9 kB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

xhtmlmd-0.1.5-cp312-cp312-macosx_11_0_arm64.whl (382.6 kB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

xhtmlmd-0.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (425.6 kB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

xhtmlmd-0.1.5-cp311-cp311-macosx_11_0_arm64.whl (384.5 kB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

xhtmlmd-0.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (426.0 kB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64

xhtmlmd-0.1.5-cp310-cp310-macosx_11_0_arm64.whl (385.0 kB)

Uploaded CPython 3.10, macOS 11.0+ ARM64

File details

Details for the file xhtmlmd-0.1.5.tar.gz.

File metadata

  • Download URL: xhtmlmd-0.1.5.tar.gz
  • Upload date:
  • Size: 126.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for xhtmlmd-0.1.5.tar.gz
Algorithm Hash digest
SHA256 696b5bf15e418856f59a172620ee4dee8a80c5cde94cccd402f1e5f6b149da61
MD5 a87b5221f0e0d4705cf305a443545fd8
BLAKE2b-256 ed8a5c82f5f4af994068a3b28628746f5fd698e22acf3cb8f0bb36256fcdab43

Provenance

The following attestation bundles were made for xhtmlmd-0.1.5.tar.gz:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file xhtmlmd-0.1.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 446c618bb0554efa226df0c426fbc3885b1cbcfbc872e04ef7383031ac252183
MD5 64db40d4c55ef094355e37bf4f36ce33
BLAKE2b-256 1ddbc78566cac654641986652e65accc090610fd82f6c1e47ff2885800cb3233

File details

Details for the file xhtmlmd-0.1.5-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 294dbfedd81eeae8a80b969a5324907bce1f40d4ddd27a5ee4e052ee1eb6fc0f
MD5 a2bf09efbe30ae34cbf8a936d89a5f2d
BLAKE2b-256 fbf7522b27fd794bb715ff6b4376d595ec271874751868d36e609a9721e55eef

File details

Details for the file xhtmlmd-0.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 681e9e61aa1ab799e485891be598bd055ddda85a3a1ba9fa3a92974466f7cddf
MD5 76583480fd4ccfc4cb3cf71b3788f486
BLAKE2b-256 3bfa125f7c8cd9ee91237fea1b2d7ebf680339d01437f81b04b7578a5df0821d

File details

Details for the file xhtmlmd-0.1.5-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 83c65d78a077225dc1b0fbaa0e797ff39f9fd49259293d66734c9d5a9a6e889b
MD5 57856fc3e5956fa9100517eb39f0e7f4
BLAKE2b-256 f37395242163223499faa02ad8a52d1e1fe1d77b8e23b97e9e6473afd5372782

File details

Details for the file xhtmlmd-0.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d327784a5cb588e9cc2229b430aa77833302777597e6526ce9692d6bc849b2fb
MD5 42a22bd15323d48ff9931253de667098
BLAKE2b-256 290a3f352968d1f82658c46fd5662ab5e28e9012ebc7d021c70c229232977539

File details

Details for the file xhtmlmd-0.1.5-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8bb2d3f9987fdbbfd942e39a5b6c6369720f8332c8a9bf2a0d75bb682a3d30ca
MD5 1dde7e0cddf2a28bf102775a465e90f9
BLAKE2b-256 40a45e73d1131e33368c3230d1f9bd05b84f93ac2bb8375849b6011bcb7a0213

File details

Details for the file xhtmlmd-0.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e2fa045dab9344d031141b797c72c0658595ef025553116ac8fa8f806e2cc313
MD5 17f7c3bc480253c7a6e8ae08ed0316b8
BLAKE2b-256 768fa4fde9adb9113a628efc6fe1107c282ad7a21e31f6ab1d560e97d8cb38c7

File details

Details for the file xhtmlmd-0.1.5-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.5-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 90a31a9f3c71ad7715b6f8e0ebfe9e3e7354c155b8115b75eb7b214191878232
MD5 dfb9a5b45aebbda3b3ff645e1f874796
BLAKE2b-256 20b731fe2d2dae24114cbb01ff97b189f5e6e4821e52836f04d814524aed641a
