A bounded-time, Pandoc-leaning Markdown parser with GFM, Extra/kramdown, math, fenced divs, and XHTML output.

xhtmlmd

A Rust Markdown parser and XHTML renderer.

The parser is tree-oriented. It preserves the structure and attributes needed for XHTML output, but it does not try to round-trip source text. The dialect follows CommonMark and GFM for the core and GFM features, and leans toward Pandoc where extension families disagree.

xhtmlmd is largely implemented using AI, except for the tests. The tests are largely adapted from cmark-gfm, PHP Markdown Extra, kramdown, and Mistlefoot. Credit for xhtmlmd really belongs to the authors of these tests, and of the CommonMark docs, which is where the hard work was done.

Implemented syntax

  • Core block syntax: paragraphs, ATX/setext headings, thematic breaks, block quotes, ordered/unordered lists, indented code, raw HTML, link reference definitions.
  • GFM: pipe tables with alignment, task lists, ~~x~~ strikethrough, angle and bare autolinks, plus opt-in tagfiltering.
  • Code: backtick/tilde fenced code blocks, info strings, and Pandoc-style code attributes.
  • HTML-in-Markdown: block containers opened with markdown="1"; the control attribute is stripped, indented code blocks are disabled inside the container, and fenced code is the code-block syntax there.
  • Math: four modes. brackets recognizes \(...\) and \[...\]; dollars recognizes those plus $...$ and $$...$$, using Pandoc's non-space/digit dollar rules; on preserves the \(...\) and \[...\] delimiters for client-side renderers such as KaTeX; off disables math. brackets is the default.
  • Attributes and inline spans: Pandoc/kramdown-style {#id .class key="value"}, block IALs {: ...}, span IALs, ALDs such as {:note: #id .class} with references, superscript ^x^, subscript ~x~, and highlight ==x==.
  • Definition lists: PHP Markdown Extra/Pandoc-style Term followed by : definition or ~ definition.
  • Footnotes: [^id] references to defined [^id]: definitions with indented continuation blocks.
  • Abbreviations: *[HTML]: Hyper Text Markup Language definitions render matching text as <abbr>.
  • Fenced divs: Pandoc/Quarto/Djot-style ::: containers with attributes or a single class word.
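
The non-space/digit dollar rule mentioned under Math can be illustrated with a small Python sketch. This is not xhtmlmd's implementation: the regex and the helper name find_inline_dollar_math are assumptions made for illustration, and only inline $...$ is covered. The rule, as documented by Pandoc, is that the opening $ must be followed by a non-space character, and the closing $ must be preceded by a non-space character and must not be followed by a digit.

```python
import re

# Hypothetical helper, not part of xhtmlmd. Simplified inline-only rule:
#   - opening "$" is not escaped, not part of "$$", and is followed by non-space
#   - closing "$" is preceded by non-space and not followed by a digit or "$"
INLINE_DOLLAR = re.compile(
    r"(?<![\\$])\$(?=[^\s$])"   # opening delimiter
    r"((?:\\.|[^$\\])+?)"       # body: escaped pairs or non-dollar characters
    r"(?<=\S)\$(?![\d$])"       # closing delimiter
)

def find_inline_dollar_math(text):
    """Return the TeX bodies of inline $...$ spans found in text."""
    return [m.group(1) for m in INLINE_DOLLAR.finditer(text)]
```

Under this rule, prose such as "costs $20 and $30" is left alone, because the only candidate closing dollar is preceded by a space.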

Usage

Install via pip to get both the Python API and the native xhtmlmd CLI:

pip install xhtmlmd

The CLI reads Markdown from stdin or from an optional file path and writes an XHTML fragment to stdout:

echo '# Hello' | xhtmlmd
xhtmlmd input.md > out.xhtml
xhtmlmd --math=on input.md > out.xhtml
xhtmlmd --math=dollars input.md > out.xhtml

Python API:

from xhtmlmd import to_xhtml

html = to_xhtml(r"\(x^2\)")
html_for_katex = to_xhtml(r"\(x^2\)", math="on")
html_with_dollars = to_xhtml("$x$", math="dollars")

Callbacks

Python callers can override rendered nodes with callbacks. Each callback receives a node dict and the default XHTML for that node. Return None to keep the default, or return replacement XHTML.

Callback names:

  • Blocks: paragraph, heading, block_quote, list, definition_list, code_block, html_block, html_container, thematic_break, table, div, math_block
  • Inlines: text, soft_break, hard_break, emph, strong, strike, superscript, subscript, highlight, code, link, image, autolink, abbr, html_inline, math_inline, footnote_ref, span

For example, a code_block callback can hand Python code to a syntax highlighter:

from fastpylight import highlight
from xhtmlmd import to_xhtml

def highlight_code(node, default_html):
    if node["lang"] != "python": return None
    return highlight(node["text"], node["lang"]) + "\n"

html = to_xhtml(markdown, callbacks={"code_block": highlight_code})

Callbacks can also render bracket math as MathML:

from math_core import LatexToMathML
from xhtmlmd import to_xhtml

mathml = LatexToMathML()

def render_math(node, default_html):
    html = mathml.convert_with_local_counter(node["tex"], displaystyle=node["type"] == "math_block")
    return html + ("\n" if node["type"] == "math_block" else "")

html = to_xhtml(markdown, callbacks={"math_inline": render_math, "math_block": render_math})
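
A callback does not need to inspect the node at all. The following sketch relies only on the contract described above (a node dict and the default XHTML go in; None or replacement XHTML comes out). The function names and the table-wrap class are made up for illustration, and the trailing newline on block output is an assumption based on the examples above.

```python
def wrap_table(node, default_html):
    """Wrap the default table XHTML in a container element.

    Only default_html is used, so no assumption is made about the
    keys present in the node dict.
    """
    inner = default_html.rstrip("\n")
    return '<div class="table-wrap">' + inner + "</div>\n"

def keep_default(node, default_html):
    """Returning None keeps the renderer's default output."""
    return None

# Keys are callback names from the list above.
callbacks = {"table": wrap_table, "thematic_break": keep_default}
```

Passing callbacks=callbacks to to_xhtml, as in the examples above, would then wrap every rendered table while leaving thematic breaks unchanged.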

Rust/source usage:

cargo run --release -- input.md > out.xhtml
cat input.md | cargo run --release -- --math=dollars

Library usage:

use xhtmlmd::{to_xhtml, Options, MathMode};

let mut options = Options::default();
options.math = MathMode::Dollars;
let html = to_xhtml("$x$", &options);

Parsing strategy

The parser uses the two-phase strategy described in the CommonMark parsing-strategy appendix: first build the block tree and collect link reference definitions, then parse raw inline text with the completed reference table. It tracks visual columns and byte offsets for each line and builds blocks with an arena-backed open-container stack. The stack has typed nodes for block quotes, lists, paragraphs/setext candidates, fenced and indented code, raw HTML, GFM table candidates, math, footnote definitions, definition lists, fenced divs, and markdown-in-HTML containers. Inlines are scanned into atoms, bracket openers, and delimiter runs; links/images/spans resolve through the bracket stack, while emphasis/strong/strikethrough resolve through the delimiter stack. Inputs that can otherwise explode have explicit bounds: inline nesting, block/container nesting, link label length, and link parenthesis nesting.
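
The reason for two phases is that a reference link may be used before its definition appears. A toy Python sketch of the idea follows; it is not the crate's parser. It handles only paragraphs, one-line definitions, and full or collapsed [text][label] references, omits escaping, and all names in it are hypothetical.

```python
import re

REF_DEF = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*(\S+)\s*$")
REF_USE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")

def normalize(label):
    """Collapse whitespace and case-fold, roughly as CommonMark matches labels."""
    return " ".join(label.split()).casefold()

def parse(markdown):
    refs, paragraphs, current = {}, [], []
    # Phase 1: build block structure and collect reference definitions.
    for line in markdown.splitlines():
        m = REF_DEF.match(line)
        if m and not current:           # a definition cannot interrupt a paragraph
            refs.setdefault(normalize(m.group(1)), m.group(2))  # first one wins
        elif line.strip():
            current.append(line)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    # Phase 2: parse inline text with the completed reference table.
    def resolve(m):
        text, label = m.group(1), m.group(2) or m.group(1)
        url = refs.get(normalize(label))
        if url is None:
            return m.group(0)           # unknown label stays literal text
        return f'<a href="{url}">{text}</a>'

    return [f"<p>{REF_USE.sub(resolve, p)}</p>" for p in paragraphs]
```

Because inline parsing starts only after every block has been seen, a definition at the end of the document resolves a reference in the first paragraph.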

The link parser uses raw reference-label scanning, bounded parenthesis nesting, bounded link labels, URI escaping for rendered href/src attributes, and a plain-text fast path for inputs with no possible inline constructs. This keeps adversarial inputs such as deeply nested brackets, long blockquote runs, repeated ![[](), and unclosed comments in predictable time.
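
Two of these ideas, a nesting bound and a plain-text fast path, can be sketched in Python as follows. This is an illustration under stated assumptions, not the crate's code: the limit of 32, the trigger character set, and the function names are all invented here.

```python
import html

def match_brackets(text, max_depth=32):
    """Pair '[' with ']' while keeping at most max_depth openers on the stack.

    Openers beyond the bound are treated as literal text, so the stack
    stays O(max_depth) and the scan stays O(len(text)).
    """
    stack, pairs, overflow = [], [], 0
    for i, ch in enumerate(text):
        if ch == "[":
            if len(stack) < max_depth:
                stack.append(i)
            else:
                overflow += 1       # too deep: opener is demoted to text
        elif ch == "]":
            if overflow:
                overflow -= 1       # closes a demoted opener
            elif stack:
                pairs.append((stack.pop(), i))
    return pairs

# Assumed set of characters that could start an inline construct.
INLINE_TRIGGERS = frozenset("\\`*_~^=[]!<&$:{\n")

def render_inline(text, slow_path):
    """Escape and return text directly when no inline construct is possible."""
    if INLINE_TRIGGERS.isdisjoint(text):
        return html.escape(text, quote=False)
    return slow_path(text)
```

With a bound in place, an input made of thousands of unclosed openers costs one linear pass and a fixed-size stack rather than growing with the nesting depth.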

Raw HTML is preserved by default. Supported raw HTML container tags such as div, section, table, svg, math, and custom elements stay open across blank lines until their matching close tag, with same-tag nesting counted; void and self-closing tags do not open balanced containers. Markdown inside raw HTML remains raw unless the open tag that starts the Markdown block uses markdown="1"; this crate does not recursively look for markdown controls inside otherwise-raw HTML. Options::default().tagfilter is false; enabling it applies GFM-style filtering for tags such as script, style, xmp, and textarea. This is compatibility and extra protection, not a replacement for sanitizing untrusted rendered HTML.
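
The same-tag counting described above can be sketched in Python. This is a simplified illustration, not the crate's implementation: it assumes the container's open tag is on the first line, matches tags with a regex rather than a real HTML tokenizer, and the name container_end is hypothetical.

```python
import re

# Captures: optional leading "/", tag name, optional trailing "/" (self-closing).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)\b[^>]*?(/?)>")

def container_end(lines, tag):
    """Return the index of the line that closes the container opened on lines[0].

    Nested open tags with the same name increase the depth, self-closing
    tags are ignored, and blank lines do not end the container.
    Returns None when the container is never closed.
    """
    depth = 0
    for idx, line in enumerate(lines):
        for closing, name, self_closing in TAG.findall(line):
            if name.lower() != tag:
                continue
            if closing:
                depth -= 1
                if depth <= 0:
                    return idx
            elif not self_closing:
                depth += 1
    return None
```

For the lines `<div>`, a blank line, `<div>inner</div>`, `text`, `</div>`, the function returns the index of the final `</div>` rather than stopping at the inner close tag.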

Tests

ship-rs-test

Use cargo test --test conformance -- --nocapture when you want the per-section conformance report. The harness supports XHTML_MD_CONFORMANCE_SECTION, XHTML_MD_CONFORMANCE_EXAMPLE, XHTML_MD_CONFORMANCE_LIMIT, and XHTML_MD_CONFORMANCE_TRACE for narrowing failures.

Download files

Source distribution

  • xhtmlmd-0.1.3.tar.gz (131.1 kB)

Built distributions

  • xhtmlmd-0.1.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (745.2 kB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • xhtmlmd-0.1.3-cp313-cp313-macosx_11_0_arm64.whl (657.2 kB): CPython 3.13, macOS 11.0+, ARM64
  • xhtmlmd-0.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (745.0 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • xhtmlmd-0.1.3-cp312-cp312-macosx_11_0_arm64.whl (657.3 kB): CPython 3.12, macOS 11.0+, ARM64
  • xhtmlmd-0.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (744.2 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • xhtmlmd-0.1.3-cp311-cp311-macosx_11_0_arm64.whl (660.1 kB): CPython 3.11, macOS 11.0+, ARM64
  • xhtmlmd-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (744.8 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • xhtmlmd-0.1.3-cp310-cp310-macosx_11_0_arm64.whl (660.6 kB): CPython 3.10, macOS 11.0+, ARM64

File details

Details for the file xhtmlmd-0.1.3.tar.gz.

File metadata

  • Download URL: xhtmlmd-0.1.3.tar.gz
  • Upload date:
  • Size: 131.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for xhtmlmd-0.1.3.tar.gz

  • SHA256: 0d13eebbb98bced75c64bf320e63a82c241a9a83baf93ba522e8c26cec73b427
  • MD5: ed08a5e89b3dd4f722fd2c10e377501c
  • BLAKE2b-256: abd3e272a67102427063ced4278aa1ad7e59111abba0afe28d2d47dbcf8720ad

Provenance

The following attestation bundles were made for xhtmlmd-0.1.3.tar.gz:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
