A bounded-time, Pandoc-leaning Markdown parser with GFM, Extra/kramdown, math, fenced divs, and XHTML output.

xhtmlmd

A Rust Markdown parser and XHTML renderer.

The parser is tree-oriented. It preserves the structure and attributes needed for XHTML output, but it does not try to round-trip source text. The dialect follows CommonMark for core syntax and GFM for the GFM features, with Pandoc-leaning choices where extension families disagree.

xhtmlmd is largely implemented using AI, except for the tests, which are mostly adapted from cmark-gfm, PHP Markdown Extra, kramdown, and Mistlefoot. Credit for xhtmlmd really belongs to the authors of these tests and of the CommonMark docs, which is where the hard work was done.

Implemented syntax

  • Core block syntax: paragraphs, ATX/setext headings, thematic breaks, block quotes, ordered/unordered lists, indented code, raw HTML, link reference definitions.
  • GFM: pipe tables with alignment, task lists, ~~x~~ strikethrough, angle and bare autolinks, plus the opt-in tagfilter extension.
  • Code: backtick/tilde fenced code blocks, info strings, and Pandoc-style code attributes.
  • HTML-in-Markdown: block containers opened with markdown="1" have their contents parsed as Markdown; the control attribute is stripped, indented code blocks are disabled inside the container, and fenced code is the only code-block syntax there.
  • Math: four modes. In brackets mode (the default), \(...\) and \[...\] are recognized. Dollars mode also recognizes $...$ and $$...$$ using Pandoc's rules: the opening $ must be followed by a non-space character, and the closing $ must be preceded by a non-space character and must not be followed by a digit. On mode preserves the \(...\) and \[...\] delimiters in the output for client-side renderers such as KaTeX. Off mode disables math.
  • Attributes and inline spans: Pandoc/kramdown-style {#id .class key="value"}, block IALs {: ...}, span IALs, attribute list definitions (ALDs) such as {:note: #id .class} that can be referenced by name, superscript ^x^, subscript ~x~, and highlight ==x==.
  • Definition lists: PHP Markdown Extra/Pandoc-style Term followed by : definition or ~ definition.
  • Footnotes: [^id] references to defined [^id]: definitions with indented continuation blocks.
  • Abbreviations: *[HTML]: Hyper Text Markup Language definitions render matching text as <abbr>.
  • Fenced divs: Pandoc/Quarto/Djot-style ::: containers with attributes or a single class word.
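
The dollars-mode rule for inline math is easiest to see in code. The sketch below applies Pandoc's documented $...$ rule (non-space after the opening $, non-space before the closing $, no digit right after the closing $). The function name find_inline_dollar_math is hypothetical, and the sketch deliberately ignores backslash escapes, code spans, and $$...$$ display math; it illustrates the rule, not xhtmlmd's scanner.

```python
from typing import List, Tuple


def find_inline_dollar_math(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of $...$ inline math; end is exclusive.

    Simplified sketch: ignores backslash escapes, code spans, and $$...$$.
    """
    spans = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "$":
            i += 1
            continue
        if i + 1 < n and text[i + 1] == "$":
            i += 2  # "$$" starts display math, which this sketch skips
            continue
        if i + 1 >= n or text[i + 1].isspace():
            i += 1  # an opening $ must be followed by a non-space character
            continue
        close = None
        for j in range(i + 2, n):
            # A closing $ needs a non-space character before it and no digit after it.
            if (text[j] == "$" and not text[j - 1].isspace()
                    and not (j + 1 < n and text[j + 1].isdigit())):
                close = j
                break
        if close is None:
            i += 1  # no valid closer: this $ stays literal text
            continue
        spans.append((i, close + 1))
        i = close + 1
    return spans


print(find_inline_dollar_math("$a$ and $b$"))                # [(0, 3), (8, 11)]
print(find_inline_dollar_math("costs $20,000 and $30,000"))  # []
```

The digit rule is what keeps prices such as $20,000 from turning into math.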

Usage

Install via pip to get both the Python API and the native xhtmlmd CLI:

pip install xhtmlmd

The CLI reads Markdown from stdin or from an optional file path and writes an XHTML fragment to stdout:

echo '# Hello' | xhtmlmd
xhtmlmd input.md > out.xhtml
xhtmlmd --math=on input.md > out.xhtml
xhtmlmd --math=dollars input.md > out.xhtml

Python API:

from xhtmlmd import to_xhtml

html = to_xhtml(r"\(x^2\)")
html_for_katex = to_xhtml(r"\(x^2\)", math="on")
html_with_dollars = to_xhtml("$x$", math="dollars")

Callbacks

Python callers can override rendered nodes with callbacks. Each callback receives a node dict and the default XHTML for that node. Return None to keep the default, or return replacement XHTML.

Callback names:

  • Blocks: paragraph, heading, block_quote, list, definition_list, code_block, html_block, html_container, thematic_break, table, div, math_block
  • Inlines: text, soft_break, hard_break, emph, strong, strike, superscript, subscript, highlight, code, link, image, autolink, abbr, html_inline, math_inline, footnote_ref, span
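
For example, this callback highlights Python code blocks with fastpylight and leaves all other code blocks to the default renderer:

from fastpylight import highlight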
from xhtmlmd import to_xhtml

def highlight_code(node, default_html):
    if node["lang"] != "python": return None
    return highlight(node["text"], node["lang"]) + "\n"

html = to_xhtml(markdown, callbacks={"code_block": highlight_code})

Callbacks can also render bracket math as MathML:

from math_core import LatexToMathML
from xhtmlmd import to_xhtml

mathml = LatexToMathML()

def render_math(node, default_html):
    html = mathml.convert_with_local_counter(node["tex"], displaystyle=node["type"] == "math_block")
    return html + ("\n" if node["type"] == "math_block" else "")

html = to_xhtml(markdown, callbacks={"math_inline": render_math, "math_block": render_math})
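
The callback contract above (return None to keep the default XHTML, or a string to replace it) can be modelled in a few lines. This is a hypothetical sketch of the dispatch step, not xhtmlmd's internals: apply_callback is an invented helper, and it assumes each node's "type" value equals its callback name, as in the math example where node["type"] is "math_block".

```python
from typing import Callable, Dict, Optional

# A callback takes (node dict, default XHTML) and returns replacement XHTML or None.
Callback = Callable[[dict, str], Optional[str]]


def apply_callback(node: dict, default_html: str,
                   callbacks: Dict[str, Callback]) -> str:
    """Return the XHTML to emit for one node (hypothetical dispatch helper)."""
    callback = callbacks.get(node["type"])
    if callback is None:
        return default_html  # no callback registered for this node type
    replacement = callback(node, default_html)
    # Test for None explicitly so that "" can still mean "emit nothing".
    return default_html if replacement is None else replacement


def drop_rules(node, default_html):
    return ""  # suppress every thematic break


def keep_default(node, default_html):
    return None  # defer to the built-in renderer


callbacks = {"thematic_break": drop_rules, "code_block": keep_default}
print(repr(apply_callback({"type": "thematic_break"}, "<hr />\n", callbacks)))  # ''
print(repr(apply_callback({"type": "paragraph"}, "<p>x</p>\n", callbacks)))    # '<p>x</p>\n'
```

Checking `is None` rather than truthiness is the design point in this sketch: an empty string is a legitimate replacement.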

Rust/source usage:

cargo run --release -- input.md > out.xhtml
cat input.md | cargo run --release -- --math=dollars

Library usage:

use xhtmlmd::{to_xhtml, Options, MathMode};

let mut options = Options::default();
options.math = MathMode::Dollars;
let html = to_xhtml("$x$", &options);

Parsing strategy

The parser uses the two-phase strategy described in the CommonMark parsing-strategy appendix: first build the block tree and collect link reference definitions, then parse raw inline text with the completed reference table. It tracks visual columns and byte offsets for each line and builds blocks with an arena-backed open-container stack. The stack has typed nodes for block quotes, lists, paragraphs/setext candidates, fenced and indented code, raw HTML, GFM table candidates, math, footnote definitions, definition lists, fenced divs, and markdown-in-HTML containers.

Inlines are scanned into atoms, bracket openers, and delimiter runs; links/images/spans resolve through the bracket stack, while emphasis/strong/strikethrough resolve through the delimiter stack. Inputs that could otherwise cause pathological parse times have explicit bounds: inline nesting, block/container nesting, link label length, and link parenthesis nesting.
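
The two-phase split matters for references: a link may use a label that is only defined further down, so definitions must be collected before inline parsing. The sketch below shows that ordering with CommonMark's label-matching rules (Unicode case fold, trimmed and collapsed whitespace, first definition wins, at most 999 characters in a label). The regexes and function names are hypothetical simplifications (one definition per line, no titles, no nested brackets, no escaping of URLs or link text), not xhtmlmd's scanner.

```python
import re
from typing import Dict, List, Tuple

MAX_LABEL_CHARS = 999  # CommonMark: at most 999 characters between the brackets

# Simplified grammar: one definition per line, bare destination, no title.
DEF_RE = re.compile(r"^ {0,3}\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$")
# Full "[text][label]" and collapsed "[text][]" references only.
REF_RE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


def normalize_label(label: str) -> str:
    """Case-fold, trim, and collapse internal whitespace, as CommonMark specifies."""
    return re.sub(r"[ \t\r\n]+", " ", label.strip(" \t\r\n")).casefold()


def collect_definitions(lines: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Phase 1: remove definitions from the block stream and build the table."""
    table: Dict[str, str] = {}
    body: List[str] = []
    for line in lines:
        m = DEF_RE.match(line)
        key = normalize_label(m.group(1)) if m else ""
        if m and key and len(m.group(1)) <= MAX_LABEL_CHARS:
            table.setdefault(key, m.group(2))  # the first definition wins
        else:
            body.append(line)
    return table, body


def resolve_references(text: str, table: Dict[str, str]) -> str:
    """Phase 2: resolve references in inline text against the finished table."""
    def link(m):
        url = table.get(normalize_label(m.group(2) or m.group(1)))
        if url is None:
            return m.group(0)  # unknown label: keep the source text
        return f'<a href="{url}">{m.group(1)}</a>'
    return REF_RE.sub(link, text)


source = [
    "See [the spec][CM] and [Spec][].",  # used before it is defined
    "",
    "[cm]: https://spec.commonmark.org/",
    "[ SPEC ]: https://example.com/spec",
    "[CM]: https://ignored.example/",    # duplicate label: ignored
]
table, body = collect_definitions(source)
print(resolve_references(body[0], table))
```

Because phase 1 finishes before phase 2 starts, the forward reference on the first line resolves, and the duplicate [CM] definition never overrides the first one.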

The link parser uses raw reference-label scanning, bounded parenthesis nesting, bounded link labels, URI escaping for rendered href/src attributes, and a plain-text fast path for inputs with no possible inline constructs. This keeps parse time predictable on adversarial inputs such as deeply nested brackets, long blockquote runs, repeated ![[](), and unclosed comments.
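
Bounded parenthesis nesting can be illustrated with a scanner for bare link destinations, the part between ( and ) in [text](dest). CommonMark allows balanced, unescaped parentheses there and lets implementations cap the nesting depth (it asks for at least three levels). The cap of 32 and the name scan_link_destination are assumptions for illustration, not xhtmlmd's actual values, and the <...> destination form is not handled.

```python
import string
from typing import Optional

MAX_PAREN_DEPTH = 32  # assumed cap; CommonMark only requires at least 3 levels


def scan_link_destination(s: str, start: int = 0) -> Optional[int]:
    """Scan a bare link destination starting at s[start].

    Returns the index just past the destination, or None if it is empty,
    has unbalanced parentheses, or nests them deeper than MAX_PAREN_DEPTH.
    """
    if start < len(s) and s[start] == "<":
        return None  # <...> destinations follow a different rule
    depth = 0
    i = start
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s) and s[i + 1] in string.punctuation:
            i += 2  # escaped punctuation such as \( does not change the depth
            continue
        if c == " " or ord(c) < 0x20 or ord(c) == 0x7F:
            break  # spaces and ASCII control characters end the destination
        if c == "(":
            depth += 1
            if depth > MAX_PAREN_DEPTH:
                return None  # reject instead of tracking unbounded depth
        elif c == ")":
            if depth == 0:
                break  # this ")" closes the link, not the destination
            depth -= 1
        i += 1
    if i == start or depth != 0:
        return None
    return i


link = "[docs](https://example.com/a_(b)) trailing"
start = link.index("(") + 1  # first "(" is the one right after "]"
end = scan_link_destination(link, start)
print(link[start:end])  # https://example.com/a_(b)
```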

Raw HTML is preserved by default. Supported raw HTML container tags such as div, section, table, svg, math, and custom elements stay open across blank lines until their matching close tag, with nested tags of the same name counted; void and self-closing tags do not open balanced containers. Markdown inside raw HTML remains raw unless the tag that opens the HTML block carries markdown="1"; this crate does not recursively look for markdown controls inside otherwise-raw HTML. Options::default().tagfilter is false; enabling it applies GFM-style filtering to tags such as script, style, xmp, and textarea. Tag filtering is for compatibility and defense in depth; it is not a replacement for sanitizing untrusted rendered HTML.
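
GFM's tagfilter rule is small enough to sketch. The GFM spec filters nine tags (title, textarea, style, xmp, iframe, noembed, noframes, script, plaintext), case-insensitively, by replacing the leading < with &lt;. The regex below is a minimal illustration of that rule on a raw HTML string; it is not xhtmlmd's code, and its tag-name boundary check is an assumption that may differ in edge cases.

```python
import re

# Tag names filtered by GFM's "Disallowed Raw HTML" (tagfilter) extension.
FILTERED_TAGS = ("title", "textarea", "style", "xmp", "iframe",
                 "noembed", "noframes", "script", "plaintext")

# Match the "<" of an opening or closing filtered tag whose name is followed
# by whitespace, ">", or "/>" (so "<scripts>" is left alone).
TAG_START = re.compile(
    r"<(?=/?(?:%s)(?:[\s>]|/>))" % "|".join(FILTERED_TAGS),
    re.IGNORECASE,
)


def tagfilter(raw_html: str) -> str:
    """Replace the leading "<" of each filtered tag with "&lt;"."""
    return TAG_START.sub("&lt;", raw_html)


print(tagfilter("<strong> <title> <style> <em>"))
# <strong> &lt;title> &lt;style> <em>
print(tagfilter("<XMP> and </script>"))
# &lt;XMP> and &lt;/script>
```

Only the opening < changes, so the rest of the tag text survives as visible characters; attributes such as onclick on other tags pass through untouched, which is why this is not a sanitizer.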

Tests

ship-rs-test

Use cargo test --test conformance -- --nocapture when you want the per-section conformance report. The harness supports XHTML_MD_CONFORMANCE_SECTION, XHTML_MD_CONFORMANCE_EXAMPLE, XHTML_MD_CONFORMANCE_LIMIT, and XHTML_MD_CONFORMANCE_TRACE for narrowing failures.


Download files

Download the file for your platform.

Source Distribution

xhtmlmd-0.1.4.tar.gz (131.2 kB)

Uploaded: Source

Built Distributions

xhtmlmd-0.1.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (745.5 kB)

Uploaded: CPython 3.13, manylinux (glibc 2.17+), x86-64

xhtmlmd-0.1.4-cp313-cp313-macosx_11_0_arm64.whl (657.3 kB)

Uploaded: CPython 3.13, macOS 11.0+, ARM64

xhtmlmd-0.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (745.3 kB)

Uploaded: CPython 3.12, manylinux (glibc 2.17+), x86-64

xhtmlmd-0.1.4-cp312-cp312-macosx_11_0_arm64.whl (657.5 kB)

Uploaded: CPython 3.12, macOS 11.0+, ARM64

xhtmlmd-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (744.4 kB)

Uploaded: CPython 3.11, manylinux (glibc 2.17+), x86-64

xhtmlmd-0.1.4-cp311-cp311-macosx_11_0_arm64.whl (660.3 kB)

Uploaded: CPython 3.11, macOS 11.0+, ARM64

xhtmlmd-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (745.0 kB)

Uploaded: CPython 3.10, manylinux (glibc 2.17+), x86-64

xhtmlmd-0.1.4-cp310-cp310-macosx_11_0_arm64.whl (660.8 kB)

Uploaded: CPython 3.10, macOS 11.0+, ARM64

File details

Details for the file xhtmlmd-0.1.4.tar.gz.

File metadata

  • Download URL: xhtmlmd-0.1.4.tar.gz
  • Upload date:
  • Size: 131.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for xhtmlmd-0.1.4.tar.gz
Algorithm Hash digest
SHA256 1404bc124ffec9b6d612fbdf1ffca2277c8d57f6ee6048b8038a4928eb6fb121
MD5 e994a00a48f91e21e5819b575ed0e6d5
BLAKE2b-256 8332af2cfd2ec2f88b5b46be782e4bee6569619a97bf4ff392e831cdca79a149

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4.tar.gz:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file xhtmlmd-0.1.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 784c3087a352bfe2a6a975a0f1ae7d975b356157d124cc4503995b67d6d1b4be
MD5 52eb9c89290b3d45003119d957f4fabe
BLAKE2b-256 349f48a5ea04f1d696ae0a90c74f347c1d1aed397ddf7f5db5ec342c4d2eb43e

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

File details

Details for the file xhtmlmd-0.1.4-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fb299acb6356341f8a9cdd46888e51898081799da0fd6ff674312d7c05973ddb
MD5 c10e0bd53ca536d2f74dc348db77b1ea
BLAKE2b-256 f3f3ff04a730c8558102b4e333472aa2ddd56868a1b58679e2e61e07960d86e0

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

File details

Details for the file xhtmlmd-0.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 42f3f57cc25230f5920898f42a5d7cc08bfc18c07ae9d7bbd49afe434e2453e1
MD5 8e2e83ef8740f15bd4453be6f6ae90ba
BLAKE2b-256 c1ada2de269387fdf47c3a801c1e5aaa477e6466604f55833b115021e94db1cf

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

File details

Details for the file xhtmlmd-0.1.4-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d3b83db99588cf0a972d78cd0e0a88fde17a070fc2d46b27d877d06f723410f5
MD5 b6a352cf0fc20bd87703868af162161e
BLAKE2b-256 50f8172ad3b9b8678052fef4ba29ac4c4ac1ce42b96cbfa98731ad208017436a

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

File details

Details for the file xhtmlmd-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 01dc413e2d97d1a9e86e4b0a96d1fd7546e69dee57f02a67144e6c8ac76c3c1a
MD5 fd9796ef68caa85b02b2c7ca22ff5930
BLAKE2b-256 195f5293b6b589acd820172ad4a93cfdfbe3d00a6c550cfb060ef800a93d885a

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

File details

Details for the file xhtmlmd-0.1.4-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 986b9b8058e1a8b9379acde537e14cc1590b94ca691ee1cbab74138d0bf711d5
MD5 8a3bd3212343604cf015699f5c2d422a
BLAKE2b-256 31149ca57ec295f76cfd6a1c2b2ad705aca8c62b6bf51360d7cf05a5582320a2

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

File details

Details for the file xhtmlmd-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c37289be08aa93034d701f8a7b0a2584911481ffbee9f80c09a18b8478398fe7
MD5 f5eb75ec3b163ef7b1eb86522a800703
BLAKE2b-256 d4b08ffb2834c6c73b46505e9da77c112d445098ded91ac84557ad02489d6cba

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd

File details

Details for the file xhtmlmd-0.1.4-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for xhtmlmd-0.1.4-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 955b3ad23fa4ed5a69af9d86efefdb33b200ad58eabee49aa6f0b51ed4325ab5
MD5 6b61463c016cc5b4d8ffed4769ba4dce
BLAKE2b-256 13c91ac02a79a19852ad280b436dd8848f424a2ee59b4bebb3f7dcb273d19291

Provenance

The following attestation bundles were made for xhtmlmd-0.1.4-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: ci.yml on AnswerDotAI/xhtmlmd
