Markdown grammar for tree-sitter, with a textlint-style AST shape

tree-sitter-markdown-text

Markdown grammar for tree-sitter, shaped so that its AST lines up with the textlint TxtNode model.

Parses .md (and .markdown, .mdown, .mkd, .mkdn) files into a concrete syntax tree covering the full CommonMark block structure plus common extensions (GFM pipe tables, task lists, GFM alerts, YAML/TOML front matter, Pandoc math and directive blocks, footnotes, MDX JSX). Inline content is surfaced as structured children of the inline wrapper: classified tokens (word_token, numeric_token, identifier_like_token, path_like_token) and punctuation-class nodes (terminator, separator, bracket, operator_like), plus inline structural nodes (emphasis, strong, strikethrough, link, image, autolink, inline_code, html_inline, math_inline, mdx_jsx_inline, footnote_reference).

Features

Block nodes

  • Document structure — document, nested section wrappers around ATX headings, paragraph, blank_line (as a first-class node).
  • Headings — ATX (#..######) and setext (===/---) with the heading level exposed as a level field on both atx_heading and setext_heading.
  • Code blocks — indented code blocks and fenced code blocks (backtick and tilde), with info_string/language children for the GFM language tag.
  • Math blocks — Pandoc/GitLab/KaTeX display math ($$…$$) as a dedicated math_block with math_block_delimiter/math_block_content children.
  • Lists — unordered (+/-/*) and ordered (1./1)) list markers. GFM task list items are promoted to task_list_item (distinct from list_item), with task_list_marker_checked/task_list_marker_unchecked markers.
  • Block quotes and callouts — nested quotes and lazy continuations. A block quote whose first paragraph begins with [!NOTE] / [!TIP] / [!IMPORTANT] / [!WARNING] / [!CAUTION] (or any uppercase-only label) is surfaced as callout with a callout_type field.
  • Thematic breaks — ---, ***, ___.
  • HTML blocks — all 7 CommonMark HTML block types; block-level HTML comments are aliased to html_comment_block for easy metric extraction.
  • MDX JSX blocks — shallow mdx_jsx_block for lines that start with an MDX-style JSX element (<Component ...>, <Component/>, </Component>). Component-style mixed-case names disambiguate from all-caps HTML blocks such as <DIV>.
  • Pipe tables — pipe_table with pipe_table_header, pipe_table_delimiter_row, pipe_table_row, pipe_table_cell, pipe_table_align_left/pipe_table_align_right.
  • Link reference definitions — link_reference_definition with link_label/link_destination/link_title children.
  • Footnote definitions — footnote_definition ([^id]: …) with a footnote_label child.
  • Directive blocks — generic container directives (:::name … :::, per remark-directive / MyST / Pandoc fenced divs) as directive_block with directive_block_delimiter/directive_name/directive_block_content children.
  • Image blocks — a paragraph consisting of a single block-level image (![alt](dest) on its own line) is surfaced as image_block with link_label/link_destination children.
  • Front matter — YAML (--- fenced) as minus_metadata, TOML (+++ fenced) as plus_metadata.
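
The callout rule above can be approximated outside the parser. The following is a minimal sketch, not the grammar's own logic: it assumes the label is ASCII uppercase only, and `CALLOUT_RE` and `callout_type` are hypothetical names introduced for illustration.

```python
import re
from typing import Optional

# Assumption: a callout label is "[!" + one or more ASCII uppercase letters + "]"
# at the very start of the block quote's first paragraph.
CALLOUT_RE = re.compile(r"^\[!([A-Z]+)\]")


def callout_type(first_paragraph: str) -> Optional[str]:
    """Return the label if the text opens with [!LABEL], otherwise None."""
    match = CALLOUT_RE.match(first_paragraph.lstrip())
    return match.group(1) if match else None


for text in ("[!NOTE] Rebuild the parser first.", "[!Note] mixed case", "Plain quote."):
    print(repr(text), "->", callout_type(text))
```

Under this assumption a custom label such as [!CUSTOM] is accepted, while a mixed-case label falls through and the block stays an ordinary block quote.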

Inline nodes (children of the inline wrapper)

  • Classified text tokens — text_span wraps runs of classified tokens: word_token (Unicode alphabetic), numeric_token (integers, decimals, versions), identifier_like_token (camelCase / PascalCase / snake_case), path_like_token (paths with / separators or dotted identifiers).

  • Punctuation classes — every punctuation lexeme is classified: terminator (., ?, !, , ), separator (,, ;, :), bracket ((, ), [, ], {, }, <, >), operator_like (::, ->, =>, =, +, -, *, /, |, &, and other punctuation).

  • Emphasis / strong / strikethrough — emphasis (*…* or _…_), strong (**…** or __…__), strikethrough (~~…~~), each with a _delimiter/_content/_delimiter sub-tree.

  • Code spans — inline_code with matched backtick-run delimiters (1 or 2 backticks).

  • Links and images — link (inline, full-reference, collapsed-reference, shortcut-reference forms) and image (![alt](dest) or ![alt][ref]). Both expose link_label/link_destination/link_title children.

  • Autolinks — autolink with uri or email children for <https://…> and <user@example.com>.

  • Raw HTML inline — html_inline with html_open_tag/html_close_tag/html_comment/html_cdata/html_declaration/html_processing_instruction children.

  • MDX JSX inline — shallow mdx_jsx_inline with mdx_jsx_open_tag/mdx_jsx_close_tag/mdx_jsx_expression children.

  • Inline math — math_inline ($…$) with math_inline_delimiter/math_inline_content children. Disambiguated from math_block ($$…$$).

  • Footnote references — footnote_reference ([^id] inside prose) with a footnote_reference_label child.

  • Injections query — ships a queries/injections.scm that injects languages into fenced code blocks (keyed on the info string), HTML blocks, and front matter.
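
To make the four text-token classes concrete, here is a rough sketch of how a consumer might reproduce the classification on already-split, non-punctuation tokens. The regular expressions and the precedence order are assumptions made for illustration; the grammar's actual lexer rules may differ, and `classify` is a hypothetical helper.

```python
import re

# Assumed shapes, checked in this order (not taken from the grammar source):
NUMERIC = re.compile(r"\d+(?:\.\d+)*")                   # 42, 3.14, 0.2.1
DOTTED = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+")  # os.path.join
HUMP = re.compile(r"[a-z0-9][A-Z]")                      # lower-to-upper boundary


def classify(token: str) -> str:
    """Map one non-punctuation token to a token-class name."""
    if NUMERIC.fullmatch(token):
        return "numeric_token"
    if "/" in token or DOTTED.fullmatch(token):
        return "path_like_token"
    # Identifier: alphanumeric apart from underscores, with an underscore or a case hump.
    if token.replace("_", "").isalnum() and ("_" in token or HUMP.search(token)):
        return "identifier_like_token"
    if token.isalpha():  # str.isalpha() is Unicode-aware
        return "word_token"
    return "unclassified"


for tok in ("parser", "0.2.1", "footnoteReference", "queries/injections.scm"):
    print(tok, "->", classify(tok))
```

The order matters in this sketch: a camelCase token is also purely alphabetic, so the identifier check has to run before the plain-word check.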

Example

# Heading

A paragraph with inline content.

- one
- two

```go
func main() {}
```

Parsed tree (abbreviated):

(document
  (section
    (atx_heading
      level: (atx_h1_marker)
      heading_content: (inline))
    (blank_line)
    (paragraph (inline))
    (blank_line)
    (list
      (list_item (list_marker_minus) (paragraph (inline)))
      (list_item (list_marker_minus) (paragraph (inline))))
    (blank_line)
    (fenced_code_block
      (fenced_code_block_delimiter)
      (info_string (language))
      (code_fence_content)
      (fenced_code_block_delimiter))))
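
Bindings return the tree as a single-line S-expression, which gets hard to read for larger documents. The helper below is a small sketch for indenting such a string. `indent_sexp` is a hypothetical name, not part of this package, and it assumes well-formed input that starts with an opening parenthesis.

```python
import re


def indent_sexp(sexp: str, width: int = 2) -> str:
    """Place every node on its own line, indented by nesting depth."""
    lines = []
    depth = 0
    field = ""  # pending field label such as "level: "
    for tok in re.findall(r"[()]|[^\s()]+", sexp):
        if tok == "(":
            # A new node starts a new line; keep its field label in front of it.
            lines.append(" " * (depth * width) + field + "(")
            field = ""
            depth += 1
        elif tok == ")":
            depth -= 1
            lines[-1] += ")"
        elif tok.endswith(":"):
            field = tok + " "
        else:
            # Node name (or a further word such as the one after MISSING).
            lines[-1] += tok if lines[-1].endswith("(") else " " + tok
    return "\n".join(lines)


print(indent_sexp("(document (section (atx_heading level: (atx_h1_marker) heading_content: (inline))))"))
```

Field labels stay attached to the child they name, so the output resembles what the tree-sitter CLI prints.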


Relationship to textlint

The grammar is structurally close to the textlint AST. Every block-level `TxtNode` type has a direct counterpart here; inline `TxtNode` types (`Str`, `Emphasis`, `Strong`, `Link`, `Image`, `Code`, `Html`, `Delete`, `FootnoteReference`) also have direct counterparts as children of the `inline` wrapper. Names stay snake_case per the tree-sitter convention; consumers map names themselves. See [docs/textlint-mapping.md](docs/textlint-mapping.md) for the full table.
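
Since consumers are expected to do the name mapping themselves, a lookup table is usually enough. The pairs below are a guess based on the node names listed in this README (in particular, text_span to Str is an assumption); docs/textlint-mapping.md remains the authoritative table, and `to_txtnode_type` is a hypothetical helper.

```python
from typing import Optional

# Illustrative mapping only; verify each pair against docs/textlint-mapping.md.
INLINE_TO_TXTNODE = {
    "text_span": "Str",  # assumption: classified text runs correspond to Str
    "emphasis": "Emphasis",
    "strong": "Strong",
    "link": "Link",
    "image": "Image",
    "inline_code": "Code",
    "html_inline": "Html",
    "strikethrough": "Delete",
    "footnote_reference": "FootnoteReference",
}


def to_txtnode_type(node_type: str) -> Optional[str]:
    """Return the textlint type for a node name, or None when unmapped."""
    return INLINE_TO_TXTNODE.get(node_type)


print(to_txtnode_type("strikethrough"))
print(to_txtnode_type("word_token"))
```

Returning None for unmapped names keeps the caller in charge of nodes that have no textlint counterpart, such as the classified tokens inside a text_span.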

Installation

npm

npm install tree-sitter-markdown-text

Cargo

cargo add tree-sitter-markdown-text

PyPI

pip install tree-sitter-markdown-text

Go

import tree_sitter_markdown_text "github.com/ophidiarium/tree-sitter-markdown-text/bindings/go"

The root package also exports the bundled queries via go:embed:

import markdown "github.com/ophidiarium/tree-sitter-markdown-text"

lang := markdown.GetLanguage()
query, _ := markdown.GetHighlightsQuery()

Usage

Node.js

import Parser from "tree-sitter";
import Markdown from "tree-sitter-markdown-text";

const parser = new Parser();
parser.setLanguage(Markdown);

const tree = parser.parse("# hello\n");
console.log(tree.rootNode.toString());

Rust

let mut parser = tree_sitter::Parser::new();
let language = tree_sitter_markdown_text::LANGUAGE;
parser.set_language(&language.into()).unwrap();

let tree = parser.parse("# hello\n", None).unwrap();
println!("{}", tree.root_node().to_sexp());

Python

from tree_sitter import Language, Parser
import tree_sitter_markdown_text

parser = Parser(Language(tree_sitter_markdown_text.language()))
tree = parser.parse(b"# hello\n")
print(str(tree.root_node))  # str(node) yields the S-expression in current py-tree-sitter
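
A common next step is to walk the tree and tally node types, for example to count headings or callouts. The sketch below only relies on nodes exposing `type` and `children`, which py-tree-sitter nodes do. `FakeNode` is a stand-in so the example runs without the grammar installed, and `count_node_types` is a hypothetical helper.

```python
from collections import Counter


def count_node_types(root) -> Counter:
    """Iteratively walk a tree whose nodes expose .type and .children."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.type] += 1
        stack.extend(node.children)
    return counts


class FakeNode:
    """Stand-in for a parsed node; not part of any library."""

    def __init__(self, type, children=()):
        self.type = type
        self.children = list(children)


fake_root = FakeNode("document", [
    FakeNode("section", [
        FakeNode("atx_heading", [FakeNode("inline")]),
        FakeNode("paragraph", [FakeNode("inline")]),
    ]),
])
print(count_node_types(fake_root).most_common(1))  # [('inline', 2)]
```

With a real parse, pass tree.root_node instead of the stand-in. The explicit stack avoids recursion limits on deeply nested documents.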

Credits and references

License

MIT


