Markdown grammar for tree-sitter, with a textlint-style AST shape

tree-sitter-markdown-text

Markdown grammar for tree-sitter, shaped so that its AST lines up with the textlint TxtNode model.

Parses .md (and .markdown, .mdown, .mkd, .mkdn) files into a concrete syntax tree covering the full CommonMark block structure plus common extensions (GFM pipe tables, task lists, GFM alerts, YAML/TOML front matter, Pandoc math and directive blocks, footnotes, MDX JSX). Inline content is surfaced as structured children of the inline wrapper: classified tokens (word_token, numeric_token, identifier_like_token, path_like_token) and punctuation-class nodes (terminator, separator, bracket, operator_like), plus inline structural nodes (emphasis, strong, strikethrough, link, image, autolink, inline_code, html_inline, math_inline, mdx_jsx_inline, footnote_reference).

Features

Block nodes

  • Document structure — document, nested section wrappers around ATX headings, paragraph, blank_line (as a first-class node).
  • Headings — ATX (#..######) and setext (===/---) with the heading level exposed as a level field on both atx_heading and setext_heading.
  • Code blocks — indented code blocks and fenced code blocks (backtick and tilde), with info_string/language children for the GFM language tag.
  • Math blocks — Pandoc/GitLab/KaTeX display math ($$…$$) as a dedicated math_block with math_block_delimiter/math_block_content children.
  • Lists — unordered (+/-/*) and ordered (1./1)) list markers. GFM task list items are promoted to task_list_item (distinct from list_item), with task_list_marker_checked/task_list_marker_unchecked markers.
  • Block quotes and callouts — nested quotes and lazy continuations. A block quote whose first paragraph begins with [!NOTE] / [!TIP] / [!IMPORTANT] / [!WARNING] / [!CAUTION] (or any uppercase-only label) is surfaced as callout with a callout_type field.
  • Thematic breaks — ---, ***, ___.
  • HTML blocks — all 7 CommonMark HTML block types; block-level HTML comments are aliased to html_comment_block for easy metric extraction.
  • MDX JSX blocks — shallow mdx_jsx_block for lines that start with an MDX-style JSX element (<Component ...>, <Component/>, </Component>). Component-style mixed-case names disambiguate from all-caps HTML blocks such as <DIV>.
  • Pipe tables — pipe_table with pipe_table_header, pipe_table_delimiter_row, pipe_table_row, pipe_table_cell, pipe_table_align_left/pipe_table_align_right.
  • Link reference definitions — link_reference_definition with link_label/link_destination/link_title children.
  • Footnote definitions — footnote_definition ([^id]: …) with a footnote_label child.
  • Directive blocks — generic container directives (:::name … :::, per remark-directive / MyST / Pandoc fenced divs) as directive_block with directive_block_delimiter/directive_name/directive_block_content children.
  • Image blocks — a paragraph consisting of a single block-level image (![alt](dest) on its own line) is surfaced as image_block with link_label/link_destination children.
  • Front matter — YAML (--- fenced) as minus_metadata, TOML (+++ fenced) as plus_metadata.
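The callout rule above (a block quote whose first paragraph begins with an uppercase-only `[!LABEL]`) can be illustrated with a small standalone check. This is a sketch of the rule as described, not the grammar's actual scanner; the function name `callout_type_of_quote` and the regular expression are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Assumption: the label is "[!LABEL]" at the very start of the first
# non-empty quoted line, with LABEL made of uppercase ASCII letters only.
CALLOUT_RE = re.compile(r"\[!([A-Z]+)\]")


def callout_type_of_quote(lines: List[str]) -> Optional[str]:
    """Return the callout label for a block quote, or None if it is a plain quote.

    `lines` are the raw source lines of the quote, each starting with '>'.
    """
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(">"):
            stripped = stripped[1:].lstrip()  # drop the quote marker
        if not stripped.strip():
            continue  # skip empty quoted lines before the first paragraph
        match = CALLOUT_RE.match(stripped)
        return match.group(1) if match else None
    return None


print(callout_type_of_quote(["> [!WARNING]", "> Be careful."]))
print(callout_type_of_quote(["> just a quote"]))
```

In the parsed tree the equivalent information would be read from the callout node's callout_type field rather than recomputed from source text.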

Inline nodes (children of the inline wrapper)

  • Classified text tokens — text_span wraps runs of classified tokens: word_token (Unicode alphabetic), numeric_token (integers, decimals, versions), identifier_like_token (camelCase / PascalCase / snake_case), path_like_token (paths with / separators or dotted identifiers).

  • Punctuation classes — every punctuation lexeme is classified: terminator (., ?, !, , ), separator (,, ;, :), bracket ((, ), [, ], {, }, <, >), operator_like (::, ->, =>, =, +, -, *, /, |, &, and other punctuation).

  • Emphasis / strong / strikethrough — emphasis (*…* or _…_), strong (**…** or __…__), strikethrough (~~…~~), each with a _delimiter/_content/_delimiter sub-tree.

  • Code spans — inline_code with matched backtick-run delimiters (1 or 2 backticks).

  • Links and images — link (inline, full-reference, collapsed-reference, shortcut-reference forms) and image (![alt](dest) or ![alt][ref]). Both expose link_label/link_destination/link_title children.

  • Autolinks — autolink with uri or email children for <https://…> and <user@example.com>.

  • Raw HTML inline — html_inline with html_open_tag/html_close_tag/html_comment/html_cdata/html_declaration/html_processing_instruction children.

  • MDX JSX inline — shallow mdx_jsx_inline with mdx_jsx_open_tag/mdx_jsx_close_tag/mdx_jsx_expression children.

  • Inline math — math_inline ($…$) with math_inline_delimiter/math_inline_content children. Disambiguated from math_block ($$…$$).

  • Footnote references — footnote_reference ([^id] inside prose) with a footnote_reference_label child.

  • Injections query — ships a queries/injections.scm that injects embedded languages into fenced code blocks (selected by the info string), HTML blocks, and front matter.
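To make the token and punctuation classes listed above concrete, here is a rough standalone classifier for a single lexeme. It is only an approximation written from the descriptions in this list: the regular expressions, the precedence between classes, and the function name `classify` are assumptions, and the grammar's real lexer may draw the boundaries differently.

```python
import re

# Assumed patterns, derived from the class descriptions above.
NUMERIC = re.compile(r"\d+(?:\.\d+)*")  # integers, decimals, versions
PATH_LIKE = re.compile(
    r"(?=.*\w)[\w.-]*/[\w./-]*"            # contains a '/' separator
    r"|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+"    # dotted identifier
)
IDENTIFIER_LIKE = re.compile(
    r"[A-Za-z]+_\w+"          # snake_case
    r"|[a-z]+[A-Z]\w*"        # camelCase
    r"|[A-Z][a-z]+[A-Z]\w*"   # PascalCase
)

TERMINATORS = {".", "?", "!"}
SEPARATORS = {",", ";", ":"}
BRACKETS = set("()[]{}<>")


def classify(lexeme: str) -> str:
    """Return the node-type name that best matches one lexeme."""
    if lexeme in TERMINATORS:
        return "terminator"
    if lexeme in SEPARATORS:
        return "separator"
    if lexeme in BRACKETS:
        return "bracket"
    if NUMERIC.fullmatch(lexeme):
        return "numeric_token"
    if PATH_LIKE.fullmatch(lexeme):
        return "path_like_token"
    if IDENTIFIER_LIKE.fullmatch(lexeme):
        return "identifier_like_token"
    if lexeme.isalpha():
        return "word_token"
    if lexeme and not any(ch.isalnum() for ch in lexeme):
        return "operator_like"  # any other punctuation
    return "unclassified"  # outside what this sketch models


for sample in ["hello", "1.2.3", "camelCase", "src/main.rs", "?", "=>"]:
    print(sample, classify(sample))
```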

Example

# Heading

A paragraph with inline content.

- one
- two

```go
func main() {}
```

Parsed tree (abbreviated):

(document
  (section
    (atx_heading
      level: (atx_h1_marker)
      heading_content: (inline))
    (blank_line)
    (paragraph (inline))
    (blank_line)
    (list
      (list_item (list_marker_minus) (paragraph (inline)))
      (list_item (list_marker_minus) (paragraph (inline))))
    (blank_line)
    (fenced_code_block
      (fenced_code_block_delimiter)
      (info_string (language))
      (code_fence_content)
      (fenced_code_block_delimiter))))
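An S-expression like the one above is plain text, so simple metrics can be pulled from it without any bindings. The sketch below counts node types in that exact tree; the helper name `node_type_counts` is made up for this example, and it relies only on the convention that a node name directly follows an opening parenthesis while field labels (such as level:) do not.

```python
import re
from collections import Counter

# The abbreviated tree from the example above, as a single string.
SEXP = (
    "(document (section (atx_heading level: (atx_h1_marker) "
    "heading_content: (inline)) (blank_line) (paragraph (inline)) (blank_line) "
    "(list (list_item (list_marker_minus) (paragraph (inline))) "
    "(list_item (list_marker_minus) (paragraph (inline)))) (blank_line) "
    "(fenced_code_block (fenced_code_block_delimiter) (info_string (language)) "
    "(code_fence_content) (fenced_code_block_delimiter))))"
)


def node_type_counts(sexp: str) -> Counter:
    """Count node types: every name that directly follows '('.

    Field labels such as 'level:' are not preceded by '(' and are skipped.
    """
    return Counter(re.findall(r"\(([A-Za-z_][A-Za-z0-9_]*)", sexp))


counts = node_type_counts(SEXP)
print(counts["paragraph"], counts["list_item"], counts["blank_line"])
```

For anything beyond quick counting, walking the real node objects is more robust than pattern matching on the printed form.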


Relationship to textlint

The grammar is structurally close to the textlint AST. Every block-level TxtNode type has a direct counterpart here; inline TxtNode types (Str, Emphasis, Strong, Link, Image, Code, Html, Delete, FootnoteReference) also have direct counterparts as children of the inline wrapper. Names stay snake_case per the tree-sitter convention; consumers map names themselves. See docs/textlint-mapping.md for the full table.
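Since consumers are expected to do the name mapping themselves, a lookup table is usually enough. The table below is a hypothetical pairing inferred from the inline node names and TxtNode types mentioned in this README; the authoritative mapping is the one in docs/textlint-mapping.md, and the PascalCase fallback is only a heuristic.

```python
# Hypothetical mapping, inferred from the names listed in this README.
# Check docs/textlint-mapping.md before relying on any individual entry.
INLINE_TO_TXTNODE = {
    "text_span": "Str",
    "emphasis": "Emphasis",
    "strong": "Strong",
    "strikethrough": "Delete",
    "link": "Link",
    "image": "Image",
    "inline_code": "Code",
    "html_inline": "Html",
    "footnote_reference": "FootnoteReference",
}


def to_txtnode_type(node_type: str) -> str:
    """Map a snake_case grammar node name to a TxtNode-style type name.

    Known inline names use the table; anything else falls back to a plain
    snake_case -> PascalCase conversion, which is a guess, not a guarantee.
    """
    if node_type in INLINE_TO_TXTNODE:
        return INLINE_TO_TXTNODE[node_type]
    return "".join(part.capitalize() for part in node_type.split("_"))


print(to_txtnode_type("strikethrough"))
print(to_txtnode_type("list_item"))
```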

Installation

npm

npm install tree-sitter-markdown-text

Cargo

cargo add tree-sitter-markdown-text

PyPI

pip install tree-sitter-markdown-text

Go

import tree_sitter_markdown_text "github.com/ophidiarium/tree-sitter-markdown-text/bindings/go"

The root package also exports the bundled queries via go:embed:

import markdown "github.com/ophidiarium/tree-sitter-markdown-text"

lang := markdown.GetLanguage()
query, _ := markdown.GetHighlightsQuery()

Usage

Node.js

import Parser from "tree-sitter";
import Markdown from "tree-sitter-markdown-text";

const parser = new Parser();
parser.setLanguage(Markdown);

const tree = parser.parse("# hello\n");
console.log(tree.rootNode.toString());

Rust

let mut parser = tree_sitter::Parser::new();
let language = tree_sitter_markdown_text::LANGUAGE;
parser.set_language(&language.into()).unwrap();

let tree = parser.parse("# hello\n", None).unwrap();
println!("{}", tree.root_node().to_sexp());

Python

from tree_sitter import Language, Parser
import tree_sitter_markdown_text

parser = Parser(Language(tree_sitter_markdown_text.language()))
tree = parser.parse(b"# hello\n")
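# str(node) yields the S-expression; Node.sexp() is deprecated or absent in recent py-tree-sitter releases.
print(str(tree.root_node))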
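Once a tree is available, most consumers walk it and pick out nodes by type. The sketch below does a pre-order walk using only the .type and .children attributes, so it should work on nodes from the Python binding as well as on the stand-in class used here. `StubNode`, `walk`, and `find_all` are names invented for this example so that it runs without the parser installed.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class StubNode:
    """Stand-in for a parsed node; only .type and .children are used."""
    type: str
    children: List["StubNode"] = field(default_factory=list)


def walk(node) -> Iterator:
    """Yield a node and all of its descendants in pre-order (document order)."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(current.children))


def find_all(root, node_type: str) -> list:
    """Collect every node of the given type below (and including) root."""
    return [n for n in walk(root) if n.type == node_type]


root = StubNode("document", [
    StubNode("section", [
        StubNode("atx_heading"),
        StubNode("paragraph", [StubNode("inline")]),
        StubNode("list", [
            StubNode("list_item", [StubNode("paragraph", [StubNode("inline")])]),
        ]),
    ]),
])
print(len(find_all(root, "paragraph")))
```

With the real binding, the same helpers would be called as find_all(tree.root_node, "paragraph").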

Credits and references

License

MIT
