A Python package for converting Markdown to AST and back to Markdown
marktripy
TL;DR: A Python package for parsing Markdown to AST, manipulating the tree structure, and serializing back to Markdown while preserving formatting. Built on markdown-it-py and mistletoe for maximum flexibility.
from marktripy import parse_markdown, render_markdown
# Parse Markdown to AST
ast = parse_markdown("# Hello\n\nThis is **bold** text.")
# Manipulate AST (e.g., downgrade headings)
for node in ast.walk():
    if node.type == "heading":
        node.level += 1
# Render back to Markdown
markdown = render_markdown(ast)
# Output: "## Hello\n\nThis is **bold** text."
Installation
# Using pip
pip install marktripy
# Using uv (recommended)
uv add marktripy
# Development installation
git clone https://github.com/yourusername/marktripy
cd marktripy
uv sync --dev
Quick Usage
Basic Markdown to HTML
from marktripy import markdown_to_html
html = markdown_to_html("# Hello World\n\nThis is **bold** and *italic*.")
# <h1>Hello World</h1><p>This is <strong>bold</strong> and <em>italic</em>.</p>
AST Manipulation
from marktripy import parse_markdown, render_markdown
# Parse Markdown to AST
ast = parse_markdown("""
# Main Title
## Section 1
Some content here.
## Section 2
More content.
""")
# Add IDs to all headings
for node in ast.walk():
    if node.type == "heading":
        # Generate ID from heading text
        text = node.get_text().lower().replace(" ", "-")
        node.attrs["id"] = text
# Downgrade all headings by one level
for node in ast.walk():
    if node.type == "heading" and node.level < 6:
        node.level += 1
# Render back to Markdown
result = render_markdown(ast)
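The ID generation above only replaces spaces, so punctuation would end up inside the ID and repeated headings would share one. A minimal sketch of a more robust slug helper, using only the standard library. The names `slugify` and `unique_slugs` are hypothetical and are not part of marktripy's API:

```python
import re
from collections import Counter


def slugify(text: str) -> str:
    """Lowercase, drop punctuation, and join words with hyphens."""
    text = text.strip().lower()
    # Remove everything except word characters, whitespace, and hyphens.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace or hyphens into a single hyphen.
    return re.sub(r"[\s-]+", "-", text).strip("-")


def unique_slugs(titles):
    """Append -1, -2, ... to slugs that have already been used."""
    seen = Counter()
    result = []
    for title in titles:
        base = slugify(title)
        count = seen[base]
        seen[base] += 1
        result.append(base if count == 0 else f"{base}-{count}")
    return result
```

With a helper like this, the loop above could assign `node.attrs["id"] = slugify(node.get_text())` instead of chaining `lower()` and `replace()`.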
Custom Syntax Extensions
from marktripy import create_extension, Parser
# Create a custom extension for ++text++ → <kbd>text</kbd>
kbd_extension = create_extension(
    # Lazy match, so the key text may itself contain a single "+" (e.g. Ctrl+C)
    pattern=r'\+\+(.+?)\+\+',
    node_type='kbd',
    html_tag='kbd'
)
# Use parser with extension
parser = Parser(extensions=[kbd_extension])
ast = parser.parse("Press ++Ctrl+C++ to copy")
html = parser.render_html(ast)
# Output: Press <kbd>Ctrl+C</kbd> to copy
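To see what a pattern for `++text++` has to handle, here is a standalone sketch that uses only the standard library `re` module and does not go through marktripy at all. `render_kbd` is a hypothetical helper. It uses a lazy `.+?` group so that key names containing a single `+`, such as `Ctrl+C`, are still captured:

```python
import html
import re

# Lazy match so the captured text may contain single '+' characters.
KBD_PATTERN = re.compile(r"\+\+(.+?)\+\+")


def render_kbd(text: str) -> str:
    """Replace ++keys++ with <kbd>keys</kbd>, escaping only the key text."""
    return KBD_PATTERN.sub(
        lambda m: f"<kbd>{html.escape(m.group(1))}</kbd>", text
    )


print(render_kbd("Press ++Ctrl+C++ to copy"))
# Press <kbd>Ctrl+C</kbd> to copy
```

A character class such as `[^+]+` would stop at the `+` inside `Ctrl+C` and fail to match, which is why the lazy quantifier is used here.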
CLI Usage
# Convert Markdown to HTML
marktripy convert input.md -o output.html
# Parse and manipulate Markdown
marktripy transform input.md --downgrade-headings --add-ids -o output.md
# Validate Markdown structure
marktripy validate document.md --check-links --check-headings
The Backstory
Why Another Markdown Parser?
The Python ecosystem has numerous Markdown parsers, each with different strengths:
- markdown: The original, extensible but with a complex API
- markdown2: Faster alternative but less extensible
- mistune: Fast and supports AST, but limited round-trip capability
- marko: Good AST support but newer with less ecosystem
- markdown-it-py: Port of markdown-it with excellent plugin system
After extensive research (see /ref directory), I found that no single library perfectly addressed the need for:
- Clean AST manipulation - Easy traversal and modification of document structure
- Round-trip conversion - Parse Markdown → AST → Markdown without losing formatting
- Extensibility - Simple API for adding custom syntax
- Performance - Fast enough for real-world documents
- Standards compliance - CommonMark compliant with GFM extensions
The Research Journey
The /ref directory contains comprehensive research comparing 8+ Python Markdown libraries across multiple dimensions:
- ref1.md: Practical guide to advanced Markdown processing in Python
- ref2.md: Detailed comparison of parser architectures and extension mechanisms
- ref3.md: Performance benchmarks and feature matrix
Key findings:
- markdown-it-py offers the best plugin architecture
- mistletoe has the cleanest AST representation
- marko provides good round-trip capabilities
- Performance varies by 10-100x between libraries
Design Philosophy
marktripy combines the best ideas from existing libraries:
- Dual-parser architecture: Use markdown-it-py for extensibility and mistletoe for AST manipulation
- Unified AST format: Convert between parser representations transparently
- Preserving formatting: Track source positions and whitespace for faithful round-trips
- Plugin-first design: Everything beyond core CommonMark is a plugin
- Type safety: Full type hints with mypy --strict compatibility
Technical Architecture
Core Components
marktripy/
├── ast.py # Unified AST node definitions
├── parser.py # Parser abstraction layer
├── renderer.py # Markdown/HTML renderers
├── extensions/ # Built-in extensions
│ ├── gfm.py # GitHub Flavored Markdown
│ ├── toc.py # Table of contents generator
│ └── ...
├── transformers/ # AST transformation utilities
│ ├── headings.py # Heading manipulation
│ ├── links.py # Link processing
│ └── ...
└── cli.py # Command-line interface
AST Structure
The AST uses a unified node structure compatible with both parsers:
from typing import Any, Dict, List

class ASTNode:
    type: str                  # Node type (heading, paragraph, etc.)
    children: List["ASTNode"]  # Child nodes (quoted: the class is not yet defined here)
    attrs: Dict[str, Any]      # Attributes (id, class, etc.)
    content: str               # Text content for leaf nodes
    meta: Dict[str, Any]       # Source mapping, parser-specific data
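The node structure above can be tried out with a small dataclass. This is a sketch under assumptions, not marktripy's actual class: `Node`, its `walk()` and `get_text()` methods, and storing the heading level in `attrs` are all illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for the ASTNode fields listed above."""
    type: str
    children: List["Node"] = field(default_factory=list)
    attrs: Dict[str, Any] = field(default_factory=dict)
    content: str = ""
    meta: Dict[str, Any] = field(default_factory=dict)

    def walk(self) -> Iterator["Node"]:
        """Yield this node, then all descendants, in document order."""
        yield self
        for child in self.children:
            yield from child.walk()

    def get_text(self) -> str:
        """Concatenate the content of this node and all descendants."""
        return "".join(node.content for node in self.walk())


doc = Node("document", children=[
    Node("heading", attrs={"level": 1}, children=[Node("text", content="Hello")]),
    Node("paragraph", children=[
        Node("text", content="This is "),
        Node("strong", children=[Node("text", content="bold")]),
        Node("text", content=" text."),
    ]),
])

# Downgrade every heading by one level, capped at level 6.
for node in doc.walk():
    if node.type == "heading" and node.attrs["level"] < 6:
        node.attrs["level"] += 1
```

Using `default_factory` for the list and dict fields gives every node its own containers, so mutating one node's `attrs` cannot leak into another.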
Parser Architecture
# Abstraction layer over multiple parsers
class Parser:
    def __init__(self, parser_backend="markdown-it-py", extensions=None):
        self.backend = self._create_backend(parser_backend)
        self.extensions = extensions or []

    def parse(self, markdown: str) -> ASTNode:
        # Parse with backend
        backend_ast = self.backend.parse(markdown)
        # Convert to unified AST
        return self._normalize_ast(backend_ast)
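One way such an abstraction layer could be wired up is a registry of backends behind a single `parse()` method. The sketch below is self-contained and purely illustrative: `toy_backend`, `BACKENDS`, and `SketchParser` are hypothetical names, and the toy backend understands only ATX headings and one-line paragraphs rather than wrapping a real parser library.

```python
import re
from typing import Callable, Dict, List, Tuple

# In this sketch a backend is any callable that turns Markdown text into
# (node_type, level, text) tuples. A real backend would wrap a parser library.
Block = Tuple[str, int, str]
Backend = Callable[[str], List[Block]]

_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def toy_backend(markdown: str) -> List[Block]:
    """Tiny illustrative parser: ATX headings and one-line paragraphs only."""
    blocks: List[Block] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _HEADING.match(line)
        if match:
            blocks.append(("heading", len(match.group(1)), match.group(2)))
        else:
            blocks.append(("paragraph", 0, line))
    return blocks


BACKENDS: Dict[str, Backend] = {"toy": toy_backend}


class SketchParser:
    """Selects a backend by name and normalizes its output to plain dicts."""

    def __init__(self, backend: str = "toy"):
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        self._backend = BACKENDS[backend]

    def parse(self, markdown: str) -> dict:
        children = [
            {"type": kind, "level": level, "content": text}
            for kind, level, text in self._backend(markdown)
        ]
        return {"type": "document", "children": children}
```

Because callers only ever see the normalized dicts, adding another backend means registering one more callable in `BACKENDS` without touching code that consumes the tree.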
Extension System
Extensions can hook into multiple stages:
class Extension:
    def extend_parser(self, parser): ...      # Modify parser rules
    def transform_ast(self, ast): ...         # Post-process AST
    def extend_renderer(self, renderer): ...  # Custom rendering
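To make the three stages concrete, the following sketch shows one way a pipeline might call these hooks in registration order. Everything here is an assumption for illustration: `SketchExtension`, `Pipeline`, the plain-dict AST, and the convention that `transform_ast` returns the tree are not taken from marktripy.

```python
from typing import Any, Callable, Dict, List


class SketchExtension:
    """Base class with no-op hooks for the three stages."""

    def extend_parser(self, parser_rules: List[str]) -> None:
        """Register extra parser rules (names only in this sketch)."""

    def transform_ast(self, ast: Dict[str, Any]) -> Dict[str, Any]:
        """Post-process the AST; the default returns it unchanged."""
        return ast

    def extend_renderer(self, handlers: Dict[str, Callable[[dict], str]]) -> None:
        """Register render functions keyed by node type."""


class KbdExtension(SketchExtension):
    """Uses the parser and renderer stages."""

    def extend_parser(self, parser_rules):
        parser_rules.append("kbd")

    def extend_renderer(self, handlers):
        handlers["kbd"] = lambda node: f"<kbd>{node['content']}</kbd>"


class DowngradeHeadings(SketchExtension):
    """Uses only the AST stage."""

    def transform_ast(self, ast):
        for node in ast["children"]:
            if node["type"] == "heading" and node["level"] < 6:
                node["level"] += 1
        return ast


class Pipeline:
    """Calls each hook at its stage, in registration order."""

    def __init__(self, extensions: List[SketchExtension]):
        self.extensions = list(extensions)
        self.parser_rules: List[str] = []
        self.handlers: Dict[str, Callable[[dict], str]] = {}
        for ext in self.extensions:
            ext.extend_parser(self.parser_rules)
            ext.extend_renderer(self.handlers)

    def transform(self, ast: Dict[str, Any]) -> Dict[str, Any]:
        for ext in self.extensions:
            ast = ext.transform_ast(ast)
        return ast
```

Giving every hook a no-op default lets an extension override only the stage it cares about.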
Rendering Pipeline
- AST → Markdown: Preserves formatting, handles custom nodes
- AST → HTML: Configurable sanitization, custom handlers
- AST → JSON: Serialization for processing pipelines
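The three output formats listed above can be illustrated on a toy tree of plain dicts. This is a simplified sketch, not marktripy's renderer: the functions `to_markdown`, `to_html`, and `to_json` are hypothetical, only headings and paragraphs are handled, and "sanitization" is reduced to escaping text with `html.escape`.

```python
import html
import json


def to_markdown(node: dict) -> str:
    """Serialize the toy AST back to Markdown text."""
    kind = node["type"]
    if kind == "document":
        return "\n\n".join(to_markdown(child) for child in node["children"]) + "\n"
    if kind == "heading":
        return "#" * node["level"] + " " + node["content"]
    if kind == "paragraph":
        return node["content"]
    raise ValueError(f"no Markdown renderer for node type {kind!r}")


def to_html(node: dict) -> str:
    """Serialize the toy AST to HTML, escaping all text content."""
    kind = node["type"]
    if kind == "document":
        return "\n".join(to_html(child) for child in node["children"])
    text = html.escape(node.get("content", ""))
    if kind == "heading":
        return f"<h{node['level']}>{text}</h{node['level']}>"
    if kind == "paragraph":
        return f"<p>{text}</p>"
    raise ValueError(f"no HTML renderer for node type {kind!r}")


def to_json(node: dict) -> str:
    """Serialize the toy AST to JSON for use in other tools."""
    return json.dumps(node, indent=2, sort_keys=True)
```

Raising on unknown node types, rather than silently skipping them, makes it obvious when a custom node is missing a handler.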
Performance Optimizations
- Lazy parsing for large documents
- Streaming renderers for memory efficiency
- Optional C extensions via umarkdown backend
- Caching for repeated transformations
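As an illustration of the caching idea only, the standard library's `functools.lru_cache` can memoize a pure text-to-text transformation so that repeated input is not reprocessed. How marktripy caches internally is not specified here. The `downgrade_headings` function below is a hypothetical toy that works line by line and does not skip fenced code blocks.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def downgrade_headings(markdown: str) -> str:
    """Add one '#' to every ATX heading below level 6 (toy transformation)."""
    out = []
    for line in markdown.splitlines():
        hashes = len(line) - len(line.lstrip("#"))
        # Only treat the line as a heading if a space follows the hashes.
        if 1 <= hashes < 6 and line[hashes:hashes + 1] == " ":
            line = "#" + line
        out.append(line)
    return "\n".join(out)


first = downgrade_headings("# Title\n\nText")
second = downgrade_headings("# Title\n\nText")  # identical input: cache hit
```

Caching on the source string works because strings are hashable and immutable. A mutable AST would need a different key, such as a content hash.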
Advanced Usage
Custom Transformers
from marktripy import Transformer
class HeaderAnchorTransformer(Transformer):
    """Add GitHub-style anchor links to headers"""

    def transform(self, ast):
        for node in ast.walk():
            if node.type == "heading":
                anchor = self.create_anchor(node)
                node.children.insert(0, anchor)
        return ast
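The transformer above calls `create_anchor` without showing it. A minimal sketch of what such a helper might build, using plain dicts in place of marktripy nodes. The node shape, the `anchor` class name, and both function names are assumptions made for illustration:

```python
import re


def create_anchor(heading: dict) -> dict:
    """Build a link node that points at the heading's own slug."""
    slug = re.sub(r"[^\w\s-]", "", heading["content"].lower())
    slug = re.sub(r"[\s-]+", "-", slug).strip("-")
    return {
        "type": "link",
        "attrs": {"href": f"#{slug}", "class": "anchor"},
        "children": [],
    }


def add_header_anchors(ast: dict) -> dict:
    """Insert an anchor link as the first child of every top-level heading."""
    for node in ast["children"]:
        if node["type"] == "heading":
            node.setdefault("children", []).insert(0, create_anchor(node))
    return ast
```

Inserting the anchor at index 0 places the link before the heading text, which matches the `insert(0, anchor)` call in the transformer.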
Parser Backends
# Use different backends for different needs
from marktripy import Parser
# Maximum compatibility
parser = Parser(backend="markdown")
# Best performance
parser = Parser(backend="mistletoe")
# Most extensions
parser = Parser(backend="markdown-it-py")
Integration Examples
# Pelican static site generator
from marktripy import PelicanReader
# MkDocs documentation
from marktripy import MkDocsPlugin
# Jupyter notebook processing
from marktripy import MarkdownCell
Contributing
We welcome contributions! Key areas:
- Additional extensions (math, diagrams, etc.)
- Performance improvements
- Better round-trip fidelity
- More transformer utilities
See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Acknowledgments
Built on the shoulders of giants:
- markdown-it-py developers for the excellent plugin system
- mistletoe for the clean AST design
- The CommonMark specification authors
- All researchers of the Python Markdown ecosystem
Download files
File details
Details for the file marktripy-1.0.3.tar.gz.
File metadata
- Download URL: marktripy-1.0.3.tar.gz
- Upload date:
- Size: 46.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d760b44aac7a528d6e045f22592670c013e3a29f19b93a902c5c72991477c3a |
| MD5 | 91c36d339ca1af77177b14f07717bd1e |
| BLAKE2b-256 | bf96779a97a53739a4596e0a56a08de040799d15e79d93e994d1ce544025b065 |
File details
Details for the file marktripy-1.0.3-py3-none-any.whl.
File metadata
- Download URL: marktripy-1.0.3-py3-none-any.whl
- Upload date:
- Size: 47.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ffadcec11db94031d9673761d43c151fe65004692f823811640ec87fe54cb965 |
| MD5 | 1dabd6f4b7609e6d81d5de2a58e77a76 |
| BLAKE2b-256 | 9a07a241d76967401f8044f56347839b0c3770b3b18c06dd168167fd46800f49 |