Automated programmatic Markdown-to-PDF typesetting engine
md2pdf converts structured Markdown documents into print-ready PDFs. Unlike many conversion tools, it does not rely on heavy system dependencies such as Pandoc, Node.js, or a headless Chrome/Chromium browser. It is written in pure Python and built on ReportLab and mistletoe.
Architecture Overview
md2pdf is designed as a pipeline that runs in four distinct stages: preprocessing, parsing/validation, element rendering, and layout composition.
graph TD
Input[Markdown File] --> Pre[Preprocessors]
Pre --> Parser[MarkdownParser]
Parser --> Tokens[Token Stream]
Tokens --> Val[DocumentValidator]
Tokens --> Reg[Plugin/Handler Registry]
Reg --> Handlers[Element Handlers]
Handlers --> Kroki[Kroki API / Cache]
Handlers --> Flowables[ReportLab Flowables]
Flowables --> Layout[LayoutComposer]
Layout --> Post[Postprocessors]
Post --> Renderer[ReportLab PDF Engine]
Renderer --> Output[Output PDF]
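The `Handlers --> Kroki` edge in the graph above is the only step that leaves the local machine. As a rough illustration of what such a request can look like, the sketch below builds a Kroki GET URL using the encoding described in the Kroki documentation (deflate, then URL-safe Base64). The function names, the public `kroki.io` base URL, and the `svg` default are assumptions for illustration; the source does not say how md2pdf's own client constructs its requests.

```python
import base64
import zlib

# Assumption: the public Kroki instance; a self-hosted instance would use its own base URL.
KROKI_BASE_URL = "https://kroki.io"


def kroki_url(source: str, diagram_type: str = "mermaid", output_format: str = "svg") -> str:
    """Build a Kroki GET URL: deflate the diagram source, then base64url-encode it."""
    compressed = zlib.compress(source.encode("utf-8"), 9)
    encoded = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{KROKI_BASE_URL}/{diagram_type}/{output_format}/{encoded}"


def decode_kroki_payload(url: str) -> str:
    """Recover the diagram source from a URL built by kroki_url (useful when debugging)."""
    encoded = url.rsplit("/", 1)[-1]
    return zlib.decompress(base64.urlsafe_b64decode(encoded)).decode("utf-8")
```

Because the encoding is reversible, a failed render can be traced back to the exact diagram source that was sent.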
Key Features
- Standard Elements: Headings (H1–H6), paragraphs, lists, blockquotes, horizontal rules, and hyperlinks.
- Multi-page Tables: Tables split cleanly across page boundaries. Headers repeat at the top of every page.
- Diagrams & Math Blocks: Renders Mermaid diagrams and LaTeX math blocks via the Kroki API, with automatic transparent/white margin cropping, offline fallbacks, and SHA-256 disk caching.
- Extensible Plugin System: Load custom element handlers, text-level preprocessors, post-processors, and stylesheet/theme layers.
- Typesetting Safeguards: Implements strict "anti-fail" layout rules including orphaned heading protection, ghost page elimination, and widow/orphan line settings.
- DX-First Validation: A pre-render validation pass flags nested tables, empty diagrams, and unsupported elements before rendering begins.
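The SHA-256 disk caching mentioned in the list above can be pictured roughly as follows. This is a minimal sketch, not md2pdf's actual cache: `cache_key`, `get_or_render`, the `.bin` file suffix, and the `render` callback are all hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(diagram_type: str, source: str) -> str:
    """SHA-256 over the diagram type and source, so identical blocks share one entry."""
    payload = f"{diagram_type}\n{source}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def get_or_render(
    cache_dir: Path,
    diagram_type: str,
    source: str,
    render: Callable[[str, str], bytes],
) -> bytes:
    """Return cached bytes when present; otherwise call render() and store the result."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(diagram_type, source)}.bin"
    if path.exists():
        return path.read_bytes()
    data = render(diagram_type, source)  # hypothetical callback, e.g. a Kroki request
    path.write_bytes(data)
    return data
```

Keying on the content hash means an unchanged diagram is never re-rendered, while any edit to its source produces a new cache entry.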
Tech Stack
| Component | Library/Tool | Description |
|---|---|---|
| Core Language | Python >= 3.11 | Modern Python with strict type-hinting |
| PDF Generation | ReportLab >= 4.0 | Low-level document layout and flowable engine |
| Markdown Parsing | mistletoe >= 1.3 | Fast and extensible Markdown AST parser |
| HTTP Requests | requests >= 2.31 | Handles communication with Kroki API |
| CLI Framework | typer >= 0.12 | CLI builder for options and validation errors |
| Image Processing | pillow >= 10.0 | Auto-cropping and dimension detection for diagrams |
Project Structure
md2pdf/
├── docs/ # Developer documentation
│ ├── plugin-authoring.md # Instructions for writing plugins
│ └── themes.md # Themes and stylesheet reference
├── md2pdf/ # Core source directory
│ ├── assets/ # Kroki client, caching, and fallback elements
│ ├── core/ # Engine pipeline, parser, validator, layout, registry
│ ├── handlers/ # Element-specific flowable generators (headings, tables, etc.)
│ ├── styles/ # Default stylesheet and theme configs
│ ├── cli.py # CLI entry point
│ └── pipeline.py # Main execution coordinator
├── tests/ # Automated test suite
│ ├── fixtures/ # Markdown and configuration test files
│ ├── test_cli.py # CLI integration tests
│ └── test_pipeline.py # Pipeline validation tests
├── pyproject.toml # Build system and dependency declaration
└── README.md # Project overview
Logic Flows
The diagram below details the sequence of execution inside the Pipeline class:
sequenceDiagram
autonumber
actor CLI as CLI / User
participant PL as Pipeline
participant PR as Preprocessors
participant MP as MarkdownParser
participant DV as DocumentValidator
participant REG as PluginRegistry
participant LC as LayoutComposer
participant PO as Postprocessors
participant RL as ReportLab Engine
CLI->>PL: run(raw_md)
PL->>PR: run_all(raw_md)
PR-->>PL: preprocessed_md
PL->>MP: parse(preprocessed_md)
MP-->>PL: token_stream
PL->>DV: validate(token_stream)
DV-->>PL: validation_issues (warnings/errors)
PL->>REG: render(token_stream)
REG-->>PL: raw_flowables
PL->>LC: compose(raw_flowables)
LC-->>PL: layout_guarded_flowables
PL->>PO: run_all(layout_guarded_flowables)
PO-->>PL: finalized_flowables
PL->>RL: build(finalized_flowables)
RL-->>CLI: output.pdf
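The call order in the sequence diagram above can be summarized as a single function. This is only a sketch under stated assumptions: `run_pipeline` and its keyword parameters are hypothetical stand-ins for the diagram's participants, and the real `Pipeline` class may handle validation issues differently (for example by aborting on errors).

```python
from typing import Any, Callable, Sequence


def run_pipeline(
    raw_md: str,
    *,
    preprocessors: Sequence[Callable[[str], str]],
    parse: Callable[[str], Any],
    validate: Callable[[Any], list],
    render: Callable[[Any], list],
    compose: Callable[[list], list],
    postprocessors: Sequence[Callable[[list], list]],
    build: Callable[[list], Any],
) -> tuple[Any, list]:
    """Run the stages in the order shown in the diagram and return (output, issues)."""
    text = raw_md
    for preprocess in preprocessors:    # Preprocessors: text in, text out
        text = preprocess(text)
    tokens = parse(text)                # MarkdownParser: text -> token stream
    issues = validate(tokens)           # DocumentValidator: warnings/errors
    flowables = render(tokens)          # PluginRegistry: tokens -> raw flowables
    flowables = compose(flowables)      # LayoutComposer: apply layout guards
    for postprocess in postprocessors:  # Postprocessors: flowables in, flowables out
        flowables = postprocess(flowables)
    return build(flowables), issues     # ReportLab engine: final build step


# Toy stages standing in for the real components:
result, issues = run_pipeline(
    "# Title\n\nBody",
    preprocessors=[str.strip],
    parse=lambda text: text.split("\n\n"),
    validate=lambda tokens: [],
    render=lambda tokens: [t.upper() for t in tokens],
    compose=lambda flowables: flowables,
    postprocessors=[],
    build=lambda flowables: "|".join(flowables),
)
```

Passing each stage as a callable keeps the sketch independent of md2pdf's internals while preserving the ordering that the diagram specifies.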
Installation & Setup
Using uv (recommended):
uv tool install pymd2pdf
Or via standard pip:
pip install pymd2pdf
To set up the project for local development:
# Clone the repository
git clone https://github.com/user/md2pdf.git
cd md2pdf
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
Usage Examples
Command Line Interface
Convert a Markdown file:
md2pdf input.md -o output.pdf
Execute pre-render validation checks without producing a PDF:
md2pdf input.md --validate-only
Run in offline mode to avoid calling the Kroki API (diagrams are replaced in the PDF by placeholder boxes containing their source code):
md2pdf input.md -o output.pdf --offline
CLI Options
| Flag | Shortcut | Description |
|---|---|---|
| --output | -o | Path to save the output PDF file (default: output.pdf). |
| --config | -c | Path to a custom md2pdf.toml config file. |
| --theme | -t | Name of the theme to apply (default: default). |
| --offline | | Skip external API requests (e.g. Kroki diagram rendering) and use local placeholders. |
| --validate-only | | Execute pre-render validation checks and exit without building a PDF. |
| --verbose | -v | Output debug-level logging to stderr. |
Programmatic Python Usage
from md2pdf import convert, Config, Pipeline

# Option 1: Simple conversion
convert("input.md", "output.pdf")

# Option 2: Advanced programmatic pipeline usage
config = Config(
    offline=False,
    cache_dir=".md2pdf_cache",
    output_file="my_document.pdf",
)
pipeline = Pipeline(config)

# Validate markdown document
issues = pipeline.validate("# Hello World")
for issue in issues:
    print(f"[{issue.severity}] {issue.code}: {issue.message}")

# Render markdown
pipeline.run(raw_md="# Document Title\n\nSome body text.")
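To make the shape of a validation issue more concrete, here is a small sketch of how a nested-table check could produce objects with the `severity`, `code`, and `message` fields used above. The `Node` and `Issue` classes, the `find_nested_tables` function, and the `NESTED_TABLE` code are hypothetical and are not taken from md2pdf's source.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical token: a kind such as "table" or "paragraph", plus child tokens."""
    kind: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class Issue:
    severity: str  # "warning" or "error"
    code: str      # illustrative code, not one of md2pdf's real codes
    message: str


def find_nested_tables(node: Node, inside_table: bool = False) -> list[Issue]:
    """Walk the tree depth-first and report every table found inside another table."""
    issues: list[Issue] = []
    if node.kind == "table" and inside_table:
        issues.append(Issue("error", "NESTED_TABLE", "table nested inside another table"))
    for child in node.children:
        issues.extend(find_nested_tables(child, inside_table or node.kind == "table"))
    return issues
```

A check of this kind only needs the token tree, which is why it can run before any flowables are rendered.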