pimd

Professional document publishing platform — Markdown, HTML, diagrams, equations, templates, EPUB, LaTeX, PDF/A, i18n, multi-format export

PiMD

Professional document generation and publishing framework for Python.

PiMD (Python Markdown Publisher) converts Markdown, HTML, and documentation repositories into professional DOCX, EPUB, PDF/A, and LaTeX documents — books, reports, technical manuals, research papers, e-books, invoices, and more. It runs entirely offline with zero cloud dependencies.

What is PiMD?
Why PiMD?
Key Features
Quick Start
Architecture
Feature Showcase
EPUB Output
LaTeX Output
PDF/A Output
DOCX Quality
Diagram Support
Scientific Publishing
Internationalization (i18n)
Collaborative Editing
Templates
Plugin Ecosystem
Backend Integration
CLI Reference
Configuration
Performance
Project Structure
Roadmap
Contributing
License

What is PiMD?

PiMD is a document generation and publishing framework for Python. It takes structured text formats — Markdown, HTML, documentation repositories — and produces publish-ready DOCX output with professional typography, diagrams, equations, citations, cross-references, tables of contents, headers and footers.

Unlike simple Markdown-to-DOCX converters, PiMD provides:

A full document model — headings, paragraphs, code blocks, tables, lists, images, diagrams, equations, callouts, and footnotes are all first-class citizens
A plugin architecture and Extension SDK for custom renderers, parsers, and publishing pipelines
A template engine with inheritance, 10 preset templates, and full customization
Diagram rendering from Mermaid, PlantUML, Graphviz, D2, BlockDiag, Vega, BPMN, and ASCII art
Equation rendering with LaTeX, MathJax, KaTeX, and native Word OMML
Scientific publishing — cross-references, bibliography, equation numbering, figure numbering
Enterprise features — incremental builds, parallel processing, streaming, caching, safety guards, accessibility validation

PiMD is designed to be used both as a Python library (integrate into FastAPI, Flask, Django, or any Python application) and as a standalone CLI tool for batch processing, watch mode, and CI/CD pipelines.

Why PiMD?

vs Pandoc

	PiMD	Pandoc
Primary output	DOCX with professional-quality rendering	General-purpose document conversion
Templates	Built-in template engine with 10 presets	Template system via partials
Diagrams	Mermaid, PlantUML, Graphviz, D2, BlockDiag, Vega, BPMN, ASCII	None built-in
Equations	LaTeX, MathJax, KaTeX, native Word OMML	LaTeX via MathJax
Plugin system	Full plugin architecture with SDK	Filters and custom writers
Python API	First-class library API	Haskell filters or shell
Accessibility	Built-in WCAG validation	None

PiMD focuses on producing publish-ready DOCX output with professional styling, diagrams, and equations — not general-purpose format conversion.

vs Sphinx

	PiMD	Sphinx
Input format	Markdown, HTML	reStructuredText (primary)
Output format	DOCX-focused, multi-format	HTML, PDF, ePub, LaTeX
Setup complexity	Zero — runs on any Markdown	Requires conf.py, roles, directives
API integration	Native Python library	Subprocess or extension hooks
Diagrams	12 renderers included	Via extensions
Learning curve	Minimal — standard Markdown	Requires RST expertise

PiMD is for teams that want to publish from standard Markdown without adopting a new markup language or complex build system.

vs MkDocs

	PiMD	MkDocs
Primary output	DOCX, PDF	HTML websites
Use case	Print publishing, reports, books	Technical documentation websites
Templates	DOCX-centric templates	HTML themes
Offline	Fully offline	Offline
Plugin system	9 plugin types	MkDocs plugins

MkDocs builds documentation websites; PiMD produces print-ready documents from the same Markdown source.

vs Traditional Markdown Converters

Most Markdown-to-DOCX converters are thin wrappers around python-docx with basic formatting. PiMD provides:

A full document model with typed blocks (heading, table, diagram, equation, code, callout, footnote)
Diagram rendering integrated into the conversion pipeline
Equation rendering with multiple backends
Template system with inheritance and configuration
Plugin architecture with hooks at every pipeline stage
Observability — metrics, profiling, execution reports
Safety guards — path traversal protection, input validation, resource limits

Key Features

Markdown Support

Full CommonMark + GitHub Flavored Markdown: headings, paragraphs, inline formatting, code blocks with syntax highlighting, tables with alignment, task lists, blockquotes, horizontal rules, footnotes, callouts, frontmatter metadata.

HTML Support

Converts HTML documents — inline styles, classes, semantic elements — preserving structure and formatting.

DOCX Generation

Publish-quality DOCX output with professional typography, A4 layout, configurable margins, headers/footers, page numbering, table of contents, cross-references, and automatic numbering.

Professional Templates

10 built-in template presets with configurable fonts, colors, page sizes, margins, headers, footers, watermarking, and cover pages.

Diagram Rendering

Mermaid, PlantUML, Graphviz, D2, BlockDiag (seqdiag, actdiag, nwdiag, packetdiag), Vega, BPMN, and ASCII art — all rendered inline in DOCX output.

Scientific Publishing

Cross-references, bibliography (APA, IEEE, MLA, Chicago, Harvard), equation numbering, figure numbering, table numbering, citations, and native Word equation (OMML) support.

Asset Management

Remote asset downloads with SHA256 caching, domain allowlisting, offline mode, MIME detection.
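
As a rough illustration of how SHA256-keyed caching and domain allowlisting can fit together, the sketch below uses only the standard library. The names ALLOWED_DOMAINS, is_allowed, and cache_path are hypothetical and are not taken from PiMD's API.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical allowlist; PiMD's real configuration keys may differ.
ALLOWED_DOMAINS = frozenset({"example.com", "cdn.example.com"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in allowed


def cache_path(url: str, cache_dir: Path) -> Path:
    """Derive a stable cache file name from the SHA256 digest of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / digest
```

Keying the cache on a digest of the URL keeps file names safe for any filesystem, and checking the allowlist before any download means a disallowed host is never contacted.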

Repository Conversion

Convert MkDocs, Sphinx, Docusaurus, and Obsidian documentation repositories to DOCX.

Book Publishing

Compile multi-chapter books from configuration files with unified styling, consistent numbering, cross-chapter references, and generated table of contents.

Batch Processing

Convert entire directories of Markdown/HTML files with parallel processing, progress display, and error reporting.

Plugin System

9 plugin types (diagram, template, citation, renderer, exporter, asset, validation, parser, publishing) with entry-point discovery, lifecycle hooks, dependency management, and diagnostics.

Backend Integration

First-class Python API for FastAPI, Flask, Django, and any Python application. In-memory conversion, bytes output, streaming responses.

CLI Interface

40+ commands: conversion, diagrams, equations, templates, branding, reports, books, citations, merge, batch, validate, project, config, cache, jobs, profile, watch, build, accessibility, and more.

Enterprise Publishing

Incremental builds, parallel processing, streaming large files, caching with memory/filesystem/Redis backends, safety guards, observability with metrics and profiling, accessibility WCAG validation.

Multi-Format Export

DOCX, PDF, PDF/A, EPUB, LaTeX, HTML, Markdown, RTF, ODT, TXT — with a consistent public API for all formats.

Accessibility Validation

The built-in engine checks WCAG 1.1.1 (alt text), 1.3.1 (table headers), 2.4.10 (heading hierarchy), and 4.1.1 (structure), and generates Markdown reports with scores.
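
To make the heading hierarchy check concrete, here is a minimal sketch of one way such a rule could be written. The function name find_skipped_headings is an assumption for illustration and does not describe PiMD's actual validation engine.

```python
def find_skipped_headings(levels: list) -> list:
    """Return (index, previous_level, level) for each heading that skips a level."""
    problems = []
    previous = 0  # the document start counts as level 0, so the first heading should be h1
    for index, level in enumerate(levels):
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems
```

For example, the level sequence 1, 3 is reported because an h3 follows an h1 directly, while 1, 2, 3, 2, 3 passes.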

Internationalization (i18n)

Full Unicode script detection (LTR, RTL, CJK), Arabic/Persian/Urdu/Hebrew reshaping and bidirectional support, Chinese/Japanese/Korean typography, and language-aware font/line-height configuration across all output formats.

Collaborative Editing

Revision tracking system with insertions, deletions, replacements, formatting changes, threaded comments, resolution workflow, and review metadata export API.

Quick Start

Installation

From source (current — pre-PyPI):

git clone https://github.com/devasishpal/PiMd.git
cd PiMd
pip install -e .                    # Core only
pip install -e ".[all]"             # Core + all optional features

From PyPI:

pip install pimd                      # Core only (CLI + basic conversion)
pip install "pimd[all]"               # Core + all runtime features
pip install "pimd[full]"              # Everything including dev tools

Optional extras can be installed individually or combined:

pip install "pimd[diagrams]"       # Diagram rendering (Pillow)
pip install "pimd[equations]"      # Equation rendering (matplotlib)
pip install "pimd[export]"         # PDF export
pip install "pimd[citations]"      # BibTeX support
pip install "pimd[redis]"          # Redis cache backend
pip install "pimd[profiling]"      # Performance profiling

All core dependencies (markdown-it-py, mdit-py-plugins, python-docx, beautifulsoup4, lxml, typer, rich, pyyaml) auto-install with any variant — no manual steps needed.

Python API

from pimd import PiMD

engine = PiMD()

# Convert file to file
engine.md_to_docx("input.md", "output.docx")

# Convert string to bytes (no filesystem writes)
docx_bytes = engine.md_text_to_docx_bytes("# Hello World")

# Convert HTML to DOCX
engine.html_to_docx("page.html", "page.docx")

# Convert with options
engine.md_to_docx(
    "report.md",
    "report.docx",
    generate_toc=True,
    page_numbers=True,
    title="Annual Report",
    author="Jane Smith",
)

CLI

# Convert Markdown to DOCX
pimd md guide.md guide.docx

# Convert HTML to DOCX
pimd html page.html page.docx

# Convert Markdown to EPUB 3.2
pimd epub book.md book.epub

# Convert Markdown to LaTeX
pimd latex paper.md paper.tex

# Export to PDF/A archival format
pimd export pdfa report.md report.pdf

# List available templates
pimd template list

# Generate a report
pimd report generate executive

# Detect language script direction
pimd language input.md

# Track document revisions
pimd revision init --id doc-1 --title "Report"

# Watch a directory for changes
pimd watch ./docs --output ./build

# Check version
pimd --version

Backend Server

FastAPI:

from fastapi import FastAPI, File, UploadFile
from fastapi.responses import Response
from pimd import PiMD

app = FastAPI()
engine = PiMD()

@app.post("/convert")
async def convert(file: UploadFile = File(...)) -> Response:
    content = await file.read()
    docx_bytes = engine.md_text_to_docx_bytes(content.decode())
    return Response(
        content=docx_bytes,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    )

See examples/fastapi_app.py, examples/flask_example.py, examples/django_example.py for complete integration examples.

Architecture

PiMD follows a layered architecture with clear separation of concerns:

┌─────────────────────────────────────────────────────┐
│                    CLI Layer                        │
│   pimd md, pimd html, pimd build, pimd watch ...    │
├─────────────────────────────────────────────────────┤
│                   API Layer                         │
│              PiMD class, convert()                  │
├─────────────────────────────────────────────────────┤
│                Service Layer                        │
│  ConversionService, DocumentService, TemplateService│
├─────────────────────────────────────────────────────┤
│               Pipeline Layer                        │
│     Parse → Transform → Render → Export + Hooks     │
├─────────────────────────────────────────────────────┤
│  Parsers  │  Renderers  │  Engines  │  Plugins      │
│  Markdown │  DOCX       │  Diagram  │  Conversion   │
│  HTML     │  PDF        │  Equation │  Lifecycle    │
│           │  HTML       │  Template │  Extension    │
│           │  TXT        │  Citation │               │
├─────────────────────────────────────────────────────┤
│              Domain Model Layer                     │
│    Document, Block, Span, Heading, Table, Image...  │
├─────────────────────────────────────────────────────┤
│           Infrastructure Layer                      │
│  Cache (memory/fs/redis) │ Safety │ Observability   │
│  Config │ Incremental │ Parallel │ Streaming        │
└─────────────────────────────────────────────────────┘

Parsers

Two built-in parsers convert source text into PiMD's document model:

MarkdownParser — CommonMark + GFM with extensions for frontmatter, footnotes, callouts, diagrams, equations, cross-references
HTMLParser — HTML to document model via BeautifulSoup with structure preservation

Each parser implements a parse(text: str) -> Document interface. Custom parsers can be registered via the plugin system.
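
The parse(text: str) -> Document contract can be pictured with a small sketch. The Document class here is a stand-in for the real model in pimd.models, and PlainTextParser and convert are hypothetical names used only for illustration; registering a parser with the plugin system is not shown.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """Stand-in for PiMD's document model; the real class lives in pimd.models."""
    blocks: list = field(default_factory=list)


class Parser(Protocol):
    """Anything with a matching parse() method satisfies the interface."""

    def parse(self, text: str) -> Document: ...


class PlainTextParser:
    """Hypothetical parser: every blank-line-separated chunk becomes one block."""

    def parse(self, text: str) -> Document:
        chunks = [chunk.strip() for chunk in text.split("\n\n")]
        return Document(blocks=[chunk for chunk in chunks if chunk])


def convert(parser: Parser, text: str) -> Document:
    """Code downstream of the parser depends only on the parse() contract."""
    return parser.parse(text)
```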

Document Model

The document model (pimd.models) is a typed hierarchy of block-level elements:

Document
├── Heading (level, text, id)
├── Paragraph (spans with bold/italic/code/links/images)
├── CodeBlock (language, code)
├── Table (headers, rows, alignment)
├── OrderedList / BulletList (items with nested children)
├── Image (url, alt, width, height)
├── Diagram (language, source, png_bytes, svg_bytes)
├── EquationBlock (latex, omml, svg)
├── HorizontalRule
├── Callout (type, title, blocks)
├── Footnote (ref_id, text)
└── Blockquote (blocks)

This model is the single source of truth that flows through the pipeline. Every renderer, plugin, and transformation operates on it.
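
A reduced sketch of such a typed hierarchy, assuming plain dataclasses, may help show why a single model is convenient: any consumer can walk it the same way. The classes below mirror only three of the block types listed above and are not PiMD's real definitions; walk and outline are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


@dataclass
class Callout:
    type: str
    title: str
    blocks: list = field(default_factory=list)  # nested child blocks


@dataclass
class Document:
    blocks: list = field(default_factory=list)


def walk(blocks: list) -> Iterator[object]:
    """Yield every block depth-first, descending into container blocks."""
    for block in blocks:
        yield block
        if isinstance(block, Callout):
            yield from walk(block.blocks)


def outline(doc: Document) -> list:
    """Collect (level, text) pairs, e.g. as input for a table of contents."""
    return [(b.level, b.text) for b in walk(doc.blocks) if isinstance(b, Heading)]
```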

Renderers

DOCX Renderer (primary) — produces publish-quality DOCX via python-docx with full styling, TOC, headers/footers, watermarks
HTML Renderer — generates HTML output
PDF — via DOCX-to-PDF conversion (weasyprint on Linux/Mac, docx2pdf on Windows)
EPUB, LaTeX, PPTX — stub architecture for future releases

Publishing Engine

The publishing layer orchestrates multi-part documents:

Book Compiler — compiles chapters from config, applies consistent templates, generates TOC
Report Engine — generates structured reports from templates (executive, technical, research, compliance, architecture, etc.)
Template Engine — manages template presets, inheritance chains, config merging

Diagram Engine

Diagram rendering is fully integrated into the conversion pipeline. Code blocks with recognized language hints are automatically detected and rendered:

Auto-detection of diagram languages from code block content
12 built-in renderers with availability detection
Plugin architecture for custom renderers
Caching with memory, filesystem, and Redis backends
Automatic fallback — if a renderer is unavailable, the pipeline continues
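
The availability detection and fallback behaviour listed above can be sketched as follows. This is an illustrative pattern under stated assumptions, not PiMD's implementation: Renderer, cli_available, and render_or_fallback are hypothetical names.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, Optional


def cli_available(executable: str) -> bool:
    """Availability detection for renderers that shell out to a binary on PATH."""
    return shutil.which(executable) is not None


@dataclass
class Renderer:
    language: str
    render: Callable[[str], bytes]      # diagram source -> image bytes
    is_available: Callable[[], bool]    # e.g. lambda: cli_available("dot")


def render_or_fallback(source: str, language: str, renderers: list) -> Optional[bytes]:
    """Return image bytes, or None so the caller can keep the block as plain code."""
    for renderer in renderers:
        if renderer.language != language or not renderer.is_available():
            continue
        try:
            return renderer.render(source)
        except Exception:
            continue  # a failing renderer should not abort the whole conversion
    return None
```

Returning None instead of raising is what lets the rest of the conversion proceed when a renderer is missing.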

Equation Engine

Equations written in LaTeX syntax are rendered in both inline and display mode:

Detects $...$ and $$...$$ in Markdown and HTML
Multiple rendering backends (matplotlib, MathJax-based)
Native Word OMML output for perfect DOCX rendering
Equation numbering and cross-references
Chemical formula support
Caching with configurable TTL
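
Detection of $...$ and $$...$$ spans could look roughly like the regular-expression sketch below. It is a simplified assumption for illustration (for instance, it does not skip code spans), and find_equations is a hypothetical helper rather than a PiMD function.

```python
import re

# Display math is tried first so "$$...$$" is not read as inline spans.
# The lookbehind keeps an escaped "\$" from starting an inline span.
MATH_PATTERN = re.compile(
    r"\$\$(?P<display>.+?)\$\$|(?<!\\)\$(?P<inline>[^$\n]+?)\$",
    re.DOTALL,
)


def find_equations(text: str) -> list:
    """Return (mode, latex) pairs in document order."""
    results = []
    for match in MATH_PATTERN.finditer(text):
        if match.group("display") is not None:
            results.append(("display", match.group("display").strip()))
        else:
            results.append(("inline", match.group("inline").strip()))
    return results
```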

Template Engine

Templates control the visual output of documents:

10 presets — professional, academic, technical, business, book, proposal, invoice, resume, manual, API
Inheritance — templates can extend parent templates with overrides
Config — page size, margins, fonts, colors, headers, footers, TOC, cover pages, watermarks
Validation — built-in template validation
Custom — create new templates with JSON configuration

Plugin System

The plugin system (pimd.plugins) provides hooks at every stage of the conversion pipeline:

Conversion hooks: BEFORE_PARSE, AFTER_PARSE, BEFORE_RENDER, AFTER_RENDER, BEFORE_CONVERT, AFTER_CONVERT
Plugin types: diagram, template, citation, renderer, exporter, asset, validation, parser, publishing
SDK: pimd.sdk provides BasePlugin with typed subclasses for each plugin type
Discovery: plugins can be discovered via entry points (pimd.plugins group) or filesystem
Lifecycle: on_install, on_uninstall, on_enable, on_disable hooks
Dependencies: plugins can declare dependencies on other plugins
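
The hook mechanism described above follows a common registry-and-dispatch pattern, sketched here in a few lines. SimpleHookRegistry and the Hook enum are stand-ins written for this example; the real classes in pimd.plugins and pimd.sdk may have different signatures.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable


class Hook(Enum):
    BEFORE_PARSE = "before_parse"
    AFTER_PARSE = "after_parse"
    BEFORE_RENDER = "before_render"
    AFTER_RENDER = "after_render"


class SimpleHookRegistry:
    """Stand-in registry: handlers take a payload and return the (possibly changed) payload."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def register(self, hook: Hook, handler: Callable[[str], str]) -> None:
        self._handlers[hook].append(handler)

    def dispatch(self, hook: Hook, payload: str) -> str:
        """Pass the payload through each handler in registration order."""
        for handler in self._handlers[hook]:
            payload = handler(payload)
        return payload
```

A BEFORE_PARSE handler might normalize line endings, for example, while a hook with no handlers simply returns the payload unchanged.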

Service Layer

ConversionService orchestrates the full pipeline with:

Input validation (safety checks, file size, nesting depth)
Cache lookups (memory/filesystem/Redis)
Plugin hook dispatch at every stage
Diagram and equation processing
Statistics collection
Metrics and observability (parse time, render time, total time, block counts)
Error handling with graceful degradation
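
One way to collect per-stage timings such as parse time and render time is to wrap each stage call with a monotonic clock. The sketch below is a generic illustration; run_pipeline and the stage names are assumptions and do not reflect ConversionService's actual interface.

```python
import time
from typing import Callable


def run_pipeline(source: object, stages: list) -> tuple:
    """Run (name, callable) stages in order and record each duration in seconds."""
    timings = {}
    value = source
    for name, stage in stages:
        started = time.perf_counter()
        value = stage(value)  # each stage consumes the previous stage's output
        timings[name] = time.perf_counter() - started
    timings["total"] = sum(timings.values())
    return value, timings
```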

Feature Showcase

Markdown → DOCX

sample.md:

# PiMD Sample Document

This is a sample Markdown file used for testing PiMD conversion.

## Features

- Paragraphs with **bold** and *italic* text
- [Links](https://example.com)
- Lists (ordered and unordered)
- Code blocks

pimd md sample.md output.docx

HTML → DOCX

sample.html:

<!DOCTYPE html>
<html lang="en">
<head><title>PiMD Sample Document</title></head>
<body>
    <h1>PiMD Sample Document</h1>
    <p>This is a sample HTML file used for testing PiMD conversion.</p>
</body>
</html>

pimd html sample.html output.docx

Diagrams

## Architecture Overview

```mermaid
graph TD
    A[Client] --> B[Load Balancer]
    B --> C[Server 1]
    B --> D[Server 2]
```

PiMD detects the `mermaid` language hint and renders the diagram as an embedded image in the DOCX output.

Equations

The quadratic formula:

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

Inline equation: $E = mc^2$

Equations are rendered as native Word OMML or SVG depending on the backend.

Templates

# List available templates
pimd template list

# Convert with a specific template
pimd md report.md report.docx --template academic

# Get template details
pimd template info academic

Books

pimd book compile book-config.json output.docx

Where book-config.json defines chapters, styling, and metadata.

Reports

# List available report types
pimd report list-types

# Generate a report
pimd report generate executive --title "Q4 Review" --output report.docx

Repository Conversion

# Convert an MkDocs project
pimd repo ./docs --output documentation.docx

# Convert with frontmatter handling
pimd frontmatter extract ./post.md

EPUB Output

PiMD includes a complete EPUB 3.2 renderer that produces valid EPUB packages from the document model.

Features

EPUB 3.2 compliance — valid OPF, NCX, nav.xhtml, XHTML content, CSS
Reflowable layout — single-column, device-agnostic reading experience
Table of Contents — auto-generated from heading hierarchy (NCX + nav)
Cover page — configurable cover with title, author, and image
Chapters — automatic chapter splitting from h1/h2 headings
Embedded assets — images, diagrams, and equations embedded inline
Custom CSS — configure typography, colors, and layout
Validation — built-in validate_epub() checks package structure and well-formedness

CLI Usage

# Convert Markdown to EPUB
pimd epub guide.md guide.epub

# With metadata and custom CSS
pimd epub report.md report.epub --title "Annual Report" --author "Jane Smith" --css custom.css

# Validate after generation
pimd epub book.md book.epub --validate

Python API

from pimd import PiMD
from pimd.export.formats.epub import EpubRenderer
from pimd.converters.markdown import MarkdownConverter

engine = PiMD()
engine.convert("input.md", "epub", "output.epub")

# Or use the renderer directly
converter = MarkdownConverter()
with open("input.md", encoding="utf-8") as source:
    doc = converter.parse_text(source.read())
renderer = EpubRenderer()
renderer.render(doc, "output.epub", title="My Book", author="Jane Smith")

LaTeX Output

PiMD generates clean, readable LaTeX suitable for compilation with pdflatex, xelatex, or lualatex.

Features

Document classes — article, report, book with appropriate structure
Headings — section, subsection, subsubsection, paragraph
Tables — tabular environment with booktabs styling
Code blocks — listings package with language-specific formatting
Math expressions — inline $...$ and display equation*/equation
Images — graphicx with figure environment and captions
Hyperlinks — hyperref with link colors
Citations — biblatex with APA style preamble
Cross-references — label/ref support in generated output

CLI Usage

# Convert Markdown to LaTeX
pimd latex paper.md paper.tex

# With document class and TOC
pimd latex thesis.md thesis.tex --class book --toc --title "My Thesis" --author "John Doe"

Python API

from pimd import PiMD
from pimd.export.formats.latex import LatexRenderer

engine = PiMD()
engine.convert("input.md", "latex", "output.tex")

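# Or use the renderer directly; `document` is a parsed document model,
# e.g. the result of MarkdownConverter().parse_text(...) as in the EPUB example
renderer = LatexRenderer()
renderer.render(document, "output.tex", title="Paper", document_class="article")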

PDF/A Output

PiMD supports archival PDF/A generation for long-term document preservation.

Features

PDF/A-1b and PDF/A-2b conformance levels
Font embedding — automatic font embedding for faithful rendering
Metadata preservation — title, author, subject carried through
Automatic fallback — LibreOffice PDF/A filter → fpdf2 → standard PDF
Standards-compliant — ISO 19005 archival format

CLI Usage

# Export to PDF/A
pimd export pdfa report.md output.pdf

# Specify conformance level
pimd export pdfa archive.md archive.pdf --level 1b

Python API

from pimd import PiMD
from pimd.export.pdf import convert_to_pdfa

engine = PiMD()
engine.convert("input.md", "pdfa", "output.pdf")

PDF/A Doctor

pimd export doctor

Internationalization (i18n)

PiMD provides comprehensive internationalization support across all output formats.

Script Detection

Automatic detection of text direction and script type:

Script	Direction	Languages
LTR	Left-to-right	English, French, German, Spanish, etc.
RTL	Right-to-left	Arabic (ar), Persian/Farsi (fa), Urdu (ur), Hebrew (he)
CJK	Mixed	Chinese (zh), Japanese (ja), Korean (ko)

Features

Unicode script detection — detect_script() classifies text as LTR, RTL, CJK, or neutral
Language-aware typography — get_language_config() provides font, size, line height for 15+ languages
Arabic reshaping — reshape_arabic() uses arabic_reshaper for proper Arabic glyph rendering
Bidirectional text — apply_bidi() uses bidi algorithm for mixed LTR/RTL text
DOCX i18n — configure_docx_for_language() sets document direction, fonts, and RTL properties
EPUB i18n — configure_epub_for_language() generates language-specific CSS with direction and font-family
LaTeX i18n — configure_latex_for_language() adds babel, ctex, or luatexja packages
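
For intuition, script classification of this kind can be approximated with the standard library's unicodedata module. The sketch below is an assumption about one possible approach and is not the code behind detect_script(); classify_script is a hypothetical name and the CJK ranges are deliberately incomplete.

```python
import unicodedata


def classify_script(text: str) -> str:
    """Classify text as 'rtl', 'cjk', 'ltr', or 'neutral' by counting characters."""
    counts = {"rtl": 0, "cjk": 0, "ltr": 0}
    for ch in text:
        # CJK is checked first because these characters also have bidi class "L".
        if ("\u4e00" <= ch <= "\u9fff"        # CJK Unified Ideographs
                or "\u3040" <= ch <= "\u30ff"  # Hiragana and Katakana
                or "\uac00" <= ch <= "\ud7af"):  # Hangul syllables
            counts["cjk"] += 1
        elif unicodedata.bidirectional(ch) in ("R", "AL"):  # Hebrew, Arabic letters
            counts["rtl"] += 1
        elif unicodedata.bidirectional(ch) == "L":
            counts["ltr"] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] else "neutral"
```

Digits, punctuation, and whitespace carry no strong direction, so text made only of those is reported as neutral.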

CLI Usage

# Detect script direction of a document
pimd language input.md

Python API

from pimd.i18n import (
    detect_script, ScriptType,
    is_rtl_language, is_cjk_language,
    get_language_config, process_text_for_language,
)

script = detect_script("مرحبا بالعالم")  # ScriptType.RTL
is_rtl = is_rtl_language("ar")            # True
config = get_language_config("zh-CN")     # CJK config with appropriate font
text = process_text_for_language("Arabic text", "ar")

RTL Support by Output Format

Format	RTL Support
DOCX	Direction set via `w:bidi` on sections and paragraphs
EPUB	`direction: rtl` + `unicode-bidi: embed` in CSS
LaTeX	`babel` package with arabic/hebrew language support
PDF/A	Font embedding preserves Unicode glyphs

Collaborative Editing

PiMD provides a document revision model for tracking changes, comments, and annotations — enabling future collaborative workflows.

Revision Model

The RevisionTracker manages all tracked changes:

Insertions — new text added at a position
Deletions — text removed from a position
Replacements — text swapped for new content
Formatting changes — style modifications
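
How position-based revisions might be applied to text can be shown with a short sketch. The field names follow the start_pos, end_pos, and new_text parameters used in the Python API example in this section, but the Revision class and apply_revisions function here are hypothetical and only handle non-overlapping changes.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    start_pos: int
    end_pos: int          # equal to start_pos for a pure insertion
    new_text: str = ""    # empty for a pure deletion


def apply_revisions(text: str, revisions: list) -> str:
    """Apply non-overlapping revisions whose positions refer to the original text."""
    # Work from the end so earlier offsets stay valid as the text changes length.
    for rev in sorted(revisions, key=lambda r: r.start_pos, reverse=True):
        text = text[:rev.start_pos] + rev.new_text + text[rev.end_pos:]
    return text
```

With this representation an insertion, a deletion, and a replacement are the same operation with different spans and replacement text.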

Comment System

Comments support threading, resolution workflow, and full metadata:

Threaded replies — parent/child comment relationships
Resolution tracking — mark comments resolved with attribution
Position annotations — comment on specific text ranges

Review Metadata

Reviewers — list of assigned reviewers
Status tracking — draft, in_review, approved, rejected
Timestamps — creation, updates, resolution dates

CLI Usage

# Initialize a revision tracker
pimd revision init --id doc-123 --title "Annual Report"

# Add a tracked revision
pimd revision add insertion --author "alice" --desc "Added executive summary"

# List tracked revisions
pimd revision list

# Filter by status
pimd revision list --status pending

Python API

from pimd.revisions import RevisionTracker, RevisionType, RevisionStatus

tracker = RevisionTracker(document_id="doc-123", title="Report")

# Add revisions
tracker.add_revision(
    revision_type=RevisionType.INSERTION,
    author="alice",
    start_pos=42,
    end_pos=42,
    new_text="New content here",
    description="Added paragraph",
)

# Add comments
tracker.add_comment(
    author="bob",
    text="Please review this section",
    start_pos=10,
    end_pos=200,
)

# Export review summary
summary = tracker.export_review_summary()
print(f"Revisions: {summary['revisions']['total']}")
print(f"Comments: {summary['comments']['total']}")

DOCX Quality

PiMD produces publication-quality DOCX output:

Feature	Description
Layout	A4 (default), Letter, or custom page size
Margins	Normal (2.54 cm), Narrow (1.27 cm), or custom
Typography	Professional fonts (Calibri default), configurable heading/body fonts
Font sizes	Configurable per heading level and body text
Line spacing	Configurable (default 1.15)
Paragraph spacing	Configurable (default 6 pt after)
Headers	Configurable header text per section
Footers	Configurable footer text, page numbering
Table of Contents	Auto-generated from heading hierarchy
Cross-references	Heading references, figure references, table references
Numbering	Heading numbering, figure numbering, table numbering, equation numbering
Citations	APA, IEEE, MLA, Chicago, Harvard with full bibliography
Cover pages	Configurable cover page with title, author, date
Watermarks	Configurable text watermarking
Images	Embedded with alt text, sizing, positioning
Diagrams	High-resolution embedded PNG/SVG
Equations	Native Word OMML for perfect rendering
Tables	Formatted with headers, borders, alignment
Code blocks	Monospace font, syntax-highlighted (via formatting)
Hyperlinks	Clickable links preserved
Accessibility	Alt text, heading hierarchy, table headers

Diagram Support

PiMD includes a universal diagram engine with built-in renderers for:

Renderer	Language hint	Auto-detect	Output
Mermaid	`mermaid`	Yes	PNG, SVG
PlantUML	`plantuml`	Yes	PNG, SVG
Graphviz	`dot`, `graphviz`	Yes	PNG, SVG
D2	`d2`	Yes	PNG, SVG
BlockDiag	`blockdiag`	Yes	PNG
SeqDiag	`seqdiag`	Yes	PNG
ActDiag	`actdiag`	Yes	PNG
NwDiag	`nwdiag`	Yes	PNG
PacketDiag	`packetdiag`	Yes	PNG
Vega	`vega`	Yes	PNG
BPMN	`bpmn`	Yes	PNG
ASCII Art	`ascii`	Yes	SVG

Auto-Detection

When a code block has no language hint, PiMD inspects the content for box-drawing characters, connector patterns, and structural indicators to automatically detect diagram types.
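
A content heuristic of this kind might resemble the sketch below, which looks for Unicode box-drawing characters and ASCII box borders. The function name, the threshold, and the border rule are assumptions chosen for illustration; PiMD's real detection logic is not shown in this document.

```python
BOX_DRAWING = range(0x2500, 0x2580)  # Unicode "Box Drawing" block


def looks_like_ascii_diagram(source: str, threshold: float = 0.05) -> bool:
    """Guess whether an unlabeled code block is a text diagram."""
    visible = [ch for ch in source if not ch.isspace()]
    if not visible:
        return False
    # Share of visible characters that are box-drawing glyphs.
    box_ratio = sum(1 for ch in visible if ord(ch) in BOX_DRAWING) / len(visible)
    # Lines such as "+----+" form the top and bottom borders of ASCII boxes.
    borders = sum(1 for line in source.splitlines() if line.strip().startswith("+-"))
    return box_ratio >= threshold or borders >= 2
```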

Custom Renderers

from pimd.sdk import DiagramPlugin

class MyDiagramRenderer(DiagramPlugin):
    name = "my_renderer"
    version = "2.1.0"

    def render(self, source: str, language: str, **kwargs):
        # Return RenderResult with png_bytes and/or svg_bytes
        ...

from pimd.diagrams import register_diagram_renderer
register_diagram_renderer(MyDiagramRenderer())

Architecture

CodeBlock (language="mermaid")
    │
    ▼
Diagram Engine
    │
    ├─ Auto-detect language (if not specified)
    ├─ Find registered renderer
    ├─ Check cache
    ├─ Render (subprocess or library)
    ├─ Cache result
    └─ Return Diagram block
    │
    ▼
DOCX Renderer (embeds PNG/SVG in document)

Scientific Publishing

PiMD provides comprehensive support for scientific and technical document publishing.

Equation Rendering

The Fourier transform is defined as:

$$ \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t} \, dt $$

Where $f(t)$ is a continuous function and $\omega$ is angular frequency.

LaTeX syntax — standard $...$ inline and $$...$$ display
Multiple backends — Content MathML, matplotlib rendering, LaTeX-based
Native OMML — equations are converted to native Word OMML for perfect inline rendering with Word's equation engine
SVG fallback — when OMML conversion is unavailable, equations are rendered as SVG images
Equation numbering — automatic (1), (2) numbering with \label{eq:foo} and \ref{eq:foo} cross-references
Chemical formulas — support for chemical notation via \ce{} syntax (mhchem-compatible)

Bibliography

from pimd import CitationEngine, CitationStyle

engine = CitationEngine()
engine.load_bibtex("references.bib")
citations = engine.cite("Einstein1915", style=CitationStyle.APA)
bibliography = engine.bibliography(style=CitationStyle.IEEE)

Supported citation styles:

APA (American Psychological Association)
IEEE (Institute of Electrical and Electronics Engineers)
MLA (Modern Language Association)
Chicago (Chicago Manual of Style)
Harvard (Harvard referencing)

Cross-References

As shown in @fig:architecture, the system...

See @tbl:results for experimental data.

Equation @eq:fourier describes the transform.

Cross-references to figures, tables, equations, and headings are resolved during rendering and converted to Word cross-reference fields.
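
The resolution step can be approximated in plain text with a regular expression, as in the sketch below. It is a simplified stand-in: it emits plain strings rather than Word cross-reference fields, and resolve_references plus the targets mapping are hypothetical names.

```python
import re

REF_PATTERN = re.compile(r"@(fig|tbl|eq):([A-Za-z0-9_-]+)")
LABELS = {"fig": "Figure", "tbl": "Table"}


def resolve_references(text: str, targets: dict) -> str:
    """Replace @kind:name with numbered text; unknown targets are left untouched."""

    def replace(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        number = targets.get(f"{kind}:{name}")
        if number is None:
            return match.group(0)
        if kind == "eq":
            return f"({number})"  # equations are cited by number only, e.g. "Equation (1)"
        return f"{LABELS[kind]} {number}"

    return REF_PATTERN.sub(replace, text)
```

Leaving unknown targets in place makes unresolved references easy to spot in the output.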

Templates

PiMD includes 10 professionally designed template presets:

Template	Best for	Config highlights
Professional	General business documents	A4, Calibri, 11pt, TOC optional
Academic	Research papers, theses	A4, Times New Roman, 12pt, TOC, line numbers
Technical	Manuals, specifications	A4, Calibri, 10pt, TOC, page numbers
Business	Proposals, reports	A4, Calibri, 11pt, TOC, cover page
Book	Full-length publications	A4, Calibri, 11pt, TOC, chapters, headers
Proposal	Business proposals	A4, Calibri, 11pt, cover page, TOC
Invoice	Billing documents	A4, Calibri, 10pt, header/footer
Resume	CVs and resumes	A4, Calibri, 10pt, compact margins
Manual	Product documentation	A4, Calibri, 10pt, TOC, numbered headings
API	API documentation	A4, Calibri, 10pt, TOC, code blocks

Template Inheritance

Templates support inheritance chains. A child template can override specific settings while inheriting the rest from its parent:

{
    "metadata": {
        "name": "my_academic",
        "tags": ["base_academic"]
    },
    "config": {
        "default_font": "Georgia",
        "line_spacing": 1.5
    }
}
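
Conceptually, resolving an inheritance chain comes down to merging the child's config over the parent's. The sketch below shows one plausible merge strategy; merge_config is a hypothetical helper, and the nested margins key is an assumed example rather than a documented PiMD setting.

```python
def merge_config(parent: dict, child: dict) -> dict:
    """Return a new dict where child values override parent values, recursing into nested dicts."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical parent preset and child overrides.
base_academic = {
    "page_size": "A4",
    "default_font": "Times New Roman",
    "margins": {"top": 2.54, "bottom": 2.54},
}
my_academic = {"default_font": "Georgia", "line_spacing": 1.5, "margins": {"top": 3.0}}
resolved = merge_config(base_academic, my_academic)
```

Because the merge builds new dictionaries at every level, the parent preset is left unmodified and can be shared by several children.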

Custom Templates

Create a template directory with a template.json:

{
    "metadata": {
        "name": "custom",
        "type": "custom",
        "version": "2.1.0",
        "author": "Your Name",
        "description": "My custom template"
    },
    "config": {
        "page_size": "A4",
        "default_font": "Calibri",
        "default_font_size": 11,
        "generate_toc": true,
        "cover_page": true
    }
}

Templates are discovered from ~/.pimd/templates/ and project-local templates/ directories.

Plugin Ecosystem

PiMD has a first-class plugin system designed for extensibility.

Plugin Types

Type	Base Class	Purpose
Diagram	`DiagramPlugin`	Custom diagram renderers
Template	`TemplatePlugin`	Custom template loaders, transformations
Citation	`CitationPlugin`	Custom citation styles, bibliography formats
Renderer	`RendererPlugin`	Custom output renderers (e.g., EPUB, LaTeX)
Exporter	`ExporterPlugin`	Custom export formats
Asset	`AssetPlugin`	Custom asset handlers (images, fonts, etc.)
Validation	`ValidationPlugin`	Custom validation rules
Parser	`ParserPlugin`	Custom input parsers
Publishing	`PublishingPlugin`	Custom publishing pipelines

Plugin Lifecycle

Install → Register → Enable → Dispatch → Disable → Uninstall

Each plugin receives hook calls at every lifecycle stage.

Discovery

Plugins can be discovered via:

Entry points — pimd.plugins group in pyproject.toml:

[project.entry-points."pimd.plugins"]
my_plugin = "my_package.plugin:MyPlugin"

Filesystem — Python files in ~/.pimd/plugins/ or project-local plugins/ directory
Programmatic — PluginManager.register()

SDK

The Extension SDK (pimd.sdk) provides:

Base classes — BasePlugin with typed subclasses for each plugin type
Pre-defined hooks — methods that correspond to pipeline stages
Lifecycle hooks — on_install, on_uninstall, on_enable, on_disable
Event system — Event, EventBus for decoupled plugin communication
Hook registry — HookRegistry for managing lifecycle hooks

CLI

# List installed plugins
pimd plugin list

# Install a plugin
pimd plugin install my-plugin

# Enable/disable
pimd plugin enable my-plugin
pimd plugin disable my-plugin

# Run diagnostics
pimd plugin doctor

Backend Integration

PiMD's library-first design makes it straightforward to integrate into web frameworks.

FastAPI

from fastapi import FastAPI, File, UploadFile
from fastapi.responses import Response
from pimd import PiMD

app = FastAPI()
engine = PiMD()

@app.post("/markdown")
async def convert_markdown(file: UploadFile = File(...)) -> Response:
    content = await file.read()
    docx_bytes = engine.md_text_to_docx_bytes(content.decode("utf-8"))
    return Response(
        content=docx_bytes,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={
            "Content-Disposition": 'attachment; filename="output.docx"'
        },
    )

See examples/fastapi_app.py for a complete example with streaming and form-encoded endpoints.

Flask

from flask import Flask, Response, request
from pimd import PiMD

app = Flask(__name__)
engine = PiMD()

@app.route("/markdown", methods=["POST"])
def convert_markdown():
    text = request.form.get("text", "")
    docx_bytes = engine.md_text_to_docx_bytes(text)
    return Response(
        docx_bytes,
        mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": "attachment; filename=output.docx"},
    )

See examples/flask_example.py for a complete example.

Django

from django.http import HttpRequest, HttpResponse
from pimd import PiMD

engine = PiMD()

def convert_markdown(request: HttpRequest) -> HttpResponse:
    text = request.POST.get("text", "")
    docx_bytes = engine.md_text_to_docx_bytes(text)
    return HttpResponse(
        docx_bytes,
        content_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": 'attachment; filename="output.docx"'},
    )

See examples/django_example.py for a complete example.

In-Memory Conversion

from pimd import PiMD

engine = PiMD()

# Convert text to bytes — no filesystem writes
docx_bytes = engine.md_text_to_docx_bytes("# Hello World")

# Convert HTML text to bytes
docx_bytes = engine.html_text_to_docx_bytes("<h1>Hello</h1>")

CLI Reference

Conversion Commands

Command	Description
`pimd md <input> <output>`	Convert Markdown → DOCX
`pimd html <input> <output>`	Convert HTML → DOCX
`pimd epub <input> <output>`	Convert Markdown → EPUB 3.2
`pimd latex <input> <output>`	Convert Markdown → LaTeX
`pimd export docx <input> <output>`	Export to DOCX
`pimd export pdf <input> <output>`	Export to PDF
`pimd export pdfa <input> <output>`	Export to PDF/A archival format
`pimd export epub <input> <output>`	Export to EPUB via unified system
`pimd export latex <input> <output>`	Export to LaTeX via unified system
`pimd export html <input> <output>`	Export to HTML
`pimd batch <input> <output>`	Batch convert directory
`pimd watch <dir>`	Watch directory for changes
`pimd build <config>`	Build multi-file project

Diagram Commands

Command	Description
`pimd diagrams list`	List available renderers
`pimd diagrams test <lang>`	Test a renderer
`pimd diagrams cache-clear`	Clear diagram cache
`pimd diagrams doctor`	Diagnose diagram tools

Equation Commands

Command	Description
`pimd equations list`	List equation formats
`pimd equations test <latex>`	Test equation rendering

Template Commands

Command	Description
`pimd template list`	List templates
`pimd template info <name>`	Show template details
`pimd template validate <name>`	Validate template

Plugin Commands

Command	Description
`pimd plugin list`	List installed plugins
`pimd plugin install <name>`	Install a plugin
`pimd plugin enable <name>`	Enable a plugin
`pimd plugin disable <name>`	Disable a plugin
`pimd plugin doctor`	Run plugin diagnostics

Report Commands

Command	Description
`pimd report generate <type>`	Generate a report
`pimd report list-types`	List report types

Config Commands

Command	Description
`pimd config show`	Show resolved configuration
`pimd config path`	Show config file locations
`pimd config init`	Generate default config file
`pimd config validate`	Validate configuration

Cache Commands

Command	Description
`pimd cache clear`	Clear all caches
`pimd cache status`	Show cache backend status
`pimd cache info`	Show cache diagnostics

Book Commands

Command	Description
`pimd book compile <config> <output>`	Compile a book

Citation Commands

Command	Description
`pimd citations load <bibtex>`	Load BibTeX file
`pimd citations bibliography`	Generate bibliography

Diagnostics

Command	Description
`pimd --version`	Show version
`pimd version`	Show detailed version + system info
`pimd info`	Show system information
`pimd doctor`	Run system diagnostics
`pimd flavor <file>`	Detect Markdown flavor
`pimd profile run <input>`	Profile a conversion

Other Commands

Command	Description
`pimd merge <files> <output>`	Merge multiple documents
`pimd validate <input>`	Validate a document
`pimd frontmatter extract <input>`	Extract frontmatter
`pimd frontmatter strip <input>`	Strip frontmatter
`pimd analyze <input>`	Analyze document structure
`pimd repo <input> <output>`	Convert documentation repo
`pimd language <input>`	Detect script direction (LTR/RTL/CJK)
`pimd accessibility check <input>`	Check accessibility
`pimd accessibility report <input> <output>`	Generate accessibility report
`pimd revision init`	Initialize revision tracker
`pimd revision add <type>`	Add a tracked revision
`pimd revision list`	List tracked revisions

Configuration

PiMD resolves configuration from three levels in priority order, falling back to built-in defaults:

Runtime options (highest priority)
    ↓
Project config (.pimdconfig)
    ↓
Global config (~/.pimd/config.toml)
    ↓
Built-in defaults (lowest priority)
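The resolution order above amounts to a layered merge in which higher-priority sources overwrite lower ones key by key. The sketch below shows the idea with plain dictionaries. `merge_layers` is a hypothetical helper, not PiMD's resolver, and merging per section is an assumption.

```python
from typing import Any, Mapping


def merge_layers(*layers: Mapping[str, Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Merge config layers section by section; later layers win."""
    resolved: dict[str, dict[str, Any]] = {}
    for layer in layers:
        for section, values in layer.items():
            resolved.setdefault(section, {}).update(values)
    return resolved


# Layers listed from lowest to highest priority.
builtin_defaults = {
    "defaults": {"theme": "professional", "page_size": "A4"},
    "cache": {"enabled": True, "backend": "memory"},
}
global_config = {"defaults": {"theme": "academic"}}
project_config = {"cache": {"backend": "filesystem"}}
runtime_options = {"defaults": {"page_size": "Letter"}}

resolved = merge_layers(builtin_defaults, global_config, project_config, runtime_options)
```

Keys that no higher layer sets, such as `cache.enabled` here, keep their lower-priority value.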

Global Configuration

~/.pimd/config.toml:

[defaults]
theme = "professional"
author = "Your Name"
company = "Your Company"
language = "en-US"
page_size = "A4"
default_font = "Calibri"

[conversion]
generate_toc = true
page_numbers = true
continue_on_error = true

[cache]
enabled = true
backend = "memory"  # memory, filesystem, redis

[security]
max_input_size_mb = 100
max_nesting_depth = 100

Project Configuration

.pimdconfig in the project root directory — follows the same format as global config but overrides it.

Environment Variables

All configuration keys can be set via environment variables with the PIMD_ prefix:

export PIMD_DEFAULTS_THEME=academic
export PIMD_CONVERSION_GENERATE_TOC=true
export PIMD_CACHE_BACKEND=filesystem
export PIMD_SECURITY_MAX_INPUT_SIZE_MB=500

Environment variables have the highest priority, overriding all other configuration sources.
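The variable names in the examples appear to follow a `PIMD_<SECTION>_<KEY>` pattern. The sketch below shows one way such names could be mapped back to sections and keys. It is an assumption-laden illustration: `env_overrides` and `coerce` are hypothetical helpers, the section list is taken from the sample config.toml above, and the type coercion rules are guesses rather than PiMD's documented behavior.

```python
import os
from typing import Any, Mapping

# Section names as they appear in the sample ~/.pimd/config.toml.
SECTIONS = ("defaults", "conversion", "cache", "security")


def coerce(raw: str) -> Any:
    """Turn 'true'/'false' into bool and digit strings into int; keep the rest as str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def env_overrides(environ: Mapping[str, str], prefix: str = "PIMD_") -> dict[str, dict[str, Any]]:
    """Collect PIMD_<SECTION>_<KEY> variables into a nested {section: {key: value}} dict."""
    overrides: dict[str, dict[str, Any]] = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):].lower()
        for section in SECTIONS:
            if rest.startswith(section + "_"):
                key = rest[len(section) + 1:]
                overrides.setdefault(section, {})[key] = coerce(raw)
                break
    return overrides


sample_env = {
    "PIMD_DEFAULTS_THEME": "academic",
    "PIMD_CONVERSION_GENERATE_TOC": "true",
    "PIMD_SECURITY_MAX_INPUT_SIZE_MB": "500",
    "PATH": "/usr/bin",
}
parsed = env_overrides(sample_env)
from_process = env_overrides(os.environ)  # same parsing applied to the real environment
```

Matching against a known section list avoids ambiguity for keys that themselves contain underscores, such as `max_input_size_mb`.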

CLI Configuration

# Show resolved configuration
pimd config show

# Generate a default config file
pimd config init

# Validate configuration
pimd config validate

# Show config file locations
pimd config path

Performance

Caching

PiMD has a unified caching architecture with three backends:

Backend	Storage	TTL	Best for
Memory	In-process dict	Configurable	Single-server deployments
Filesystem	SHA256-keyed files	Configurable	Large caches, persistent across restarts
Redis	Remote key-value store	Configurable	Distributed deployments

from pimd import PiMD
from pimd.caching import MemoryCache, FileSystemCache

# In-memory cache (default)
engine = PiMD(cache=MemoryCache(default_ttl=300))

# Filesystem cache
engine = PiMD(cache=FileSystemCache(cache_dir="./.cache"))

Cache keys are content-addressable (SHA256 of input + options), so identical conversions reuse cached results automatically.
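A content-addressable key of this kind can be derived with `hashlib`. The sketch below is one plausible construction, not PiMD's actual key format: `cache_key` is a hypothetical function, and the canonical JSON encoding of the options and the NUL separator are assumptions.

```python
import hashlib
import json


def cache_key(source: str, options: dict) -> str:
    """SHA-256 over the input text plus a canonical JSON encoding of the options."""
    digest = hashlib.sha256()
    digest.update(source.encode("utf-8"))
    digest.update(b"\x00")  # separator so the text/options boundary stays unambiguous
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()


key_a = cache_key("# Hello", {"toc": True, "theme": "professional"})
key_b = cache_key("# Hello", {"theme": "professional", "toc": True})  # same options, different order
key_c = cache_key("# Hello", {"toc": False, "theme": "professional"})
```

Sorting the option keys before hashing means two calls with the same options in a different order map to the same cache entry, while any change to the text or options produces a new key.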

Parallel Processing

PiMD supports parallel processing for batch conversions and diagram rendering:

from pimd.parallel import ThreadExecutor, ProcessExecutor

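# convert_file (a callable taking one path) and files (an iterable of paths) are supplied by the caller
# Thread-based parallel execution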
with ThreadExecutor(max_workers=4) as executor:
    results = executor.map(convert_file, files)

# Process-based parallel execution
with ProcessExecutor(max_workers=4) as executor:
    results = executor.map(convert_file, files)

Large Document Support

Streaming — StreamingMarkdownReader processes files in chunks without loading the entire file into memory
Large files — SafetyGuard imposes configurable limits (default 100 MB input, 500 MB file)
Incremental builds — IncrementalBuildTracker uses content hashing to skip unchanged files
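Chunked reading and hash-based change detection can be sketched together with the standard library. This is an illustration of the technique rather than the `StreamingMarkdownReader` or `IncrementalBuildTracker` implementation: `file_digest` and `changed_files` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so it is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths: list[Path], previous: dict[str, str]) -> tuple[list[Path], dict[str, str]]:
    """Return the files whose digest differs from `previous`, plus the new digest state."""
    current = {str(path): file_digest(path) for path in paths}
    changed = [Path(name) for name, digest in current.items() if previous.get(name) != digest]
    return changed, current


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "chapter.md"
    doc.write_text("# Draft", encoding="utf-8")
    first, state = changed_files([doc], {})      # new file: needs a build
    second, state = changed_files([doc], state)  # unchanged: skipped
    doc.write_text("# Final", encoding="utf-8")
    third, state = changed_files([doc], state)   # edited: rebuilt
```

Persisting `state` between runs (for example as JSON) is what lets a later build skip files whose content has not changed.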

Repository Conversion

For large documentation repositories, PiMD provides:

Tree walking — recursive discovery of all Markdown/HTML files
Parallel processing — files are converted concurrently
Incremental builds — only changed files are reconverted
Watch mode — pimd watch rebuilds on file changes
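Tree walking combined with concurrent conversion can be approximated with `pathlib` and `concurrent.futures`. The sketch below is not PiMD's repository converter: `find_sources` and `convert_all` are hypothetical helpers, the suffix list is an assumption, and the `convert` callable is a stand-in for a real conversion function.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of source extensions; adjust to the repository at hand.
SOURCE_SUFFIXES = {".md", ".markdown", ".html", ".htm"}


def find_sources(root: Path) -> list[Path]:
    """Recursively collect Markdown/HTML files under root, in a stable order."""
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SOURCE_SUFFIXES
    )


def convert_all(root: Path, convert: Callable[[Path], T], max_workers: int = 4) -> dict[Path, T]:
    """Run `convert` over every source file concurrently; map each path to its result."""
    sources = find_sources(root)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(sources, pool.map(convert, sources)))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide").mkdir()
    (root / "index.md").write_text("# Home", encoding="utf-8")
    (root / "guide" / "intro.html").write_text("<h1>Intro</h1>", encoding="utf-8")
    (root / "notes.txt").write_text("ignored", encoding="utf-8")
    # Stand-in conversion: measure the text length instead of producing a document.
    sizes = convert_all(root, lambda path: len(path.read_text(encoding="utf-8")))
    found = sorted(path.relative_to(root).as_posix() for path in sizes)
```

Because `pool.map` preserves input order, zipping its results with the sorted source list keeps each output paired with the right file.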

Project Structure

pimd/
├── src/
│   └── pimd/
│       ├── __init__.py          # Public API (158 symbols)
│       ├── __main__.py          # python -m pimd entry point
│       ├── models.py            # Document model
│       ├── exceptions.py        # Exception hierarchy
│       ├── recovery.py          # Graceful failure recovery
│       ├── api/                 # PiMD public API class
│       ├── cli/                 # Typer CLI (40+ commands)
│       ├── parsers/             # Markdown and HTML parsers
│       ├── renderers/           # DOCX and HTML renderers
│       ├── converters/          # Convenience converters
│       ├── services/            # Orchestration service
│       ├── pipeline/            # Pipeline stages
│       ├── diagrams/            # Universal diagram engine
│       │   └── renderers/       # 12 built-in renderers
│       ├── equations/           # Equation engine
│       ├── templates/           # Template engine + 10 presets
│       ├── plugins/             # Plugin system
│       ├── sdk/                 # Extension SDK
│       ├── cache/               # Cache framework
│       ├── config/              # Configuration system
│       ├── observability/       # Metrics, profiling, reports
│       ├── safety/              # Security guards
│       ├── branding/            # Branding manager
│       ├── reports/             # Report engine
│       ├── books/               # Book compiler
│       ├── citations/           # Citation engine (5 styles)
│       ├── references/          # Cross-reference system
│       ├── accessibility/       # WCAG validation engine
│       ├── remote_assets/       # Remote asset management
│       ├── blocks/              # Content block library
│       ├── streaming/           # Large file streaming
│       ├── incremental/         # Incremental build tracker
│       ├── parallel/            # Parallel execution
│       ├── export/              # Export engine (EPUB, LaTeX, PDF/A, DOCX, PDF, HTML)
│       ├── batch/               # Batch processing
│       ├── project/             # Project converter
│       ├── compatibility/       # Ecosystem compatibility
│       ├── frontmatter/         # Frontmatter parsing
│       ├── callouts/            # Callout blocks
│       ├── footnotes/           # Footnotes
│       ├── attachments/         # Document attachments
│       ├── profiles/            # Export profiles
│       ├── jobs/                # Job system
│       ├── themes/              # Theme system
│       ├── i18n/                # Internationalization (RTL, CJK, Unicode)
│       ├── revisions/           # Collaborative editing (revision tracking)
│       ├── analyzer/            # Document analyzer
│       ├── repository/          # Repo conversion
│       ├── docusaurus/          # Docusaurus adapter
│       ├── mkdocs_/             # MkDocs adapter
│       ├── sphinx/              # Sphinx adapter
│       ├── obsidian/            # Obsidian adapter
│       └── github/              # GitHub Features adapter
├── tests/                       # 1100+ tests
├── benchmarks/                  # Benchmark suite
├── examples/                    # Integration examples
├── .github/workflows/           # CI/CD
├── CHANGELOG.md
├── CONTRIBUTING.md
├── SECURITY.md
├── SUPPORT.md
├── ROADMAP.md
└── README.md

Roadmap

v2.1 ✅ (Released)

✅ EPUB output — Full EPUB 3.2 renderer implemented
✅ LaTeX output — Full LaTeX renderer implemented
✅ PDF/A — Archival PDF output (PDF/A-1b, PDF/A-2b)
✅ i18n — Internationalization with RTL/CJK/Unicode support
✅ Collaborative editing — Revision tracking, comments, annotations

v2.2

Web API — RESTful API server for document conversion
Documentation site — Full documentation website at pimd.ai
Plugin marketplace — Registry of community plugins
Presentation output (PPTX) — PowerPoint rendering

v3.0

Plugin marketplace — Package index for community plugins
Distributed builds — Remote build workers
Webhooks — Event-driven build pipelines
Collaborative editing UI — Track changes visualization in outputs

Long-term: PiMD aims to be the standard Python framework for programmatic document generation — reliable, extensible, and production-ready for any publishing workflow.

Contributing

PiMD is an open-source project and welcomes contributions of all kinds.

How to contribute

Report bugs — Open a GitHub issue with a minimal reproduction
Suggest features — Describe the use case and desired behavior
Submit pull requests — Code changes, documentation, tests
Write plugins — Build on the Extension SDK

Development setup

git clone https://github.com/devasishpal/PiMd.git
cd PiMd
pip install -e ".[dev,all]"

Running tests

python -m pytest tests/ -v

Code style

ruff check src/ tests/ benchmarks/

Pull request process

Fork the repository
Create a feature branch
Write tests for your changes
Run the full test suite
Run ruff lint
Submit a pull request with a clear description

See CONTRIBUTING.md for detailed guidelines, including plugin development documentation.

License

PiMD is released under the MIT License.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Release history

Version	Released
2.1.0 (this version)	Jun 5, 2026
2.0.0	Jun 4, 2026
1.1.0	Jun 4, 2026
1.0.0	Jun 4, 2026

