Convert AI-generated Markdown textbooks to polished DOCX with native math equations and syntax-highlighted code
markdocx
A Markdown-to-Word converter built for AI-generated textbooks
Convert Markdown files — complete with LaTeX math, syntax-highlighted code, tables, and images — into polished .docx documents in one command.
Why This Exists
Large language models (ChatGPT, Claude, Gemini, …) produce great Markdown, but the journey from .md to a well-formatted Word document is painful:
- LaTeX formulas become plain text or broken images
- Code blocks lose their highlighting
- Tables, lists, and blockquotes need manual reformatting
markdocx bridges that gap. Feed it a Markdown file that follows a few simple rules and get a publication-ready .docx — math rendered as native Word OMML equations, code with VS Code–style colors, diagrams rendered as images, and everything else properly formatted.
Features
| Category | What you get |
|---|---|
| Math | Inline ($...$) and display ($$...$$) LaTeX → native OMML equations in Word |
| Code | 30+ languages with Pygments syntax highlighting, VS Code light theme, language labels |
| Tables | Auto-formatted Table Grid — bold header row, left/center/right alignment, inline math in cells |
| Lists | Bullet (•◦▪) and numbered lists, up to 6 nesting levels |
| Matrix | Fenced `matrix` blocks → visual matrix diagrams with brackets, labels, and captions |
| Charts | Fenced `chart` blocks → bar, line, pie, and scatter charts via matplotlib |
| Graphs | Fenced `graph` blocks → network/graph diagrams with weighted edges via networkx |
| Workflows | Fenced `workflow` blocks → flowcharts with decision diamonds, process boxes, and arrows |
| Other | Blockquotes, horizontal rules, clickable hyperlinks, local images, footnotes |
Quick Start
Installation
# Using uv (recommended)
uv add markdocx
# Or using pip
pip install markdocx
Install from source (for development)
git clone https://github.com/shynerri-source/markdocx.git
cd markdocx
uv sync # or: pip install -e .
Usage
CLI
# Convert a single file
markdocx input.md
markdocx input.md -o output.docx
# Convert an entire directory
markdocx ./chapters/ -o ./output/
# Recursively search subdirectories
markdocx ./chapters/ -o ./output/ -r
# Verbose logging
markdocx input.md -v
Python API
from markdocx import convert_file, convert_directory
# Single file
convert_file("input.md", "output.docx")
# Entire directory
results = convert_directory("./chapters/", output_dir="./output/", recursive=True)
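Directory conversion implies two small steps: discovering the `.md` files (optionally in subdirectories) and choosing an output path for each. The sketch below illustrates those steps with the standard library only. The helper names `find_markdown_files` and `output_path_for` are hypothetical, and mirroring the input tree under the output directory is an assumption, not a documented markdocx behavior.

```python
from pathlib import Path


def find_markdown_files(root, recursive=False):
    """Return .md files under root, sorted for a stable processing order."""
    root = Path(root)
    # "**/*.md" also matches files directly inside root.
    pattern = "**/*.md" if recursive else "*.md"
    return sorted(p for p in root.glob(pattern) if p.is_file())


def output_path_for(md_path, input_root, output_dir):
    """Mirror md_path's position under input_root into output_dir as .docx.

    Assumption: the output tree mirrors the input tree.
    """
    relative = Path(md_path).relative_to(input_root)
    return Path(output_dir) / relative.with_suffix(".docx")
```

With these helpers, a batch loop could call `convert_file(str(src), str(output_path_for(src, root, out)))` for each discovered file, creating parent directories first.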
CLI Options
| Flag | Description |
|---|---|
| `input` | Markdown file or directory to convert |
| `-o`, `--output` | Output file or directory path |
| `-r`, `--recursive` | Recursively find `.md` files in subdirectories |
| `-v`, `--verbose` | Show detailed processing logs |
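For readers who want to wrap or script the command, the flags in the table can be modeled with `argparse` as shown below. This is a sketch of an equivalent interface, not the contents of markdocx's `cli.py`; the `build_parser` name and the `None` default for `--output` are assumptions.

```python
import argparse


def build_parser():
    """Declare a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="markdocx",
        description="Convert Markdown files to DOCX",
    )
    parser.add_argument("input", help="Markdown file or directory to convert")
    parser.add_argument("-o", "--output", default=None,
                        help="Output file or directory path")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="Recursively find .md files in subdirectories")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show detailed processing logs")
    return parser


args = build_parser().parse_args(["./chapters/", "-o", "./output/", "-r"])
print(args.input, args.output, args.recursive, args.verbose)
# prints: ./chapters/ ./output/ True False
```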
How It Works
Markdown file
│
▼
md_parser.py ─── markdown-it-py tokenizer
│
▼
docx_builder.py ─── walks the token stream, builds Word elements
├── math_renderer.py ─── LaTeX → MathML → OMML (native Word equations)
├── code_renderer.py ─── Pygments lexer → colored Word runs
├── diagram_renderer.py ─── matrix / chart / graph / workflow → PNG images
└── styles.py ─── fonts, colors, spacing presets
│
▼
.docx file ─── python-docx output
Math Pipeline
LaTeX is converted to native OMML (Office Math Markup Language), not images. This means formulas stay editable, scale with the surrounding font size, and look like they were typed in Word's equation editor.
LaTeX string → latex2mathml → MathML → XSLT → OMML → Word paragraph
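To make the end of that chain concrete, here is a hand-built OMML fragment for a simple fraction, using only the standard library. markdocx reaches OMML through latex2mathml and an XSLT transform; this sketch skips both and only shows what the target markup looks like. The helper names `_run` and `omml_fraction` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by Office Math Markup Language elements.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M_NS)


def _run(parent, text):
    """Append <m:r><m:t>text</m:t></m:r> (a math text run) to parent."""
    r = ET.SubElement(parent, f"{{{M_NS}}}r")
    t = ET.SubElement(r, f"{{{M_NS}}}t")
    t.text = text
    return r


def omml_fraction(numerator, denominator):
    """Build <m:oMath> containing one <m:f> fraction with text operands."""
    o_math = ET.Element(f"{{{M_NS}}}oMath")
    frac = ET.SubElement(o_math, f"{{{M_NS}}}f")
    _run(ET.SubElement(frac, f"{{{M_NS}}}num"), numerator)
    _run(ET.SubElement(frac, f"{{{M_NS}}}den"), denominator)
    return o_math


print(ET.tostring(omml_fraction("a", "b"), encoding="unicode"))
```

Because the equation is structured XML rather than a bitmap, Word can open it in the equation editor and re-lay it out at any size.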
Code Pipeline
Source code → Pygments lexer + VS Code theme → colored Word runs inside a shaded table cell
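The idea of turning source code into a sequence of colored runs can be sketched without any third-party packages. The example below swaps Pygments for the standard-library `tokenize` module, so it handles Python source only, and it drops whitespace between tokens for brevity. The palette and the names `classify` and `colored_runs` are hypothetical and do not reflect markdocx's actual theme or code.

```python
import io
import keyword
import tokenize

# Hypothetical hex colors, loosely in the spirit of a light editor theme.
COLORS = {
    "keyword": "0000FF",
    "string": "A31515",
    "comment": "008000",
    "number": "098658",
    "default": "000000",
}

# Layout-only tokens that carry no visible text of their own.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def classify(tok):
    """Map a tokenize token to one of the palette keys."""
    if tok.type == tokenize.COMMENT:
        return "comment"
    if tok.type == tokenize.STRING:
        return "string"
    if tok.type == tokenize.NUMBER:
        return "number"
    if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
        return "keyword"
    return "default"


def colored_runs(source):
    """Return a list of (text, hex_color) pairs, one per visible token."""
    runs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        runs.append((tok.string, COLORS[classify(tok)]))
    return runs


print(colored_runs("x = 1  # note\n"))
# prints: [('x', '000000'), ('=', '000000'), ('1', '098658'), ('# note', '008000')]
```

In a real renderer each pair would become one Word run with its font color set, placed inside the shaded table cell.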
Diagram Pipeline
Fenced code blocks with special language identifiers (matrix, chart, graph, workflow) are rendered as PNG images via matplotlib/networkx and embedded in the document.
```matrix
name: A
1 2 3
4 5 6
7 8 9
caption: Matrix A (3×3)
```
```chart
type: bar
title: Algorithm Performance
labels: Bubble Sort, Merge Sort, Quick Sort
Time (ms): 450, 38, 35
caption: Figure 1: Sorting comparison
```
```graph
directed: true
title: Shortest Path
A -> B: 5
B -> C: 3
A -> C: 7
caption: Figure 2: Weighted directed graph
```
```workflow
title: Login Process
[Start]
<User Input>
(Validate Credentials)
{Valid?}
(Grant Access)
[End]
caption: Figure 3: Authentication workflow
```
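The simple text form of the `graph` block above is regular enough to parse line by line. The sketch below reads the `directed`, `title`, and `caption` keys and the `A -> B: 5` edges into a plain dictionary. It is an illustration of the notation, not markdocx's parser: the function name `parse_graph_block`, the returned structure, and the handling of only `->` edges are assumptions.

```python
def parse_graph_block(text):
    """Parse the simple edge-list notation into a dict (illustrative only)."""
    spec = {"directed": False, "title": None, "caption": None, "edges": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so captions may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key.lower() in ("title", "caption"):
            spec[key.lower()] = value
        elif key.lower() == "directed":
            spec["directed"] = value.lower() == "true"
        elif "->" in key:
            source, target = (part.strip() for part in key.split("->", 1))
            weight = float(value) if value else None  # weight is optional here
            spec["edges"].append((source, target, weight))
    return spec


example = """\
directed: true
title: Shortest Path
A -> B: 5
B -> C: 3
A -> C: 7
caption: Figure 2: Weighted directed graph
"""
print(parse_graph_block(example)["edges"])
# prints: [('A', 'B', 5.0), ('B', 'C', 3.0), ('A', 'C', 7.0)]
```

A structure like this maps directly onto `networkx.DiGraph.add_edge(source, target, weight=weight)` for layout and drawing.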
| Block type | Formats | Rendered via |
|---|---|---|
| `matrix` | Simple text or JSON | matplotlib |
| `chart` | Simple key-value or JSON (bar, line, pie, scatter) | matplotlib |
| `graph` | Edge list (`A -> B: 5`) or JSON, directed or undirected | matplotlib + networkx |
| `workflow` | Step notation (`[Start]`, `(Process)`, `{Decision}`, `<I/O>`) or JSON | matplotlib |
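The workflow step notation encodes the node shape in the bracket pair, so a classifier only needs to look at the first and last character of each line. The sketch below shows one way to do that. The shape labels (`terminal`, `process`, `decision`, `io`) and the function names are hypothetical choices for illustration, not identifiers from markdocx.

```python
# Bracket pair -> hypothetical shape label.
SHAPES = {
    ("[", "]"): "terminal",
    ("(", ")"): "process",
    ("{", "}"): "decision",
    ("<", ">"): "io",
}


def classify_step(line):
    """Return (shape, label) for a step line, or None for any other line."""
    line = line.strip()
    if len(line) < 2:
        return None
    shape = SHAPES.get((line[0], line[-1]))
    if shape is None:
        return None
    return shape, line[1:-1].strip()


def parse_steps(text):
    """Collect the step lines of a workflow block in order."""
    steps = []
    for raw in text.splitlines():
        step = classify_step(raw)
        if step is not None:
            steps.append(step)
    return steps


block = """\
title: Login Process
[Start]
<User Input>
(Validate Credentials)
{Valid?}
(Grant Access)
[End]
caption: Figure 3: Authentication workflow
"""
print(parse_steps(block))
```

Lines such as `title:` and `caption:` fall through as `None`, so metadata and steps can share one block without extra delimiters.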
Project Structure
markdocx/
├── main.py # Convenience entry point
├── pyproject.toml # Project metadata & dependencies
├── src/
│ └── markdocx/ # Installable package
│ ├── __init__.py # Public API (convert_file, convert_directory)
│ ├── cli.py # CLI entry point (markdocx command)
│ ├── core.py # Top-level orchestrator
│ ├── md_parser.py # Markdown → token stream
│ ├── math_renderer.py # LaTeX → OMML (native Word math)
│ ├── code_renderer.py # Code → syntax-highlighted Word runs
│ ├── diagram_renderer.py # Matrix / Chart / Graph / Workflow → PNG
│ ├── docx_builder.py # Token stream → DOCX elements
│ └── styles.py # Fonts, colors, and layout presets
└── rule/
├── ai_gen_doc_rule.md # AI writing rules (Vietnamese)
└── ai_gen_doc_rule_en.md # AI writing rules (English)
Dependencies
| Package | Version | Role |
|---|---|---|
| python-docx | 1.2.0 | DOCX generation |
| markdown-it-py | 4.0.0 | Markdown parsing |
| mdit-py-plugins | 0.5.0 | Math & footnote plugins |
| latex2mathml | 3.78.1 | LaTeX → MathML conversion |
| lxml | 6.0.2 | XML/XSLT processing |
| Pygments | 2.19.2 | Syntax highlighting |
| matplotlib | 3.10.8 | Chart, matrix, workflow rendering |
| Pillow | 12.1.0 | Image processing |
| networkx | 3.4+ | Graph/network diagram layouts |
AI Writing Rules
The rule/ directory contains detailed guidelines for prompting AI models to produce Markdown that converts cleanly:
| File | Language | Description |
|---|---|---|
| `rule/ai_gen_doc_rule.md` | Vietnamese | Full rule set: heading structure, LaTeX constraints, code block format, tables, etc. |
| `rule/ai_gen_doc_rule_en.md` | English | Same rules, English version |
How to use: Paste the contents of the appropriate rule file into your AI system prompt (or at the start of the conversation) before asking it to write textbook content.
Contributing
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License — see the LICENSE file for details.