Convert AI-generated Markdown textbooks to polished DOCX with native math equations and syntax-highlighted code
MD to DOCX
A Markdown-to-Word converter built for AI-generated textbooks
Convert Markdown files — complete with LaTeX math, syntax-highlighted code, tables, and images — into polished .docx documents in one command.
Why This Exists
Large language models (ChatGPT, Claude, Gemini, …) produce great Markdown, but the journey from .md to a well-formatted Word document is painful:
- LaTeX formulas become plain text or broken images
- Code blocks lose their highlighting
- Tables, lists, and blockquotes need manual reformatting
MD to DOCX bridges that gap. Feed it a Markdown file that follows a few simple rules and get a publication-ready .docx — math rendered as native Word OMML equations, code with VS Code–style colors, and everything else properly formatted.
Features
| Category | What you get |
|---|---|
| Math | Inline ($...$) and display ($$...$$) LaTeX → native OMML equations in Word |
| Code | 30+ languages with Pygments syntax highlighting, VS Code light theme, language labels |
| Tables | Auto-formatted Table Grid — bold header row, left/center/right alignment, inline math in cells |
| Lists | Bullet (•◦▪) and numbered lists, up to 6 nesting levels |
| Other | Blockquotes, horizontal rules, clickable hyperlinks, local images, footnotes |
Quick Start
Installation
git clone https://github.com/shynerri-source/markdocx.git
cd markdocx
# Using uv (recommended)
uv sync
# Or using pip
pip install -r requirements.txt
Usage
# Convert a single file
python main.py input.md
python main.py input.md -o output.docx
# Convert an entire directory
python main.py ./chapters/ -o ./output/
# Recursively search subdirectories
python main.py ./chapters/ -o ./output/ -r
# Verbose logging
python main.py input.md -v
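The file and directory modes above differ mainly in how input paths are mapped to output paths. The following is a minimal sketch of that mapping, not the project's actual code: the function name `plan_conversions`, the default of writing next to the source file, and the choice to mirror the input directory layout under the output directory are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def plan_conversions(
    source: Path, output: Optional[Path] = None, recursive: bool = False
) -> list[tuple[Path, Path]]:
    """Return (markdown_path, docx_path) pairs for a file or directory input.

    Hypothetical helper: the real tool may name or place outputs differently.
    """
    if source.is_file():
        # Single file: assume the default output is the same name with .docx.
        return [(source, output or source.with_suffix(".docx"))]

    # Directory: "**/*.md" descends into subdirectories, "*.md" does not.
    pattern = "**/*.md" if recursive else "*.md"
    out_dir = output or source
    pairs = []
    for md in sorted(source.glob(pattern)):
        # Assumed behavior: mirror the input layout under the output directory.
        relative = md.relative_to(source).with_suffix(".docx")
        pairs.append((md, out_dir / relative))
    return pairs
```

Under these assumptions, `plan_conversions(Path("./chapters"), Path("./output"), recursive=True)` would pair `chapters/part1/intro.md` with `output/part1/intro.docx`.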
CLI Options
| Flag | Description |
|---|---|
| input | Markdown file or directory to convert |
| -o, --output | Output file or directory path |
| -r, --recursive | Recursively find .md files in subdirectories |
| -v, --verbose | Show detailed processing logs |
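For readers who want to script around the tool, the options above could be declared with the standard library's `argparse` roughly as follows. This is a sketch only: the README does not say how main.py parses its arguments, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Convert Markdown files to DOCX.",
    )
    parser.add_argument("input", help="Markdown file or directory to convert")
    parser.add_argument("-o", "--output", help="Output file or directory path")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="Recursively find .md files in subdirectories")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show detailed processing logs")
    return parser


args = build_parser().parse_args(["./chapters/", "-o", "./output/", "-r"])
print(args.input, args.output, args.recursive, args.verbose)
# prints: ./chapters/ ./output/ True False
```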
How It Works
Markdown file
│
▼
md_parser.py ─── markdown-it-py tokenizer
│
▼
docx_builder.py ─── walks the token stream, builds Word elements
├── math_renderer.py ─── LaTeX → MathML → OMML (native Word equations)
├── code_renderer.py ─── Pygments lexer → colored Word runs
└── styles.py ─── fonts, colors, spacing presets
│
▼
.docx file ─── python-docx output
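The "walk the token stream" step in the diagram can be pictured as a dispatch loop over a flat list of tokens. The sketch below uses a stand-in `Token` dataclass so that it runs without any dependencies; the field names mirror markdown-it-py's token attributes, the `math_block` type name is assumed to match what the math plugin emits, and the `walk` function and its string "actions" are hypothetical placeholders for the real Word-building calls.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Stand-in for a parser token; field names mirror markdown-it-py's Token."""
    type: str
    tag: str = ""
    content: str = ""
    info: str = ""


def walk(tokens: list[Token]) -> list[str]:
    """Turn a flat token stream into a list of build actions (illustrative)."""
    actions = []
    heading_level = None
    for tok in tokens:
        if tok.type == "heading_open":
            heading_level = int(tok.tag[1])  # "h2" -> 2
        elif tok.type == "heading_close":
            heading_level = None
        elif tok.type == "inline":
            # Inline content belongs to whichever block is currently open.
            kind = f"heading{heading_level}" if heading_level else "paragraph"
            actions.append(f"{kind}: {tok.content}")
        elif tok.type == "fence":
            # Fenced code would be handed to the code renderer, keyed by language.
            lines = len(tok.content.splitlines())
            actions.append(f"code[{tok.info or 'text'}]: {lines} lines")
        elif tok.type == "math_block":
            # Display math would be handed to the math renderer.
            actions.append(f"math: {tok.content.strip()}")
    return actions


demo = [
    Token("heading_open", "h1"),
    Token("inline", content="Limits"),
    Token("heading_close", "h1"),
    Token("fence", content="print(1)\n", info="python"),
]
for action in walk(demo):
    print(action)
```

Keeping the walker free of rendering details is one plausible reason for the separate math_renderer.py and code_renderer.py modules: each token type maps to exactly one renderer.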
Math Pipeline
LaTeX is converted to native OMML (Office Math Markup Language), not images. Formulas therefore remain editable in Word, scale without pixelation, and look as if they were typed in Word's equation editor.
LaTeX string → latex2mathml → MathML → XSLT → OMML → Word paragraph
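To make the end of this pipeline concrete, the sketch below hand-builds the OMML that a simple fraction such as `\frac{a}{b}` corresponds to, using only the standard library. It illustrates the target format, not the project's converter, which per the pipeline above arrives at OMML through MathML and XSLT. The helper names `m`, `run`, and `fraction` are hypothetical; the namespace URI and element names (`oMath`, `f`, `num`, `den`, `r`, `t`) come from the Office Open XML math schema.

```python
import xml.etree.ElementTree as ET

# Office Open XML math namespace, conventionally bound to the "m" prefix.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the OMML namespace."""
    return f"{{{M_NS}}}{tag}"


def run(text: str) -> ET.Element:
    """Build a math run (<m:r>) holding one text node (<m:t>)."""
    r = ET.Element(m("r"))
    t = ET.SubElement(r, m("t"))
    t.text = text
    return r


def fraction(num: str, den: str) -> ET.Element:
    """Build <m:oMath> containing a single fraction element (<m:f>)."""
    omath = ET.Element(m("oMath"))
    f = ET.SubElement(omath, m("f"))
    ET.SubElement(f, m("num")).append(run(num))
    ET.SubElement(f, m("den")).append(run(den))
    return omath


print(ET.tostring(fraction("a", "b"), encoding="unicode"))
```

Because the result is structured XML rather than a picture, Word can treat the numerator and denominator as separately editable parts of an equation.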
Code Pipeline
Source code → Pygments lexer + VS Code theme → colored Word runs inside a shaded table cell
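The core idea of this pipeline is turning a token stream into (text, color) pairs, each of which can become one colored run in Word. The sketch below shows that idea with a swapped-in technique: it uses Python's standard `tokenize` module instead of a Pygments lexer, so it only handles Python source and runs without dependencies. The `COLORS` values are assumed hex codes resembling a light editor theme, and `classify` and `colored_runs` are hypothetical names. Whitespace is dropped here for brevity; a real renderer would need to preserve it.

```python
import io
import keyword
import tokenize

# Assumed hex colors resembling a light editor theme (illustrative values).
COLORS = {
    "keyword": "0000FF",
    "string": "A31515",
    "comment": "008000",
    "number": "098658",
    "default": "000000",
}


def classify(tok: tokenize.TokenInfo) -> str:
    """Map a stdlib token to a coarse highlighting category."""
    if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
        return "keyword"
    if tok.type == tokenize.STRING:
        return "string"
    if tok.type == tokenize.COMMENT:
        return "comment"
    if tok.type == tokenize.NUMBER:
        return "number"
    return "default"


def colored_runs(source: str) -> list[tuple[str, str]]:
    """Return (text, hex_color) pairs, skipping whitespace-only tokens."""
    runs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.string.strip():  # drops NEWLINE, INDENT, DEDENT, ENDMARKER
            runs.append((tok.string, COLORS[classify(tok)]))
    return runs


print(colored_runs("x = 42  # answer\n"))
# prints: [('x', '000000'), ('=', '000000'), ('42', '098658'), ('# answer', '008000')]
```

With Pygments, the token stream would come from a language-specific lexer and the color lookup from a style, which is what makes support for many languages practical.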
Project Structure
md_to_docx/
├── main.py # CLI entry point
├── pyproject.toml # Project metadata & dependencies
├── requirements.txt # Pip-compatible dependency list
├── converter/
│ ├── __init__.py
│ ├── core.py # Top-level orchestrator
│ ├── md_parser.py # Markdown → token stream
│ ├── math_renderer.py # LaTeX → OMML (native Word math)
│ ├── code_renderer.py # Code → syntax-highlighted Word runs
│ ├── docx_builder.py # Token stream → DOCX elements
│ └── styles.py # Fonts, colors, and layout presets
└── rule/
    ├── ai_gen_doc_rule.md # AI writing rules (Vietnamese)
    └── ai_gen_doc_rule_en.md # AI writing rules (English)
Dependencies
| Package | Version | Role |
|---|---|---|
| python-docx | 1.2.0 | DOCX generation |
| markdown-it-py | 4.0.0 | Markdown parsing |
| mdit-py-plugins | 0.5.0 | Math & footnote plugins |
| latex2mathml | 3.78.1 | LaTeX → MathML conversion |
| lxml | 6.0.2 | XML/XSLT processing |
| Pygments | 2.19.2 | Syntax highlighting |
| matplotlib | 3.10.8 | LaTeX rendering (fallback) |
| Pillow | 12.1.0 | Image processing |
AI Writing Rules
The rule/ directory contains detailed guidelines for prompting AI models to produce Markdown that converts cleanly:
| File | Language | Description |
|---|---|---|
| rule/ai_gen_doc_rule.md | Vietnamese | Full rule set: heading structure, LaTeX constraints, code block format, tables, etc. |
| rule/ai_gen_doc_rule_en.md | English | Same rules, English version |
How to use: Paste the contents of the appropriate rule file into your AI system prompt (or at the start of the conversation) before asking it to write textbook content.
Contributing
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License — see the LICENSE file for details.