Markdown-driven MCP server to create, read and edit Word (.docx), Excel (.xlsx), PowerPoint (.pptx) and PDF documents — by the Touka project.
mcp-docgen
A Markdown-driven Model Context Protocol (MCP) server
to create, read and edit Word (.docx), Excel (.xlsx),
PowerPoint (.pptx) and PDF documents.
Built entirely on mature, permissively-licensed Python libraries
(python-docx,
python-pptx,
openpyxl,
XlsxWriter,
reportlab,
pypdf,
markdown-it-py) — no proprietary
dependencies. MIT licensed.
Part of the Touka project: giving AI agents the ability to produce, read and edit real Office documents using only open-source building blocks.
Why
LLMs are great at producing Markdown. mcp-docgen converts Markdown to polished Office
documents — and reads them back to Markdown — so an MCP-capable assistant (Claude Desktop,
Touka, …) can run a full read → edit → write loop on real .docx / .xlsx / .pptx
/ .pdf files.
Install & run
uvx mcp-docgen # once published to PyPI
# or, from a local checkout:
uv sync && uv run mcp-docgen
The server speaks MCP over stdio.
MCP client configuration
{
  "mcpServers": {
    "docgen": {
      "command": "uvx",
      "args": ["mcp-docgen"],
      "env": { "MCP_DOCGEN_OUTPUT_DIR": "/absolute/path/to/workdir" }
    }
  }
}
From a local checkout, swap the command for:
{ "command": "uv", "args": ["run", "--directory", "/path/to/mcp-docgen", "mcp-docgen"] }
Tools
Create (Markdown / structured data → file)
| Tool | Input → Output |
|---|---|
| create_docx(markdown, output_path, title?) | Markdown → Word |
| create_pptx(markdown, output_path, title?) | Markdown → PowerPoint |
| create_pdf(markdown, output_path, title?) | Markdown → PDF |
| create_xlsx(sheets, output_path) | structured rows → Excel |
Markdown features: headings, bold / italic / inline code, bullet & numbered lists
(nested), tables, block quotes, fenced code blocks, horizontal rules.
PowerPoint slide convention: # Heading starts a new slide (its title); content below
becomes bullet points; --- forces a slide break; title adds a leading title slide.
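The slide convention above can be sketched in a few lines of Python. This is an illustration only, not the server's actual parser: the `Slide` class and `split_slides` function are hypothetical names, and treating every non-heading line as one bullet (with `-` / `*` markers stripped) is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slide:
    title: str = ""
    bullets: list = field(default_factory=list)


def split_slides(markdown: str, title: Optional[str] = None) -> list:
    """Split Markdown into slides following the convention described above."""
    slides = []
    if title:
        slides.append(Slide(title=title))  # leading title slide
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            # A level-1 heading always opens a new slide and becomes its title.
            current = Slide(title=line[2:].strip())
            slides.append(current)
        elif line == "---":
            # A rule closes the current slide; the next content opens a new one.
            current = None
        else:
            if current is None:
                current = Slide()  # untitled slide after a forced break
                slides.append(current)
            # Assumption: strip a leading list marker so each line is one bullet.
            text = line[2:] if line[:2] in ("- ", "* ") else line
            current.bullets.append(text)
    return slides


deck = split_slides("# Intro\n- a\n- b\n---\nloose text\n# Next\n* c", title="Deck")
```

Here `deck` holds four slides: the title slide, "Intro" with two bullets, an untitled slide created by the `---` break, and "Next".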
Excel sheets: [{ "name": str, "rows": [[cell, …], …], "header"?: bool }]. Cells may
be strings / numbers / booleans / null; the first row is a bold, frozen header unless
"header": false.
Read (file → Markdown / structured data)
| Tool | Returns |
|---|---|
| read_docx(input_path) | { "markdown": … } |
| read_pptx(input_path) | { "markdown": … } |
| read_xlsx(input_path) | { "sheets": [{ "name", "rows" }] } (round-trips with create_xlsx) |
| read_pdf(input_path) | { "num_pages", "pages": […], "text" } |
Reading docx/pptx to Markdown enables editing without in-place tools: read → edit the
Markdown → create_* to regenerate.
Edit (in-place, preserving the rest)
| Tool | Effect |
|---|---|
| edit_xlsx(input_path, output_path, edits) | set cells / append rows / add sheets, keeping other sheets, formulas & formatting |
| append_docx(input_path, output_path, markdown) | append Markdown content to the end |
| append_pptx(input_path, output_path, markdown) | append Markdown-derived slides to the end |
edits = { "set_cells": [{"sheet","cell","value"}], "append_rows": [{"sheet","rows"}], "add_sheet": [{"name","rows"}] }.
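To make the `edits` shape concrete, here is a minimal sketch that applies it to the in-memory `sheets` structure returned by read_xlsx. It is not the server's implementation: `cell_to_index` and `apply_edits` are hypothetical helpers, the `cell` field is assumed to be an A1-style reference such as "B2", and the order in which the three edit kinds are applied is also an assumption.

```python
import re


def cell_to_index(cell: str) -> tuple:
    """Convert an A1-style reference such as "B3" to zero-based (row, column)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", cell)
    if match is None:
        raise ValueError(f"not an A1-style reference: {cell!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)  # base-26 column letters
    return int(digits) - 1, col - 1


def apply_edits(sheets: list, edits: dict) -> list:
    """Apply add_sheet, set_cells and append_rows to a list of {"name", "rows"} dicts."""
    by_name = {s["name"]: s for s in sheets}
    for new in edits.get("add_sheet", []):
        sheet = {"name": new["name"], "rows": [list(r) for r in new.get("rows", [])]}
        sheets.append(sheet)
        by_name[sheet["name"]] = sheet
    for edit in edits.get("set_cells", []):
        rows = by_name[edit["sheet"]]["rows"]
        r, c = cell_to_index(edit["cell"])
        while len(rows) <= r:
            rows.append([])  # grow the sheet downwards if needed
        rows[r].extend([None] * (c + 1 - len(rows[r])))  # pad the row with empty cells
        rows[r][c] = edit["value"]
    for edit in edits.get("append_rows", []):
        by_name[edit["sheet"]]["rows"].extend(list(r) for r in edit["rows"])
    return sheets


sheets = [{"name": "Sales", "rows": [["Region", "Units"], ["North", 120]]}]
edits = {
    "set_cells": [{"sheet": "Sales", "cell": "B2", "value": 130}],
    "append_rows": [{"sheet": "Sales", "rows": [["South", 95]]}],
    "add_sheet": [{"name": "Notes", "rows": [["ok"]]}],
}
apply_edits(sheets, edits)
```

Unlike this sketch, which only handles plain values, edit_xlsx is described as keeping other sheets, formulas and formatting intact.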
PDF page operations
| Tool | Effect |
|---|---|
| pdf_merge(input_paths, output_path) | concatenate PDFs in order |
| pdf_split(input_path, output_dir?) | one file per page |
| pdf_extract(input_path, pages, output_path) | extract a page subset (e.g. "1-3,5") |
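A page specification such as "1-3,5" could be expanded as in the sketch below. The `parse_pages` helper is hypothetical, and it assumes page numbers are 1-based with inclusive ranges; the actual rules used by pdf_extract may differ.

```python
def parse_pages(spec: str, num_pages: int) -> list:
    """Expand a spec like "1-3,5" into zero-based page indexes, in the order given."""
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start  # a single number is a one-page range
        if not 1 <= start <= end <= num_pages:
            raise ValueError(f"page range {part!r} is outside 1..{num_pages}")
        indexes.extend(range(start - 1, end))
    return indexes


selected = parse_pages("1-3,5", num_pages=6)
```

With pypdf, the resulting indexes could then select pages from `PdfReader(path).pages` and be copied into a `PdfWriter` with `add_page`.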
Note on PDF "editing": with open-source tools, clean PDF editing means page operations (merge / split / extract), not reflowing or replacing body text, because PDFs are not designed for in-place text editing. To revise PDF content, regenerate it with create_pdf.
Create/edit tools return {"path": <absolute path>}; pdf_split returns {"paths": […]}.
Directories & safety
- Output files are written inside MCP_DOCGEN_OUTPUT_DIR (default ./out).
- Input files (read / edit) are read from MCP_DOCGEN_INPUT_DIR (default = the output dir), so a read → edit → write loop shares one working directory.
- Every path is interpreted relative to its base; any path escaping it (via .. or an absolute path) is rejected. Missing inputs and wrong suffixes raise errors.
- The server makes no network calls and spawns no subprocesses.
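The path rules above resemble a common containment check, sketched here with `pathlib`. `resolve_inside` is a hypothetical helper, not the server's code; it assumes Python 3.9+ for `Path.is_relative_to` and leaves out the check for missing input files.

```python
from pathlib import Path


def resolve_inside(base: str, user_path: str, suffix: str) -> Path:
    """Resolve user_path under base, rejecting escapes and unexpected suffixes."""
    base_dir = Path(base).resolve()
    candidate = Path(user_path)
    if candidate.is_absolute():
        raise ValueError(f"absolute paths are not allowed: {user_path!r}")
    # resolve() collapses ".." segments, so an escape shows up as a different parent.
    resolved = (base_dir / candidate).resolve()
    if not resolved.is_relative_to(base_dir):
        raise ValueError(f"path escapes the base directory: {user_path!r}")
    if resolved.suffix.lower() != suffix:
        raise ValueError(f"expected a {suffix} file, got {resolved.name!r}")
    return resolved


target = resolve_inside("out", "reports/q1.docx", ".docx")
```

Resolving before comparing matters: a check on the raw string would miss inputs like `reports/../../secret.docx`.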
Examples
uv run python examples/generate_samples.py # create report.docx / review.pptx / sales.xlsx
uv run python examples/roundtrip_demo.py # create → read → edit → PDF round-trip
Development
uv sync
uv run pytest
uv run ruff check .
License
MIT © 2026 Touka Project — see LICENSE.
Powered by python-docx, python-pptx, openpyxl, XlsxWriter, reportlab and pypdf; Markdown parsing by markdown-it-py. All MIT/BSD licensed.