mcp-docgen

A Markdown-driven Model Context Protocol (MCP) server to create, read and edit Word (.docx), Excel (.xlsx), PowerPoint (.pptx) and PDF documents.

Built entirely on mature, permissively-licensed Python libraries (python-docx, python-pptx, openpyxl, XlsxWriter, reportlab, pypdf, markdown-it-py) — no proprietary dependencies. MIT licensed.

Part of the Touka project: giving AI agents the ability to produce, read and edit real Office documents using only open-source building blocks.

Why

LLMs are great at producing Markdown. mcp-docgen converts Markdown to polished Office documents — and reads them back to Markdown — so an MCP-capable assistant (Claude Desktop, Touka, …) can run a full read → edit → write loop on real .docx / .xlsx / .pptx / .pdf files.

Install & run

uvx mcp-docgen          # once published to PyPI
# or, from a local checkout:
uv sync && uv run mcp-docgen

The server speaks MCP over stdio.
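
As a rough illustration of what "MCP over stdio" means on the wire, the sketch below builds one JSON-RPC 2.0 `tools/call` request as a single line of JSON, which is how MCP frames messages on stdin/stdout. The helper name `tools_call_request` is hypothetical, and the sketch assumes a client has already completed the MCP `initialize` handshake; in practice an MCP client library handles all of this.

```python
import json


def tools_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request as one line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the result stays on one line.
    return json.dumps(message)


line = tools_call_request(
    1,
    "create_docx",
    {"markdown": "# Report\n\nHello.", "output_path": "report.docx"},
)
print(line)
```

A client would write this line, followed by a newline, to the server's stdin and read the matching response from its stdout.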

MCP client configuration

{
  "mcpServers": {
    "docgen": {
      "command": "uvx",
      "args": ["mcp-docgen"],
      "env": { "MCP_DOCGEN_OUTPUT_DIR": "/absolute/path/to/workdir" }
    }
  }
}

From a local checkout, swap the command for:

{ "command": "uv", "args": ["run", "--directory", "/path/to/mcp-docgen", "mcp-docgen"] }
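
If you prefer to generate the client configuration rather than write it by hand, a minimal sketch could look like the following. The helper `docgen_entry` is hypothetical, and passing `MCP_DOCGEN_OUTPUT_DIR` in the local-checkout variant is an assumption of this sketch; where the resulting JSON has to be saved depends on your MCP client.

```python
import json
from pathlib import Path


def docgen_entry(output_dir, checkout=None):
    """Return one 'mcpServers' entry; paths are made absolute."""
    out = str(Path(output_dir).expanduser().resolve())
    if checkout is None:
        # Published package, run through uvx.
        command, args = "uvx", ["mcp-docgen"]
    else:
        # Local checkout, run through uv from the given directory.
        repo = str(Path(checkout).expanduser().resolve())
        command, args = "uv", ["run", "--directory", repo, "mcp-docgen"]
    return {"command": command, "args": args, "env": {"MCP_DOCGEN_OUTPUT_DIR": out}}


config = {"mcpServers": {"docgen": docgen_entry("./workdir")}}
print(json.dumps(config, indent=2))
```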

Tools

Create (Markdown / structured data → file)

Tool                                         Input → Output
create_docx(markdown, output_path, title?)   Markdown → Word
create_pptx(markdown, output_path, title?)   Markdown → PowerPoint
create_pdf(markdown, output_path, title?)    Markdown → PDF
create_xlsx(sheets, output_path)             structured rows → Excel

Markdown features: headings, bold / italic / inline code, bullet & numbered lists (nested), tables, block quotes, fenced code blocks, horizontal rules.

PowerPoint slide convention: # Heading starts a new slide (its title); content below becomes bullet points; --- forces a slide break; title adds a leading title slide.
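
To make the slide convention concrete, here is a simplified sketch of how Markdown could be grouped into slides under those rules. It is not the server's implementation: `split_slides` is a hypothetical helper, it only looks at level-1 headings and plain lines, and treating the slide after `---` as untitled is an assumption.

```python
import re

# Leading list markers such as "- ", "* ", "+ " or "1. ".
LIST_MARKER = re.compile(r"^(?:[-*+]|\d+\.)\s+")


def split_slides(markdown, title=None):
    """Group Markdown lines into slides: a list of {'title', 'bullets'} dicts."""
    slides = []
    if title:
        # The optional title becomes a leading title slide.
        slides.append({"title": title, "bullets": []})
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            # A level-1 heading starts a new slide and provides its title.
            current = {"title": line[2:].strip(), "bullets": []}
            slides.append(current)
        elif line == "---":
            # Forced break: the next content line opens a fresh slide.
            current = None
        else:
            if current is None:
                current = {"title": "", "bullets": []}
                slides.append(current)
            current["bullets"].append(LIST_MARKER.sub("", line))
    return slides


deck = split_slides("# Intro\n- one\n- two\n---\n- three\n# End\nbye", title="Deck")
for slide in deck:
    print(slide)
```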

Excel sheets: [{ "name": str, "rows": [[cell, …], …], "header"?: bool }]. Cells may be strings / numbers / booleans / null; the first row is a bold, frozen header unless "header": false.
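
A sheets payload in that shape can be checked on the client side before calling create_xlsx. The sketch below is illustrative only: `validate_sheets` is a hypothetical helper, and the server's own validation may be stricter or more lenient.

```python
ALLOWED_CELL_TYPES = (str, int, float, bool, type(None))


def validate_sheets(sheets):
    """Raise ValueError if the payload does not match the documented shape."""
    for sheet in sheets:
        if not isinstance(sheet.get("name"), str):
            raise ValueError("each sheet needs a string 'name'")
        if not isinstance(sheet.get("header", True), bool):
            raise ValueError("'header' must be a boolean when present")
        for row in sheet.get("rows", []):
            for cell in row:
                if not isinstance(cell, ALLOWED_CELL_TYPES):
                    raise ValueError(f"unsupported cell type: {type(cell).__name__}")
    return sheets


sheets = validate_sheets([
    # First row is treated as the header by default.
    {"name": "Sales", "rows": [["Region", "Units", "Closed"], ["North", 120, True]]},
    # "header": False keeps the first row as ordinary data.
    {"name": "Raw", "rows": [[1, 2.5, None]], "header": False},
])
print(len(sheets), "sheets ready for create_xlsx")
```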

Read (file → Markdown / structured data)

Tool                    Returns
read_docx(input_path)   { "markdown": … }
read_pptx(input_path)   { "markdown": … }
read_xlsx(input_path)   { "sheets": [{ "name", "rows" }] } (round-trips with create_xlsx)
read_pdf(input_path)    { "num_pages", "pages": […], "text" }

Reading docx/pptx to Markdown enables editing without in-place tools: read → edit the Markdown → create_* to regenerate.

Edit (in-place, preserving the rest)

Tool                                             Effect
edit_xlsx(input_path, output_path, edits)        set cells / append rows / add sheets, keeping other sheets, formulas & formatting
append_docx(input_path, output_path, markdown)   append Markdown content to the end
append_pptx(input_path, output_path, markdown)   append Markdown-derived slides to the end

edits = { "set_cells": [{"sheet","cell","value"}], "append_rows": [{"sheet","rows"}], "add_sheet": [{"name","rows"}] }.
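
For illustration, an edits object in that shape might be assembled as below. `make_edits` is a hypothetical helper, the sheet names and values are made up, and the A1-style cell reference ("B2") is an assumption about what "cell" accepts.

```python
import json


def make_edits(set_cells=(), append_rows=(), add_sheet=()):
    """Assemble an 'edits' object, checking each entry has the documented keys."""
    required = {
        "set_cells": {"sheet", "cell", "value"},
        "append_rows": {"sheet", "rows"},
        "add_sheet": {"name", "rows"},
    }
    edits = {
        "set_cells": list(set_cells),
        "append_rows": list(append_rows),
        "add_sheet": list(add_sheet),
    }
    for section, entries in edits.items():
        for entry in entries:
            missing = required[section] - set(entry)
            if missing:
                raise ValueError(f"{section} entry is missing {sorted(missing)}")
    return edits


edits = make_edits(
    set_cells=[{"sheet": "Sales", "cell": "B2", "value": 1250}],
    append_rows=[{"sheet": "Sales", "rows": [["Q3", 980]]}],
    add_sheet=[{"name": "Notes", "rows": [["Key", "Value"], ["owner", "finance"]]}],
)
print(json.dumps(edits, indent=2))
```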

PDF page operations

Tool                                          Effect
pdf_merge(input_paths, output_path)           concatenate PDFs in order
pdf_split(input_path, output_dir?)            one file per page
pdf_extract(input_path, pages, output_path)   extract a page subset (e.g. "1-3,5")
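
A page subset such as "1-3,5" can be expanded into individual page numbers along these lines. This is a sketch, not the server's parser: `parse_pages` is a hypothetical helper, and it assumes pages are numbered from 1 and that ranges are inclusive.

```python
def parse_pages(spec):
    """Expand a spec like '1-3,5' into a list of 1-based page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "1-3" covers pages 1, 2 and 3.
            start, end = (int(piece) for piece in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return pages


print(parse_pages("1-3,5"))  # [1, 2, 3, 5]
```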

Note on PDF "editing": clean open-source PDF editing means page operations (merge / split / extract), not reflowing or replacing body text — PDFs are not designed for in-place text editing. To revise PDF content, regenerate with create_pdf.

Create/edit tools return {"path": <absolute path>}; pdf_split returns {"paths": […]}.

Directories & safety

  • Output files are written inside MCP_DOCGEN_OUTPUT_DIR (default ./out).
  • Input files (read / edit) are read from MCP_DOCGEN_INPUT_DIR (default = the output dir), so a read → edit → write loop shares one working directory.
  • Every path is interpreted relative to its base directory. Any path that escapes it (via .. or an absolute path) is rejected, and missing inputs or wrong file suffixes raise errors.
  • The server makes no network calls and spawns no subprocesses.
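
The path rules above can be approximated with pathlib. The following is a minimal sketch of that kind of containment check under stated assumptions, not the server's actual code: `resolve_inside` is a hypothetical helper, and it requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside(base, user_path, suffix):
    """Resolve user_path under base, rejecting escapes and wrong suffixes."""
    base = Path(base).resolve()
    candidate = Path(user_path)
    if candidate.is_absolute():
        raise ValueError("absolute paths are rejected")
    # resolve() collapses ".." segments before the containment check.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):
        raise ValueError("path escapes the base directory")
    if resolved.suffix.lower() != suffix:
        raise ValueError(f"expected a {suffix} file")
    return resolved


print(resolve_inside("out", "reports/q1.docx", ".docx"))
```

A check for missing input files (for example `resolved.is_file()`) would sit on top of this for the read and edit tools.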

Examples

uv run python examples/generate_samples.py   # create report.docx / review.pptx / sales.xlsx
uv run python examples/roundtrip_demo.py      # create → read → edit → PDF round-trip

Development

uv sync
uv run pytest
uv run ruff check .

License

MIT © 2026 Touka Project — see LICENSE.

Powered by python-docx, python-pptx, openpyxl, XlsxWriter, reportlab and pypdf; Markdown parsing by markdown-it-py. All MIT/BSD licensed.
