MCP server for Microsoft Office document processing. Named for Milton Waddams, who was relocated to the basement with boxes of legacy documents.

📎 mcwaddams

MCP server for Microsoft Office document processing

Python 3.11+ • FastMCP • License: MIT • MCP Protocol

"I was told there would be document extraction."

Installation • Tools • Examples • Testing


The Backstory

Milton Waddams was relocated to the basement. They took his stapler. But down there, surrounded by boxes of .doc files from 1997 and .xls spreadsheets that predate Unicode, he became something else entirely: a document processing expert.

This MCP server channels that energy. It handles the legacy formats nobody else wants to touch. It extracts text from files that should have been migrated to Google Docs a decade ago. It reads the TPS reports.


✨ Features

  • Universal extraction: Pull text, images, and metadata from any Office format
  • Format-specific tools: Deep analysis for Word (tables, structure), Excel (formulas, charts), PowerPoint
  • Automatic pagination: Large documents get chunked so they don't blow up your context window
  • Fallback processing: When one library chokes on a weird file, we try another
  • URL support: Pass a URL instead of a file path; we'll download and cache it
  • Legacy formats: Yes, even those .doc and .xls files from the basement

🚀 Installation

# Quick install with uvx (recommended)
uvx mcwaddams

# Or install with uv/pip
uv add mcwaddams
pip install mcwaddams

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "mcwaddams": {
      "command": "uvx",
      "args": ["mcwaddams"]
    }
  }
}

Claude Code Configuration

claude mcp add mcwaddams "uvx mcwaddams"

🛠 Available Tools

Universal Tools

Work with all supported formats: Word, Excel, PowerPoint, and CSV.

| Tool | Description |
|------|-------------|
| extract_text | Extract text with optional formatting preservation |
| extract_images | Extract embedded images with size filtering |
| extract_metadata | Get document properties (author, dates, statistics) |
| detect_office_format | Identify format, version, encryption status |
| analyze_document_health | Check integrity, corruption, password protection |
| get_supported_formats | List all supported file extensions |
| index_document | Scan document and create resource URIs for on-demand fetching |
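
To give a feel for what format detection involves, here is a minimal sketch that sniffs a file's container type from its leading bytes. It is not the implementation behind detect_office_format; the function name sniff_office_format and its return labels are made up for illustration. The two signatures themselves are standard: legacy Office files use the OLE2 compound-file header, and modern Office files are ZIP archives whose top-level folder hints at the application.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"                        # modern Office files are ZIP archives

# Top-level folder inside the ZIP that identifies each modern format family.
OOXML_MARKERS = {"word/": "word", "xl/": "excel", "ppt/": "powerpoint"}


def sniff_office_format(data: bytes) -> str:
    """Return a coarse format label for raw file bytes (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "legacy-ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        for prefix, label in OOXML_MARKERS.items():
            if any(name.startswith(prefix) for name in names):
                return label
        return "zip"  # a ZIP archive, but not a recognizable Office document
    return "unknown"
```

Telling a legacy .doc from a legacy .xls needs a look inside the OLE2 container, which this sketch does not attempt.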

Word Tools

| Tool | Description |
|------|-------------|
| convert_to_markdown | Convert to Markdown with automatic pagination for large docs |
| extract_word_tables | Extract tables as structured JSON, CSV, or Markdown |
| analyze_word_structure | Analyze headings, sections, styles, and document hierarchy |
| get_document_outline | Get structured outline with chapter detection and word counts |
| check_style_consistency | Find formatting issues, missing chapters, style problems |
| search_document | Search text with context and chapter location |
| extract_entities | Extract people, places, organizations using pattern recognition |
| get_chapter_summaries | Generate chapter previews with opening sentences |
| save_reading_progress | Bookmark your reading position for later |
| get_reading_progress | Resume reading from saved position |

Excel Tools

| Tool | Description |
|------|-------------|
| analyze_excel_data | Statistical analysis: data types, missing values, outliers |
| extract_excel_formulas | Extract formulas with values and dependency analysis |
| create_excel_chart_data | Generate Chart.js/Plotly-ready data from spreadsheets |
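
For readers unfamiliar with what "Chart.js-ready" means, the sketch below reshapes spreadsheet rows into the labels/datasets structure that Chart.js expects for its data option. The exact output of create_excel_chart_data is not documented here, so treat rows_to_chartjs and its parameters as hypothetical names used only to show the target shape.

```python
def rows_to_chartjs(rows, label_column, value_columns):
    """Reshape a list of row dicts into a Chart.js-style data object.

    Hypothetical helper: one dataset per value column, one label per row.
    """
    labels = [row[label_column] for row in rows]
    datasets = [
        {"label": column, "data": [row[column] for row in rows]}
        for column in value_columns
    ]
    return {"labels": labels, "datasets": datasets}


# Example rows, as they might come out of a small "Revenue" sheet.
rows = [
    {"Quarter": "Q1", "Revenue": 120, "Costs": 80},
    {"Quarter": "Q2", "Revenue": 150, "Costs": 95},
]
chart_data = rows_to_chartjs(rows, "Quarter", ["Revenue", "Costs"])
```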

📋 Format Support

Here's what works and what's "good enough". Legacy formats from Office 97-2003 have more limited extraction, but they still work:

| Format | Extension | Text | Images | Metadata | Tables | Formulas |
|--------|-----------|------|--------|----------|--------|----------|
| Word (Modern) | .docx | ✅ | ✅ | ✅ | ✅ | - |
| Word (Legacy) | .doc | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
| Word Template | .dotx | ✅ | ✅ | ✅ | ✅ | - |
| Word Macro | .docm | ✅ | ✅ | ✅ | ✅ | - |
| Excel (Modern) | .xlsx | ✅ | ✅ | ✅ | ✅ | ✅ |
| Excel (Legacy) | .xls | ✅ | ⚠️ | ⚠️ | ✅ | ⚠️ |
| Excel Template | .xltx | ✅ | ✅ | ✅ | ✅ | ✅ |
| Excel Macro | .xlsm | ✅ | ✅ | ✅ | ✅ | ✅ |
| PowerPoint (Modern) | .pptx | ✅ | ✅ | ✅ | ✅ | - |
| PowerPoint (Legacy) | .ppt | ✅ | ⚠️ | ⚠️ | ⚠️ | - |
| PowerPoint Template | .potx | ✅ | ✅ | ✅ | ✅ | - |
| CSV | .csv | ✅ | - | ⚠️ | ✅ | - |

✅ Full support • ⚠️ Basic/partial support • - Not applicable


🔗 MCP Resources

Instead of returning entire documents in tool responses, you can index a document once and fetch content on demand via URI-based resources. This keeps context windows manageable when working with large files.

How It Works

  1. Index the document: index_document scans the file and returns URIs
  2. Fetch what you need: Request specific chapters, sheets, slides, or images by URI
  3. Format on demand: Append .txt or .html to get different output formats

Resource URI Patterns

| URI Pattern | Description | Example |
|-------------|-------------|---------|
| chapter://{doc_id}/{n} | Single chapter/section | chapter://abc123/3 |
| chapters://{doc_id}/{range} | Multiple chapters | chapters://abc123/1-5 |
| section://{doc_id}/{n} | Section by heading style | section://abc123/2 |
| paragraph://{doc_id}/{ch}/{p} | Specific paragraph | paragraph://abc123/3/7 |
| sheet://{doc_id}/{name} | Excel sheet as markdown table | sheet://abc123/Revenue |
| slide://{doc_id}/{n} | PowerPoint slide | slide://abc123/5 |
| slides://{doc_id}/{range} | Multiple slides | slides://abc123/1,3,5 |
| image://{doc_id}/{n} | Embedded image | image://abc123/0 |

Format Suffixes

Append a format suffix to convert on the fly:

| Suffix | Output |
|--------|--------|
| .md (default) | Markdown |
| .txt | Plain text (no formatting) |
| .html | Basic HTML |

Examples:

  • chapter://abc123/3 → Markdown (default)
  • chapter://abc123/3.txt → Plain text
  • chapter://abc123/3.html → HTML
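
The URI patterns and format suffixes above can be taken apart with a few string operations. The following is a sketch of one way a client might do that, not the server's own parser; ResourceRef and parse_resource_uri are hypothetical names. Only the three documented suffixes are stripped, so a selector that merely contains a dot is left alone.

```python
from typing import NamedTuple

KNOWN_SUFFIXES = {"md", "txt", "html"}  # the documented format suffixes


class ResourceRef(NamedTuple):
    kind: str      # e.g. "chapter", "sheet", "slides"
    doc_id: str    # identifier returned by index_document
    selector: str  # chapter number, range, sheet name, "ch/p" pair, ...
    fmt: str       # "md" when no suffix is given


def parse_resource_uri(uri: str) -> ResourceRef:
    """Split a resource URI into its parts (illustrative, not the server's parser)."""
    kind, sep, rest = uri.partition("://")
    if not sep or not kind:
        raise ValueError(f"not a resource URI: {uri!r}")
    doc_id, sep, selector = rest.partition("/")
    if not sep or not doc_id or not selector:
        raise ValueError(f"missing doc_id or selector: {uri!r}")
    fmt = "md"
    stem, dot, suffix = selector.rpartition(".")
    if dot and stem and suffix in KNOWN_SUFFIXES:
        selector, fmt = stem, suffix
    return ResourceRef(kind, doc_id, selector, fmt)
```

For instance, parse_resource_uri("chapter://abc123/3.txt") yields kind "chapter", doc_id "abc123", selector "3", and fmt "txt".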

Range Syntax

Fetch multiple items at once:

  • 1-5 → Items 1 through 5
  • 1,3,5 → Specific items
  • 1-3,7,9-10 → Mixed ranges
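
The range syntax above is simple enough to expand by hand. A minimal sketch, assuming inclusive ranges and positive integers; parse_range is a hypothetical helper, and how the server treats duplicates or reversed ranges is not documented, so the choices below (drop duplicates, reject reversed ranges) are assumptions.

```python
def parse_range(spec: str) -> list[int]:
    """Expand a range spec such as "1-3,7,9-10" into a list of item numbers."""
    items: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty item in range spec: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"reversed range: {part!r}")
            items.extend(range(start, end + 1))  # ranges are inclusive
        else:
            items.append(int(part))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(items))
```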

Section Detection

The indexer detects document structure automatically:

  1. Heading 1 styles (primary): Business docs, manuals, technical documents
  2. "Chapter X" text patterns (fallback): Books, manuscripts, narratives

Use text_patterns_only=True to skip heading style detection for documents with messy formatting.
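
The two-step detection described above might look roughly like this. It is a sketch over plain (style_name, text) pairs rather than real document objects, and detect_sections is a hypothetical name; the actual indexer's patterns are not shown in this README, so the regular expression here (the word "Chapter" followed by digits) is an assumption.

```python
import re

# Assumed pattern: "Chapter" followed by a number, at the start of a paragraph.
CHAPTER_PATTERN = re.compile(r"^\s*chapter\s+\d+\b", re.IGNORECASE)


def detect_sections(paragraphs, text_patterns_only=False):
    """Return (paragraph_index, title) pairs for detected section starts.

    paragraphs is a list of (style_name, text) tuples.
    """
    if not text_patterns_only:
        by_style = [
            (index, text.strip())
            for index, (style, text) in enumerate(paragraphs)
            if style == "Heading 1" and text.strip()
        ]
        if by_style:
            return by_style  # primary strategy found something
    # Fallback (or forced) strategy: look for "Chapter N" text.
    return [
        (index, text.strip())
        for index, (_, text) in enumerate(paragraphs)
        if CHAPTER_PATTERN.match(text)
    ]
```

Passing text_patterns_only=True skips the style check entirely, which mirrors the option mentioned above for documents with messy formatting.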


🎯 MCP Prompts

Pre-built workflows that chain multiple tools together:

| Prompt | Level | Description |
|--------|-------|-------------|
| explore-document | Basic | Start with any new document: get structure and identify issues |
| find-character | Basic | Track all mentions of a person/character with context |
| chapter-preview | Basic | Quick overview of each chapter without full read |
| resume-reading | Intermediate | Check saved position and continue reading |
| document-analysis | Intermediate | Comprehensive multi-tool analysis |
| character-journey | Advanced | Track character arc through entire narrative |
| document-comparison | Advanced | Compare entities and themes between chapters |
| full-reading-session | Advanced | Guided reading with bookmarking |
| manuscript-review | Advanced | Complete editorial workflow for editors |

💡 Usage Examples

Extract Text from Any Document

# Simple extraction
result = await extract_text("report.docx")
print(result["text"])

# With formatting preserved
result = await extract_text(
    file_path="report.docx",
    preserve_formatting=True,
    include_metadata=True
)

Convert Word to Markdown (with Pagination)

Large documents get paginated automatically. Three ways to handle it:

# Option 1: Follow the cursor for each chunk
result = await convert_to_markdown("big-manual.docx")
if result.get("pagination", {}).get("has_more"):
    next_page = await convert_to_markdown(
        "big-manual.docx",
        cursor_id=result["pagination"]["cursor_id"]
    )

# Option 2: Grab specific pages
result = await convert_to_markdown("big-manual.docx", page_range="1-10")

# Option 3: Extract by chapter heading
result = await convert_to_markdown("big-manual.docx", chapter_name="Introduction")
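
Option 1 above fetches only one follow-up page. To walk every page, the cursor can be followed in a loop until has_more is false. The sketch below assumes the pagination keys shown above (has_more, cursor_id); the "markdown" key and the fake_convert stand-in are inventions so the example runs without a server, and collect_all_pages is a hypothetical helper.

```python
import asyncio


async def collect_all_pages(convert, file_path, max_pages=100):
    """Call convert repeatedly, following cursor_id until has_more is false."""
    result = await convert(file_path)
    pages = [result]
    while result.get("pagination", {}).get("has_more") and len(pages) < max_pages:
        cursor_id = result["pagination"]["cursor_id"]
        result = await convert(file_path, cursor_id=cursor_id)
        pages.append(result)
    return pages


async def fake_convert(file_path, cursor_id=None):
    """Stand-in for convert_to_markdown that serves three fixed chunks."""
    chunks = ["part 1", "part 2", "part 3"]
    index = 0 if cursor_id is None else int(cursor_id)
    has_more = index + 1 < len(chunks)
    return {
        "markdown": chunks[index],  # assumed key name
        "pagination": {
            "has_more": has_more,
            "cursor_id": str(index + 1) if has_more else None,
        },
    }


pages = asyncio.run(collect_all_pages(fake_convert, "big-manual.docx"))
```

The max_pages guard is there so a misbehaving cursor cannot loop forever.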

Analyze Excel Data Quality

result = await analyze_excel_data(
    file_path="sales-data.xlsx",
    include_statistics=True,
    check_data_quality=True
)

# Returns per-column analysis with quality issues
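
The result structure is not spelled out here, but the kind of per-column check involved can be illustrated with the standard library alone. This sketch counts missing values, guesses a coarse type, and flags outliers with the common 1.5 x IQR rule; that rule is a choice made for this example, not necessarily what analyze_excel_data uses, and summarize_column is a hypothetical name.

```python
from statistics import quantiles


def summarize_column(values):
    """Summarize one column: size, missing cells, coarse type, IQR outliers."""
    present = [v for v in values if v is not None and v != ""]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    is_numeric = bool(present) and len(numeric) == len(present)
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "dtype": "numeric" if is_numeric else "mixed_or_text",
        "outliers": [],
    }
    if is_numeric and len(numeric) >= 4:
        q1, _, q3 = quantiles(numeric, n=4)  # quartile cut points
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        summary["outliers"] = [v for v in numeric if v < low or v > high]
    return summary


summary = summarize_column([10, 12, 11, 13, 12, 11, 500, None])
```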

Index Document for On-Demand Resource Fetching

# Index the document - returns URIs for all content
result = await index_document("novel.docx")

# Returns:
# {
#   "doc_id": "56036b0f171a",
#   "resources": {
#     "chapter": [
#       {"id": "1", "title": "Chapter 1", "uri": "chapter://56036b0f171a/1"},
#       ...
#     ],
#     "image": [
#       {"id": "0", "uri": "image://56036b0f171a/0"},
#       ...
#     ]
#   }
# }

# Fetch specific content via MCP resources:
# - chapter://56036b0f171a/1      → Chapter 1 as markdown
# - chapter://56036b0f171a/1.txt  → Chapter 1 as plain text
# - chapters://56036b0f171a/1-3   → Chapters 1-3 combined

🧪 Testing

# Run tests and generate the dashboard
make test

# Just pytest
make test-pytest

# Open dashboard
make view-dashboard

🏗 Architecture

The mixin pattern keeps things modular: universal tools work on everything, format-specific tools go deeper.

mcwaddams/
├── src/mcwaddams/
│   ├── server.py              # FastMCP server + resource templates
│   ├── resources.py           # Resource store for on-demand content
│   ├── mixins/
│   │   ├── universal.py       # Format-agnostic tools
│   │   ├── word.py            # Word-specific tools
│   │   ├── excel.py           # Excel-specific tools
│   │   └── powerpoint.py      # PowerPoint tools
│   ├── utils/                 # Validation, caching, detection
│   └── pagination.py          # Large document pagination
├── tests/
└── reports/

Processing Libraries

| Format | Primary Library | Fallback |
|--------|-----------------|----------|
| .docx | python-docx | mammoth |
| .xlsx | openpyxl | pandas |
| .pptx | python-pptx | - |
| .doc/.xls/.ppt | olefile | - |
| .csv | pandas | built-in csv |
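
The primary/fallback arrangement in the table can be pictured as an ordered chain that is tried until one reader succeeds. The sketch below is not the project's code: extract_with_fallback and the readers mapping are hypothetical, and the readers are injected as plain callables so the example does not depend on any of the listed libraries being installed.

```python
from pathlib import Path

# Library order per extension, copied from the table above.
LIBRARY_CHAIN = {
    ".docx": ["python-docx", "mammoth"],
    ".xlsx": ["openpyxl", "pandas"],
    ".pptx": ["python-pptx"],
    ".doc": ["olefile"],
    ".xls": ["olefile"],
    ".ppt": ["olefile"],
    ".csv": ["pandas", "csv"],
}


def extract_with_fallback(path, readers):
    """Try each reader for the file's extension in order.

    readers maps a library name to a callable(path) -> str.
    Returns (library_name, text) from the first reader that succeeds.
    """
    suffix = Path(path).suffix.lower()
    chain = LIBRARY_CHAIN.get(suffix)
    if chain is None:
        raise ValueError(f"unsupported extension: {suffix!r}")
    errors = []
    for name in chain:
        reader = readers.get(name)
        if reader is None:
            continue  # this library is not available; move on
        try:
            return name, reader(path)
        except Exception as exc:  # any failure means: try the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all readers failed for {path}: " + "; ".join(errors))
```

Collecting the per-library error messages makes the final failure easier to diagnose than reporting only the last exception.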

🔧 Development

git clone https://github.com/ryanmalloy/mcwaddams.git
cd mcwaddams
uv sync --dev

uv run pytest
uv run black src/ tests/
uv run ruff check src/ tests/

👤 Author

Ryan Malloy (ryanmalloy.com)

This package emerged from a human-AI collaboration session. The process raised questions about discernment, voice, and what makes tools actually useful.


📜 License

MIT License - see LICENSE for details.


Named for Milton Waddams, who was relocated to the basement with the legacy documents.

"I could set the building on fire..."

Built with FastMCP and the Model Context Protocol
