MCP Gosling - Advanced document processing server for Goose AI using IBM's Docling library

🦆 MCP Gosling - Document Processor for Goose

Advanced document processing extension for Goose AI with enterprise-grade offline fallback

MCP Gosling is a Model Context Protocol (MCP) server that adds document processing to Goose. It converts PDF, DOCX, PPTX, image, and HTML documents with high fidelity using IBM's Docling library, and falls back to offline processing in network-restricted environments.

✨ Key Features

🔧 Enterprise-Ready: SSL certificate fixes for corporate networks
📄 Multi-Format: PDF, DOCX, PPTX, images, HTML, and more
🌐 Offline Capable: Falls back to PyPDF2 for PDFs when the Hugging Face Hub is blocked
⚡ High Performance: Optimized for production workloads
🛡️ Robust: Comprehensive error handling and validation
🎯 AI-Optimized: Clean Markdown output perfect for AI analysis

🚀 Quick Start

Installation Options

Option 1: Standard Installation (Recommended)

pip install mcp-gosling

Option 2: Using uvx (Modern Tooling)

# Run directly without installation
uvx mcp-gosling

# Or using uv tool run (identical behavior)
uv tool run mcp-gosling

Configuration for Goose

With Standard Installation:

{
  "mcpServers": {
    "gosling": {
      "command": "mcp-gosling",
      "args": []
    }
  }
}

With uvx:

{
  "mcpServers": {
    "gosling": {
      "command": "uvx",
      "args": ["mcp-gosling"]
    }
  }
}
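The two configuration snippets above differ only in the command and its arguments. As a minimal sketch, the helper below builds either entry and merges it into an existing configuration dictionary. The function name `add_gosling_server` is hypothetical, and where Goose stores this configuration is not covered here.

```python
import json


def add_gosling_server(config: dict, use_uvx: bool = False) -> dict:
    """Return a copy of `config` with the "gosling" MCP server entry added."""
    if use_uvx:
        # Run through uvx without a prior pip install.
        entry = {"command": "uvx", "args": ["mcp-gosling"]}
    else:
        # Use the console script installed by `pip install mcp-gosling`.
        entry = {"command": "mcp-gosling", "args": []}
    # Copy the existing server map so the caller's dictionary is not mutated.
    servers = dict(config.get("mcpServers", {}))
    servers["gosling"] = entry
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    print(json.dumps(add_gosling_server({}, use_uvx=True), indent=2))
```

Any other servers already present under `mcpServers` are preserved.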

Usage

# Process your AWS certification PDF
goose "Process my AWS certification document at /path/to/cert.pdf"

# Batch process multiple documents
goose "Process all PDFs in my documents folder and summarize them"

# Extract metadata only
goose "What are the metadata details of this document?"

📋 Available Tools

`process_document`

Process a single document and return clean Markdown content.

Parameters:

file_path (string): Path to the document file
output_format (string): "markdown", "json", or "text" (default: "markdown")
extract_images (boolean): Whether to extract and describe images (default: false)
extract_tables (boolean): Whether to extract table structure (default: true)
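As an illustration of the parameter list above, the following sketch assembles the arguments for a `process_document` call, applying the documented defaults and rejecting unknown output formats. The helper name `process_document_args` is hypothetical, and the server may validate its inputs differently.

```python
ALLOWED_FORMATS = ("markdown", "json", "text")


def process_document_args(file_path, output_format="markdown",
                          extract_images=False, extract_tables=True):
    """Build the argument dictionary for a process_document tool call."""
    if not file_path:
        raise ValueError("file_path is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {ALLOWED_FORMATS}")
    return {
        "file_path": str(file_path),
        "output_format": output_format,
        "extract_images": bool(extract_images),
        "extract_tables": bool(extract_tables),
    }


if __name__ == "__main__":
    print(process_document_args("/path/to/cert.pdf"))
```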

`batch_process_documents`

Process multiple documents in batch with optional file output.

Parameters:

file_paths (array): List of document file paths (max 20 files)
output_format (string): Output format for all documents (default: "markdown")
output_directory (string): Directory to save files (empty = return content)
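Because `file_paths` accepts at most 20 files, a longer list has to be split across several calls. The sketch below shows one way a client could do that; the helper names are hypothetical, and it only prepares the argument dictionaries rather than calling the server.

```python
MAX_BATCH_SIZE = 20  # documented limit for batch_process_documents


def chunk_file_paths(file_paths, size=MAX_BATCH_SIZE):
    """Split file_paths into consecutive lists of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    paths = list(file_paths)
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def batch_call_arguments(file_paths, output_format="markdown", output_directory=""):
    """Return one argument dictionary per batch_process_documents call."""
    return [
        {
            "file_paths": chunk,
            "output_format": output_format,
            "output_directory": output_directory,
        }
        for chunk in chunk_file_paths(file_paths)
    ]


if __name__ == "__main__":
    calls = batch_call_arguments([f"/docs/file{i}.pdf" for i in range(45)])
    print([len(call["file_paths"]) for call in calls])
```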

`extract_document_metadata`

Extract detailed metadata and structure information from a document.

Parameters:

file_path (string): Path to the document file

🔧 Advanced Features

Corporate Network Support

✅ SSL certificate fixes for enterprise environments
✅ Automatic fallback when Hugging Face Hub is blocked
✅ Works behind corporate firewalls and proxies

Intelligent Processing

Primary: IBM Docling for high-fidelity extraction with OCR and table recognition
Fallback: PyPDF2 for reliable offline PDF processing
Formats: PDF, DOCX, PPTX, PNG, JPG, HTML, TXT, MD, JSON
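The primary/fallback arrangement described above can be sketched generically. This is not the server's actual code: the two converters are passed in as plain callables standing in for Docling and PyPDF2, and the rule that only PDFs are retried is an assumption drawn from the fallback being described as PDF processing.

```python
from typing import Callable, Tuple


def convert_with_fallback(
    path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (engine_name, content), trying `primary` before `fallback`."""
    try:
        return "primary", primary(path)
    except Exception:
        # Assumption: only PDFs have an offline fallback; other formats
        # surface the original error to the caller.
        if not path.lower().endswith(".pdf"):
            raise
        return "fallback", fallback(path)
```

Returning the engine name alongside the content lets the caller report which path produced the result.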

Performance & Reliability

File size limits (50MB for full processing, 5MB for metadata)
Batch processing (up to 20 files)
Comprehensive error handling
Memory-efficient processing
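A client can check file sizes before sending a request. The sketch below reads "5MB for metadata" as the limit that applies to `extract_document_metadata`, and treats one megabyte as 1024 * 1024 bytes; both are assumptions, since the exact rule is not spelled out here.

```python
# Documented limits, in megabytes, keyed by a hypothetical operation name.
SIZE_LIMITS_MB = {"process": 50, "metadata": 5}
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def within_size_limit(size_bytes: int, operation: str = "process") -> bool:
    """Return True if a file of `size_bytes` is within the limit for `operation`."""
    if operation not in SIZE_LIMITS_MB:
        raise ValueError(f"unknown operation: {operation}")
    return size_bytes <= SIZE_LIMITS_MB[operation] * BYTES_PER_MB
```

In practice `size_bytes` would come from `os.path.getsize(path)`.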

🎯 Use Cases

📑 Document Analysis: Extract and analyze content from reports, papers, contracts
🏢 Enterprise: Process documents in network-restricted corporate environments
🔍 Research: Batch process academic papers and research documents
📊 Data Extraction: Convert documents to structured data for AI analysis
📝 Content Migration: Bulk convert document formats with preserved structure

🛠 Technical Details

Built With:

IBM Docling - Enterprise-grade document processing
PyPDF2 - Reliable offline PDF processing
MCP Python SDK - Model Context Protocol

Requirements:

Python 3.9+
Works on macOS, Linux, Windows
Optional: GPU acceleration for enhanced performance

🚀 Installation Options

For Goose Users (Recommended)

Install via pip:
```
pip install mcp-gosling
```
Configure in Goose: Add the MCP server to your Goose configuration
Start using:
```
goose "Process this document for me"
```

For MCP Development

Clone and install:

git clone https://github.com/masanderso/goose-docling.git
cd goose-docling
pip install -e .

Test with MCP Inspector:
```
mcp dev src/mcp_docling/server.py
```

🔍 Example Outputs

Document Processing

# Document: AWS Certified Solutions Architect - Associate.pdf
**Source:** /path/to/document.pdf
**Format:** .pdf
**Pages:** 2

---

## Page 1

AWS Certified Solutions Architect - Associate
Notice of Exam Results
Candidate: Matthew Sanderson Exam Date: 12/3/2024
Candidate Score: 779 Pass/Fail: PASS
...

Metadata Extraction

{
  "file_info": {
    "name": "document.pdf",
    "size_mb": 0.03,
    "format": ".pdf"
  },
  "document_structure": {
    "page_count": 2,
    "has_tables": true,
    "has_figures": false
  }
}
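Since the metadata comes back as JSON, it is easy to consume programmatically. The following sketch parses a response shaped like the example above and produces a one-line summary; the function name is hypothetical, and a real response may contain additional fields.

```python
import json


def summarize_metadata(raw: str) -> str:
    """Turn a metadata JSON string into a short human-readable summary."""
    data = json.loads(raw)
    info = data["file_info"]
    structure = data["document_structure"]
    # Collect the structural features that are flagged as present.
    features = [
        label
        for key, label in (("has_tables", "tables"), ("has_figures", "figures"))
        if structure.get(key)
    ]
    contents = ", ".join(features) if features else "no tables or figures"
    name, fmt, size = info["name"], info["format"], info["size_mb"]
    pages = structure["page_count"]
    return f"{name} ({fmt}, {size} MB): {pages} pages, contains {contents}"


SAMPLE = """
{
  "file_info": {"name": "document.pdf", "size_mb": 0.03, "format": ".pdf"},
  "document_structure": {"page_count": 2, "has_tables": true, "has_figures": false}
}
"""

if __name__ == "__main__":
    print(summarize_metadata(SAMPLE))
```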

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🏷️ Tags

goose-extension document-processing mcp-server docling pdf-processing enterprise-ready offline-capable ai-tools

🚀 Features

Multi-format Support: PDF, DOCX, PPTX, images (PNG, JPG), HTML, and more
Intelligent Processing: OCR, table extraction, and structure preservation
Flexible Output: Markdown, JSON, or plain text formats
Batch Processing: Handle multiple documents efficiently
Metadata Extraction: Detailed document structure and properties
Production Ready: Robust error handling and file size limits

📋 Tools Available

This MCP server exposes three main tools:

1. `process_document`

Process a single document and return the converted content.

Parameters:

file_path (string): Path to the document file
output_format (string): "markdown", "json", or "text" (default: "markdown")
extract_images (boolean): Whether to extract and describe images (default: false)
extract_tables (boolean): Whether to extract table structure (default: true)

Example:

process_document("report.pdf", "markdown", true, true)

2. `batch_process_documents`

Process multiple documents in batch with optional file output.

Parameters:

file_paths (array): List of document file paths (max 20 files)
output_format (string): Output format for all documents (default: "markdown")
output_directory (string): Directory to save files (empty = return content)

Example:

batch_process_documents(["doc1.pdf", "doc2.docx"], "markdown", "/output")

3. `extract_document_metadata`

Extract detailed metadata and structure information from a document.

Parameters:

file_path (string): Path to the document file

Example:

extract_document_metadata("report.pdf")

🛠 Installation

For Goose Users

Option 1: Standard Installation

Install the MCP server:

pip install mcp-gosling

Add to your Goose configuration:

{
  "mcpServers": {
    "gosling": {
      "command": "mcp-gosling",
      "args": []
    }
  }
}

Option 2: Using uvx (Modern)

Ensure uv is installed:

pip install uv

Add to your Goose configuration:

{
  "mcpServers": {
    "gosling": {
      "command": "uvx",
      "args": ["mcp-gosling"]
    }
  }
}

For MCP Development

Clone and install:

git clone https://github.com/masanderso/mcp-gosling.git
cd mcp-gosling
pip install -e .

Test with MCP Inspector:

mcp dev src/mcp_docling/server.py

🔧 Configuration

The server configures Docling automatically with the following settings:

OCR enabled for scanned documents
Table structure extraction with cell matching
Support for all major document formats
50MB file size limit for safety

🎯 Use Cases

Research: Extract content from academic papers and reports
Business: Process contracts, invoices, and presentations
Data Extraction: Convert documents to structured data
Content Migration: Bulk convert document formats
Analysis: Extract metadata and document structure

🏗 Architecture

This server follows the MCP (Model Context Protocol) specification:

Tools: Document processing functions exposed to AI assistants
STDIO Transport: Communication via standard input/output
Error Handling: Proper MCP error responses
Type Safety: Full type annotations and validation
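To make the STDIO transport concrete: MCP messages are JSON-RPC 2.0 objects, and over stdio each message is written as a single line of JSON. The sketch below encodes a `tools/call` request for one of this server's tools. The helper name is hypothetical, and a real session must complete the MCP `initialize` handshake before sending tool calls, which this sketch omits.

```python
import json


def encode_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Encode an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    line = encode_tool_call(
        1, "extract_document_metadata", {"file_path": "/path/to/document.pdf"}
    )
    print(line, end="")
```

In most cases an MCP client library handles this framing, so hand-encoding is mainly useful for debugging.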

🤝 Integration Examples

With Goose

"Process the quarterly report in /documents/q4-report.pdf and summarize the key findings"

With other MCP clients

# Call the process_document tool from an MCP client.
# `client` is assumed to be an already-connected, initialized session object.
result = await client.call_tool("process_document", {
    "file_path": "/path/to/document.pdf",
    "output_format": "markdown",
})

📊 Performance

Speed: Optimized for production workloads
Memory: Efficient processing of large documents
Reliability: Robust error handling and validation
Scalability: Suitable for batch processing workflows

🐛 Troubleshooting

Common issues and solutions:

File not found: Ensure file paths are absolute and accessible
Large files: Files over 50MB are rejected for safety
Format errors: Check that file format is supported
Memory issues: Process large batches in smaller chunks
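Several of the issues above can be caught before a request is sent. The following sketch checks a path against the first and third items. The extension set is copied from the formats listed in this README, so other spellings such as `.jpeg` are left out as unconfirmed, and the function name is hypothetical.

```python
from pathlib import Path

# Extensions taken from the formats named in this README (assumed complete).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".pptx", ".png", ".jpg", ".html", ".txt", ".md", ".json",
}


def preflight_problems(file_path: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    path = Path(file_path)
    problems = []
    if not path.is_absolute():
        problems.append("path is not absolute")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file not found")
    return problems


if __name__ == "__main__":
    print(preflight_problems("reports/q4-report.pdf"))
```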

🤝 Contributing

Contributions welcome! Please read our contributing guidelines and submit pull requests.

This version

0.1.0

Aug 25, 2025

Download files

Source Distribution

mcp_gosling-0.1.0.tar.gz (13.0 kB)

Uploaded Aug 25, 2025 Source

Built Distribution

mcp_gosling-0.1.0-py3-none-any.whl (9.7 kB)

Uploaded Aug 25, 2025 Python 3

File details

Details for the file mcp_gosling-0.1.0.tar.gz.

File metadata

Download URL: mcp_gosling-0.1.0.tar.gz
Upload date: Aug 25, 2025
Size: 13.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.6

File hashes

Hashes for mcp_gosling-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e10387db6c0bf40417811dbd2de29dcdbbc947b1320eccb169201e8661dbd8b1`
MD5	`038724748987a2f3f348fc432d7545f9`
BLAKE2b-256	`7c9256579a8d071af21548040fb24c661f8a0fb368b62e2394b9166205b2518a`

File details

Details for the file mcp_gosling-0.1.0-py3-none-any.whl.

File metadata

Download URL: mcp_gosling-0.1.0-py3-none-any.whl
Upload date: Aug 25, 2025
Size: 9.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.6

File hashes

Hashes for mcp_gosling-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`85af761fdd47d19e591cbb04dd1217ec663911157717a4cdf979d3e22afa4b69`
MD5	`6c63ad9202db68d0228e2f294e3dddac`
BLAKE2b-256	`a0a68c57b35b9a61d594fb79d8eefc350724d1ce631ed4618bb583f2814b752e`
