An AWS Labs Model Context Protocol (MCP) server for document parsing
Document Loader MCP Server
Model Context Protocol (MCP) server for document parsing and content extraction
This MCP server provides tools to parse and extract content from various document formats including PDF, Word documents, Excel spreadsheets, PowerPoint presentations, and images.
Features
- PDF Text Extraction: Extract text content from PDF files using pdfplumber
- Word Document Processing: Convert DOCX/DOC files to markdown using markitdown
- Excel Spreadsheet Reading: Parse XLSX/XLS files and convert to markdown
- PowerPoint Presentation Processing: Extract content from PPTX/PPT files
- Image Loading: Load and display various image formats (PNG, JPG, GIF, BMP, TIFF, WEBP)
Prerequisites
Installation Requirements
- Install uv from Astral or the GitHub README
- Install Python 3.10 or newer using uv python install 3.10 (or a more recent version)
Installation
Configure the MCP server in your MCP client configuration:
{
  "mcpServers": {
    "awslabs.document-loader-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.document-loader-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
For Amazon Q Developer CLI, add the MCP client configuration and tool command to the agent file in ~/.aws/amazonq/cli-agents.
Example: ~/.aws/amazonq/cli-agents/default.json
{
  "mcpServers": {
    "awslabs.document-loader-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.document-loader-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
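If a client configuration file already lists other MCP servers, the entry above can be merged in rather than pasted over the file. The following is a minimal sketch using only the Python standard library. The helper name add_server is hypothetical and not part of this project, and the file path you pass depends on your MCP client.

```python
import json
from pathlib import Path

SERVER_NAME = "awslabs.document-loader-mcp-server"

# Server entry copied from the configuration shown above.
SERVER_ENTRY = {
    "command": "uvx",
    "args": ["awslabs.document-loader-mcp-server@latest"],
    "env": {"FASTMCP_LOG_LEVEL": "ERROR"},
    "disabled": False,
    "autoApprove": [],
}


def add_server(config_path: Path) -> dict:
    """Merge the server entry into the JSON file at config_path.

    Hypothetical helper for illustration; it is not shipped with the server.
    """
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Keep any servers that are already configured and add or update this one.
    config.setdefault("mcpServers", {})[SERVER_NAME] = SERVER_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For the Amazon Q Developer CLI example, this could be called as add_server(Path.home() / ".aws/amazonq/cli-agents/default.json"). Back up the file first if it contains settings you care about.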
Available Tools
- read_document: Extract content from various document formats by specifying file_path and file_type ('pdf', 'docx', 'doc', 'xlsx', 'xls', 'pptx', 'ppt')
- read_image: Load image files for LLM viewing and analysis
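To make the read_document parameters concrete, the sketch below builds the JSON-RPC request that an MCP client sends when it invokes a tool. The tools/call envelope follows the MCP specification, and file_path and file_type are the parameters named above. Inferring file_type from the file extension is an assumption made for this example. The function name build_read_document_call is hypothetical. In practice your MCP client constructs this message for you.

```python
import json
from pathlib import Path

# File types listed for read_document in the tool description above.
SUPPORTED_TYPES = {"pdf", "docx", "doc", "xlsx", "xls", "pptx", "ppt"}


def build_read_document_call(file_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for read_document.

    Hypothetical helper: file_type is guessed from the extension, which is
    an assumption of this sketch rather than documented server behavior.
    """
    file_type = Path(file_path).suffix.lstrip(".").lower()
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_document",
            "arguments": {"file_path": file_path, "file_type": file_type},
        },
    }


# Example path is illustrative only.
print(json.dumps(build_read_document_call("/tmp/report.PDF"), indent=2))
```

Rejecting unsupported extensions before sending the request gives a clearer error than a round trip to the server.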
Environment Variables
FASTMCP_LOG_LEVEL: Set logging level (ERROR, INFO, DEBUG)
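As an illustration of how a variable like this is typically consumed, the sketch below maps FASTMCP_LOG_LEVEL onto Python's logging levels. It is not taken from the server's source. The function name resolve_log_level, the ERROR default, and the fallback for unrecognized values are all assumptions of this example.

```python
import logging
import os

# Levels named in the description above.
ALLOWED_LEVELS = ("ERROR", "INFO", "DEBUG")


def resolve_log_level(env=None, default: str = "ERROR") -> int:
    """Return the logging level selected by FASTMCP_LOG_LEVEL.

    Hypothetical helper: the default and the fallback for unknown values
    are assumptions of this sketch, not documented server behavior.
    """
    env = os.environ if env is None else env
    name = env.get("FASTMCP_LOG_LEVEL", default).upper()
    if name not in ALLOWED_LEVELS:
        name = default
    return getattr(logging, name)


logging.basicConfig(level=resolve_log_level())
```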
Development
Setup
# Clone the repository
git clone https://github.com/awslabs/mcp.git
cd mcp/src/document-loader-mcp-server
# Install dependencies
uv sync
# Install in development mode
uv pip install -e .
Testing
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=awslabs.document_loader_mcp_server
The test suite includes:
- Server functionality validation
- Document parsing tests with generated sample files
- Error handling verification
Sample Documents
The test suite automatically generates sample documents for testing:
- PDF with multi-page content
- DOCX with formatted text and lists
- XLSX with multiple sheets and data
- PPTX with slides and content
- Various image formats
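How the project's tests build these files is not shown here. As a rough idea of what generating a sample file at test time can look like, the sketch below produces a small solid-color PNG using only the standard library. The function names are hypothetical, and a real suite would more likely rely on dedicated document and imaging libraries.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_sample_png(width: int = 2, height: int = 2, rgb=(255, 0, 0)) -> bytes:
    """Return the bytes of a solid-color 8-bit RGB PNG (hypothetical helper)."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB),
    # then compression, filter, and interlace methods, all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter byte 0 (None), followed by RGB triples.
    row = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(row * height)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )
```

In a pytest test, the bytes could be written to a temporary directory (for example with tmp_path / "sample.png") and the resulting path passed to the code under test.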
Docker
You can also run this server in a Docker container:
docker build -t document-loader-mcp-server .
docker run -p 8000:8000 document-loader-mcp-server
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for details.
Support
For issues and questions, please use the GitHub issue tracker.