Upstage MCP Server

A Model Context Protocol (MCP) server for Upstage AI's document digitization and information extraction capabilities

📋 Overview

The Upstage MCP Server bridges AI assistants and Upstage AI's document processing APIs. It lets AI models such as Claude extract and structure content from a range of document types, including PDFs, images, and Office files.

✨ Key Features

  • Document Digitization: Extract structured content from documents while preserving layout.
  • Information Extraction: Extract specific data points according to a schema you supply or one the server generates automatically.
  • Multi-format Support: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX.
  • Claude Desktop Integration: Seamless integration with Claude and other MCP clients.
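
As a quick way to check whether a file falls within the formats listed above before handing it to the server, here is a minimal sketch. The helper name is_supported and the mapping from format names to file suffixes (for example .jpg for JPEG and .tif for TIFF) are assumptions made for illustration; they are not part of the server's code.

```python
from pathlib import Path

# Suffixes derived from the format list above. Treating ".jpg" and ".tif"
# as aliases for JPEG and TIFF is an assumption, not something the server documents.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".pdf",
    ".tiff", ".tif", ".heic", ".docx", ".pptx", ".xlsx",
}


def is_supported(file_path: str) -> bool:
    """Return True if the file's suffix matches a listed format (case-insensitive)."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["contract.PDF", "scan.heic", "notes.txt"]:
        print(name, is_supported(name))
```

This only inspects the file name; it does not verify that the file contents actually match the suffix.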

🔑 Prerequisites

Before using this server, you'll need:

  1. Upstage API Key: Obtain your API key from Upstage API
  2. Python 3.10+: The server requires Python 3.10 or higher.
  3. uv package manager: For dependency management and installation.
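
The Python and uv prerequisites above can be checked with a short script. This is a sketch only: the function name check_prerequisites is hypothetical, and it verifies just the interpreter version and whether a uv executable is on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.10+ is required, found {found}")
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All prerequisites met" if not issues else "\n".join(issues))
```

The two parameters exist so the check can be exercised without changing the real environment; calling the function with no arguments inspects the current interpreter and PATH.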

🚀 Local/Dev Setup Instructions

Step 1: Clone the Repository

# Clone the repository
git clone https://github.com/PritamPatil2603/upstage-mcp-server.git

# Navigate to the project directory
cd upstage-mcp-server

Step 2: Set Up Python Environment

# Install uv if not already installed
pip install uv

# Create a virtual environment
uv venv

# Activate the virtual environment
# On Windows, run:
# .venv\Scripts\activate
# On macOS/Linux, run:
source .venv/bin/activate

# Install dependencies in editable mode
uv pip install -e .

Step 3: Configure Claude Desktop

  1. Download and install Claude Desktop.

  2. Open Claude Desktop:

    • Navigate to Claude → Settings → Developer → Edit Config

  3. Edit claude_desktop_config.json:

    Add the following configuration:

    For Windows:

    {
      "mcpServers": {
        "upstage-mcp-server": {
          "command": "uv",
          "args": [
            "run",
            "--directory",
            "C:\\path\\to\\cloned\\upstage-mcp-server",
            "python",
            "-m",
            "upstage_mcp.server"
          ],
          "env": {
            "UPSTAGE_API_KEY": "your_api_key_here"
          }
        }
      }
    }
    

Replace C:\\path\\to\\cloned\\upstage-mcp-server with the actual repository path on your system. Keep the backslashes doubled, because a single backslash is an escape character in JSON.

For macOS/Linux:

{
  "mcpServers": {
    "upstage-mcp-server": {
      "command": "/Users/username/.local/bin/uv",
      "args": [
        "run",
        "--directory",
        "/path/to/cloned/upstage-mcp-server",
        "python",
        "-m",
        "upstage_mcp.server"
      ],
      "env": {
        "UPSTAGE_API_KEY": "your_api_key_here"
      }
    }
  }
}

Replace the following:

  • /Users/username/.local/bin/uv with the full path to your uv executable (find it using which uv)
  • /path/to/cloned/upstage-mcp-server with the absolute path to your repository

Tip for macOS/Linux users: If Claude Desktop cannot connect to the server, use the full path to the uv executable instead of the bare command uv; this is often more reliable. Find the path by running which uv in your terminal.

  4. Once the above steps are complete, restart Claude Desktop.
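
If you would rather generate the configuration entry than edit it by hand, the following sketch builds the same structure shown above and merges it into an existing config dictionary. The function names build_server_entry and merge_into_config are hypothetical, and the script only prints the result; writing it to claude_desktop_config.json is left to you.

```python
import json
import shutil


def build_server_entry(repo_dir, api_key, uv_path=None):
    """Build the server entry shown above. Falls back to the bare "uv" command
    if no explicit path is given and uv is not found on PATH."""
    command = uv_path or shutil.which("uv") or "uv"
    return {
        "command": command,
        "args": ["run", "--directory", repo_dir, "python", "-m", "upstage_mcp.server"],
        "env": {"UPSTAGE_API_KEY": api_key},
    }


def merge_into_config(config, entry, name="upstage-mcp-server"):
    """Return a copy of config with the entry added under mcpServers,
    leaving any other configured servers in place."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    entry = build_server_entry("/path/to/cloned/upstage-mcp-server", "your_api_key_here")
    print(json.dumps(merge_into_config({}, entry), indent=2))
```

Because json.dumps escapes backslashes itself, a Windows path can be passed in its natural form (for example as a raw string) and will be written with doubled backslashes.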

🛠️ Available Tools

The server exposes two main tools for AI models:

  1. Document Parsing (parse_document):

    • Description: Processes documents and extracts their content with structure preservation.
    • Parameters:
      • file_path: Path to the document file to be processed.
    • Example Query to Claude:

      Can you parse this document located at "C:\Users\username\Documents\contract.pdf" and summarize its contents?

  2. Information Extraction (extract_information):

    • Description: Extracts structured information from documents according to schemas.
    • Parameters:
      • file_path: Path to the document file to process.
      • schema_path (optional): Path to a JSON file containing the extraction schema.
      • auto_generate_schema (default: true): Whether to automatically generate a schema.
    • Example Query to Claude:

      Extract the invoice number, date, and total amount from this document at "C:\Users\username\Documents\invoice.pdf".
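
For readers curious about what an MCP client sends when it invokes these tools, the sketch below builds the JSON-RPC tools/call request defined by the Model Context Protocol. The tool and parameter names come from the list above; the helper tool_call_request, the example file paths, and the assumption that auto_generate_schema is passed as a JSON boolean are illustrative only.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build an MCP "tools/call" request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Example paths are placeholders, not files shipped with the project.
    parse = tool_call_request(1, "parse_document", {"file_path": "/path/to/contract.pdf"})
    extract = tool_call_request(
        2,
        "extract_information",
        {"file_path": "/path/to/invoice.pdf", "auto_generate_schema": True},
    )
    print(json.dumps(parse, indent=2))
    print(json.dumps(extract, indent=2))
```

In normal use Claude Desktop constructs and sends these messages for you, so this is mainly useful for debugging or for writing your own client.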

📂 Output Files

The server saves processing results in these locations:

  • Document Parsing Results: upstage_mcp/outputs/document_parsing/
  • Information Extraction Results: upstage_mcp/outputs/information_extraction/
  • Generated Schemas: upstage_mcp/outputs/information_extraction/schemas/
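
To find the most recent result in each of these directories, something like the following sketch may help. The directory paths are taken from the list above; the helper names, and the assumption that the paths are relative to the repository root, are illustrative. The sketch makes no assumption about how the server names its output files and simply picks the newest file by modification time.

```python
from pathlib import Path

# Output locations from the list above, assumed relative to the repository root.
OUTPUT_DIRS = {
    "document_parsing": "upstage_mcp/outputs/document_parsing",
    "information_extraction": "upstage_mcp/outputs/information_extraction",
    "schemas": "upstage_mcp/outputs/information_extraction/schemas",
}


def newest_file(directory):
    """Return the most recently modified file directly inside directory, or None."""
    directory = Path(directory)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def latest_outputs(repo_root="."):
    """Map each output category to its newest file (None if missing or empty)."""
    return {
        label: newest_file(Path(repo_root) / rel)
        for label, rel in OUTPUT_DIRS.items()
    }


if __name__ == "__main__":
    for label, path in latest_outputs().items():
        print(f"{label}: {path}")
```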

🔧 Troubleshooting

Common Issues

  • API Key Not Found:
    Ensure your Upstage API key is correctly set in environment variables or the .env file.

  • File Not Found:
    Verify that the file path is correct and accessible to the server.

  • Server Not Starting:
    Check if you've activated the virtual environment and installed all dependencies.
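
The first two issues above can be checked up front with a small script. This is a sketch under stated assumptions: the function name diagnose is hypothetical, and it looks only at the process environment, so a key stored solely in a .env file will be reported as missing.

```python
import os
from pathlib import Path


def diagnose(file_path, env=os.environ):
    """Return a list of likely problems for a given document path."""
    problems = []
    # Checks the process environment only; a .env file is not read here.
    if not env.get("UPSTAGE_API_KEY"):
        problems.append("UPSTAGE_API_KEY is not set in the environment")
    path = Path(file_path).expanduser()
    if not path.is_file():
        problems.append(f"file not found: {path}")
    elif not os.access(path, os.R_OK):
        problems.append(f"file is not readable: {path}")
    return problems


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "document.pdf"
    issues = diagnose(target)
    print("No problems found" if not issues else "\n".join(issues))
```

Note that Claude Desktop launches the server with the env block from its own configuration, so a key exported in your shell is not necessarily the one the server sees.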

Checking Logs

Claude Desktop logs can be found at:

  • Windows: %APPDATA%\Claude\logs\mcp-server-upstage-mcp-server.log
  • macOS: ~/Library/Logs/Claude/mcp-server-upstage-mcp-server.log
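
To read the end of the log without opening it in an editor, a sketch along these lines could be used. The log locations are the two listed above; the helper names log_path and tail are hypothetical, and no location is assumed for platforms other than Windows and macOS.

```python
import os
import sys
from collections import deque
from pathlib import Path

LOG_NAME = "mcp-server-upstage-mcp-server.log"


def log_path(platform=sys.platform, env=os.environ):
    """Return the documented log location for Windows or macOS, else None."""
    if platform.startswith("win"):
        return Path(env.get("APPDATA", "")) / "Claude" / "logs" / LOG_NAME
    if platform == "darwin":
        return Path.home() / "Library" / "Logs" / "Claude" / LOG_NAME
    return None  # no location is documented for other platforms


def tail(path, n=20):
    """Return the last n lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(deque(handle, maxlen=n))


if __name__ == "__main__":
    path = log_path()
    if path is None or not path.is_file():
        print("Log file not found:", path)
    else:
        print("".join(tail(path)), end="")
```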

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request to enhance the project or add new features.

📄 License

This project is licensed under the MIT License.
