Skip to main content

VLM Image Understanding MCP Server - Support OpenAI protocol compatible VLMs

Project description

VLM MCP Server

中文版本

Why This Project?

When using Claude Code with third-party models, they are typically text-only models without image processing capabilities. Adding an MCP server with image processing capability is essential for tasks that require visual understanding.

This project enables users to select their own Vision-Language Model (VLM) for image processing.

Features

  • extract_text_from_image: Extract text from images (OCR)
  • ui_to_artifact: Convert UI screenshots to code, prompts, design specs, or descriptions
  • extract_text_from_screenshot: Extract text from screenshots with code recognition support
  • diagnose_error_screenshot: Analyze error screenshots and diagnose issues
  • understand_technical_diagram: Analyze technical diagrams (architecture, flowcharts, UML, etc.)
  • analyze_data_visualization: Analyze data visualization charts
  • ui_diff_check: UI comparison to detect visual differences
  • analyze_image: General-purpose image analysis

Environment Variables

Variable Required Description
VLM_API_KEY Yes API key
VLM_BASE_URL No Custom API endpoint (default: https://api.openai.com/v1)
VLM_MODEL No Model to use (default: gpt-4o)
VLM_MAX_IMAGE_SIZE No Maximum image size (default: 3MB). Images exceeding this size will be automatically compressed before processing. Supported formats: 3MB, 3M, 3145728 (bytes), 1024KB, etc.

Quick Start

Using uvx (Recommended)

# Copy config template and fill in your API Key
cp .env.example .env
# Edit .env file and fill in VLM_API_KEY

# Run directly (will automatically load .env file)
uvx vlm-mcp

Using pip

# Install
pip install vlm-mcp

# Or install in development mode
pip install -e .

Configure Environment Variables

# OpenAI
export VLM_API_KEY=sk-xxx
export VLM_MODEL=gpt-4o

# Custom API (e.g., Ollama)
export VLM_API_KEY=your-api-key
export VLM_BASE_URL=http://localhost:11434/v1
export VLM_MODEL=qwen2.5-vl

Run the Server

# Run directly
python -m vlm_mcp

# Or use installed command
vlm-mcp

Supported Models

Any VLM model compatible with OpenAI Chat Completions API:

  • gpt-4o
  • gpt-4o-mini
  • gpt-4-turbo
  • qwen2.5-vl series
  • Other OpenAI API compatible models

Claude Code Configuration

1. Configure MCP Server

Add the following to your Claude Code configuration:

{
  "mcpServers": {
    "vlm-mcp": {
      "command": "uvx",
      "args": ["vlm-mcp"],
      "env": {
        "VLM_API_KEY": "your-api-key",
        "VLM_BASE_URL": "https://api.openai.com/v1",
        "VLM_MODEL": "gpt-4o",
        "VLM_MAX_IMAGE_SIZE": "5MB"
      }
    }
  }
}

2. Configure CLAUDE.md

To ensure Claude Code uses MCP tools for reading images instead of the built-in Read tool, add the following to your project or global CLAUDE.md:

## MCP Priority

1. Use mcp tools to read images instead of claude code's read tool.

Usage Examples

In Claude Code:

Please use extract_text_from_image tool to analyze this image /path/to/image.jpg and extract the text.
Please use ui_to_artifact tool to convert this UI screenshot to React code.

Inspired by the approach used by Zhipu AI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vlm_mcp-0.2.0.tar.gz (11.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vlm_mcp-0.2.0-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file vlm_mcp-0.2.0.tar.gz.

File metadata

  • Download URL: vlm_mcp-0.2.0.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vlm_mcp-0.2.0.tar.gz
Algorithm Hash digest
SHA256 e93837601000503915fd9edd2055377bde964d244c356a8f118a7fe31f6c7e36
MD5 175215d470676bca0e390daf68f73cf0
BLAKE2b-256 fe9a3d25ca8c2f341c8d86d0912da44f8d2afa147843d63ab655326e8d2db4dc

See more details on using hashes here.

File details

Details for the file vlm_mcp-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: vlm_mcp-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vlm_mcp-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 38fb81324d2aa4ec297986621e4631d4fc890d111cabc29bfa49cd8ca13fa3b7
MD5 26b7182942c7016234fcffe8769ecd10
BLAKE2b-256 26d548578280c0dc1bad9470d1ec8040fe9e76fa9b7a85a1cbf39540ee027ae3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page