VLM Image Understanding MCP Server - Support OpenAI protocol compatible VLMs
Project description
VLM MCP Server
Why This Project?
When using Claude Code with third-party models, they are typically text-only models without image processing capabilities. Adding an MCP server with image processing capability is essential for tasks that require visual understanding.
This project enables users to select their own Vision-Language Model (VLM) for image processing.
Features
- extract_text_from_image: Extract text from images (OCR)
- ui_to_artifact: Convert UI screenshots to code, prompts, design specs, or descriptions
- extract_text_from_screenshot: Extract text from screenshots with code recognition support
- diagnose_error_screenshot: Analyze error screenshots and diagnose issues
- understand_technical_diagram: Analyze technical diagrams (architecture, flowcharts, UML, etc.)
- analyze_data_visualization: Analyze data visualization charts
- ui_diff_check: UI comparison to detect visual differences
- analyze_image: General-purpose image analysis
Environment Variables
| Variable | Required | Description |
|---|---|---|
VLM_API_KEY |
Yes | API key |
VLM_BASE_URL |
No | Custom API endpoint (default: https://api.openai.com/v1) |
VLM_MODEL |
No | Model to use (default: gpt-4o) |
VLM_MAX_IMAGE_SIZE |
No | Maximum image size (default: 3MB). Images exceeding this size will be automatically compressed before processing. Supported formats: 3MB, 3M, 3145728 (bytes), 1024KB, etc. |
Quick Start
Using uvx (Recommended)
# Copy config template and fill in your API Key
cp .env.example .env
# Edit .env file and fill in VLM_API_KEY
# Run directly (will automatically load .env file)
uvx vlm-mcp
Using pip
# Install
pip install vlm-mcp
# Or install in development mode
pip install -e .
Configure Environment Variables
# OpenAI
export VLM_API_KEY=sk-xxx
export VLM_MODEL=gpt-4o
# Custom API (e.g., Ollama)
export VLM_API_KEY=your-api-key
export VLM_BASE_URL=http://localhost:11434/v1
export VLM_MODEL=qwen2.5-vl
Run the Server
# Run directly
python -m vlm_mcp
# Or use installed command
vlm-mcp
Supported Models
Any VLM model compatible with OpenAI Chat Completions API:
- gpt-4o
- gpt-4o-mini
- gpt-4-turbo
- qwen2.5-vl series
- Other OpenAI API compatible models
Claude Code Configuration
1. Configure MCP Server
Add the following to your Claude Code configuration:
{
"mcpServers": {
"vlm-mcp": {
"command": "uvx",
"args": ["vlm-mcp"],
"env": {
"VLM_API_KEY": "your-api-key",
"VLM_BASE_URL": "https://api.openai.com/v1",
"VLM_MODEL": "gpt-4o",
"VLM_MAX_IMAGE_SIZE": "5MB"
}
}
}
}
2. Configure CLAUDE.md
To ensure Claude Code uses MCP tools for reading images instead of the built-in Read tool, add the following to your project or global CLAUDE.md:
## MCP Priority
1. Use mcp tools to read images instead of claude code's read tool.
Usage Examples
In Claude Code:
Please use extract_text_from_image tool to analyze this image /path/to/image.jpg and extract the text.
Please use ui_to_artifact tool to convert this UI screenshot to React code.
Inspired by the approach used by Zhipu AI.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vlm_mcp-0.2.0.tar.gz.
File metadata
- Download URL: vlm_mcp-0.2.0.tar.gz
- Upload date:
- Size: 11.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e93837601000503915fd9edd2055377bde964d244c356a8f118a7fe31f6c7e36
|
|
| MD5 |
175215d470676bca0e390daf68f73cf0
|
|
| BLAKE2b-256 |
fe9a3d25ca8c2f341c8d86d0912da44f8d2afa147843d63ab655326e8d2db4dc
|
File details
Details for the file vlm_mcp-0.2.0-py3-none-any.whl.
File metadata
- Download URL: vlm_mcp-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
38fb81324d2aa4ec297986621e4631d4fc890d111cabc29bfa49cd8ca13fa3b7
|
|
| MD5 |
26b7182942c7016234fcffe8769ecd10
|
|
| BLAKE2b-256 |
26d548578280c0dc1bad9470d1ec8040fe9e76fa9b7a85a1cbf39540ee027ae3
|