VLM Image Understanding MCP Server - Support OpenAI protocol compatible VLMs

These details have not been verified by PyPI

Project links

Project description

VLM MCP Server

Why This Project?

When using Claude Code with third-party models, they are typically text-only models without image processing capabilities. Adding an MCP server with image processing capability is essential for tasks that require visual understanding.

This project enables users to select their own Vision-Language Model (VLM) for image processing.

Features

extract_text_from_image: Extract text from images (OCR)
ui_to_artifact: Convert UI screenshots to code, prompts, design specs, or descriptions
extract_text_from_screenshot: Extract text from screenshots with code recognition support
diagnose_error_screenshot: Analyze error screenshots and diagnose issues
understand_technical_diagram: Analyze technical diagrams (architecture, flowcharts, UML, etc.)
analyze_data_visualization: Analyze data visualization charts
ui_diff_check: UI comparison to detect visual differences
analyze_image: General-purpose image analysis

Environment Variables

Variable	Required	Description
`VLM_API_KEY`	Yes	API key
`VLM_BASE_URL`	No	Custom API endpoint (default: https://api.openai.com/v1)
`VLM_MODEL`	No	Model to use (default: gpt-4o)
`VLM_MAX_IMAGE_SIZE`	No	Maximum image size (default: 3MB). Images exceeding this size will be automatically compressed before processing. Supported formats: `3MB`, `3M`, `3145728` (bytes), `1024KB`, etc.

Quick Start

Using uvx (Recommended)

# Copy config template and fill in your API Key
cp .env.example .env
# Edit .env file and fill in VLM_API_KEY

# Run directly (will automatically load .env file)
uvx vlm-mcp

Using pip

# Install
pip install vlm-mcp

# Or install in development mode
pip install -e .

Configure Environment Variables

# OpenAI
export VLM_API_KEY=sk-xxx
export VLM_MODEL=gpt-4o

# Custom API (e.g., Ollama)
export VLM_API_KEY=your-api-key
export VLM_BASE_URL=http://localhost:11434/v1
export VLM_MODEL=qwen2.5-vl

Run the Server

# Run directly
python -m vlm_mcp

# Or use installed command
vlm-mcp

Supported Models

Any VLM model compatible with OpenAI Chat Completions API:

gpt-4o
gpt-4o-mini
gpt-4-turbo
qwen2.5-vl series
Other OpenAI API compatible models

Claude Code Configuration

1. Configure MCP Server

Add the following to your Claude Code configuration:

{
  "mcpServers": {
    "vlm-mcp": {
      "command": "uvx",
      "args": ["vlm-mcp"],
      "env": {
        "VLM_API_KEY": "your-api-key",
        "VLM_BASE_URL": "https://api.openai.com/v1",
        "VLM_MODEL": "gpt-4o",
        "VLM_MAX_IMAGE_SIZE": "5MB"
      }
    }
  }
}

2. Configure CLAUDE.md

To ensure Claude Code uses MCP tools for reading images instead of the built-in Read tool, add the following to your project or global CLAUDE.md:

## MCP Priority

1. Use mcp tools to read images instead of claude code's read tool.

Usage Examples

In Claude Code:

Please use extract_text_from_image tool to analyze this image /path/to/image.jpg and extract the text.

Please use ui_to_artifact tool to convert this UI screenshot to React code.

Inspired by the approach used by Zhipu AI.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Mar 11, 2026

0.1.0

Mar 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vlm_mcp-0.2.0.tar.gz (11.6 kB view details)

Uploaded Mar 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

vlm_mcp-0.2.0-py3-none-any.whl (10.3 kB view details)

Uploaded Mar 11, 2026 Python 3

File details

Details for the file vlm_mcp-0.2.0.tar.gz.

File metadata

Download URL: vlm_mcp-0.2.0.tar.gz
Upload date: Mar 11, 2026
Size: 11.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vlm_mcp-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`e93837601000503915fd9edd2055377bde964d244c356a8f118a7fe31f6c7e36`
MD5	`175215d470676bca0e390daf68f73cf0`
BLAKE2b-256	`fe9a3d25ca8c2f341c8d86d0912da44f8d2afa147843d63ab655326e8d2db4dc`

See more details on using hashes here.

File details

Details for the file vlm_mcp-0.2.0-py3-none-any.whl.

File metadata

Download URL: vlm_mcp-0.2.0-py3-none-any.whl
Upload date: Mar 11, 2026
Size: 10.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vlm_mcp-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`38fb81324d2aa4ec297986621e4631d4fc890d111cabc29bfa49cd8ca13fa3b7`
MD5	`26b7182942c7016234fcffe8769ecd10`
BLAKE2b-256	`26d548578280c0dc1bad9470d1ec8040fe9e76fa9b7a85a1cbf39540ee027ae3`

See more details on using hashes here.

vlm-mcp 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

VLM MCP Server

Why This Project?

Features

Environment Variables

Quick Start

Using uvx (Recommended)

Using pip

Configure Environment Variables

Run the Server

Supported Models

Claude Code Configuration

1. Configure MCP Server

2. Configure CLAUDE.md

Usage Examples

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes