Universal vision tools for AI agents via Model Context Protocol
agent-vision-mcp
Give MCP-compatible AI agents image analysis, metadata inspection, cropping, OCR, and image comparison through any OpenAI-compatible vision model.
Features
- Analyze screenshots, charts, documents, UI, objects, and general images.
- Inspect image dimensions and metadata without calling a model.
- Crop and zoom into regions using normalized coordinates.
- Extract visible text with a VLM or an optional dedicated OCR model.
- Compare two to four images.
- Accept public URLs, local files, data URLs, and Base64 images.
- Run locally over the standard MCP stdio transport.
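The four accepted input kinds can be told apart with a few standard-library checks. The sketch below is only an illustration of that idea: the function name `classify_image_input`, the returned labels, and the magic-byte list are assumptions, not the server's actual code or API.

```python
import base64
from urllib.parse import urlparse

# Leading bytes of a few common image formats (PNG, JPEG, GIF).
# This list is an assumption for the sketch, not the server's supported set.
IMAGE_SIGNATURES = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"GIF8")


def classify_image_input(value: str) -> str:
    """Return 'data_url', 'url', 'base64', or 'file' for an image reference."""
    text = value.strip()
    if text.startswith("data:"):
        return "data_url"
    if urlparse(text).scheme.lower() in ("http", "https"):
        return "url"
    try:
        # validate=True rejects characters outside the Base64 alphabet,
        # which rules out most file paths (for example anything with a dot).
        decoded = base64.b64decode(text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "file"
    # A short path such as "/tmp/abc" is also valid Base64, so only treat the
    # value as Base64 when the decoded bytes look like an image.
    if decoded.startswith(IMAGE_SIGNATURES):
        return "base64"
    return "file"
```

Checking the decoded bytes against known image signatures is a design choice made here to avoid misreading short paths as Base64; the real server may resolve that ambiguity differently.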
Claude Code
Requirements
- Python 3.10 or newer
- uv
- An OpenAI-compatible vision API endpoint and API key
uvx downloads the published package from PyPI into an isolated environment
and runs it. It does not use the source code in your current directory and
does not permanently install the package into your system Python.
Add To Claude Code
The command below configures Claude Code to start agent-vision-mcp from PyPI:
claude mcp add --scope user agent-vision \
--env UV_DEFAULT_INDEX=https://pypi.org/simple \
VISION_API_KEY="your-api-key" \
VISION_BASE_URL="https://your-provider.example/v1" \
VISION_MODEL_ID="your-vision-model" \
-- uvx agent-vision-mcp
Use UV_DEFAULT_INDEX=https://pypi.org/simple when your local PyPI mirror has
not synchronized the latest release.
Verify the connection:
claude mcp get agent-vision
claude mcp list
Then start Claude Code and ask:
Use vision_capabilities to show the available vision tools.
Analyze a local image:
Use vision_inspect on /data/example.png, then use vision_analyze to describe it.
By default, local image access is limited to /data and /tmp. Add another
directory with:
claude mcp remove --scope user agent-vision
claude mcp add --scope user agent-vision \
--env UV_DEFAULT_INDEX=https://pypi.org/simple \
VISION_API_KEY="your-api-key" \
VISION_BASE_URL="https://your-provider.example/v1" \
VISION_MODEL_ID="your-vision-model" \
VISION_ALLOWED_PATHS="/data,/tmp,/home/your-user/Pictures" \
-- uvx agent-vision-mcp
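To make the effect of VISION_ALLOWED_PATHS concrete, here is a minimal sketch of how a comma-separated allow-list like the one above could be enforced. The helper names `parse_allowed_paths` and `is_path_allowed` are hypothetical, and the server's real check may differ; the sketch assumes POSIX-style paths.

```python
from pathlib import Path


def parse_allowed_paths(raw: str) -> list[Path]:
    """Split a comma-separated value such as "/data,/tmp" into resolved roots."""
    return [Path(part.strip()).resolve() for part in raw.split(",") if part.strip()]


def is_path_allowed(candidate: str, allowed: list[Path]) -> bool:
    """Return True when the candidate resolves to a location under an allowed root."""
    # resolve() collapses ".." segments and follows symlinks, so a path like
    # /data/../etc/passwd is compared as /etc/passwd.
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(root) for root in allowed)


allowed_roots = parse_allowed_paths("/data,/tmp")
print(is_path_allowed("/data/example.png", allowed_roots))
print(is_path_allowed("/data/../etc/passwd", allowed_roots))
```

Comparing resolved paths rather than raw strings also keeps a sibling directory such as /database from matching the /data prefix.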
Dedicated OCR Model
Without dedicated OCR configuration, vision_extract_text uses the configured
vision model. To use a separate OCR model:
claude mcp add --scope user agent-vision \
--env UV_DEFAULT_INDEX=https://pypi.org/simple \
VISION_API_KEY="your-vision-api-key" \
VISION_BASE_URL="https://your-provider.example/v1" \
VISION_MODEL_ID="your-vision-model" \
OCR_ENABLED=true \
OCR_API_KEY="your-ocr-api-key" \
OCR_BASE_URL="https://your-provider.example/v1" \
OCR_MODEL_ID="your-ocr-model" \
-- uvx agent-vision-mcp
Never commit real API keys to Git.
Other MCP Clients
Use this stdio configuration with MCP clients that accept JSON configuration:
{
"mcpServers": {
"agent-vision": {
"command": "uvx",
"args": ["agent-vision-mcp"],
"env": {
"UV_DEFAULT_INDEX": "https://pypi.org/simple",
"VISION_API_KEY": "your-api-key",
"VISION_BASE_URL": "https://your-provider.example/v1",
"VISION_MODEL_ID": "your-vision-model"
}
}
}
}
Tools
| Tool | Purpose |
|---|---|
| vision_analyze | Analyze an image with task-specific prompts |
| vision_inspect | Read image dimensions, format, size, and mode |
| vision_crop_analyze | Crop and analyze a normalized image region |
| vision_extract_text | Extract visible text using OCR or the VLM |
| vision_compare | Compare two to four images |
| vision_capabilities | Show server configuration and limits |
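A normalized region expresses the crop as fractions of the image size, so the same coordinates work at any resolution. The sketch below shows one way such a region could map to pixels. It assumes the region is given as (x0, y0, x1, y1) with each value in [0, 1]; the actual parameter names and layout used by vision_crop_analyze are not documented here, and `normalized_to_pixel_box` is a hypothetical helper.

```python
import math


def normalized_to_pixel_box(
    width: int, height: int, x0: float, y0: float, x1: float, y1: float
) -> tuple[int, int, int, int]:
    """Convert a normalized region into a (left, top, right, bottom) pixel box."""
    for value in (x0, y0, x1, y1):
        if not 0.0 <= value <= 1.0:
            raise ValueError("normalized coordinates must lie in [0, 1]")
    if x1 <= x0 or y1 <= y0:
        raise ValueError("region must have positive width and height")
    # Round outward so the pixel box never loses part of the requested region,
    # then clamp to the image bounds.
    left = math.floor(x0 * width)
    top = math.floor(y0 * height)
    right = min(width, math.ceil(x1 * width))
    bottom = min(height, math.ceil(y1 * height))
    return left, top, right, bottom


# Top-right quarter of a 1920x1080 screenshot.
print(normalized_to_pixel_box(1920, 1080, 0.5, 0.0, 1.0, 0.5))  # (960, 0, 1920, 540)
```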
URL Handling
VISION_URL_MODE controls remote-image handling:
- auto passes URLs through for analysis and comparison, but downloads them when inspection, cropping, or OCR requires image bytes.
- passthrough prefers URL passthrough, except for tools that require bytes.
- download always downloads and verifies remote images before model calls.
Downloads are streamed with byte limits, redirects are security checked, and downloaded or encoded inputs are verified as supported images.
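The mode rules and the byte-limited streaming described above can be sketched as two small functions. This is a rough illustration under stated assumptions, not the server's implementation: the set of tools that need image bytes is inferred from the description (inspection, cropping, OCR), `should_download` and `read_with_limit` are hypothetical names, and auto and passthrough are treated identically because the text describes the same outcome for both.

```python
from typing import Iterable

# Assumption: these are the tools that need raw image bytes.
BYTES_REQUIRED_TOOLS = frozenset(
    {"vision_inspect", "vision_crop_analyze", "vision_extract_text"}
)


def should_download(mode: str, tool: str) -> bool:
    """Decide whether a remote image is downloaded before the tool runs."""
    if mode == "download":
        return True
    if mode in ("auto", "passthrough"):
        return tool in BYTES_REQUIRED_TOOLS
    raise ValueError(f"unknown VISION_URL_MODE: {mode!r}")


def read_with_limit(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Accumulate streamed chunks, aborting once the byte limit is exceeded."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            raise ValueError("image exceeds the configured byte limit")
    return bytes(buffer)


print(should_download("auto", "vision_analyze"))   # False
print(should_download("auto", "vision_inspect"))   # True
print(should_download("download", "vision_compare"))  # True
```

Checking the limit while chunks arrive, rather than after the full body is read, means an oversized response is rejected without holding all of it in memory.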
Troubleshooting
If Claude Code cannot find the PyPI package:
UV_DEFAULT_INDEX=https://pypi.org/simple uvx --refresh agent-vision-mcp
If the MCP server does not connect:
claude mcp get agent-vision
uvx agent-vision-mcp
If you change the Claude Code configuration:
claude mcp remove --scope user agent-vision
Then add it again with the updated values.
Development
git clone https://github.com/idealizing/agent-vision-mcp.git
cd agent-vision-mcp
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
cp .env.example .env
.venv/bin/python -m unittest discover -s tests -v
License
MIT