Allow VLMs to call dedicated specialist CV models

Project description

License: MIT

mcp-vision by Groundlight

A Model Context Protocol (MCP) server that exposes HuggingFace computer vision models, such as zero-shot object detectors, as tools, enhancing the vision capabilities of large language models and vision-language models.

This repo is in active development. See below for details of currently available tools.

Installation

Clone the repo:

git clone git@github.com:groundlight/mcp-vision.git

Build a local docker image:

cd mcp-vision
make build-docker

Configuring Claude Desktop

Add this to your claude_desktop_config.json:

If your local environment has access to an NVIDIA GPU:

{
  "mcpServers": {
    "mcp-vision": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--runtime=nvidia", "--gpus", "all", "mcp-vision"],
      "env": {}
    }
  }
}

Or, CPU only:

{
  "mcpServers": {
    "mcp-vision": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp-vision"],
      "env": {}
    }
  }
}

When running on CPU, the default large object detection model may take a long time to load and run inference. Consider setting DEFAULT_OBJDET_MODEL to a smaller model (you can also tell Claude directly to use a specific model).
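Both tools accept an optional hf_model argument, so the model can also be chosen per call. In practice Claude Desktop issues these calls itself; the minimal sketch below only illustrates what such a request looks like, assuming an MCP client speaking the protocol's JSON-RPC tools/call method. The helper build_locate_request, the placeholder image path, and the choice of the smaller google/owlvit-base-patch32 checkpoint are illustrative assumptions, not part of mcp-vision.

```python
import json


def build_locate_request(request_id, image_path, labels, hf_model=None):
    """Build a hypothetical MCP tools/call request for the locate_objects tool."""
    arguments = {"image_path": image_path, "candidate_labels": labels}
    if hf_model is not None:
        # Only sent when overriding the server's default model
        arguments["hf_model"] = hf_model
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "locate_objects", "arguments": arguments},
    }


request = build_locate_request(
    1,
    "/path/to/image.jpg",  # placeholder path
    ["advertising board", "shop sign"],
    hf_model="google/owlvit-base-patch32",  # assumed smaller checkpoint for CPU use
)
print(json.dumps(request, indent=2))
```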

(Beta) It is possible to run the public Docker image directly without building locally; however, the download time may interfere with Claude's loading of the server.

{
  "mcpServers": {
    "mcp-vision": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "--runtime=nvidia", "--gpus", "all", "groundlight/mcp-vision:latest"],
      "env": {}
    }
  }
}
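If claude_desktop_config.json already lists other servers, the mcp-vision entry has to be merged in rather than pasted over the file. A minimal Python sketch of that merge, assuming the config has already been loaded into a dict (reading and writing the file, whose location varies by operating system, is left out). add_mcp_vision_server is a hypothetical helper; its argument lists simply mirror the CPU and GPU examples above.

```python
import json


def add_mcp_vision_server(config, gpu=False, image="mcp-vision"):
    """Return a copy of a Claude Desktop config dict with an mcp-vision entry added."""
    args = ["run", "-i", "--rm"]
    if gpu:
        # Same GPU flags as the NVIDIA example above
        args += ["--runtime=nvidia", "--gpus", "all"]
    args.append(image)
    updated = dict(config)
    # Copy the server map so the caller's dict is not mutated
    servers = dict(updated.get("mcpServers", {}))
    servers["mcp-vision"] = {"command": "docker", "args": args, "env": {}}
    updated["mcpServers"] = servers
    return updated


existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
print(json.dumps(add_mcp_vision_server(existing, gpu=True), indent=2))
```

Writing the result back with json.dump keeps the file valid JSON, which avoids hand-editing mistakes such as trailing commas.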

Tools

The following tools are currently available through the mcp-vision server:

  1. locate_objects
  • Description: Detect and locate objects in an image using one of the zero-shot object detection pipelines available through HuggingFace (see the list at https://huggingface.co/models?pipeline_tag=zero-shot-object-detection&sort=trending).
  • Input: image_path (string): URL or file path; candidate_labels (list of strings): possible objects to detect; hf_model (optional string): defaults to "google/owlvit-large-patch14", which could be slow on a non-GPU machine
  • Returns: List of dicts in HF object-detection format
  2. zoom_to_object
  • Description: Zoom into an object in the image so that it can be analyzed more closely. The image is cropped to the object's bounding box and the cropped image is returned. If several matching objects are present, the one with the highest object score is returned.
  • Input: image_path (string): URL or file path; label (string): label of the object to find, zoom into, and crop to; hf_model (optional): defaults to "google/owlvit-large-patch14", which could be slow on a non-GPU machine
  • Returns: MCPImage or None
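The selection step described for zoom_to_object can be illustrated in plain Python. This is a minimal sketch, not mcp-vision's actual code: it assumes detections in the HuggingFace zero-shot object-detection output format (dicts with score, label, and a box holding xmin, ymin, xmax, ymax in pixels), keeps the highest-scoring match for the requested label, and returns a (left, upper, right, lower) tuple of the kind Pillow's Image.crop accepts. The function best_crop_box and the sample detections are hypothetical.

```python
from typing import Optional

# HF format: {"score": float, "label": str, "box": {"xmin", "ymin", "xmax", "ymax"}}
Detection = dict


def best_crop_box(detections: list[Detection], label: str) -> Optional[tuple[int, int, int, int]]:
    """Return the box of the highest-scoring detection with the given label, or None."""
    matches = [d for d in detections if d["label"] == label]
    if not matches:
        # No match: return None (assumed analogous to the tool's None return)
        return None
    best = max(matches, key=lambda d: d["score"])
    box = best["box"]
    return (box["xmin"], box["ymin"], box["xmax"], box["ymax"])


detections = [
    {"score": 0.31, "label": "sign", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 80}},
    {"score": 0.87, "label": "sign", "box": {"xmin": 200, "ymin": 40, "xmax": 360, "ymax": 120}},
    {"score": 0.55, "label": "person", "box": {"xmin": 50, "ymin": 60, "xmax": 90, "ymax": 200}},
]
print(best_crop_box(detections, "sign"))  # (200, 40, 360, 120)
```

With Pillow, passing a non-None result to image.crop(...) would then yield the zoomed region.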

Example in blog post and video

Run Claude Desktop with Claude 3.7 Sonnet and mcp-vision configured as an MCP server in claude_desktop_config.json.

The prompt used in the example video and blog post was:

From the information on that advertising board, what is the type of this shop?
Options:
The shop is a yoga studio.
The shop is a cafe.
The shop is a seven-eleven.
The shop is a milk tea shop.

The image is the first image in the V*Bench/GPT4V-hard dataset and can be found here: https://huggingface.co/datasets/craigwu/vstar_bench/blob/main/GPT4V-hard/0.JPG (use the download link).
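On the HuggingFace Hub, a /blob/ URL opens a web page that previews the file, while the matching /resolve/ URL returns the raw file, which is what "the download link" refers to. A minimal sketch of the conversion; the helper name to_download_url is hypothetical.

```python
def to_download_url(url: str) -> str:
    """Convert a HuggingFace Hub file-viewer URL (/blob/) into a direct download URL (/resolve/)."""
    # Replace only the first "/blob/" segment, which precedes the revision and file path
    return url.replace("/blob/", "/resolve/", 1)


viewer = "https://huggingface.co/datasets/craigwu/vstar_bench/blob/main/GPT4V-hard/0.JPG"
print(to_download_url(viewer))
# https://huggingface.co/datasets/craigwu/vstar_bench/resolve/main/GPT4V-hard/0.JPG
```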

Note:

  • If you upload the image directly into the conversation with Claude instead of providing a download link, it will not be able to call the tools and will attempt to answer directly.
  • On accounts with web search enabled, Claude appears to prefer web search over local MCP tools. Disable web search for best results.

Development

Run locally using the uv package manager:

uv sync
uv run python mcp_vision

Build the Docker image locally:

make build-docker

Run the Docker image locally:

make run-docker-cpu

or

make run-docker-gpu

[Groundlight Internal] Push the Docker image to Docker Hub (requires Docker Hub credentials):

make push-docker

Troubleshooting

If Claude Desktop is failing to connect to mcp-vision:

  • Check that the configuration is correct (CPU vs. GPU)
  • Developer options may need to be enabled in Claude Desktop
  • Depending on the size of the model(s) used, allow a few minutes for them to download from HuggingFace the first time you open Claude Desktop. Once they are downloaded, the server will respond and Claude will connect.
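For the first check, a short script can report which variant a config file contains. A minimal sketch, assuming the config JSON has already been read into a string (the file's location varies by operating system and is not covered here). detect_mode is a hypothetical helper that classifies the mcp-vision entry as GPU, CPU, or missing based on the --gpus flag used in the GPU configuration examples above.

```python
import json


def detect_mode(config_text: str) -> str:
    """Classify the mcp-vision entry in a Claude Desktop config as 'gpu', 'cpu', or 'missing'."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on malformed JSON
    entry = config.get("mcpServers", {}).get("mcp-vision")
    if entry is None:
        return "missing"
    # The GPU examples pass "--gpus all" to docker run; the CPU example does not
    return "gpu" if "--gpus" in entry.get("args", []) else "cpu"


sample = '{"mcpServers": {"mcp-vision": {"command": "docker", "args": ["run", "-i", "--rm", "mcp-vision"], "env": {}}}}'
print(detect_mode(sample))  # cpu
```

A JSONDecodeError here is itself a useful signal: Claude Desktop cannot load a config file that is not valid JSON.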


TODO

  • Host best models online instead of requiring local download
  • Add more tools


Download files

Source Distribution

iflow_mcp_groundlight_mcp_vision-0.1.2.tar.gz (3.8 MB)

Built Distribution

iflow_mcp_groundlight_mcp_vision-0.1.2-py3-none-any.whl (7.9 kB)

File details

Details for the file iflow_mcp_groundlight_mcp_vision-0.1.2.tar.gz.

File metadata

  • Download URL: iflow_mcp_groundlight_mcp_vision-0.1.2.tar.gz
  • Upload date:
  • Size: 3.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2

File hashes

Hashes for iflow_mcp_groundlight_mcp_vision-0.1.2.tar.gz
Algorithm Hash digest
SHA256 0b8e7100c9d8d829aca96b2fdc391728c207b51f7e54c755ad4b449f8dc0ef40
MD5 f2fce86483315c35554aa6440475828a
BLAKE2b-256 e1db615b98b4a6c55f16247a3bfb118b09cf240e3dc4c018180840eee577a5c4

File details

Details for the file iflow_mcp_groundlight_mcp_vision-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: iflow_mcp_groundlight_mcp_vision-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.2

File hashes

Hashes for iflow_mcp_groundlight_mcp_vision-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4d4ea476ce5f1fc6fb51a224806779f17f1f77b562f50eebb129daf9a3590d19
MD5 0947681c483afe371b047cb15c27ba39
BLAKE2b-256 306953fba70ddfe051b6dd7919f449ae3940bb69f00fc14cc3de7e2fe7e028d4

