Model-agnostic image analysis via MCP for OpenCode text-only models. Vision proxy server using Google Gemini + tesseract.
Project description
opencode-vision 👁️
Model-agnostic image analysis via MCP for OpenCode.
Give vision capabilities to any text-only model: big-pickle, DeepSeek, MiMo, MiniMax, or any other model that can't process images natively.
pip install opencode-vision
The Problem
OpenCode supports many models, but most open-weight and free models are
text-only. When you paste an image or try to read() one, you get:
ERROR: Cannot read image (this model does not support image input).
This is not a configuration issue; it's a fundamental limitation of the model architecture. Text-only models have no visual neurons.
The Solution
opencode-vision is an MCP server that acts as a "guide dog" for text-only
models. It runs as an independent process and handles image analysis via:
- Google Gemini Vision API (FREE tier, 1,500 requests/day)
- Local tesseract OCR (fast, private, works offline)
It returns plain text descriptions that any model can understand, with no vision capabilities needed on the model's side.
                     ┌─────────────────────┐
                     │   opencode-vision   │
[big-pickle] ───────►│   MCP Server        │──────► Google Gemini API
[DeepSeek]   ───────►│   (Python process)  │──────► tesseract OCR (local)
[MiMo]       ───────►│   Returns TEXT only │
                     └─────────────────────┘
Quick Start
1. Install
pip install opencode-vision
2. Get a Gemini API key
Get a free key at aistudio.google.com (1,500 requests/day, no credit card required).
Set it in ~/.config/opencode/.env:
echo 'GOOGLE_API_KEY=your_key_here' >> ~/.config/opencode/.env
Or export it directly:
export GOOGLE_API_KEY=your_key_here
3. Add to OpenCode config
Add this to ~/.config/opencode/opencode.json:
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"vision": {
"type": "local",
"command": ["python3", "-m", "opencode_vision.server"],
"enabled": true,
"timeout": 30000
}
}
}
4. Restart OpenCode
Start a new session. The vision_describe, vision_ocr, and vision_analyze
tools will be available to all models, even text-only ones.
5. Ask about images
User: What's in this image?
Model: [calls vision_describe("/path/to/image.png")]
"A dark gradient banner with 'Nicolás Ríos Herrera'..."
Tools
| Tool | Description | When to use |
|---|---|---|
| `vision_describe(path, prompt?)` | Describe an image in detail | "What does this show?" |
| `vision_ocr(path)` | Extract all visible text | "What text is in this screenshot?" |
| `vision_analyze(path)` | Metadata + description + OCR | Comprehensive understanding |
Requirements
| Component | Required? | Notes |
|---|---|---|
| Python >= 3.10 | ✅ Required | |
| `GOOGLE_API_KEY` | ✅ Required | Get free at aistudio.google.com |
| `pillow` | 📦 Recommended | `pip install pillow` for metadata + auto-resize |
| `tesseract-ocr` | 🔧 Recommended | For local OCR. `apt install tesseract-ocr` or `brew install tesseract` |
The server auto-detects the API key from (in order):
1. `GOOGLE_API_KEY` environment variable
2. `GOOGLE_GENERATIVE_AI_API_KEY` environment variable
3. `~/.config/opencode/.env` file
4. `~/.env` file
5. `$PWD/.env` file
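That lookup order can be sketched in a few lines. This is a minimal illustration of the documented precedence, not the server's actual implementation; the real code may parse `.env` files more robustly.

```python
import os
from pathlib import Path

def find_api_key():
    """Sketch of the documented key lookup order (env vars, then .env files)."""
    # 1-2. Environment variables, highest precedence first.
    for var in ("GOOGLE_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"):
        if os.environ.get(var):
            return os.environ[var]
    # 3-5. Dotenv files, first match wins.
    for env_file in (Path.home() / ".config/opencode/.env",
                     Path.home() / ".env",
                     Path.cwd() / ".env"):
        if env_file.is_file():
            for line in env_file.read_text().splitlines():
                if line.startswith("GOOGLE_API_KEY="):
                    return line.split("=", 1)[1].strip()
    return None
```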
CLI Usage (without OpenCode)
# Start MCP server (for OpenCode integration)
opencode-vision
# Direct analysis
opencode-vision describe ~/screenshot.png
opencode-vision ocr ~/scanned-document.png
opencode-vision analyze ~/photo.jpg
# Custom prompt
opencode-vision describe ~/chart.png "What are the values in this chart?"
Architecture
Why Python?
All existing MCP vision servers for OpenCode are Node.js/TypeScript and
require npm install or npx. opencode-vision is pure Python because:
- Python is already installed on every developer machine
- `pillow` (PIL) is the standard image processing library
- tesseract has first-class Python bindings
- Zero `node_modules`, zero `npm`, zero `npx`
- The MCP protocol is simple JSON-RPC over stdio, so no framework is needed
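To make the "no framework needed" point concrete, here is a minimal sketch of a JSON-RPC-over-stdio loop. It handles only a `tools/list` request and omits the initialization handshake, capability negotiation, and error handling the real MCP protocol requires, so treat it as an illustration of the transport, not a working MCP server.

```python
import json
import sys

# Toy request handler: only knows how to list this server's three tools.
def handle(request):
    if request.get("method") == "tools/list":
        return {"tools": [{"name": "vision_describe"},
                          {"name": "vision_ocr"},
                          {"name": "vision_analyze"}]}
    return {"error": "unknown method"}

# Read newline-delimited JSON-RPC requests, write one response per line.
def serve(stream_in=sys.stdin, stream_out=sys.stdout):
    for line in stream_in:
        req = json.loads(line)
        resp = {"jsonrpc": "2.0", "id": req.get("id"), "result": handle(req)}
        stream_out.write(json.dumps(resp) + "\n")
        stream_out.flush()
```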
Hybrid OCR Strategy
                    ┌──────────────┐
User image ────────►│  tesseract   │────► Text found? ────► Return
                    │  (local,     │
                    │  private,    │
                    │  offline)    │
                    └──────┬───────┘
                           │ No text / failed
                           ▼
                    ┌──────────────┐
                    │  Gemini      │────► Return
                    │  Vision API  │
                    │  (FREE tier) │
                    └──────────────┘
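The fallback logic in the diagram boils down to a few lines. In this sketch, `run_tesseract` and `call_gemini_vision` are hypothetical helpers standing in for the real implementations, passed in as parameters to keep the example self-contained.

```python
# Sketch of the tesseract-first, Gemini-fallback OCR strategy.
def ocr_with_fallback(image_path, run_tesseract, call_gemini_vision):
    try:
        text = run_tesseract(image_path)   # local, private, offline
    except OSError:                        # tesseract missing or crashed
        text = ""
    if text.strip():
        return text                        # local text found: done, no network
    return call_gemini_vision(image_path)  # otherwise fall back to the API
```

Keeping tesseract first means most screenshots of text never leave the machine; the API is only hit when local OCR finds nothing.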
Cost: $0
- Gemini 2.5 Flash: 1,500 free requests/day via Google AI Studio API key
- tesseract: free and local for OCR
- Pillow: free and local for metadata
- No OpenCode Go credits consumed: the API call happens in the vision server, not through OpenCode's model proxy
Comparison with Alternatives
| Feature | opencode-vision | opencode-minimax-easy-vision | qwen-vision-mcp | opencode-image-proxy |
|---|---|---|---|---|
| Runtime | Python (stdlib) | Node.js + npm | Node.js + npm | Node.js |
| Dependencies | `pip install opencode-vision` | `npm install` + MiniMax API key | `npx` + Ollama (6GB RAM) | `npx` |
| OCR | Local tesseract + Gemini fallback | None (API only) | None (API only) | None (API only) |
| API cost | $0 (Gemini FREE tier) | MiniMax pricing | $0 (local) or Ollama Cloud | OpenCode credits |
| Auto .env | ✅ Reads `~/.config/opencode/.env` | ❌ Manual env vars | ❌ Manual env vars | ❌ Manual config |
| Image resize | ✅ Pillow auto-resize | ❌ | ❌ | ❌ |
| Install size | ~200 KB (pure Python) | ~30 MB (node_modules) | ~30 MB + ~6GB (Ollama model) | ~30 MB |
Why "Model-Agnostic"?
The key architectural insight: the model never needs to see pixels. The MCP server does all the visual processing externally and returns text. This means:
- Works with any text-only model (big-pickle, DeepSeek, MiMo, MiniMax, GLM, etc.)
- Works with any multimodal model too (it doesn't interfere)
- No model-specific configuration
- No provider-specific setup
- The model can be changed at any time without reconfiguring vision
License
MIT
Built with ❤️ by Nicolás Ríos Herrera for the OpenCode community.
File details
Details for the file opencode_vision-1.0.0.tar.gz.
File metadata
- Download URL: opencode_vision-1.0.0.tar.gz
- Upload date:
- Size: 14.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8ce0e49333e7e4e0356a8fe192a30b129547ee7c14b2cff3144dfe91c68dcef3` |
| MD5 | `423f379e0846c13430b7c560c36a0227` |
| BLAKE2b-256 | `3ed93a8caf9152e1a8ecf05f9b43767d095af0418a1483ce92f90dcc50317a05` |
File details
Details for the file opencode_vision-1.0.0-py3-none-any.whl.
File metadata
- Download URL: opencode_vision-1.0.0-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a45f20dbd9918335b579dc19efebfb85a59efa50fcc2edfd11bbea2f96ef67b9` |
| MD5 | `6a715cddabd3698324c93c44216bba6b` |
| BLAKE2b-256 | `bde4fe0dda104af80f71b614b9b20bfcfdc9fdd7e3743a7725f9255c62ad6c23` |