Model-agnostic image analysis via MCP for OpenCode text-only models. Vision proxy server using Google Gemini + tesseract.


opencode-vision 👁️

Model-agnostic image analysis via MCP for OpenCode.

Give vision capabilities to any text-only model: big-pickle, DeepSeek, MiMo, MiniMax, or any other model that can't process images natively.

pip install opencode-vision

The Problem

OpenCode supports many models, but most open-weight and free models are text-only. When you paste an image or try to read() one, you get:

ERROR: Cannot read image (this model does not support image input).

This is not a configuration issue; it's a fundamental limitation of the model architecture. A text-only model simply has no pathway for visual input.

The Solution

opencode-vision is an MCP server that acts as a "guide dog" for text-only models. It runs as an independent process and handles image analysis via:

  1. Google Gemini Vision API (FREE tier, 1,500 requests/day)
  2. Local tesseract OCR (fast, private, works offline)

It returns plain text descriptions that any model can understand; no vision capabilities are needed on the model's side.

                    ┌──────────────────────┐
                    │  opencode-vision     │
  [big-pickle] ────►│  MCP Server          │────► Google Gemini API
  [DeepSeek]   ────►│  (Python process)    │────► tesseract OCR (local)
  [MiMo]       ────►│  Returns TEXT only   │
                    └──────────────────────┘

Quick Start

1. Install

pip install opencode-vision

2. Get a Gemini API key

Get a free key at aistudio.google.com (1,500 requests/day, no credit card required).

Set it in ~/.config/opencode/.env:

echo 'GOOGLE_API_KEY=your_key_here' >> ~/.config/opencode/.env

Or export it directly:

export GOOGLE_API_KEY=your_key_here

3. Add to OpenCode config

Add this to ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "vision": {
      "type": "local",
      "command": ["python3", "-m", "opencode_vision.server"],
      "enabled": true,
      "timeout": 30000
    }
  }
}

4. Restart OpenCode

Start a new session. The vision_describe, vision_ocr, and vision_analyze tools will be available to all models โ€” even text-only ones.

5. Ask about images

User: What's in this image?
Model: [calls vision_describe("/path/to/image.png")]
       "A dark gradient banner with 'Nicolรกs Rรญos Herrera'..."

Tools

Tool                            Description                    When to use
vision_describe(path, prompt?)  Describe an image in detail    "What does this show?"
vision_ocr(path)                Extract all visible text       "What text is in this screenshot?"
vision_analyze(path)            Metadata + description + OCR   Comprehensive understanding

Requirements

Component        Required?       Notes
Python >= 3.10   ✅ Required
GOOGLE_API_KEY   ✅ Required     Get a free key at aistudio.google.com
pillow           📦 Recommended  pip install pillow (metadata + auto-resize)
tesseract-ocr    🔧 Recommended  Local OCR: apt install tesseract-ocr or brew install tesseract
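The auto-resize that pillow enables can be sketched in a few lines. This is an illustration of the general technique, not the package's actual code; the function name and size limit are assumptions:

```python
from io import BytesIO

from PIL import Image

MAX_DIM = 2048  # illustrative cap; oversized images waste API bandwidth


def resize_for_api(path: str) -> bytes:
    """Downscale so the longest side fits MAX_DIM; return PNG bytes."""
    with Image.open(path) as img:
        img.thumbnail((MAX_DIM, MAX_DIM))  # in-place, preserves aspect ratio
        buf = BytesIO()
        img.save(buf, format="PNG")
        return buf.getvalue()
```

`Image.thumbnail` never upscales, so small screenshots pass through untouched.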

The server auto-detects the API key from (in order):

  1. GOOGLE_API_KEY environment variable
  2. GOOGLE_GENERATIVE_AI_API_KEY environment variable
  3. ~/.config/opencode/.env file
  4. ~/.env file
  5. $PWD/.env file

CLI Usage (without OpenCode)

# Start MCP server (for OpenCode integration)
opencode-vision

# Direct analysis
opencode-vision describe ~/screenshot.png
opencode-vision ocr ~/scanned-document.png
opencode-vision analyze ~/photo.jpg

# Custom prompt
opencode-vision describe ~/chart.png "What are the values in this chart?"

Architecture

Why Python?

The existing MCP vision servers for OpenCode are Node.js/TypeScript and require npm install or npx. opencode-vision is pure Python because:

  • Python is already installed on virtually every developer machine
  • pillow (PIL) is the standard Python image-processing library
  • tesseract has first-class Python bindings
  • Zero node_modules, zero npm, zero npx
  • The MCP protocol is simple JSON-RPC over stdio; no framework needed
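That last point is easy to demonstrate. A toy newline-delimited JSON-RPC loop over stdio looks like this; it is a minimal sketch of the transport, not the server's actual implementation, and it omits MCP's initialization handshake and tool schema:

```python
import json
import sys


def handle(method: str, params: dict) -> dict:
    """Dispatch a request; a real server would route to vision tools here."""
    if method == "ping":
        return {"ok": True}
    raise ValueError(f"unknown method: {method}")


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON-RPC request per line, write one response per line."""
    for line in stdin:
        req = json.loads(line)
        try:
            resp = {"jsonrpc": "2.0", "id": req["id"],
                    "result": handle(req["method"], req.get("params", {}))}
        except Exception as exc:
            resp = {"jsonrpc": "2.0", "id": req.get("id"),
                    "error": {"code": -32601, "message": str(exc)}}
        stdout.write(json.dumps(resp) + "\n")
        stdout.flush()
```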

Hybrid OCR Strategy

                    ┌──────────────┐
  User image ──────►│ tesseract    │───► Text found? ──► Return
                    │ (local,      │
                    │  private,    │
                    │  offline)    │
                    └──────┬───────┘
                           │ No text / failed
                           ▼
                    ┌──────────────┐
                    │ Gemini       │───► Return
                    │ Vision API   │
                    │ (FREE tier)  │
                    └──────────────┘
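A hedged sketch of that routing in Python; `run_tesseract` and `ask_gemini` are illustrative names standing in for the server's real helpers:

```python
import shutil
import subprocess


def run_tesseract(path: str) -> str:
    """Run local OCR; return extracted text, or "" if unavailable or empty."""
    if shutil.which("tesseract") is None:
        return ""
    try:
        result = subprocess.run(
            ["tesseract", path, "stdout"],
            capture_output=True, text=True, timeout=30, check=True,
        )
        return result.stdout.strip()
    except (subprocess.SubprocessError, OSError):
        return ""


def ocr(path: str, ask_gemini) -> str:
    """Try local tesseract first; fall back to the Gemini API otherwise."""
    text = run_tesseract(path)
    return text if text else ask_gemini(path)
```

The fallback fires both when tesseract is missing and when it finds no text, so the private, offline path is always tried first.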

Cost: $0

  • Gemini 2.5 Flash: 1,500 free requests/day via Google AI Studio API key
  • tesseract: free and local for OCR
  • Pillow: free and local for metadata
  • No OpenCode Go credits consumed: the API call happens in the vision server, not through OpenCode's model proxy

Comparison with Alternatives

Feature        opencode-vision                    opencode-minimax-easy-vision   qwen-vision-mcp              opencode-image-proxy
Runtime        Python (stdlib)                    Node.js + npm                  Node.js + npm                Node.js
Dependencies   pip install opencode-vision        npm install + MiniMax API key  npx + Ollama (6 GB RAM)      npx
OCR            Local tesseract + Gemini fallback  None (API only)                None (API only)              None (API only)
API cost       $0 (Gemini FREE tier)              MiniMax pricing                $0 (local) or Ollama Cloud   OpenCode credits
Auto .env      ✓ Reads ~/.config/opencode/.env    ✗ Manual env vars              ✗ Manual env vars            ✗ Manual config
Image resize   ✓ Pillow auto-resize               ✗                              ✗                            ✗
Install size   ~200 KB (pure Python)              ~30 MB (node_modules)          ~30 MB + ~6 GB Ollama model  ~30 MB

Why "Model-Agnostic"?

The key architectural insight: the model never needs to see pixels. The MCP server does all the visual processing externally and returns text. This means:

  • Works with any text-only model (big-pickle, DeepSeek, MiMo, MiniMax, GLM, etc.)
  • Works with any multimodal model too (it doesn't interfere)
  • No model-specific configuration
  • No provider-specific setup
  • The model can be swapped at any time without reconfiguring vision

License

MIT


Built with ❤️ by Nicolás Ríos Herrera for the OpenCode community.
