Eye2byte
Your AI coding agent can read every file in your repo.
It just can't see what's on your screen.
Screen-context sidecar for AI coding agents. Captures your screen, voice, and annotations, feeds them to any vision model, and produces structured Context Packs your coding agent can act on — via MCP.
Screen / Voice / Annotations ──> Vision Model + Whisper ──> Context Pack ──> Coding Agent
- Vision Model: Ollama, Gemini, OpenRouter, Hyperbolic
- Context Pack: goal, errors, signals, next
- Coding Agent: Claude Code, Codex, Gemini CLI
Use Cases
"Debug what I'm looking at" — Capture your screen + voice-describe the bug. Your agent gets full visual context instead of you copy-pasting error messages.
"See all my monitors" — Agent captures your IDE, browser, and terminal simultaneously. Multi-monitor support: active, specific, or all displays at once.
"Annotate the problem" — Freeze your screen, draw arrows and circles on the exact bug. Agent sees precisely what you mean.
"Watch my phone" — Capture your Android device screen via ADB while developing mobile apps.
"Give remote agents eyes" — SSE server lets cloud agents, CI runners, or SSH dev boxes see your local screen. Bearer token auth included.
"Voice-first workflow" — Hold spacebar, describe what you want while looking at your screen. Agent sees + hears simultaneously.
"Monitor dashboards" — Point it at Grafana, production logs, or any dashboard. Agent captures and analyzes what's on screen.
"Context switch instantly" — Capture your screen state when switching tasks. Agent knows your new context without explanation.
"Click what you see" — OCR finds text elements with coordinates, then click/type/scroll to interact. The full see-locate-act loop for agent automation.
"What did I see before?" — Search past Context Packs by keyword. Agent recalls "last time I saw this error, the fix was X" from your observation history.
Quick Start
1. Install
pip install "eye2byte[all]"   # quotes keep shells such as zsh from globbing the brackets
Granular install options
pip install eye2byte              # Core + MCP server (Pillow + fastmcp)
pip install "eye2byte[voice]"     # + local voice transcription (openai-whisper)
pip install "eye2byte[ui]"        # + control panel (customtkinter + pystray)
pip install "eye2byte[ocr]"       # + coordinate-aware OCR (easyocr)
pip install "eye2byte[interact]"  # + mouse/keyboard automation (pyautogui)
pip install "eye2byte[all]"       # Everything
ffmpeg is required for voice/clips — install via your package manager.
2. Configure a vision provider
| Provider | Setup | Cost |
|---|---|---|
| Ollama | Install Ollama, `ollama pull qwen3-vl:8b` | Free (local) |
| Gemini | Set `GEMINI_API_KEY` in `.env` | Free tier |
| OpenRouter | Set `OPENROUTER_API_KEY` in `.env` | Free models available |
| Hyperbolic | Set `HYPERBOLIC_API_KEY` in `.env` | Pay per use |
# .env file — place in project dir, cwd, or ~/.eye2byte/.env
GEMINI_API_KEY=your-key-here
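The comment above names three places the `.env` file can live. As a rough illustration of how such a lookup could work, here is a standard-library sketch. The function names `parse_env` and `find_env_file` and the search order (project dir, then cwd, then `~/.eye2byte`) are assumptions made for this example, not Eye2byte's actual loader.

```python
from __future__ import annotations

from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_env_file(project_dir: Path, cwd: Path, home: Path) -> Path | None:
    """Return the first existing .env among the three documented locations.

    The order used here is an assumption for illustration.
    """
    candidates = [
        project_dir / ".env",
        cwd / ".env",
        home / ".eye2byte" / ".env",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling `parse_env(path.read_text())` on the file returned by `find_env_file` yields a plain dict, which keeps the lookup easy to test without touching the real environment.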
3. Run
eye2byte capture # Screenshot + analysis
eye2byte capture --voice # + voice narration
eye2byte capture --mode window # Active window only
eye2byte-ui # Launch control panel
Or run the scripts directly:
python eye2byte.py capture
python eye2byte_ui.py
How It Works
Eye2byte sits between your screen and your coding agent.
- Capture — takes a screenshot (full screen, window, region, or all monitors), optionally records voice and annotations
- Process — optimizes the image (~5x smaller, zero quality loss), cleans audio (noise removal + normalization), transcribes speech locally via Whisper
- Analyze — sends everything to your chosen vision model
- Output — produces a structured Context Pack the agent can act on
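The Process step shrinks the image before it reaches the vision model, and the Configuration section lists a maximum image dimension of 1920. The arithmetic behind that kind of cap can be sketched as follows. The helper name `fit_within` is hypothetical, and the sketch covers only the dimension calculation, not the actual resizing or encoding Eye2byte performs.

```python
from __future__ import annotations


def fit_within(width: int, height: int, max_size: int = 1920) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_size.

    The aspect ratio is preserved and smaller images are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # never upscale
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 3840x2160 capture this gives 1920x1080, a quarter of the original pixel count.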
Context Pack Format
Every analysis produces a markdown document with structured sections:
## Goal — what the user appears to be doing
## Environment — OS, editor, repo, branch, language
## Screen State — visible panels, files, terminal output
## Signals — verbatim errors, stack traces, warnings
## Likely Situation — what's probably happening
## Suggested Next Info — what a coding agent needs next
The agent receives this as actionable context — not a raw image dump.
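Because a Context Pack is plain markdown with `##` section headings, a consumer can split it into named sections with a few lines of code. The sketch below assumes each section starts with a line of the form `## Name`. The function name `parse_context_pack` is made up for this example and is not part of Eye2byte.

```python
from __future__ import annotations

import re

# A level-2 markdown heading on its own line, e.g. "## Signals".
SECTION_RE = re.compile(r"^## (.+?)[ \t]*$", re.MULTILINE)


def parse_context_pack(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the text that follows it."""
    sections: dict[str, str] = {}
    matches = list(SECTION_RE.finditer(markdown))
    for i, match in enumerate(matches):
        # A section runs until the next heading or the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections[match.group(1)] = markdown[match.end():end].strip()
    return sections
```

An agent-side script could then read `parse_context_pack(text).get("Signals", "")` to pull out only the verbatim errors.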
MCP Integration
Eye2byte exposes 12 tools via the Model Context Protocol. Any MCP-compatible agent can use them.
| Tool | What it does | Install |
|---|---|---|
| `capture_and_summarize` | Screenshot + vision analysis (monitor selection, delay, window targeting) | core |
| `capture_with_voice` | Screenshot + voice recording + transcription + analysis | core |
| `record_clip_and_summarize` | Screen clip with keyframe extraction and sequence analysis | core |
| `summarize_screenshot` | Analyze an existing image file | core |
| `transcribe_audio` | Local Whisper transcription of any audio file | core |
| `get_recent_context` | Retrieve recent Context Pack summaries | core |
| `get_screen_elements` | OCR with coordinates: find text elements and their screen positions | [ocr] |
| `search_context_history` | Full-text search across all past Context Packs | core |
| `click_element` | Click at screen coordinates (from `get_screen_elements` output) | [interact] |
| `type_text` | Type text at current cursor position | [interact] |
| `press_key` | Press keyboard key or combo (e.g. "ctrl+a", "enter") | [interact] |
| `scroll_screen` | Scroll at a screen position | [interact] |
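The table notes that `click_element` takes coordinates from `get_screen_elements` output. The locate step between the two can be sketched as below. The element shape used here (a dict with `text`, `x`, `y`, `width`, `height`) is hypothetical, since the real output format is not specified in this document, and `find_click_target` is an illustrative helper rather than an Eye2byte function.

```python
from __future__ import annotations


def find_click_target(elements: list[dict], query: str) -> tuple[int, int] | None:
    """Return the center (x, y) of the first element whose text contains query.

    Each element is assumed (hypothetically) to carry its top-left corner
    in 'x'/'y' and its size in 'width'/'height', all in screen pixels.
    """
    needle = query.casefold()
    for element in elements:
        if needle in element["text"].casefold():
            center_x = element["x"] + element["width"] // 2
            center_y = element["y"] + element["height"] // 2
            return center_x, center_y
    return None  # nothing on screen matched
```

Clicking the center of the bounding box rather than its corner is a common choice because it tolerates small OCR box errors.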
OpenClaw
Eye2byte works with OpenClaw out of the box. Add to your openclaw.json:
{
"mcpServers": {
"eye2byte": {
"command": "python",
"args": ["eye2byte_mcp.py"]
}
}
}
Now your OpenClaw can see your screen from any channel — WhatsApp, Telegram, Slack, Discord. An Eye2byte skill is also available on ClawHub.
Local agents (stdio)
For agents running on the same machine (Claude Code, Codex CLI, etc.). Add to .mcp.json:
{
"mcpServers": {
"eye2byte": {
"command": "python",
"args": ["C:/path/to/eye2byte_mcp.py"]
}
}
}
That's it. The agent auto-starts the server. Use full absolute paths.
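Since the config needs an absolute path, it can be convenient to generate the entry rather than type it by hand. This sketch builds the same structure shown above using only the standard library. The function name `build_stdio_config` is an assumption for this example.

```python
import json
from pathlib import Path


def build_stdio_config(server_script: str, command: str = "python") -> dict:
    """Build an .mcp.json structure pointing at an absolute script path."""
    # resolve() makes the path absolute; as_posix() uses forward slashes,
    # matching the "C:/path/to/..." style shown above.
    script = Path(server_script).expanduser().resolve()
    return {
        "mcpServers": {
            "eye2byte": {
                "command": command,
                "args": [script.as_posix()],
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted or redirected into .mcp.json.
    print(json.dumps(build_stdio_config("eye2byte_mcp.py"), indent=2))
```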
Remote agents (SSE)
For agents on a different machine (cloud VM, SSH dev box, CI runner).
On your local machine (the one with the screen):
python eye2byte_mcp.py --sse # No auth (LAN only)
python eye2byte_mcp.py --sse --token mysecret123 # Bearer token auth
python eye2byte_mcp.py --sse --port 9000 --token abc # Custom port + auth
On the remote machine (where the agent runs) — add to MCP config:
{
"mcpServers": {
"eye2byte": {
"url": "http://YOUR_LOCAL_IP:8808/sse",
"headers": {"Authorization": "Bearer mysecret123"}
}
}
}
Omit headers if the server was started without --token.
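The rule about omitting `headers` is easy to get wrong when configs are copied between machines. A small generator, sketched here with the hypothetical name `build_sse_config`, makes the token optional and defaults to port 8808 as used in the examples above.

```python
from __future__ import annotations


def build_sse_config(host: str, port: int = 8808, token: str | None = None) -> dict:
    """Build the remote-agent MCP config for an Eye2byte SSE server."""
    entry: dict = {"url": f"http://{host}:{port}/sse"}
    if token:
        # Only send an Authorization header when the server was started
        # with --token; otherwise the key is left out entirely.
        entry["headers"] = {"Authorization": f"Bearer {token}"}
    return {"mcpServers": {"eye2byte": entry}}
```

`build_sse_config("192.168.1.20", token="mysecret123")` reproduces the JSON above for that address, and calling it without `token` drops the `headers` key.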
Firewall note (Windows)
netsh advfirewall firewall add rule name="Eye2byte MCP" dir=in action=allow protocol=TCP localport=8808
Find your local IP: ipconfig (Windows), ip addr (Linux), or ifconfig (macOS).
Multi-monitor
capture_and_summarize(monitor=0) # active monitor (default)
capture_and_summarize(monitor=1) # first monitor
capture_and_summarize(monitor=2) # second monitor
capture_and_summarize(monitor=-1) # ALL monitors at once
Control Panel
eye2byte-ui # or: python eye2byte_ui.py
A small always-on-top floating panel. Drag it anywhere. Global hotkeys work even when the panel isn't focused.
Global Hotkeys (Windows)
| Hotkey | Action |
|---|---|
| Ctrl+Shift+1 | Capture screenshot (uses current mode) |
| Ctrl+Shift+2 | Annotate (freeze screen, open drawing overlay) |
| Ctrl+Shift+3 | Toggle voice recording |
| Ctrl+Shift+5 | Grab clipboard image |
All shortcuts are customizable from Settings > Keyboard Shortcuts.
Panel Controls
| Control | Action |
|---|---|
| Space (hold) | Push-to-talk: hold to record, release to stop |
| Mode selector | Cycle between Full Screen / Window / Region |
| Settings | Provider, model, image quality, shortcuts, cleanup |
| Copy @path | Copy session path for @-mentioning in chat |
Settings Tabs
| Tab | What you configure |
|---|---|
| Provider | Vision provider, model selection, API keys |
| Media | Image quality, max size, voice cleaning |
| Shortcuts | All keyboard shortcuts with key capture UI |
| Maintenance | Auto-cleanup days, cache management |
Features Reference
Annotation Overlay
Press Ctrl+Shift+2 or click Annotate to freeze the screen and draw on it.
| Key | Tool | How to use |
|---|---|---|
| X | Arrow | Click and drag |
| C | Circle | Click and drag |
| V | Rectangle | Click and drag |
| B | Freehand | Click and drag |
| T | Text | Click to place, type your text |
| Action | How |
|---|---|
| Save | Enter — commits annotations, sends to vision model |
| Cancel | Escape — discards all annotations |
| Undo | Right-click near an annotation to remove it |
| Newline | Shift+Enter (Enter alone commits) |
| Multi-line | Text box auto-grows up to 6 lines |
Voice Recording
Three ways to record:
| Method | How |
|---|---|
| Toggle | Ctrl+Shift+3 to start, press again to stop |
| Push-to-talk | Hold Space while panel is focused |
| Mouse PTT | Hold click on the Record button |
While recording, any captures you take are automatically bundled with the voice note into a single session.
Platforms
| Platform | Screenshot | Voice | Annotation | Hotkeys |
|---|---|---|---|---|
| Windows | PowerShell .NET | ffmpeg | Pillow | Ctrl+Shift+1-5 |
| macOS | screencapture | ffmpeg | Pillow | — |
| Linux | scrot/maim/flameshot | ffmpeg | Pillow | — |
| Android | ADB (Termux) | Termux:API | — | — |
Configuration
Config file: ~/.eye2byte/config.json (created on first run or via eye2byte init)
| Setting | Default | Description |
|---|---|---|
| `provider` | `"ollama"` | Vision provider: ollama, gemini, openrouter, hyperbolic |
| `model` | `"auto"` | Model name or "auto" for auto-detection |
| `voice_clean` | `true` | Noise removal + pause trimming + volume normalization |
| `auto_cleanup_days` | `7` | Delete old captures/summaries after N days (0 = disabled) |
| `image_max_size` | `1920` | Max image dimension before LLM processing |
| `image_quality` | `90` | JPEG quality (1-100) |
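To show how these settings fit together, here is a sketch of reading a config file and falling back to the defaults from the table. The function `load_config` and its validation rules are assumptions for illustration; Eye2byte's own loader may behave differently.

```python
import json
from pathlib import Path

# Defaults copied from the settings table above.
DEFAULTS = {
    "provider": "ollama",
    "model": "auto",
    "voice_clean": True,
    "auto_cleanup_days": 7,
    "image_max_size": 1920,
    "image_quality": 90,
}
PROVIDERS = {"ollama", "gemini", "openrouter", "hyperbolic"}


def load_config(path: Path) -> dict:
    """Merge a JSON config file over the defaults and sanity-check it."""
    config = dict(DEFAULTS)
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    if config["provider"] not in PROVIDERS:
        raise ValueError(f"unknown provider: {config['provider']!r}")
    if not 1 <= config["image_quality"] <= 100:
        raise ValueError("image_quality must be between 1 and 100")
    return config
```

A typical call would be `load_config(Path.home() / ".eye2byte" / "config.json")`. A missing file simply yields the defaults.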
Files
| File | Purpose |
|---|---|
| `eye2byte.py` | Core engine: capture, voice, clip, summarize |
| `eye2byte_ui.py` | Control panel with hotkeys and annotation overlay |
| `eye2byte_mcp.py` | MCP server for coding agent integration |
| `eye2byte_ocr.py` | Coordinate-aware OCR via easyocr |
| `eye2byte_interact.py` | Mouse/keyboard automation via pyautogui |
| `eye2byte_history.py` | Searchable context history via SQLite FTS5 |
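The history module is described as using SQLite FTS5. For readers unfamiliar with it, the general technique looks roughly like this with Python's built-in `sqlite3` module. The table name `packs`, its columns, and both function names are invented for this sketch and do not reflect the schema in `eye2byte_history.py`. It also assumes the local SQLite build includes FTS5, which is true of most but not all Python distributions.

```python
import sqlite3


def build_index(packs: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index from (created, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE packs USING fts5(created, body)")
    conn.executemany("INSERT INTO packs (created, body) VALUES (?, ?)", packs)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return up to `limit` (created, body) rows matching query, best first."""
    rows = conn.execute(
        "SELECT created, body FROM packs WHERE packs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

The default FTS5 tokenizer is case-insensitive, so a query for `typeerror` also matches `TypeError:` in a stored pack. Queries containing FTS5 operator characters would need quoting, which this sketch leaves out.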
License
MIT