Screen-context sidecar for coding agents

Eye2byte

Python 3.10+ | MIT License | Cross-platform | Changelog


Captures your screen, voice, and annotations, feeds them to any vision model, and produces structured Context Packs your coding agent can act on.

Screen / Voice / Annotations  -->  Vision Model + Whisper  -->  Context Pack  -->  Coding Agent

Features

  • Multi-monitor capture — active, specific (1/2/3), or all monitors at once
  • Voice narration — record, clean (noise removal + normalization), transcribe locally
  • Annotations — arrows, circles, rectangles, freehand, multi-line text on a frozen screenshot
  • Screen clips — record short videos, extract keyframes, analyze the sequence
  • Image optimization — auto resize + compress (~5x smaller, zero quality loss)
  • MCP server — coding agents query your screen directly via Model Context Protocol
  • Context Packs — structured output: goal, environment, errors, signals, next steps
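The image optimization feature is described only as "auto resize + compress". As a rough sketch of the resize half, the helper below assumes the longest side is capped at the `image_max_size` setting (1920 by default) while the aspect ratio is preserved. The name `fit_within` is hypothetical and not part of Eye2byte.

```python
def fit_within(width: int, height: int, max_size: int = 1920) -> tuple[int, int]:
    """Scale (width, height) down so the longest side is at most max_size.

    Hypothetical helper: illustrates aspect-preserving downscaling,
    not Eye2byte's actual implementation.
    """
    if width <= 0 or height <= 0 or max_size <= 0:
        raise ValueError("dimensions and max_size must be positive")
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough; never upscale
    scale = max_size / longest
    # Round to whole pixels, keeping each side at least 1 px.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, the resulting size could be passed to `Image.resize` before the image is saved as JPEG at the configured `image_quality`.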

Platforms

| Platform | Screenshot | Voice | Annotation | Hotkeys |
|---|---|---|---|---|
| Windows | PowerShell .NET | ffmpeg | Pillow | Ctrl+Shift+1-5 |
| macOS | screencapture | ffmpeg | Pillow | - |
| Linux | scrot/maim/flameshot | ffmpeg | Pillow | - |
| Android | ADB (Termux) | Termux:API | - | - |

Setup

1. Install dependencies

pip install Pillow fastmcp       # Core + MCP server
pip install openai-whisper       # Local voice transcription (optional)
# ffmpeg is required for voice/clips — install via your package manager

2. Configure a vision provider

Eye2byte works with any vision model — local or cloud. Set your provider in ~/.eye2byte/config.json or the Settings UI:

| Provider | Setup | Cost |
|---|---|---|
| Ollama (local) | Install Ollama, then run ollama pull qwen3-vl:8b | Free |
| Gemini | Set GEMINI_API_KEY in .env | Free tier (1000 req/day) |
| OpenRouter | Set OPENROUTER_API_KEY in .env | Free models available |
| Hyperbolic | Set HYPERBOLIC_API_KEY in .env | Pay per use |

# .env file (project dir, cwd, or ~/.eye2byte/.env)
GEMINI_API_KEY=your-key-here
# or OPENROUTER_API_KEY=...
# or HYPERBOLIC_API_KEY=...

3. Run

python eye2byte.py capture              # Screenshot + analysis
python eye2byte.py capture --voice      # + voice narration
python eye2byte.py capture --mode window # Active window only
python eye2byte_ui.py                    # Launch control panel

Control Panel

python eye2byte_ui.py

A small always-on-top floating panel. Drag it anywhere. Global hotkeys work even when the panel isn't focused.

Global Hotkeys (Windows)

These work system-wide — no need to focus the Eye2byte window:

| Hotkey | Action | Notes |
|---|---|---|
| Ctrl+Shift+1 | Capture screenshot | Uses current mode (Full/Window/Region) |
| Ctrl+Shift+2 | Annotate | Freezes screen, opens drawing overlay |
| Ctrl+Shift+3 | Toggle voice recording | Press once to start, again to stop |
| Ctrl+Shift+5 | Grab clipboard image | Analyzes whatever image is on your clipboard |

All keyboard shortcuts are customizable from Settings > Keyboard Shortcuts.

Panel Controls

| Control | Action |
|---|---|
| Space (hold) | Push-to-talk: hold to record, release to stop |
| Mode selector | Cycle between Full Screen / Window / Region |
| Settings | Configure provider, model, image quality, cleanup |
| Copy @path | Copy session path to clipboard for @-mentioning |

Annotation Overlay

When you press Ctrl+Shift+2 or click Annotate, the screen freezes and you can draw on it:

| Key | Tool | How to use |
|---|---|---|
| X | Arrow | Click and drag to draw an arrow |
| C | Circle | Click and drag to draw an ellipse |
| V | Rectangle | Click and drag to draw a box |
| B | Freehand | Click and drag to draw freely |
| T | Text | Click to place, type your text |

| Action | How |
|---|---|
| Save | Enter (commits annotations and sends to vision model) |
| Cancel | Escape (discards all annotations) |
| Undo | Right-click near an annotation to remove it |
| Newline in text | Shift+Enter (Enter alone commits the text) |
| Multi-line text | Text box auto-grows up to 6 lines |

Voice Recording

Three ways to record voice:

  1. Toggle: Ctrl+Shift+3 starts recording; press again to stop
  2. Push-to-talk: hold Space while the panel is focused
  3. Mouse PTT: click and hold the Record button

While recording, any captures you take are automatically bundled with the voice note into a single session.

MCP Server

Eye2byte exposes 6 tools via the Model Context Protocol, letting coding agents capture and analyze your screen directly.

| Tool | Description |
|---|---|
| capture_and_summarize | Screenshot + vision analysis. Supports monitor selection, delay, window targeting |
| capture_with_voice | Screenshot + voice recording + transcription + analysis |
| record_clip_and_summarize | Screen clip with keyframe extraction and sequence analysis |
| summarize_screenshot | Analyze an existing image file |
| transcribe_audio | Local Whisper transcription of any audio file |
| get_recent_context | Retrieve recent Context Pack summaries |

Local Setup (stdio)

Eye2byte runs on the machine whose screen you want to capture. For local agents like Claude Code on the same machine, use stdio transport:

Claude Code — add to your project's .mcp.json:

{
  "mcpServers": {
    "eye2byte": {
      "command": "python",
      "args": ["C:/path/to/eye2byte_mcp.py"]
    }
  }
}

That's it — Claude Code will auto-start the server. Use full absolute paths.
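Because the server path must be absolute, one way to avoid typos is to generate the entry from a script. The sketch below is a convenience under stated assumptions, not part of Eye2byte: `build_stdio_config` is a hypothetical helper that resolves the script path and emits the same JSON structure shown above.

```python
import json
from pathlib import Path


def build_stdio_config(script: str, python_cmd: str = "python") -> dict:
    """Return an .mcp.json-style dict whose script path is absolute."""
    absolute = Path(script).expanduser().resolve()
    return {
        "mcpServers": {
            "eye2byte": {
                "command": python_cmd,
                # as_posix() yields forward slashes, so Windows paths
                # need no backslash escaping in JSON.
                "args": [absolute.as_posix()],
            }
        }
    }


if __name__ == "__main__":
    # Print the config for a script in the current directory.
    print(json.dumps(build_stdio_config("eye2byte_mcp.py"), indent=2))
```

The printed JSON can be pasted into the project's .mcp.json.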

Remote Setup (SSE)

When your coding agent runs on a different machine (cloud VM, SSH dev box, CI runner) but needs to see your local screen, use SSE transport:

Step 1 — On your local machine (the one with the screen):

# Install Eye2byte + dependencies
pip install Pillow fastmcp
pip install openai-whisper  # optional, for voice

# Start the SSE server
python eye2byte_mcp.py --sse                           # No auth (LAN only)
python eye2byte_mcp.py --sse --token mysecret123       # Bearer token auth
python eye2byte_mcp.py --sse --port 9000 --token abc   # Custom port + auth

The server stays running and accepts connections from any machine on your network. Use --token when the server is reachable beyond your trusted LAN.

Step 2 — On the remote machine (where the coding agent runs):

Nothing to install. Just configure the MCP client to point at your local IP:

{
  "mcpServers": {
    "eye2byte": {
      "url": "http://YOUR_LOCAL_IP:8808/sse",
      "headers": {"Authorization": "Bearer mysecret123"}
    }
  }
}

Omit the headers field if the server was started without --token.

Find your local IP: ipconfig (Windows), ip addr (Linux), or ifconfig (macOS).

Firewall: You may need to allow inbound TCP on port 8808. On Windows, run as admin:

netsh advfirewall firewall add rule name="Eye2byte MCP" dir=in action=allow protocol=TCP localport=8808
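Before changing firewall rules, it can help to confirm from the remote machine whether the port is actually unreachable. A small sketch using only the standard library; `port_is_open` is a hypothetical helper, and 8808 is the port used in the examples above.

```python
import socket


def port_is_open(host: str, port: int = 8808, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result while the server is running suggests a firewall or routing issue rather than a problem with the MCP client configuration.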

Multi-monitor Examples

capture_and_summarize(monitor=0)    # active monitor (default)
capture_and_summarize(monitor=1)    # first monitor
capture_and_summarize(monitor=2)    # second monitor
capture_and_summarize(monitor=-1)   # ALL monitors at once
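The index convention above (0 for the active monitor, 1..N for a specific one, -1 for all) can be made explicit with a small resolver. This is an illustrative sketch only: `select_monitors` and its `count` and `active` parameters are hypothetical, and how Eye2byte detects the active monitor is not covered here.

```python
def select_monitors(monitor: int, count: int, active: int = 1) -> list[int]:
    """Map a monitor argument to the 1-based monitor numbers to capture.

    monitor=0  -> the active monitor
    monitor=-1 -> every monitor
    monitor=n  -> monitor n, where 1 <= n <= count
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if not 1 <= active <= count:
        raise ValueError("active must be between 1 and count")
    if monitor == -1:
        return list(range(1, count + 1))
    if monitor == 0:
        return [active]
    if 1 <= monitor <= count:
        return [monitor]
    raise ValueError(f"monitor {monitor} is out of range for {count} monitor(s)")
```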

Context Pack Format

Every analysis produces a structured Context Pack:

## Goal         — what the user appears to be doing
## Environment  — OS, editor, repo, branch, language
## Screen State — visible panels, files, terminal output
## Signals      — verbatim errors, stack traces, warnings
## Likely Situation — what's probably happening
## Suggested Next Info — what a coding agent needs next
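An agent or script that consumes a Context Pack may want the sections as a dictionary. The sketch below assumes each section starts with a `## ` heading on its own line, followed by the section body; the exact layout of real packs may differ, and `parse_context_pack` is a hypothetical name.

```python
import re


def parse_context_pack(text: str) -> dict[str, str]:
    """Split a Context Pack into {section title: body} using '## ' headings."""
    sections: dict[str, str] = {}
    current: str | None = None
    lines: list[str] = []
    for line in text.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            # Close the previous section before starting a new one.
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current, lines = match.group(1), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections
```

Text before the first heading is ignored, and deeper headings such as `###` are kept as part of the enclosing section body.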

Configuration

Config: ~/.eye2byte/config.json (created on first run or via python eye2byte.py init)

| Setting | Default | Description |
|---|---|---|
| provider | "ollama" | Vision provider: ollama, gemini, openrouter, hyperbolic |
| model | "auto" | Model name or "auto" for auto-detection |
| voice_clean | true | Noise removal + pause trimming + volume normalization |
| auto_cleanup_days | 7 | Delete old captures/summaries after N days (0=disabled) |
| image_max_size | 1920 | Max image dimension before LLM processing |
| image_quality | 90 | JPEG quality (1-100) |

Files

| File | Purpose |
|---|---|
| eye2byte.py | Core engine: capture, voice, clip, summarize, watch |
| eye2byte_ui.py | Control panel with hotkeys and annotation overlay |
| eye2byte_mcp.py | MCP server for coding agent integration |

License

MIT

Download files

Source Distribution

eye2byte-0.3.0.tar.gz (52.7 kB)

Uploaded Source

Built Distribution

eye2byte-0.3.0-py3-none-any.whl (54.2 kB)

Uploaded Python 3

File details

Details for the file eye2byte-0.3.0.tar.gz.

File metadata

  • Download URL: eye2byte-0.3.0.tar.gz
  • Size: 52.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for eye2byte-0.3.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 23ddd88027da89247a46ec99c38dcddb487715bf38a04a0169193f7c945beef4 |
| MD5 | 1bcd2b598386020200a5f3d6031854ea |
| BLAKE2b-256 | fb5e7e2a656218e12733320eca32bef37b170d8bae47c112a708cf78958f5089 |

File details

Details for the file eye2byte-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: eye2byte-0.3.0-py3-none-any.whl
  • Size: 54.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for eye2byte-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a984e0c9ce3f1371e223a4c497e2effc5963112b8c59ec0d07822e7f47b1ea9d |
| MD5 | f58ddd48c90fe5f9aa3399cccd5806a0 |
| BLAKE2b-256 | 31b3891975f3f161d9c7e7fb34dfb5ffc18c6edb8c7cf84b2625424d82ba1c52 |
