Screen-context sidecar for coding agents
Eye2byte
Captures your screen, voice, and annotations, feeds them to any vision model, and produces structured Context Packs your coding agent can act on.
Screen / Voice / Annotations --> Vision Model + Whisper --> Context Pack --> Coding Agent
Features
- Multi-monitor capture — active, specific (1/2/3), or all monitors at once
- Voice narration — record, clean (noise removal + normalization), transcribe locally
- Annotations — arrows, circles, rectangles, freehand, multi-line text on a frozen screenshot
- Screen clips — record short videos, extract keyframes, analyze the sequence
- Image optimization — auto resize + compress (~5x smaller with minimal quality loss)
- MCP server — coding agents query your screen directly via Model Context Protocol
- Context Packs — structured output: goal, environment, errors, signals, next steps
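The image optimization feature can be pictured as a bounded resize followed by compression. The sketch below covers only the sizing arithmetic and is an assumption about how such a step might work, not Eye2byte's actual code: `fit_within` is a hypothetical helper, and 1920 mirrors the `image_max_size` default listed under Configuration.

```python
def fit_within(width: int, height: int, max_size: int = 1920) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_size.

    Aspect ratio is preserved; images already within the limit are unchanged.
    """
    if width <= 0 or height <= 0 or max_size <= 0:
        raise ValueError("dimensions and max_size must be positive")
    longest = max(width, height)
    if longest <= max_size:
        return width, height
    scale = max_size / longest
    # Round to whole pixels, never dropping below 1.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3840, 2160))  # a 4K capture scaled to fit 1920
```

With Pillow installed, the resulting tuple could be passed to `Image.resize` before saving the capture as a JPEG.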
Platforms
| Platform | Screenshot | Voice | Annotation | Hotkeys |
|---|---|---|---|---|
| Windows | PowerShell .NET | ffmpeg | Pillow | Ctrl+Shift+1-5 |
| macOS | screencapture | ffmpeg | Pillow | - |
| Linux | scrot/maim/flameshot | ffmpeg | Pillow | - |
| Android | ADB (Termux) | Termux:API | - | - |
Setup
1. Install dependencies
pip install Pillow fastmcp # Core + MCP server
pip install openai-whisper # Local voice transcription (optional)
# ffmpeg is required for voice/clips — install via your package manager
2. Configure a vision provider
Eye2byte works with any vision model — local or cloud. Set your provider in ~/.eye2byte/config.json or the Settings UI:
| Provider | Setup | Cost |
|---|---|---|
| Ollama (local) | Install Ollama, ollama pull qwen3-vl:8b | Free |
| Gemini | Set GEMINI_API_KEY in .env | Free tier (1000 req/day) |
| OpenRouter | Set OPENROUTER_API_KEY in .env | Free models available |
| Hyperbolic | Set HYPERBOLIC_API_KEY in .env | Pay per use |
# .env file (project dir, cwd, or ~/.eye2byte/.env)
GEMINI_API_KEY=your-key-here
# or OPENROUTER_API_KEY=...
# or HYPERBOLIC_API_KEY=...
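For readers curious how a `.env` lookup like this could work, here is a minimal sketch. It assumes the three locations are tried in the order listed (project dir, cwd, `~/.eye2byte/.env`) and that the first file found wins; the source does not state the precedence, and `parse_env`, `find_env_file`, and `default_candidates` are hypothetical names, not Eye2byte functions.

```python
from pathlib import Path
from typing import Optional


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_env_file(candidates: list[Path]) -> Optional[Path]:
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None


def default_candidates(project_dir: Path) -> list[Path]:
    # Assumed order: project dir, current working dir, then the home config dir.
    return [
        project_dir / ".env",
        Path.cwd() / ".env",
        Path.home() / ".eye2byte" / ".env",
    ]
```

Commented-out lines such as `# or OPENROUTER_API_KEY=...` are ignored by this parser, so only the active key is picked up.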
3. Run
python eye2byte.py capture # Screenshot + analysis
python eye2byte.py capture --voice # + voice narration
python eye2byte.py capture --mode window # Active window only
python eye2byte_ui.py # Launch control panel
Control Panel
python eye2byte_ui.py
A small always-on-top floating panel. Drag it anywhere. Global hotkeys work even when the panel isn't focused.
Global Hotkeys (Windows)
These work system-wide — no need to focus the Eye2byte window:
| Hotkey | Action | Notes |
|---|---|---|
| Ctrl+Shift+1 | Capture screenshot | Uses current mode (Full/Window/Region) |
| Ctrl+Shift+2 | Annotate | Freezes screen, opens drawing overlay |
| Ctrl+Shift+3 | Toggle voice recording | Press once to start, again to stop |
| Ctrl+Shift+5 | Grab clipboard image | Analyzes whatever image is on your clipboard |
All keyboard shortcuts are customizable from Settings > Keyboard Shortcuts.
Panel Controls
| Control | Action |
|---|---|
| Space (hold) | Push-to-talk — hold to record, release to stop |
| Mode selector | Cycle between Full Screen / Window / Region |
| Settings | Configure provider, model, image quality, cleanup |
| Copy @path | Copy session path to clipboard for @-mentioning |
Annotation Overlay
When you press Ctrl+Shift+2 or click Annotate, the screen freezes and you can draw on it:
| Key | Tool | How to use |
|---|---|---|
| X | Arrow | Click and drag to draw an arrow |
| C | Circle | Click and drag to draw an ellipse |
| V | Rectangle | Click and drag to draw a box |
| B | Freehand | Click and drag to draw freely |
| T | Text | Click to place, type your text |
| Action | How |
|---|---|
| Save | Enter (commits annotations and sends to vision model) |
| Cancel | Escape (discards all annotations) |
| Undo | Right-click near an annotation to remove it |
| Newline in text | Shift+Enter (Enter alone commits the text) |
| Multi-line text | Text box auto-grows up to 6 lines |
Voice Recording
Three ways to record voice:
- Toggle — Ctrl+Shift+3 starts recording, press again to stop
- Push-to-talk — Hold Space while panel is focused
- Mouse PTT — Hold click on the Record button
While recording, any captures you take are automatically bundled with the voice note into a single session.
MCP Server
Eye2byte exposes 6 tools via the Model Context Protocol, letting coding agents capture and analyze your screen directly.
| Tool | Description |
|---|---|
| capture_and_summarize | Screenshot + vision analysis. Supports monitor selection, delay, window targeting |
| capture_with_voice | Screenshot + voice recording + transcription + analysis |
| record_clip_and_summarize | Screen clip with keyframe extraction and sequence analysis |
| summarize_screenshot | Analyze an existing image file |
| transcribe_audio | Local Whisper transcription of any audio file |
| get_recent_context | Retrieve recent Context Pack summaries |
Local Setup (stdio)
Eye2byte runs on the machine whose screen you want to capture. For local agents like Claude Code on the same machine, use stdio transport:
Claude Code — add to your project's .mcp.json:
{
  "mcpServers": {
    "eye2byte": {
      "command": "python",
      "args": ["C:/path/to/eye2byte_mcp.py"]
    }
  }
}
That's it — Claude Code will auto-start the server. Use full absolute paths.
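Because the path must be absolute, it can be convenient to generate the entry rather than type it. The following is a small sketch, not part of Eye2byte: `build_stdio_config` is a hypothetical helper, and defaulting `command` to the current interpreter (`sys.executable`) instead of plain `python` is my own choice to avoid picking up a different Python on the PATH.

```python
import json
import sys
from pathlib import Path


def build_stdio_config(server_script: Path, python_exe: str = sys.executable) -> dict:
    """Build an .mcp.json-style dict that launches eye2byte_mcp.py over stdio."""
    # resolve() yields an absolute path; as_posix() uses forward slashes,
    # matching the "C:/path/to/..." style of the example above.
    script = server_script.expanduser().resolve().as_posix()
    return {
        "mcpServers": {
            "eye2byte": {"command": python_exe, "args": [script]}
        }
    }


if __name__ == "__main__":
    # Run from the directory that contains eye2byte_mcp.py.
    print(json.dumps(build_stdio_config(Path("eye2byte_mcp.py")), indent=2))
```

Redirect the printed JSON into your project's `.mcp.json`, or merge the `eye2byte` entry into an existing file by hand.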
Remote Setup (SSE)
When your coding agent runs on a different machine (cloud VM, SSH dev box, CI runner) but needs to see your local screen, use SSE transport:
Step 1 — On your local machine (the one with the screen):
# Install Eye2byte + dependencies
pip install Pillow fastmcp
pip install openai-whisper # optional, for voice
# Start the SSE server
python eye2byte_mcp.py --sse # No auth (LAN only)
python eye2byte_mcp.py --sse --token mysecret123 # Bearer token auth
python eye2byte_mcp.py --sse --port 9000 --token abc # Custom port + auth
The server stays running and accepts connections from any machine on your network. Use --token when the server is reachable beyond your trusted LAN.
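To illustrate what bearer token auth involves on the server side, here is a generic sketch of checking an `Authorization` header. It is not taken from Eye2byte's source: `is_authorized` is a hypothetical function, and treating a missing token as "accept everything" is my reading of the "No auth" mode above.

```python
import hmac
from typing import Optional


def is_authorized(auth_header: Optional[str], expected_token: Optional[str]) -> bool:
    """Check an Authorization header value against the configured bearer token.

    When expected_token is None (server started without --token), every
    request is accepted.
    """
    if expected_token is None:
        return True
    if not auth_header:
        return False
    scheme, _, supplied = auth_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(supplied.strip().encode(), expected_token.encode())
```

A bearer token sent over plain HTTP is readable by anyone on the path, so for anything beyond a trusted network it is worth tunnelling the connection (for example over SSH or a VPN) as well.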
Step 2 — On the remote machine (where the coding agent runs):
Nothing to install. Just configure the MCP client to point at your local IP:
{
  "mcpServers": {
    "eye2byte": {
      "url": "http://YOUR_LOCAL_IP:8808/sse",
      "headers": {"Authorization": "Bearer mysecret123"}
    }
  }
}
Omit the headers field if the server was started without --token.
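The same client entry can be produced programmatically, which makes the optional `headers` field harder to get wrong. This is a sketch with a hypothetical helper, `build_sse_config`; the default port of 8808 is assumed from the URL and firewall examples in this section.

```python
import json
from typing import Optional


def build_sse_config(host: str, port: int = 8808, token: Optional[str] = None) -> dict:
    """Build an MCP client entry pointing at a remote Eye2byte SSE server."""
    entry: dict = {"url": f"http://{host}:{port}/sse"}
    if token:
        # Only include the header when the server was started with --token.
        entry["headers"] = {"Authorization": f"Bearer {token}"}
    return {"mcpServers": {"eye2byte": entry}}


if __name__ == "__main__":
    # 192.168.1.20 is a placeholder; substitute your machine's LAN address.
    print(json.dumps(build_sse_config("192.168.1.20", token="mysecret123"), indent=2))
```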
Find your local IP: ipconfig (Windows) or ifconfig / ip addr (Linux/macOS).
Firewall: You may need to allow inbound TCP on port 8808. On Windows, run as admin:
netsh advfirewall firewall add rule name="Eye2byte MCP" dir=in action=allow protocol=TCP localport=8808
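To confirm from the remote machine that the port is actually reachable, a plain TCP connection test is enough. This is a generic standard-library sketch, not an Eye2byte tool; `port_is_open` is a hypothetical name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Call it as `port_is_open("YOUR_LOCAL_IP", 8808)`. A `False` result with the server running usually points at a firewall rule or a wrong address.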
Multi-monitor Examples
capture_and_summarize(monitor=0) # active monitor (default)
capture_and_summarize(monitor=1) # first monitor
capture_and_summarize(monitor=2) # second monitor
capture_and_summarize(monitor=-1) # ALL monitors at once
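The `monitor` values follow a simple convention: 0 for the active display, -1 for all, and a positive number for a specific display. The sketch below spells that convention out. It is an illustration under assumptions, not the tool's implementation: `select_monitors` is hypothetical, and how the active monitor is detected is left to the caller.

```python
def select_monitors(monitor: int, count: int, active: int = 1) -> list[int]:
    """Translate the monitor argument into a list of 1-based monitor numbers.

    0  -> the active monitor
    -1 -> every monitor
    n  -> monitor n (1-based)
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if monitor == 0:
        return [active]
    if monitor == -1:
        return list(range(1, count + 1))
    if 1 <= monitor <= count:
        return [monitor]
    raise ValueError(f"monitor {monitor} is out of range for {count} display(s)")
```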
Context Pack Format
Every analysis produces a structured Context Pack:
## Goal — what the user appears to be doing
## Environment — OS, editor, repo, branch, language
## Screen State — visible panels, files, terminal output
## Signals — verbatim errors, stack traces, warnings
## Likely Situation — what's probably happening
## Suggested Next Info — what a coding agent needs next
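Since the sections are Markdown `##` headings, a consumer can split a Context Pack into a dictionary with a few lines of code. This sketch assumes each section is a `## Title` line followed by free text, which is inferred from the outline above rather than from a formal spec; `parse_context_pack` is a hypothetical helper.

```python
def parse_context_pack(markdown: str) -> dict[str, str]:
    """Split a Context Pack into {section title: body} using '## ' headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            # Text before the first heading is ignored.
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}
```

An agent could then read `pack["Signals"]` to get verbatim errors without scanning the whole summary.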
Configuration
Config: ~/.eye2byte/config.json (created on first run or via python eye2byte.py init)
| Setting | Default | Description |
|---|---|---|
| provider | "ollama" | Vision provider: ollama, gemini, openrouter, hyperbolic |
| model | "auto" | Model name or "auto" for auto-detection |
| voice_clean | true | Noise removal + pause trimming + volume normalization |
| auto_cleanup_days | 7 | Delete old captures/summaries after N days (0=disabled) |
| image_max_size | 1920 | Max image dimension before LLM processing |
| image_quality | 90 | JPEG quality (1-100) |
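For scripts that need to read these settings, a loader that overlays the file on the documented defaults might look like the sketch below. It assumes `config.json` is a flat JSON object keyed by the setting names in the table; `load_config` and its validation rules are illustrative, not Eye2byte's own loader.

```python
import json
from pathlib import Path

# Defaults copied from the settings table above.
DEFAULTS = {
    "provider": "ollama",
    "model": "auto",
    "voice_clean": True,
    "auto_cleanup_days": 7,
    "image_max_size": 1920,
    "image_quality": 90,
}

PROVIDERS = {"ollama", "gemini", "openrouter", "hyperbolic"}


def load_config(path: Path = Path.home() / ".eye2byte" / "config.json") -> dict:
    """Read config.json and fill in any missing settings from DEFAULTS."""
    config = dict(DEFAULTS)
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    if config["provider"] not in PROVIDERS:
        raise ValueError(f"unknown provider: {config['provider']!r}")
    if not 1 <= config["image_quality"] <= 100:
        raise ValueError("image_quality must be between 1 and 100")
    return config
```

A missing file simply yields the defaults, so the function is safe to call before the first run.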
Files
| File | Purpose |
|---|---|
| eye2byte.py | Core engine — capture, voice, clip, summarize, watch |
| eye2byte_ui.py | Control panel with hotkeys and annotation overlay |
| eye2byte_mcp.py | MCP server for coding agent integration |
License
MIT