Desktop automation platform — screen capture, OCR, interaction, macros, events, and integrations
Project description
Eye2byte
Desktop automation platform for AI agents.
See the screen. Find elements. Click, type, record, automate.
Eye2byte is a desktop automation platform that exposes MCP tools to AI coding agents. It captures your screen, finds UI elements via OCR/VLM, clicks and types, records macros, monitors system events, and integrates with PowerToys and EventGhost — all through the Model Context Protocol.
Screen / Voice / Annotations ──> Vision Model + Whisper ──> Context Pack ──> Coding Agent
(Ollama, Gemini, (goal, errors, (Claude Code,
OpenRouter, Hyperbolic) signals, next) Codex, Gemini CLI)
Current reference docs:
- Capability maturity map:
docs/CAPABILITIES_MAP.md - Testing and CI contracts:
docs/testing.md - Release gate checklist:
docs/release-checklist.md
What Can Eye2byte Do?
Perception — See the Screen
- Screenshot capture — full screen, active window, region, all monitors, or specific monitor
- AI analysis — sends screenshots to vision models, produces structured Context Packs
- OCR — extract text with coordinates via WinOCR (150ms) or EasyOCR fallback
- Voice — record narration, transcribe locally via Whisper, bundle with captures
- Screen clips — record video, extract keyframes, summarize sequences
Interaction — Control the Desktop
- Click, type, scroll, keypress — full mouse/keyboard control at coordinates
- Find and click — describe what you want to click in plain text, OCR+VLM finds and clicks it
- Confidence-scored matching — retry logic, fail-safe abort below threshold, deterministic ranking
Window Management — Arrange Your Workspace
- List, focus, minimize any window by name or process
- Pin/unpin windows as always-on-top
- Move and resize to exact coordinates
- Set transparency on any window (0-255 alpha)
- Get pixel color at any screen coordinate
Desktop Automation — Fill PowerToys Gaps
- Hot corners — trigger actions when mouse hits screen corners
- Text expander — type abbreviations, get expanded text (supports
{today},{time},{clipboard}) - Focus lock — prevent apps from stealing focus
- Distraction dimmer — dim all windows except the active one
- Auto-tile — i3-style window tiling (columns, rows, grid, master+stack)
- App-specific hotkeys — remap keys only when a specific app is focused
- Key-to-mouse — map any keyboard key to mouse clicks
Macro Recording — Record and Replay
- Record mouse clicks, keyboard input, and scroll events with timestamps
- Replay using the find engine (resilient to window repositioning) or raw coordinates
- Visual assertions — verify expected state after each step
- Audit logs — before/after screenshots with confidence scores for every action
Event System — React to System Changes
- 10 event types: window focus, window open/close, clipboard, global hotkeys, file system changes, USB devices, process start/stop, display changes, power state, volume
- Webhook delivery — POST events to any URL (Node-RED, custom server)
- MCP polling — buffer events and query them from your agent
- Configurable — listen to specific events, ignore the rest
Integrations — Extend with PowerToys & EventGhost
- PowerToys — detect installation, control FancyZones layouts, check file locks, resize images, quick-preview files
- EventGhost — trigger events, read/write variables, execute scripts, bridge events via WebSocket (GPL-safe HTTP API, no imports)
- Node-RED — custom node for visual workflow automation, 7 example flows, UIBUILDER dashboard
- CmdPal — PowerToys Command Palette extension (C#/.NET 9) exposes all tools to millions of users
System Control
- Set volume (0-100)
- Text-to-speech via Windows SAPI
- Keep awake — prevent sleep/display-off for a duration
- Search history — full-text search across all past Context Packs
Quick Start
1. Install
pip install eye2byte[all]
Alternative installation methods:
# Isolated app install (recommended for CLI use)
pipx install "eye2byte[all]"
# uv (fast Python package manager)
uv tool install "eye2byte[all]"
# Latest main branch (before PyPI release catches up)
pip install "git+https://github.com/wolverin0/Eye2byte.git@main"
Granular install options
pip install eye2byte # Core + MCP server (Pillow + fastmcp)
pip install eye2byte[voice] # + local voice transcription (openai-whisper)
pip install eye2byte[ui] # + control panel (customtkinter + pystray)
pip install eye2byte[ocr] # + coordinate-aware OCR (winocr on Windows, easyocr fallback)
pip install eye2byte[capture] # + optional mss in-memory capture backend
pip install eye2byte[find] # + rapidfuzz matching acceleration
pip install eye2byte[interact] # + mouse/keyboard automation (pyautogui)
pip install eye2byte[win32] # + volume control, text-to-speech (comtypes)
pip install eye2byte[bridges] # + EventGhost WebSocket bridge (websocket-client)
pip install eye2byte[uia] # + Windows UI Automation helpers
pip install eye2byte[all] # Everything
System prerequisites:
ffmpegis required for voice and clip workflows.- Linux: OCR and screenshot stacks may require additional desktop packages (
tesseract,scrot, compositor-specific deps). - macOS: grant Accessibility and Screen Recording permissions for interaction/capture tools.
- Windows: PowerToys/EventGhost integrations are optional and can be added later.
2. Configure a vision provider
| Provider | Setup | Cost |
|---|---|---|
| Ollama | Install Ollama, ollama pull qwen3-vl:8b |
Free (local) |
| Gemini | Set GEMINI_API_KEY in .env |
Free tier |
| OpenRouter | Set OPENROUTER_API_KEY in .env |
Free models available |
| Hyperbolic | Set HYPERBOLIC_API_KEY in .env |
Pay per use |
3. Run
eye2byte capture # Screenshot + analysis
eye2byte capture --voice # + voice narration
eye2byte capture --mode window # Active window only
eye2byte-ui # Launch control panel
MCP Tools
Eye2byte exposes a broad MCP tool catalog. Any MCP-compatible agent can use them.
Perception (8 tools)
| Tool | What it does |
|---|---|
capture_and_summarize |
Screenshot + vision AI analysis → Context Pack |
capture_with_voice |
Screenshot + voice recording + transcription |
record_clip_and_summarize |
Screen clip with keyframe extraction |
summarize_screenshot |
Analyze an existing image file |
transcribe_audio |
Local Whisper transcription |
get_screen_elements |
OCR with coordinates — find text positions |
get_recent_context |
Retrieve recent Context Pack summaries |
search_context_history |
Full-text search past observations |
Interaction (6 tools)
| Tool | What it does |
|---|---|
find_element |
Find UI element by text/description (OCR+VLM, confidence scored) |
find_and_click |
Find element + click in one call |
click_element |
Click at screen coordinates |
type_text |
Type text at cursor position |
press_key |
Press keyboard key or combo |
scroll_screen |
Scroll at a screen position |
Window Management (9 tools)
| Tool | What it does |
|---|---|
list_windows |
List all visible windows with titles and positions |
focus_window |
Bring window to foreground |
minimize_window |
Minimize by title or process |
pin_window |
Set always-on-top |
unpin_window |
Remove always-on-top |
move_window |
Move and resize to exact coordinates |
set_window_opacity |
Set transparency (0-255) |
get_pixel_color |
Get RGB + hex color at coordinates |
keep_awake |
Prevent sleep/display-off |
Desktop Automation (7 tools)
| Tool | What it does |
|---|---|
hot_corners |
Trigger actions on screen corner hover |
text_expander |
Type abbreviation → expanded text |
focus_lock |
Prevent focus stealing |
distraction_dimmer |
Dim inactive windows |
auto_tile |
i3-style window tiling |
app_specific_hotkeys |
Per-app key remapping |
key_to_mouse |
Map keyboard key to mouse action |
Macro Recording (6 tools)
| Tool | What it does |
|---|---|
start_macro_recording |
Begin recording mouse/keyboard actions |
stop_macro_recording |
Save recorded macro |
replay_macro |
Replay with find-engine or raw coordinates |
list_macros |
List saved macros with metadata |
add_macro_assertion |
Add visual assertion to macro step |
delete_macro |
Delete macro and associated files |
Event System (4 tools)
| Tool | What it does |
|---|---|
start_event_listener |
Monitor: focus, clipboard, hotkeys, files, USB, process, display, power, volume |
stop_event_listener |
Stop specific or all listeners |
get_events |
Poll buffered events with filters |
get_event_listeners |
List active listeners |
System (2 tools)
| Tool | What it does |
|---|---|
set_volume |
Set system volume (0-100) |
speak |
Text-to-speech via Windows SAPI |
PowerToys Integration (5 tools)
| Tool | What it does |
|---|---|
powertoys_status |
Check installation, version, available CLIs |
powertoys_fancy_zones |
Control FancyZones layouts |
powertoys_file_locksmith |
Detect processes locking files |
powertoys_image_resize |
Resize images via PowerToys |
powertoys_peek |
Quick file preview |
EventGhost Integration (7 tools)
| Tool | What it does |
|---|---|
eventghost_status |
Check if EG Webserver is running |
eventghost_trigger |
Trigger EventGhost event via HTTP |
eventghost_get_value |
Read EG variable |
eventghost_set_value |
Write EG variable |
eventghost_get_all_values |
Get all EG variables |
eventghost_execute_script |
Run Python 2.7 in EG context |
eventghost_bridge |
Manage WebSocket event bridge |
Use Cases
"Debug what I'm looking at" — Capture your screen + voice-describe the bug. Your agent gets full visual context.
"Click what you see" — find_and_click("Submit button") — OCR finds it, confidence scores it, clicks it. No coordinates needed.
"Record my workflow" — start_macro_recording("login") — record clicks and keystrokes, then replay_macro("login") re-locates elements visually.
"Dim distractions" — distraction_dimmer(180) dims everything except your active window. focus_lock("VSCode") prevents apps from stealing focus.
"Auto-tile my windows" — auto_tile("master+stack") for i3-style tiling. Rebalances automatically when windows open/close.
"React to system events" — start_event_listener(["clipboard", "focus", "hotkey:ctrl+shift+c"]) — get notified when clipboard changes, windows gain focus, or custom hotkeys fire.
"Automate with Node-RED" — Visual workflow: Timer → capture → OCR → if error keyword → send alert. Drag-and-drop automation.
"Control PowerToys" — powertoys_fancy_zones(command="set-layout", layout="Coding") — assign windows to zones programmatically.
"Bridge EventGhost" — eventghost_trigger("IR.VolumeUp") — trigger IR remote commands, control media players, read hardware events.
"Give remote agents eyes" — SSE server lets cloud agents see your local screen. Bearer token auth included.
"Search past observations" — search_context_history("CORS error login") — find when you last saw this bug and what the fix was.
MCP Integration
Local agents (stdio)
{
"mcpServers": {
"eye2byte": {
"command": "python",
"args": ["C:/path/to/eye2byte_mcp.py"]
}
}
}
Remote agents (SSE)
python eye2byte_mcp.py --sse # No auth (LAN only)
python eye2byte_mcp.py --sse --token mysecret123 # Bearer token auth
{
"mcpServers": {
"eye2byte": {
"url": "http://YOUR_LOCAL_IP:8808/sse",
"headers": {"Authorization": "Bearer mysecret123"}
}
}
}
Node-RED
Install the custom node:
cd ~/.node-red
npm install /path/to/node-red-contrib-eye2byte
PowerToys CmdPal
Build and install the extension:
cd eye2byte-cmdpal
dotnet build
Control Panel
eye2byte-ui # or: python eye2byte_ui.py
A small always-on-top floating panel with global hotkeys.
Global Hotkeys (Windows)
| Hotkey | Action |
|---|---|
Ctrl+Shift+1 |
Capture screenshot |
Ctrl+Shift+2 |
Annotate (freeze screen, draw overlay) |
Ctrl+Shift+3 |
Toggle voice recording |
Ctrl+Shift+5 |
Grab clipboard image |
Annotation Tools
| Key | Tool |
|---|---|
X |
Arrow |
C |
Circle |
V |
Rectangle |
B |
Freehand |
T |
Text |
Enter saves, Escape cancels, right-click to undo.
Architecture
eye2byte.py Core engine — capture, voice, clip, summarize
eye2byte_ui.py Control panel with hotkeys and annotation overlay
eye2byte_mcp.py MCP server (FastMCP)
eye2byte_ocr.py OCR via WinOCR (150ms) with EasyOCR fallback
eye2byte_find.py Hybrid element finding — OCR fast path + VLM fallback
eye2byte_interact.py Mouse/keyboard automation via pyautogui
eye2byte_windows.py Win32 window management + system control (ctypes)
eye2byte_desktop.py Desktop automation — hot corners, tiling, text expander
eye2byte_macro.py Macro recording and replay with visual assertions
eye2byte_events.py Native Win32 event system with webhook delivery
eye2byte_history.py Searchable context history via SQLite FTS5
eye2byte_powertoys.py PowerToys CLI wrappers (FancyZones, Locksmith, etc.)
eye2byte_eventghost.py EventGhost HTTP/WebSocket bridge (GPL-safe)
node-red-contrib-eye2byte/ Node-RED custom node + 7 example flows + dashboard
eye2byte-cmdpal/ PowerToys CmdPal extension (C#/.NET 9)
Product Tracks
- eye2byte-annotator (
v0.4.0) — lightweight screenshot annotation tool. Tagged atv0.4.0-annotator. - eye2byte-platform (
v0.5.0) — full desktop automation suite with the complete toolset.
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file eye2byte-0.7.0.tar.gz.
File metadata
- Download URL: eye2byte-0.7.0.tar.gz
- Upload date:
- Size: 226.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d4339bc7bda80c560e4b4e2247f7c2008e2687ac79fb18acdc3ece565bf29614
|
|
| MD5 |
ef02cf642f3e7bdeae06c3aaa683e1b0
|
|
| BLAKE2b-256 |
a73c61074f48fbf2d63390ce63caf7472dd53247bf37372737fc3bab95430b61
|
File details
Details for the file eye2byte-0.7.0-py3-none-any.whl.
File metadata
- Download URL: eye2byte-0.7.0-py3-none-any.whl
- Upload date:
- Size: 193.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0d577975501780bdb5cbdefcee6c3bea0b8bfd68165b8ed15e5f923c42b791d8
|
|
| MD5 |
5f0f56413e97d18add71e30db9b0c97a
|
|
| BLAKE2b-256 |
b2d55e56062362ddb87c3122e414e2301c8f0ac89869ee7752d1f0979be57da1
|