Desktop automation platform — screen capture, OCR, interaction, macros, events, and integrations

Eye2byte

Desktop automation platform for AI agents.
See the screen. Find elements. Click, type, record, automate.

PyPI | Python 3.10+ | MIT License | Windows | Changelog | Use Cases | Capabilities Map | Tests


Eye2byte is a desktop automation platform that exposes MCP tools to AI coding agents. It captures your screen, finds UI elements via OCR/VLM, clicks and types, records macros, monitors system events, and integrates with PowerToys and EventGhost — all through the Model Context Protocol.

Screen / Voice / Annotations  ──>  Vision Model + Whisper  ──>  Context Pack  ──>  Coding Agent
                                   (Ollama, Gemini,              (goal, errors,    (Claude Code,
                                    OpenRouter, Hyperbolic)       signals, next)     Codex, Gemini CLI)
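
The diagram names four parts of a Context Pack: goal, errors, signals, and next. As a rough illustration only, such a pack can be pictured as a small record serialized to JSON. The class name `ContextPack`, the field types, and the `to_json` helper are assumptions made for this sketch, not Eye2byte's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ContextPack:
    """Hypothetical shape of a Context Pack; field names follow the diagram."""
    goal: str
    errors: list[str] = field(default_factory=list)
    signals: list[str] = field(default_factory=list)
    next: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize all fields so a coding agent can consume the pack as text.
        return json.dumps(asdict(self), indent=2)


# Made-up sample values, for illustration only.
pack = ContextPack(
    goal="Fix failing login request",
    errors=["CORS error on POST /login"],
    signals=["Browser devtools console is open"],
    next=["Check the server's allowed origins"],
)
print(pack.to_json())
```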

Current reference docs:

  • Capability maturity map: docs/CAPABILITIES_MAP.md
  • Testing and CI contracts: docs/testing.md
  • Release gate checklist: docs/release-checklist.md

What Can Eye2byte Do?

Perception — See the Screen

  • Screenshot capture — full screen, active window, region, all monitors, or specific monitor
  • AI analysis — sends screenshots to vision models, produces structured Context Packs
  • OCR — extract text with coordinates via WinOCR (150ms) or EasyOCR fallback
  • Voice — record narration, transcribe locally via Whisper, bundle with captures
  • Screen clips — record video, extract keyframes, summarize sequences

Interaction — Control the Desktop

  • Click, type, scroll, keypress — full mouse/keyboard control at coordinates
  • Find and click — describe the target in plain text; OCR+VLM locates it and clicks it
  • Confidence-scored matching — retry logic, fail-safe abort below threshold, deterministic ranking
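
The confidence-scored matching described above can be sketched as follows. This is a minimal illustration that assumes OCR returns `(text, x, y)` tuples and uses `difflib` for scoring; the function names, the 0.8 threshold, and the tie-break order are assumptions, and Eye2byte's real find engine (including its retry logic and VLM fallback) may work differently.

```python
from difflib import SequenceMatcher


def rank_candidates(query, candidates):
    """Score OCR text boxes against the query and sort them deterministically.

    candidates: list of (text, x, y) tuples, e.g. from an OCR pass.
    Ties on score are broken by position (top to bottom, then left to right).
    """
    scored = [
        (SequenceMatcher(None, query.lower(), text.lower()).ratio(), text, x, y)
        for text, x, y in candidates
    ]
    return sorted(scored, key=lambda s: (-s[0], s[3], s[2]))


def pick_target(query, candidates, threshold=0.8):
    """Return (x, y) of the best match, or None to abort below the threshold."""
    ranked = rank_candidates(query, candidates)
    if not ranked or ranked[0][0] < threshold:
        return None  # fail-safe: never click on a weak match
    _, _, x, y = ranked[0]
    return (x, y)


# Hypothetical OCR output: two identical "Submit" labels at different heights.
boxes = [("Cancel", 400, 520), ("Submit", 520, 520), ("Submit", 520, 90)]
print(pick_target("submit", boxes))          # the upper "Submit" wins the tie
print(pick_target("Delete account", boxes))  # no good match, so None
```

Sorting on `(-score, y, x)` means the same screen always yields the same choice, which is one simple way to get deterministic ranking.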

Window Management — Arrange Your Workspace

  • List, focus, or minimize any window by name or process
  • Pin/unpin windows as always-on-top
  • Move and resize to exact coordinates
  • Set transparency on any window (0-255 alpha)
  • Get pixel color at any screen coordinate
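
Window transparency is given as an alpha value from 0 to 255 rather than a percentage. A small helper like the one below (hypothetical, not part of Eye2byte) converts a percentage to that range and clamps out-of-range input.

```python
def percent_to_alpha(percent: float) -> int:
    """Map 0-100 % opacity onto the 0-255 alpha range, clamping bad input."""
    clamped = max(0.0, min(100.0, percent))
    return round(clamped * 255 / 100)


print(percent_to_alpha(20))   # 51
print(percent_to_alpha(100))  # 255
```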

Desktop Automation — Fill PowerToys Gaps

  • Hot corners — trigger actions when mouse hits screen corners
  • Text expander — type abbreviations, get expanded text (supports {today}, {time}, {clipboard})
  • Focus lock — prevent apps from stealing focus
  • Distraction dimmer — dim all windows except the active one
  • Auto-tile — i3-style window tiling (columns, rows, grid, master+stack)
  • App-specific hotkeys — remap keys only when a specific app is focused
  • Key-to-mouse — map any keyboard key to mouse clicks
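
The text expander's `{today}`, `{time}`, and `{clipboard}` placeholders could be substituted roughly as shown here. The function name and the date and time formats are assumptions for this sketch; the formats Eye2byte actually emits are not stated in this document.

```python
from datetime import datetime


def expand(template: str, now: datetime, clipboard: str = "") -> str:
    """Replace {today}, {time} and {clipboard} placeholders in a snippet."""
    values = {
        "{today}": now.strftime("%Y-%m-%d"),  # date format is an assumption
        "{time}": now.strftime("%H:%M"),      # time format is an assumption
        "{clipboard}": clipboard,             # substituted last, never re-expanded
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


print(expand("Sent {today} at {time}: {clipboard}",
             datetime(2024, 1, 2, 9, 5), "hi"))
```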

Macro Recording — Record and Replay

  • Record mouse clicks, keyboard input, and scroll events with timestamps
  • Replay using the find engine (resilient to window repositioning) or raw coordinates
  • Visual assertions — verify expected state after each step
  • Audit logs — before/after screenshots with confidence scores for every action
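
Because recorded events carry timestamps, a replay can reproduce the original pacing by waiting for the gap between consecutive steps. The sketch below assumes a hypothetical `MacroStep` record and a `speed` multiplier; it only computes the delays and does not touch the mouse or keyboard.

```python
from dataclasses import dataclass


@dataclass
class MacroStep:
    timestamp: float  # seconds since recording started
    action: str       # e.g. "click", "key", "scroll"
    target: str       # label to re-find on replay, or raw "x,y" coordinates


def replay_delays(steps, speed=1.0):
    """Return the pause before each step so replay keeps the recorded rhythm."""
    delays = []
    previous = 0.0
    for step in sorted(steps, key=lambda s: s.timestamp):
        delays.append((step.timestamp - previous) / speed)
        previous = step.timestamp
    return delays


# Made-up recording of a login flow.
steps = [
    MacroStep(0.5, "click", "Username"),
    MacroStep(2.0, "key", "tab"),
    MacroStep(3.0, "click", "Sign in"),
]
print(replay_delays(steps))             # [0.5, 1.5, 1.0]
print(replay_delays(steps, speed=2.0))  # [0.25, 0.75, 0.5]
```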

Event System — React to System Changes

  • 10 event types: window focus, window open/close, clipboard, global hotkeys, file system changes, USB devices, process start/stop, display changes, power state, volume
  • Webhook delivery — POST events to any URL (Node-RED, custom server)
  • MCP polling — buffer events and query them from your agent
  • Configurable — listen to specific events, ignore the rest
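
MCP polling implies that events are held somewhere until the agent asks for them. One way to picture that is a bounded buffer with a filtered poll, sketched below. The class `EventBuffer`, its method names, and the drop-oldest policy are assumptions for illustration, not Eye2byte's implementation.

```python
from collections import deque


class EventBuffer:
    """Bounded in-memory buffer that an agent can poll for recent events."""

    def __init__(self, maxlen=1000):
        self._events = deque(maxlen=maxlen)  # oldest events are dropped first

    def push(self, event_type, payload):
        self._events.append({"type": event_type, "payload": payload})

    def poll(self, types=None):
        """Return and remove buffered events, optionally filtered by type."""
        wanted = set(types) if types else None
        matched = []
        kept = deque(maxlen=self._events.maxlen)
        for event in self._events:
            if wanted is None or event["type"] in wanted:
                matched.append(event)
            else:
                kept.append(event)  # unmatched events stay for a later poll
        self._events = kept
        return matched


buf = EventBuffer()
buf.push("clipboard", {"text": "hello"})
buf.push("focus", {"title": "Editor"})
buf.push("clipboard", {"text": "world"})
print(len(buf.poll(["clipboard"])))  # 2
print(len(buf.poll()))               # 1, the remaining focus event
```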

Integrations — Extend with PowerToys & EventGhost

  • PowerToys — detect installation, control FancyZones layouts, check file locks, resize images, quick-preview files
  • EventGhost — trigger events, read/write variables, execute scripts, bridge events via WebSocket (GPL-safe HTTP API, no imports)
  • Node-RED — custom node for visual workflow automation, 7 example flows, UIBUILDER dashboard
  • CmdPal — PowerToys Command Palette extension (C#/.NET 9) that exposes Eye2byte tools inside the Command Palette

System Control

  • Set volume (0-100)
  • Text-to-speech via Windows SAPI
  • Keep awake — prevent sleep/display-off for a duration
  • Search history — full-text search across all past Context Packs

Quick Start

1. Install

pip install "eye2byte[all]"

Alternative installation methods:

# Isolated app install (recommended for CLI use)
pipx install "eye2byte[all]"

# uv (fast Python package manager)
uv tool install "eye2byte[all]"

# Latest main branch (before PyPI release catches up)
pip install "git+https://github.com/wolverin0/Eye2byte.git@main"

Granular install options:

pip install eye2byte              # Core + MCP server (Pillow + fastmcp)
pip install "eye2byte[voice]"     # + local voice transcription (openai-whisper)
pip install "eye2byte[ui]"        # + control panel (customtkinter + pystray)
pip install "eye2byte[ocr]"       # + coordinate-aware OCR (winocr on Windows, easyocr fallback)
pip install "eye2byte[capture]"   # + optional mss in-memory capture backend
pip install "eye2byte[find]"      # + rapidfuzz matching acceleration
pip install "eye2byte[interact]"  # + mouse/keyboard automation (pyautogui)
pip install "eye2byte[win32]"     # + volume control, text-to-speech (comtypes)
pip install "eye2byte[bridges]"   # + EventGhost WebSocket bridge (websocket-client)
pip install "eye2byte[uia]"       # + Windows UI Automation helpers
pip install "eye2byte[all]"       # Everything

System prerequisites:

  • ffmpeg is required for voice and clip workflows.
  • Linux: OCR and screenshot stacks may require additional desktop packages (tesseract, scrot, compositor-specific deps).
  • macOS: grant Accessibility and Screen Recording permissions for interaction/capture tools.
  • Windows: PowerToys/EventGhost integrations are optional and can be added later.

2. Configure a vision provider

Provider    Setup                                             Cost
Ollama      Install Ollama, then run ollama pull qwen3-vl:8b  Free (local)
Gemini      Set GEMINI_API_KEY in .env                        Free tier
OpenRouter  Set OPENROUTER_API_KEY in .env                    Free models available
Hyperbolic  Set HYPERBOLIC_API_KEY in .env                    Pay per use
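
To make the provider setup concrete, the sketch below picks a provider based on which API key is present in the environment. The environment variable names come from the provider list; the function name, the priority order, and the fallback to a local Ollama install are assumptions and may not match how Eye2byte resolves its provider.

```python
import os

# Variable names are from the provider list; the priority order is an assumption.
PROVIDER_KEYS = [
    ("gemini", "GEMINI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
    ("hyperbolic", "HYPERBOLIC_API_KEY"),
]


def choose_provider(env=None):
    """Pick the first provider whose API key is set, else fall back to Ollama."""
    env = os.environ if env is None else env
    for name, key in PROVIDER_KEYS:
        if env.get(key):
            return name
    return "ollama"  # local provider, needs no API key


print(choose_provider({"OPENROUTER_API_KEY": "sk-example"}))  # openrouter
print(choose_provider({}))                                    # ollama
```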

3. Run

eye2byte capture                   # Screenshot + analysis
eye2byte capture --voice           # + voice narration
eye2byte capture --mode window     # Active window only
eye2byte-ui                        # Launch control panel

MCP Tools

Eye2byte exposes 54 MCP tools, grouped into the nine categories below. Any MCP-compatible agent can use them.

Perception (8 tools)

Tool                       What it does
capture_and_summarize      Screenshot + vision AI analysis → Context Pack
capture_with_voice         Screenshot + voice recording + transcription
record_clip_and_summarize  Screen clip with keyframe extraction
summarize_screenshot       Analyze an existing image file
transcribe_audio           Local Whisper transcription
get_screen_elements        OCR with coordinates (find text positions)
get_recent_context         Retrieve recent Context Pack summaries
search_context_history     Full-text search of past observations

Interaction (6 tools)

Tool            What it does
find_element    Find UI element by text/description (OCR+VLM, confidence scored)
find_and_click  Find element + click in one call
click_element   Click at screen coordinates
type_text       Type text at cursor position
press_key       Press keyboard key or combo
scroll_screen   Scroll at a screen position

Window Management (9 tools)

Tool                What it does
list_windows        List all visible windows with titles and positions
focus_window        Bring window to foreground
minimize_window     Minimize by title or process
pin_window          Set always-on-top
unpin_window        Remove always-on-top
move_window         Move and resize to exact coordinates
set_window_opacity  Set transparency (0-255)
get_pixel_color     Get RGB + hex color at coordinates
keep_awake          Prevent sleep/display-off

Desktop Automation (7 tools)

Tool                  What it does
hot_corners           Trigger actions on screen corner hover
text_expander         Type abbreviation → expanded text
focus_lock            Prevent focus stealing
distraction_dimmer    Dim inactive windows
auto_tile             i3-style window tiling
app_specific_hotkeys  Per-app key remapping
key_to_mouse          Map keyboard key to mouse action
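
As an illustration of what a master+stack layout means for auto_tile, the sketch below computes window rectangles for one monitor: the first window takes the left share of the screen and the rest are stacked on the right. The function name, the 60 % default share, and the integer rounding are assumptions, not Eye2byte's actual geometry.

```python
def master_stack(n, width, height, master_percent=60):
    """Return (x, y, w, h) per window: first window left, the rest stacked right."""
    if n <= 0:
        return []
    if n == 1:
        return [(0, 0, width, height)]  # a lone window fills the screen
    master_w = width * master_percent // 100
    stack_h = height // (n - 1)
    rects = [(0, 0, master_w, height)]
    for i in range(n - 1):
        rects.append((master_w, i * stack_h, width - master_w, stack_h))
    return rects


for rect in master_stack(3, 1920, 1080):
    print(rect)
```

Each rectangle could then be passed to a move-and-resize call such as move_window; the exact parameters of that tool are not shown in this document.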

Macro Recording (6 tools)

Tool                   What it does
start_macro_recording  Begin recording mouse/keyboard actions
stop_macro_recording   Save recorded macro
replay_macro           Replay with find-engine or raw coordinates
list_macros            List saved macros with metadata
add_macro_assertion    Add visual assertion to macro step
delete_macro           Delete macro and associated files

Event System (4 tools)

Tool                  What it does
start_event_listener  Monitor: focus, clipboard, hotkeys, files, USB, process, display, power, volume
stop_event_listener   Stop specific or all listeners
get_events            Poll buffered events with filters
get_event_listeners   List active listeners
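
Besides polling, events can be delivered to a webhook as an HTTP POST. The sketch below builds such a request with the standard library without sending it. The payload shape (`type` and `payload` keys) and the endpoint URL are hypothetical; the actual JSON that Eye2byte posts is not documented here.

```python
import json
import urllib.request


def build_webhook_request(url, event_type, payload):
    """Build (but do not send) a JSON POST request describing one event."""
    body = json.dumps({"type": event_type, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request(
    "http://localhost:1880/eye2byte",  # hypothetical Node-RED HTTP-in endpoint
    "clipboard",
    {"text": "hello"},
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req, timeout=5) would deliver it to a running server.
```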

System (2 tools)

Tool        What it does
set_volume  Set system volume (0-100)
speak       Text-to-speech via Windows SAPI

PowerToys Integration (5 tools)

Tool                      What it does
powertoys_status          Check installation, version, available CLIs
powertoys_fancy_zones     Control FancyZones layouts
powertoys_file_locksmith  Detect processes locking files
powertoys_image_resize    Resize images via PowerToys
powertoys_peek            Quick file preview

EventGhost Integration (7 tools)

Tool                       What it does
eventghost_status          Check if EG Webserver is running
eventghost_trigger         Trigger EventGhost event via HTTP
eventghost_get_value       Read EG variable
eventghost_set_value       Write EG variable
eventghost_get_all_values  Get all EG variables
eventghost_execute_script  Run Python 2.7 in EG context
eventghost_bridge          Manage WebSocket event bridge

Use Cases

"Debug what I'm looking at": Capture your screen and voice-describe the bug. Your agent gets full visual context.

"Click what you see": find_and_click("Submit button"). OCR finds it, confidence scores it, clicks it. No coordinates needed.

"Record my workflow": start_macro_recording("login") records clicks and keystrokes; replay_macro("login") then re-locates elements visually.

"Dim distractions": distraction_dimmer(180) dims everything except your active window. focus_lock("VSCode") prevents apps from stealing focus.

"Auto-tile my windows": auto_tile("master+stack") for i3-style tiling. Rebalances automatically when windows open or close.

"React to system events": start_event_listener(["clipboard", "focus", "hotkey:ctrl+shift+c"]). Get notified when the clipboard changes, windows gain focus, or custom hotkeys fire.

"Automate with Node-RED": Visual workflow: Timer → capture → OCR → if error keyword → send alert. Drag-and-drop automation.

"Control PowerToys": powertoys_fancy_zones(command="set-layout", layout="Coding"). Assign windows to zones programmatically.

"Bridge EventGhost": eventghost_trigger("IR.VolumeUp"). Trigger IR remote commands, control media players, read hardware events.

"Give remote agents eyes": The SSE server lets cloud agents see your local screen. Bearer token auth included.

"Search past observations": search_context_history("CORS error login"). Find when you last saw this bug and what the fix was.


MCP Integration

Local agents (stdio)

{
  "mcpServers": {
    "eye2byte": {
      "command": "python",
      "args": ["C:/path/to/eye2byte_mcp.py"]
    }
  }
}

Remote agents (SSE)

python eye2byte_mcp.py --sse                         # No auth (LAN only)
python eye2byte_mcp.py --sse --token mysecret123     # Bearer token auth

Client configuration for the remote agent:

{
  "mcpServers": {
    "eye2byte": {
      "url": "http://YOUR_LOCAL_IP:8808/sse",
      "headers": {"Authorization": "Bearer mysecret123"}
    }
  }
}
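
If several machines need this client configuration, it can be generated instead of edited by hand. The helper below is a hypothetical convenience, not part of Eye2byte; it reproduces the `mcpServers` structure and port 8808 from the example and adds the bearer header only when a token is given.

```python
import json
from typing import Optional


def remote_config(host: str, token: Optional[str] = None, port: int = 8808) -> str:
    """Render an mcpServers entry for an Eye2byte SSE server as JSON text."""
    server = {"url": f"http://{host}:{port}/sse"}
    if token:
        # Must match the value passed to --token on the server side.
        server["headers"] = {"Authorization": f"Bearer {token}"}
    return json.dumps({"mcpServers": {"eye2byte": server}}, indent=2)


# 192.168.1.20 is a placeholder LAN address.
print(remote_config("192.168.1.20", token="mysecret123"))
```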

Node-RED

Install the custom node:

cd ~/.node-red
npm install /path/to/node-red-contrib-eye2byte

PowerToys CmdPal

Build and install the extension:

cd eye2byte-cmdpal
dotnet build

Control Panel

eye2byte-ui          # or: python eye2byte_ui.py

A small always-on-top floating panel with global hotkeys.

Global Hotkeys (Windows)

Hotkey        Action
Ctrl+Shift+1  Capture screenshot
Ctrl+Shift+2  Annotate (freeze screen, draw overlay)
Ctrl+Shift+3  Toggle voice recording
Ctrl+Shift+5  Grab clipboard image

Annotation Tools

Key  Tool
X    Arrow
C    Circle
V    Rectangle
B    Freehand
T    Text

Enter saves, Escape cancels, and right-click undoes the last annotation.


Architecture

eye2byte.py              Core engine — capture, voice, clip, summarize
eye2byte_ui.py           Control panel with hotkeys and annotation overlay
eye2byte_mcp.py          MCP server (FastMCP)
eye2byte_ocr.py          OCR via WinOCR (150ms) with EasyOCR fallback
eye2byte_find.py         Hybrid element finding — OCR fast path + VLM fallback
eye2byte_interact.py     Mouse/keyboard automation via pyautogui
eye2byte_windows.py      Win32 window management + system control (ctypes)
eye2byte_desktop.py      Desktop automation — hot corners, tiling, text expander
eye2byte_macro.py        Macro recording and replay with visual assertions
eye2byte_events.py       Native Win32 event system with webhook delivery
eye2byte_history.py      Searchable context history via SQLite FTS5
eye2byte_powertoys.py    PowerToys CLI wrappers (FancyZones, Locksmith, etc.)
eye2byte_eventghost.py   EventGhost HTTP/WebSocket bridge (GPL-safe)

node-red-contrib-eye2byte/   Node-RED custom node + 7 example flows + dashboard
eye2byte-cmdpal/             PowerToys CmdPal extension (C#/.NET 9)
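
The listing says context history is searchable via SQLite FTS5. The sketch below shows that technique in isolation with Python's built-in `sqlite3` module, assuming the SQLite build includes FTS5. The table name, column names, and sample rows are invented for illustration and are not Eye2byte's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every column for full-text search.
conn.execute("CREATE VIRTUAL TABLE packs USING fts5(captured_at, summary)")
conn.executemany(
    "INSERT INTO packs VALUES (?, ?)",
    [
        # Made-up sample rows.
        ("2024-05-01", "CORS error on login request, fixed by allowing the origin"),
        ("2024-05-02", "Unit tests passing after dependency upgrade"),
    ],
)


def search(query: str):
    """Return (captured_at, summary) rows matching all query terms, best first."""
    return conn.execute(
        "SELECT captured_at, summary FROM packs WHERE packs MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


print(search("CORS error login"))
```

Space-separated terms in an FTS5 query are combined with an implicit AND, so only rows containing all three words are returned.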

Product Tracks

  • eye2byte-annotator (v0.4.0) — lightweight screenshot annotation tool. Tagged at v0.4.0-annotator.
  • eye2byte-platform (v0.5.0) — full desktop automation suite with the complete toolset.

License

MIT
