agent-eyes

Accessibility-tree vision for AI agents — see and interact with any application without screenshots.

Instead of pixel-based screen capture, agent-eyes reads the OS accessibility tree to give AI agents a structured, semantic view of every UI element on screen. The tree is the vision.

Key Advantages

  • No screenshots needed — works through accessibility APIs, not pixels
  • Cross-platform — macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
  • Native + Web — interact with desktop apps and Chrome tabs from one server
  • Shadow mode — control Chrome in the background without stealing window focus
  • Human-like input — real keyboard/mouse events that trigger all event listeners
  • Element IDs — every UI element gets an [id] for precise click/type targeting
  • OCR fallback — for apps with sparse accessibility trees, get text via screen OCR

Installation

Run directly via uvx (no install needed):

uvx agent-eyes

Linux only: install the AT-SPI2 Python bindings (pyatspi) with your system package manager:

apt install python3-pyatspi   # Debian/Ubuntu
dnf install python3-pyatspi   # Fedora

Requirements: Python 3.10+ • Chrome Extension (recommended) or Chrome with --remote-debugging-port=9222 for web tools
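
Before relying on the CDP option, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below is not part of agent-eyes. It queries Chrome's standard `/json/version` DevTools HTTP endpoint; the helper name `cdp_available` and the default host are assumptions made for illustration.

```python
import http.client
import json
import urllib.error
import urllib.request


def cdp_available(host: str = "127.0.0.1", port: int = 9222, timeout: float = 1.0) -> bool:
    """Return True if a Chrome DevTools endpoint answers on host:port."""
    url = f"http://{host}:{port}/json/version"
    # The target is local, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as unavailable.
        return False
    # Chrome reports its browser-level WebSocket URL under this key.
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info
```

Calling `cdp_available()` with no arguments checks the port used in the requirement above (9222 on localhost).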

Quick Start

As an MCP server

Add to your Claude Code config (~/.claude.json):

{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
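
If you would rather script this edit than make it by hand, the following is a minimal sketch that merges the entry above into an existing JSON config without touching other keys. The function name `add_mcp_server` and the `.bak` backup suffix are assumptions for illustration, not something agent-eyes provides.

```python
import json
import shutil
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or update one entry under "mcpServers", keeping a .bak copy first."""
    config = {}
    if config_path.exists():
        # Keep a backup so a bad edit can be undone by hand.
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For the config shown above, the call would be `add_mcp_server(Path.home() / ".claude.json", "agent-eyes", "uvx", ["agent-eyes"])`.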

Standalone

agent-eyes

First-Time Setup

After adding agent-eyes as an MCP server, run the setup wizard to auto-detect competing servers (Playwright, Puppeteer, etc.) and configure your AI tools:

/agent-eyes-init

This scans your machine for AI coding tools and competing MCP servers, then presents interactive choices to replace them with agent-eyes. All changes are backed up automatically.
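
To give a rough idea of what such a scan could look like, here is a small sketch that flags entries in an `mcpServers` mapping whose name, command, or arguments mention Playwright or Puppeteer. The wizard's real detection logic is not documented here, so the function name `find_competing_servers` and the keyword list are purely hypothetical.

```python
# Keywords taken from the two examples named above; an assumption, not the wizard's list.
COMPETING_KEYWORDS = ("playwright", "puppeteer")


def find_competing_servers(config: dict) -> list[str]:
    """Return names of mcpServers entries that mention a competing tool."""
    hits = []
    for name, entry in config.get("mcpServers", {}).items():
        # Search the entry name, its command, and every argument.
        parts = [name, str(entry.get("command", "")), *map(str, entry.get("args", []))]
        haystack = " ".join(parts).lower()
        if any(keyword in haystack for keyword in COMPETING_KEYWORDS):
            hits.append(name)
    return hits
```

The function only reports candidates; deciding whether to replace them is left to the caller, in the same spirit as the interactive choices described above.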

Tip: In Claude Code, setup uses native multi-choice prompts. In other AI tools, it falls back to text-based selection.

Tools (28)

Orientation

| Tool | Description |
| --- | --- |
| eyes_status | Check platform adapter, permissions, CDP availability |
| eyes_context | Quick snapshot: frontmost app, active window, focused element |
| eyes_list_apps | List all running apps with PIDs and window titles |
| eyes_get_focused | Get the currently focused UI element |

Reading UI

| Tool | Description |
| --- | --- |
| eyes_get_tree | Full accessibility tree of an app by PID |
| eyes_get_subtree | Drill into a specific subtree by element ID |
| eyes_find | Search elements by role, name, or value (regex/contains/exact) |
| eyes_element_at | Identify the element at screen coordinates |
| eyes_get_ocr_hints | OCR fallback: get text blocks with coordinates |

Interaction

| Tool | Description |
| --- | --- |
| eyes_click | Click an element by ID or screen coordinates |
| eyes_type | Type text into a field with real key events |
| eyes_press_key | Press keys with modifiers (Enter, Tab, Ctrl+C, etc.) |
| eyes_hover | Hover to trigger tooltips and :hover states |
| eyes_scroll | Scroll vertically/horizontally in apps or browser |
| eyes_drag | Drag and drop between coordinates |
| eyes_fill_form | Fill multiple form fields in one call |
| eyes_file_upload | Upload files to a file input element |
| eyes_wait_for | Poll until an element appears (with timeout) |
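
The polling behavior described for eyes_wait_for follows a common pattern: try, sleep, retry until a deadline. The sketch below shows that pattern in isolation. It is not the tool's implementation; `wait_for`, `probe`, and the default timings are names and values assumed for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float = 5.0, interval: float = 0.25) -> Optional[T]:
    """Call probe() until it returns a non-None value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Deadline reached without a match.
            return None
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Here `probe` stands in for any lookup that returns the element once it exists and `None` until then. Using `time.monotonic()` keeps the timeout correct even if the system clock is adjusted while waiting.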

App Management

| Tool | Description |
| --- | --- |
| eyes_app | Launch, quit, or focus an application |
| eyes_window | List, focus, minimize, close, move, or resize windows |

Chrome / Web

| Tool | Description |
| --- | --- |
| eyes_list_chrome_tabs | List all Chrome tabs (title, URL) |
| eyes_get_web_tree | Chrome tab accessibility tree via CDP |
| eyes_navigate | Navigate a tab to a URL |
| eyes_evaluate | Execute JavaScript in a tab |
| eyes_new_tab | Open a new Chrome tab |
| eyes_close_tab | Close a Chrome tab |
| eyes_handle_dialog | Accept/dismiss JS dialogs (alert, confirm, prompt) |

Shadow Mode

| Tool | Description |
| --- | --- |
| eyes_shadow | Control Chrome without focusing it: click, type, scroll, read, run JS |

How It Works

AI Agent
  ↓ MCP
agent-eyes server
  ├── Tier 1: Chrome Extension Bridge (best — no flags, cross-platform)
  │     └── chrome.scripting / chrome.tabs → fast web automation
  ├── Tier 2: CDP Persistent Connection (fast — needs debugging port)
  │     └── Single WebSocket + flat sessions → Chrome accessibility tree
  ├── Tier 3: Native Fallback (always available)
  │     ├── OS Accessibility API → structured UI tree
  │     ├── AppleScript JS injection → web interaction (macOS)
  │     └── Input Simulator → real keyboard/mouse events
  └── Desktop/Native Apps
        └── Always uses native accessibility (unchanged)

The basic workflow has three steps:

  1. Read: eyes_get_tree returns every button, text field, heading, link, etc. as a numbered tree
  2. Find: eyes_find searches by role/name/value, or eyes_element_at for coordinate lookup
  3. Act: eyes_click, eyes_type, eyes_press_key target elements by their [id]
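
The read and find steps can be pictured with a toy model: a tree of nodes, each carrying a numeric id, a role, and a name, plus a search over it. The sketch below is only a mental model. The `Node` class, the `render` output format, and the `find` signature are assumptions and do not reflect the actual data structures or tool parameters of agent-eyes.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    id: int
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[tuple[Node, int]]:
    """Yield every node with its depth, parents before children."""
    yield node, depth
    for child in node.children:
        yield from walk(child, depth + 1)


def render(root: Node) -> str:
    """Print the tree as indented '[id] role "name"' lines."""
    return "\n".join(f'{"  " * depth}[{n.id}] {n.role} "{n.name}"' for n, depth in walk(root))


def find(root: Node, role: Optional[str] = None, name: Optional[str] = None,
         mode: str = "contains") -> list[Node]:
    """Filter nodes by role and by name using exact, contains, or regex matching."""
    matchers = {
        "exact": lambda text: text == name,
        "contains": lambda text: name.lower() in text.lower(),
        "regex": lambda text: re.search(name, text) is not None,
    }
    matches = matchers[mode]
    return [n for n, _ in walk(root)
            if (role is None or n.role == role) and (name is None or matches(n.name))]


login = Node(1, "window", "Login", [
    Node(2, "textfield", "Email"),
    Node(3, "textfield", "Password"),
    Node(4, "button", "Sign in"),
])
print(render(login))
print([n.id for n in find(login, role="button", name="sign")])
```

The id returned by the search is what the act step would then pass to a click or type call.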

Connection Tiers

agent-eyes automatically selects the best available connection method:

| Tier | Method | Setup Required | Performance | Cross-Platform |
| --- | --- | --- | --- | --- |
| 1 | Chrome Extension Bridge | Install extension | Fastest | Yes |
| 2 | CDP Persistent Connection | --remote-debugging-port=9222 flag | Fast | Yes |
| 3 | Native Fallback | None | Good | Yes |

Use eyes_status to see which tier is currently active.
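
The selection order amounts to a priority fallback: use the first tier whose prerequisite is met. A minimal sketch of that idea follows. The `Tier` enum and `select_tier` function are hypothetical names, and the two boolean inputs stand in for whatever checks the server actually performs.

```python
from enum import IntEnum


class Tier(IntEnum):
    EXTENSION_BRIDGE = 1
    CDP = 2
    NATIVE = 3


def select_tier(extension_connected: bool, cdp_reachable: bool) -> Tier:
    """Pick the lowest-numbered tier whose prerequisite is satisfied."""
    if extension_connected:
        return Tier.EXTENSION_BRIDGE
    if cdp_reachable:
        return Tier.CDP
    # Tier 3 needs no setup, so it is always the last resort.
    return Tier.NATIVE
```

Because the native fallback has no prerequisite, the function always returns a usable tier.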

Supported Platforms

| Platform | Native Adapter | Web (Chrome) | Shadow Mode |
| --- | --- | --- | --- |
| macOS | AXUIElement + pyobjc (Tier 3) | Extension Bridge (Tier 1) or CDP (Tier 2) | Yes |
| Windows | UI Automation + pywinauto (Tier 3) | Extension Bridge (Tier 1) or CDP (Tier 2) | Yes |
| Linux | AT-SPI2 + pyatspi (Tier 3) | Extension Bridge (Tier 1) or CDP (Tier 2) | Yes |

License

MIT

