agent-eyes

Accessibility-tree vision for AI agents — see and interact with any application without screenshots.

Instead of capturing pixels, agent-eyes reads the operating system's accessibility tree, giving AI agents a structured, semantic view of the UI elements each application exposes (roles, names, values, and positions). The tree is the vision.

Key Advantages

  • No screenshots needed — works through accessibility APIs, not pixels
  • Cross-platform — macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
  • Native + Web — interact with desktop apps and Chrome tabs from one server
  • Shadow mode — control Chrome in the background without stealing window focus
  • Human-like input — real keyboard/mouse events that trigger all event listeners
  • Element IDs — every UI element gets an [id] for precise click/type targeting
  • OCR fallback — for apps with sparse accessibility trees, get text via screen OCR
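The Element IDs idea is easiest to picture with a small model. The sketch below numbers a tree of UI elements depth-first and prints one indented line per element with its [id]. The `UIElement` class, the depth-first numbering, and the output layout are assumptions for illustration, not agent-eyes' actual data model or output format.

```python
from dataclasses import dataclass, field


@dataclass
class UIElement:
    """Hypothetical accessibility node: a role, an optional name, and children."""
    role: str
    name: str = ""
    children: list["UIElement"] = field(default_factory=list)


def number_tree(root: UIElement) -> dict[int, UIElement]:
    """Assign depth-first [id]s starting at 0 and return an id -> element map."""
    ids: dict[int, UIElement] = {}

    def visit(node: UIElement) -> None:
        ids[len(ids)] = node
        for child in node.children:
            visit(child)

    visit(root)
    return ids


def render(root: UIElement) -> str:
    """Render one indented line per element, using the same depth-first numbering."""
    lines: list[str] = []

    def visit(node: UIElement, depth: int) -> None:
        label = f"[{len(lines)}] {node.role}"
        if node.name:
            label += f' "{node.name}"'
        lines.append("  " * depth + label)
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines)


window = UIElement("window", "Login", [
    UIElement("text_field", "Email"),
    UIElement("button", "Sign in"),
])
print(render(window))
# [0] window "Login"
#   [1] text_field "Email"
#   [2] button "Sign in"
```

Because both functions walk the tree in the same order, an agent can read an [id] from the rendered text and resolve it back to the element through the map returned by `number_tree`.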

Installation

Run directly via uvx (no install needed):

uvx agent-eyes

Linux only: install the AT-SPI2 Python bindings (pyatspi) with your system package manager:

sudo apt install python3-pyatspi   # Debian/Ubuntu
sudo dnf install python3-pyatspi   # Fedora

Requirements: Python 3.10+. The web (Chrome) tools also need Chrome started with --remote-debugging-port=9222.

Quick Start

As an MCP server

Add to your Claude Code config (~/.claude.json):

{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
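If ~/.claude.json already holds other settings, editing it by hand risks clobbering them. The sketch below merges the agent-eyes entry from the config above into an existing file using only the json module; `with_agent_eyes` and `install` are hypothetical helpers, not part of agent-eyes or Claude Code.

```python
import json
from pathlib import Path


def with_agent_eyes(config: dict) -> dict:
    """Return a copy of config with mcpServers["agent-eyes"] added; other keys are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the JSON config above: launch the server through uvx.
    servers["agent-eyes"] = {"command": "uvx", "args": ["agent-eyes"]}
    merged["mcpServers"] = servers
    return merged


def install(path: Path = Path.home() / ".claude.json") -> None:
    """Merge the entry into the config file at path, creating the file if needed."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_agent_eyes(config), indent=2) + "\n")
```

Returning a new dict instead of mutating the input keeps the merge easy to test; back up the real file before calling `install()` on it.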

Standalone (after pip install agent-eyes)

agent-eyes

Tools (28)

Orientation

Tool              Description
eyes_status       Check platform adapter, permissions, CDP availability
eyes_context      Quick snapshot: frontmost app, active window, focused element
eyes_list_apps    List all running apps with PIDs and window titles
eyes_get_focused  Get the currently focused UI element

Reading UI

Tool                Description
eyes_get_tree       Full accessibility tree of an app by PID
eyes_get_subtree    Drill into a specific subtree by element ID
eyes_find           Search elements by role, name, or value (regex/contains/exact)
eyes_element_at     Identify the element at screen coordinates
eyes_get_ocr_hints  OCR fallback: text blocks with their screen coordinates

Interaction

Tool              Description
eyes_click        Click an element by ID or screen coordinates
eyes_type         Type text into a field with real key events
eyes_press_key    Press a key or key combination (Enter, Tab, Ctrl+C, etc.)
eyes_hover        Hover to trigger tooltips and :hover states
eyes_scroll       Scroll vertically or horizontally in native apps or the browser
eyes_drag         Drag and drop from one screen coordinate to another
eyes_fill_form    Fill multiple form fields in one call
eyes_file_upload  Upload files to a file input element
eyes_wait_for     Poll until an element appears or a timeout expires
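eyes_wait_for polls until an element appears or a timeout expires. A generic version of that loop is sketched below; `probe` stands in for any lookup (for example, an element search), and the function name and default intervals are assumptions rather than the tool's actual parameters.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float = 10.0, interval: float = 0.25) -> T:
    """Call probe() until it returns something other than None, or raise TimeoutError."""
    deadline = time.monotonic() + timeout  # monotonic clock: immune to wall-clock changes
    while True:
        result = probe()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout} s")
        # Sleep for the poll interval, but never past the deadline.
        time.sleep(min(interval, remaining))
```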

App Management

Tool         Description
eyes_app     Launch, quit, or focus an application
eyes_window  List, focus, minimize, close, move, or resize windows

Chrome / Web

Tool                   Description
eyes_list_chrome_tabs  List all open Chrome tabs (title, URL)
eyes_get_web_tree      Accessibility tree of a Chrome tab via CDP
eyes_navigate          Navigate a tab to a URL
eyes_evaluate          Execute JavaScript in a tab
eyes_new_tab           Open a new Chrome tab
eyes_close_tab         Close a Chrome tab
eyes_handle_dialog     Accept or dismiss JS dialogs (alert, confirm, prompt)
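Chrome's DevTools HTTP interface lists open targets at /json/list on the debugging port, and each entry carries fields such as type, title, url, and webSocketDebuggerUrl. A tab lister in the spirit of eyes_list_chrome_tabs could keep only the "page" targets, as in this sketch; the function names are hypothetical and not agent-eyes internals.

```python
import json
import urllib.request


def page_tabs(targets: list[dict]) -> list[dict]:
    """Keep only ordinary tabs (type 'page'), dropping service workers, extensions, etc."""
    return [
        {"id": t.get("id"), "title": t.get("title", ""), "url": t.get("url", "")}
        for t in targets
        if t.get("type") == "page"
    ]


def list_chrome_tabs(port: int = 9222) -> list[dict]:
    """Fetch /json/list from a Chrome started with --remote-debugging-port."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/json/list", timeout=2) as resp:
        return page_tabs(json.load(resp))
```

Keeping the filtering in `page_tabs` separate from the HTTP call lets it be checked against captured /json/list output without a running browser.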

Shadow Mode

Tool         Description
eyes_shadow  Control Chrome without focusing it: click, type, scroll, read, run JS
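The README does not say how shadow mode is implemented. One way to click inside a background tab without raising its window is the CDP command Input.dispatchMouseEvent, which delivers input straight to the page. The sketch below only builds the JSON messages for such a click; sending them over the tab's webSocketDebuggerUrl needs a WebSocket client, which is omitted, and nothing here is confirmed to be what eyes_shadow actually does.

```python
import itertools
import json

_ids = itertools.count(1)  # each CDP command carries a caller-chosen, increasing id


def cdp_command(method: str, **params) -> str:
    """Serialize one CDP command as the JSON text frame sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def background_click(x: float, y: float) -> list[str]:
    """A left click at viewport coordinates (CSS pixels): press, then release."""
    common = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [
        cdp_command("Input.dispatchMouseEvent", type="mousePressed", **common),
        cdp_command("Input.dispatchMouseEvent", type="mouseReleased", **common),
    ]
```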

How It Works

AI Agent
  ↓ MCP
agent-eyes server
  ├── Native Adapter (macOS / Windows / Linux)
  │     └── OS Accessibility API → structured UI tree
  ├── CDP Client (Chrome DevTools Protocol)
  │     └── Chrome tabs → web accessibility tree + JS execution
  └── Input Simulator
        └── Real keyboard/mouse events → human-like interaction

  1. Read: eyes_get_tree returns every button, text field, heading, link, etc. as a numbered tree.
  2. Find: eyes_find searches by role, name, or value; eyes_element_at looks up the element at given screen coordinates.
  3. Act: eyes_click, eyes_type, and eyes_press_key target elements by their [id].
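Coordinate lookup in the Find step amounts to hit-testing: find the deepest element whose on-screen bounding box contains the point. OS accessibility APIs usually provide this natively (macOS, for example, has AXUIElementCopyElementAtPosition), so agent-eyes may well delegate to them; this pure-Python sketch with a hypothetical `Node` shape only illustrates the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical element with a screen-space bounding box (x, y, width, height)."""
    name: str
    x: int
    y: int
    w: int
    h: int
    children: list["Node"] = field(default_factory=list)

    def contains(self, px: int, py: int) -> bool:
        # Half-open box: left/top edges are inside, right/bottom edges are not.
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def element_at(root: Node, px: int, py: int) -> Optional[Node]:
    """Return the deepest node containing the point, or None if the root misses it."""
    if not root.contains(px, py):
        return None
    # Later children are assumed to be drawn on top, so check them first.
    for child in reversed(root.children):
        hit = element_at(child, px, py)
        if hit is not None:
            return hit
    return root
```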

Supported Platforms

Platform  Native Adapter             Web (Chrome)                Shadow Mode
macOS     AXUIElement + pyobjc       CDP + AppleScript fallback  Yes
Windows   UI Automation + pywinauto  CDP                         Yes
Linux     AT-SPI2 + pyatspi          CDP                         Yes
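The table implies the server picks one native backend per OS. A sketch of that dispatch, keyed on Python's sys.platform values ("darwin", "win32", and strings starting with "linux"), might look like this; the returned strings are the table's labels, not agent-eyes module names.

```python
import sys

# Labels taken from the Supported Platforms table; not actual module names.
ADAPTERS = {
    "darwin": "AXUIElement + pyobjc",
    "win32": "UI Automation + pywinauto",
    "linux": "AT-SPI2 + pyatspi",
}


def native_adapter(platform: str = sys.platform) -> str:
    """Pick the accessibility backend label for a sys.platform string."""
    key = "linux" if platform.startswith("linux") else platform
    try:
        return ADAPTERS[key]
    except KeyError:
        raise RuntimeError(f"no accessibility adapter for platform {platform!r}") from None
```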

License

MIT


Download files


Source Distribution

agent_eyes-0.3.1.tar.gz (77.7 kB)

Built Distribution

agent_eyes-0.3.1-py3-none-any.whl (87.2 kB)

File details

Details for the file agent_eyes-0.3.1.tar.gz.

File metadata

  • Download URL: agent_eyes-0.3.1.tar.gz
  • Size: 77.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_eyes-0.3.1.tar.gz
Algorithm    Hash digest
SHA256       99340005b981f8b47b23f1b6a861f9d8498a25fc3315add3a56d4bf74d7adae7
MD5          1b31f7ee8754e245fb66195342c4a42f
BLAKE2b-256  ae9727b560d6d66d487838babc7770c63ac4628f74cb071df2b9c325d9901ca4
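To check a download against the SHA256 digest listed for agent_eyes-0.3.1.tar.gz, hash the file with hashlib and compare. This is a minimal stdlib sketch; the function names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 published for agent_eyes-0.3.1.tar.gz.
EXPECTED = "99340005b981f8b47b23f1b6a861f9d8498a25fc3315add3a56d4bf74d7adae7"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("agent_eyes-0.3.1.tar.gz"))` should return True for an intact copy of the source archive.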

Provenance

The following attestation bundles were made for agent_eyes-0.3.1.tar.gz:

Publisher: publish.yml on jellythomas/agent-eyes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agent_eyes-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: agent_eyes-0.3.1-py3-none-any.whl
  • Size: 87.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_eyes-0.3.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       1823b42305b90d6774d7d35fc65ba73f75d0fbfb2e8845a75ceb1a904a9a9e7c
MD5          eda5bb7b169546868cad6b202addd706
BLAKE2b-256  d8842b8039b649a5112aa60f11c6953e746df305717564a246cc8467f0b9392d

Provenance

The following attestation bundles were made for agent_eyes-0.3.1-py3-none-any.whl:

Publisher: publish.yml on jellythomas/agent-eyes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
