agent-eyes

Accessibility-tree vision for AI agents — see and interact with any application without screenshots.

Instead of pixel-based screen capture, agent-eyes reads the OS accessibility tree to give AI agents a structured, semantic view of every UI element on screen. The tree is the vision.

Key Advantages

  • No screenshots needed — works through accessibility APIs, not pixels
  • Cross-platform — macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
  • Native + Web — interact with desktop apps and Chrome tabs from one server
  • Shadow mode — control Chrome in the background without stealing window focus
  • Human-like input — real keyboard/mouse events that trigger all event listeners
  • Element IDs — every UI element gets an [id] for precise click/type targeting
  • OCR fallback — for apps with sparse accessibility trees, get text via screen OCR

Installation

Run directly via uvx (no install needed):

uvx agent-eyes

Linux only: install the AT-SPI2 Python bindings via your system package manager:

apt install python3-pyatspi   # Debian/Ubuntu
dnf install python3-pyatspi   # Fedora

Requirements: Python 3.10+ • Chrome started with --remote-debugging-port=9222 (for the web tools)

Quick Start

As an MCP server

Add to your Claude Code config (~/.claude.json):

{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
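If ~/.claude.json already holds other settings, the entry can be merged in rather than pasted by hand. The following is a minimal sketch, assuming the file is plain JSON with an optional top-level "mcpServers" object; the helper names add_agent_eyes and update_config_file are made up for illustration and are not part of agent-eyes.

```python
import json
from pathlib import Path


def add_agent_eyes(config: dict) -> dict:
    """Return a copy of config with the agent-eyes MCP server entry added."""
    updated = dict(config)
    # Copy the existing server map so other servers are preserved.
    servers = dict(updated.get("mcpServers", {}))
    servers["agent-eyes"] = {"command": "uvx", "args": ["agent-eyes"]}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path) -> None:
    """Read the config at path (if present), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_agent_eyes(config), indent=2) + "\n")
```

Calling update_config_file(Path.home() / ".claude.json") would apply the change; backing up the file first is a sensible precaution.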

Standalone

agent-eyes

Tools (28)

Orientation

Tool              Description
eyes_status       Check platform adapter, permissions, CDP availability
eyes_context      Quick snapshot: frontmost app, active window, focused element
eyes_list_apps    List all running apps with PIDs and window titles
eyes_get_focused  Get the currently focused UI element

Reading UI

Tool                Description
eyes_get_tree       Full accessibility tree of an app by PID
eyes_get_subtree    Drill into a specific subtree by element ID
eyes_find           Search elements by role, name, or value (regex/contains/exact)
eyes_element_at     Identify the element at screen coordinates
eyes_get_ocr_hints  OCR fallback: get text blocks with coordinates

Interaction

Tool              Description
eyes_click        Click an element by ID or screen coordinates
eyes_type         Type text into a field with real key events
eyes_press_key    Press keys with modifiers (Enter, Tab, Ctrl+C, etc.)
eyes_hover        Hover to trigger tooltips and :hover states
eyes_scroll       Scroll vertically/horizontally in apps or browser
eyes_drag         Drag and drop between coordinates
eyes_fill_form    Fill multiple form fields in one call
eyes_file_upload  Upload files to a file input element
eyes_wait_for     Poll until an element appears (with timeout)
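The poll-with-timeout behavior described for eyes_wait_for follows a common pattern. The sketch below illustrates that pattern in general terms and is not the agent-eyes implementation; the name wait_for, the probe callback, and the default timeout and interval are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> Optional[T]:
    """Call probe repeatedly until it returns a non-None value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # Timed out without a match.
        time.sleep(interval)
```

Here probe would stand for a lookup such as "search the tree for a button named Save", returning the element once it exists. Using time.monotonic keeps the deadline immune to wall-clock adjustments.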

App Management

Tool         Description
eyes_app     Launch, quit, or focus an application
eyes_window  List, focus, minimize, close, move, or resize windows

Chrome / Web

Tool                   Description
eyes_list_chrome_tabs  List all Chrome tabs (title, URL)
eyes_get_web_tree      Chrome tab accessibility tree via CDP
eyes_navigate          Navigate a tab to a URL
eyes_evaluate          Execute JavaScript in a tab
eyes_new_tab           Open a new Chrome tab
eyes_close_tab         Close a Chrome tab
eyes_handle_dialog     Accept/dismiss JS dialogs (alert, confirm, prompt)
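When Chrome runs with --remote-debugging-port=9222, it exposes an HTTP endpoint at /json/list that describes its debuggable targets. The sketch below shows how a tab listing could be derived from that endpoint. It is an illustration only: how agent-eyes talks to Chrome internally is not documented here, and the function names page_tabs and fetch_targets are hypothetical.

```python
import json
from urllib.request import urlopen


def page_tabs(targets: list[dict]) -> list[tuple[str, str]]:
    """Keep only real tabs (type "page") and return (title, url) pairs."""
    return [
        (t.get("title", ""), t.get("url", ""))
        for t in targets
        if t.get("type") == "page"
    ]


def fetch_targets(port: int = 9222, timeout: float = 2.0) -> list[dict]:
    """Fetch the raw target list from a locally running Chrome."""
    with urlopen(f"http://127.0.0.1:{port}/json/list", timeout=timeout) as resp:
        return json.load(resp)
```

With Chrome running, page_tabs(fetch_targets()) would yield one pair per open tab. Entries of other types, such as service workers, are filtered out.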

Shadow Mode

Tool         Description
eyes_shadow  Control Chrome without focusing it: click, type, scroll, read, run JS

How It Works

AI Agent
  ↓ MCP
agent-eyes server
  ├── Native Adapter (macOS / Windows / Linux)
  │     └── OS Accessibility API → structured UI tree
  ├── CDP Client (Chrome DevTools Protocol)
  │     └── Chrome tabs → web accessibility tree + JS execution
  └── Input Simulator
        └── Real keyboard/mouse events → human-like interaction

  1. Read: eyes_get_tree returns every button, text field, heading, link, etc. as a numbered tree
  2. Find: eyes_find searches by role/name/value, or eyes_element_at for coordinate lookup
  3. Act: eyes_click, eyes_type, eyes_press_key target elements by their [id]
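The Read and Find steps can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the server's data model: the Node class, the walk and find helpers, and the case-insensitive "contains" matching are all invented for illustration. Only the idea of numbered elements searched by role and name in exact, contains, or regex mode comes from the description above.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical accessibility-tree node with a numeric id."""
    id: int
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def matches(text: str, query: str, mode: str) -> bool:
    """Compare text to query in exact, contains, or regex mode."""
    if mode == "exact":
        return text == query
    if mode == "contains":
        return query.lower() in text.lower()  # assumption: case-insensitive
    if mode == "regex":
        return re.search(query, text) is not None
    raise ValueError(f"unknown match mode: {mode}")


def find(root: Node, role: Optional[str] = None,
         name: Optional[str] = None, mode: str = "contains") -> list[int]:
    """Return the ids of nodes whose role and name satisfy the filters."""
    return [
        n.id for n in walk(root)
        if (role is None or n.role == role)
        and (name is None or matches(n.name, name, mode))
    ]


tree = Node(1, "window", "Editor", [
    Node(2, "button", "Save"),
    Node(3, "button", "Save As"),
    Node(4, "textfield", "Filename"),
])
save_ids = find(tree, role="button", name="save")
```

The ids returned by such a search are what the Act step would then pass to a click or type call.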

Supported Platforms

Platform  Native Adapter             Web (Chrome)                Shadow Mode
macOS     AXUIElement + pyobjc       CDP + AppleScript fallback  Yes
Windows   UI Automation + pywinauto  CDP                         Yes
Linux     AT-SPI2 + pyatspi          CDP                         Yes

License

MIT

Source Distribution

agent_eyes-0.3.0.tar.gz (77.1 kB view details)

Uploaded Source

Built Distribution

agent_eyes-0.3.0-py3-none-any.whl (86.6 kB view details)

Uploaded Python 3

File details

Details for the file agent_eyes-0.3.0.tar.gz.

File metadata

  • Download URL: agent_eyes-0.3.0.tar.gz
  • Size: 77.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_eyes-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       05b82c8c38b9c71cf529f04d187434dd8bad05552440fdbe19faabc7a9ccb328
MD5          fc64fd45aab775baaed7b8fb5d2558ca
BLAKE2b-256  045b9004f0a07f421a31cdc3a4e2be8412c3399fa57f34bcdf48385ea5e6891d

Provenance

The following attestation bundles were made for agent_eyes-0.3.0.tar.gz:

Publisher: publish.yml on jellythomas/agent-eyes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agent_eyes-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: agent_eyes-0.3.0-py3-none-any.whl
  • Size: 86.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_eyes-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       d71b1b83f74004fc25f405f46909c4ea0bf67e547b08433e938a050082ac9fc4
MD5          c98c3b77714aa0953d991bafacd57f5f
BLAKE2b-256  f99bbfed641d30c919b4f46194377ae146a29914d951f69f4cf86656b3ab1a28

Provenance

The following attestation bundles were made for agent_eyes-0.3.0-py3-none-any.whl:

Publisher: publish.yml on jellythomas/agent-eyes

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
