agent-eyes
Accessibility-tree vision for AI agents — see and interact with any application without screenshots.
Instead of pixel-based screen capture, agent-eyes reads the OS accessibility tree to give AI agents a structured, semantic view of every UI element on screen. The tree is the vision.
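To make the idea of a structured view concrete, here is a minimal sketch of how an accessibility-tree snapshot could be modeled and flattened into numbered lines. The `UINode` class, the `render` function, and the output format are assumptions made for illustration; they are not taken from the agent-eyes source.

```python
from dataclasses import dataclass, field


@dataclass
class UINode:
    """One element in a hypothetical accessibility-tree snapshot."""
    role: str
    name: str = ""
    children: list["UINode"] = field(default_factory=list)


def render(root: UINode) -> list[str]:
    """Walk the tree depth-first and give every element a numeric [id]."""
    lines: list[str] = []

    def walk(node: UINode, depth: int) -> None:
        node_id = len(lines) + 1  # ids follow depth-first visiting order
        label = f' "{node.name}"' if node.name else ""
        lines.append(f"{'  ' * depth}[{node_id}] {node.role}{label}")
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical login window with three child elements.
window = UINode("window", "Login", [
    UINode("text_field", "Email"),
    UINode("text_field", "Password"),
    UINode("button", "Sign in"),
])
rendered = render(window)
print("\n".join(rendered))
```

An agent reading this output can refer to the button as element 4 instead of guessing pixel coordinates.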
Key Advantages
- No screenshots needed: works through accessibility APIs, not pixels
- Cross-platform: macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
- Native + Web: interact with desktop apps and Chrome tabs from one server
- Shadow mode: control Chrome in the background without stealing window focus
- Human-like input: real keyboard/mouse events that trigger all event listeners
- Element IDs: every UI element gets an `[id]` for precise click/type targeting
- OCR fallback: for apps with sparse accessibility trees, get text via screen OCR
Installation
Run directly via uvx (no install needed):
uvx agent-eyes
Linux only — install AT-SPI2 via system package manager
apt install python3-pyatspi # Debian/Ubuntu
dnf install python3-pyatspi # Fedora
Requirements: Python 3.10+ • Chrome Extension (recommended) or Chrome with --remote-debugging-port=9222 for web tools
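If you use the debugging-port route, it can help to confirm that Chrome is actually listening before starting the server. The sketch below queries the `/json/version` HTTP endpoint that Chrome serves when launched with `--remote-debugging-port`. The helper names `cdp_version_url` and `cdp_browser` are hypothetical and not part of agent-eyes.

```python
import http.client
import json
import urllib.request
from typing import Optional


def cdp_version_url(port: int = 9222, host: str = "127.0.0.1") -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def cdp_browser(port: int = 9222, timeout: float = 1.0) -> Optional[str]:
    """Return Chrome's reported browser string, or None if CDP is unreachable."""
    try:
        with urllib.request.urlopen(cdp_version_url(port), timeout=timeout) as response:
            return json.load(response).get("Browser")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, or a non-JSON reply: treat as unavailable.
        return None


if __name__ == "__main__":
    print(cdp_browser() or "CDP not reachable on port 9222")
```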
Quick Start
As an MCP server
Add to your Claude Code config (~/.claude.json):
{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
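If your config already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the config is plain JSON with a top-level `mcpServers` object as shown above; the `add_mcp_server` helper and the `other-server` entry are hypothetical.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of config with one more entry under "mcpServers"."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    return updated


# Hypothetical existing config that already has one server registered.
existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
merged = add_mcp_server(existing, "agent-eyes", "uvx", ["agent-eyes"])
print(json.dumps(merged, indent=2))
```

The original dictionary is left untouched, so you can write a backup of it before saving the merged result.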
Standalone
agent-eyes
First-Time Setup
After adding agent-eyes as an MCP server, run the setup wizard to auto-detect competing servers (Playwright, Puppeteer, etc.) and configure your AI tools:
/agent-eyes-init
This scans your machine for AI coding tools and competing MCP servers, then presents interactive choices to replace them with agent-eyes. All changes are backed up automatically.
Tip: In Claude Code, setup uses native multi-choice prompts. In other AI tools, it falls back to text-based selection.
Tools (28)
Orientation
| Tool | Description |
|---|---|
| `eyes_status` | Check platform adapter, permissions, CDP availability |
| `eyes_context` | Quick snapshot: frontmost app, active window, focused element |
| `eyes_list_apps` | List all running apps with PIDs and window titles |
| `eyes_get_focused` | Get the currently focused UI element |
Reading UI
| Tool | Description |
|---|---|
| `eyes_get_tree` | Full accessibility tree of an app by PID |
| `eyes_get_subtree` | Drill into a specific subtree by element ID |
| `eyes_find` | Search elements by role, name, or value (regex/contains/exact) |
| `eyes_element_at` | Identify the element at screen coordinates |
| `eyes_get_ocr_hints` | OCR fallback: get text blocks with coordinates |
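The three match modes listed for `eyes_find` (regex, contains, exact) can be illustrated with a small standalone sketch. The element dictionaries, the `matches` and `find` helpers, and the case-insensitive behaviour of "contains" are all assumptions for illustration; the real tool's parameters and semantics may differ.

```python
import re


def matches(text: str, pattern: str, mode: str = "contains") -> bool:
    """Compare text against pattern using one of three match modes."""
    if mode == "exact":
        return text == pattern
    if mode == "contains":
        return pattern.lower() in text.lower()  # assumed case-insensitive
    if mode == "regex":
        return re.search(pattern, text) is not None
    raise ValueError(f"unknown match mode: {mode}")


def find(elements, role=None, name=None, mode="contains"):
    """Return the ids of elements whose role and name satisfy the filters."""
    return [
        e["id"] for e in elements
        if (role is None or e["role"] == role)
        and (name is None or matches(e["name"], name, mode))
    ]


# Hypothetical flattened tree.
elements = [
    {"id": 1, "role": "button", "name": "Save"},
    {"id": 2, "role": "button", "name": "Save As..."},
    {"id": 3, "role": "link", "name": "Help"},
]

print(find(elements, name="save"))                   # contains
print(find(elements, name="Save", mode="exact"))     # exact
print(find(elements, name=r"^Save$", mode="regex"))  # regex
```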
Interaction
| Tool | Description |
|---|---|
| `eyes_click` | Click an element by ID or screen coordinates |
| `eyes_type` | Type text into a field with real key events |
| `eyes_press_key` | Press keys with modifiers (Enter, Tab, Ctrl+C, etc.) |
| `eyes_hover` | Hover to trigger tooltips and :hover states |
| `eyes_scroll` | Scroll vertically/horizontally in apps or browser |
| `eyes_drag` | Drag and drop between coordinates |
| `eyes_fill_form` | Fill multiple form fields in one call |
| `eyes_file_upload` | Upload files to a file input element |
| `eyes_wait_for` | Poll until an element appears (with timeout) |
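The polling behaviour described for `eyes_wait_for` follows a common pattern: probe, sleep, repeat until a deadline. The sketch below shows that pattern in isolation. The `wait_for` function, its parameters, and the fake probe are hypothetical and do not reflect the tool's actual signature.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]],
             timeout: float = 5.0,
             interval: float = 0.1) -> Optional[T]:
    """Call probe repeatedly until it returns a value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # element never appeared
        time.sleep(interval)


# Fake probe: the element "appears" on the third poll.
attempts = {"count": 0}


def fake_probe():
    attempts["count"] += 1
    return {"id": 7, "role": "button"} if attempts["count"] >= 3 else None


found = wait_for(fake_probe, timeout=2.0, interval=0.01)
print(found, "after", attempts["count"], "polls")
```

Using `time.monotonic()` keeps the deadline stable even if the system clock is adjusted while waiting.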
App Management
| Tool | Description |
|---|---|
| `eyes_app` | Launch, quit, or focus an application |
| `eyes_window` | List, focus, minimize, close, move, or resize windows |
Chrome / Web
| Tool | Description |
|---|---|
| `eyes_list_chrome_tabs` | List all Chrome tabs (title, URL) |
| `eyes_get_web_tree` | Chrome tab accessibility tree via CDP |
| `eyes_navigate` | Navigate a tab to a URL |
| `eyes_evaluate` | Execute JavaScript in a tab |
| `eyes_new_tab` | Open a new Chrome tab |
| `eyes_close_tab` | Close a Chrome tab |
| `eyes_handle_dialog` | Accept/dismiss JS dialogs (alert, confirm, prompt) |
Shadow Mode
| Tool | Description |
|---|---|
| `eyes_shadow` | Control Chrome without focusing it: click, type, scroll, read, run JS |
How It Works
AI Agent
↓ MCP
agent-eyes server
├── Tier 1: Chrome Extension Bridge (best — no flags, cross-platform)
│ └── chrome.scripting / chrome.tabs → fast web automation
├── Tier 2: CDP Persistent Connection (fast — needs debugging port)
│ └── Single WebSocket + flat sessions → Chrome accessibility tree
├── Tier 3: Native Fallback (always available)
│ ├── OS Accessibility API → structured UI tree
│ ├── AppleScript JS injection → web interaction (macOS)
│ └── Input Simulator → real keyboard/mouse events
└── Desktop/Native Apps
└── Always uses native accessibility (unchanged)
- Read: `eyes_get_tree` returns every button, text field, heading, link, etc. as a numbered tree
- Find: `eyes_find` searches by role/name/value, or `eyes_element_at` for coordinate lookup
- Act: `eyes_click`, `eyes_type`, `eyes_press_key` target elements by their `[id]`
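The Read, Find, Act loop above maps onto three MCP tool calls. The sketch below builds the JSON-RPC `tools/call` requests an MCP client would send. The envelope follows the MCP specification, but the argument names (`pid`, `role`, `name`, `element_id`) and their values are assumptions; check the tool schemas reported by the server for the real ones.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(tool: str, **arguments) -> dict:
    """Build an MCP "tools/call" JSON-RPC request for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names below are hypothetical placeholders.
steps = [
    tool_call("eyes_get_tree", pid=1234),                   # Read
    tool_call("eyes_find", role="button", name="Sign in"),  # Find
    tool_call("eyes_click", element_id=4),                  # Act
]
for step in steps:
    print(json.dumps(step))
```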
Connection Tiers
agent-eyes automatically selects the best available connection method:
| Tier | Method | Setup Required | Performance | Cross-Platform |
|---|---|---|---|---|
| 1 | Chrome Extension Bridge | Install extension | Fastest | Yes |
| 2 | CDP Persistent Connection | `--remote-debugging-port=9222` flag | Fast | Yes |
| 3 | Native Fallback | None | Good | Yes |
Use eyes_status to see which tier is currently active.
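Automatic selection of this kind usually amounts to trying each tier in priority order and taking the first one that responds. A minimal sketch of that logic, with availability checks stubbed out as lambdas; the `Tier` type and `select_tier` function are hypothetical and not taken from the agent-eyes source.

```python
from typing import Callable, NamedTuple


class Tier(NamedTuple):
    number: int
    method: str
    is_available: Callable[[], bool]


def select_tier(tiers: list[Tier]) -> Tier:
    """Return the lowest-numbered tier whose availability check passes."""
    for tier in sorted(tiers, key=lambda t: t.number):
        if tier.is_available():
            return tier
    raise RuntimeError("no connection tier available")


# Stubbed checks: extension not installed, debugging port open.
tiers = [
    Tier(1, "Chrome Extension Bridge", lambda: False),
    Tier(2, "CDP Persistent Connection", lambda: True),
    Tier(3, "Native Fallback", lambda: True),
]
active = select_tier(tiers)
print(active.number, active.method)
```

Because the native fallback needs no setup, a list ordered this way always ends with a tier that can succeed.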
Supported Platforms
| Platform | Native Adapter | Web (Chrome) | Shadow Mode |
|---|---|---|---|
| macOS | AXUIElement + pyobjc (Tier 3) | Extension Bridge (Tier 1) or CDP (Tier 2) | Yes |
| Windows | UI Automation + pywinauto (Tier 3) | Extension Bridge (Tier 1) or CDP (Tier 2) | Yes |
| Linux | AT-SPI2 + pyatspi (Tier 3) | Extension Bridge (Tier 1) or CDP (Tier 2) | Yes |
License
MIT