agent-eyes
Accessibility-tree vision for AI agents — see and interact with any application without screenshots.
Instead of pixel-based screen capture, agent-eyes reads the OS accessibility tree to give AI agents a structured, semantic view of every UI element on screen. The tree is the vision.
Key Advantages
- No screenshots needed: works through accessibility APIs, not pixels
- Cross-platform: macOS (AXUIElement), Windows (UI Automation), Linux (AT-SPI2)
- Native + Web: interact with desktop apps and Chrome tabs from one server
- Shadow mode: control Chrome in the background without stealing window focus
- Human-like input: real keyboard/mouse events, so the app's event listeners fire as they would for a person
- Element IDs: every UI element gets an [id] for precise click/type targeting
- OCR fallback: for apps with sparse accessibility trees, get text via screen OCR
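The README does not document the exact tree format, but the idea behind element IDs can be sketched: walk the accessibility tree depth-first, give each element a number, and keep a map from number back to element, so a later click or type call can say [3] instead of describing the element. Node and render_tree are hypothetical names, and the outline format is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical accessibility node: a role, an optional name, and children."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def render_tree(root: Node) -> tuple[str, dict[int, Node]]:
    """Number nodes depth-first (pre-order) and render an indented outline.

    Returns the outline and an id -> node map, so a later click or type
    request can be resolved from its [id] back to the element.
    """
    lines: list[str] = []
    by_id: dict[int, Node] = {}

    def visit(node: Node, depth: int) -> None:
        element_id = len(by_id) + 1  # next unused ID, starting at 1
        by_id[element_id] = node
        label = f' "{node.name}"' if node.name else ""
        lines.append(f"{'  ' * depth}[{element_id}] {node.role}{label}")
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines), by_id


window = Node("window", "Login", [Node("text field", "Email"), Node("button", "Sign in")])
outline, ids = render_tree(window)
print(outline)
# [1] window "Login"
#   [2] text field "Email"
#   [3] button "Sign in"
```

Keeping the id -> node map alongside the text is what lets an agent read a compact outline and still act on an exact element.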
Installation
Run directly via uvx (no install needed):
uvx agent-eyes
On Linux only, also install the AT-SPI2 Python bindings with your system package manager:
apt install python3-pyatspi # Debian/Ubuntu
dnf install python3-pyatspi # Fedora
Requirements: Python 3.10+. The Chrome/web tools also need Chrome running with --remote-debugging-port=9222.
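The web tools talk to Chrome over the DevTools Protocol, so Chrome must already be listening on port 9222 (for example, launched as google-chrome --remote-debugging-port=9222 on Linux). Chrome's DevTools HTTP endpoint /json/version answers with browser metadata when that port is open. The sketch below checks for it before you reach for the web tools; cdp_version is a hypothetical helper, not part of agent-eyes (eyes_status also reports CDP availability).

```python
from __future__ import annotations

import json
import urllib.request


def cdp_version(port: int = 9222, host: str = "127.0.0.1", timeout: float = 2.0) -> dict | None:
    """Return Chrome's /json/version metadata, or None if nothing usable answers."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError (refused/unreachable) and timeouts are OSError subclasses;
        # ValueError covers a reply that is not valid JSON.
        return None


info = cdp_version()
if info is None:
    print("No DevTools endpoint on port 9222; start Chrome with --remote-debugging-port=9222")
else:
    print(info.get("Browser"), info.get("webSocketDebuggerUrl"))
```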
Quick Start
As an MCP server
Add to your Claude Code config (~/.claude.json):
{
  "mcpServers": {
    "agent-eyes": {
      "command": "uvx",
      "args": ["agent-eyes"]
    }
  }
}
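If you would rather script this edit than paste the JSON, the sketch below merges the entry into an existing config without touching other servers or keys. add_agent_eyes and update_config_file are hypothetical helpers, not part of agent-eyes; ~/.claude.json also holds other Claude Code settings, so keep a backup before writing to it.

```python
import json
from pathlib import Path


def add_agent_eyes(config: dict) -> dict:
    """Return a copy of the config with the agent-eyes MCP server entry added.

    Other top-level keys and other servers under mcpServers are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["agent-eyes"] = {"command": "uvx", "args": ["agent-eyes"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path = Path.home() / ".claude.json") -> None:
    """Read the config (or start empty), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_agent_eyes(config), indent=2))


# Show the merged result for an empty config without writing any file.
print(json.dumps(add_agent_eyes({}), indent=2))
```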
Standalone
agent-eyes
Tools (28)
Orientation
| Tool | Description |
|---|---|
| eyes_status | Check platform adapter, permissions, CDP availability |
| eyes_context | Quick snapshot: frontmost app, active window, focused element |
| eyes_list_apps | List all running apps with PIDs and window titles |
| eyes_get_focused | Get the currently focused UI element |
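Whatever client you use, each tool in these tables is invoked through MCP's tools/call JSON-RPC request, which names the tool and passes its arguments. A minimal sketch of that request follows; tool_call is a hypothetical helper, and the empty argument object for eyes_status is an assumption, since the tool schemas are not listed here. A real client such as Claude Code also performs the MCP initialize handshake before any call.

```python
from __future__ import annotations

import json
from itertools import count

_request_ids = count(1)  # a client must not reuse request ids within a session


def tool_call(name: str, arguments: dict | None = None) -> str:
    """Serialize an MCP tools/call request as a single JSON-RPC 2.0 line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(request)


print(tool_call("eyes_status"))
# {"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "eyes_status", "arguments": {}}}
```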
Reading UI
| Tool | Description |
|---|---|
| eyes_get_tree | Full accessibility tree of an app by PID |
| eyes_get_subtree | Drill into a specific subtree by element ID |
| eyes_find | Search elements by role, name, or value (regex/contains/exact) |
| eyes_element_at | Identify the element at screen coordinates |
| eyes_get_ocr_hints | OCR fallback: get text blocks with coordinates |
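The three match modes listed for eyes_find can be pictured as below. This is a sketch over a hypothetical flat element list, not agent-eyes's implementation: the Element type is invented for illustration, and making "contains" case-insensitive and applying one mode to every field are assumptions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical flattened element: an ID plus the three searchable fields."""
    id: int
    role: str
    name: str = ""
    value: str = ""


def matches(text: str, pattern: str, mode: str) -> bool:
    """Compare one field against a pattern using the given match mode."""
    if mode == "exact":
        return text == pattern
    if mode == "contains":
        return pattern.lower() in text.lower()  # case-insensitive: an assumption
    if mode == "regex":
        return re.search(pattern, text) is not None
    raise ValueError(f"unknown match mode: {mode!r}")


def find(
    elements: list[Element],
    *,
    role: str | None = None,
    name: str | None = None,
    value: str | None = None,
    mode: str = "contains",
) -> list[Element]:
    """Return elements whose supplied fields all match; None means 'any value'."""
    found = []
    for el in elements:
        criteria = [(el.role, role), (el.name, name), (el.value, value)]
        if all(pattern is None or matches(text, pattern, mode) for text, pattern in criteria):
            found.append(el)
    return found


elements = [
    Element(1, "button", "Sign in"),
    Element(2, "text field", "Email", "a@b.c"),
    Element(3, "button", "Sign up"),
]
print([e.id for e in find(elements, role="button", name="sign")])            # [1, 3]
print([e.id for e in find(elements, name="Sign in", mode="exact")])          # [1]
print([e.id for e in find(elements, name=r"^Sign (in|up)$", mode="regex")])  # [1, 3]
```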
Interaction
| Tool | Description |
|---|---|
| eyes_click | Click an element by ID or screen coordinates |
| eyes_type | Type text into a field with real key events |
| eyes_press_key | Press keys with modifiers (Enter, Tab, Ctrl+C, etc.) |
| eyes_hover | Hover to trigger tooltips and :hover states |
| eyes_scroll | Scroll vertically or horizontally in an app or the browser |
| eyes_drag | Drag and drop between coordinates |
| eyes_fill_form | Fill multiple form fields in one call |
| eyes_file_upload | Upload files to a file input element |
| eyes_wait_for | Poll until an element appears (with timeout) |
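eyes_wait_for is described only as polling until an element appears, with a timeout. The general loop looks like the sketch below. In practice the probe would be something like an eyes_find lookup; here it is any callable that returns None until the element exists. wait_for is a hypothetical helper, and the default timeout and interval are arbitrary choices.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> Optional[T]:
    """Call probe() until it returns something other than None.

    Returns that first non-None result, or None once `timeout` seconds have
    passed. time.monotonic() is used so wall-clock changes cannot shorten
    or extend the wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        time.sleep(min(interval, remaining))  # never sleep past the deadline


# Stand-in probe: the "element" shows up on the third poll.
attempts = 0


def fake_probe() -> Optional[str]:
    global attempts
    attempts += 1
    return '[7] button "Continue"' if attempts >= 3 else None


print(wait_for(fake_probe, timeout=2.0, interval=0.01))  # [7] button "Continue"
```

Capping each sleep at the remaining time keeps the total wait close to the timeout, plus at most the duration of one probe.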
App Management
| Tool | Description |
|---|---|
| eyes_app | Launch, quit, or focus an application |
| eyes_window | List, focus, minimize, close, move, or resize windows |
Chrome / Web
| Tool | Description |
|---|---|
| eyes_list_chrome_tabs | List all Chrome tabs (title, URL) |
| eyes_get_web_tree | Chrome tab accessibility tree via CDP |
| eyes_navigate | Navigate a tab to a URL |
| eyes_evaluate | Execute JavaScript in a tab |
| eyes_new_tab | Open a new Chrome tab |
| eyes_close_tab | Close a Chrome tab |
| eyes_handle_dialog | Accept/dismiss JS dialogs (alert, confirm, prompt) |
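Chrome's DevTools HTTP endpoint /json/list returns every debuggable target as a JSON object with fields such as type, title, url, and webSocketDebuggerUrl; ordinary tabs have type "page". A CDP client can list tabs this way. The helper names below are hypothetical, and whether eyes_list_chrome_tabs filters targets exactly like this is an assumption.

```python
import json
import urllib.request


def parse_tabs(targets: list[dict]) -> list[tuple[str, str]]:
    """Keep only 'page' targets (ordinary tabs) and return (title, url) pairs."""
    return [(t.get("title", ""), t.get("url", "")) for t in targets if t.get("type") == "page"]


def list_tabs(port: int = 9222, timeout: float = 2.0) -> list[tuple[str, str]]:
    """Fetch the target list from Chrome's DevTools HTTP endpoint."""
    url = f"http://127.0.0.1:{port}/json/list"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_tabs(json.load(resp))


sample = [
    {"type": "page", "title": "Example Domain", "url": "https://example.com/"},
    {"type": "service_worker", "title": "sw.js", "url": "https://example.com/sw.js"},
]
print(parse_tabs(sample))  # [('Example Domain', 'https://example.com/')]
```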
Shadow Mode
| Tool | Description |
|---|---|
| eyes_shadow | Control Chrome without focusing it: click, type, scroll, read, run JS |
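The README does not say how shadow mode avoids stealing focus. One mechanism that can do this is CDP's Input.dispatchMouseEvent, which delivers synthetic mouse events to a page over its DevTools WebSocket rather than through the OS input queue, so the Chrome window does not have to be in front. The sketch below only builds the two JSON commands for a left click (mouse_click_messages is a hypothetical helper); sending them needs a WebSocket client connected to the tab's webSocketDebuggerUrl, which is left out. Whether agent-eyes uses this exact method is an assumption.

```python
import json


def mouse_click_messages(x: float, y: float, first_id: int = 1) -> list[str]:
    """Build CDP commands for a left click at viewport coordinates (x, y).

    A click is a mousePressed followed by a mouseReleased at the same point;
    x and y are CSS pixels relative to the page's main-frame viewport.
    Each command gets its own id so responses can be matched to requests.
    """
    messages = []
    for offset, event_type in enumerate(("mousePressed", "mouseReleased")):
        messages.append(json.dumps({
            "id": first_id + offset,
            "method": "Input.dispatchMouseEvent",
            "params": {"type": event_type, "x": x, "y": y, "button": "left", "clickCount": 1},
        }))
    return messages


for message in mouse_click_messages(120, 48):
    print(message)
```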
How It Works
AI Agent
↓ MCP
agent-eyes server
├── Native Adapter (macOS / Windows / Linux)
│ └── OS Accessibility API → structured UI tree
├── CDP Client (Chrome DevTools Protocol)
│ └── Chrome tabs → web accessibility tree + JS execution
└── Input Simulator
└── Real keyboard/mouse events → human-like interaction
- Read: eyes_get_tree returns every button, text field, heading, link, and so on as a numbered tree
- Find: eyes_find searches by role, name, or value; eyes_element_at looks up the element at given screen coordinates
- Act: eyes_click, eyes_type, and eyes_press_key target elements by their [id]
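Coordinate lookup can be pictured as hit-testing: descend from the window and return the deepest element whose on-screen frame contains the point. The native APIs provide this directly (for example AXUIElementCopyElementAtPosition on macOS and IUIAutomation::ElementFromPoint on Windows), so agent-eyes may simply delegate to them; the Box type and element_at function below are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Box:
    """Hypothetical element with an ID and a screen frame (x, y, width, height)."""
    id: int
    x: float
    y: float
    w: float
    h: float
    children: list[Box] = field(default_factory=list)

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def element_at(root: Box, px: float, py: float) -> Box | None:
    """Return the deepest element whose frame contains the point, or None."""
    if not root.contains(px, py):
        return None
    # Later siblings are checked first, assuming they are drawn on top.
    for child in reversed(root.children):
        hit = element_at(child, px, py)
        if hit is not None:
            return hit
    return root


window = Box(1, 0, 0, 400, 300, [Box(2, 10, 10, 100, 30), Box(3, 10, 50, 200, 200, [Box(4, 20, 60, 50, 20)])])
print(element_at(window, 25, 65).id)  # 4
```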
Supported Platforms
| Platform | Native Adapter | Web (Chrome) | Shadow Mode |
|---|---|---|---|
| macOS | AXUIElement + pyobjc | CDP + AppleScript fallback | Yes |
| Windows | UI Automation + pywinauto | CDP | Yes |
| Linux | AT-SPI2 + pyatspi | CDP | Yes |
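The table pairs each OS with one accessibility binding. Before starting the server it can help to confirm that the binding for the current platform is importable, particularly on Linux, where pyatspi is installed with the system package manager as described under Installation. The module names below are the ones those projects provide (ApplicationServices from pyobjc-framework-ApplicationServices, pywinauto, pyatspi); the helpers are hypothetical, and this is not necessarily how agent-eyes chooses its adapter.

```python
import importlib.util
import sys

# Python module providing each platform's accessibility bindings, per the table above.
BINDINGS = {
    "darwin": "ApplicationServices",  # pyobjc-framework-ApplicationServices (AXUIElement)
    "win32": "pywinauto",             # UI Automation
    "linux": "pyatspi",               # AT-SPI2
}


def binding_for(platform: str = sys.platform) -> str:
    """Map a sys.platform value to the accessibility module it needs."""
    for prefix, module in BINDINGS.items():
        if platform.startswith(prefix):
            return module
    raise RuntimeError(f"no accessibility binding known for {platform!r}")


def binding_available(platform: str = sys.platform) -> bool:
    """True if the platform's accessibility module can be found for import."""
    return importlib.util.find_spec(binding_for(platform)) is not None


if sys.platform.startswith(tuple(BINDINGS)):
    print(binding_for(), "importable:", binding_available())
```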
License
MIT
Download files
Source distribution: agent_eyes-0.3.1.tar.gz
Built distribution: agent_eyes-0.3.1-py3-none-any.whl
File details
Details for the file agent_eyes-0.3.1.tar.gz.
File metadata
- Download URL: agent_eyes-0.3.1.tar.gz
- Upload date:
- Size: 77.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 99340005b981f8b47b23f1b6a861f9d8498a25fc3315add3a56d4bf74d7adae7 |
| MD5 | 1b31f7ee8754e245fb66195342c4a42f |
| BLAKE2b-256 | ae9727b560d6d66d487838babc7770c63ac4628f74cb071df2b9c325d9901ca4 |
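These digests let you check a downloaded file before installing it. A short sketch with Python's hashlib follows; sha256_of and verify are illustrative helpers, and the local file name is an assumption about where you saved the sdist.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "99340005b981f8b47b23f1b6a861f9d8498a25fc3315add3a56d4bf74d7adae7"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 against the published digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    tarball = Path("agent_eyes-0.3.1.tar.gz")
    if tarball.exists():
        print("OK" if verify(tarball) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode, by pinning agent-eyes==0.3.1 with --hash=sha256:... entries for both the sdist and wheel digests in a requirements file; that mode also requires every dependency to be pinned and hashed.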
Provenance
The following attestation bundles were made for agent_eyes-0.3.1.tar.gz:
Publisher: publish.yml on jellythomas/agent-eyes
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agent_eyes-0.3.1.tar.gz
- Subject digest: 99340005b981f8b47b23f1b6a861f9d8498a25fc3315add3a56d4bf74d7adae7
- Sigstore transparency entry: 1154171815
- Sigstore integration time:
- Permalink: jellythomas/agent-eyes@e227d8962a4a52cf6648082d8a9c38673aed3b42
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/jellythomas
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e227d8962a4a52cf6648082d8a9c38673aed3b42
- Trigger Event: release
File details
Details for the file agent_eyes-0.3.1-py3-none-any.whl.
File metadata
- Download URL: agent_eyes-0.3.1-py3-none-any.whl
- Upload date:
- Size: 87.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1823b42305b90d6774d7d35fc65ba73f75d0fbfb2e8845a75ceb1a904a9a9e7c |
| MD5 | eda5bb7b169546868cad6b202addd706 |
| BLAKE2b-256 | d8842b8039b649a5112aa60f11c6953e746df305717564a246cc8467f0b9392d |
Provenance
The following attestation bundles were made for agent_eyes-0.3.1-py3-none-any.whl:
Publisher: publish.yml on jellythomas/agent-eyes
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agent_eyes-0.3.1-py3-none-any.whl
- Subject digest: 1823b42305b90d6774d7d35fc65ba73f75d0fbfb2e8845a75ceb1a904a9a9e7c
- Sigstore transparency entry: 1154171819
- Sigstore integration time:
- Permalink: jellythomas/agent-eyes@e227d8962a4a52cf6648082d8a9c38673aed3b42
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/jellythomas
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e227d8962a4a52cf6648082d8a9c38673aed3b42
- Trigger Event: release