Local Windows desktop control for AI agents — Python library and CLI.
agent-aid
A small Python library and CLI for controlling the local Windows desktop from
AI agents. Any agent (Open Interpreter, GPT, Claude, or your own) can call
agent-aid to read the screen, drive mouse and keyboard, manage windows, and
find UI elements by their accessibility name (no coordinate guessing). It has
two dependencies: mss (screen capture) and uiautomation (UI tree).
pip install agent-aid
With uv (recommended — installs Python automatically if needed):
uv tool install agent-aid --python 3.11
Quick look
agent-aid health
agent-aid state
agent-aid screenshot active_window=true save_path=captures/active.png include_base64=false
agent-aid click x=500 y=300
agent-aid type text="hello"
agent-aid press keys=ctrl+s
agent-aid focus_window title_fragment=Chrome
agent-aid open target=https://example.com
List every route:
agent-aid --list
Print the full AI-oriented usage spec to stdout (markdown):
agent-aid --readme
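An agent written in Python can keep the key=value formatting shown above in one small wrapper. The following is a minimal sketch, not part of agent-aid: `format_value`, `build_argv`, and `run_agent_aid` are hypothetical helper names, and the JSON parsing assumes the CLI prints JSON on stdout, which the source does not state explicitly.

```python
import json
import subprocess


def format_value(value):
    """Render a Python value the way the key=value examples write it."""
    if isinstance(value, bool):
        return "true" if value else "false"  # the CLI examples use lowercase booleans
    return str(value)


def build_argv(route, **params):
    """Build the argument list for one agent-aid call, e.g. click x=500 y=300."""
    return ["agent-aid", route] + [f"{k}={format_value(v)}" for k, v in params.items()]


def run_agent_aid(route, **params):
    """Run one route and parse stdout as JSON (assumption: the output is JSON)."""
    result = subprocess.run(
        build_argv(route, **params), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues for values that contain spaces, such as typed text.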
Capabilities
- Screenshots: full desktop, single monitor, active window, specific hwnd, or rectangular region
- PNG sha256 hash returned for every screenshot (use with wait_screen_change)
- Mouse: click, double-click, right-click, drag, move, scroll (vertical/horizontal), hold buttons
- Drag with optional pre_hold_ms, post_hold_ms, steps for finicky shape/preview UIs
- Keyboard: short text, hotkeys (ctrl+shift+a), modifier hold/release
- Clipboard: clipboard text=... + press keys=ctrl+v for long pastes
- Windows: find, focus, minimize/maximize/restore/close, move/resize, hide/show
- UI Automation: find/click/read/write elements by accessibility name (no coordinate guessing)
- System: list processes, open file/URL/shell: target, read pixel color, query state
- Verify: wait_screen_change, wait_pixel, wait_window
- batch for atomic multi-step flows in a single CLI call
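The screenshot hash gives a cheap way to tell whether anything on screen changed between two captures. The sketch below shows the same idea computed locally with the standard library. It hashes the PNG file bytes; whether agent-aid hashes file bytes or pixel data is not stated in the source, so treat a locally computed digest as comparable only with other locally computed digests. The function names are hypothetical.

```python
import hashlib


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def screen_changed(before_hash, after_hash):
    """Two captures differ if and only if their digests differ."""
    return before_hash != after_hash
```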
UI Automation (recommended over raw coordinates)
# Find an element by partial name match (case-insensitive)
agent-aid ui_find name="Save"
agent-aid ui_find name="Yapıştır" control_type=Button hwnd=12345
# Find every match (use index=N to pick the n-th, e.g. for repeated names)
agent-aid ui_find_all control_type=Button hwnd=12345 limit=20
agent-aid ui_click name="Delete" index=1 # 2nd "Delete"
# Find + click in one call
agent-aid ui_click name="OK"
# Read the text/value of a control (uses ValuePattern → TextPattern → Name)
agent-aid ui_text automation_id="UrlBar" hwnd=12345
# Append text into an edit/combo via UIA (safe-by-default: does NOT clear
# existing content, inserts at cursor). Full Unicode, no key races.
agent-aid ui_set_value automation_id="UrlBar" hwnd=12345 text="https://example.com"
# Replace the entire field — opt-in destructive mode
agent-aid ui_set_value automation_id="UrlBar" hwnd=12345 text="..." clear=true
# Dump the accessibility tree (filter by control_type to keep the output small)
agent-aid ui_dump hwnd=12345 max_depth=4 limit=100
agent-aid ui_dump hwnd=12345 max_depth=10 control_type=ListItem
Works on Paint, Office, Edge, Chrome, native Win32, and most modern UWP apps.
Selector args: name (partial), control_type, automation_id (exact),
class_name (partial), hwnd (root scope), index, max_depth, timeout_ms.
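To make the selector semantics concrete, here is a small model of how the arguments could combine, run over plain dictionaries instead of a live UI tree. It illustrates the matching rules listed above and is not the library's implementation. The `select` function and the dictionary keys are hypothetical, and case-insensitive matching for class_name is an assumption (the source only states it for name).

```python
def select(elements, name=None, control_type=None, automation_id=None,
           class_name=None, index=0):
    """Return the index-th element matching every given selector, or None."""

    def matches(el):
        # name: partial, case-insensitive match
        if name is not None and name.lower() not in el.get("name", "").lower():
            return False
        # control_type: compared exactly here (assumption)
        if control_type is not None and el.get("control_type") != control_type:
            return False
        # automation_id: exact match
        if automation_id is not None and el.get("automation_id") != automation_id:
            return False
        # class_name: partial match; case-insensitivity is an assumption
        if class_name is not None and class_name.lower() not in el.get("class_name", "").lower():
            return False
        return True

    hits = [el for el in elements if matches(el)]
    # index is zero-based: index=1 picks the second match
    return hits[index] if 0 <= index < len(hits) else None
```

With two buttons whose names contain "Delete", `select(elements, name="Delete", index=1)` returns the second one, mirroring the `ui_click name="Delete" index=1` example.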
Targeting windows / coordinates
All coordinates are physical screen pixels. To work relative to a specific window:
agent-aid click x=120 y=80 relative_to=active_window
agent-aid click x=120 y=80 hwnd=123456
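The arithmetic behind a window-relative click is a simple offset. The sketch below assumes the offset is measured from the window's top-left corner in physical pixels; the source does not say whether that corner belongs to the window frame or to the client area, so the origin is left as an input. `to_screen` is a hypothetical helper.

```python
def to_screen(x, y, window_origin):
    """Convert window-relative (x, y) to screen coordinates.

    window_origin is the (left, top) of the window in physical screen pixels.
    """
    left, top = window_origin
    return (left + x, top + y)
```

For a window whose origin is at (300, 200), the relative point (120, 80) maps to the screen point (420, 280).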
Use as a Python library
from agent_aid import core
core.set_dpi_aware()
print(core.active_window())
core.click(800, 500)
core.type_text("hello")
Same capabilities as the CLI — just call the core module directly from your
Python code.
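Because the library sends real input and only runs on Windows, it can help to pass the module into your own functions so that they can be exercised with a stand-in object elsewhere. This is a sketch of that pattern using only the calls shown above; `click_and_type` is a hypothetical helper, not part of agent-aid.

```python
def click_and_type(core, x, y, text):
    """Click a screen position, then type text into the focused window.

    core is the agent_aid.core module, or any object with the same three methods.
    """
    core.set_dpi_aware()  # called first, as in the example above
    core.click(x, y)
    core.type_text(text)
```

On Windows with agent-aid installed, call it as `click_and_type(core, 800, 500, "hello")` after `from agent_aid import core`.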
Argument formats
# key=value (shortest)
agent-aid click x=500 y=300 button=left
# JSON (for nested fields)
agent-aid screenshot '{"region":{"left":0,"top":0,"width":800,"height":600},"save_path":"r.png"}'
# Pretty-print output
agent-aid --pretty state
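When calling the CLI from Python, nested fields are easier to build as a dictionary and serialize than to write as a string by hand. A minimal sketch, assuming the JSON object is passed as a single argument after the route as in the screenshot example; `json_argv` is a hypothetical helper.

```python
import json


def json_argv(route, payload):
    """Build argv for a route whose arguments are passed as one JSON object."""
    # Compact separators keep the argument free of extra spaces.
    return ["agent-aid", route, json.dumps(payload, separators=(",", ":"))]


argv = json_argv(
    "screenshot",
    {"region": {"left": 0, "top": 0, "width": 800, "height": 600}, "save_path": "r.png"},
)
```

Handing this list to `subprocess.run` sidesteps shell quoting, which differs between cmd, PowerShell, and POSIX shells.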
Practical AI agent flow
- agent-aid state: see what you're looking at
- agent-aid screenshot active_window=true save_path=captures/now.png include_base64=false
- Inspect the image, choose target coordinates
- Act: click / type / press / clipboard
- Verify: another screenshot or wait_screen_change
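The steps above can be written down as a list of CLI calls for one observe, act, verify cycle. The sketch only builds the argument lists, using routes and arguments that appear in this README; `flow_steps` and the default file paths are hypothetical, and choosing the click target from the image is left to the agent.

```python
def flow_steps(x, y, before_path="captures/before.png", after_path="captures/after.png"):
    """Return the argv for each step of one observe-act-verify cycle."""

    def screenshot(path):
        return ["agent-aid", "screenshot", "active_window=true",
                f"save_path={path}", "include_base64=false"]

    return [
        ["agent-aid", "state"],                      # see what you're looking at
        screenshot(before_path),                     # capture before acting
        ["agent-aid", "click", f"x={x}", f"y={y}"],  # act on the chosen target
        screenshot(after_path),                      # verify with a fresh screenshot
    ]
```

Each inner list can be passed to `subprocess.run` in order; comparing the two captures then tells the agent whether the click had a visible effect.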
Safety notes
- Sends real mouse and keyboard input; types into the focused window.
- clipboard overwrites the user's clipboard.
- open launches a Windows target (same effect as a user double-click).
- window/manage close posts WM_CLOSE; apps with unsaved data may prompt.
- After any action, prefer to verify with wait_* or a fresh screenshot.
License
MIT.