Local Windows desktop control for AI agents — Python library and CLI.

agent-aid

A small Python library and CLI for controlling the local Windows desktop from AI agents. Open Interpreter, GPT, Claude, or your own agent can call agent-aid to read the screen, drive the mouse and keyboard, manage windows, and find UI elements by their accessibility name (no coordinate guessing). It has two dependencies: mss (screen capture) and uiautomation (UI tree).

pip install agent-aid

With uv (recommended — installs Python automatically if needed):

uv tool install agent-aid --python 3.11

Quick look

agent-aid health
agent-aid state
agent-aid screenshot active_window=true save_path=captures/active.png include_base64=false
agent-aid click x=500 y=300
agent-aid type text="hello"
agent-aid press keys=ctrl+s
agent-aid focus_window title_fragment=Chrome
agent-aid open target=https://example.com

List every route:

agent-aid --list

Print the full AI-oriented usage spec to stdout (markdown):

agent-aid --readme
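
To call these routes from Python without hand-writing command strings, the key=value arguments can be assembled programmatically. The following is a minimal sketch, not part of agent-aid itself: build_command, format_value, and run_route are hypothetical helper names, and run_route assumes the agent-aid executable is on PATH. The output format is not described here, so stdout is returned as raw text.

```python
import subprocess
from typing import Any, List


def format_value(value: Any) -> str:
    """Render a Python value the way the CLI examples write it."""
    if isinstance(value, bool):
        # The CLI examples use lowercase true/false.
        return "true" if value else "false"
    return str(value)


def build_command(route: str, **params: Any) -> List[str]:
    """Build an argv list such as ['agent-aid', 'click', 'x=500', 'y=300']."""
    return ["agent-aid", route] + [
        f"{key}={format_value(val)}" for key, val in params.items()
    ]


def run_route(route: str, **params: Any) -> str:
    """Run one route and return its raw stdout (hypothetical helper)."""
    result = subprocess.run(
        build_command(route, **params),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing an argv list to subprocess.run avoids shell quoting entirely, which matters for values that contain spaces, such as text="hello world".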

Capabilities

  • Screenshots: full desktop, single monitor, active window, specific hwnd, or rectangular region
  • PNG sha256 hash returned for every screenshot (use with wait_screen_change)
  • Mouse: click, double-click, right-click, drag, move, scroll (vertical/horizontal), hold buttons
  • Drag with optional pre_hold_ms, post_hold_ms, steps for finicky shape/preview UIs
  • Keyboard: short text, hotkeys (ctrl+shift+a), modifier hold/release
  • Clipboard: set text with clipboard text=..., then paste it with press keys=ctrl+v (useful for long pastes)
  • Windows: find, focus, minimize/maximize/restore/close, move/resize, hide/show
  • UI Automation: find/click/read/write elements by accessibility name (no coordinate guessing)
  • System: list processes, open a file, URL, or shell: target, read pixel color, query state
  • Verify: wait_screen_change, wait_pixel, wait_window
  • batch for atomic multi-step flows in a single CLI call
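
The screenshot hash gives a cheap way to tell whether the screen changed between two captures without comparing pixels. The sketch below shows the idea with the standard library only. It assumes the reported value is the hex SHA-256 digest of the PNG bytes; whether that digest matches the bytes of the file written to save_path is an assumption to verify. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path: str) -> str:
    """Hash a saved screenshot file (assumed to hold the PNG bytes)."""
    return sha256_hex(Path(path).read_bytes())


def screen_changed(before_hash: str, after_hash: str) -> bool:
    """Two captures differ if their digests differ (hex compared case-insensitively)."""
    return before_hash.lower() != after_hash.lower()
```

Comparing digests only answers "did anything change"; it says nothing about where or how much the image changed.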

UI Automation (recommended over raw coordinates)

# Find an element by partial name match (case-insensitive)
agent-aid ui_find name="Save"
agent-aid ui_find name="Yapıştır" control_type=Button hwnd=12345

# Find every match (use the zero-based index=N to pick one, e.g. when names repeat)
agent-aid ui_find_all control_type=Button hwnd=12345 limit=20
agent-aid ui_click name="Delete" index=1                       # 2nd "Delete"

# Find + click in one call
agent-aid ui_click name="OK"

# Read the text/value of a control (tries ValuePattern, then TextPattern, then falls back to Name)
agent-aid ui_text automation_id="UrlBar" hwnd=12345

# Insert text into an edit/combo control via UIA (safe by default: does NOT
# clear existing content; text is inserted at the cursor). Full Unicode, no key races.
agent-aid ui_set_value automation_id="UrlBar" hwnd=12345 text="https://example.com"

# Replace the entire field — opt-in destructive mode
agent-aid ui_set_value automation_id="UrlBar" hwnd=12345 text="..." clear=true

# Dump the accessibility tree (filter by control_type to keep the output small)
agent-aid ui_dump hwnd=12345 max_depth=4 limit=100
agent-aid ui_dump hwnd=12345 max_depth=10 control_type=ListItem

Works on Paint, Office, Edge, Chrome, native Win32, and most modern UWP apps. Selector args: name (partial), control_type, automation_id (exact), class_name (partial), hwnd (root scope), index, max_depth, timeout_ms.
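
The selector rules can be easier to reason about as a small model. The sketch below is an illustration of the matching semantics as described above, not agent-aid's implementation: Element is a hypothetical stand-in for an accessibility-tree node, name matches partially and case-insensitively, automation_id matches exactly, class_name matches partially, and index is zero-based. Treating control_type as an exact match and class_name as case-sensitive are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one node of the accessibility tree."""
    name: str
    control_type: str
    automation_id: str = ""
    class_name: str = ""


def matches(
    el: Element,
    name: Optional[str] = None,
    control_type: Optional[str] = None,
    automation_id: Optional[str] = None,
    class_name: Optional[str] = None,
) -> bool:
    """Return True if the element satisfies every selector that was given."""
    if name is not None and name.lower() not in el.name.lower():
        return False  # name: partial, case-insensitive
    if control_type is not None and control_type != el.control_type:
        return False  # control_type: exact (assumption)
    if automation_id is not None and automation_id != el.automation_id:
        return False  # automation_id: exact
    if class_name is not None and class_name not in el.class_name:
        return False  # class_name: partial, case-sensitive (assumption)
    return True


def find(elements: List[Element], index: int = 0, **selector: str) -> Optional[Element]:
    """Pick the index-th match (zero-based), or None if there is no such match."""
    hits = [el for el in elements if matches(el, **selector)]
    return hits[index] if 0 <= index < len(hits) else None
```

With two buttons whose names contain "Delete", find(elements, name="delete", index=1) returns the second one, mirroring the ui_click name="Delete" index=1 example.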

Targeting windows / coordinates

All coordinates are physical screen pixels. To give coordinates relative to a specific window instead, pass relative_to or hwnd:

agent-aid click x=120 y=80 relative_to=active_window
agent-aid click x=120 y=80 hwnd=123456
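
Window-relative targeting amounts to adding the window's origin to the offset. The sketch below works through that arithmetic. It assumes offsets are measured from the window's top-left corner; whether agent-aid uses the outer frame or the client area is not stated here. Rect, to_screen, and inside are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Window rectangle in physical screen pixels (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int


def to_screen(rect: Rect, x: int, y: int) -> Tuple[int, int]:
    """Convert a window-relative offset to absolute screen coordinates."""
    return rect.left + x, rect.top + y


def inside(rect: Rect, x: int, y: int) -> bool:
    """Check that a window-relative offset falls within the window bounds."""
    return 0 <= x < rect.width and 0 <= y < rect.height


window = Rect(left=200, top=100, width=800, height=600)
print(to_screen(window, 120, 80))  # (320, 180)
```

Checking inside before clicking is a cheap guard against sending input to whatever lies outside the target window.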

Use as a Python library

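from agent_aid import core

# Make the process DPI-aware before working with pixel coordinates.
core.set_dpi_aware()
# Print information about the active (foreground) window.
print(core.active_window())
# Click at screen coordinates (800, 500).
core.click(800, 500)
# Type text into whichever window currently has focus.
core.type_text("hello")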

Same capabilities as the CLI — just call the core module directly from your Python code.

Argument formats

# key=value (shortest)
agent-aid click x=500 y=300 button=left

# JSON (for nested fields)
agent-aid screenshot '{"region":{"left":0,"top":0,"width":800,"height":600},"save_path":"r.png"}'

# Pretty-print output
agent-aid --pretty state
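
When the JSON form is generated from Python, json.dumps produces the argument and an argv list avoids shell quoting (single quotes as shown above are not treated as quoting by cmd.exe). This is a minimal sketch; build_json_command is a hypothetical helper, and it assumes the JSON object is passed as one positional argument after the route, as in the screenshot example.

```python
import json
from typing import Any, Dict, List


def build_json_command(route: str, payload: Dict[str, Any]) -> List[str]:
    """Build an argv list whose single argument is the JSON-encoded payload."""
    return ["agent-aid", route, json.dumps(payload)]


# Same request as the JSON screenshot example, expressed as Python data.
region_shot = build_json_command(
    "screenshot",
    {
        "region": {"left": 0, "top": 0, "width": 800, "height": 600},
        "save_path": "r.png",
    },
)
```

The resulting list can be handed to subprocess.run(region_shot, capture_output=True, text=True) on a machine where agent-aid is installed.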

Practical AI agent flow

  1. agent-aid state — see what you're looking at
  2. agent-aid screenshot active_window=true save_path=captures/now.png include_base64=false
  3. Inspect the image, choose target coordinates
  4. Act: click / type / press / clipboard
  5. Verify: another screenshot or wait_screen_change
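
The steps above can be sketched as one function that takes a command runner as a parameter, so the sequence can be exercised without touching the real desktop. This is an illustration under stated assumptions: agent_step, Runner, and subprocess_runner are hypothetical names, step 3 (choosing coordinates) is left to the caller, and verification uses a second screenshot because the arguments of wait_screen_change are not described here.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def subprocess_runner(argv: List[str]) -> str:
    """Run a command for real and return stdout (needs agent-aid on PATH)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def screenshot_args(save_path: str) -> List[str]:
    """Arguments for an active-window capture saved to disk, without base64."""
    return [
        "agent-aid", "screenshot", "active_window=true",
        f"save_path={save_path}", "include_base64=false",
    ]


def agent_step(run: Runner, x: int, y: int, before: str, after: str) -> List[str]:
    """Observe, act once, and capture the result; returns each command's output."""
    outputs = [run(["agent-aid", "state"])]             # 1. see what is on screen
    outputs.append(run(screenshot_args(before)))        # 2. capture before acting
    # 3. the caller (the agent) has already chosen x and y from the image
    outputs.append(run(["agent-aid", "click", f"x={x}", f"y={y}"]))  # 4. act
    outputs.append(run(screenshot_args(after)))         # 5. verify with a new capture
    return outputs
```

Calling agent_step(subprocess_runner, 500, 300, "captures/before.png", "captures/after.png") would send a real click, so the Safety notes apply.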

Safety notes

  • Sends real mouse and keyboard input — types into the focused window.
  • clipboard overwrites the user's clipboard.
  • open launches a Windows target (same effect as a user double-click).
  • window/manage close posts WM_CLOSE — apps with unsaved data may prompt.
  • After any action, prefer to verify with wait_* or a fresh screenshot.

License

MIT.
