Local Windows desktop control for AI agents — Python library and CLI.

agent-aid

A small Python library and CLI that lets AI agents control the local Windows desktop. Open Interpreter, GPT, Claude, or your own agent can call agent-aid to read the screen, drive the mouse and keyboard, manage windows, and find UI elements by their accessibility name (no coordinate guessing). It has two dependencies: mss (screen capture) and uiautomation (UI tree).

pip install agent-aid

With uv (recommended — installs Python automatically if needed):

uv tool install agent-aid --python 3.11

Quick look

agent-aid health
agent-aid state
agent-aid screenshot active_window=true save_path=captures/active.png include_base64=false
agent-aid click x=500 y=300
agent-aid type text="hello"
agent-aid press keys=ctrl+s
agent-aid focus_window title_fragment=Chrome
agent-aid open target=https://example.com
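When the agent itself is written in Python, the key=value commands above can be assembled programmatically rather than by string concatenation. The following is a minimal sketch, not part of agent-aid: `format_value`, `build_command`, and `run_route` are hypothetical helper names, and the only things taken from this README are the `agent-aid <route> key=value ...` shape and the lowercase `true`/`false` spelling of booleans.

```python
import subprocess


def format_value(value):
    """Render a Python value the way the examples above spell it on the command line."""
    if isinstance(value, bool):
        # The README examples use lowercase true/false (e.g. active_window=true).
        return "true" if value else "false"
    return str(value)


def build_command(route, **args):
    """Return the argv list for one agent-aid call, e.g. click x=500 y=300."""
    return ["agent-aid", route] + [f"{key}={format_value(val)}" for key, val in args.items()]


def run_route(route, **args):
    """Run the command and return its raw stdout.

    Requires agent-aid to be installed and on PATH. The output format is not
    assumed here; the caller decides how to interpret the text.
    """
    result = subprocess.run(
        build_command(route, **args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing an argv list to `subprocess.run` avoids shell quoting issues, so a value such as `text="hello world"` reaches the CLI as a single argument.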

List every route:

agent-aid --list

Print the full AI-oriented usage spec to stdout (markdown):

agent-aid --readme

Capabilities

  • Screenshots: full desktop, single monitor, active window, specific hwnd, or rectangular region
  • PNG sha256 hash returned for every screenshot (use with wait_screen_change)
  • Mouse: click, double-click, right-click, drag, move, scroll (vertical/horizontal), hold buttons
  • Drag with optional pre_hold_ms, post_hold_ms, steps for finicky shape/preview UIs
  • Keyboard: short text, hotkeys (ctrl+shift+a), modifier hold/release
  • Clipboard: set the clipboard with clipboard text=..., then press keys=ctrl+v, for long pastes
  • Windows: find, focus, minimize/maximize/restore/close, move/resize, hide/show
  • UI Automation: find/click elements by accessibility name (no coordinate guessing)
  • System: list processes, open a file, URL, or shell: target, read pixel color, query state
  • Verify: wait_screen_change, wait_pixel, wait_window
  • batch for atomic multi-step flows in a single CLI call
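The screenshot hash mentioned in the list above makes change detection cheap: two captures with the same digest show the same pixels, so an agent can skip re-inspecting an unchanged screen. The sketch below shows the comparison on the client side. It assumes the reported value is the hex SHA-256 digest of the PNG file bytes, which this README does not spell out; `png_sha256` and `screen_changed` are hypothetical helper names.

```python
import hashlib


def png_sha256(png_bytes):
    """Hex SHA-256 digest of raw PNG bytes (assumed to match what the tool reports)."""
    return hashlib.sha256(png_bytes).hexdigest()


def screen_changed(before_hash, after_hash):
    """True when two screenshot digests differ, i.e. the captured pixels changed."""
    return before_hash != after_hash
```

To hash a saved capture, read it in binary mode first, for example `png_sha256(open("captures/active.png", "rb").read())`.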

UI Automation (recommended over raw coordinates)

# Find an element by partial name match (case-insensitive)
agent-aid ui_find name="Save"
agent-aid ui_find name="Yapıştır" control_type=Button hwnd=12345

# Find + click in one call
agent-aid ui_click name="OK"

# Read the text/value of a control
agent-aid ui_text automation_id="UrlBar" hwnd=12345

# Dump the accessibility tree for exploration
agent-aid ui_dump hwnd=12345 max_depth=4 limit=100

Works on Paint, Office, Edge, Chrome, native Win32, and most modern UWP apps. Selector args: name (partial), control_type, automation_id (exact), class_name (partial), hwnd (root scope), max_depth, timeout_ms.
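To make the selector semantics concrete, here is a small pure-Python model of how the arguments combine. It is an illustration only, not agent-aid's implementation: `matches` is a hypothetical function, elements are plain dicts, all given selectors are assumed to be ANDed together, and `control_type` is assumed to match exactly. Only the partial, case-insensitive match on name, the partial match on class_name, and the exact match on automation_id come from the description above.

```python
def matches(element, name=None, control_type=None, automation_id=None, class_name=None):
    """Return True if a dict describing a UI element satisfies every given selector."""

    def contains(needle, haystack):
        # Partial, case-insensitive comparison; missing fields never match.
        return needle.casefold() in (haystack or "").casefold()

    if name is not None and not contains(name, element.get("name")):
        return False
    if class_name is not None and not contains(class_name, element.get("class_name")):
        return False
    # automation_id is documented as exact; control_type is assumed exact here.
    if automation_id is not None and element.get("automation_id") != automation_id:
        return False
    if control_type is not None and element.get("control_type") != control_type:
        return False
    return True
```

Under this model, `name="Save"` would match both a "Save" and a "Save As..." button, which is why adding `control_type`, `automation_id`, or an `hwnd` scope helps narrow a search.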

Targeting windows / coordinates

All coordinates are physical screen pixels. To work relative to a specific window:

agent-aid click x=120 y=80 relative_to=active_window
agent-aid click x=120 y=80 hwnd=123456
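The arithmetic behind window-relative targeting is a simple offset. The sketch below assumes relative coordinates are measured from the window's top-left corner in physical pixels, which is the usual convention but is not stated explicitly here; `to_screen` and `to_window` are hypothetical helper names.

```python
def to_screen(x, y, window_left, window_top):
    """Convert window-relative coordinates to absolute screen coordinates."""
    return (window_left + x, window_top + y)


def to_window(screen_x, screen_y, window_left, window_top):
    """Convert absolute screen coordinates to window-relative coordinates."""
    return (screen_x - window_left, screen_y - window_top)
```

For a window whose top-left corner sits at (300, 200), the relative point (120, 80) corresponds to the screen point (420, 280).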

Use as a Python library

from agent_aid import core

core.set_dpi_aware()
print(core.active_window())
core.click(800, 500)
core.type_text("hello")

Same capabilities as the CLI — just call the core module directly from your Python code.
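One way to keep agent code testable off Windows is to pass the core module in as a parameter, so a stub can stand in for it during tests. The sketch below uses only the calls shown in the snippet above (`set_dpi_aware`, `click`, `type_text`); `click_and_type` is a hypothetical wrapper, not part of the library.

```python
def click_and_type(core, x, y, text):
    """Click a screen point, then type text into whatever gained focus.

    `core` is expected to be agent_aid.core, or any object exposing the same
    three functions (useful for tests on machines without a Windows desktop).
    """
    core.set_dpi_aware()  # same call the README snippet makes before other actions
    core.click(x, y)
    core.type_text(text)


# Typical use on a Windows machine with agent-aid installed:
#   from agent_aid import core
#   click_and_type(core, 800, 500, "hello")
```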

Argument formats

# key=value (shortest)
agent-aid click x=500 y=300 button=left

# JSON (for nested fields)
agent-aid screenshot '{"region":{"left":0,"top":0,"width":800,"height":600},"save_path":"r.png"}'

# Pretty-print output
agent-aid --pretty state
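For nested fields such as region, building the JSON argument with `json.dumps` avoids hand-escaping quotes. This is a minimal sketch: `build_json_command` is a hypothetical helper, and it relies only on the `agent-aid <route> '<json>'` form shown above.

```python
import json


def build_json_command(route, payload):
    """Return the argv list for a route whose arguments are passed as one JSON object."""
    # Compact separators keep the argument short; key order follows the dict.
    return ["agent-aid", route, json.dumps(payload, separators=(",", ":"))]
```

The resulting list can be handed directly to `subprocess.run`, which passes the JSON string as a single argument without involving shell quoting.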

Practical AI agent flow

  1. agent-aid state — see what you're looking at
  2. agent-aid screenshot active_window=true save_path=captures/now.png include_base64=false
  3. Inspect the image and choose target coordinates (or locate the element by name with ui_find)
  4. Act: click / type / press / clipboard
  5. Verify: another screenshot or wait_screen_change
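The act-then-verify steps above can be sketched as a small polling loop. This is a client-side illustration under stated assumptions, not how the built-in wait_screen_change route is implemented: `act_and_verify` is a hypothetical function, `capture_hash` is any callable returning a digest of the current screen, and `act` is any callable that performs the input.

```python
import time


def act_and_verify(capture_hash, act, timeout_s=2.0, interval_s=0.1):
    """Perform an action and report whether the screen changed afterwards.

    Returns True as soon as a new capture hash differs from the one taken
    before the action, or False if nothing changed within timeout_s seconds.
    """
    before = capture_hash()
    act()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if capture_hash() != before:
            return True
        time.sleep(interval_s)
    return False
```

A False result is a cue for the agent to re-check state (wrong window focused, click missed its target) before issuing further input.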

Safety notes

  • Sends real mouse and keyboard input; typed text goes to whichever window currently has focus.
  • clipboard overwrites the user's clipboard.
  • open launches a Windows target (same effect as a user double-click).
  • window/manage close posts WM_CLOSE — apps with unsaved data may prompt.
  • After any action, prefer to verify with wait_* or a fresh screenshot.

License

MIT.
