windows-gui-mcp

Windows GUI Automation MCP server for AI coding agents.

windows-gui-mcp helps agents operate Windows desktop applications through semantic UI Automation instead of brittle coordinate clicks. It is designed for agent workflows that need to inspect a live Windows UI, act on stable identifiers, verify every action, and turn successful sessions into reusable scripts.

Why this exists

AI agents can work reliably with web pages because browsers expose structured DOM state. Windows desktop applications are harder: the visible UI is often stateful and asynchronous, and automation that relies on raw screen coordinates breaks easily.

This project exposes a small MCP toolset that keeps the agent in a safer loop:

  1. Discover visible windows.
  2. Focus the target window.
  3. Dump the UI Automation tree.
  4. Find controls by stable identifiers.
  5. Act with post-action verification.
  6. Use OCR or image fallback only after semantic lookup fails.
  7. Generate a pywinauto replay script from the trace.
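
Step 5 of the loop above (act, then verify) can be sketched in plain Python. This is a minimal illustration, not the project's implementation: `act_and_verify`, its parameters, and the `fake_ui` dict standing in for a live window are all assumptions made for the example.

```python
import time
from typing import Callable


def act_and_verify(
    action: Callable[[], None],
    postcondition: Callable[[], bool],
    timeout: float = 2.0,
    interval: float = 0.05,
) -> bool:
    """Run one action, then poll until the post-condition holds or time runs out."""
    action()
    deadline = time.monotonic() + timeout
    while True:
        if postcondition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for a real UI: a dict plays the role of application state.
fake_ui = {"dialog_open": False}


def click_save() -> None:
    fake_ui["dialog_open"] = True


# The click changes state, so the post-condition is met.
ok = act_and_verify(click_save, lambda: fake_ui["dialog_open"])

# A no-op action never produces the expected state, so verification fails.
missed = act_and_verify(
    lambda: None, lambda: fake_ui.get("saved", False), timeout=0.1
)
print(ok, missed)
```

A `False` result is the signal to re-dump the UI tree before trying anything else, rather than repeating the same action.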

Tooling model

AI coding agent
      |
      | MCP stdio
      v
windows_gui_mcp.server
      |
      v
tools/dispatch + trace recorder
      |
      +-- window / element / input / verify / wait
      +-- screenshot / OCR / fallback / trace-to-script
      |
      v
Windows backend ladder
      |
      +-- pywinauto UIA      first choice
      +-- pywinauto win32    legacy fallback
      +-- pyautogui          image/coordinate last resort
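
The backend ladder in the diagram can be read as "try each backend in order and stop at the first one that finds the control". The sketch below shows that idea only; `find_with_ladder` and the three stub finders are hypothetical and do not call pywinauto or pyautogui.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A backend is a (name, finder) pair; the finder returns a handle or None.
Backend = Tuple[str, Callable[[dict], Optional[str]]]


def find_with_ladder(spec: dict, ladder: Sequence[Backend]) -> Tuple[str, str]:
    """Return (backend_name, handle) from the first backend that finds a match."""
    errors: List[str] = []
    for name, finder in ladder:
        try:
            result = finder(spec)
        except Exception as exc:  # a failing backend should not stop the ladder
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        errors.append(f"{name}: no match")
    raise LookupError("; ".join(errors))


# Hypothetical stubs standing in for the real backends.
def uia_finder(spec: dict) -> Optional[str]:
    return None  # pretend UIA cannot see this control


def win32_finder(spec: dict) -> Optional[str]:
    if "class_name" in spec:
        return f"win32:{spec['class_name']}"
    return None


def image_finder(spec: dict) -> Optional[str]:
    raise RuntimeError("no template image supplied")


ladder = [("uia", uia_finder), ("win32", win32_finder), ("pyautogui", image_finder)]
backend, handle = find_with_ladder({"class_name": "Edit"}, ladder)
print(backend, handle)
```

Collecting the per-backend failure reasons gives the agent something concrete to act on when every rung fails.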

MCP tools

Tool                               Purpose
list_windows                       Enumerate visible top-level windows.
focus_window                       Bring a title-matching window to the foreground and verify focus.
dump_ui_tree                       Dump the UIA tree so the agent can choose stable identifiers.
find_element                       Locate one control by automation_id, name, control_type, or class_name.
click_element                      Click a semantically identified control and verify the post-condition.
type_text                          Type into a target control and optionally verify the value.
hotkey                             Send a pywinauto-style key chord such as ^s or %{F4}.
screenshot                         Capture the screen, a window, or a region.
wait_until_element                 Wait for a control to exist, become visible, or become enabled.
verify_text_exists                 Verify text through UIA first, OCR only when requested.
fallback_click_by_image_or_ocr     Last-resort click by image template or OCR anchor.
generate_stable_script_from_trace  Convert the current trace into a pywinauto replay script.
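
The hotkey tool takes pywinauto-style chords, where ^ stands for Ctrl, % for Alt, and + for Shift, and special keys are written in braces. The small decoder below is only a reading aid for that notation; `describe_chord` is a hypothetical helper, and it handles just the simple "modifiers plus one key" form, not pywinauto's full syntax (grouping with parentheses, repeat counts, and so on).

```python
# Modifier prefixes used by pywinauto's key notation.
MODIFIERS = {"^": "Ctrl", "%": "Alt", "+": "Shift"}


def describe_chord(keys: str) -> str:
    """Turn a simple chord such as '^s' or '%{F4}' into readable text."""
    parts = []
    i = 0
    # Consume leading modifier characters.
    while i < len(keys) and keys[i] in MODIFIERS:
        parts.append(MODIFIERS[keys[i]])
        i += 1
    rest = keys[i:]
    # Strip the braces around a named key like {F4} or {ENTER}.
    if rest.startswith("{") and rest.endswith("}") and len(rest) > 2:
        rest = rest[1:-1]
    if not rest:
        raise ValueError(f"chord {keys!r} has no key after its modifiers")
    # Show single letters in upper case, named keys as written.
    parts.append(rest.upper() if len(rest) == 1 else rest)
    return "+".join(parts)


print(describe_chord("^s"))     # Ctrl+S
print(describe_chord("%{F4}"))  # Alt+F4
```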

Install

Python 3.12 or newer is required.

For normal Windows agent use:

py -3.12 -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install "windows-gui-mcp[windows,ocr]"

For local development from this repository:

python -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -e ".[dev]"

On Windows, install the optional runtime extras when you want live GUI control:

.\.venv\Scripts\python -m pip install -e ".[dev,windows,ocr]"

OCR support is optional. If you use Tesseract OCR, install the Windows package separately and make sure tesseract.exe is on PATH.
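
One quick way to confirm the PATH requirement is to ask Python where it would find the executable. This sketch uses only the standard library; `find_tesseract` is a hypothetical helper name, not part of this package.

```python
import shutil
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path to the executable if it is on PATH, else None."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT,
    # so "tesseract" resolves to tesseract.exe when it is installed.
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; OCR fallbacks will be unavailable")
else:
    print(f"tesseract found at {path}")
```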

Run

Start the MCP server on the Windows machine that owns the desktop session:

windows-gui-mcp

Check CLI metadata without starting the MCP stdio transport:

windows-gui-mcp --help
windows-gui-mcp --version

Example local MCP client config:

{
  "mcpServers": {
    "windows-gui": {
      "command": "windows-gui-mcp"
    }
  }
}

Example SSH-based config from another machine:

{
  "mcpServers": {
    "windows-gui": {
      "command": "ssh",
      "args": [
        "user@windows-host",
        "C:\\path\\to\\windows-gui-mcp\\.venv\\Scripts\\windows-gui-mcp.exe"
      ]
    }
  }
}
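
Windows paths are easy to mis-escape when this JSON is written by hand, because every backslash must be doubled. A possible workaround, sketched here with the standard library only, is to build the structure in Python and let json.dumps do the escaping. `ssh_mcp_config` is a hypothetical helper; the host and path are the placeholders from the example above.

```python
import json


def ssh_mcp_config(host: str, remote_exe: str, name: str = "windows-gui") -> str:
    """Render an SSH-based MCP client config as JSON text."""
    config = {
        "mcpServers": {
            name: {
                "command": "ssh",
                "args": [host, remote_exe],
            }
        }
    }
    return json.dumps(config, indent=2)


# A raw string keeps the Windows path readable; json.dumps doubles the backslashes.
remote = r"C:\path\to\windows-gui-mcp\.venv\Scripts\windows-gui-mcp.exe"
text = ssh_mcp_config("user@windows-host", remote)
print(text)
```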

Example workflow

This is the intended agent loop for a Notepad or Calculator task:

1. list_windows()
2. focus_window(title_regex="Notepad|Calculator")
3. dump_ui_tree(window_handle=...)
4. find_element(spec={"name": "Save", "control_type": "Button"})
5. click_element(
     spec={"name": "Save", "control_type": "Button"},
     expect_element_after={"class_name": "#32770"}
   )
6. type_text(
     spec={"automation_id": "1001"},
     text="agent-notes.txt",
     verify_value_contains="agent-notes.txt"
   )
7. hotkey("%{ENTER}")
8. generate_stable_script_from_trace()

See examples/notepad_calculator.md for a longer walkthrough.
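
To make step 8 more concrete, here is a rough sketch of how a recorded trace could be turned into pywinauto source text. The trace format (a list of dicts with "tool", "spec", "text", and "keys" fields) and the function names are assumptions for illustration; the real generate_stable_script_from_trace output may look different. The emitted calls (Application.connect, child_window, click_input, type_keys, send_keys) are standard pywinauto APIs, but the sketch only builds the script as a string and never imports pywinauto itself.

```python
# Map the spec keys used in this README onto pywinauto child_window() keywords.
SPEC_TO_PYWINAUTO = {
    "automation_id": "auto_id",
    "name": "title",
    "control_type": "control_type",
    "class_name": "class_name",
}


def selector(spec: dict) -> str:
    """Render a spec dict as a win.child_window(...) expression."""
    kwargs = ", ".join(f"{SPEC_TO_PYWINAUTO[k]}={v!r}" for k, v in spec.items())
    return f"win.child_window({kwargs})"


def trace_to_script(title_regex: str, trace: list) -> str:
    """Convert a (hypothetical) trace into pywinauto replay source code."""
    lines = [
        "from pywinauto import Application",
        "from pywinauto.keyboard import send_keys",
        "",
        f'app = Application(backend="uia").connect(title_re={title_regex!r})',
        f"win = app.window(title_re={title_regex!r})",
    ]
    for step in trace:
        tool = step["tool"]
        if tool == "click_element":
            lines.append(f"{selector(step['spec'])}.click_input()")
        elif tool == "type_text":
            # Note: type_keys treats + ^ % { } as special; real text may need escaping.
            lines.append(
                f"{selector(step['spec'])}.type_keys({step['text']!r}, with_spaces=True)"
            )
        elif tool == "hotkey":
            lines.append(f"send_keys({step['keys']!r})")
        else:
            lines.append(f"# skipped unsupported step: {tool}")
    return "\n".join(lines) + "\n"


trace = [
    {"tool": "click_element", "spec": {"name": "Save", "control_type": "Button"}},
    {"tool": "type_text", "spec": {"automation_id": "1001"}, "text": "agent-notes.txt"},
    {"tool": "hotkey", "keys": "%{ENTER}"},
]
script = trace_to_script(".*Notepad.*", trace)
print(script)
```

Read-only steps such as list_windows or dump_ui_tree would fall into the "skipped" branch here, since they do not need to be replayed.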

Safety rules

  • Prefer automation_id, then name, then control_type, then class_name.
  • Do not start with screen coordinates.
  • Verify every click or text entry with a concrete post-condition.
  • Re-dump the UI tree after a failed verification instead of retrying blindly.
  • Treat OCR and image matching as fallbacks, not the primary automation path.
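
The identifier preference in the first rule can be sketched as a small selection function over a dumped tree. This is an illustrative assumption about how an agent might apply the rule, not code from the project: `choose_spec` and the flat list-of-dicts tree shape are hypothetical.

```python
# Preference order from the safety rules: most stable identifier first.
PRIORITY = ("automation_id", "name", "control_type", "class_name")


def choose_spec(target: dict, controls: list) -> dict:
    """Pick the most preferred identifier that matches only the target control."""
    for key in PRIORITY:
        value = target.get(key)
        if not value:
            continue  # empty or missing identifiers cannot be used
        matches = [c for c in controls if c.get(key) == value]
        if len(matches) == 1:
            return {key: value}
    raise LookupError("no single identifier uniquely selects this control")


# Hypothetical, flattened excerpt of a UI tree dump.
tree = [
    {"automation_id": "", "name": "Save", "control_type": "Button", "class_name": "Button"},
    {"automation_id": "", "name": "Cancel", "control_type": "Button", "class_name": "Button"},
    {"automation_id": "1001", "name": "File name:", "control_type": "Edit", "class_name": "Edit"},
]

print(choose_spec(tree[2], tree))  # the automation_id is present and unique
print(choose_spec(tree[0], tree))  # no automation_id, so the name is used
```

When no single field is unique, combining fields (for example name plus control_type, as in the workflow above) is the natural next step; the sketch leaves that out for brevity.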

Development checks

python -m compileall -q src tests
python -m pytest -q
ruff check .
python -m build
twine check dist/*

Contributing and security

See CONTRIBUTING.md for development workflow and automation design rules. See SECURITY.md for vulnerability reporting and desktop automation safety expectations.

License

MIT
