Windows GUI Automation MCP server for AI coding agents
windows-gui-mcp
windows-gui-mcp helps agents operate Windows desktop applications through
semantic UI Automation instead of brittle coordinate clicks. It is designed for
agent workflows that need to inspect a live Windows UI, act on stable
identifiers, verify every action, and turn successful sessions into reusable
scripts.
Why this exists
AI agents can work reliably with web pages because browsers expose structured DOM state. Windows desktop applications are harder: the visible UI is often stateful, asynchronous, and easy to break with raw coordinates.
This project exposes a small MCP toolset that keeps the agent in a safer loop:
- Discover visible windows.
- Focus the target window.
- Dump the UI Automation tree.
- Find controls by stable identifiers.
- Act with post-action verification.
- Use OCR or image fallback only after semantic lookup fails.
- Generate a pywinauto replay script from the trace.
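The last step in the loop above, turning a trace into a replay script, could look roughly like the sketch below. The trace entry layout (`tool`, `spec`, `text`, `keys`, `title_regex`) and the name `render_replay_script` are assumptions made for illustration, not the server's actual trace format. The emitted calls (`Application(backend="uia").connect`, `child_window`, `click_input`, `type_keys`, `send_keys`) are standard pywinauto.

```python
# Hypothetical sketch: the trace entry layout is assumed, not taken from the project.
from typing import Any

# Map spec keys (as used by the MCP tools) to pywinauto child_window() keywords.
_SPEC_TO_PYWINAUTO = {
    "automation_id": "auto_id",
    "name": "title",
    "control_type": "control_type",
    "class_name": "class_name",
}


def _selector(spec: dict[str, Any]) -> str:
    """Render a spec dict as keyword arguments for child_window()."""
    return ", ".join(
        f"{_SPEC_TO_PYWINAUTO[key]}={value!r}"
        for key, value in spec.items()
        if key in _SPEC_TO_PYWINAUTO
    )


def render_replay_script(trace: list[dict[str, Any]]) -> str:
    """Convert trace entries into the source text of a pywinauto script.

    Assumes the trace starts with a focus_window entry so that `win` is defined.
    """
    lines = [
        "from pywinauto import Application",
        "from pywinauto.keyboard import send_keys",
        "",
    ]
    for entry in trace:
        tool = entry["tool"]
        if tool == "focus_window":
            pattern = entry["title_regex"]
            lines.append(f"app = Application(backend='uia').connect(title_re={pattern!r})")
            lines.append(f"win = app.window(title_re={pattern!r})")
            lines.append("win.set_focus()")
        elif tool == "click_element":
            lines.append(f"win.child_window({_selector(entry['spec'])}).click_input()")
        elif tool == "type_text":
            lines.append(
                f"win.child_window({_selector(entry['spec'])})"
                f".type_keys({entry['text']!r}, with_spaces=True)"
            )
        elif tool == "hotkey":
            lines.append(f"send_keys({entry['keys']!r})")
        # Read-only tools (list_windows, dump_ui_tree, ...) need no replay step.
    return "\n".join(lines) + "\n"


example_trace = [
    {"tool": "focus_window", "title_regex": "Notepad"},
    {"tool": "click_element", "spec": {"name": "Save", "control_type": "Button"}},
    {"tool": "hotkey", "keys": "^s"},
]
print(render_replay_script(example_trace))
```

The generator only produces text, so it runs on any platform; the resulting script needs pywinauto and a live Windows session. Note that `type_keys` treats characters such as `+ ^ % ~ ( ) { }` as special, so a real generator would have to escape them in recorded text.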
Tooling model
AI coding agent
        |
        | MCP stdio
        v
windows_gui_mcp.server
        |
        v
tools/dispatch + trace recorder
        |
        +-- window / element / input / verify / wait
        +-- screenshot / OCR / fallback / trace-to-script
        |
        v
Windows backend ladder
        |
        +-- pywinauto UIA               first choice
        +-- pywinauto win32             legacy fallback
        +-- pyautogui image/coordinate  last resort
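One way to picture the backend ladder is as an ordered list of attempts where the first success wins. The sketch below uses stub functions in place of real UIA, win32, and image lookups; `run_ladder` and `BackendError` are hypothetical names for this example and are not taken from the project's code.

```python
# Hypothetical sketch of a "most semantic backend first" ladder using stubs.
from collections.abc import Callable
from typing import Any


class BackendError(Exception):
    """Raised by a backend that cannot perform the requested action."""


def run_ladder(backends: list[tuple[str, Callable[[], Any]]]) -> tuple[str, Any]:
    """Try each (name, attempt) pair in order and return the first success.

    Raises BackendError listing every failure if all rungs fail.
    """
    failures: list[str] = []
    for name, attempt in backends:
        try:
            return name, attempt()
        except BackendError as exc:
            failures.append(f"{name}: {exc}")
    raise BackendError("all backends failed: " + "; ".join(failures))


# Stub backends standing in for real lookups.
def uia_click() -> str:
    raise BackendError("control not exposed through UIA")


def win32_click() -> str:
    return "clicked via win32"


def image_click() -> str:
    return "clicked via image match"


used, result = run_ladder(
    [
        ("pywinauto-uia", uia_click),
        ("pywinauto-win32", win32_click),
        ("pyautogui-image", image_click),
    ]
)
print(used, "->", result)  # pywinauto-win32 -> clicked via win32
```

Returning the name of the rung that succeeded lets a trace record how far down the ladder an action had to go.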
MCP tools
| Tool | Purpose |
|---|---|
| list_windows | Enumerate visible top-level windows. |
| focus_window | Bring a title-matching window to the foreground and verify focus. |
| dump_ui_tree | Dump the UIA tree so the agent can choose stable identifiers. |
| find_element | Locate one control by automation_id, name, control_type, or class_name. |
| click_element | Click a semantically identified control and verify the post-condition. |
| type_text | Type into a target control and optionally verify the value. |
| hotkey | Send a pywinauto-style key chord such as ^s or %{F4}. |
| screenshot | Capture the screen, a window, or a region. |
| wait_until_element | Wait for a control to exist, become visible, or become enabled. |
| verify_text_exists | Verify text through UIA first, OCR only when requested. |
| fallback_click_by_image_or_ocr | Last-resort click by image template or OCR anchor. |
| generate_stable_script_from_trace | Convert the current trace into a pywinauto replay script. |
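For readers unfamiliar with the pywinauto chord syntax used by hotkey: `^` stands for Ctrl, `%` for Alt, `+` for Shift, and braces wrap named keys such as `{F4}`. The helper below is a small illustrative decoder for single-key chords only; `describe_chord` is a hypothetical name and covers just a subset of the full pywinauto key syntax.

```python
# Illustrative decoder for a subset of pywinauto key-chord syntax.
import re

_MODIFIERS = {"^": "Ctrl", "%": "Alt", "+": "Shift"}

# Zero or more modifier prefixes, then either a {NAMED} key or one literal character.
_CHORD = re.compile(r"^([\^%+]*)(?:\{(\w+)\}|(.))$")


def describe_chord(chord: str) -> str:
    """Return a human-readable form of a single-key chord, e.g. '^s' -> 'Ctrl+s'."""
    match = _CHORD.match(chord)
    if match is None:
        raise ValueError(f"unsupported chord: {chord!r}")
    modifiers, named_key, literal_key = match.groups()
    key = named_key if named_key is not None else literal_key
    return "+".join([_MODIFIERS[m] for m in modifiers] + [key])


print(describe_chord("^s"))     # Ctrl+s
print(describe_chord("%{F4}"))  # Alt+F4
```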
Install
Python 3.12 or newer is required.
For normal Windows agent use:
py -3.12 -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install "windows-gui-mcp[windows,ocr]"
For local development from this repository:
python -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -e ".[dev]"
On Windows, install the optional runtime extras when you want live GUI control:
.\.venv\Scripts\python -m pip install -e ".[dev,windows,ocr]"
OCR support is optional. If you use Tesseract OCR, install the Windows package
separately and make sure tesseract.exe is on PATH.
Run
Start the MCP server on the Windows machine that owns the desktop session:
windows-gui-mcp
Check CLI metadata without starting the MCP stdio transport:
windows-gui-mcp --help
windows-gui-mcp --version
Example local MCP client config:
{
  "mcpServers": {
    "windows-gui": {
      "command": "windows-gui-mcp"
    }
  }
}
Example SSH-based config from another machine:
{
  "mcpServers": {
    "windows-gui": {
      "command": "ssh",
      "args": [
        "user@windows-host",
        "C:\\path\\to\\windows-gui-mcp\\.venv\\Scripts\\windows-gui-mcp.exe"
      ]
    }
  }
}
Example workflow
This is the intended agent loop for a Notepad or Calculator task:
1. list_windows()
2. focus_window(title_regex="Notepad|Calculator")
3. dump_ui_tree(window_handle=...)
4. find_element(spec={"name": "Save", "control_type": "Button"})
5. click_element(
       spec={"name": "Save", "control_type": "Button"},
       expect_element_after={"class_name": "#32770"}
   )
6. type_text(
       spec={"automation_id": "1001"},
       text="agent-notes.txt",
       verify_value_contains="agent-notes.txt"
   )
7. hotkey("%{ENTER}")
8. generate_stable_script_from_trace()
See examples/notepad_calculator.md for a longer walkthrough.
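Steps 5 and 6 pair each action with a post-condition. A generic way to express that pattern is to run the action once and then poll the post-condition until it holds or a timeout expires. The sketch below uses plain callables and an in-memory stand-in for the UI; `act_and_verify` and its parameters are assumptions for illustration, not the server's implementation.

```python
# Hypothetical sketch of "act, then poll a post-condition" with a timeout.
import time
from collections.abc import Callable


def act_and_verify(
    action: Callable[[], None],
    postcondition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> bool:
    """Run action once, then poll postcondition until it holds or timeout expires."""
    action()
    deadline = time.monotonic() + timeout
    while True:
        if postcondition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# In-memory stand-in for a UI: "clicking Save" opens a dialog.
ui_state = {"dialog_open": False}


def click_save() -> None:
    ui_state["dialog_open"] = True


ok = act_and_verify(click_save, lambda: ui_state["dialog_open"], timeout=1.0)
print("verified" if ok else "verification failed; re-dump the UI tree")
```

The action is deliberately not retried inside the loop: a failed verification is a signal to inspect the UI again rather than to click a second time.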
Safety rules
- Prefer automation_id, then name, then control_type, then class_name.
- Do not start with screen coordinates.
- Verify every click or text entry with a concrete post-condition.
- Re-dump the UI tree after a failed verification instead of retrying blindly.
- Treat OCR and image matching as fallbacks, not the primary automation path.
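The identifier priority in the first rule can be sketched as a simple ordered lookup over a dumped element. The element is assumed to be a dict keyed by the same field names the tools accept; `pick_identifier` is a hypothetical helper for illustration.

```python
# Hypothetical sketch: choose the most stable identifier available on an element.
from typing import Any

IDENTIFIER_PRIORITY = ("automation_id", "name", "control_type", "class_name")


def pick_identifier(element: dict[str, Any]) -> tuple[str, Any]:
    """Return the first non-empty identifier in priority order.

    Raises LookupError instead of falling back to coordinates.
    """
    for key in IDENTIFIER_PRIORITY:
        value = element.get(key)
        if value:
            return key, value
    raise LookupError("no semantic identifier available; re-dump the UI tree")


# An element with an empty automation_id falls through to its name.
element = {"automation_id": "", "name": "Save", "control_type": "Button"}
print(pick_identifier(element))  # ('name', 'Save')
```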
Development checks
python -m compileall -q src tests
python -m pytest -q
ruff check .
python -m build
twine check dist/*
Contributing and security
See CONTRIBUTING.md for development workflow and automation design rules. See SECURITY.md for vulnerability reporting and desktop automation safety expectations.
License
MIT