Windows desktop automation for AI agents - screenshot, mouse, keyboard, windows, UI automation, OCR

win-computer-use

A Windows desktop automation toolkit that lets AI agents operate Windows desktop applications the way a person would.

[!TIP] Quick installation with pip install is now supported. See Quick install below.

[!IMPORTANT] Windows only. This tool depends on pywinauto (Windows UI Automation) and does not support macOS / Linux.


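What is this?

A counterpart to the Computer Use capability of OpenAI Codex / Anthropic Claude, with these differences:

  • Runs on your own Windows machine; no remote VM is needed
  • No per-token billing; screenshots and actions are not metered
  • Supports Chinese text input (via the clipboard)
  • Supports any Windows application: Win32 / WinForms / WPF / Qt / Electron / UWP

When an AI agent hits a scenario that browser automation cannot reach (a native desktop application), it can load this tool to automate the task.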


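What can it do?

Capability | Typical commands
📸 Screenshot | Full-screen capture, region capture, base64 output
🖱️ Mouse | Move, click, double-click, right-click, drag, scroll wheel
⌨️ Keyboard | Type text (including Chinese), hotkey combinations, key down/up
🪟 Window management | List all windows, activate, minimize, maximize, close
🔍 UI Automation | Find elements by control name/type, click, fill in text, wait for an element to appear
🖼️ Image template matching | Find an image with OpenCV, click an icon, wait for an image to appear (NMS deduplication)
🔤 OCR (optional) | On-screen text recognition (RapidOCR by default; Tesseract is optional)
🛡️ Safety | Emergency Stop, Failsafe (flick the mouse into a screen corner to abort)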
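
The keyboard capability above covers Chinese text, which the project description says is entered via the clipboard. One common way to do that is to send ASCII text as ordinary keystrokes and paste everything else. The sketch below illustrates only that routing decision; it is not the toolkit's code. `type_text` and its three callback parameters are hypothetical names. On a real machine they might be bound to `pyautogui.write`, `pyperclip.copy`, and a `pyautogui.hotkey("ctrl", "v")` call (pyperclip is not in this project's dependency list and is an assumption here).

```python
from typing import Callable


def type_text(
    text: str,
    type_keys: Callable[[str], None],
    copy: Callable[[str], None],
    paste: Callable[[], None],
) -> str:
    """Send text to the focused window and report which route was used.

    ASCII text goes out as key events; anything else (for example Chinese)
    is placed on the clipboard and pasted, because key-event injection
    generally cannot produce characters that are not on the keyboard layout.
    """
    if text.isascii():
        type_keys(text)
        return "keys"
    copy(text)   # put the text on the clipboard
    paste()      # then trigger a paste, e.g. Ctrl+V
    return "clipboard"
```

Passing the three actions in as callbacks keeps the routing logic testable without a desktop session. A real implementation may also want to restore the previous clipboard contents afterwards.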

Quick install

Option 1: pip install (recommended)

pip install win-computer-use

Once installed, the command is available directly:

win-computer-use --help
win-computer-use screenshot --output test.png
win-computer-use click 500 500
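Option 2: install from source (for development)

# Clone the repository
git clone https://github.com/CarlosShao/win-computer-use.git
cd win-computer-use

# Windows: run the install script
install.bat

# Linux/macOS: run the install script (the toolkit itself still runs only on Windows)
bash install.sh

# Or install manually
python -m venv .venv
.venv\Scripts\pip.exe install -e .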

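Install (detailed)

1. Clone the repository

git clone https://github.com/CarlosShao/win-computer-use.git
cd win-computer-use

2. Install (Windows)

Recommended: run the install script, which creates a virtual environment and installs the dependencies automatically.

install.bat

Or install manually:

# Run from the win-computer-use directory entered in step 1

# Create an isolated virtual environment
python -m venv .venv

# Install dependencies (Windows only)
.venv\Scripts\pip.exe install pyautogui pywinauto opencv-python numpy mss pillow pytesseract rapidocr-onnxruntime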

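Note: Python >= 3.10 is required. If your system has no python command, use python3 instead.

⚠️ Windows only: pywinauto is a Windows-specific dependency and cannot run on macOS / Linux.

OCR: the default backend is RapidOCR (no Tesseract installation needed); models are downloaded automatically on first run (~40 MB). To use Tesseract instead, pass --backend tesseract.

3. (Optional) Install Tesseract OCR

Needed only for the Tesseract OCR backend (--backend tesseract); pytesseract requires the Tesseract binary to be installed separately.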


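Usage examples

Example 1: filling in text automatically (Notepad)

User: Type "你好世界" ("hello world") into Notepad, then save it as test.txt

After loading this tool, the AI agent runs:

  1. list-windows to find Notepad
  2. activate-window to bring the window to the foreground
  3. type "你好世界" to enter the Chinese text
  4. hotkey ctrl s to trigger Save
  5. set-text to fill in the file name
  6. click-element to click the Save button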
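
The first four steps can be written down as an ordered list of CLI argument vectors, using the flag spellings from the CLI reference in this README. This is an illustrative sketch only: `notepad_plan` is a hypothetical helper, the window title "Notepad" is an assumption (it varies with the system language), and steps 5 and 6 are left out because they need control identifiers that depend on the Save dialog.

```python
def notepad_plan(text: str, window_title: str = "Notepad") -> list[list[str]]:
    """Return CLI argument lists for steps 1-4 of the Notepad example.

    Each inner list is one invocation, e.g. ["type", "--text", "..."].
    The window title is an assumed value; pass the localized title if needed.
    """
    return [
        ["list-windows", "--filter", window_title],     # step 1: locate the window
        ["activate-window", "--title", window_title],   # step 2: bring it forward
        ["type", "--text", text],                       # step 3: enter the text
        ["hotkey", "ctrl", "s"],                        # step 4: open the Save dialog
    ]
```

Keeping the plan as data lets an agent log or review the sequence before anything touches the desktop.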

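Example 2: find an image and click it

User: There is an "OK" button icon on screen; find it and click it

  1. Crop the "OK" icon out of a screenshot and save it as ok_btn.png
  2. find-image ok_btn.png → returns the coordinates
  3. click-image ok_btn.png → moves the mouse there and clicks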
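
Template matching tends to score neighbouring positions almost equally, so one icon can yield several overlapping hits; the feature list mentions NMS deduplication for this. The pure-Python sketch below shows greedy non-maximum suppression over candidate boxes. It illustrates the technique and is not the toolkit's implementation. `Match`, `iou`, `nms`, and `center` are hypothetical names, and the 0.5 IoU threshold is an assumed default.

```python
from typing import NamedTuple


class Match(NamedTuple):
    x: int        # left edge of the matched region
    y: int        # top edge of the matched region
    w: int        # template width
    h: int        # template height
    score: float  # match confidence, higher is better


def iou(a: Match, b: Match) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0


def nms(matches: list[Match], iou_threshold: float = 0.5) -> list[Match]:
    """Keep the best-scoring box of every overlapping cluster."""
    kept: list[Match] = []
    for m in sorted(matches, key=lambda m: m.score, reverse=True):
        if all(iou(m, k) <= iou_threshold for k in kept):
            kept.append(m)
    return kept


def center(m: Match) -> tuple[int, int]:
    """Pixel to click: the middle of the matched region."""
    return (m.x + m.w // 2, m.y + m.h // 2)
```

With two near-identical hits at (100, 100) and (102, 101) and a separate hit at (300, 50), `nms` keeps the higher-scoring one of the pair plus the separate hit.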

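Example 3: UI Automation (Calculator)

User: Open Calculator and compute 123 * 456

  1. find-window --title "计算器" to find the window ("计算器" is the Calculator title on a Chinese-language system)
  2. find-element --title "计算器" --control_type Button --name "一" to find a button by name
  3. click-element to click the digits and the operator in order
  4. element-text to read the result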

Full CLI command reference

python cli.py --help

# Screenshot
screenshot --output <path> [--base64]
screen-size
pixel --x <n> --y <n>

# Mouse
mouse-position
move --x <n> --y <n> [--duration <s>]
click [--x <n> --y <n>] [--button left|middle|right]
double-click [--x <n> --y <n>]
right-click [--x <n> --y <n>]
drag --x1 <n> --y1 <n> --x2 <n> --y2 <n>
scroll [--clicks <n>] [--x <n> --y <n>]

# Keyboard
type --text <str>
hotkey <key1> [key2 ...]
key-press --key <str>
key-down --key <str>
key-up --key <str>
wait [--seconds <n>]

# Windows
list-windows [--filter <str>]
find-window --title <str>
activate-window --title <str>
minimize --title <str>
maximize --title <str>
restore --title <str>
close-window --title <str>
window-rect --title <str>

# UI Automation (pywinauto)
find-element --title <str> --control_type <str> [--name <str>] [--auto_id <str>]
click-element ...
set-text --title <str> --control_type <str> --auto_id <str> --value <str>
element-text ...
wait-element ...

# Image matching (OpenCV)
find-image <template_path> [--threshold <0-1>] [--region x,y,w,h]
click-image <template_path> [--threshold <0-1>]
wait-image <template_path> [--timeout <s>]
count-image <template_path> [--threshold <0-1>]

# OCR (RapidOCR by default; --backend tesseract requires Tesseract)
ocr [--region x,y,w,h]
ocr-words [--region x,y,w,h]

# Safety
emergency-stop
clear-stop
failsafe [on|off]
stop-status
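
The wait-element and wait-image commands in the reference above are listed without their internals. Commands of this kind are commonly built as a poll-until-timeout loop, and the sketch below shows that pattern under that assumption. `wait_for` and `probe` are hypothetical names, and the default timeout and interval are illustrative values, not the toolkit's.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call probe() until it returns something other than None.

    Returns the probe result, or None once `timeout` seconds have passed.
    The probe always runs at least once, even with timeout=0.
    """
    deadline = clock() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)
```

Here `probe` would be something like one image search or one element lookup that returns coordinates on success and None otherwise. Using a monotonic clock keeps the deadline stable if the system time changes mid-wait.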

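Safety mechanisms

Mechanism | Description
Emergency Stop | Once invoked, every action command is rejected until clear-stop is called
Failsafe | Flicking the mouse into any of the four screen corners immediately aborts all in-progress operations
Coordinate bounds check | Coordinates are validated against the screen bounds before every mouse move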
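
The three mechanisms above can be pictured as checks that run before every mouse action. The sketch below is one possible arrangement, not the toolkit's code: `SafetyGuard`, `EmergencyStopError`, and `check_move` are hypothetical names, and treating "corner" as the exact corner pixel is an assumption. PyAutoGUI, listed in the tech stack, ships its own corner failsafe (`pyautogui.FAILSAFE`), which a real implementation might rely on instead.

```python
from dataclasses import dataclass


class EmergencyStopError(RuntimeError):
    """Raised when an action is refused by the stop flag or the failsafe."""


@dataclass
class SafetyGuard:
    screen_width: int
    screen_height: int
    stopped: bool = False   # set by emergency_stop, cleared by clear_stop
    failsafe: bool = True   # corner check enabled

    def emergency_stop(self) -> None:
        self.stopped = True

    def clear_stop(self) -> None:
        self.stopped = False

    def in_corner(self, x: int, y: int) -> bool:
        """True if (x, y) is one of the four corner pixels."""
        return (x in (0, self.screen_width - 1)
                and y in (0, self.screen_height - 1))

    def check_move(self, x: int, y: int, cursor: tuple[int, int]) -> None:
        """Validate a move to (x, y) given the current cursor position."""
        if self.stopped:
            raise EmergencyStopError("emergency stop is active; call clear_stop first")
        if self.failsafe and self.in_corner(*cursor):
            raise EmergencyStopError("failsafe: cursor is in a screen corner")
        if not (0 <= x < self.screen_width and 0 <= y < self.screen_height):
            raise ValueError(f"target ({x}, {y}) is outside the screen")
```

Checking the stop flag first means that after an emergency stop nothing else is evaluated, which matches the "rejected until clear-stop" behaviour described in the table.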

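Testing

The repository includes TEST_PLAN.md, which covers 55+ test cases from Lv1 (smoke) through Lv5 (safety + E2E).

Run the levels in order and confirm each one passes before using the tool in earnest or publishing a release.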


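Tech stack

  • PyAutoGUI: low-level mouse/keyboard control
  • OpenCV (cv2): image template matching + NMS deduplication
  • pywinauto: Windows UI Automation (control tree traversal)
  • mss: high-performance screenshots (3-5x faster than PyAutoGUI)
  • pytesseract: OCR (optional; requires the Tesseract binary)
  • Pillow: image processing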

License

MIT


English

[!WARNING] Not an install-and-use tool! Requires Python (>= 3.10) + pip dependencies on your local machine. See environment setup below.

[!IMPORTANT] Windows Only! This tool depends on pywinauto (Windows UI Automation) and does not support macOS / Linux.

What is this?

A Windows desktop automation toolkit that mirrors the "Computer Use" capability of OpenAI Codex / Anthropic Claude — but runs entirely on your own Windows machine, no remote VM, no per-token cost.

Supports any Windows app: Win32, WinForms, WPF, Qt, Electron, UWP.

Features

  • Screenshot (mss, 3-5x faster than PyAutoGUI)
  • Mouse control (move, click, drag, scroll)
  • Keyboard input (supports Chinese via clipboard)
  • Window management (list, activate, minimize, close)
  • UI Automation via pywinauto (find elements by name/automation id)
  • OpenCV template matching with NMS (Non-Maximum Suppression)
  • OCR (optional): RapidOCR by default, Tesseract via --backend tesseract
  • Safety: emergency stop + failsafe (mouse corner kill switch)

Install (Windows Only)

git clone https://github.com/CarlosShao/win-computer-use.git
cd win-computer-use
python -m venv .venv

# Install Python deps (Windows only)
.venv\Scripts\pip.exe install pyautogui pywinauto opencv-python numpy mss pillow pytesseract rapidocr-onnxruntime

Note: Requires Python >= 3.10. Use python3 if python is not available.

⚠️ Windows Only! pywinauto is Windows-specific and will not work on macOS / Linux.

OCR: Uses RapidOCR by default (no Tesseract needed). Models auto-download on first use (~40MB). Use --backend tesseract to switch.

Quick Start

User: Open Notepad, type "hello world", and save as test.txt

The agent will auto-load this toolkit and execute the full desktop automation flow.

