
Windows desktop automation MCP server: lets AI agents see and operate Windows desktop applications


PeekabooWin MCP


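Built on Windows UI Automation (UIA), SendInput, and PaddleOCR, it provides a full set of desktop interaction capabilities: element discovery, input simulation, screenshots, window management, clipboard access, and OCR.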


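System requirements

  • Operating system: Windows 10 or Windows 11
  • Tooling: uv (Python package manager) recommended
  • Permissions: running as administrator unlocks every feature (including sending input to elevated windows)
  • OCR: optional; requires PaddleOCR (about 1 GB)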

Installation

Option 1: one-line install in PowerShell (recommended)

irm https://raw.githubusercontent.com/wangneal/PeekabooWin/main/install.ps1 | iex

The script detects uv, installs it if it is missing, and then installs PeekabooWin.

Option 2: manual install

# Install uv first: https://docs.astral.sh/uv/#installation

# Install from GitHub
uv tool install git+https://github.com/wangneal/PeekabooWin.git

# Or install from source
git clone https://github.com/wangneal/PeekabooWin.git
cd PeekabooWin
# uv pip install targets a virtual environment; create one with `uv venv` if needed
uv pip install -e .

# Optional: OCR extra (about 1 GB)
uv pip install -e ".[ocr]"

MCP client configuration

After installing, add the following configuration to your client.

OpenCode

Config file: C:\Users\<username>\.config\opencode\opencode.json

{
  "mcp": {
    "peekaboowin": {
      "type": "local",
      "command": ["peekaboowin"],
      "enabled": true
    }
  }
}

If PeekabooWin is not installed yet, OpenCode can fetch it automatically through uvx:

{
  "mcp": {
    "peekaboowin": {
      "type": "local",
      "command": ["uvx", "peekaboowin"],
      "enabled": true
    }
  }
}

Note: the uvx peekaboowin form only works once the package is published on PyPI. Until then, install it with uv tool install git+https://github.com/wangneal/PeekabooWin.git and use "command": ["peekaboowin"].

Claude Desktop

Config file: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "peekaboowin": {
      "command": "peekaboowin",
      "env": {
        "PEEKABOOWIN_LOG_LEVEL": "info"
      }
    }
  }
}

Cursor

Settings path: Settings → Features → MCP Servers → Add new MCP server

{
  "mcpServers": {
    "peekaboowin": {
      "command": "peekaboowin"
    }
  }
}

Windsurf

Config file: ~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "peekaboowin": {
      "command": "peekaboowin"
    }
  }
}

GitHub Copilot (VS Code)

Config file: .vscode/mcp.json

{
  "servers": {
    "peekaboowin": {
      "command": "peekaboowin"
    }
  }
}

Tool reference

32 tools in total, grouped by function (a trailing ? marks an optional parameter):

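Screenshots and displays

| Tool | Parameters | Description |
|---|---|---|
| screenshot | monitor?, region?, image_format?, quality? | Capture a screenshot of the screen or of a region |
| capture_window | hwnd, image_format?, quality? | Capture a screenshot of the given window |
| list_monitors | | List all monitors |
| health_check | | System diagnostics: DPI, permissions, OCR status |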

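UI element discovery

| Tool | Parameters | Description |
|---|---|---|
| find_element | name?, class_name?, automation_id?, scope? | Find elements by name / class name / automation ID; falls back to OCR automatically when UIA finds nothing |
| get_element_info | element_ref | Get detailed element properties |
| get_children | element_ref, depth? | Get the child element tree |
| get_desktop | depth? | Get the desktop element tree |
| harvest_ui | target?, depth? | One-stop UI discovery; supplements with OCR automatically when the UIA tree is sparse |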

Element reference formats: hwnd:12345 / point:100,200 / desktop

harvest_ui target values: desktop (all windows) / foreground (the foreground window) / 12345 (a specific HWND)
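The three element-reference forms are plain strings, so a client or wrapper can validate them before calling get_element_info or click_element. The sketch below shows one way to parse them; the helper name parse_element_ref and its tuple return values are hypothetical, not PeekabooWin's own API.

```python
from typing import Tuple, Union

# Hypothetical parsed form: ("hwnd", int) | ("point", (x, y)) | ("desktop", None)
RefValue = Union[int, Tuple[int, int], None]


def parse_element_ref(ref: str) -> Tuple[str, RefValue]:
    """Parse 'hwnd:<int>', 'point:<x>,<y>' or 'desktop' (hypothetical helper)."""
    ref = ref.strip()
    if ref == "desktop":
        return ("desktop", None)
    kind, sep, value = ref.partition(":")
    if not sep:
        raise ValueError(f"unrecognized element reference: {ref!r}")
    if kind == "hwnd":
        # Window handles are written as plain decimal integers.
        return ("hwnd", int(value))
    if kind == "point":
        # Screen coordinates are written as "x,y".
        x_str, comma, y_str = value.partition(",")
        if not comma:
            raise ValueError(f"point reference needs 'x,y': {ref!r}")
        return ("point", (int(x_str), int(y_str)))
    raise ValueError(f"unknown element reference kind: {kind!r}")


print(parse_element_ref("point:100,200"))  # ('point', (100, 200))
```

Rejecting unknown prefixes early gives the agent a clear error instead of a failed UIA lookup.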

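Input simulation

| Tool | Parameters | Description |
|---|---|---|
| click | x, y, button?, double? | Mouse click (auto-focuses the target window) |
| move_mouse | x, y | Move the mouse (auto-focuses the target window) |
| drag | x1, y1, x2, y2, duration? | Drag (auto-focuses the window at the start point) |
| scroll | x, y, delta? | Scroll (auto-focuses the target window) |
| type_text | text | Type Unicode text (sent blind; activate the window first) |
| press_keys | keys | Press a key combination, e.g. ctrl+s, alt+tab, win+r |
| click_element | element_ref | Semantic click (auto-focuses the target window) |
| type_into_element | element_ref, text | Semantic text input (auto-focuses the target window) |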

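Key names supported by press_keys: modifiers (ctrl/alt/shift/win), navigation (enter/tab/escape/arrow keys/delete/backspace), function keys (f1-f24), editing keys (home/end/pageup/pagedown/insert), and the numeric keypad (num0-num9 and numpad operators). Single-character symbols (= + - [ ] \ ; ' , . / `) are resolved automatically via VkKeyScanW.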
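A common way to turn a combo such as ctrl+s into SendInput events is to press the modifiers first, then the main key, and release everything in reverse order. The sketch below produces that event order as (virtual-key code, is_key_down) pairs using standard Win32 VK_* values. It is an assumption-laden illustration, not PeekabooWin's implementation: the helper names are hypothetical, the arrow-key names (left/up/right/down) are guesses, only a subset of the names above is covered, the "+" key itself cannot be expressed with this simple split, and symbol keys are omitted because VkKeyScanW exists only on Windows.

```python
from typing import List, Tuple

# Standard Win32 virtual-key codes (subset). Key *names* here are assumptions.
NAMED_VK = {
    "ctrl": 0x11, "alt": 0x12, "shift": 0x10, "win": 0x5B,  # VK_LWIN for "win"
    "enter": 0x0D, "tab": 0x09, "escape": 0x1B, "backspace": 0x08,
    "delete": 0x2E, "home": 0x24, "end": 0x23, "pageup": 0x21,
    "pagedown": 0x22, "insert": 0x2D,
    "left": 0x25, "up": 0x26, "right": 0x27, "down": 0x28,
}
NAMED_VK.update({f"f{i}": 0x70 + i - 1 for i in range(1, 25)})  # VK_F1..VK_F24
NAMED_VK.update({f"num{i}": 0x60 + i for i in range(10)})       # VK_NUMPAD0..9
MODIFIERS = {"ctrl", "alt", "shift", "win"}


def key_to_vk(name: str) -> int:
    name = name.lower()
    if name in NAMED_VK:
        return NAMED_VK[name]
    if len(name) == 1 and name.isascii() and name.isalnum():
        # Letters and digits use their uppercase ASCII code as the VK code.
        return ord(name.upper())
    raise ValueError(f"unsupported key name in this sketch: {name!r}")


def combo_events(combo: str) -> List[Tuple[int, bool]]:
    """Return (vk, is_down) events: press in order, release in reverse."""
    names = [part.strip().lower() for part in combo.split("+")]
    if not all(names):
        raise ValueError(f"malformed combo: {combo!r}")
    # sorted() is stable: modifiers move to the front, keeping their order.
    ordered = sorted(names, key=lambda n: n not in MODIFIERS)
    vks = [key_to_vk(n) for n in ordered]
    return [(vk, True) for vk in vks] + [(vk, False) for vk in reversed(vks)]


print(combo_events("win+r"))  # [(91, True), (82, True), (82, False), (91, False)]
```

Releasing in reverse order matters: releasing ctrl before s could deliver a bare "s" keystroke to the target window.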

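Window management

| Tool | Description |
|---|---|
| list_windows | List all visible windows |
| get_foreground_window | Get information about the foreground window |
| activate_window | Activate the window with the given hwnd, bringing it to the foreground |
| move_window / resize_window | Move / resize a window |
| minimize_window / maximize_window / restore_window | Minimize / maximize / restore a window |
| close_window | Close a window by sending WM_CLOSE |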

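Waiting / polling

| Tool | Parameters | Description |
|---|---|---|
| wait_for_element | name?, automation_id?, class_name?, control_type?, timeout?, interval?, visible? | Wait for an element to appear or disappear |
| wait_for_window | title?, class_name?, timeout?, interval?, appear? | Wait for a window to appear or disappear |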
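Both wait tools take a timeout, a polling interval, and a flag (visible / appear) that selects between waiting for something to show up and waiting for it to go away. The underlying technique is a poll loop with a deadline. The sketch below shows it with a generic probe callable; the name wait_until and the return and exception conventions are assumptions, not PeekabooWin's code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(probe: Callable[[], Optional[T]], timeout: float = 5.0,
               interval: float = 0.25, appear: bool = True) -> Optional[T]:
    """Poll probe() until it returns a value (appear=True) or returns None
    (appear=False). Returns the final probe result; raises TimeoutError if
    the condition is not met before the deadline."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if (result is not None) == appear:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        # Sleep for the interval, but never past the deadline.
        time.sleep(max(0.0, min(interval, deadline - time.monotonic())))
```

Using time.monotonic() keeps the deadline correct even if the system clock changes while waiting.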

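Clipboard

| Tool | Parameters | Description |
|---|---|---|
| read_clipboard | | Read text from the clipboard |
| write_clipboard | text | Write text to the clipboard |
| paste_text | text | Write text to the clipboard, then press Ctrl+V |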

OCR

| Tool | Parameters | Description |
|---|---|---|
| ocr_read | source?, hwnd?, region?, lang? | OCR text recognition on the screen, a window, or a region |

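Configuration

All settings are read from environment variables prefixed with PEEKABOOWIN_:

| Variable | Default | Description |
|---|---|---|
| PEEKABOOWIN_LOG_LEVEL | info | Log level |
| PEEKABOOWIN_LOG_FORMAT | json | Log format |
| PEEKABOOWIN_SCREENSHOT_FORMAT | png | Screenshot format |
| PEEKABOOWIN_SCREENSHOT_QUALITY | 85 | JPEG quality |
| PEEKABOOWIN_SCREENSHOT_MAX_WIDTH | 1920 | Maximum screenshot width |
| PEEKABOOWIN_OCR_ENABLED | false | OCR switch (turned on automatically when paddleocr is installed) |
| PEEKABOOWIN_OCR_LANGUAGE | ch | OCR language |
| PEEKABOOWIN_UIA_TIMEOUT | 2.0 | UIA operation timeout (seconds) |
| PEEKABOOWIN_UIA_MAX_DEPTH | 10 | Maximum UIA tree traversal depth |
| PEEKABOOWIN_INPUT_CLICK_DELAY | 0.05 | Delay after a click (seconds) |
| PEEKABOOWIN_INPUT_TYPE_DELAY | 0.02 | Delay between keystrokes (seconds) |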

Example:

{
  "mcpServers": {
    "peekaboowin": {
      "command": "peekaboowin",
      "env": {
        "PEEKABOOWIN_LOG_LEVEL": "debug",
        "PEEKABOOWIN_SCREENSHOT_MAX_WIDTH": "3840"
      }
    }
  }
}
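Each variable falls back to the default in the table when it is unset. A minimal standard-library sketch of that prefix-plus-default pattern might look like the following. The Settings class, its field names, and the boolean parsing rule are assumptions; the real project may use a settings library instead, and this sketch does not model the automatic OCR switch-on when paddleocr is installed.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional

PREFIX = "PEEKABOOWIN_"


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the configuration table; field names are hypothetical.
    log_level: str = "info"
    log_format: str = "json"
    screenshot_format: str = "png"
    screenshot_quality: int = 85
    screenshot_max_width: int = 1920
    ocr_enabled: bool = False
    ocr_language: str = "ch"
    uia_timeout: float = 2.0
    uia_max_depth: int = 10
    input_click_delay: float = 0.05
    input_type_delay: float = 0.02


def _coerce(raw: str, default: Any) -> Any:
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(default, bool):
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return type(default)(raw)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read PEEKABOOWIN_<FIELD> variables, keeping defaults for unset ones."""
    env = os.environ if env is None else env
    values = {}
    for f in fields(Settings):
        raw = env.get(PREFIX + f.name.upper())
        if raw is not None:
            values[f.name] = _coerce(raw, f.default)
    return Settings(**values)
```

With the env block from the example above, load_settings() would yield log_level "debug" and screenshot_max_width 3840, while every other field keeps its default. MCP client configs pass env values as strings, which is why each value is coerced to the type of its default.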

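AI agent prompting guide

Core rules

  1. Wait for confirmation after every step: after press_keys("win+r") you must call wait_for_window(title="运行") (the Run dialog's title on Chinese-language Windows; it is "Run" on English Windows). Never send actions back to back.
  2. Observe before acting: use screenshot / harvest_ui to understand the current UI state before doing anything.
  3. Prefer semantic operations: click_element / type_into_element are preferred over coordinate-based operations.
  4. Handle UWP apps specially: when the UIA tree is sparse, find_element falls back to OCR automatically, and the results carry a "source": "ocr_fallback" marker.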
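Rule 4 describes a fallback strategy: query UIA first and run OCR only when UIA finds nothing, tagging OCR-derived results. The sketch below shows that control flow with both backends passed in as callables. The function name, the OCR line shape (text plus a bbox), the "center" field, and the substring matching rule are hypothetical. Only the "source": "ocr_fallback" marker comes from this README.

```python
from typing import Any, Callable, Dict, List

Element = Dict[str, Any]


def find_with_fallback(name: str,
                       uia_find: Callable[[str], List[Element]],
                       ocr_read: Callable[[], List[Element]]) -> List[Element]:
    """Try UIA first; if it returns nothing, match OCR text lines instead."""
    hits = uia_find(name)
    if hits:
        return hits
    # OCR lines only carry text and a bounding box, so fallback results have
    # no automation_id / hwnd and can only be acted on by coordinates.
    matches = []
    for line in ocr_read():
        text = str(line.get("text", ""))
        if name.lower() in text.lower():
            left, top, right, bottom = line["bbox"]
            matches.append({
                "name": text,
                "center": ((left + right) // 2, (top + bottom) // 2),
                "source": "ocr_fallback",
            })
    return matches
```

Because fallback results cannot be passed to click_element, an agent would click them with click(x, y) using a coordinate such as the center point computed here.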

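Example: open Notepad

press_keys("win+r")
wait_for_window(title="运行", timeout=3)      # Run dialog ("Run" on English Windows)
type_text("notepad")                          # blind send: the Run dialog has focus
press_keys("enter")
wait_for_window(title="记事本", timeout=5)    # Notepad window (localized title)
type_text("Hello from AI!")
screenshot()                                  # confirm the final state visually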
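Rule 1 can be enforced mechanically by pairing each action with the call that confirms it, so a failed confirmation stops the run before the next action fires. The sketch below encodes the Notepad example that way and runs it against an injected call_tool(name, arguments) function that stands in for the agent's MCP client. The Step structure and the convention that a failed wait raises an exception are assumptions. The official MCP Python SDK's ClientSession.call_tool is async, so a real adapter would need to await it.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Tuple

Call = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


class Step(NamedTuple):
    action: Call                    # tool call that changes UI state
    confirm: Optional[Call] = None  # tool call that verifies the change


def run_steps(call_tool: Callable[[str, Dict[str, Any]], Any],
              steps: List[Step]) -> None:
    for step in steps:
        call_tool(*step.action)
        if step.confirm is not None:
            # Assumed convention: call_tool raises when a wait times out,
            # which aborts the run instead of firing the next action blind.
            call_tool(*step.confirm)


NOTEPAD = [
    Step(("press_keys", {"keys": "win+r"}),
         ("wait_for_window", {"title": "运行", "timeout": 3})),
    Step(("type_text", {"text": "notepad"})),
    Step(("press_keys", {"keys": "enter"}),
         ("wait_for_window", {"title": "记事本", "timeout": 5})),
    Step(("type_text", {"text": "Hello from AI!"}),
         ("screenshot", {})),
]
```

Keeping actions and confirmations in one data structure makes it easy to spot an action that has no check attached.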

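Auto-focus behavior

  • click / click_element / type_into_element / drag / scroll / move_mouse automatically activate the target window via AttachThreadInput + SetForegroundWindow
  • type_text / press_keys are blind sends: they do not focus anything, so call activate_window before using them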
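AttachThreadInput + SetForegroundWindow is a well-known workaround for Windows' foreground-lock rules. Temporarily attaching to the input queue of the thread that owns the current foreground window makes SetForegroundWindow more likely to succeed, though it is still not guaranteed. The sketch below is a generic ctypes version of that technique, not PeekabooWin's code. The function names and the choice to restore minimized windows first are assumptions. The DLL objects are injectable so the control flow can be exercised off Windows.

```python
import ctypes
from typing import Any, Tuple

SW_RESTORE = 9


def _load_user32_kernel32() -> Tuple[Any, Any]:
    # Windows only: ctypes.WinDLL does not exist on other platforms.
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    HWND, DWORD, BOOL = ctypes.c_void_p, ctypes.c_ulong, ctypes.c_int
    user32.GetForegroundWindow.restype = HWND
    user32.GetWindowThreadProcessId.argtypes = [HWND, ctypes.POINTER(DWORD)]
    user32.GetWindowThreadProcessId.restype = DWORD
    user32.AttachThreadInput.argtypes = [DWORD, DWORD, BOOL]
    user32.AttachThreadInput.restype = BOOL
    user32.IsIconic.argtypes = [HWND]
    user32.ShowWindow.argtypes = [HWND, ctypes.c_int]
    user32.SetForegroundWindow.argtypes = [HWND]
    user32.SetForegroundWindow.restype = BOOL
    kernel32.GetCurrentThreadId.restype = DWORD
    return user32, kernel32


def force_foreground(hwnd: int, user32: Any = None, kernel32: Any = None) -> bool:
    """Bring hwnd to the foreground; returns SetForegroundWindow's result."""
    if user32 is None or kernel32 is None:
        user32, kernel32 = _load_user32_kernel32()
    if user32.IsIconic(hwnd):
        user32.ShowWindow(hwnd, SW_RESTORE)  # un-minimize first
    this_thread = kernel32.GetCurrentThreadId()
    fg_thread = user32.GetWindowThreadProcessId(user32.GetForegroundWindow(), None)
    attached = False
    if fg_thread and fg_thread != this_thread:
        # Share input state with the foreground thread for the duration.
        attached = bool(user32.AttachThreadInput(this_thread, fg_thread, True))
    try:
        return bool(user32.SetForegroundWindow(hwnd))
    finally:
        if attached:
            user32.AttachThreadInput(this_thread, fg_thread, False)
```

Detaching in the finally block matters: leaving the two threads attached would keep their keyboard and focus state merged after the call returns.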

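Architecture

Tool Layer      - 32 @mcp.tool() functions sharing a single @tool_error_handler decorator
Service Layer   - singleton services that orchestrate business logic
Platform Layer  - comtypes (UIA) / ctypes (SendInput) / mss (screenshots) / PaddleOCR

Each layer has a clear responsibility: the Tool layer validates parameters, the Service layer orchestrates logic, and the Platform layer wraps the Win32 APIs.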
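The README does not document what @tool_error_handler returns, so the sketch below only illustrates the general pattern: catch exceptions at the tool boundary and turn them into a structured result the agent can read, rather than letting one failed call crash the server. The result shape ({"ok": ..., "error": ...}), the treatment of ValueError as a validation failure, and example_tool are all hypothetical. Tools registered with @mcp.tool() may also be async, which this synchronous sketch does not handle.

```python
import functools
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("peekaboowin.sketch")


def tool_error_handler(func: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    # functools.wraps keeps the name/signature metadata that tool registries read.
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"ok": True, "result": func(*args, **kwargs)}
        except ValueError as exc:
            # Parameter validation failure: report it back to the agent.
            return {"ok": False, "error": "invalid_argument", "message": str(exc)}
        except Exception as exc:  # unexpected failure: log it, keep serving
            log.exception("tool %s failed", func.__name__)
            return {"ok": False, "error": type(exc).__name__, "message": str(exc)}
    return wrapper


@tool_error_handler
def example_tool(hwnd: int, x: int, y: int) -> str:
    # Hypothetical Tool-layer body: validate, then delegate to a service.
    if hwnd <= 0:
        raise ValueError("hwnd must be a positive window handle")
    return f"moved {hwnd} to ({x}, {y})"
```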


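Limitations

  • Windows 10/11 only: depends on the Windows UI Automation API
  • Sparse UIA trees in UWP apps: find_element falls back to OCR automatically, but OCR elements lack automation_id / hwnd and cannot be used with click_element
  • Window screenshots are not true window contents: capture_window crops the screen via BitBlt, so UWP/DirectX windows come out black
  • Limited background clicking: UIA InvokePattern could enable operating on background windows, but this is not implemented yet
  • UAC isolation: non-administrator processes cannot send input to elevated windows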

License

MIT



Download files

Source distribution: peekaboowin-0.1.0.tar.gz (40.7 kB)

  • SHA256: 23316f42161d59fe1342283071fb04897a11ed712d94fe67f20ff451fd92ec11
  • MD5: 2c0a839141430c855b6ffc27428899d4
  • BLAKE2b-256: c255ae72a0def416c907c0641cad0b2602409c98dd17c83459715a96503301fc

Built distribution: peekaboowin-0.1.0-py3-none-any.whl (49.5 kB, tag: Python 3)

  • SHA256: a6a36a43ab3f515f83efcd1432e69dc61a092edf16b844e9c4795662f993183d
  • MD5: 7ebac9f8a851d819c29db0aad3e1cf3a
  • BLAKE2b-256: a1aa219e54a95d2b7bd84821923f8aa3102e2b37c1e465fdd39f5645b03eb5d1

Both files were uploaded with uv 0.11.2, not via Trusted Publishing.
