PeekabooWin MCP

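A desktop automation MCP server for Windows that lets AI agents see and operate Windows desktop applications.

Built on Windows UI Automation (UIA), SendInput, and PaddleOCR, it provides a full set of desktop interaction capabilities: element discovery, input simulation, screenshots, window management, clipboard access, and OCR.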


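System requirements

  • Operating system: Windows 10 or Windows 11
  • Tooling: uv (Python package manager) is recommended
  • Privileges: running as administrator unlocks all features (sending input to elevated windows)
  • OCR: optional; requires PaddleOCR (about 1 GB)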

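Installation

Option 1: one-line install (recommended)

irm https://raw.githubusercontent.com/wangneal/PeekabooWin/main/install.ps1 | iex

The script installs uv and PeekabooWin automatically, then prints the MCP configuration.

Option 2: run directly with uvx (no installation required)

uvx peekaboowin

The first run downloads the package from PyPI and caches it; later launches start within seconds.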

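Option 3: install locally

# Requires uv: https://docs.astral.sh/uv/#installation

# Install from PyPI (recommended)
uv tool install peekaboowin

# Optional: OCR extra (about 1 GB); quoted so the shell does not interpret the brackets
uv tool install "peekaboowin[ocr]"

# Or install from source
git clone https://github.com/wangneal/PeekabooWin.git
cd PeekabooWin
uv pip install -e ".[ocr]"

After installation, the peekaboowin command starts the server directly.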


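MCP client configuration

After installation, add the matching configuration below to your client.

OpenCode

Configuration file: C:\Users\<username>\.config\opencode\opencode.json

With uvx (recommended, no installation required):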

{
  "mcp": {
    "peekaboowin": {
      "type": "local",
      "command": ["uvx", "peekaboowin", "-y"],
      "enabled": true
    }
  }
}

With a local installation:

{
  "mcp": {
    "peekaboowin": {
      "type": "local",
      "command": ["peekaboowin"],
      "enabled": true
    }
  }
}

Either option works. With uvx, the package is downloaded from PyPI and cached automatically, so no manual installation is needed.

Claude Desktop

Configuration file: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "peekaboowin": {
      "command": "uvx",
      "args": ["peekaboowin", "-y"]
    }
  }
}

Cursor

Settings path: Settings → Features → MCP Servers → Add custom MCP

{
  "mcpServers": {
    "peekaboowin": {
      "command": "uvx",
      "args": ["peekaboowin", "-y"]
    }
  }
}

Windsurf

Configuration file: ~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "peekaboowin": {
      "command": "uvx",
      "args": ["peekaboowin", "-y"]
    }
  }
}

GitHub Copilot (VS Code)

Configuration file: .vscode/mcp.json

{
  "servers": {
    "peekaboowin": {
      "command": "uvx",
      "args": ["peekaboowin", "-y"]
    }
  }
}

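The client configurations above all follow the uvx approach, which pulls the package from PyPI automatically, so no manual pip install is needed.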
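The per-client snippets differ mainly in the top-level key that holds the server map. As a rough sketch (not part of PeekabooWin), the helper below generates the uvx entry for the clients that share the command/args shape. The function name `build_config` and the client labels are hypothetical, chosen only for illustration.

```python
import json

# Top-level key each client uses for its MCP server map, as shown in the
# configuration snippets. The client labels on the left are illustrative names.
CLIENT_ROOT_KEYS = {
    "claude_desktop": "mcpServers",
    "cursor": "mcpServers",
    "windsurf": "mcpServers",
    "vscode_copilot": "servers",
}


def build_config(client: str) -> dict:
    """Return the uvx-based server entry wrapped in the client's root key."""
    root = CLIENT_ROOT_KEYS[client]  # raises KeyError for an unknown client
    entry = {"command": "uvx", "args": ["peekaboowin", "-y"]}
    return {root: {"peekaboowin": entry}}


def render_config(client: str) -> str:
    """Serialize the configuration as indented JSON, ready to paste into a file."""
    return json.dumps(build_config(client), indent=2)
```

Calling `render_config("vscode_copilot")` yields the same structure as the .vscode/mcp.json snippet. OpenCode is left out because its entry uses a different shape (a "type" field and a list-valued "command").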


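Tool reference

32 tools in total, grouped by function:

Screenshots and screen

Tool | Parameters | Description
screenshot | monitor?, region?, image_format?, quality? | Capture the screen or a region
capture_window | hwnd, image_format?, quality? | Capture a specific window
list_monitors | (none) | List all monitors
health_check | (none) | System diagnostics: DPI, privileges, OCR status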

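UI element discovery

Tool | Parameters | Description
find_element | name?, class_name?, automation_id?, scope? | Find elements by name, class name, or ID; falls back to OCR automatically when UIA returns nothing
get_element_info | element_ref | Get detailed element properties
get_children | element_ref, depth? | Get the child element tree
get_desktop | depth? | Get the desktop element tree
harvest_ui | target?, depth? | One-stop UI discovery; supplements with OCR automatically when the UIA tree is sparse

Element reference formats: hwnd:12345 / point:100,200 / desktop

harvest_ui target values: desktop (all windows) / foreground (foreground window) / 12345 (a specific HWND)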
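To make the three element reference formats concrete, here is a minimal parsing sketch. It is an illustration of the string formats only, not PeekabooWin's own parser; the `ElementRef` class and `parse_element_ref` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ElementRef:
    """Parsed form of an element reference (hypothetical helper type)."""
    kind: str                                # "hwnd", "point", or "desktop"
    hwnd: Optional[int] = None               # set when kind == "hwnd"
    point: Optional[Tuple[int, int]] = None  # (x, y), set when kind == "point"


def parse_element_ref(ref: str) -> ElementRef:
    """Parse 'hwnd:12345', 'point:100,200', or 'desktop'."""
    ref = ref.strip()
    if ref == "desktop":
        return ElementRef(kind="desktop")
    prefix, sep, value = ref.partition(":")
    if sep and prefix == "hwnd":
        return ElementRef(kind="hwnd", hwnd=int(value))
    if sep and prefix == "point":
        x_text, comma, y_text = value.partition(",")
        if not comma:
            raise ValueError(f"point reference needs 'x,y': {ref!r}")
        return ElementRef(kind="point", point=(int(x_text), int(y_text)))
    raise ValueError(f"unrecognized element reference: {ref!r}")
```

For example, `parse_element_ref("point:100,200").point` is the tuple `(100, 200)`, and any other prefix raises `ValueError`.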

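Input simulation

Tool | Parameters | Description
click | x, y, button?, double? | Mouse click (focuses the target window automatically)
move_mouse | x, y | Move the mouse (focuses the target window automatically)
drag | x1, y1, x2, y2, duration? | Drag (focuses the starting window automatically)
scroll | x, y, delta? | Scroll (focuses the target window automatically)
type_text | text | Type Unicode text (sent blind; activate the window first)
press_keys | keys | Key combination: ctrl+s, alt+tab, win+r
click_element | element_ref | Semantic click (focuses the target window automatically)
type_into_element | element_ref, text | Semantic text input (focuses the target window automatically)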

press_keys key names (128+ supported):

Format: press_keys("combo"), joining multiple keys with +, e.g. ctrl+s, alt+tab, win+r, ctrl+shift+escape. Key names are case-insensitive.

Category | Key names | Aliases
Modifiers | ctrl, alt, shift, win | control, lctrl/rctrl, lalt/ralt, lshift/rshift, lwin/rwin
Navigation | enter, tab, escape, space, backspace, delete | return, esc, back, del, forward_delete
Arrows | up, down, left, right | (none)
Function keys | f1 ~ f24 | (none)
Editing | home, end, pageup, pagedown, insert | pgup, pgdn, ins
Locks | capslock, numlock, scrolllock | (none)
Numpad | num0 ~ num9, numpad0 ~ numpad9 | numseparator
Numpad operators | numpad_add, numpad_subtract, numpad_multiply, numpad_divide, numpad_decimal | num+, num-, num*, num/, num.
Media | volumeup, volumedown, volumemute, nexttrack, prevtrack, playpause, stop | (none)
Browser | browserback, browserforward, browserrefresh, browserstop, browsersearch, browserfavorites, browserhome | (none)
Launch | launchmail, launchcalculator, launchmedia, launchapp1, launchapp2 | (none)
Other | printscreen, pause, break, apps, sleep, clear, help, select, execute, print | prtsc
Symbols | single characters: = + - [ ] \ ; ' , . / ` | Resolved automatically via VkKeyScanW; + presses Shift automatically

Examples:

  • press_keys("win"): open the Start menu
  • press_keys("win+r"): open the Run dialog
  • press_keys("ctrl+shift+escape"): open Task Manager
  • press_keys("alt+f4"): close the current window
  • press_keys("win+d"): show the desktop
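The combo format (keys joined with +, case-insensitive, with aliases) can be sketched as a small normalizer. This is an assumption-laden illustration rather than PeekabooWin's parser: the alias-to-key pairing is my reading of the key table, `parse_combo` is a hypothetical name, and the sketch deliberately rejects the literal + key instead of guessing how the real server handles it.

```python
from typing import List, Tuple

# Alias -> canonical key name, inferred from the key table (assumed pairing).
ALIASES = {
    "control": "ctrl", "return": "enter", "esc": "escape", "back": "backspace",
    "del": "delete", "pgup": "pageup", "pgdn": "pagedown", "ins": "insert",
    "prtsc": "printscreen",
}
MODIFIERS = ("ctrl", "alt", "shift", "win")


def parse_combo(combo: str) -> Tuple[List[str], List[str]]:
    """Split a combo such as 'Ctrl+Shift+Esc' into (modifiers, other keys)."""
    parts = [part.strip().lower() for part in combo.split("+")]
    if any(not part for part in parts):
        # Covers empty input and the literal '+' key, which this sketch skips.
        raise ValueError(f"empty key name in combo: {combo!r}")
    keys = [ALIASES.get(part, part) for part in parts]
    modifiers = [key for key in keys if key in MODIFIERS]
    others = [key for key in keys if key not in MODIFIERS]
    return modifiers, others
```

Separating modifiers from the remaining keys mirrors how a key chord is usually sent: modifiers are pressed first, the other keys are tapped, and the modifiers are released in reverse order.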

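Window management

Tool | Description
list_windows | List all visible windows
get_foreground_window | Get information about the foreground window
activate_window hwnd | Bring a window to the foreground
move_window / resize_window | Move / resize a window
minimize_window / maximize_window / restore_window | Minimize / maximize / restore a window
close_window | Close a window by sending WM_CLOSE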

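Waiting and polling

Tool | Parameters | Description
wait_for_element | name?, automation_id?, class_name?, control_type?, timeout?, interval?, visible? | Wait for an element to appear or disappear
wait_for_window | title?, class_name?, timeout?, interval?, appear? | Wait for a window to appear or disappear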
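The timeout / interval parameters suggest a poll-until-deadline loop. The sketch below shows that general pattern under the assumption that the wait tools behave this way; `wait_until` and its `expect` flag (standing in for visible? / appear?) are hypothetical, and the real tools may differ in detail.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.2, expect: bool = True) -> bool:
    """Poll `condition` until it equals `expect` or `timeout` seconds elapse.

    expect=True waits for something to appear; expect=False waits for it to
    disappear. Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if bool(condition()) is expect:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The condition is always checked at least once, so a state that already holds returns immediately without sleeping. `time.monotonic()` is used so that system clock adjustments do not shorten or extend the wait.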

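Clipboard

Tool | Parameters | Description
read_clipboard | (none) | Read clipboard text
write_clipboard | text | Write text to the clipboard
paste_text | text | Write text to the clipboard, then send Ctrl+V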

OCR

Tool | Parameters | Description
ocr_read | source?, hwnd?, region?, lang? | OCR text recognition on the screen, a window, or a region

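Configuration

Configure the server through environment variables with the PEEKABOOWIN_ prefix.

Variable | Default | Description
PEEKABOOWIN_LOG_LEVEL | info | Log level
PEEKABOOWIN_LOG_FORMAT | json | Log format
PEEKABOOWIN_SCREENSHOT_FORMAT | png | Screenshot format
PEEKABOOWIN_SCREENSHOT_QUALITY | 85 | JPEG quality
PEEKABOOWIN_SCREENSHOT_MAX_WIDTH | 1920 | Maximum screenshot width
PEEKABOOWIN_OCR_ENABLED | false | OCR switch (turned on automatically when paddleocr is available)
PEEKABOOWIN_OCR_LANGUAGE | ch | OCR language
PEEKABOOWIN_UIA_TIMEOUT | 2.0 | UIA operation timeout (seconds)
PEEKABOOWIN_UIA_MAX_DEPTH | 10 | Maximum UIA tree traversal depth
PEEKABOOWIN_INPUT_CLICK_DELAY | 0.05 | Delay after a click (seconds)
PEEKABOOWIN_INPUT_TYPE_DELAY | 0.02 | Delay between keystrokes (seconds)

Example: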

{
  "mcpServers": {
    "peekaboowin": {
      "command": "peekaboowin",
      "env": {
        "PEEKABOOWIN_LOG_LEVEL": "debug",
        "PEEKABOOWIN_SCREENSHOT_MAX_WIDTH": "3840"
      }
    }
  }
}
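For readers curious how prefixed environment variables with typed defaults are commonly loaded, here is a minimal sketch. The defaults come from the variable table; everything else is an assumption, including the `Settings` and `load_settings` names and the set of strings accepted as true. PeekabooWin's actual settings code may parse values differently.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

PREFIX = "PEEKABOOWIN_"


@dataclass
class Settings:
    # Field names map to PEEKABOOWIN_<NAME>; defaults mirror the variable table.
    log_level: str = "info"
    log_format: str = "json"
    screenshot_format: str = "png"
    screenshot_quality: int = 85
    screenshot_max_width: int = 1920
    ocr_enabled: bool = False
    ocr_language: str = "ch"
    uia_timeout: float = 2.0
    uia_max_depth: int = 10
    input_click_delay: float = 0.05
    input_type_delay: float = 0.02


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment, coercing each value to its default's type."""
    environ = os.environ if environ is None else environ
    settings = Settings()
    for field in fields(Settings):
        raw = environ.get(PREFIX + field.name.upper())
        if raw is None:
            continue  # keep the default
        if isinstance(field.default, bool):
            # Accepted truthy spellings are an assumption of this sketch.
            value = raw.strip().lower() in ("1", "true", "yes", "on")
        else:
            value = type(field.default)(raw)  # str, int, or float
        setattr(settings, field.name, value)
    return settings
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"PEEKABOOWIN_SCREENSHOT_MAX_WIDTH": "3840"})`.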

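AI agent prompting guide

Core rules

  1. Wait for confirmation after every step: after press_keys("win+r"), call wait_for_window(title="运行") (the Run dialog on Chinese-language Windows) before doing anything else. Do not issue operations back to back.
  2. Observe before acting: use screenshot / harvest_ui to learn the current UI state before executing.
  3. Prefer semantic operations: click_element / type_into_element are preferable to coordinate-based operations.
  4. Handle UWP apps specially: when the UIA tree is sparse, find_element falls back to OCR automatically and marks the results with "source": "ocr_fallback".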
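Rule 4 can be pictured as a two-stage lookup. The sketch below is a simplified model, not the server's implementation: the search callables are stand-ins supplied by the caller, and the helper names are hypothetical. Only the "source": "ocr_fallback" marker is taken from this document.

```python
from typing import Callable, Dict, List

Element = Dict[str, object]
Search = Callable[[str], List[Element]]


def find_with_fallback(name: str, uia_search: Search, ocr_search: Search) -> List[Element]:
    """Try the UIA search first; if it finds nothing, fall back to OCR results."""
    results = uia_search(name)
    if results:
        return results
    # Tag OCR hits so the caller can tell them apart from UIA elements.
    return [{**element, "source": "ocr_fallback"} for element in ocr_search(name)]


def needs_coordinate_click(element: Element) -> bool:
    """OCR hits carry no automation_id / hwnd, so they need a coordinate click."""
    return element.get("source") == "ocr_fallback"
```

An agent following this pattern would call click_element for UIA results and fall back to click with coordinates for elements where `needs_coordinate_click` returns True.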

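Example: opening Notepad

press_keys("win+r")
wait_for_window(title="运行", timeout=3)  # Run dialog (its title on Chinese-language Windows)
type_text("notepad")
press_keys("enter")
wait_for_window(title="记事本", timeout=5)  # Notepad (its title on Chinese-language Windows)
type_text("Hello from AI!")
screenshot()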

Auto-focus notes

  • click / click_element / type_into_element / drag / scroll / move_mouse activate the target window automatically via AttachThreadInput + SetForegroundWindow
  • type_text / press_keys are sent blind and do not focus anything; call activate_window before using them
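The focus rule can be encoded on the client side as a small planning step. This is a hypothetical helper for an agent harness, not part of PeekabooWin: `plan_call` only decides whether an activate_window call should precede the requested tool.

```python
from typing import Dict, List, Tuple

# Tools that focus their target window on their own, per the notes above.
AUTO_FOCUS_TOOLS = frozenset(
    {"click", "click_element", "type_into_element", "drag", "scroll", "move_mouse"}
)
# Tools that are sent blind and rely on the caller to activate the window.
BLIND_TOOLS = frozenset({"type_text", "press_keys"})

Call = Tuple[str, Dict[str, object]]


def plan_call(tool: str, args: Dict[str, object], target_hwnd: int) -> List[Call]:
    """Return the calls to issue, prepending activate_window for blind tools."""
    steps: List[Call] = []
    if tool in BLIND_TOOLS:
        steps.append(("activate_window", {"hwnd": target_hwnd}))
    steps.append((tool, args))
    return steps
```

For instance, `plan_call("type_text", {"text": "hi"}, 12345)` produces an activate_window step followed by the type_text step, while a click is passed through unchanged.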

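Architecture

Tool Layer      - 32 @mcp.tool() functions sharing a common @tool_error_handler decorator
Service Layer   - singleton services that orchestrate business logic
Platform Layer  - comtypes (UIA) / ctypes (SendInput) / mss (screenshots) / PaddleOCR

The three layers have clear responsibilities: the Tool layer validates parameters, the Service layer orchestrates logic, and the Platform layer wraps the Win32 API.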
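The layering can be illustrated with a toy example. Only the decorator name `tool_error_handler` and the three-layer split come from this document; the decorator's behavior (wrapping results and errors in a dict), the `WindowService` and `FakePlatform` classes, and the `limit` parameter are all assumptions made for the sketch, and the platform layer is stubbed so the code runs anywhere.

```python
import functools


def tool_error_handler(func):
    """Tool-layer decorator: turn exceptions into an error payload (assumed behavior)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "result": func(*args, **kwargs)}
        except Exception as exc:
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return wrapper


class FakePlatform:
    """Platform layer stand-in; the real one would wrap Win32 / UIA calls."""
    def list_windows(self):
        return [
            {"hwnd": 12345, "title": "Notepad", "visible": True},
            {"hwnd": 67890, "title": "", "visible": False},
        ]


class WindowService:
    """Service layer: a lazily created singleton that orchestrates platform calls."""
    _instance = None

    def __init__(self, platform):
        self._platform = platform

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls(FakePlatform())
        return cls._instance

    def visible_windows(self):
        return [w for w in self._platform.list_windows() if w["visible"]]


@tool_error_handler
def list_windows(limit: int = 50):
    """Tool layer: validate parameters, then delegate to the service."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return WindowService.instance().visible_windows()[:limit]
```

Keeping validation in the tool function and Win32 details behind the platform object means the service can be exercised with a stub, as done here.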


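Limitations

  • Windows 10/11 only: depends on the Windows UI Automation API
  • Sparse UIA trees in UWP apps: the server falls back to OCR automatically, but OCR elements lack automation_id / hwnd and cannot be used with click_element
  • Window captures are not true window content: capture_window crops the screen using BitBlt, so UWP/DirectX windows come out black
  • Limited background clicking: UIA InvokePattern could enable background operation, but it is not implemented yet
  • UAC isolation: a non-administrator process cannot send input to elevated windows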

License

MIT
