Skill-oriented desktop GUI vision automation CLI for LLM agents with machine-readable JSON output.
migi
migi is a task-oriented desktop GUI vision automation CLI focused on skill-style integration for LLM agents.
Navigation
- What It Does
- Current Model Support
- Install
- Quick Start
- CLI Usage
- Configuration
- Advanced: Custom Action Parser
- JSON Output Contract
- Platform and Dependencies
- Troubleshooting
- FAQ
- Roadmap
What It Does
- Uses screenshots + multimodal model inference to understand the current desktop UI
- Supports instruction-driven automation via `see` (analyze only) and `act` (analyze + execute)
- Supports local image understanding via `image`/`vision`
- Returns machine-readable JSON for every command
- Includes a skill installer for agent platforms
Current Model Support
migi currently ships with a Doubao-oriented action parser by default (`doubao`), and at this stage only `doubao-seed` is officially supported.
Notes:
- You can still pass custom model/base URL values, but the built-in action-parsing logic is currently tuned for Doubao-style action outputs.
- If you need a different model, use `--action-parser custom` with your own parser callable.
Install
pip install migi-cli
or:
uv pip install migi-cli
Quick Start
- Configure credentials and model:
migi setup --api-key "YOUR_API_KEY" --model "doubao-seed" --base-url "https://ark.cn-beijing.volces.com/api/v3"
- Analyze current screen (no execution):
migi see "What apps are visible on the screen?"
- Analyze and execute:
migi act "Click the search box and type Li Bai"
If you prefer lower latency, use the faster runtime profile:
migi act --performance fast "Click the search box and type Li Bai"
- Install skill package:
migi install --target cursor
- Understand a local image file:
migi image ./example.png "Describe the key objects and visible text."
CLI Usage
migi <command> [options]
Core commands:
- `setup`/`init`: initialize or update model config
- `status`: show effective runtime config and dependency status
- `config show`: alias of `status`
- `see <instruction>`: analyze screen only
- `act <instruction>`: analyze and execute actions
- `image <image_path> [instruction]`/`vision`: analyze a local image file
- `install`: install skill package(s)
Performance profile:
- `--performance balanced` (default): balances speed and recognition stability for most GUI tasks
- `--performance fast`: smaller screenshots, tighter limits, lowest latency
- `--performance accurate`: larger screenshots and more generous outputs for tiny text / dense UIs
Multi-step execution:
- `migi act` now supports `--max-steps N` and defaults to `3`
- Use higher values for cross-screen tasks such as opening an app, searching, then sending a message
- App-targeted tasks such as "send a WeChat message" now try to bring the target app to the foreground before visual steps begin
- Recipient-targeted messaging instructions now carry the recipient hint forward so the model is less likely to send into the currently open chat by mistake
- Non-essential close/quit shortcuts such as `Cmd+W` are now blocked unless the instruction explicitly asks to close or quit something
- After the target app is brought to the foreground, `auto` capture can narrow back down to the front window for the remaining steps
- WeChat text-message instructions in the form `给 <recipient> 发送微信消息,说 <content>` now use a specialized flow: foreground WeChat, search the recipient, confirm the chat, then send
- That specialized flow now tries `Enter` on the first search result before falling back to visual contact clicking
Capture mode:
- `--capture-mode auto` (default): prefer the front window for in-app tasks, but keep full-screen capture for app-launch flows
- `--capture-mode window`: focus perception on the current front window
- `--capture-mode screen`: keep full-screen capture when you need Dock / desktop / cross-app context
Configuration
Config Sources and Priority
For runtime values, priority is:
1. CLI flags (`--api-key`, `--model`, `--base-url`, etc.)
2. Config file (`~/.migi/config.json`)
Config File Location
Default path:
~/.migi/config.json
Run `migi setup` to write the config interactively, or set fields via CLI flags:
migi setup --api-key "YOUR_API_KEY" --model "doubao-seed" --base-url "https://ark.cn-beijing.volces.com/api/v3"
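For reference, a plausible shape for the resulting `~/.migi/config.json` is shown below; the exact field names are an assumption on our part, so treat this as a sketch rather than the authoritative schema:

```json
{
  "api_key": "YOUR_API_KEY",
  "model": "doubao-seed",
  "base_url": "https://ark.cn-beijing.volces.com/api/v3"
}
```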
Advanced: Custom Action Parser
When using non-Doubao model outputs, provide your own parser:
migi act "..." \
--action-parser custom \
--action-parser-callable "your_module:your_parser"
Your parser callable should accept:
def your_parser(response: str, img_width: int, img_height: int, scale_factor: int):
...
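As an illustration, a minimal parser callable might look like the following. The return shape is not documented here, so the action-dict layout (`type`, `x`, `y`, `text`) and the `click(...)`/`type(...)` response syntax are hypothetical conventions for this sketch only; it keeps coordinates in the normalized `0..1000` space described below.

```python
import re

def your_parser(response: str, img_width: int, img_height: int, scale_factor: int):
    """Hypothetical parser: turn model text like 'click(512, 300)' into action dicts.

    Coordinates are left in the normalized 0..1000 space, which is
    independent of the actual screen resolution.
    """
    actions = []
    # Match clicks such as: click(512, 300)
    for match in re.finditer(r"click\((\d+),\s*(\d+)\)", response):
        x, y = int(match.group(1)), int(match.group(2))
        actions.append({"type": "click", "x": x, "y": y})
    # Match typing such as: type("Li Bai")
    for match in re.finditer(r'type\("([^"]*)"\)', response):
        actions.append({"type": "type", "text": match.group(1)})
    return actions
```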
Coordinate behavior in executor:
- Recommended: normalized `0..1000` coordinate space (independent of screen resolution)
- Also accepted: `0..1` ratio coordinates
- Also accepted: absolute screenshot pixel coordinates (`migi` remaps screenshot coordinates to the actual pyautogui control space for DPI/scaling differences)
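The three accepted coordinate conventions can be collapsed with a small helper. This is a sketch of the remapping idea under a simple range heuristic, not migi's actual implementation:

```python
def to_screen_pixels(x: float, y: float, img_width: int, img_height: int,
                     scale_factor: float = 1.0):
    """Map a model-emitted coordinate to pyautogui screen pixels.

    Heuristic: values in 0..1 are treated as ratios, values in 0..1000 as the
    normalized 0..1000 space, and anything larger as raw screenshot pixels.
    The final division by scale_factor accounts for DPI scaling (e.g. a
    Retina screenshot is 2x the pyautogui control space).
    """
    if 0 <= x <= 1 and 0 <= y <= 1:           # 0..1 ratio coordinates
        px, py = x * img_width, y * img_height
    elif 0 <= x <= 1000 and 0 <= y <= 1000:   # normalized 0..1000 space
        px, py = x / 1000 * img_width, y / 1000 * img_height
    else:                                     # absolute screenshot pixels
        px, py = x, y
    return px / scale_factor, py / scale_factor
```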
JSON Output Contract
All commands print exactly one JSON object to stdout.
- `compact` (default, token-efficient):
  - success: `ok`, `cmd`, `code`, `data`
  - failure: `ok`, `cmd`, `code`, `error` (and `data` when needed)
- `full` (debug-friendly): `ok`, `command`, `code`, `message`, `data`, `error`, `meta`
Switch mode:
migi status --json full
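Because every command prints exactly one JSON object to stdout, wrapping migi for programmatic use is straightforward. A sketch of a compact-mode wrapper (the helper name and error handling are ours, not part of migi):

```python
import json
import subprocess

def run_migi(args, runner=subprocess.run):
    """Invoke migi and decode its compact JSON stdout.

    `runner` is injectable so the decoding logic can be exercised
    without a real migi install.
    """
    proc = runner(["migi", *args], capture_output=True, text=True)
    payload = json.loads(proc.stdout)
    if not payload["ok"]:
        raise RuntimeError(
            f"{payload['cmd']} failed with {payload['code']}: {payload.get('error')}"
        )
    return payload["data"]
```

For example, `run_migi(["see", "What apps are visible?"])` would return the `data` field on success and raise on failure, which maps cleanly onto an agent tool call.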
Platform and Dependencies
Target runtime:
- Python: `>=3.11`
- OS: macOS / Linux / Windows (desktop environment required)
Runtime dependencies:
- Required package dependency: `httpx`
- Local image understanding (`image`/`vision`) requires: `pillow`
- Optional but practically required for GUI automation: `mss`, `pyautogui`, `pyperclip`, `pillow`
Install optional GUI dependencies:
pip install mss pyautogui pyperclip pillow
Troubleshooting
- `CONFIG_MISSING` for API key/model/base URL
  - Run `migi setup` again, or set env vars directly.
- No action executed after `act`
  - Start with `migi see "..."` to inspect the response first.
  - Ensure the model is `doubao-seed` and the parser is `doubao`.
- `act`/`image` feels slow
  - Run with `--performance fast` first. `migi` now downsizes screenshots and local images before upload; `accurate` keeps larger inputs when you need more detail.
  - Use `--json full` and inspect `timing.inference_ms` vs `timing.screenshot_ms` to see whether the slowdown is model-side or local.
- Complex tasks stop after only one visible step
  - Increase `--max-steps`, for example: `migi act --max-steps 3 "..."`
  - `migi` now carries forward action history between steps, but cross-screen flows still depend heavily on model quality and visible UI confirmation.
- The model keeps clicking the wrong small control in the active app
  - Prefer `--capture-mode window` so the model sees only the front window instead of the whole desktop.
  - Use `--capture-mode screen` only when you explicitly need desktop-wide context.
- Dependency error for GUI modules
  - Install missing packages: `mss pyautogui pyperclip pillow`.
- `which <app>`/`where <app>` returns not found (exit code 1)
  - This is expected for many GUI apps (they are not in PATH).
  - For app launch tasks, `migi` now uses a 3-stage fallback chain:
    - Direct command launch first (macOS `open`, Windows `Start-Process`)
    - Then shortcut search (macOS `Command+Space`, Windows `Win+S`)
    - Then GUI-visible search fallback if the shortcut action fails
  - macOS: `Command+Space` -> type app name -> select the app entry under Applications -> Enter
  - Windows: `Win+S` -> type app name -> Enter
- Config path permission issue
  - Use `--config-path` to specify a writable location.
- Need to use another model
  - Switch to `--action-parser custom` and implement `module:function`.
FAQ
- Is `migi` production-ready?
  - The current release is alpha and focuses on a stable CLI/JSON contract.
- Can I use OpenAI-compatible providers directly?
  - Yes, the request transport is OpenAI-compatible, but built-in parsing is currently optimized for Doubao-style outputs.
- Why is only `doubao-seed` officially supported right now?
  - The default parser backend is Doubao-oriented; parser behavior for other models is not officially guaranteed yet.
- How do I integrate with agents?
  - Use the stable compact JSON mode and install skills via `migi install`.
Roadmap
- Multi-model official parser support
- Safer and richer action execution controls
- More robust cross-platform test coverage
- Better parser debug tooling and evaluation suites