Zero-config desktop automation MCP server — give any LLM hands and eyes to control your desktop
autoMate
🤖 Desktop Automation for Apps Without APIs
Give Claude hands and eyes — automate any desktop app, even if it has no API
https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368
💡 What is autoMate?
autoMate is an MCP server that gives AI assistants (Claude, GPT, etc.) the ability to control any desktop application — even apps with no API, no plugin system, and no automation support.
What makes it different from filesystem / browser / Windows MCP:
| MCP Server | What it automates |
|---|---|
| filesystem MCP | Files and folders |
| browser MCP | Web pages |
| Windows MCP | OS settings and system calls |
| autoMate | Any desktop GUI app with no API — 剪映, Photoshop, AutoCAD, WeChat, SAP, internal tools… |
Two modes:
| Mode | How it works | Requires |
|---|---|---|
| Basic | Claude sees the screen, autoMate clicks/types | Nothing — zero config |
| Cloud Vision | autoMate parses UI itself + reasons via cloud VLM | HuggingFace token + endpoints |
✨ Features
- 🖥️ Automates apps with no API — if it has a GUI, autoMate can drive it
- 📚 Reusable script library — save workflows once, run forever; install community scripts in one command
- ☁️ Cloud Vision — screen parsing via OmniParser + action reasoning via UI-TARS, all in the cloud, zero local GPU
- 🧠 Claude knows when to use it: the server description states when to pick autoMate, so other MCPs do not bypass it
- 🤖 Zero config for basic use — no API keys, no env vars needed to get started
- 🌍 Cross-platform — Windows, macOS, Linux
🔌 Setup
Prerequisite:
pip install uv
Claude Desktop
Open Settings → Developer → Edit Config, then add:
    {
      "mcpServers": {
        "automate": {
          "command": "uvx",
          "args": ["automate-mcp@latest"]
        }
      }
    }
Restart Claude Desktop — done. @latest keeps autoMate up to date automatically.
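If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the file is plain JSON with a top-level `mcpServers` object as shown above; the helper name `add_automate_server` is hypothetical, and the path is passed in explicitly because its location differs per OS and client:

```python
import json
from pathlib import Path


def add_automate_server(config_path: Path) -> dict:
    """Merge the autoMate entry into an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)

    # Create "mcpServers" if it is missing, then add or overwrite "automate".
    servers = config.setdefault("mcpServers", {})
    servers["automate"] = {"command": "uvx", "args": ["automate-mcp@latest"]}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path that "Edit Config" opens, then restart Claude Desktop as described above.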
OpenClaw
Edit ~/.openclaw/openclaw.json:
    {
      "mcpServers": {
        "automate": {
          "command": "uvx",
          "args": ["automate-mcp@latest"]
        }
      }
    }
Then restart the gateway:
    openclaw gateway restart
Cursor / Windsurf / Cline
Settings → MCP Servers → Add:
    {
      "automate": {
        "command": "uvx",
        "args": ["automate-mcp@latest"]
      }
    }
☁️ Cloud Vision (Optional)
Cloud Vision adds autonomous screen parsing and action reasoning to autoMate — no local GPU required.
It uses two HuggingFace Inference Endpoints:
- OmniParser V2 — detects all UI elements (icons, buttons, text) from a screenshot
- UI-TARS / Qwen-VL — vision-language model that decides what action to take next
Setup
Add these env vars to your MCP config:
    {
      "mcpServers": {
        "automate": {
          "command": "uvx",
          "args": ["automate-mcp@latest"],
          "env": {
            "AUTOMATE_HF_TOKEN": "hf_...",
            "AUTOMATE_SCREEN_PARSER_URL": "https://your-omniparser-endpoint.aws.endpoints.huggingface.cloud",
            "AUTOMATE_ACTION_MODEL_URL": "https://your-uitars-endpoint.aws.endpoints.huggingface.cloud",
            "AUTOMATE_ACTION_MODEL_NAME": "ByteDance-Seed/UI-TARS-1.5-7B",
            "AUTOMATE_HF_NAMESPACE": "your-hf-username",
            "AUTOMATE_SCREEN_PARSER_ENDPOINT": "omniparser-v2",
            "AUTOMATE_ACTION_MODEL_ENDPOINT": "ui-tars-1-5-7b"
          }
        }
      }
    }
See .env.example in the repo for the full reference.
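Before starting the server it can help to check that none of these variables was left empty. The sketch below only uses the variable names from the config above. Whether all seven are strictly required is not stated here, so treating every one as expected is an assumption, and `missing_cloud_vision_vars` is a hypothetical helper, not part of autoMate:

```python
import os

# Variable names copied from the MCP config above.
CLOUD_VISION_VARS = (
    "AUTOMATE_HF_TOKEN",
    "AUTOMATE_SCREEN_PARSER_URL",
    "AUTOMATE_ACTION_MODEL_URL",
    "AUTOMATE_ACTION_MODEL_NAME",
    "AUTOMATE_HF_NAMESPACE",
    "AUTOMATE_SCREEN_PARSER_ENDPOINT",
    "AUTOMATE_ACTION_MODEL_ENDPOINT",
)


def missing_cloud_vision_vars(env=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in CLOUD_VISION_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_cloud_vision_vars()
    print("missing:", ", ".join(missing) if missing else "none")
```

Inside a running session, the `cloud_vision_config` tool reports the server's own view of the configuration.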
Cloud Vision workflow
1. warm_endpoints — wake up scaled-to-zero endpoints (1–5 min)
2. parse_screen — detect all UI elements via cloud OmniParser
3. reason_action — ask VLM what to click/type next
— or —
smart_act — full autonomous loop: parse → reason → execute → repeat
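The parse → reason → execute → repeat loop can be pictured with a small sketch. This is not autoMate's implementation: the three callables stand in for `parse_screen`, `reason_action`, and the low-level control tools, and the stop condition (the reasoner returns `None`) and the `max_steps` cap are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Element = Dict[str, object]  # one detected UI element (hypothetical shape)
Action = Dict[str, object]   # one GUI action (hypothetical shape)


def smart_loop(
    goal: str,
    parse: Callable[[], List[Element]],
    reason: Callable[[str, List[Element], List[Action]], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Run parse -> reason -> execute until the reasoner stops or the cap is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        elements = parse()                        # what is on screen right now
        action = reason(goal, elements, history)  # what to do next, or None when done
        if action is None:
            break
        execute(action)                           # click, type, scroll, ...
        history.append(action)
    return history
```

Re-parsing the screen on every iteration matters because each action can change what is visible, and the step cap keeps a confused model from looping forever.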
🛠️ MCP Tools
Script library — save once, run forever:
| Tool | Description |
|---|---|
| list_scripts | Show all saved automation scripts |
| run_script | Run a saved script by name |
| save_script | Save the current workflow as a reusable script |
| show_script | View a script's contents |
| delete_script | Delete a script |
| install_script | Install a script from a URL or the community library |
Cloud Vision — autonomous UI understanding (requires HF config):
| Tool | Description |
|---|---|
| cloud_vision_config | Show current cloud vision configuration status |
| warm_endpoints | Wake up scaled-to-zero HF endpoints before use |
| parse_screen | Detect all UI elements via cloud OmniParser |
| reason_action | Ask a VLM what GUI action to take next |
| smart_act | Full autonomous loop: parse → reason → execute → repeat |
Low-level desktop control — used when building or executing scripts:
| Tool | Description |
|---|---|
| screenshot | Capture the screen and return as base64 PNG |
| click | Click at screen coordinates |
| double_click | Double-click at screen coordinates |
| type_text | Type text (full Unicode / CJK support) |
| press_key | Press a key or combo (e.g. ctrl+c, win) |
| scroll | Scroll up or down |
| mouse_move | Move cursor without clicking |
| drag | Drag from one position to another |
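MCP clients invoke tools like these with a JSON-RPC `tools/call` request. The sketch below builds such a message for illustration. The tool names come from the tables above, but the argument names (`x`, `y`) are assumptions, since the parameter schemas are not listed here; a client can read the real schemas from the server's `tools/list` response.

```python
import json


def tool_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 message that asks an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Argument names are hypothetical; check the tool's input schema first.
    request = tool_call_request(1, "click", {"x": 320, "y": 480})
    print(json.dumps(request, indent=2))
```

In normal use the MCP client (Claude Desktop, Cursor, and so on) builds these messages itself, so this is only useful for debugging or for writing a custom client.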
📚 Script Library
Scripts are saved as .md files in ~/.automate/scripts/ — human-readable, git-friendly, shareable.
---
name: jianying_export_douyin
description: Export the current 剪映 project as a 9:16 Douyin video
created: 2025-01-01
---
## Steps
1. Open export dialog [key:ctrl+e]
2. Select resolution 1080×1920 [click:coord=320,480]
3. Set format to MP4 [click:coord=320,560]
4. Click export [click:coord=800,650]
5. Wait for export to finish [wait:5]
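Because a script is a plain text file with a `---` delimited header, it is easy to read outside autoMate. A minimal sketch, assuming the header holds simple `key: value` lines as in the example above; it is not a full YAML parser, and `split_front_matter` is a hypothetical helper:

```python
def split_front_matter(text: str):
    """Split a script into (metadata dict, body text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no header at all

    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so values such as "9:16" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()

    return {}, text  # header was never closed; treat everything as body
```

Splitting on the first colon only is what keeps a description containing "9:16" in one piece.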
Inline hint syntax:
| Hint | Action |
|---|---|
| [click:coord=320,240] | Click at absolute screen coordinates |
| [type:hello] | Type text |
| [key:ctrl+s] | Press keyboard shortcut |
| [wait:2] | Wait 2 seconds |
| [scroll_up] / [scroll_down] | Scroll the page |
Steps without hints are interpreted by the AI vision model at runtime.
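The hint syntax in the table is regular enough to parse with one pattern. The sketch below shows one possible reading of it, not autoMate's actual parser: it returns the first hint in a step as a `(kind, value)` tuple, and returns `None` for steps without a usable hint, which is the case that falls through to the vision model.

```python
import re

# One bracketed hint: a keyword, optionally followed by ":" and an argument.
HINT_RE = re.compile(r"\[(click|type|key|wait|scroll_up|scroll_down)(?::([^\]]*))?\]")
COORD_RE = re.compile(r"coord=(\d+),(\d+)")


def parse_hint(step: str):
    """Return (kind, value) for the first inline hint in a step, or None."""
    match = HINT_RE.search(step)
    if match is None:
        return None
    kind, arg = match.group(1), match.group(2)

    if kind in ("scroll_up", "scroll_down"):
        return (kind, None)
    if arg is None:
        return None  # click/type/key/wait need an argument
    if kind == "click":
        coord = COORD_RE.fullmatch(arg)
        if coord is None:
            return None
        return ("click", (int(coord.group(1)), int(coord.group(2))))
    if kind == "wait":
        try:
            return ("wait", float(arg))
        except ValueError:
            return None
    return (kind, arg)  # "type" and "key" keep their argument as text
```

For example, `parse_hint("4. Click export [click:coord=800,650]")` yields a click at (800, 650), while a step like "Click the export button" yields `None`.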
📝 FAQ
Q: How is this different from just using Claude's computer-use capability?
autoMate provides persistent, reusable scripts. Once you automate a task, it's saved and runs instantly next time. Cloud Vision mode also lets autoMate do its own screen parsing without relying on Claude's vision.
Q: Why does Claude sometimes use Windows MCP / filesystem MCP instead of autoMate?
Update to v0.4.0+ — the server description now explicitly tells Claude when to use autoMate vs other MCPs.
Q: Do I need a GPU for Cloud Vision?
No — everything runs on HuggingFace Inference Endpoints in the cloud. You only need a HF token and deployed endpoints.
Q: Does it work on macOS / Linux?
Yes — all three platforms. This is the main advantage over Quicker (Windows-only).
🤝 Contributing