Zero-config desktop automation MCP server — give any LLM hands and eyes to control your desktop
autoMate
🤖 AI-Powered Desktop Automation | Let Your Computer Work for You
"Automate the tedious, give time back to life"
https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368
Note: autoMate is in active development. Deeper design thinking, technical discussions, and AI+RPA research notes are shared in Knowledge Planet "AI Tongmu and His Noble Friends".
💡 What is autoMate?
autoMate is an AI + RPA automation tool that controls your desktop through natural language. Unlike traditional RPA, it learns from your demonstrations — when it can't find a button, just click it once and it remembers forever.
Two ways to use it:
| Mode | Best for |
|---|---|
| 🔌 MCP Server | Claude Desktop, OpenClaw, Cursor, Windsurf — plug in and go |
| 💻 CLI | Scripts, terminals, power users |
✨ Features
- 🔮 No-Code Automation — Describe tasks in natural language; AI writes and runs the script
- 🧠 Human-in-the-Loop Learning — AI can't find an element? Click it once, it remembers forever
- 📝 Markdown Scripts — Stored as readable `.md` files; edit them directly, no rigid JSON schema
- 🌐 Universal LLM Support — OpenAI, Azure, OpenRouter, Groq, Ollama, DeepSeek, any OpenAI-compatible API
- 🔌 MCP Server — Published on PyPI, one-line setup for Claude Desktop, OpenClaw, Cursor, Windsurf, Cline
- 🖥️ Cross-Platform — Windows, macOS, Linux
🔌 MCP Server Setup
Prerequisite: install `uv` once with `pip install uv`.
Zero configuration — no API keys, no environment variables. The host LLM (Claude, GPT, etc.) does the thinking; autoMate provides the hands and eyes.
Claude Desktop
Config file location:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp"]
    }
  }
}
```
Restart Claude Desktop — done!
OpenClaw
Edit ~/.openclaw/openclaw.json:
```json
{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp"]
    }
  }
}
```
Then restart the gateway:
```bash
openclaw gateway restart
```
Cursor / Windsurf / Cline
Settings → MCP Servers → Add:
```json
{
  "automate": {
    "command": "uvx",
    "args": ["automate-mcp"]
  }
}
```
After connecting
Say in any client:
"Use automate to open Chrome and search for the latest AI news"
MCP Tools
| Tool | Description |
|---|---|
| `screenshot` | Capture the screen (or a region) and return it as a base64 PNG |
| `click` | Click at screen coordinates (left / right / middle) |
| `double_click` | Double-click at screen coordinates |
| `type_text` | Type text at the current cursor position |
| `press_key` | Press a key or key combo (e.g. `ctrl+c`, `enter`) |
| `scroll` | Scroll up or down |
| `mouse_move` | Move the cursor without clicking |
| `drag` | Drag from one position to another |
| `get_screen_size` | Get the screen resolution |
| `get_cursor_position` | Get the current cursor position |
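Under the hood, an MCP host invokes these tools with JSON-RPC `tools/call` requests. A `click` invocation looks roughly like this (the argument names follow the table above; the exact schemas the server publishes may differ):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "click",
    "arguments": { "x": 320, "y": 240, "button": "left" }
  }
}
```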
🚀 CLI
Install
```bash
git clone https://github.com/yuruotong1/autoMate.git
cd autoMate
conda create -n automate python=3.12
conda activate automate
python install.py
```
Usage
```bash
export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o

# Describe a task — AI generates a Markdown script and executes it
python cli.py run "open Notepad and type Hello World"

# Re-run a saved script
python cli.py exec open_notepad

# List all saved scripts
python cli.py list

# Inspect a script
python cli.py show open_notepad
```
📝 Markdown Scripts
autoMate saves automation scripts as .md files in ~/.automate/scripts/. Human-readable, version-controllable, and AI-interpretable at runtime.
````markdown
---
name: open_notepad
description: Open Notepad and type a message
---

## Steps

1. Press the Windows key to open the Start Menu `[key:win]`
2. Type "notepad" in the search box `[type:notepad]`
3. Click the Notepad result `[click:Notepad]`
4. Type the greeting `[type:Hello, World!]`
5. Save with Ctrl+S `[key:ctrl+s]`

## Notes

Notepad usually opens within 1–2 seconds.

## Code

```python
# Optional: custom Python runs as a step
import time
time.sleep(1)
```
````
### Inline hint syntax
| Hint | Action |
|------|--------|
| `[click:OK]` | Click element whose label contains "OK" (OCR-based) |
| `[click:coord=320,240]` | Click at absolute screen coordinates |
| `[type:hello world]` | Type text |
| `[key:ctrl+s]` | Press keyboard shortcut |
| `[wait:2]` | Wait 2 seconds |
| `[scroll_up]` / `[scroll_down]` | Scroll the page |
Steps **without** hints are interpreted by the AI vision model at runtime.
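Extracting a hint from a step line is a small pattern match. A minimal sketch (the tag names come from the table above; the function and regex are illustrative, not autoMate's actual parser):

```python
import re

# Matches inline hints such as [click:OK], [key:ctrl+s], [wait:2],
# plus the argument-less [scroll_up] / [scroll_down].
HINT_RE = re.compile(r"\[(click|type|key|wait|scroll_up|scroll_down)(?::([^\]]+))?\]")

def parse_hint(step: str):
    """Return (action, argument) for a step line, or None if the step
    has no hint and must be interpreted by the vision model."""
    m = HINT_RE.search(step)
    if not m:
        return None
    return (m.group(1), m.group(2))

print(parse_hint("3. Click the Notepad result [click:Notepad]"))  # ('click', 'Notepad')
print(parse_hint("2. Wait for the window to open"))               # None
```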
### Human-in-the-loop learning
When the AI can't locate an element, it pauses and asks:
```
[autoMate] Step 3: 'Click the Submit button'
[autoMate] Please click the target element now…
[autoMate] Got click at (842, 631) — learning…
[autoMate] Learned hint: [click:Submit]
[autoMate] Resuming.
```
The learned hint is automatically written back into the Markdown file — next run needs no human help.
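The write-back amounts to a small text edit on the script file. A sketch, under the assumption that the learned hint is simply appended to the matching step line (the function name and matching strategy are illustrative, not autoMate's internals):

```python
def learn_hint(script_text: str, step_prefix: str, hint: str) -> str:
    """Append a learned hint (e.g. '[click:Submit]') to the first step
    line starting with step_prefix, if that line has no hint yet."""
    lines = script_text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith(step_prefix) and "[" not in line:
            lines[i] = f"{line} `{hint}`"
            break
    return "\n".join(lines)

script = "## Steps\n1. Open the form\n2. Click the Submit button"
print(learn_hint(script, "2.", "[click:Submit]"))
```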
---
## 🌐 Supported LLM Providers
| Provider | Base URL | Example Models |
|----------|----------|----------------|
| [OpenAI](https://platform.openai.com) | `https://api.openai.com/v1` | gpt-4o, gpt-4.1, o3 |
| [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service) | your Azure endpoint | gpt-4o |
| [OpenRouter](https://openrouter.ai) | `https://openrouter.ai/api/v1` | claude-3.7-sonnet, gemini-2.5-pro |
| [DeepSeek](https://platform.deepseek.com) | `https://api.deepseek.com/v1` | deepseek-chat, deepseek-reasoner |
| [Groq](https://console.groq.com) | `https://api.groq.com/openai/v1` | llama-3.3-70b-versatile |
| [Ollama](https://ollama.com) (local) | `http://localhost:11434/v1` | qwen2.5-vl, gemma3-tools:27b |
| [yeka](https://2233.ai/api) (CN proxy) | `https://api.2233.ai/v1` | gpt-4o, o1 |
> **Recommended:** Use a multimodal model with vision — `gpt-4o`, `claude-3.7-sonnet` via OpenRouter, or `qwen2.5-vl` via Ollama.
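All of these providers speak the same chat-completions wire format, which is why a single base URL / API key / model triple is enough to switch between them. A sketch of the request body (illustrative, not autoMate's internal code):

```python
import json
import os

def build_chat_request(task: str) -> dict:
    """Build an OpenAI-compatible /chat/completions payload from the
    OPENAI_MODEL environment variable and a task description."""
    return {
        "model": os.environ.get("OPENAI_MODEL", "gpt-4o"),
        "messages": [
            {"role": "system", "content": "You are a desktop automation planner."},
            {"role": "user", "content": task},
        ],
    }

payload = build_chat_request("open Notepad and type Hello World")
print(json.dumps(payload, indent=2))
```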
```bash
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
export OPENAI_MODEL=anthropic/claude-3.7-sonnet
```
📝 FAQ
Q: Why is execution slow without a GPU?
OmniParser (YOLO-based UI detection) is GPU-intensive. With an NVIDIA GPU (4 GB+ VRAM):
```bash
pip3 uninstall -y torch torchvision
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu124
```
Q: Can I edit the Markdown scripts manually?
Yes — they live in `~/.automate/scripts/*.md`. The AI reads natural-language descriptions at runtime; hints just make execution faster and more reliable.
Q: Does it work on macOS / Linux?
Yes. MCP server and CLI work on all three platforms. The YOLO model requires Python 3.10–3.12.
🤝 Contributing
Every excellent open-source project embodies collective wisdom. Whether it's fixing bugs, adding features, or improving documentation — your contribution helps thousands of people escape repetitive work.
File details: `automate_mcp-0.3.0.tar.gz`

- Size: 42.0 kB
- Tags: Source
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ff6953057bc0afd2f922050741eb344823c70bea8b9b686865eae65634339891` |
| MD5 | `198c7bad1acf78ad3e81982525c936c7` |
| BLAKE2b-256 | `987b06f2ca49493251b99266d43785eb6a748e0e0a098fec95946b3cf6a71612` |
Provenance: the following attestation bundle was made for `automate_mcp-0.3.0.tar.gz`:

- Publisher: `publish.yml` on `yuruotong1/autoMate`
- Statement type: `https://in-toto.io/Statement/v1`
- Predicate type: `https://docs.pypi.org/attestations/publish/v1`
- Subject name: `automate_mcp-0.3.0.tar.gz`
- Subject digest: `ff6953057bc0afd2f922050741eb344823c70bea8b9b686865eae65634339891`
- Sigstore transparency entry: 1289872048
- Permalink: `yuruotong1/autoMate@734557263e5275c7079dd9fe04305d7f3e007932`
- Branch / Tag: `refs/tags/v0.3.0`
- Owner: https://github.com/yuruotong1
- Access: public
- Token Issuer: `https://token.actions.githubusercontent.com`
- Runner Environment: github-hosted
- Publication workflow: `publish.yml@734557263e5275c7079dd9fe04305d7f3e007932`
- Trigger Event: push
File details: `automate_mcp-0.3.0-py3-none-any.whl`

- Size: 45.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

| Algorithm | Hash digest |
|---|---|
| SHA256 | `dc825ae5387f1215ac66435ac7b991e969452a89687fa290d8946b970801b098` |
| MD5 | `e91ae8cd5b7d70658004841ae4ec5864` |
| BLAKE2b-256 | `c19f586ecbb90c2c78a547f4eae438279727349c5b95ef1591f27ca6436a94c3` |
Provenance: the following attestation bundle was made for `automate_mcp-0.3.0-py3-none-any.whl`:

- Publisher: `publish.yml` on `yuruotong1/autoMate`
- Statement type: `https://in-toto.io/Statement/v1`
- Predicate type: `https://docs.pypi.org/attestations/publish/v1`
- Subject name: `automate_mcp-0.3.0-py3-none-any.whl`
- Subject digest: `dc825ae5387f1215ac66435ac7b991e969452a89687fa290d8946b970801b098`
- Sigstore transparency entry: 1289872132
- Permalink: `yuruotong1/autoMate@734557263e5275c7079dd9fe04305d7f3e007932`
- Branch / Tag: `refs/tags/v0.3.0`
- Owner: https://github.com/yuruotong1
- Access: public
- Token Issuer: `https://token.actions.githubusercontent.com`
- Runner Environment: github-hosted
- Publication workflow: `publish.yml@734557263e5275c7079dd9fe04305d7f3e007932`
- Trigger Event: push