
Zero-config desktop automation MCP server — give any LLM hands and eyes to control your desktop

Project description

autoMate logo

autoMate

🤖 AI-Powered Desktop Automation | Let Your Computer Work for You

Chinese | Japanese


"Automate the tedious, give time back to life"

https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368

Note: autoMate is in active development. Deeper design thinking, technical discussions, and AI+RPA research notes are shared in Knowledge Planet "AI Tongmu and His Noble Friends".


## 💡 What is autoMate?

autoMate is an AI + RPA automation tool that controls your desktop through natural language. Unlike traditional RPA, it learns from your demonstrations — when it can't find a button, just click it once and it remembers forever.

Two ways to use it:

| Mode | Best for |
|------|----------|
| 🔌 MCP Server | Claude Desktop, OpenClaw, Cursor, Windsurf — plug in and go |
| 💻 CLI | Scripts, terminals, power users |

## ✨ Features

  • 🔮 No-Code Automation — Describe tasks in natural language; AI writes and runs the script
  • 🧠 Human-in-the-Loop Learning — AI can't find an element? Click it once, it remembers forever
  • 📝 Markdown Scripts — Stored as readable .md files; edit them directly, no rigid JSON schema
  • 🌐 Universal LLM Support — OpenAI, Azure, OpenRouter, Groq, Ollama, DeepSeek, any OpenAI-compatible API
  • 🔌 MCP Server — Published on PyPI, one-line setup for Claude Desktop, OpenClaw, Cursor, Windsurf, Cline
  • 🖥️ Cross-Platform — Windows, macOS, Linux

## 🔌 MCP Server Setup

Prerequisite: install uv once with `pip install uv`.

Zero configuration — no API keys, no environment variables. The host LLM (Claude, GPT, etc.) does the thinking; autoMate provides the hands and eyes.

### Claude Desktop

Config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

```json
{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp"]
    }
  }
}
```

Restart Claude Desktop — done!
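If you prefer to script the setup, the entry can be merged into an existing config without clobbering other servers. A minimal sketch (the helper name is ours; only the `mcpServers` structure comes from the documented config format):

```python
import json
from pathlib import Path

def add_automate_server(config_path):
    """Merge the autoMate MCP entry into claude_desktop_config.json,
    preserving any other configured servers."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["automate"] = {
        "command": "uvx",
        "args": ["automate-mcp"],
    }
    path.write_text(json.dumps(config, indent=2))
    return config
```

Run it against the config path for your platform, then restart Claude Desktop as above.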

### OpenClaw

Edit `~/.openclaw/openclaw.json`:

```json
{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp"]
    }
  }
}
```

Then restart the gateway:

```bash
openclaw gateway restart
```

### Cursor / Windsurf / Cline

Settings → MCP Servers → Add:

```json
{
  "automate": {
    "command": "uvx",
    "args": ["automate-mcp"]
  }
}
```

### After connecting

Say in any client:

> "Use automate to open Chrome and search for the latest AI news"

### MCP Tools

| Tool | Description |
|------|-------------|
| `screenshot` | Capture the screen (or a region) and return it as a base64 PNG |
| `click` | Click at screen coordinates (left / right / middle) |
| `double_click` | Double-click at screen coordinates |
| `type_text` | Type text at the current cursor position |
| `press_key` | Press a key or key combo (e.g. `ctrl+c`, `enter`) |
| `scroll` | Scroll up or down |
| `mouse_move` | Move the cursor without clicking |
| `drag` | Drag from one position to another |
| `get_screen_size` | Get the screen resolution |
| `get_cursor_position` | Get the current cursor position |
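As an illustration of the `screenshot` contract, here is a sketch of packaging raw PNG bytes as a base64 image payload. The field names are assumptions chosen for illustration, not necessarily autoMate's actual wire format:

```python
import base64

def to_image_content(png_bytes):
    """Package raw PNG bytes as a base64-encoded image content block,
    the shape a host LLM can consume from a screenshot-style tool."""
    return {
        "type": "image",
        "mimeType": "image/png",
        "data": base64.b64encode(png_bytes).decode("ascii"),
    }
```

The consumer recovers the original bytes with `base64.b64decode(block["data"])`.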

## 🚀 CLI

### Install

```bash
git clone https://github.com/yuruotong1/autoMate.git
cd autoMate
conda create -n automate python=3.12
conda activate automate
python install.py
```

### Usage

```bash
export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o

# Describe a task — AI generates a Markdown script and executes it
python cli.py run "open Notepad and type Hello World"

# Re-run a saved script
python cli.py exec open_notepad

# List all saved scripts
python cli.py list

# Inspect a script
python cli.py show open_notepad
```

## 📝 Markdown Scripts

autoMate saves automation scripts as `.md` files in `~/.automate/scripts/`. Human-readable, version-controllable, and AI-interpretable at runtime.

---
name: open_notepad
description: Open Notepad and type a message
---

## Steps

1. Press the Windows key to open Start Menu `[key:win]`
2. Type "notepad" in the search box `[type:notepad]`
3. Click the Notepad result `[click:Notepad]`
4. Type the greeting `[type:Hello, World!]`
5. Save with Ctrl+S `[key:ctrl+s]`

## Notes
Notepad usually opens within 1–2 seconds.

## Code
```python
# Optional: custom Python runs as a step
import time
time.sleep(1)
```
### Inline hint syntax

| Hint | Action |
|------|--------|
| `[click:OK]` | Click element whose label contains "OK" (OCR-based) |
| `[click:coord=320,240]` | Click at absolute screen coordinates |
| `[type:hello world]` | Type text |
| `[key:ctrl+s]` | Press keyboard shortcut |
| `[wait:2]` | Wait 2 seconds |
| `[scroll_up]` / `[scroll_down]` | Scroll the page |

Steps **without** hints are interpreted by the AI vision model at runtime.
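The hint syntax above can be matched with a single regular expression. A sketch of such a parser (the parsing details are our assumption, not autoMate's implementation):

```python
import re

# One pattern covers every hint form in the table above.
HINT_RE = re.compile(r"\[(click|type|key|wait|scroll_up|scroll_down)(?::([^\]]*))?\]")

def parse_hint(step_line):
    """Extract an inline hint from a step line, or None if the step
    should fall back to the AI vision model."""
    m = HINT_RE.search(step_line)
    if not m:
        return None
    action, arg = m.group(1), m.group(2)
    # Coordinate clicks carry an explicit x,y payload.
    if action == "click" and arg and arg.startswith("coord="):
        x, y = arg[len("coord="):].split(",")
        return ("click_coord", (int(x), int(y)))
    return (action, arg)
```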

### Human-in-the-loop learning

When the AI can't locate an element, it pauses and asks:

```
[autoMate] Step 3: 'Click the Submit button'
[autoMate] Please click the target element now…
[autoMate] Got click at (842, 631) — learning…
[autoMate] Learned hint: [click:Submit]
[autoMate] Resuming.
```


The learned hint is automatically written back into the Markdown file — next run needs no human help.
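The write-back amounts to appending the learned hint to one numbered step line in the script. A sketch (the helper and its exact matching rules are illustrative, not autoMate's code):

```python
import re

def learn_hint(markdown, step_number, hint):
    """Append a learned hint such as '[click:Submit]' to the numbered
    step line, leaving every other line of the script untouched."""
    pattern = re.compile(rf"^({step_number}\. .*?)\s*$", re.MULTILINE)
    return pattern.sub(rf"\1 `{hint}`", markdown, count=1)
```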

---

## 🌐 Supported LLM Providers

| Provider | Base URL | Example Models |
|----------|----------|----------------|
| [OpenAI](https://platform.openai.com) | `https://api.openai.com/v1` | gpt-4o, gpt-4.1, o3 |
| [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service) | your Azure endpoint | gpt-4o |
| [OpenRouter](https://openrouter.ai) | `https://openrouter.ai/api/v1` | claude-3.7-sonnet, gemini-2.5-pro |
| [DeepSeek](https://platform.deepseek.com) | `https://api.deepseek.com/v1` | deepseek-chat, deepseek-reasoner |
| [Groq](https://console.groq.com) | `https://api.groq.com/openai/v1` | llama-3.3-70b-versatile |
| [Ollama](https://ollama.com) (local) | `http://localhost:11434/v1` | qwen2.5-vl, gemma3-tools:27b |
| [yeka](https://2233.ai/api) (CN proxy) | `https://api.2233.ai/v1` | gpt-4o, o1 |

> **Recommended:** Use a multimodal model with vision — `gpt-4o`, `claude-3.7-sonnet` via OpenRouter, or `qwen2.5-vl` via Ollama.

```bash
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
export OPENAI_MODEL=anthropic/claude-3.7-sonnet
```

## 📝 FAQ

**Q: Why is execution slow without a GPU?**
OmniParser (YOLO-based UI detection) is GPU-intensive. With an NVIDIA GPU (4 GB+ VRAM), install the CUDA build of PyTorch:

```bash
pip3 uninstall -y torch torchvision
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu124
```

**Q: Can I edit the Markdown scripts manually?**
Yes. They live in `~/.automate/scripts/*.md`. The AI reads the natural-language descriptions at runtime; hints just make execution faster and more reliable.

**Q: Does it work on macOS / Linux?**
Yes. Both the MCP server and the CLI run on all three platforms. The YOLO model requires Python 3.10–3.12.


## 🤝 Contributing

Every excellent open-source project embodies collective wisdom. Whether it's fixing bugs, adding features, or improving documentation — your contribution helps thousands of people escape repetitive work.


⭐ Every star encourages the creators and helps more people discover autoMate ⭐



Download files

Download the file for your platform.

Source Distribution

automate_mcp-0.3.0.tar.gz (42.0 kB)

Uploaded Source

Built Distribution


automate_mcp-0.3.0-py3-none-any.whl (45.4 kB)

Uploaded Python 3

File details

Details for the file automate_mcp-0.3.0.tar.gz.

File metadata

  • Download URL: automate_mcp-0.3.0.tar.gz
  • Upload date:
  • Size: 42.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for `automate_mcp-0.3.0.tar.gz`:

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `ff6953057bc0afd2f922050741eb344823c70bea8b9b686865eae65634339891` |
| MD5 | `198c7bad1acf78ad3e81982525c936c7` |
| BLAKE2b-256 | `987b06f2ca49493251b99266d43785eb6a748e0e0a098fec95946b3cf6a71612` |


Provenance

The following attestation bundles were made for automate_mcp-0.3.0.tar.gz:

Publisher: publish.yml on yuruotong1/autoMate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file automate_mcp-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: automate_mcp-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 45.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for `automate_mcp-0.3.0-py3-none-any.whl`:

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `dc825ae5387f1215ac66435ac7b991e969452a89687fa290d8946b970801b098` |
| MD5 | `e91ae8cd5b7d70658004841ae4ec5864` |
| BLAKE2b-256 | `c19f586ecbb90c2c78a547f4eae438279727349c5b95ef1591f27ca6436a94c3` |


Provenance

The following attestation bundles were made for automate_mcp-0.3.0-py3-none-any.whl:

Publisher: publish.yml on yuruotong1/autoMate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
