
AI-powered desktop automation as an MCP server — control your desktop from Claude Desktop, Cursor, Windsurf and more

Project description

autoMate logo

autoMate

🤖 AI-Powered Desktop Automation | Let Your Computer Work for You

中文 | 日本語

"Automate the tedious, give time back to life"

https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368

Note: autoMate is in active development. Deeper design thinking, technical discussions, and AI+RPA research notes are shared in Knowledge Planet "AI Tongmu and His Noble Friends".


💡 What is autoMate?

autoMate is an AI + RPA automation tool that controls your desktop through natural language. Unlike traditional RPA, it learns from your demonstrations — when it can't find a button, just click it once and it remembers forever.

Two ways to use it:

| Mode | Best for |
|------|----------|
| 🔌 MCP Server | Claude Desktop, OpenClaw, Cursor, Windsurf — plug in and go |
| 💻 CLI | Scripts, terminals, power users |

✨ Features

  • 🔮 No-Code Automation — Describe tasks in natural language; AI writes and runs the script
  • 🧠 Human-in-the-Loop Learning — AI can't find an element? Click it once, it remembers forever
  • 📝 Markdown Scripts — Stored as readable .md files; edit them directly, no rigid JSON schema
  • 🌐 Universal LLM Support — OpenAI, Azure, OpenRouter, Groq, Ollama, DeepSeek, any OpenAI-compatible API
  • 🔌 MCP Server — Works with Claude Desktop, OpenClaw, Cursor, Windsurf, Cline and more
  • 🖥️ Cross-Platform — Windows, macOS, Linux

🔌 MCP Server Setup

Prerequisite: install uv once with `pip install uv`.

Claude Desktop

Config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

```json
{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_MODEL": "gpt-4o"
      }
    }
  }
}
```

Restart Claude Desktop — you'll see `run_task` and `screenshot` in the tools list.

OpenClaw

Edit ~/.openclaw/openclaw.json:

```json
{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_MODEL": "gpt-4o"
      }
    }
  }
}
```

Then restart the gateway:

```bash
openclaw gateway restart
```

Cursor / Windsurf / Cline

Settings → MCP Servers → Add:

```json
{
  "automate": {
    "command": "uvx",
    "args": ["automate-mcp"],
    "env": {
      "OPENAI_API_KEY": "sk-...",
      "OPENAI_MODEL": "gpt-4o"
    }
  }
}
```

After connecting

Say in any client:

"Use automate to open Chrome and search for the latest AI news"

MCP Tools

| Tool | Description |
|------|-------------|
| `run_task` | Execute a desktop automation task in natural language |
| `screenshot` | Capture the screen (or a region) and return as base64 PNG |
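Under the hood, MCP tool invocations are JSON-RPC 2.0 `tools/call` requests. Here is a minimal sketch of the request a client would send for `run_task` — note that the `task` argument name is an assumption; check the tool's published input schema for the actual field name:

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request body for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "task" as the argument key is an assumption for illustration only.
req = make_tool_call(1, "run_task",
                     {"task": "open Chrome and search for the latest AI news"})
print(json.dumps(req, indent=2))
```

MCP clients like Claude Desktop construct and send these requests for you; the sketch only shows what travels over the wire.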

🚀 CLI

Install

```bash
git clone https://github.com/yuruotong1/autoMate.git
cd autoMate
conda create -n automate python=3.12
conda activate automate
python install.py
```

Usage

```bash
export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4o

# Describe a task — AI generates a Markdown script and executes it
python cli.py run "open Notepad and type Hello World"

# Re-run a saved script
python cli.py exec open_notepad

# List all saved scripts
python cli.py list

# Inspect a script
python cli.py show open_notepad
```

📝 Markdown Scripts

autoMate saves automation scripts as .md files in ~/.automate/scripts/. Human-readable, version-controllable, and AI-interpretable at runtime.
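Because scripts are plain files in a single directory, they can be enumerated without the CLI at all. A minimal sketch (hypothetical helper, not autoMate's own code):

```python
from pathlib import Path

def list_scripts(scripts_dir=None):
    """Return the stem names of saved .md automation scripts."""
    scripts_dir = Path(scripts_dir or Path.home() / ".automate" / "scripts")
    if not scripts_dir.is_dir():
        return []  # no scripts saved yet
    return sorted(p.stem for p in scripts_dir.glob("*.md"))
```

Each stem (e.g. `open_notepad`) is the name you pass to `python cli.py exec`.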

````markdown
---
name: open_notepad
description: Open Notepad and type a message
---

## Steps

1. Press the Windows key to open Start Menu `[key:win]`
2. Type "notepad" in the search box `[type:notepad]`
3. Click the Notepad result `[click:Notepad]`
4. Type the greeting `[type:Hello, World!]`
5. Save with Ctrl+S `[key:ctrl+s]`

## Notes
Notepad usually opens within 1–2 seconds.

## Code
```python
# Optional: custom Python runs as a step
import time
time.sleep(1)
```
````

### Inline hint syntax

| Hint | Action |
|------|--------|
| `[click:OK]` | Click element whose label contains "OK" (OCR-based) |
| `[click:coord=320,240]` | Click at absolute screen coordinates |
| `[type:hello world]` | Type text |
| `[key:ctrl+s]` | Press keyboard shortcut |
| `[wait:2]` | Wait 2 seconds |
| `[scroll_up]` / `[scroll_down]` | Scroll the page |

Steps **without** hints are interpreted by the AI vision model at runtime.
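The `[action:argument]` hint grammar in the table above is regular enough to extract with a single regex. A minimal sketch of such a parser (illustrative, not autoMate's actual implementation):

```python
import re

# Matches hints like [click:OK], [key:ctrl+s], [wait:2], or bare [scroll_up]
HINT_RE = re.compile(r"\[([a-z_]+)(?::([^\]]+))?\]")

def parse_hint(step_line):
    """Return (action, argument) for the last hint in a step, or None."""
    matches = HINT_RE.findall(step_line)
    if not matches:
        return None  # no hint: the step goes to the AI vision model
    action, arg = matches[-1]
    return (action, arg or None)
```

Taking the *last* match lets a step's prose mention bracketed text without confusing the executor; arguments are optional so bare hints like `[scroll_up]` parse too.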

### Human-in-the-loop learning

When the AI can't locate an element, it pauses and asks:

```text
[autoMate] Step 3: 'Click the Submit button'
Please click the target element now…
[autoMate] Got click at (842, 631) — learning…
[autoMate] Learned hint: [click:Submit] Resuming.
```


The learned hint is automatically written back into the Markdown file — next run needs no human help.
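Writing a learned hint back can be as simple as appending it to the matching step line. An illustrative sketch (not autoMate's actual write-back logic):

```python
def attach_learned_hint(step_line, action, argument):
    """Append a learned hint (e.g. `[click:Submit]`) to a Markdown step line."""
    hint = f"`[{action}:{argument}]`"
    if hint in step_line:
        return step_line  # already learned: keep the line unchanged
    return f"{step_line.rstrip()} {hint}"
```

Because the hint is idempotently appended, re-learning the same element never duplicates it in the script.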

---

## 🌐 Supported LLM Providers

| Provider | Base URL | Example Models |
|----------|----------|----------------|
| [OpenAI](https://platform.openai.com) | `https://api.openai.com/v1` | gpt-4o, gpt-4.1, o3 |
| [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service) | your Azure endpoint | gpt-4o |
| [OpenRouter](https://openrouter.ai) | `https://openrouter.ai/api/v1` | claude-3.7-sonnet, gemini-2.5-pro |
| [DeepSeek](https://platform.deepseek.com) | `https://api.deepseek.com/v1` | deepseek-chat, deepseek-reasoner |
| [Groq](https://console.groq.com) | `https://api.groq.com/openai/v1` | llama-3.3-70b-versatile |
| [Ollama](https://ollama.com) (local) | `http://localhost:11434/v1` | qwen2.5-vl, gemma3-tools:27b |
| [yeka](https://2233.ai/api) (CN proxy) | `https://api.2233.ai/v1` | gpt-4o, o1 |

> **Recommended:** Use a multimodal model with vision — `gpt-4o`, `claude-3.7-sonnet` via OpenRouter, or `qwen2.5-vl` via Ollama.

```bash
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
export OPENAI_MODEL=anthropic/claude-3.7-sonnet
```

📝 FAQ

Q: Why is execution slow without a GPU?
OmniParser (YOLO-based UI detection) is GPU-intensive. With an NVIDIA GPU (4 GB+ VRAM):

```bash
pip3 uninstall -y torch torchvision
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu124
```

Q: Can I edit the Markdown scripts manually?
Yes — they live in ~/.automate/scripts/*.md. The AI reads natural-language descriptions at runtime; hints just make execution faster and more reliable.

Q: Does it work on macOS / Linux?
Yes. MCP server and CLI work on all three platforms. The YOLO model requires Python 3.10–3.12.


🤝 Contributing

Every excellent open-source project embodies collective wisdom. Whether it's fixing bugs, adding features, or improving documentation — your contribution helps thousands of people escape repetitive work.


⭐ Every star encourages the creators and helps more people discover autoMate ⭐

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

automate_mcp-0.2.0.tar.gz (41.4 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

automate_mcp-0.2.0-py3-none-any.whl (45.2 kB)

Uploaded Python 3

File details

Details for the file automate_mcp-0.2.0.tar.gz.

File metadata

  • Download URL: automate_mcp-0.2.0.tar.gz
  • Upload date:
  • Size: 41.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for automate_mcp-0.2.0.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `957a2c62b4f93b8acf418a244aaa450769caf05ed6e1732536f711a8f842d7cc` |
| MD5 | `a9379fee7a05aa8834985e732a9e8bd4` |
| BLAKE2b-256 | `61d1841065c997c69485c90e57907ccec4248400ee51e8cd6ae06dfe95cf89a3` |

See more details on using hashes here.
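To check a downloaded file against the digests above, compute its SHA256 locally and compare:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, streaming in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the SHA256 listed above for the release file.
```

Streaming in chunks keeps memory flat regardless of file size; a mismatch means the download is corrupted or tampered with.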

Provenance

The following attestation bundles were made for automate_mcp-0.2.0.tar.gz:

Publisher: publish.yml on yuruotong1/autoMate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file automate_mcp-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: automate_mcp-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 45.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for automate_mcp-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | `b178df4a0bf7df48076c641a8cfcde171a306a6eaca8b78fe0f672db649c5f0c` |
| MD5 | `97db5e2260a6f76a3aa1082b03f94a6d` |
| BLAKE2b-256 | `1d27ba036b74d571c9a4a83045dc8a33382a24f6466f2201d07938ef727de9b0` |

See more details on using hashes here.

Provenance

The following attestation bundles were made for automate_mcp-0.2.0-py3-none-any.whl:

Publisher: publish.yml on yuruotong1/autoMate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
