Zero-config desktop automation MCP server — give any LLM hands and eyes to control your desktop

autoMate

🤖 Desktop Automation for Apps Without APIs

Give Claude hands and eyes — automate any desktop app, even if it has no API

https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368


💡 What is autoMate?

autoMate is an MCP server that gives AI assistants (Claude, GPT, etc.) the ability to control any desktop application — even apps with no API, no plugin system, and no automation support.

How it differs from the filesystem, browser, and Windows MCP servers:

| MCP Server | What it automates |
| --- | --- |
| filesystem MCP | Files and folders |
| browser MCP | Web pages |
| Windows MCP | OS settings and system calls |
| autoMate | Any desktop GUI app with no API: 剪映, Photoshop, AutoCAD, WeChat, SAP, internal tools… |

Two modes:

| Mode | How it works | Requires |
| --- | --- | --- |
| Basic | Claude sees the screen, autoMate clicks/types | Nothing (zero config) |
| Cloud Vision | autoMate parses the UI itself and reasons via a cloud VLM | HuggingFace token + endpoints |

✨ Features

  • 🖥️ Automates apps with no API — if it has a GUI, autoMate can drive it
  • 📚 Reusable script library — save workflows once, run forever; install community scripts in one command
  • ☁️ Cloud Vision — screen parsing via OmniParser + action reasoning via UI-TARS, all in the cloud, zero local GPU
  • 🧠 Claude knows when to use it: the server description states when to choose autoMate over other MCP servers
  • 🤖 Zero config for basic use — no API keys, no env vars needed to get started
  • 🌍 Cross-platform — Windows, macOS, Linux

🔌 Setup

Prerequisite: install uv, which provides the uvx command used below (pip install uv).

Claude Desktop

Open Settings → Developer → Edit Config, then add:

{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp@latest"]
    }
  }
}

Restart Claude Desktop and the server is ready. The @latest suffix tells uvx to fetch the newest autoMate release when the server starts, so no manual upgrade is needed.
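If the config file already lists other MCP servers, the autoMate entry has to be merged in rather than pasted over them. The sketch below shows one way to do that merge in Python. The helper name add_automate_server and the placeholder "other-server" entry are illustrative assumptions, not part of autoMate.

```python
import json


def add_automate_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the autoMate entry added.

    Hypothetical helper for illustration; autoMate itself does not ship it.
    """
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["automate"] = {"command": "uvx", "args": ["automate-mcp@latest"]}
    return merged


# Placeholder config with one pre-existing (made-up) server entry.
existing = {"mcpServers": {"other-server": {"command": "example", "args": []}}}
updated = add_automate_server(existing)
print(json.dumps(updated, indent=2))
```

The original dictionary is left untouched, so the result can be inspected before it is written back to disk.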

OpenClaw

Edit ~/.openclaw/openclaw.json:

{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp@latest"]
    }
  }
}

Then restart the gateway:

openclaw gateway restart

Cursor / Windsurf / Cline

Settings → MCP Servers → Add:

{
  "automate": {
    "command": "uvx",
    "args": ["automate-mcp@latest"]
  }
}

☁️ Cloud Vision (Optional)

Cloud Vision adds autonomous screen parsing and action reasoning to autoMate — no local GPU required.

It uses two HuggingFace Inference Endpoints:

  • OmniParser V2 — detects all UI elements (icons, buttons, text) from a screenshot
  • UI-TARS / Qwen-VL — vision-language model that decides what action to take next

Setup

Add these env vars to your MCP config:

{
  "mcpServers": {
    "automate": {
      "command": "uvx",
      "args": ["automate-mcp@latest"],
      "env": {
        "AUTOMATE_HF_TOKEN": "hf_...",
        "AUTOMATE_SCREEN_PARSER_URL": "https://your-omniparser-endpoint.aws.endpoints.huggingface.cloud",
        "AUTOMATE_ACTION_MODEL_URL": "https://your-uitars-endpoint.aws.endpoints.huggingface.cloud",
        "AUTOMATE_ACTION_MODEL_NAME": "ByteDance-Seed/UI-TARS-1.5-7B",
        "AUTOMATE_HF_NAMESPACE": "your-hf-username",
        "AUTOMATE_SCREEN_PARSER_ENDPOINT": "omniparser-v2",
        "AUTOMATE_ACTION_MODEL_ENDPOINT": "ui-tars-1-5-7b"
      }
    }
  }
}

See .env.example in the repo for the full reference.
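Before starting a Cloud Vision session it can help to check which of the variables above are actually set. The following is a minimal sketch that only reports unset names. It does not know which variables autoMate treats as optional (the README does not say), and the function name is a hypothetical one chosen for this example.

```python
import os
from typing import Mapping

# Variable names copied from the config example above.
CLOUD_VISION_VARS = (
    "AUTOMATE_HF_TOKEN",
    "AUTOMATE_SCREEN_PARSER_URL",
    "AUTOMATE_ACTION_MODEL_URL",
    "AUTOMATE_ACTION_MODEL_NAME",
    "AUTOMATE_HF_NAMESPACE",
    "AUTOMATE_SCREEN_PARSER_ENDPOINT",
    "AUTOMATE_ACTION_MODEL_ENDPOINT",
)


def missing_cloud_vision_vars(env: Mapping[str, str] = os.environ) -> list:
    """Return the listed variable names that are unset or blank in env."""
    return [name for name in CLOUD_VISION_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_cloud_vision_vars()
    print("All set" if not missing else "Unset: " + ", ".join(missing))
```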

Cloud Vision workflow

1. warm_endpoints   — wake up scaled-to-zero endpoints (1–5 min)
2. parse_screen     — detect all UI elements via cloud OmniParser
3. reason_action    — ask VLM what to click/type next
   — or —
   smart_act        — full autonomous loop: parse → reason → execute → repeat
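The parse → reason → execute → repeat loop listed above can be sketched in a few lines. This is an illustration of the control flow only, not autoMate's implementation: the three callables stand in for the real tools, and the action dictionaries (including the "done" marker and the step limit) are assumptions made for this example.

```python
from typing import Callable, List


def run_loop(
    parse_screen: Callable[[], list],
    reason_action: Callable[[str, list, list], dict],
    execute: Callable[[dict], None],
    goal: str,
    max_steps: int = 10,
) -> List[dict]:
    """Repeat parse -> reason -> execute until the model reports it is done."""
    history: List[dict] = []
    for _ in range(max_steps):
        elements = parse_screen()                         # detect UI elements
        action = reason_action(goal, elements, history)   # ask what to do next
        if action.get("type") == "done":                  # assumed stop signal
            break
        execute(action)                                   # click / type / etc.
        history.append(action)
    return history


# Stubbed run: the "model" asks for one click, then reports completion.
scripted = iter([{"type": "click", "x": 320, "y": 240}, {"type": "done"}])
executed: List[dict] = []
history = run_loop(
    parse_screen=lambda: [{"label": "Export"}],
    reason_action=lambda goal, elements, past: next(scripted),
    execute=executed.append,
    goal="Export the project",
)
print(history)
```

The max_steps cap is a design choice for the sketch: an autonomous loop that drives a real desktop should have some bound so a confused model cannot click indefinitely.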

🛠️ MCP Tools

Script library — save once, run forever:

| Tool | Description |
| --- | --- |
| list_scripts | Show all saved automation scripts |
| run_script | Run a saved script by name |
| save_script | Save the current workflow as a reusable script |
| show_script | View a script's contents |
| delete_script | Delete a script |
| install_script | Install a script from a URL or the community library |

Cloud Vision — autonomous UI understanding (requires HF config):

| Tool | Description |
| --- | --- |
| cloud_vision_config | Show current cloud vision configuration status |
| warm_endpoints | Wake up scaled-to-zero HF endpoints before use |
| parse_screen | Detect all UI elements via cloud OmniParser |
| reason_action | Ask a VLM what GUI action to take next |
| smart_act | Full autonomous loop: parse → reason → execute → repeat |
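Because scaled-to-zero endpoints can take a few minutes to wake, a caller usually has to poll until they respond. The sketch below shows a generic poll-with-timeout helper. How warm_endpoints works internally is not documented here, so the is_ready callable, the default timeout, and the polling interval are all assumptions for illustration.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stubbed example: the endpoint reports ready on the third check.
states = iter([False, False, True])
ready = wait_until_ready(lambda: next(states), sleep=lambda s: None)
print("ready" if ready else "timed out")
```

Passing sleep and clock as parameters keeps the helper testable without real waiting.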

Low-level desktop control — used when building or executing scripts:

| Tool | Description |
| --- | --- |
| screenshot | Capture the screen and return as base64 PNG |
| click | Click at screen coordinates |
| double_click | Double-click at screen coordinates |
| type_text | Type text (full Unicode / CJK support) |
| press_key | Press a key or combo (e.g. ctrl+c, win) |
| scroll | Scroll up or down |
| mouse_move | Move cursor without clicking |
| drag | Drag from one position to another |
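MCP clients invoke tools like these by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a message for the click tool. The envelope follows the MCP specification, but the argument names x and y are assumptions: the actual parameter names are defined by autoMate's tool schema, which a client can read via tools/list.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "x" and "y" are assumed argument names, used here only for illustration.
msg = tool_call(1, "click", {"x": 320, "y": 240})
print(json.dumps(msg))
```

In practice Claude Desktop or another MCP client builds these messages itself; the example only shows what travels over the wire.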

📚 Script Library

Scripts are saved as .md files in ~/.automate/scripts/ — human-readable, git-friendly, shareable.

---
name: jianying_export_douyin
description: Export the current 剪映 project as a 9:16 Douyin video
created: 2025-01-01
---

## Steps

1. Open export dialog [key:ctrl+e]
2. Select resolution 1080×1920 [click:coord=320,480]
3. Set format to MP4 [click:coord=320,560]
4. Click export [click:coord=800,650]
5. Wait for export to finish [wait:5]
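A script file like the one above has two parts: a metadata header between --- lines and a Markdown body with the steps. The sketch below splits the two using only the standard library. It assumes simple key: value header lines as in the example and is not autoMate's own loader; the function name split_script is hypothetical.

```python
from typing import Dict, Tuple


def split_script(text: str) -> Tuple[Dict[str, str], str]:
    """Split a script into (metadata, body), assuming a '---' delimited header."""
    lines = text.splitlines()
    meta: Dict[str, str] = {}
    body_start = 0
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                body_start = i + 1
                break
            # Split on the first colon only, so values may contain colons.
            key, sep, value = lines[i].partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    body = "\n".join(lines[body_start:]).strip()
    return meta, body


SAMPLE = """---
name: jianying_export_douyin
description: Export the current 剪映 project as a 9:16 Douyin video
created: 2025-01-01
---

## Steps

1. Open export dialog [key:ctrl+e]
"""

meta, body = split_script(SAMPLE)
print(meta["name"])
print(body.splitlines()[0])
```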

Inline hint syntax:

| Hint | Action |
| --- | --- |
| [click:coord=320,240] | Click at absolute screen coordinates |
| [type:hello] | Type text |
| [key:ctrl+s] | Press keyboard shortcut |
| [wait:2] | Wait 2 seconds |
| [scroll_up] / [scroll_down] | Scroll the page |

Steps without hints are interpreted by the AI vision model at runtime.
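The hint syntax in the table is regular enough to parse with one regular expression. The sketch below extracts the first hint from a step line and returns None when a step has no hint (the case left to the vision model). It is an illustrative parser written against the table above, not autoMate's own, and the tuple return format is an assumption.

```python
import re
from typing import Optional, Tuple

# Matches [name] or [name:argument] for the hint names listed in the table.
HINT_RE = re.compile(r"\[(click|type|key|wait|scroll_up|scroll_down)(?::([^\]]*))?\]")


def parse_hint(step: str) -> Optional[Tuple[str, object]]:
    """Return (kind, argument) for the first inline hint in step, or None."""
    match = HINT_RE.search(step)
    if match is None:
        return None
    kind, arg = match.group(1), match.group(2)
    if kind in ("scroll_up", "scroll_down"):
        return (kind, None)
    if arg is None:
        return None  # click/type/key/wait need an argument
    if kind == "click":
        coords = arg[len("coord="):] if arg.startswith("coord=") else arg
        x, y = (int(part) for part in coords.split(","))
        return ("click", (x, y))
    if kind == "wait":
        return ("wait", float(arg))
    return (kind, arg)  # "type" and "key" keep the raw text


print(parse_hint("2. Select resolution 1080×1920 [click:coord=320,480]"))
print(parse_hint("Review the preview before exporting"))
```

Malformed arguments such as non-numeric coordinates raise ValueError in this sketch; a real loader would likely want to report the offending step instead.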


📝 FAQ

Q: How is this different from just using Claude's computer-use capability?
autoMate provides persistent, reusable scripts. Once you automate a task, it's saved and runs instantly next time. Cloud Vision mode also lets autoMate do its own screen parsing without relying on Claude's vision.

Q: Why does Claude sometimes use Windows MCP / filesystem MCP instead of autoMate?
Update to v0.4.0+ — the server description now explicitly tells Claude when to use autoMate vs other MCPs.

Q: Do I need a GPU for Cloud Vision?
No — everything runs on HuggingFace Inference Endpoints in the cloud. You only need a HF token and deployed endpoints.

Q: Does it work on macOS / Linux?
Yes, it runs on Windows, macOS, and Linux. This is the main advantage over Quicker, which is Windows-only.


🤝 Contributing


⭐ Every star encourages the creators and helps more people discover autoMate ⭐
