AI-driven screen automation MCP Server

AutoGUI

AI-driven screen automation MCP Server. Send a task in natural language, and the server's internal AI captures screenshots, analyzes them, and performs mouse and keyboard actions autonomously.

Architecture

MCP Client (Claude Code, Claude Desktop, Cursor, etc.)
    |  stdio
    v
server.py (FastMCP async orchestration loop)
    |
    v
agent.py (ScreenAgent toolkit: capture, execute, parse, safety)

Quick Start

git clone https://github.com/stellariums/AutoGUI.git
cd AutoGUI
pip install -r requirements.txt

Configuration

AutoGUI supports layered configuration: environment variables (highest priority) > config.json > defaults.
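
The precedence order above could be resolved roughly as follows. This is a sketch, not AutoGUI's loader: `load_api_config` is a hypothetical helper, the empty defaults are placeholders, and the mapping of `AUTOGUI_BASE_URL` and `AUTOGUI_MODEL` to `api.base_url` and `api.model` is assumed from the variable names (only the `AUTOGUI_API_KEY` to `api.api_key` pairing is stated in the FAQ).

```python
import json
import os
from pathlib import Path

# Placeholder defaults; the env-var-to-key mapping is an assumption.
DEFAULT_API = {"base_url": "", "api_key": "", "model": ""}
ENV_TO_API_KEY = {
    "AUTOGUI_API_KEY": "api_key",
    "AUTOGUI_BASE_URL": "base_url",
    "AUTOGUI_MODEL": "model",
}


def load_api_config(config_path="config.json", environ=None):
    """Resolve the api section: environment > config.json > defaults."""
    environ = os.environ if environ is None else environ
    api = dict(DEFAULT_API)  # lowest priority

    path = Path(config_path)
    if path.is_file():
        file_config = json.loads(path.read_text(encoding="utf-8"))
        api.update(file_config.get("api", {}))  # file overrides defaults

    for env_name, key in ENV_TO_API_KEY.items():
        value = environ.get(env_name)
        if value:  # unset or empty variables do not override the file
            api[key] = value

    if not api["api_key"]:
        raise ValueError("API key required")
    return api
```

With a config.json that sets all three fields and only `AUTOGUI_MODEL` exported, the result keeps the file's `base_url` and `api_key` but takes `model` from the environment.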

Option A: Environment Variables Only (Simplest)

Only three variables are needed to get started (Windows Command Prompt syntax):

set AUTOGUI_API_KEY=your-api-key
set AUTOGUI_BASE_URL=https://api.openai.com/v1
set AUTOGUI_MODEL=gpt-4o

Alternatively, copy .env.example to .env and pass the values through your MCP client config (see MCP Client Setup below).

Option B: Config File

cp config.json.example config.json

Edit config.json. Only the api section is required; everything else has sensible defaults:

{
  "api": {
    "base_url": "https://your-api-endpoint/v1",
    "api_key": "your-api-key",
    "model": "your-model-name"
  }
}

Advanced config options:

{
  "screen": {
    "max_width": 1280,
    "max_height": 720,
    "allowed_region": null
  },
  "agent": {
    "max_iterations": 20,
    "delay_between_actions": 1.0,
    "max_history_rounds": 10
  },
  "safety": {
    "enable_confirmation": true,
    "fallback_action": "block",
    "dangerous_keys": ["delete", "backspace", "escape"],
    "dangerous_hotkeys": [["ctrl", "w"], ["alt", "f4"]],
    "dangerous_patterns": ["rm ", "del ", "format ", "shutdown"]
  }
}
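
One plausible reading of `screen.max_width` and `screen.max_height` is that screenshots are downscaled to fit those limits before being sent to the model, in which case coordinates returned by the model would need to be mapped back to real screen pixels. The sketch below shows only that arithmetic. The source does not confirm this behavior, and `fit_scale` and `to_screen` are hypothetical helpers.

```python
def fit_scale(width: int, height: int,
              max_width: int = 1280, max_height: int = 720) -> float:
    """Scale factor (never above 1.0) that fits width x height inside the limits."""
    return min(1.0, max_width / width, max_height / height)


def to_screen(x: float, y: float, scale: float) -> tuple:
    """Map a point in the downscaled screenshot back to screen pixels."""
    return round(x / scale), round(y / scale)
```

For a 2560x1440 display the scale is 0.5, so a model click at (640, 360) in the screenshot maps to (1280, 720) on screen. A display already within the limits gets a scale of 1.0 and coordinates pass through unchanged.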

MCP Client Setup

Claude Code

claude mcp add AutoGUI -- python /path/to/AutoGUI/server.py

Or add to .mcp.json:

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["C:/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

Cursor

Add to Cursor MCP settings (.cursor/mcp.json):

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

MCP Inspector (Testing)

npx @modelcontextprotocol/inspector python server.py

Tool

| Tool | Description |
| --- | --- |
| autogui_execute_task | Execute a screen automation task via natural language |

Supported Actions

| Action | Description |
| --- | --- |
| click | Click at position |
| double_click | Double click |
| right_click | Right click |
| type | Input text (supports CJK via clipboard) |
| press | Key combination |
| scroll | Scroll |
| drag | Drag |
| move | Move cursor |
| wait | Wait |
| task_complete | Mark task as done |
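
To illustrate how a model reply might be turned into one of the actions listed above, here is a minimal parsing sketch. The reply format (a JSON object with an `action` field, possibly surrounded by free text) is an assumption, and `parse_action` is a hypothetical helper; only the action names come from this README.

```python
import json

# Action names as listed in this README.
SUPPORTED_ACTIONS = {
    "click", "double_click", "right_click", "type", "press",
    "scroll", "drag", "move", "wait", "task_complete",
}


def parse_action(reply: str) -> dict:
    """Extract the JSON object from a model reply and reject unknown actions."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    # json.JSONDecodeError is a ValueError subclass, so callers can catch one type.
    action = json.loads(reply[start:end + 1])
    name = action.get("action")
    if name not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    return action
```

Rejecting unknown action names before execution keeps a malformed or hallucinated reply from reaching the mouse and keyboard layer.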

Safety

  • Dangerous action detection (rule-based + AI self-labeling)
  • Configurable dangerous keys, hotkeys, and text patterns
  • Optional region restriction (allowed_region)
  • Elicit-based confirmation for dangerous operations
  • Configurable fallback: block or allow
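
The rule-based part of these checks can be sketched as follows, using the default values from the safety section of the config. The action field names (`keys`, `text`), the helper names, and the reading of `fallback_action` as "what to do when no confirmation is available" are assumptions. AI self-labeling and the allowed_region check are not covered here.

```python
# Defaults copied from the "safety" config section in this README.
DANGEROUS_KEYS = {"delete", "backspace", "escape"}
DANGEROUS_HOTKEYS = [{"ctrl", "w"}, {"alt", "f4"}]
DANGEROUS_PATTERNS = ["rm ", "del ", "format ", "shutdown"]


def is_dangerous(action: dict) -> bool:
    """Rule-based check; the 'keys' and 'text' fields are hypothetical."""
    name = action.get("action")
    if name == "press":
        keys = [k.lower() for k in action.get("keys", [])]
        if len(keys) == 1:
            return keys[0] in DANGEROUS_KEYS
        return set(keys) in DANGEROUS_HOTKEYS  # order-insensitive match
    if name == "type":
        text = action.get("text", "").lower()
        return any(pattern in text for pattern in DANGEROUS_PATTERNS)
    return False


def may_run(action: dict, confirm=None, fallback_action: str = "block") -> bool:
    """Safe actions run; dangerous ones need confirmation or fall back."""
    if not is_dangerous(action):
        return True
    if confirm is not None:
        return bool(confirm(action))  # e.g. a user prompt shown by the client
    return fallback_action == "allow"
```

With the defaults, `{"action": "press", "keys": ["alt", "f4"]}` is blocked when no confirmation callback is available, while a plain click always runs.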

FAQ

Q: Screenshot is black or empty
A: Make sure the screen is not locked. On Windows, pyautogui/mss cannot capture the lock screen.

Q: Chinese input not working
A: AutoGUI uses clipboard (pyperclip + Ctrl+V) for text input, which supports CJK characters. Make sure pyperclip is installed.
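
The clipboard approach described in this answer can be sketched as below. `type_text` is a hypothetical helper, not AutoGUI's own function. It relies on `pyperclip.copy` and `pyautogui.hotkey`, both imported lazily so that the function can also be exercised with stand-in callables.

```python
def type_text(text: str, copy=None, hotkey=None) -> None:
    """Paste text through the clipboard instead of typing it key by key."""
    if copy is None:
        import pyperclip  # third-party; loaded only when no stand-in is given
        copy = pyperclip.copy
    if hotkey is None:
        import pyautogui  # third-party; loaded only when no stand-in is given
        hotkey = pyautogui.hotkey
    copy(text)           # put the full string, CJK included, on the clipboard
    hotkey("ctrl", "v")  # paste into the focused window
```

Pasting sidesteps keyboard layout and input method issues, at the cost of overwriting whatever was on the clipboard.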

Q: "API key required" error
A: Set the AUTOGUI_API_KEY environment variable or add api.api_key to config.json.

Q: "Another task is already running" error
A: AutoGUI processes one task at a time. Wait for the current task to finish.
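
One common way to get this one-task-at-a-time behavior in an async server is a lock that rejects overlapping calls rather than queueing them. The sketch below assumes that approach; whether AutoGUI uses a lock internally is not stated, and `execute_task` and `work` are hypothetical names.

```python
import asyncio

_task_lock = asyncio.Lock()


async def execute_task(task: str, work) -> str:
    """Run one task at a time; reject overlapping calls instead of queueing."""
    if _task_lock.locked():
        return "Another task is already running"
    # No await happens between the check above and the acquire below,
    # so another coroutine cannot slip in between them.
    async with _task_lock:
        return await work(task)
```

Rejecting instead of queueing fits screen automation: a second task started against a screen that the first task is still changing would act on stale assumptions.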

Requirements

  • Windows 10/11
  • Python >= 3.10
  • An OpenAI-compatible vision API (GPT-4o, Qwen-VL, etc.)

License

MIT

