AI-driven screen automation MCP Server

AutoGUI

AutoGUI is an AI-driven screen automation MCP server. You send a task in natural language, and the built-in AI agent captures screenshots, analyzes them, and performs mouse and keyboard actions autonomously.

Architecture

MCP Client (Claude Code, Claude Desktop, Cursor, etc.)
    |  stdio
    v
server.py (FastMCP async orchestration loop)
    |
    v
agent.py (ScreenAgent toolkit: capture, execute, parse, safety)
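
The loop in the diagram above can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the actual server.py code (which the diagram describes as async). The names run_task, capture, decide, and execute are hypothetical hooks standing in for the ScreenAgent toolkit; only task_complete and the default of 20 iterations come from this README.

```python
from typing import Callable


def run_task(
    task: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, list], dict],
    execute: Callable[[dict], None],
    max_iterations: int = 20,
) -> dict:
    """Repeat screenshot -> model decision -> action until done or out of budget."""
    history: list = []
    for step in range(1, max_iterations + 1):
        screenshot = capture()                      # hypothetical: grab the screen
        action = decide(task, screenshot, history)  # hypothetical: ask the vision model
        history.append(action)
        if action.get("action") == "task_complete":
            return {"status": "completed", "steps": step, "history": history}
        execute(action)                             # hypothetical: mouse/keyboard action
    # The iteration budget ran out before the model reported completion.
    return {"status": "max_iterations_reached", "steps": max_iterations, "history": history}
```

Passing the three hooks in as callables keeps the loop testable without a real screen or API key.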

Quick Start

git clone https://github.com/stellariums/AutoGUI.git
cd AutoGUI
pip install -r requirements.txt

Configuration

AutoGUI supports layered configuration: environment variables (highest priority) > config.json > defaults.
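
As an illustration of that precedence order, here is a minimal sketch of how the api settings could be resolved. It is not AutoGUI's actual loader: the function name resolve_api_config and the ENV_TO_KEY mapping are assumptions made for this example, although the variable names and the api keys are the ones shown in this README.

```python
import json
import os
from pathlib import Path

# Assumed mapping from the documented environment variables to "api" keys.
ENV_TO_KEY = {
    "AUTOGUI_API_KEY": "api_key",
    "AUTOGUI_BASE_URL": "base_url",
    "AUTOGUI_MODEL": "model",
}


def resolve_api_config(config_path="config.json", env=None, defaults=None):
    """Merge settings so that env vars override config.json, which overrides defaults."""
    env = os.environ if env is None else env
    resolved = dict(defaults or {})          # lowest priority

    path = Path(config_path)
    if path.is_file():                       # middle priority: config.json, if present
        file_api = json.loads(path.read_text(encoding="utf-8")).get("api", {})
        resolved.update({k: v for k, v in file_api.items() if v})

    for env_name, key in ENV_TO_KEY.items():  # highest priority: environment
        if env.get(env_name):
            resolved[key] = env[env_name]
    return resolved
```

Empty values are skipped at each layer, so an unset variable never masks a value from a lower-priority source.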

Option A: Environment Variables Only (Simplest)

Only three variables are needed to get started (Windows Command Prompt syntax):

set AUTOGUI_API_KEY=your-api-key
set AUTOGUI_BASE_URL=https://api.openai.com/v1
set AUTOGUI_MODEL=gpt-4o

Alternatively, copy .env.example to .env and pass the values via your MCP client config (see MCP Client Setup below).

Option B: Config File

cp config.json.example config.json

Edit config.json. Only the api section is required; everything else has sensible defaults:

{
  "api": {
    "base_url": "https://your-api-endpoint/v1",
    "api_key": "your-api-key",
    "model": "your-model-name"
  }
}

Advanced config options:

{
  "screen": {
    "max_width": 1280,
    "max_height": 720,
    "allowed_region": null
  },
  "agent": {
    "max_iterations": 20,
    "delay_between_actions": 1.0,
    "max_history_rounds": 10
  },
  "safety": {
    "enable_confirmation": true,
    "fallback_action": "block",
    "dangerous_keys": ["delete", "backspace", "escape"],
    "dangerous_hotkeys": [["ctrl", "w"], ["alt", "f4"]],
    "dangerous_patterns": ["rm ", "del ", "format ", "shutdown"]
  }
}
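
The README does not say how max_width and max_height are applied. One plausible reading is that screenshots are downscaled to fit within those bounds before being sent to the model, in which case coordinates returned by the model would need to be mapped back to real screen pixels. The sketch below illustrates that idea only; fit_within and to_screen are hypothetical helpers, not part of AutoGUI.

```python
def fit_within(width, height, max_width=1280, max_height=720):
    """Return (new_width, new_height, scale) that fits the bounds, keeping aspect ratio.

    The image is never upscaled, so scale is at most 1.0.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def to_screen(x, y, scale):
    """Map a coordinate on the downscaled image back to real screen pixels."""
    return round(x / scale), round(y / scale)
```

For example, under this assumption a 2560x1440 screen would be sent to the model at half size, and a click the model places at (640, 360) would map back to (1280, 720) on the real screen.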

MCP Client Setup

Claude Code

claude mcp add AutoGUI -- python /path/to/AutoGUI/server.py

Or add to .mcp.json:

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["C:/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

Cursor

Add to Cursor MCP settings (.cursor/mcp.json):

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

MCP Inspector (Testing)

npx @modelcontextprotocol/inspector python server.py

Tool

Tool                   Description
autogui_execute_task   Execute a screen automation task via natural language
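
MCP clients normally build the request for you, but it can help to see what a call to this tool looks like on the wire. The sketch below builds a JSON-RPC 2.0 tools/call message as defined by the MCP specification. The argument name "task" is an assumption; this README does not document the tool's parameter names, so check the schema reported by the server (for example in MCP Inspector).

```python
import json


def build_tool_call(task: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for autogui_execute_task."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "autogui_execute_task",
            "arguments": {"task": task},  # "task" is an assumed parameter name
        },
    }
    # ensure_ascii=False keeps CJK task text readable in the serialized message.
    return json.dumps(message, ensure_ascii=False)
```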

Supported Actions

Action          Description
click           Click at position
double_click    Double click
right_click     Right click
type            Input text (supports CJK via clipboard)
press           Key combination
scroll          Scroll
drag            Drag
move            Move cursor
wait            Wait
task_complete   Mark task as done
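
The exact format the model uses to return an action is not documented here. Assuming it replies with a JSON object that carries an "action" field (an assumption for this sketch), a parser could extract the object and reject anything outside the action list above. parse_action is a hypothetical helper, not AutoGUI's own parser.

```python
import json

# Action names taken from the table above.
SUPPORTED_ACTIONS = {
    "click", "double_click", "right_click", "type", "press",
    "scroll", "drag", "move", "wait", "task_complete",
}


def parse_action(reply: str) -> dict:
    """Extract the JSON object from a model reply and validate its action name."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    action = json.loads(reply[start:end + 1])  # invalid JSON also raises ValueError
    name = action.get("action")
    if name not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    return action
```

Slicing from the first "{" to the last "}" tolerates replies that wrap the JSON in explanatory text.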

Safety

  • Dangerous action detection (rule-based + AI self-labeling)
  • Configurable dangerous keys, hotkeys, and text patterns
  • Optional region restriction (allowed_region)
  • Elicit-based confirmation for dangerous operations
  • Configurable fallback: block or allow
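
The checks listed above could be combined along these lines. This is a sketch under several assumptions, not AutoGUI's implementation: the action fields ("keys", "text", "dangerous"), the region layout [left, top, width, height], and the rule that fallback_action applies when no confirmation can be obtained are all guesses made for illustration. Only the three safety lists are taken from the advanced config example.

```python
SAFETY = {
    "dangerous_keys": ["delete", "backspace", "escape"],
    "dangerous_hotkeys": [["ctrl", "w"], ["alt", "f4"]],
    "dangerous_patterns": ["rm ", "del ", "format ", "shutdown"],
}


def is_dangerous(action, safety=SAFETY):
    """Rule-based check plus the model's own label (assumed field: "dangerous")."""
    if action.get("dangerous"):
        return True
    kind = action.get("action")
    if kind == "press":
        keys = [k.lower() for k in action.get("keys", [])]
        if len(keys) == 1 and keys[0] in safety["dangerous_keys"]:
            return True
        return any(sorted(keys) == sorted(h) for h in safety["dangerous_hotkeys"])
    if kind == "type":
        text = action.get("text", "").lower()
        return any(p in text for p in safety["dangerous_patterns"])
    return False


def in_allowed_region(x, y, region):
    """region is None (no restriction) or an assumed [left, top, width, height]."""
    if region is None:
        return True
    left, top, width, height = region
    return left <= x < left + width and top <= y < top + height


def should_execute(action, user_confirmed, fallback_action="block"):
    """user_confirmed is True/False from a prompt, or None if no prompt was possible."""
    if not is_dangerous(action):
        return True
    if user_confirmed is None:
        return fallback_action == "allow"
    return user_confirmed
```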

FAQ

Q: Screenshot is black or empty
A: Make sure the screen is not locked. On Windows, pyautogui/mss cannot capture the lock screen.

Q: Chinese input not working
A: AutoGUI uses clipboard (pyperclip + Ctrl+V) for text input, which supports CJK characters. Make sure pyperclip is installed.

Q: "API key required" error
A: Set AUTOGUI_API_KEY env var or add api.api_key in config.json.

Q: "Another task is already running" error
A: AutoGUI processes one task at a time. Wait for the current task to finish.
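
To illustrate the one-task-at-a-time behavior behind that last error, here is a minimal asyncio sketch of a guard that rejects a second task while the first is still running. TaskGate is a hypothetical class written for this example; how AutoGUI enforces the limit internally is not described in this README.

```python
import asyncio


class TaskGate:
    """Allow one job at a time; reject overlapping calls instead of queueing them."""

    def __init__(self):
        self._busy = False

    async def run(self, job):
        # No await happens between the check and the set, so this is safe
        # within a single asyncio event loop.
        if self._busy:
            return {"error": "Another task is already running"}
        self._busy = True
        try:
            return await job()
        finally:
            self._busy = False  # always release, even if the job raises
```

Rejecting rather than queueing matches the advice above: the caller waits for the current task to finish and then retries.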

Requirements

  • Windows 10/11
  • Python >= 3.10
  • An OpenAI-compatible vision API (GPT-4o, Qwen-VL, etc.)

License

MIT
