AutoGUI

AI-driven screen automation MCP Server. Send a task in natural language, and the internal AI captures screenshots, analyzes them, and performs mouse and keyboard actions autonomously.

Architecture

MCP Client (Claude Code, Claude Desktop, Cursor, etc.)
    |  stdio
    v
server.py (FastMCP async orchestration loop)
    |
    v
agent.py (ScreenAgent toolkit: capture, execute, parse, safety)
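
The diagram suggests a loop in which the server repeatedly captures the screen, asks the model for the next action, and executes it through the agent toolkit. The following is a minimal sketch of such a loop, not AutoGUI's actual code. The names run_task, capture, decide, and execute, and the shape of the action dictionary, are hypothetical; the defaults mirror the max_iterations and delay_between_actions values from the advanced config.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signatures for illustration; none of these names come from AutoGUI.
Capture = Callable[[], bytes]
Decide = Callable[[str, bytes, list[dict]], Awaitable[dict]]
Execute = Callable[[dict], None]


async def run_task(task: str, capture: Capture, decide: Decide, execute: Execute,
                   max_iterations: int = 20, delay: float = 1.0) -> list[dict]:
    """Capture, ask the model, act; repeat until task_complete or the limit."""
    history: list[dict] = []
    for _ in range(max_iterations):
        screenshot = capture()
        action = await decide(task, screenshot, history)
        history.append(action)
        if action.get("action") == "task_complete":
            break
        execute(action)
        await asyncio.sleep(delay)
    return history


# Stub model: clicks twice, then reports completion.
async def fake_decide(task: str, screenshot: bytes, history: list[dict]) -> dict:
    if len(history) < 2:
        return {"action": "click", "x": 10, "y": 20}
    return {"action": "task_complete"}


executed: list[dict] = []
history = asyncio.run(
    run_task("demo", lambda: b"", fake_decide, executed.append, delay=0)
)
```

With the stub model, history holds three entries (two clicks and the final task_complete) and only the two clicks reach execute. Passing the capture, decide, and execute steps as callables keeps the loop testable without a real screen or API.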

Quick Start

git clone https://github.com/stellariums/AutoGUI.git
cd AutoGUI
pip install -r requirements.txt

Configuration

AutoGUI supports layered configuration: environment variables (highest priority) > config.json > defaults.
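
The priority order can be sketched as a simple merge in which each layer overrides the one below it. This is an illustration under stated assumptions, not AutoGUI's own loader: the function name resolve_api_config and its parameters are hypothetical, while the environment variable names and the api section keys are the ones used in this README.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

# Environment variable names as listed in this README, mapped to "api" keys.
ENV_TO_KEY = {
    "AUTOGUI_API_KEY": "api_key",
    "AUTOGUI_BASE_URL": "base_url",
    "AUTOGUI_MODEL": "model",
}


def resolve_api_config(config_path: Optional[Path] = None,
                       env: Mapping[str, str] = os.environ,
                       defaults: Optional[dict] = None) -> dict:
    """Merge defaults < config.json "api" section < environment variables."""
    merged = dict(defaults or {})
    if config_path is not None and config_path.is_file():
        file_config = json.loads(config_path.read_text(encoding="utf-8"))
        merged.update(file_config.get("api", {}))
    for env_name, key in ENV_TO_KEY.items():
        value = env.get(env_name)
        if value:  # unset or empty variables do not override lower layers
            merged[key] = value
    return merged


# Usage: the file sets model and api_key, the environment overrides model only.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(
        json.dumps({"api": {"model": "file-model", "api_key": "file-key"}}),
        encoding="utf-8",
    )
    config = resolve_api_config(
        path,
        env={"AUTOGUI_MODEL": "env-model"},
        defaults={"base_url": "https://example.invalid/v1"},  # placeholder default
    )
```

Here config ends up with base_url from the defaults, api_key from the file, and model from the environment.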

Option A: Environment Variables Only (Simplest)

Only three variables are needed to get started (Windows cmd syntax shown):

set AUTOGUI_API_KEY=your-api-key
set AUTOGUI_BASE_URL=https://api.openai.com/v1
set AUTOGUI_MODEL=gpt-4o

Or copy .env.example to .env and pass the values via your MCP client config (see MCP Client Setup below).

Option B: Config File

cp config.json.example config.json

Edit config.json. Only the api section is required; everything else has sensible defaults:

{
  "api": {
    "base_url": "https://your-api-endpoint/v1",
    "api_key": "your-api-key",
    "model": "your-model-name"
  }
}

Advanced config options:

{
  "screen": {
    "max_width": 1280,
    "max_height": 720,
    "allowed_region": null
  },
  "agent": {
    "max_iterations": 20,
    "delay_between_actions": 1.0,
    "max_history_rounds": 10
  },
  "safety": {
    "enable_confirmation": true,
    "fallback_action": "block",
    "dangerous_keys": ["delete", "backspace", "escape"],
    "dangerous_hotkeys": [["ctrl", "w"], ["alt", "f4"]],
    "dangerous_patterns": ["rm ", "del ", "format ", "shutdown"]
  }
}
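
One plausible reading of screen.max_width and screen.max_height is that screenshots are downscaled to fit those bounds before being sent to the model, so coordinates the model returns must be mapped back to real screen pixels. The README does not confirm this, so the sketch below is an assumption, and both function names are hypothetical.

```python
def fit_within(width: int, height: int,
               max_width: int = 1280, max_height: int = 720) -> tuple[int, int, float]:
    """Return (scaled_width, scaled_height, scale), keeping aspect ratio, never upscaling."""
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale), scale


def to_screen(x: float, y: float, scale: float) -> tuple[int, int]:
    """Map a point on the scaled image back to real screen pixels."""
    return round(x / scale), round(y / scale)


# A 2560x1440 display is halved to fit 1280x720.
scaled_w, scaled_h, scale = fit_within(2560, 1440)
# The center of the scaled image maps back to the center of the real screen.
center = to_screen(640, 360, scale)
```

Capping the scale at 1.0 means a screen that already fits within the bounds is sent unchanged.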

MCP Client Setup

Claude Code

claude mcp add AutoGUI -- python /path/to/AutoGUI/server.py

Or add to .mcp.json:

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["C:/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

Cursor

Add to Cursor MCP settings (.cursor/mcp.json):

{
  "mcpServers": {
    "AutoGUI": {
      "command": "python",
      "args": ["/path/to/AutoGUI/server.py"],
      "env": {
        "AUTOGUI_API_KEY": "your-api-key",
        "AUTOGUI_BASE_URL": "https://api.openai.com/v1",
        "AUTOGUI_MODEL": "gpt-4o"
      }
    }
  }
}

MCP Inspector (Testing)

npx @modelcontextprotocol/inspector python server.py

Tool

Tool                    Description
autogui_execute_task    Execute a screen automation task via natural language

Supported Actions

Action          Description
click           Click at position
double_click    Double click
right_click     Right click
type            Input text (supports CJK via clipboard)
press           Key combination
scroll          Scroll
drag            Drag
move            Move cursor
wait            Wait
task_complete   Mark task as done
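
Before an action is executed, the model's reply has to be parsed and checked against this list. The sketch below assumes the model answers with a JSON object that carries an "action" field, possibly surrounded by free text. That reply format, the field names, and the function name parse_action are hypothetical; only the action names come from the table above.

```python
import json

# Action names taken from the Supported Actions table.
SUPPORTED_ACTIONS = {
    "click", "double_click", "right_click", "type", "press",
    "scroll", "drag", "move", "wait", "task_complete",
}


def parse_action(reply: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' and validate it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    action = json.loads(reply[start:end + 1])  # JSONDecodeError is a ValueError
    name = action.get("action")
    if name not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    return action


# Usage: free text around the JSON object is ignored.
parsed = parse_action('Clicking the OK button.\n{"action": "click", "x": 100, "y": 200}')
```

Rejecting unknown action names up front keeps a malformed or hallucinated reply from reaching the mouse and keyboard layer.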

Safety

  • Dangerous action detection (rule-based + AI self-labeling)
  • Configurable dangerous keys, hotkeys, and text patterns
  • Optional region restriction (allowed_region)
  • Elicit-based confirmation for dangerous operations
  • Configurable fallback: block or allow
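
The rule-based part of these checks can be sketched as follows, using the default lists from the advanced config. This is an illustration, not AutoGUI's implementation: the action fields "keys" and "text", the function names, and the assumption that fallback_action applies when the client cannot show a confirmation prompt are all hypothetical.

```python
from typing import Callable, Optional

# Defaults copied from the "safety" section of the advanced config.
DANGEROUS_KEYS = {"delete", "backspace", "escape"}
DANGEROUS_HOTKEYS = [{"ctrl", "w"}, {"alt", "f4"}]
DANGEROUS_PATTERNS = ["rm ", "del ", "format ", "shutdown"]


def is_dangerous(action: dict) -> bool:
    """Rule-based check on a parsed action (field names are assumed)."""
    kind = action.get("action")
    if kind == "press":
        keys = [key.lower() for key in action.get("keys", [])]
        if len(keys) == 1:
            return keys[0] in DANGEROUS_KEYS
        return set(keys) in DANGEROUS_HOTKEYS  # order-insensitive match
    if kind == "type":
        text = action.get("text", "").lower()
        return any(pattern in text for pattern in DANGEROUS_PATTERNS)
    return False


def may_run(action: dict,
            confirm: Optional[Callable[[dict], bool]] = None,
            fallback_action: str = "block") -> bool:
    """Return True if the action may be executed."""
    if not is_dangerous(action):
        return True
    if confirm is None:  # assumed: no confirmation channel, so use the fallback
        return fallback_action == "allow"
    return bool(confirm(action))
```

For example, may_run({"action": "press", "keys": ["alt", "f4"]}) returns False under the default "block" fallback, while the same call with confirm=lambda action: True lets the action through.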

FAQ

Q: Screenshot is black or empty
A: Make sure the screen is not locked. On Windows, pyautogui/mss cannot capture the lock screen.

Q: Chinese input not working
A: AutoGUI uses clipboard (pyperclip + Ctrl+V) for text input, which supports CJK characters. Make sure pyperclip is installed.

Q: "API key required" error
A: Set AUTOGUI_API_KEY env var or add api.api_key in config.json.

Q: "Another task is already running" error
A: AutoGUI processes one task at a time. Wait for the current task to finish.
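
The one-task-at-a-time behavior can be reproduced with an asyncio.Lock that rejects a second request instead of queuing it. This is a sketch of the general technique, not AutoGUI's code; the class name SingleTaskRunner and its run method are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


class SingleTaskRunner:
    """Run at most one task at a time; reject overlapping requests."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def run(self, work: Callable[[], Awaitable[str]]) -> str:
        if self._lock.locked():
            raise RuntimeError("Another task is already running")
        # No suspension happens between the check and an uncontended acquire,
        # so two coroutines on the same event loop cannot both pass the check.
        async with self._lock:
            return await work()


async def demo() -> tuple[str, str]:
    runner = SingleTaskRunner()

    async def slow() -> str:
        await asyncio.sleep(0.05)
        return "done"

    first = asyncio.create_task(runner.run(slow))
    await asyncio.sleep(0)  # let the first task start and take the lock
    try:
        second = await runner.run(slow)
    except RuntimeError as exc:
        second = str(exc)
    return await first, second


results = asyncio.run(demo())
```

In this run the first call completes normally and the overlapping call is rejected with the same message as in the FAQ.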

Requirements

  • Windows 10/11
  • Python >= 3.10
  • An OpenAI-compatible vision API (GPT-4o, Qwen-VL, etc.)

License

MIT
