Cross-platform MCP server for computer automation and control

Computer MCP Server

A cross-platform Model Context Protocol (MCP) server for computer automation and control. It provides tools for mouse and keyboard automation, screenshot capture (a screenshot is included in every response by default), and optional state tracking, including an accessibility tree.

Features

  • Mouse Control: Click, double-click, triple-click, button down/up, drag operations
  • Keyboard Control: Type text, key down/up/press
  • Screenshot Capture: Fast cross-platform screenshot using mss, returns base64-encoded PNG (included by default)
  • State Tracking: Configurable tracking of mouse position/buttons, keyboard keys, focused app, and accessibility tree
  • Accessibility Tree: Full platform-specific implementation for Windows, macOS, and Linux/Ubuntu
  • Zero Config: Screenshots are included by default, so there is no need to call the screenshot tool separately

Installation

# Install core dependencies
pip install -e .

# Platform-specific optional dependencies (for enhanced features)
pip install -e ".[windows]"   # Windows: pywin32 for accessibility tree
pip install -e ".[macos]"      # macOS: pyobjc for native accessibility (AppleScript fallback available)
pip install -e ".[linux]"      # Linux: PyGObject for AT-SPI (requires: sudo apt install python3-gi gir1.2-atspi-2.0)

Usage

As MCP Server

Configure this server in your MCP client (e.g., Cursor, Claude Desktop):

{
  "mcpServers": {
    "computer-mcp": {
      "command": "python",
      "args": ["path/to/computer-mcp/main.py"]
    }
  }
}
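The bare python command in the entry above resolves against whatever interpreter is first on PATH, which may not be the environment where the dependencies were installed. A minimal sketch that prints an equivalent entry with absolute paths; build_mcp_config is a hypothetical helper and ~/src/computer-mcp is a placeholder checkout location, neither comes from the project:

```python
import json
import sys
from pathlib import Path


def build_mcp_config(repo_dir: str, python_exe: str = sys.executable) -> dict:
    """Return an mcpServers entry that launches main.py with an explicit interpreter."""
    main_py = Path(repo_dir).expanduser() / "main.py"
    return {
        "mcpServers": {
            "computer-mcp": {
                "command": python_exe,
                "args": [str(main_py)],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path: point this at your own checkout of the repository.
    print(json.dumps(build_mcp_config("~/src/computer-mcp"), indent=2))
```

Run it with the interpreter of the virtual environment you installed into, then paste the output into your MCP client's configuration file.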

Available Tools

Mouse Tools

  • click(button='left'|'middle'|'right') - Click at current cursor position
  • double_click(button='left'|'middle'|'right') - Double-click at current cursor position
  • triple_click(button='left'|'middle'|'right') - Triple-click at current cursor position
  • button_down(button='left'|'middle'|'right') - Press and hold a mouse button
  • button_up(button='left'|'middle'|'right') - Release a mouse button
  • drag(start={x, y}, end={x, y}, button='left') - Drag from start to end position
  • mouse_move(x, y) - Move cursor to specified coordinates
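Because the click tools take no coordinates, clicking a specific point is a two-step sequence: mouse_move, then the click. A minimal sketch that builds the (tool name, arguments) pairs for that sequence; click_at is a hypothetical helper, not part of the server, and sending the calls is left to whatever MCP client you use:

```python
from typing import Any

ToolCall = tuple[str, dict[str, Any]]

VALID_BUTTONS = ("left", "middle", "right")
CLICK_TOOLS = {1: "click", 2: "double_click", 3: "triple_click"}


def click_at(x: int, y: int, button: str = "left", clicks: int = 1) -> list[ToolCall]:
    """Build the tool calls needed to click at (x, y)."""
    if button not in VALID_BUTTONS:
        raise ValueError(f"button must be one of {VALID_BUTTONS}, got {button!r}")
    if clicks not in CLICK_TOOLS:
        raise ValueError("clicks must be 1, 2, or 3")
    return [
        ("mouse_move", {"x": x, "y": y}),      # position the cursor first
        (CLICK_TOOLS[clicks], {"button": button}),  # then click where it now is
    ]
```

Returning plain data rather than sending the calls keeps the helper independent of any particular client library.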

Keyboard Tools

  • type(text) - Type text string
  • key_down(key) - Press and hold a key
  • key_up(key) - Release a key
  • key_press(key) - Press and release a key (convenience)

Screenshot

  • screenshot() - Capture a screenshot explicitly. This is rarely needed, because every tool response already includes one while observe_screen is true

Configuration

  • set_config(...) - Configure observation options:
    • observe_screen (bool, default: true): Include screenshots in all responses
    • observe_mouse_position (bool, default: false): Track and include mouse position
    • observe_mouse_button_states (bool, default: false): Track and include mouse button states
    • observe_keyboard_key_states (bool, default: false): Track and include keyboard key states
    • observe_focused_app (bool, default: false): Include focused application information
    • observe_accessibility_tree (bool, default: false): Include accessibility tree

Example Tool Calls

# Click at current cursor position (screenshot included automatically)
click(button="left")

# Drag operation
drag(start={"x": 100, "y": 200}, end={"x": 300, "y": 400}, button="left")

# Type text
type(text="Hello World")

# Move mouse then click
mouse_move(x=500, y=500)
click(button="right")

# Enable full observation
set_config(
    observe_screen=True,              # Default true
    observe_mouse_position=True,
    observe_mouse_button_states=True,
    observe_keyboard_key_states=True,
    observe_focused_app=True,
    observe_accessibility_tree=True
)

# Now all tool responses include comprehensive state
click(button="left")  # Includes: screenshot, mouse position, button states, keyboard states, focused app, accessibility tree

Key Names

Special keys can be specified as strings:

  • "ctrl", "alt", "shift", "cmd" (or "win" on Windows)
  • "space", "enter", "tab", "esc", "backspace"
  • Arrow keys: "up", "down", "left", "right"
  • Function keys: "f1" through "f12"
  • Regular characters: "a", "b", etc.
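No single tool is listed for key combinations, so one way to send a shortcut is to hold each modifier with key_down, press the final key with key_press, and release the modifiers in reverse order. A minimal sketch of that ordering; key_chord is a hypothetical helper that only builds the call list and does not talk to the server:

```python
def key_chord(*keys: str) -> list[tuple[str, dict[str, str]]]:
    """Build tool calls for a shortcut such as key_chord("ctrl", "shift", "t")."""
    if not keys:
        raise ValueError("at least one key is required")
    *modifiers, final = keys
    # Hold the modifiers in the order given.
    calls = [("key_down", {"key": k}) for k in modifiers]
    # Press and release the final key while the modifiers are held.
    calls.append(("key_press", {"key": final}))
    # Release the modifiers in reverse order so none is left stuck down.
    calls.extend(("key_up", {"key": k}) for k in reversed(modifiers))
    return calls
```

If a sequence is interrupted partway, sending key_up for each held modifier is a reasonable way to recover.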

Platform Support

Windows

  • Full Support: All mouse/keyboard operations work
  • Focused App: Requires pywin32 (install with pip install -e ".[windows]")
  • Accessibility Tree: Uses Windows UI Automation API (requires pywin32)

macOS

  • Full Support: All mouse/keyboard operations work
  • Focused App: Uses AppleScript (no dependencies)
  • Accessibility Tree:
    • Native: Uses AXUIElement via pyobjc (install with pip install -e ".[macos]")
    • Fallback: Uses AppleScript (works without dependencies, limited tree depth)

Linux/Ubuntu

  • Full Support: All mouse/keyboard operations work
  • Focused App: Uses xdotool (install: sudo apt install xdotool)
  • Accessibility Tree:
    • Native: Uses AT-SPI via PyGObject (install: sudo apt install python3-gi gir1.2-atspi-2.0, then pip install -e ".[linux]")
    • Fallback: Basic window info via xdotool
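For a sense of what basic window info from xdotool looks like, here is a minimal sketch that asks xdotool for the active window's title and process ID. It illustrates the general approach only; the server's actual implementation is not shown in this document, and focused_window and its run parameter are hypothetical names:

```python
import shutil
import subprocess
from typing import Callable, Optional


def _run(args: list[str]) -> str:
    """Run a command and return its stripped standard output."""
    result = subprocess.run(args, capture_output=True, text=True, check=True, timeout=2)
    return result.stdout.strip()


def focused_window(run: Callable[[list[str]], str] = _run) -> Optional[dict]:
    """Return the active window's title and pid, or None if xdotool is unavailable."""
    if run is _run and shutil.which("xdotool") is None:
        return None
    try:
        title = run(["xdotool", "getactivewindow", "getwindowname"])
        pid = run(["xdotool", "getactivewindow", "getwindowpid"])
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    # Some windows do not report a pid; keep None rather than guessing.
    return {"title": title, "pid": int(pid) if pid.isdigit() else None}
```

The injectable run argument exists only so the function can be exercised without a running X session.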

Configuration Schema

The set_config tool accepts the following options:

{
  "observe_screen": true,                // Include screenshots (default: true)
  "observe_mouse_position": false,       // Track mouse position
  "observe_mouse_button_states": false,  // Track mouse button states
  "observe_keyboard_key_states": false,  // Track keyboard key states
  "observe_focused_app": false,          // Include focused app info
  "observe_accessibility_tree": false    // Include accessibility tree
}
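On the client side it can help to check option names and types before calling set_config. A minimal sketch mirroring the documented options and defaults; ObservationConfig and parse_config are hypothetical names, and how the server itself treats unknown keys or partial updates is not stated here:

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class ObservationConfig:
    """Client-side mirror of the documented set_config options and defaults."""
    observe_screen: bool = True
    observe_mouse_position: bool = False
    observe_mouse_button_states: bool = False
    observe_keyboard_key_states: bool = False
    observe_focused_app: bool = False
    observe_accessibility_tree: bool = False

    def to_arguments(self) -> dict[str, bool]:
        """Return the options as a plain dict suitable for a set_config call."""
        return asdict(self)


def parse_config(options: dict) -> ObservationConfig:
    """Validate a dict of options, filling in defaults for anything omitted."""
    known = {f.name for f in fields(ObservationConfig)}
    unknown = set(options) - known
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    for name, value in options.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool, got {type(value).__name__}")
    return ObservationConfig(**options)
```

For example, parse_config({"observe_focused_app": True}).to_arguments() yields all six options with only observe_focused_app changed from its default.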

Response Format

By default (with observe_screen: true), all tool responses include a screenshot:

{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "data": "iVBORw0KGgoAAAANSUhEUgAA...",
    "width": 1920,
    "height": 1080
  }
}
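To use the screenshot, a client has to decode the base64 payload back into PNG bytes. A minimal sketch using only the standard library, assuming the response has already been parsed into a dict shaped like the example above; decode_screenshot and png_size are hypothetical helper names:

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> bytes:
    """Return the raw PNG bytes from a tool response."""
    shot = response.get("screenshot")
    if not shot or shot.get("format") != "base64_png":
        raise ValueError("response has no base64_png screenshot")
    data = base64.b64decode(shot["data"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return data


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk of a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

The returned bytes can be written straight to a .png file. Comparing png_size against the width and height fields in the response is a cheap sanity check.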

With full observation enabled:

{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png", "data": "...", "width": 1920, "height": 1080
  },
  "mouse_position": {"x": 500, "y": 300},
  "mouse_button_states": ["Button.left"],
  "keyboard_key_states": ["ctrl"],
  "focused_app": {
    "name": "Code",
    "pid": 12345,
    "title": "main.py - computer-mcp"
  },
  "accessibility_tree": {
    "tree": {
      "name": "Application",
      "control_type": "...",
      "bounds": {"x": 0, "y": 0, "width": 1920, "height": 1080},
      "children": [...]
    }
  }
}
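A common use of the accessibility tree is to look up an element by name and work out where to click. A minimal sketch, assuming every child node has the same shape as the root shown above (name, bounds, children) and that bounds are in the same screen coordinates mouse_move expects; neither assumption is confirmed by this document, and the helper names are hypothetical:

```python
from typing import Iterator


def iter_nodes(node: dict, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, node) for a node and all of its descendants, depth first."""
    yield depth, node
    for child in node.get("children") or []:
        yield from iter_nodes(child, depth + 1)


def find_by_name(response: dict, name: str) -> list[dict]:
    """Return every node in the response's accessibility tree with the given name."""
    root = response.get("accessibility_tree", {}).get("tree")
    if root is None:
        return []
    return [node for _, node in iter_nodes(root) if node.get("name") == name]


def center(bounds: dict) -> tuple[int, int]:
    """Return the integer center point of a bounds rectangle."""
    return (
        bounds["x"] + bounds["width"] // 2,
        bounds["y"] + bounds["height"] // 2,
    )
```

The point returned by center can then be passed to mouse_move before a click.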

Architecture

  • Uses pynput for cross-platform mouse/keyboard control and state tracking
  • Uses mss for fast screenshot capture
  • Uses mcp Python SDK for MCP server implementation
  • State listeners start/stop dynamically based on configuration to minimize overhead
  • Screenshots are captured on demand and attached to every response while observe_screen is enabled
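The dynamic start/stop behaviour can be pictured as a small manager that reconciles running listeners against the current configuration. This is an illustrative sketch, not the project's code: ListenerManager and the factory mapping are hypothetical, and the only assumption about listeners is that they expose start() and stop(), as pynput's mouse and keyboard listeners do. A fresh listener is built on every enable because a thread-based listener cannot be restarted after it stops.

```python
from typing import Callable, Protocol


class Listener(Protocol):
    def start(self) -> None: ...
    def stop(self) -> None: ...


class ListenerManager:
    """Start and stop listeners so that only enabled options incur overhead."""

    def __init__(self, factories: dict[str, Callable[[], Listener]]):
        self._factories = factories          # option name -> listener constructor
        self._active: dict[str, Listener] = {}

    def apply(self, config: dict[str, bool]) -> None:
        """Bring the set of running listeners in line with the given config."""
        for option, factory in self._factories.items():
            wanted = bool(config.get(option, False))
            if wanted and option not in self._active:
                listener = factory()         # new instance each time it is enabled
                listener.start()
                self._active[option] = listener
            elif not wanted and option in self._active:
                self._active.pop(option).stop()

    @property
    def active(self) -> set[str]:
        """Names of the options whose listeners are currently running."""
        return set(self._active)
```

Calling apply twice with the same configuration is a no-op, which keeps repeated set_config calls cheap.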

Accessibility Tree Details

Windows

  • Uses Windows UI Automation API via win32com
  • Provides full control tree with names, types, bounds, and children
  • Focuses on the currently focused window
  • Limited to 50 children per element and a maximum depth of 5 levels to keep responses small
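The effect of the two limits can be sketched as a pruning pass over a tree of dicts. This is a sketch under stated assumptions rather than the server's implementation: truncate_tree is a hypothetical name, and it assumes the root counts as level 1 and that the first 50 children are the ones kept.

```python
MAX_CHILDREN = 50
MAX_DEPTH = 5


def truncate_tree(
    node: dict,
    depth: int = 0,
    max_children: int = MAX_CHILDREN,
    max_depth: int = MAX_DEPTH,
) -> dict:
    """Return a copy of node with at most max_children per element and max_depth levels."""
    pruned = {key: value for key, value in node.items() if key != "children"}
    children = node.get("children") or []
    if depth >= max_depth - 1:
        # This node sits on the deepest allowed level, so drop its children.
        pruned["children"] = []
    else:
        pruned["children"] = [
            truncate_tree(child, depth + 1, max_children, max_depth)
            for child in children[:max_children]
        ]
    return pruned
```

With these limits the worst-case node count is bounded regardless of how large the application's real UI tree is.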

macOS

  • Native: Uses AXUIElement API via pyobjc for full accessibility tree
  • Fallback: Uses AppleScript with System Events for basic UI element enumeration
  • AppleScript fallback works without dependencies but has limited depth

Linux/Ubuntu

  • Uses AT-SPI (Assistive Technology Service Provider Interface) via PyGObject
  • Provides desktop-wide accessibility tree
  • Requires system packages: python3-gi and gir1.2-atspi-2.0

Notes

  • Screenshots are included by default in all tool responses (when observe_screen: true)
  • Click and button tools act at the current cursor position; call mouse_move first to target a specific location. drag is the exception: it takes explicit start and end coordinates
  • State tracking listeners are automatically started/stopped based on configuration
  • Accessibility tree implementations may vary in depth and detail across platforms
  • Some platform-specific features require optional dependencies or system packages

License

MIT
