Cross-platform MCP server for computer automation and control
Computer MCP Server
A cross-platform Model Context Protocol (MCP) server for computer automation and control. Provides tools for mouse/keyboard automation, screenshot capture (included by default in all responses), and comprehensive state tracking including accessibility tree support.
Features
- Mouse Control: Click, double-click, triple-click, button down/up, drag operations
- Keyboard Control: Type text, key down/up/press
- Screenshot Capture: Fast cross-platform screenshot using mss, returns images as MCP ImageContent (included by default)
- State Tracking: Configurable tracking of mouse position/buttons, keyboard keys, focused app, and accessibility tree
- Accessibility Tree: Full platform-specific implementation for Windows, macOS, and Linux/Ubuntu
- Zero Config: Screenshots are included by default, so there is no need to call the screenshot tool separately
Installation
# Install core dependencies
pip install -e .
# Platform-specific optional dependencies (for enhanced features)
pip install -e ".[windows]" # Windows: pywin32 for accessibility tree
pip install -e ".[macos]" # macOS: pyobjc for native accessibility (AppleScript fallback available)
pip install -e ".[linux]" # Linux: PyGObject for AT-SPI (requires: sudo apt install python3-gi gir1.2-atspi-2.0)
Usage
As MCP Server
Configure this server in your MCP client (e.g., Cursor, Claude Desktop):
{
  "mcpServers": {
    "computer-mcp": {
      "command": "uvx",
      "args": ["computer-mcp"]
    }
  }
}
Note: uvx automatically installs and runs the package if not already installed. Make sure you have uv installed.
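If your MCP client already has other servers configured, the entry above needs to be merged into the existing "mcpServers" object rather than replacing it. The following is a minimal sketch of that merge, not part of this package: the helper name add_mcp_server and the "other-server" entry are hypothetical, and the location of the client's config file is left out because it differs per client.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one server entry added under "mcpServers"."""
    merged = dict(config)
    # Copy the nested mapping so the caller's original config is not mutated.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing client configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_mcp_server(existing, "computer-mcp", "uvx", ["computer-mcp"])
print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary, so the original configuration can be kept as a backup before writing the result back to disk.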
Available Tools
Mouse Tools
- click(button='left'|'middle'|'right') - Click at current cursor position
- double_click(button='left'|'middle'|'right') - Double-click at current cursor position
- triple_click(button='left'|'middle'|'right') - Triple-click at current cursor position
- button_down(button='left'|'middle'|'right') - Press and hold a mouse button
- button_up(button='left'|'middle'|'right') - Release a mouse button
- drag(start={x, y}, end={x, y}, button='left') - Drag from start to end position
- mouse_move(x, y) - Move cursor to specified coordinates
Keyboard Tools
- type(text) - Type text string
- key_down(key) - Press and hold a key
- key_up(key) - Release a key
- key_press(key) - Press and release a key (convenience)
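Because key_down holds a key until key_up releases it, a shortcut such as Ctrl+Shift+S can be expressed as a sequence of these tools. The sketch below only builds that sequence as data; the helper name chord_calls is hypothetical, and it assumes the server keeps a key held across separate tool calls, which the descriptions above suggest but do not state outright.

```python
def chord_calls(modifiers: list, key: str) -> list:
    """Build the (tool name, arguments) sequence for a key combination."""
    # Hold every modifier first, in the order given.
    calls = [("key_down", {"key": m}) for m in modifiers]
    # Tap the main key while the modifiers are held.
    calls.append(("key_press", {"key": key}))
    # Release modifiers in reverse order of pressing.
    calls.extend(("key_up", {"key": m}) for m in reversed(modifiers))
    return calls


for tool, arguments in chord_calls(["ctrl", "shift"], "s"):
    print(tool, arguments)
```

Each tuple would then be sent through whatever MCP client you use; the helper itself performs no input.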
Screenshot
- screenshot() - Explicitly capture a screenshot (rarely needed, since screenshots are included by default in all responses)
Configuration
- set_config(...) - Configure observation options:
  - observe_screen (bool, default: true): Include screenshots in all responses
  - observe_mouse_position (bool, default: false): Track and include mouse position
  - observe_mouse_button_states (bool, default: false): Track and include mouse button states
  - observe_keyboard_key_states (bool, default: false): Track and include keyboard key states
  - observe_focused_app (bool, default: false): Include focused application information
  - observe_accessibility_tree (bool, default: false): Include accessibility tree
Example Tool Calls
# Click at current cursor position (screenshot included automatically)
click(button="left")
# Drag operation
drag(start={"x": 100, "y": 200}, end={"x": 300, "y": 400}, button="left")
# Type text
type(text="Hello World")
# Move mouse then click
mouse_move(x=500, y=500)
click(button="right")
# Enable full observation
set_config(
    observe_screen=True,  # Default true
    observe_mouse_position=True,
    observe_mouse_button_states=True,
    observe_keyboard_key_states=True,
    observe_focused_app=True,
    observe_accessibility_tree=True
)
# Now all tool responses include comprehensive state
click(button="left") # Includes: screenshot, mouse position, button states, keyboard states, focused app, accessibility tree
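The drag tool covers the common case, but the same gesture can also be spelled out with mouse_move, button_down, and button_up when intermediate positions matter. The sketch below is an illustration only and is not how the server implements drag: the helper name drag_as_primitives, the steps parameter, and the linear interpolation are all assumptions.

```python
def drag_as_primitives(start: dict, end: dict, button: str = "left", steps: int = 4) -> list:
    """Expand a drag into (tool name, arguments) tuples using the primitive mouse tools."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    calls = [
        ("mouse_move", {"x": start["x"], "y": start["y"]}),
        ("button_down", {"button": button}),
    ]
    # Move in equal increments; the last increment lands exactly on `end`.
    for i in range(1, steps + 1):
        x = round(start["x"] + (end["x"] - start["x"]) * i / steps)
        y = round(start["y"] + (end["y"] - start["y"]) * i / steps)
        calls.append(("mouse_move", {"x": x, "y": y}))
    calls.append(("button_up", {"button": button}))
    return calls


for tool, arguments in drag_as_primitives({"x": 100, "y": 200}, {"x": 300, "y": 400}):
    print(tool, arguments)
```

With screenshots enabled, every one of these calls returns an image, so a long expanded drag costs far more response data than a single drag call.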
Key Names
Special keys can be specified as strings:
- "ctrl", "alt", "shift", "cmd" (or "win" on Windows)
- "space", "enter", "tab", "esc", "backspace"
- Arrow keys: "up", "down", "left", "right"
- Function keys: "f1" through "f12"
- Regular characters: "a", "b", etc.
Platform Support
Windows
- Full Support: All mouse/keyboard operations work
- Focused App: Requires pywin32 (install with pip install -e ".[windows]")
- Accessibility Tree: Uses Windows UI Automation API (requires pywin32)
macOS
- Full Support: All mouse/keyboard operations work
- Focused App: Uses AppleScript (no dependencies)
- Accessibility Tree:
  - Native: Uses AXUIElement via pyobjc (install with pip install -e ".[macos]")
  - Fallback: Uses AppleScript (works without dependencies, limited tree depth)
Linux/Ubuntu
- Full Support: All mouse/keyboard operations work
- Focused App: Uses xdotool (install: sudo apt install xdotool)
- Accessibility Tree:
  - Native: Uses AT-SPI via PyGObject (install: sudo apt install python3-gi gir1.2-atspi-2.0, then pip install -e ".[linux]")
  - Fallback: Basic window info via xdotool
Configuration Schema
The set_config tool accepts the following options:
{
  "observe_screen": true,               // Include screenshots (default: true)
  "observe_mouse_position": false,      // Track mouse position
  "observe_mouse_button_states": false, // Track mouse button states
  "observe_keyboard_key_states": false, // Track keyboard key states
  "observe_focused_app": false,         // Include focused app info
  "observe_accessibility_tree": false   // Include accessibility tree
}
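Since every option is a boolean with a documented default, a client can check a set of overrides before sending them. The sketch below is a client-side helper under stated assumptions, not the server's own validation: the name resolve_config is hypothetical, and this README does not say whether the server rejects unknown keys or non-boolean values.

```python
# Defaults as listed in the schema above.
DEFAULTS = {
    "observe_screen": True,
    "observe_mouse_position": False,
    "observe_mouse_button_states": False,
    "observe_keyboard_key_states": False,
    "observe_focused_app": False,
    "observe_accessibility_tree": False,
}


def resolve_config(overrides: dict) -> dict:
    """Validate `overrides` and return the full option set with defaults filled in."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    for name, value in overrides.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool, got {type(value).__name__}")
    return {**DEFAULTS, **overrides}


print(resolve_config({"observe_focused_app": True}))
```

Catching a misspelled option name locally avoids a round trip that would otherwise return a screenshot along with the error.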
Response Format
By default (with observe_screen: true), all tool responses include a screenshot as MCP ImageContent, which displays as an actual image in MCP clients:
Response Structure:
- ImageContent (type: "image"): Contains the screenshot as base64-encoded PNG with mimeType "image/png"
- TextContent (type: "text"): Contains JSON with action results and screenshot metadata:
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  }
}
With full observation enabled, the TextContent includes additional state:
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  },
  "mouse_position": {"x": 500, "y": 300},
  "mouse_button_states": ["Button.left"],
  "keyboard_key_states": ["ctrl"],
  "focused_app": {
    "name": "Code",
    "pid": 12345,
    "title": "main.py - computer-mcp"
  },
  "accessibility_tree": {
    "tree": {
      "name": "Application",
      "control_type": "...",
      "bounds": {"x": 0, "y": 0, "width": 1920, "height": 1080},
      "children": [...]
    }
  }
}
Note: Screenshots are returned as ImageContent objects that display as actual images in MCP clients. The base64 image data is only included in the ImageContent, not in the JSON metadata.
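A client that wants the JSON metadata and the raw PNG bytes separately can split a response by content type. The sketch below assumes the response has already been converted to plain dictionaries in the MCP wire shape (type, text, data, mimeType); the helper name split_response and the sample payload are hypothetical, and the sample image data is only a PNG signature, not a real screenshot.

```python
import base64
import json
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def split_response(content: list) -> Tuple[dict, Optional[bytes]]:
    """Return (metadata, png_bytes) from a list of content items."""
    metadata: dict = {}
    image: Optional[bytes] = None
    for item in content:
        if item.get("type") == "text":
            # The text item carries the action result as a JSON string.
            metadata = json.loads(item["text"])
        elif item.get("type") == "image":
            # The image item carries the screenshot as base64 text.
            image = base64.b64decode(item["data"])
    return metadata, image


# Hypothetical response, shaped like the structure described above.
sample = [
    {"type": "image", "mimeType": "image/png",
     "data": base64.b64encode(PNG_SIGNATURE).decode("ascii")},
    {"type": "text", "text": json.dumps({"success": True, "action": "click", "button": "left"})},
]
metadata, image = split_response(sample)
print(metadata["action"], len(image or b""))
```

When observe_screen is false there is no image item, so the second element of the result is None.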
Architecture
- Uses pynput for cross-platform mouse/keyboard control and state tracking
- Uses mss for fast screenshot capture
- Uses mcp Python SDK for MCP server implementation
- State listeners start/stop dynamically based on configuration to minimize overhead
- Screenshots captured on-demand but included automatically in all responses (when enabled)
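The idea of starting and stopping listeners as configuration changes can be sketched as follows. This is not the package's code: ListenerManager, FakeListener, and the mapping from options to listeners are hypothetical, and a stand-in class replaces the pynput listeners so the example runs without input devices. A fresh listener is created on each start because pynput listeners are threads, and a thread cannot be restarted once stopped.

```python
class FakeListener:
    """Stand-in for a real input listener; it only records whether it is running."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


class ListenerManager:
    # Which options need which listener (an assumption made for illustration).
    FLAGS = {
        "mouse": ("observe_mouse_position", "observe_mouse_button_states"),
        "keyboard": ("observe_keyboard_key_states",),
    }

    def __init__(self, factory=FakeListener) -> None:
        self._factory = factory
        self.active = {}

    def apply(self, config: dict) -> None:
        """Start listeners that are newly needed and stop those no longer needed."""
        for name, flags in self.FLAGS.items():
            needed = any(config.get(flag, False) for flag in flags)
            if needed and name not in self.active:
                listener = self._factory()  # fresh instance on every start
                listener.start()
                self.active[name] = listener
            elif not needed and name in self.active:
                self.active.pop(name).stop()


manager = ListenerManager()
manager.apply({"observe_mouse_position": True})
print(sorted(manager.active))
manager.apply({})
print(sorted(manager.active))
```

With the default configuration no listener runs at all, which is the overhead saving the list above refers to.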
Accessibility Tree Details
Windows
- Uses Windows UI Automation API via win32com
- Provides full control tree with names, types, bounds, and children
- Targets the currently focused window
- Limited to 50 children per element and max depth of 5 levels to prevent huge responses
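The size limits can be illustrated on a plain dictionary tree shaped like the accessibility_tree example in the Response Format section. This is a sketch of the limiting idea only, not the Windows implementation: prune_tree is a hypothetical name, and counting the root as level 1 is an assumption about how the 5 levels are measured.

```python
MAX_CHILDREN = 50  # children kept per element
MAX_DEPTH = 5      # levels kept, counting the root as level 1 (assumption)


def prune_tree(node: dict, depth: int = 1) -> dict:
    """Return a copy of `node` limited to MAX_DEPTH levels and MAX_CHILDREN per element."""
    pruned = {key: value for key, value in node.items() if key != "children"}
    if depth >= MAX_DEPTH:
        # Deepest allowed level: drop everything below it.
        pruned["children"] = []
    else:
        kept = node.get("children", [])[:MAX_CHILDREN]
        pruned["children"] = [prune_tree(child, depth + 1) for child in kept]
    return pruned


def depth_of(node: dict) -> int:
    """Number of levels in the tree rooted at `node`."""
    return 1 + max((depth_of(child) for child in node["children"]), default=0)


wide = {"name": "root", "children": [{"name": f"item{i}", "children": []} for i in range(60)]}
print(len(prune_tree(wide)["children"]), depth_of(prune_tree(wide)))
```

Under these limits the worst case is bounded by 1 + 50 + 50^2 + 50^3 + 50^4 elements, although real windows are usually far smaller.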
macOS
- Native: Uses AXUIElement API via pyobjc for full accessibility tree
- Fallback: Uses AppleScript with System Events for basic UI element enumeration
- AppleScript fallback works without dependencies but has limited depth
Linux/Ubuntu
- Uses AT-SPI (Assistive Technology Service Provider Interface) via PyGObject
- Provides desktop-wide accessibility tree
- Requires system packages: python3-gi and gir1.2-atspi-2.0
Notes
- Screenshots are included by default in all tool responses (when observe_screen: true)
- Mouse tools operate at the current cursor position unless you explicitly move the mouse first
- State tracking listeners are automatically started/stopped based on configuration
- Accessibility tree implementations may vary in depth and detail across platforms
- Some platform-specific features require optional dependencies or system packages
License
MIT