Skip to main content

MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

Project description

Computer Control MCP

MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

Discord License: MIT PyPi

MCP Computer Control Demo

Quick Usage (MCP Setup Using uvx)

Note: Running uvx computer-control-mcp@latest for the first time will download python dependencies (around 70MB) which may take some time. Recommended to run this in a terminal before using it as MCP. Subsequent runs will be instant.

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"]
    }
  }
}

OR install globally with pip:

pip install computer-control-mcp

Then run the server with:

computer-control-mcp # instead of uvx computer-control-mcp, so you can use the latest version, also you can `uv cache clean` to clear the cache and `uvx` again to use latest version.

Features

  • Control mouse movements and clicks
  • Type text at the current cursor position
  • Take screenshots of the entire screen or specific windows with optional saving to downloads directory
  • Extract text from screenshots using OCR (Optical Character Recognition)
  • List and activate windows
  • Press keyboard keys
  • Drag and drop operations
  • Enhanced screenshot capture for GPU-accelerated windows (Windows only)

Why Windows Graphics Capture (WGC) is Needed

Traditional screenshot methods like GDI/PrintWindow cannot capture the content of GPU-accelerated windows, resulting in black screens. This affects:

  • Games and 3D applications that use DirectX/OpenGL
  • Media players with hardware-accelerated video decoding
  • Electron applications like Discord, WhatsApp, and Slack
  • Browsers with GPU acceleration enabled
  • Streaming/recording software like OBS Studio
  • CAD and design software that utilize GPU rendering

WGC solves this by using the modern Windows Graphics Capture API, which can capture frames directly from the GPU composition surface, bypassing the limitations of traditional capture methods.

Configuration

Custom Screenshot Directory

By default, screenshots are saved to the OS downloads directory. You can customize this by setting the COMPUTER_CONTROL_MCP_SCREENSHOT_DIR environment variable:

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"],
      "env": {
        "COMPUTER_CONTROL_MCP_SCREENSHOT_DIR": "C:\\Users\\YourName\\Pictures\\Screenshots"
      }
    }
  }
}

Or set it system-wide:

# Windows (PowerShell)
$env:COMPUTER_CONTROL_MCP_SCREENSHOT_DIR = "C:\Users\YourName\Pictures\Screenshots"

# macOS/Linux
export COMPUTER_CONTROL_MCP_SCREENSHOT_DIR="/home/yourname/Pictures/Screenshots"

If the specified directory doesn't exist, the server will fall back to the default downloads directory.

Automatic WGC for Specific Windows

You can configure the system to automatically use Windows Graphics Capture (WGC) for specific windows by setting the COMPUTER_CONTROL_MCP_WGC_PATTERNS environment variable. This variable should contain comma-separated patterns that match window titles:

{
  "mcpServers": {
    "computer-control-mcp": {
      "command": "uvx",
      "args": ["computer-control-mcp@latest"],
      "env": {
        "COMPUTER_CONTROL_MCP_WGC_PATTERNS": "obs, discord, game, steam"
      }
    }
  }
}

Or set it system-wide:

# Windows (PowerShell)
$env:COMPUTER_CONTROL_MCP_WGC_PATTERNS = "obs, discord, game, steam"

# macOS/Linux
export COMPUTER_CONTROL_MCP_WGC_PATTERNS="obs, discord, game, steam"

When this variable is set, any window whose title contains any of the specified patterns will automatically use WGC for screenshot capture, eliminating black screens for GPU-accelerated applications.

Windows Graphics Capture (WGC) Support

On Windows 10 version 1803 and later, this package supports the Windows Graphics Capture (WGC) API for enhanced screenshot capabilities. This is particularly useful for capturing GPU-accelerated windows such as:

  • OBS Studio
  • Games and 3D applications
  • Discord, WhatsApp (Electron applications)
  • Video players with hardware decode
  • Browsers with GPU acceleration

To enable WGC support, install the optional dependency:

pip install windows-capture

When WGC is available, the take_screenshot tool will automatically attempt to use it for window captures when:

  1. The use_wgc parameter is set to True
  2. The window title matches any pattern defined in the COMPUTER_CONTROL_MCP_WGC_PATTERNS environment variable

Available Tools

Mouse Control

  • click_screen(x: int, y: int): Click at specified screen coordinates
  • move_mouse(x: int, y: int): Move mouse cursor to specified coordinates
  • drag_mouse(from_x: int, from_y: int, to_x: int, to_y: int, duration: float = 0.5): Drag mouse from one position to another
  • mouse_down(button: str = "left"): Hold down a mouse button ('left', 'right', 'middle')
  • mouse_up(button: str = "left"): Release a mouse button ('left', 'right', 'middle')

Keyboard Control

  • type_text(text: str): Type the specified text at current cursor position
  • press_key(key: str): Press a specified keyboard key
  • key_down(key: str): Hold down a specific keyboard key until released
  • key_up(key: str): Release a specific keyboard key
  • press_keys(keys: Union[str, List[Union[str, List[str]]]]): Press keyboard keys (supports single keys, sequences, and combinations)

Screen and Window Management

  • take_screenshot(title_pattern: str = None, use_regex: bool = False, threshold: int = 60, scale_percent_for_ocr: int = None, save_to_downloads: bool = False, use_wgc: bool = False): Capture screen or window
  • take_screenshot_with_ocr(title_pattern: str = None, use_regex: bool = False, threshold: int = 10, scale_percent_for_ocr: int = None, save_to_downloads: bool = False): Extract adn return text with coordinates using OCR from screen or window
  • get_screen_size(): Get current screen resolution
  • list_windows(): List all open windows
  • activate_window(title_pattern: str, use_regex: bool = False, threshold: int = 60): Bring specified window to foreground
  • wait_milliseconds(milliseconds: int): Wait for a specified number of milliseconds

Development

Setting up the Development Environment

# Clone the repository
git clone https://github.com/AB498/computer-control-mcp.git
cd computer-control-mcp

# Build/Run:

# 1. Install in development mode | Meaning that your edits to source code will be reflected in the installed package.
pip install -e .

# Then Start server | This is equivalent to `uvx computer-control-mcp@latest` just the local code is used
computer-control-mcp

# -- OR --

# 2. Build after `pip install hatch` | This needs version increment in orer to reflect code changes
hatch build

# Windows
$latest = Get-ChildItem .\dist\*.whl | Sort-Object LastWriteTime -Descending | Select-Object -First 1
pip install $latest.FullName --upgrade 

# Non-windows
pip install dist/*.whl --upgrade

# Run
computer-control-mcp

Running Tests

python -m pytest

API Reference

See the API Reference for detailed information about the available functions and classes.

License

MIT

For more information or help

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

computer_control_mcp-0.3.9.tar.gz (1.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

computer_control_mcp-0.3.9-py3-none-any.whl (1.8 MB view details)

Uploaded Python 3

File details

Details for the file computer_control_mcp-0.3.9.tar.gz.

File metadata

File hashes

Hashes for computer_control_mcp-0.3.9.tar.gz
Algorithm Hash digest
SHA256 b39aed2c8f87ed28e5f8e7c8caf3e2fea3a4edc2aa06824c8b67741090d0d7c6
MD5 ac1a462be9e2eb9b1d66c5989e270f73
BLAKE2b-256 2b4038a20d81d5c0f5372d8621908aa680b11ab1b005f4f0b4510ad5336077b8

See more details on using hashes here.

File details

Details for the file computer_control_mcp-0.3.9-py3-none-any.whl.

File metadata

File hashes

Hashes for computer_control_mcp-0.3.9-py3-none-any.whl
Algorithm Hash digest
SHA256 9ba05a094fc71f7df981ccd30f219bd4c11fbb893fc417d27b99b1bff29c7900
MD5 28768efed1b0df35251618b82aedb1a3
BLAKE2b-256 e0302772e9a36d9070bc3292dffa59c499b668361c76b4e9e4eaef8a6b3879ce

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page