
ScreenPilot MCP server for screen automation


Screen Pilot MCP

A Model Context Protocol (MCP) server that provides screen automation. It lets LLMs see the screen and control the keyboard and mouse, so an AI assistant can navigate and operate graphical user interfaces.

All coordinates use a fixed target resolution of 1366x768. The server automatically scales them to and from the actual screen size, so clients work in the same coordinate space regardless of the physical display resolution.
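The scaling described above amounts to a proportional mapping on each axis. The following is a minimal sketch of that mapping, assuming independent linear scaling of x and y with rounding to the nearest pixel and clamping to the screen bounds; the server's actual rounding and edge handling may differ, and to_actual / to_target are hypothetical names, not part of the server's API.

```python
# Fixed coordinate space that clients use (1366x768, per the description above).
TARGET_WIDTH, TARGET_HEIGHT = 1366, 768


def _clamp(value, size):
    """Keep a pixel coordinate inside [0, size - 1] (an assumed safeguard)."""
    return max(0, min(value, size - 1))


def to_actual(x, y, actual_width, actual_height):
    """Map a point from the 1366x768 target space to real screen pixels."""
    ax = round(x * actual_width / TARGET_WIDTH)
    ay = round(y * actual_height / TARGET_HEIGHT)
    return _clamp(ax, actual_width), _clamp(ay, actual_height)


def to_target(x, y, actual_width, actual_height):
    """Map a real screen pixel back into the 1366x768 target space."""
    tx = round(x * TARGET_WIDTH / actual_width)
    ty = round(y * TARGET_HEIGHT / actual_height)
    return _clamp(tx, TARGET_WIDTH), _clamp(ty, TARGET_HEIGHT)
```

For example, on a 1920x1080 display, to_actual(683, 384, 1920, 1080) returns (960, 540), the center of the screen. Clamping keeps a point on the far edge of the target space (such as x = 1366) from landing one pixel off-screen.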

Available Tools

  • screen_capture - Captures screenshots and provides screen information

    • see_screen(format: str = "PNG"): Takes a screenshot of the current screen
    • get_screen_info(): Returns screen resolution and current mouse position
  • mouse - Controls mouse actions

    • mouse_click(x: int, y: int, button: str = "left", clicks: int = 1, take_screenshot: bool = True, format: str = "PNG"): Moves the mouse to the specified coordinates and clicks
  • keyboard - Controls keyboard inputs

    • keyboard_action(action_type: str, value: str, take_screenshot: bool = True, format: str = "PNG"): Performs a keyboard action (type, press, or hotkey), selected by action_type
  • scroll - Controls screen scrolling

    • scroll(direction: str = "down", amount: int = 300, take_screenshot: bool = True, format: str = "PNG"): Scrolls the screen in the specified direction
    • scroll_to_position(percent: float = 50, take_screenshot: bool = True, format: str = "PNG"): Scrolls to an approximate position in the document, given as a percentage
  • element - Detects and waits for screen elements

    • element_exists(image_path: str, confidence: float = 0.9): Checks whether the element shown in the image at image_path is visible on screen
    • wait_for_element(image_path: str, max_wait_seconds: int = 10, confidence: float = 0.9): Waits up to max_wait_seconds for an element to appear
  • action_sequence - Performs sequences of actions

    • perform_actions(actions: List[Dict], take_screenshots: bool = True, format: str = "PNG"): Executes a sequence of mouse and keyboard actions
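wait_for_element presumably repeats an element_exists-style check until it succeeds or max_wait_seconds elapses. The following is a minimal sketch of that polling pattern, with a caller-supplied check function standing in for the image match; wait_until and poll_interval are hypothetical names, not the server's actual implementation.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               max_wait_seconds: float = 10,
               poll_interval: float = 0.5) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True as soon as check() succeeds, or False on timeout.
    """
    # monotonic() is unaffected by system clock changes, so it suits timeouts.
    deadline = time.monotonic() + max_wait_seconds
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

A short poll interval makes the wait more responsive but repeats the (potentially expensive) screen capture and image match more often.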

Prompts

  • use_my_device
    • Provides guidance on the proper sequence for interacting with the device

Installation

Using uv (recommended)

uvx screen-pilot-mcp

Using pip

pip install screen-pilot-mcp

After installation, run the server as a module:

python -m screen_pilot_mcp

Configuration

Configure for Claude Desktop

Add one of the following entries to your Claude Desktop config file, claude_desktop_config.json:

Using uvx

{
  "mcpServers": {
    "screen-pilot": {
      "command": "uvx",
      "args": ["screen-pilot-mcp"]
    }
  }
}

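Using a pip installation

The command must point to the Python interpreter where screen-pilot-mcp is installed; if python resolves to a different interpreter, use the full path instead.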

{
  "mcpServers": {
    "screen-pilot": {
      "command": "python",
      "args": ["-m", "screen_pilot_mcp"]
    }
  }
}
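Instead of editing the file by hand, a small script can merge the entry into an existing config without disturbing other servers or settings. This is a minimal sketch; add_screen_pilot is a hypothetical helper, and because the config file's location depends on the operating system, the path is passed in explicitly. Note that uvx takes the package name as its first argument. Restart Claude Desktop afterwards so it reloads the config.

```python
import copy
import json
from pathlib import Path

# Server entries for the uvx and pip installation options.
UVX_ENTRY = {"command": "uvx", "args": ["screen-pilot-mcp"]}
PIP_ENTRY = {"command": "python", "args": ["-m", "screen_pilot_mcp"]}


def add_screen_pilot(config_path, entry=UVX_ENTRY):
    """Insert or replace the "screen-pilot" entry under "mcpServers".

    Other servers and top-level settings in the file are preserved.
    Returns the resulting config as a dict.
    """
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty config.
    config = json.loads(text) if text.strip() else {}
    # Deep-copy so later edits to the result never mutate the module constants.
    config.setdefault("mcpServers", {})["screen-pilot"] = copy.deepcopy(entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For a pip installation, pass PIP_ENTRY as the second argument.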

Example Prompts

Use the screen capture tool to take a screenshot of the current screen. Then analyze what's visible, and help me click the login button on the page.
Take a screenshot, find the search box, type "weather forecast", and press Enter.

Notes

  • Requires Python 3.10 or higher
  • The first run may trigger an operating-system prompt to grant screen access permissions
  • Do not run multiple instances simultaneously

Contributing

We encourage contributions to help expand and improve screen-pilot-mcp. Whether you want to add new tools, enhance existing functionality, or improve documentation, your input is valuable.

For examples of other MCP servers and implementation patterns, see: https://github.com/modelcontextprotocol/servers

Pull requests are welcome! Feel free to contribute new ideas, bug fixes, or enhancements to make screen-pilot-mcp even more powerful and useful.

License

screen-pilot-mcp is licensed under the MIT License. This means you are free to use, modify, and distribute the software, subject to the terms and conditions of the MIT License. For more details, please see the LICENSE file in the project repository.



Download files


Source Distribution

screen_pilot_mcp-0.1.1.tar.gz (15.6 kB)

Built Distribution


screen_pilot_mcp-0.1.1-py3-none-any.whl (18.9 kB, Python 3)

File details

Details for the file screen_pilot_mcp-0.1.1.tar.gz.

File metadata

  • Download URL: screen_pilot_mcp-0.1.1.tar.gz
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for screen_pilot_mcp-0.1.1.tar.gz

  • SHA256: 64f3e4fa80587d8740785d51d6f5b721085253f5ae5a316ba21e879a7cde22b6
  • MD5: 8bedef9aa4007974092d064fc0c83b20
  • BLAKE2b-256: d4e3462e00d7643943af963d543366757c1914aaf0d4ce76a337045155d69913
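To check a downloaded archive against the published digest, compute its SHA256 locally and compare. This is a minimal sketch using only the standard library; sha256_of and verify_sha256 are illustrative helper names. It reads the file in chunks, which works on Python 3.10 (hashlib.file_digest only arrived in 3.11). For installs, pip can enforce the same check in its hash-checking mode (--require-hashes, with --hash=sha256:... pins in a requirements file).

```python
import hashlib

# Published SHA256 digest for screen_pilot_mcp-0.1.1.tar.gz (from the hashes above).
SDIST_SHA256 = "64f3e4fa80587d8740785d51d6f5b721085253f5ae5a316ba21e879a7cde22b6"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 matches expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage: verify_sha256("screen_pilot_mcp-0.1.1.tar.gz", SDIST_SHA256) returns True only if the downloaded file is byte-for-byte the published one.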


File details

Details for the file screen_pilot_mcp-0.1.1-py3-none-any.whl.


File hashes

Hashes for screen_pilot_mcp-0.1.1-py3-none-any.whl

  • SHA256: bf2a0d9704cb1e77dc3ae60e5328e81ac11c1a430bb4b115c69d155c6828a7b5
  • MD5: 66da910dd8ced59b17de3a11deba72ee
  • BLAKE2b-256: 29f9af1c6d84b739f3cb16037cbb3b54a18e40d85a794f96eef7a86117b7ebbf

