desktop_access_mcp_server

🖥️ Desktop Access MCP Server

The "Eyes and Hands" for LLM Agents to Control Desktop Environments

A Python-based Model Context Protocol (MCP) server that provides comprehensive desktop access capabilities to LLM agents, enabling them to see and interact with desktop environments through screenshots and input simulation.

Features • Quick Start • Usage • API • Development • License

🌟 Why Desktop Access MCP Server?

In the era of AI agents, LLMs need more than just text-based interactions. They need to see the desktop through screenshots and interact with applications through keyboard and mouse simulation. This MCP server bridges that gap, providing LLM agents with the "eyes and hands" they need to:

🖼️ See the desktop environment through high-quality screenshots
⌨️ Type text and execute key combinations
🖱️ Control the mouse cursor for navigation and interaction
🖥️ Work with single or multi-monitor setups
🤖 Automate complex desktop workflows

🚀 Features

👁️ Eyes - Visual Perception

Full Desktop Screenshots in PNG or JPEG formats
Multi-Monitor Support - Capture individual monitors or combined view
Configurable Quality - Adjust JPEG compression for balance of size and quality
Base64 Encoding - Ready for direct LLM consumption

🖐️ Hands - Input Control

Keyboard Simulation
- Type text with configurable delays
- Execute key combinations (Ctrl+C, Alt+Tab, etc.)
Mouse Control
- Move cursor to precise coordinates
- Click, double-click, and right-click
- Scroll vertically
- Drag from one point to another

🛠️ Technical Excellence

MCP Compliant - Works with any MCP-compatible client
Cross-Platform - Linux, macOS, and Windows support
CLI Interface - Test functionality without an LLM agent
Extensive Logging - Debug and monitor operations
Error Handling - Graceful degradation with informative errors

📦 Installation

From Locally Built Package (Current Status)

# Build the package (requires the "build" package: pip install build)
python -m build

# Install the package
pip install dist/desktop_access_mcp_server-0.1.0-py3-none-any.whl

From Source (Development)

git clone https://github.com/your-username/desktop-access-mcp-server.git
cd desktop-access-mcp-server
pip install -e .

Note: The package is ready for PyPI publishing. See PUBLISHING_INSTRUCTIONS.md for details.

🚀 Quick Start

Run the Server

Method 1: Direct execution (after installation)

desktop-access-mcp-server

Method 2: Using uvx (without installation)

uvx --from desktop-access-mcp-server desktop-access-mcp-server

Test with CLI

# Take a screenshot
desktop-cli screenshot -o my_screenshot.png

# Type text
desktop-cli keyboard -t "Hello World" -d 0.1

# Move and click mouse
desktop-cli mouse move -x 100 -y 200
desktop-cli mouse click

🛠️ Usage

With MCP Clients

Once the server is running, connect using any MCP-compliant client:

Method 1: Direct execution (after installation)

# Example with Python MCP client
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="desktop-access-mcp-server"
)


async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Complete the MCP handshake before calling any tool
            await session.initialize()

            # Take a screenshot
            screenshot = await session.call_tool("take_screenshot", {
                "format": "jpeg",
                "quality": 85
            })

            # Type text
            await session.call_tool("keyboard_input", {
                "text": "Hello from LLM!",
                "delay": 0.05
            })


asyncio.run(main())

Method 2: Using uvx (without installation)

# Example with Python MCP client
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# "command" is a single executable name; its arguments go in "args"
server_params = StdioServerParameters(
    command="uvx",
    args=["--from", "desktop-access-mcp-server", "desktop-access-mcp-server"]
)


async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Complete the MCP handshake before calling any tool
            await session.initialize()

            # Take a screenshot
            screenshot = await session.call_tool("take_screenshot", {
                "format": "jpeg",
                "quality": 85
            })

            # Type text
            await session.call_tool("keyboard_input", {
                "text": "Hello from LLM!",
                "delay": 0.05
            })


asyncio.run(main())

Command Line Interface

The package includes a comprehensive CLI for testing:

# Screenshot commands
desktop-cli screenshot -o screenshot.png
desktop-cli screenshot -f jpeg -q 90 -m 1 -o monitor1.jpg

# Keyboard commands
desktop-cli keyboard -t "Type this text" -d 0.1
desktop-cli keyboard -c "ctrl+c"

# Mouse commands
desktop-cli mouse move -x 100 -y 200
desktop-cli mouse click -b left
desktop-cli mouse scroll -a 5
desktop-cli mouse drag --from-x 100 --from-y 100 --to-x 200 --to-y 200
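
When the CLI is driven from a script, it can help to build the argument list in one place. The sketch below is only an illustration: `screenshot_command` and `run` are hypothetical helper names, and the meaning of `-f`, `-q`, `-m`, and `-o` is inferred from the example commands above rather than from the CLI's own help text.

```python
import subprocess
from typing import List, Optional


def screenshot_command(
    output: str,
    fmt: Optional[str] = None,
    quality: Optional[int] = None,
    monitor: Optional[int] = None,
) -> List[str]:
    """Build an argv list for `desktop-cli screenshot` (flags as shown above)."""
    cmd = ["desktop-cli", "screenshot"]
    if fmt is not None:
        cmd += ["-f", fmt]
    if quality is not None:
        cmd += ["-q", str(quality)]
    if monitor is not None:
        cmd += ["-m", str(monitor)]
    cmd += ["-o", output]
    return cmd


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.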

🧰 Tools API

`take_screenshot`

Capture a screenshot of the desktop for visual understanding.

Parameters:

Parameter	Type	Description	Default
`format`	`string`	Image format: `png` or `jpeg`	`png`
`quality`	`integer`	JPEG quality (1-100)	`85`
`monitor`	`integer`	Monitor index (0=all, 1+=specific)	`0`

Response:

{
  "success": true,
  "format": "png",
  "data": "base64_encoded_image_data",
  "size": {
    "width": 1920,
    "height": 1080
  },
  "monitor": 0,
  "platform": "linux"
}
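
A client will usually need to turn the `data` field back into image bytes. The following is a minimal sketch, assuming the tool result has already been parsed into a dict shaped like the response above. `decode_screenshot` and `save_screenshot` are hypothetical helper names and are not part of this package.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def decode_screenshot(response: dict) -> bytes:
    """Return raw image bytes from a take_screenshot-style response dict."""
    if not response.get("success"):
        raise ValueError("screenshot response reports failure")
    raw = base64.b64decode(response["data"], validate=True)
    # Compare the file signature with the declared format to catch bad payloads.
    expected = PNG_SIGNATURE if response.get("format") == "png" else JPEG_SIGNATURE
    if not raw.startswith(expected):
        raise ValueError("decoded data does not match the declared format")
    return raw


def save_screenshot(response: dict, path: str) -> int:
    """Write the decoded image to disk and return the number of bytes written."""
    return Path(path).write_bytes(decode_screenshot(response))
```

Checking the signature is optional, but it gives an early, readable error if the payload was truncated or mislabeled.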

`keyboard_input`

Simulate keyboard input to type text or press key combinations.

Parameters:

Parameter	Type	Description	Default
`text`	`string`	Text to type	-
`key_combination`	`string`	Key combo (e.g., `ctrl+c`)	-
`delay`	`number`	Delay between key presses (seconds)	`0.01`

Response:

{
  "success": true,
  "action": "type",
  "text": "Hello World",
  "delay": 0.05
}
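
Before sending a `keyboard_input` call, a client may want to validate its arguments locally. The sketch below makes two assumptions that the table above does not state: that `text` and `key_combination` are mutually exclusive, and that combos are written as key names joined by `+`. Both helper names are hypothetical, and this is not the server's own parsing logic.

```python
from typing import Any, Dict, List, Optional


def parse_key_combination(combo: str) -> List[str]:
    """Split a combo such as "ctrl+c" into lowercase key names."""
    keys = [part.strip().lower() for part in combo.split("+")]
    if any(not key for key in keys):
        raise ValueError(f"malformed key combination: {combo!r}")
    return keys


def build_keyboard_arguments(
    text: Optional[str] = None,
    key_combination: Optional[str] = None,
    delay: float = 0.01,
) -> Dict[str, Any]:
    """Build an arguments dict for keyboard_input (assumes text XOR combo)."""
    if (text is None) == (key_combination is None):
        raise ValueError("provide exactly one of text or key_combination")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if text is not None:
        return {"text": text, "delay": delay}
    # Normalise the combo so "Ctrl + C" and "ctrl+c" produce the same payload.
    return {"key_combination": "+".join(parse_key_combination(key_combination))}
```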

`mouse_action`

Perform mouse actions to interact with the desktop.

Parameters:

Action	Required Parameters	Optional Parameters
`move`	`x`, `y`	-
`click`	-	`button` (`left`/`right`/`middle`)
`double_click`	-	`button`
`right_click`	-	`button`
`scroll`	`scroll_amount`	-
`drag`	`from_x`, `from_y`, `to_x`, `to_y`	`duration`

Response:

{
  "success": true,
  "action": "move",
  "x": 100,
  "y": 200
}
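
The parameter table above can be encoded as data so that a client rejects incomplete calls before they reach the server. This is a sketch under one assumption: that the action is passed in an argument named `action`, as the response example suggests. `build_mouse_arguments` is a hypothetical helper, not part of this package.

```python
from typing import Any, Dict

# Required and optional parameters per action, copied from the table above.
REQUIRED = {
    "move": ("x", "y"),
    "click": (),
    "double_click": (),
    "right_click": (),
    "scroll": ("scroll_amount",),
    "drag": ("from_x", "from_y", "to_x", "to_y"),
}
OPTIONAL = {
    "move": (),
    "click": ("button",),
    "double_click": ("button",),
    "right_click": ("button",),
    "scroll": (),
    "drag": ("duration",),
}


def build_mouse_arguments(action: str, **params: Any) -> Dict[str, Any]:
    """Validate params against the table and return a mouse_action arguments dict."""
    if action not in REQUIRED:
        raise ValueError(f"unknown action: {action!r}")
    missing = [name for name in REQUIRED[action] if name not in params]
    if missing:
        raise ValueError(f"{action} is missing required parameters: {missing}")
    allowed = set(REQUIRED[action]) | set(OPTIONAL[action])
    unknown = sorted(set(params) - allowed)
    if unknown:
        raise ValueError(f"{action} does not accept: {unknown}")
    return {"action": action, **params}
```

For example, `build_mouse_arguments("drag", from_x=100, from_y=100, to_x=200, to_y=200)` mirrors the CLI drag command shown earlier.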

🧪 Testing

Automated Tests

# Run all tests
python -m pytest

# Run specific test suites
python test_basic.py
python test_comprehensive.py
python test_screenshot.py

Manual Testing

# Test screenshot functionality
python run_screenshot_tests.py

# Review test results
python review_screenshots.py

📋 Requirements

Python: 3.8 or higher
Operating Systems: Linux, macOS, or Windows
Display Server:
- Linux: X11 or Wayland
- macOS: Aqua
- Windows: Windows Display Driver
Dependencies:
- mcp>=1.0.0
- Pillow>=9.0.0
- pynput>=1.7.0
- mss>=9.0.0

🔧 Troubleshooting

Screenshot Issues

Linux: Ensure X11/Wayland access
```
xhost +SI:localuser:$USER
```
macOS: Grant screen recording permissions
Windows: Run as administrator if needed
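
On Linux, the `xhost` command above only applies to X11 sessions, so it helps to know which display server is in use. The sketch below guesses from common environment variables (`XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, `DISPLAY`). It is a heuristic, not how this server detects the platform, and `detect_linux_display_server` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def detect_linux_display_server(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "wayland", "x11", or "unknown" based on environment variables."""
    env = os.environ if env is None else env
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    if session_type in ("wayland", "x11"):
        return session_type
    # Fall back to the socket/display variables when the session type is unset.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(detect_linux_display_server())
```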

Permission Errors

# Add user to input group (Linux)
sudo adduser $USER input

# Restart session or reboot after adding to group

Dependency Issues

# Install system dependencies (Ubuntu/Debian)
sudo apt-get install python3-dev python3-pip scrot

# Install system dependencies (CentOS/RHEL)
sudo yum install python3-devel python3-pip scrot

🛠️ Development

Setup Development Environment

# Clone repository
git clone https://github.com/your-username/desktop-access-mcp-server.git
cd desktop-access-mcp-server

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install pytest black flake8

Code Quality

# Format code
black .

# Check code style
flake8 .

# Run tests
python -m pytest

Project Structure

desktop-access-mcp-server/
├── desktop_access_mcp_server/     # Main package
│   ├── __init__.py               # Package init
│   ├── __main__.py               # MCP server entry point
│   ├── cli.py                    # Command-line interface
│   └── desktop_controller.py     # Core functionality
├── test_*.py                     # Test files
├── run_screenshot_tests.py       # Screenshot test suite
├── review_screenshots.py         # Screenshot review tool
├── requirements.txt              # Python dependencies
├── pyproject.toml                # Package configuration
└── README.md                     # This file

⚙️ JSON Configuration for MCP Clients

Claude Desktop Configuration

To use this server with Claude Desktop, add the following to your MCP configuration file:

{
  "mcpServers": {
    "desktop-access": {
      "command": "uvx",
      "args": [
        "--from",
        "desktop-access-mcp-server",
        "desktop-access-mcp-server"
      ]
    }
  }
}

If you have installed the package locally, you can use the direct command instead:

{
  "mcpServers": {
    "desktop-access": {
      "command": "desktop-access-mcp-server"
    }
  }
}

Generic MCP Client Configuration

For other MCP clients that use JSON configuration files, the general format is:

{
  "mcpServers": {
    "desktop-access": {
      "command": "uvx",
      "args": [
        "--from",
        "desktop-access-mcp-server",
        "desktop-access-mcp-server"
      ]
    }
  }
}

Configuration File Locations

Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows)
Generic MCP Clients: Check your client's documentation for the configuration file location
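
If the configuration file already lists other servers, the new entry should be merged in, not pasted over the whole file. The following is a minimal sketch of such a merge using the `uvx` entry shown earlier. `add_desktop_access` and `update_config_file` are hypothetical helper names, and the path is whatever your client uses.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict

# The same entry as in the JSON examples above.
SERVER_ENTRY = {
    "command": "uvx",
    "args": ["--from", "desktop-access-mcp-server", "desktop-access-mcp-server"],
}


def add_desktop_access(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with the desktop-access server entry added."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["desktop-access"] = copy.deepcopy(SERVER_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Read the config at path (if any), add the entry, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_desktop_access(existing)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Back up the original file before running `update_config_file`, and restart the client afterwards so it reloads its configuration.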

Available Tools

Once configured, the following tools will be available to your MCP client:

take_screenshot - Capture desktop screenshots in PNG or JPEG format
keyboard_input - Simulate keyboard typing and key combinations
mouse_action - Control mouse movements, clicks, and scrolling

Example Usage in Claude

After configuration, you can ask Claude to:

"Take a screenshot of my desktop"
"Type 'Hello World' in the current application"
"Click on the center of the screen"
"Press Ctrl+C to copy selected text"

🤝 Contributing

Contributions are welcome! Here's how you can help:

Report Bugs: Use the issue tracker to report bugs
Suggest Features: Request new capabilities
Submit Pull Requests: Fix bugs or add features
Improve Documentation: Enhance this README or add guides

Development Guidelines

Follow PEP 8 style guide
Write tests for new functionality
Document public APIs
Keep dependencies minimal
Ensure cross-platform compatibility

📚 Resources

LLM Agent Guide - How LLM agents can use this server
Claude Desktop Configuration Guide - Detailed instructions for Claude Desktop setup
Testing Guide - Comprehensive testing documentation
MCP Documentation - Official MCP specification
Example Usage - Sample implementation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Model Context Protocol for the standardized interface
pynput for cross-platform input control
Pillow for image processing capabilities
MSS for fast screenshot capture

🔗 Other Projects

Explore more projects by the author:

TrendMaster - Advanced trend analysis tool for traders.
hjalgos_notebooks - Jupyter notebooks for hjAlgos strategies.
Zerodha-Brokerage-Calculator - A calculator for Zerodha brokerage fees.
TeleTest - Telegram bot for testing trading signals.
Tradingview-Webhook-Manager - Manage TradingView webhooks effectively.
Algotrading_Multi_account_Modern_UI - Modern UI for managing multiple algotrading accounts.
pyPortMan - Python portfolio manager for tracking investments.

🤝 Sponsorship

This project is sponsored by hjLabs.

Built with ❤️ for the AI agent community

Enabling LLMs to see and interact with the world beyond text

Release history

0.1.1 - Aug 23, 2025 (this version)
0.1.0 - Aug 23, 2025

