desktop_access_mcp_server
🖥️ Desktop Access MCP Server
The "Eyes and Hands" for LLM Agents to Control Desktop Environments
A Python-based Model Context Protocol (MCP) server that provides comprehensive desktop access capabilities to LLM agents, enabling them to see and interact with desktop environments through screenshots and input simulation.
🌟 Why Desktop Access MCP Server?
In the era of AI agents, LLMs need more than just text-based interactions. They need to see the desktop through screenshots and interact with applications through keyboard and mouse simulation. This MCP server bridges that gap, providing LLM agents with the "eyes and hands" they need to:
- 🖼️ See the desktop environment through high-quality screenshots
- ⌨️ Type text and execute key combinations
- 🖱️ Control the mouse cursor for navigation and interaction
- 🖥️ Work with single or multi-monitor setups
- 🤖 Automate complex desktop workflows
🚀 Features
👁️ Eyes - Visual Perception
- Full Desktop Screenshots in PNG or JPEG formats
- Multi-Monitor Support - Capture individual monitors or combined view
- Configurable Quality - Adjust JPEG compression to balance file size against image quality
- Base64 Encoding - Ready for direct LLM consumption
🖐️ Hands - Input Control
- Keyboard Simulation
- Type text with configurable delays
- Execute key combinations (Ctrl+C, Alt+Tab, etc.)
- Mouse Control
- Move cursor to precise coordinates
- Click, double-click, and right-click
- Scroll vertically
- Drag from one point to another
🛠️ Technical Excellence
- MCP Compliant - Works with any MCP-compatible client
- Cross-Platform - Linux, macOS, and Windows support
- CLI Interface - Test functionality without an LLM agent
- Extensive Logging - Debug and monitor operations
- Error Handling - Graceful degradation with informative errors
📦 Installation
From Locally Built Package (Current Status)
```bash
# Build the package (requires the build package: pip install build)
python -m build
# Install the package
pip install dist/desktop_access_mcp_server-0.1.0-py3-none-any.whl
```
From Source (Development)
```bash
git clone https://github.com/your-username/desktop-access-mcp-server.git
cd desktop-access-mcp-server
pip install -e .
```
Note: The package is ready for PyPI publishing. See PUBLISHING_INSTRUCTIONS.md for details.
🚀 Quick Start
Run the Server
```bash
desktop-access-mcp-server
```
Test with CLI
```bash
# Take a screenshot
desktop-cli screenshot -o my_screenshot.png
# Type text
desktop-cli keyboard -t "Hello World" -d 0.1
# Move and click mouse
desktop-cli mouse move -x 100 -y 200
desktop-cli mouse click
```
🛠️ Usage
With MCP Clients
Once the server is running, connect using any MCP-compliant client:
```python
# Example with the Python MCP client SDK
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="desktop-access-mcp-server"
)


async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Perform the MCP handshake before calling any tools
            await session.initialize()

            # Take a screenshot
            screenshot = await session.call_tool("take_screenshot", {
                "format": "jpeg",
                "quality": 85
            })

            # Type text
            await session.call_tool("keyboard_input", {
                "text": "Hello from LLM!",
                "delay": 0.05
            })


asyncio.run(main())
```
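The `data` field of a `take_screenshot` response is a base64 string. The sketch below decodes it and writes the image to disk, assuming the reply reaches the client as the JSON object documented under Tools API (for example, as the text of a returned content item). `save_screenshot` and the `monitor<N>.<ext>` file naming are hypothetical, not part of this package.

```python
import base64
import json
from pathlib import Path

# Leading bytes that identify each documented format
MAGIC = {"png": b"\x89PNG\r\n\x1a\n", "jpeg": b"\xff\xd8\xff"}


def save_screenshot(response, out_dir="."):
    """Decode a take_screenshot response (dict or JSON text) and write the image."""
    if isinstance(response, str):
        response = json.loads(response)
    if not response.get("success"):
        raise RuntimeError(f"screenshot failed: {response}")
    fmt = response["format"]
    image_bytes = base64.b64decode(response["data"])
    # Sanity-check that the payload matches the declared format
    if not image_bytes.startswith(MAGIC[fmt]):
        raise ValueError(f"decoded data is not a {fmt} image")
    suffix = "jpg" if fmt == "jpeg" else "png"
    path = Path(out_dir) / f"monitor{response.get('monitor', 0)}.{suffix}"
    path.write_bytes(image_bytes)
    return path
```

How the JSON text is extracted from the tool result depends on how the server packages its reply, so treat the input format here as an assumption.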
Command Line Interface
The package includes a comprehensive CLI for testing:
```bash
# Screenshot commands
desktop-cli screenshot -o screenshot.png
desktop-cli screenshot -f jpeg -q 90 -m 1 -o monitor1.jpg
# Keyboard commands
desktop-cli keyboard -t "Type this text" -d 0.1
desktop-cli keyboard -c "ctrl+c"
# Mouse commands
desktop-cli mouse move -x 100 -y 200
desktop-cli mouse click -b left
desktop-cli mouse scroll -a 5
desktop-cli mouse drag --from-x 100 --from-y 100 --to-x 200 --to-y 200
```
🧰 Tools API
take_screenshot
Capture a screenshot of the desktop for visual understanding.
Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `format` | string | Image format: `png` or `jpeg` | `png` |
| `quality` | integer | JPEG quality (1-100) | 85 |
| `monitor` | integer | Monitor index (0 = all, 1+ = specific) | 0 |
Response:
```json
{
  "success": true,
  "format": "png",
  "data": "base64_encoded_image_data",
  "size": {
    "width": 1920,
    "height": 1080
  },
  "monitor": 0,
  "platform": "linux"
}
```
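One way a response of this shape could be produced with the listed `mss` and `Pillow` dependencies is sketched below; it is an illustration, not the code in `desktop_controller.py`. It relies on the mss convention that `sct.monitors[0]` is the combined area of all monitors and `sct.monitors[1:]` are the individual monitors, which lines up with the `monitor` parameter's 0 = all, 1+ = specific semantics. The helper names are hypothetical, and the `platform` value is simply assumed to be `sys.platform`.

```python
import base64
import io
import sys


def validate_screenshot_args(fmt="png", quality=85, monitor=0):
    """Normalize and range-check take_screenshot arguments."""
    fmt = fmt.lower()
    if fmt not in ("png", "jpeg"):
        raise ValueError(f"format must be 'png' or 'jpeg', got {fmt!r}")
    if not 1 <= quality <= 100:
        raise ValueError(f"quality must be between 1 and 100, got {quality}")
    if monitor < 0:
        raise ValueError(f"monitor must be 0 (all) or a positive index, got {monitor}")
    return fmt, quality, monitor


def take_screenshot(fmt="png", quality=85, monitor=0):
    """Capture the desktop and return a dict shaped like the documented response."""
    fmt, quality, monitor = validate_screenshot_args(fmt, quality, monitor)
    # Imported lazily so the validation above works even without a display
    import mss
    from PIL import Image

    with mss.mss() as sct:
        if monitor >= len(sct.monitors):
            raise ValueError(f"monitor {monitor} does not exist")
        shot = sct.grab(sct.monitors[monitor])
        # mss returns raw BGRA pixels; Pillow's "BGRX" raw mode skips the 4th byte
        image = Image.frombytes("RGB", shot.size, shot.bgra, "raw", "BGRX")

    buffer = io.BytesIO()
    if fmt == "jpeg":
        image.save(buffer, format="JPEG", quality=quality)
    else:
        image.save(buffer, format="PNG")  # PNG is lossless, so quality is unused
    return {
        "success": True,
        "format": fmt,
        "data": base64.b64encode(buffer.getvalue()).decode("ascii"),
        "size": {"width": image.width, "height": image.height},
        "monitor": monitor,
        "platform": sys.platform,
    }
```

Validating before importing `mss` means malformed requests fail fast with a clear error instead of surfacing as a capture failure.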
keyboard_input
Simulate keyboard input to type text or press key combinations.
Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `text` | string | Text to type | - |
| `key_combination` | string | Key combination (e.g., `ctrl+c`) | - |
| `delay` | number | Delay between key presses (seconds) | 0.01 |
Response:
```json
{
  "success": true,
  "action": "type",
  "text": "Hello World",
  "delay": 0.05
}
```
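The `key_combination` parameter takes strings such as `ctrl+c`. The sketch below shows how a client might check such a string before sending it, by splitting on `+` into modifiers and a final key. The modifier set, the rule that exactly one of `text` or `key_combination` is supplied, and the helper names are assumptions, not documented behavior. This naive split cannot express the `+` key itself.

```python
# Assumed modifier names; the server's accepted set may differ
MODIFIERS = {"ctrl", "alt", "shift", "cmd"}


def parse_key_combination(combo):
    """Split e.g. 'ctrl+shift+t' into (['ctrl', 'shift'], 't')."""
    parts = [part.strip().lower() for part in combo.split("+")]
    if not all(parts):
        raise ValueError(f"empty key in combination {combo!r}")
    *modifiers, key = parts
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s) {unknown} in {combo!r}")
    return modifiers, key


def keyboard_args(text=None, key_combination=None, delay=0.01):
    """Build keyboard_input arguments; assumes exactly one of text/key_combination."""
    if (text is None) == (key_combination is None):
        raise ValueError("pass exactly one of text or key_combination")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    args = {"delay": delay}
    if text is not None:
        args["text"] = text
    else:
        parse_key_combination(key_combination)  # fail early on malformed combos
        args["key_combination"] = key_combination
    return args
```

On the server side with pynput, the parsed modifiers could be held with `pynput.keyboard.Controller().pressed(...)` while the final key is tapped.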
mouse_action
Perform mouse actions to interact with the desktop.
Parameters:
| Action | Required Parameters | Optional Parameters |
|---|---|---|
| `move` | `x`, `y` | - |
| `click` | - | `button` (`left`/`right`/`middle`) |
| `double_click` | - | `button` |
| `right_click` | - | `button` |
| `scroll` | `scroll_amount` | - |
| `drag` | `from_x`, `from_y`, `to_x`, `to_y` | `duration` |
Response:
```json
{
  "success": true,
  "action": "move",
  "x": 100,
  "y": 200
}
```
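The action table translates naturally into a client-side check. The sketch below validates arguments against the required parameters listed above and composes a move followed by a click through a `ClientSession`-like object. It assumes the action name is sent under an `action` key (suggested by the response's `action` field); `mouse_args` and `click_at` are hypothetical helpers.

```python
# Required parameters per action, taken from the mouse_action table
REQUIRED = {
    "move": {"x", "y"},
    "click": set(),
    "double_click": set(),
    "right_click": set(),
    "scroll": {"scroll_amount"},
    "drag": {"from_x", "from_y", "to_x", "to_y"},
}
BUTTONS = {"left", "right", "middle"}


def mouse_args(action, **params):
    """Build mouse_action arguments, rejecting unknown actions or missing parameters."""
    if action not in REQUIRED:
        raise ValueError(f"unknown action {action!r}")
    missing = REQUIRED[action] - set(params)
    if missing:
        raise ValueError(f"{action} requires {sorted(missing)}")
    if params.get("button", "left") not in BUTTONS:
        raise ValueError(f"button must be one of {sorted(BUTTONS)}")
    # Assumption: the action name travels under an "action" key
    return {"action": action, **params}


async def click_at(session, x, y, button="left"):
    """Move to (x, y), then click, via any object with an async call_tool(name, args)."""
    await session.call_tool("mouse_action", mouse_args("move", x=x, y=y))
    return await session.call_tool("mouse_action", mouse_args("click", button=button))
```

With a connected session, `await click_at(session, 100, 200)` roughly mirrors `desktop-cli mouse move -x 100 -y 200` followed by `desktop-cli mouse click`.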
🧪 Testing
Automated Tests
```bash
# Run all tests
python -m pytest
# Run specific test suites
python test_basic.py
python test_comprehensive.py
python test_screenshot.py
```
Manual Testing
```bash
# Test screenshot functionality
python run_screenshot_tests.py
# Review test results
python review_screenshots.py
```
📋 Requirements
- Python: 3.8 or higher (note that the `mcp` SDK itself requires Python 3.10 or newer)
- Operating Systems: Linux, macOS, or Windows
- Display Server:
  - Linux: X11 or Wayland
  - macOS: Aqua
  - Windows: Windows Display Driver
- Dependencies:
  - mcp>=1.0.0
  - Pillow>=9.0.0
  - pynput>=1.7.0
  - mss>=9.0.0
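A quick way to confirm that those minimum versions are installed is sketched below with the standard-library `importlib.metadata`. The version parser is deliberately naive: it keeps only the numeric release segment, so `1.0.0rc1` would count as `1.0.0`. `packaging.version.Version` is the robust choice if that library is available. The helper names are hypothetical.

```python
import re
from importlib import metadata

# Minimum versions from the Requirements list above
MINIMUMS = {"mcp": "1.0.0", "Pillow": "9.0.0", "pynput": "1.7.0", "mss": "9.0.0"}


def parse_version(text):
    """Keep only the leading numeric release segment, e.g. '10.2.0rc1' -> (10, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare release tuples after zero-padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_dependencies(minimums=MINIMUMS):
    """Map each distribution to True/False (version ok) or None (not installed)."""
    report = {}
    for dist, minimum in minimums.items():
        try:
            report[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    for dist, status in check_dependencies().items():
        print(f"{dist}: {'missing' if status is None else 'ok' if status else 'too old'}")
```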
🔧 Troubleshooting
Screenshot Issues
- Linux: Ensure X11/Wayland display access, for example: `xhost +SI:localuser:$USER`
- macOS: Grant screen recording permissions
- Windows: Run as administrator if needed
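When screenshots fail on Linux, it helps to know which display server the session is using. The helper below is a hypothetical diagnostic, not part of the package, built on the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables. Note that `xhost` adjusts access control on an X server, so it only affects X11 sessions (or XWayland under Wayland).

```python
import os
import sys


def detect_display_server(env=None, plat=None):
    """Guess the display server from the platform and session environment variables."""
    env = os.environ if env is None else env
    plat = sys.platform if plat is None else plat
    if plat == "darwin":
        return "aqua"
    if plat.startswith("win"):
        return "windows"
    # XDG_SESSION_TYPE is set by most modern Linux login managers
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("x11", "wayland"):
        return session
    # Fall back to the display variables a graphical session exports
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "none"


if __name__ == "__main__":
    print(detect_display_server())
```

A result of `none` usually means the server was started outside a graphical session (for example over SSH without `DISPLAY` set), in which case screen capture is likely to fail.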
Permission Errors
```bash
# Add user to input group (Linux)
sudo adduser $USER input
# Restart session or reboot after adding to group
```
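Group changes only apply to processes started after a new login, which is why the session restart is needed. The sketch below uses the POSIX-only `grp` module to check whether the running process already has the `input` group; `in_group` is a hypothetical helper.

```python
import os


def in_group(name):
    """Return True if the running process already has group `name` (POSIX only)."""
    try:
        import grp  # not available on Windows
    except ImportError:
        return False
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:  # no such group on this system
        return False
    return gid == os.getgid() or gid in os.getgroups()


if __name__ == "__main__":
    print("input group active:", in_group("input"))
```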
Dependency Issues
```bash
# Install system dependencies (Ubuntu/Debian)
sudo apt-get install python3-dev python3-pip scrot
# Install system dependencies (CentOS/RHEL)
sudo yum install python3-devel python3-pip scrot
```
🛠️ Development
Setup Development Environment
```bash
# Clone repository
git clone https://github.com/your-username/desktop-access-mcp-server.git
cd desktop-access-mcp-server
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install pytest black flake8
```
Code Quality
```bash
# Format code
black .
# Check code style
flake8 .
# Run tests
python -m pytest
```
Project Structure
```
desktop-access-mcp-server/
├── desktop_access_mcp_server/   # Main package
│   ├── __init__.py              # Package init
│   ├── __main__.py              # MCP server entry point
│   ├── cli.py                   # Command-line interface
│   └── desktop_controller.py    # Core functionality
├── test_*.py                    # Test files
├── run_screenshot_tests.py      # Screenshot test suite
├── review_screenshots.py        # Screenshot review tool
├── requirements.txt             # Python dependencies
├── pyproject.toml               # Package configuration
└── README.md                    # This file
```
🤝 Contributing
Contributions are welcome! Here's how you can help:
- Report Bugs: Use the issue tracker to report bugs
- Suggest Features: Request new capabilities
- Submit Pull Requests: Fix bugs or add features
- Improve Documentation: Enhance this README or add guides
Development Guidelines
- Follow PEP 8 style guide
- Write tests for new functionality
- Document public APIs
- Keep dependencies minimal
- Ensure cross-platform compatibility
📚 Resources
- LLM Agent Guide - How LLM agents can use this server
- Testing Guide - Comprehensive testing documentation
- MCP Documentation - Official MCP specification
- Example Usage - Sample implementation
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Model Context Protocol for the standardized interface
- pynput for cross-platform input control
- Pillow for image processing capabilities
- MSS for fast screenshot capture
Built with ❤️ for the AI agent community
Enabling LLMs to see and interact with the world beyond text
Download files
File details
Details for the file desktop_access_mcp_server-0.1.0.tar.gz.
File metadata
- Download URL: desktop_access_mcp_server-0.1.0.tar.gz
- Upload date:
- Size: 2.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c5f467b0c290eaacb7c92d5af2d80b65bf52032e47df49de72c5aefb8cd4235 |
| MD5 | 69e7f848c40a52842371b23a8b571ed6 |
| BLAKE2b-256 | f4187fe89f18d8ec221520858e6b70aea0c9a120ab3ebcdd5c5db23cafdc73c9 |
File details
Details for the file desktop_access_mcp_server-0.1.0-py3-none-any.whl.
File metadata
- Download URL: desktop_access_mcp_server-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b438212cb3d6270a17f0ac3f86a793ef0f7751376385215c21a475e28e0c217 |
| MD5 | 1bc513ff716df584843b7307cabe6429 |
| BLAKE2b-256 | d48571c52faeb70b8b7e21d14b96b5cbcb1493be3e0a377c77569092b747eb64 |
File details
Details for the file desktop_access_mcp_server-0.1.0-py2.py3-none-any.whl.
File metadata
- Download URL: desktop_access_mcp_server-0.1.0-py2.py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9be0369c9c0f616740bc242752f995f36b2d16d725c0c7f2812b1b4fa640267d |
| MD5 | d634519f482cf711e12106723baffdd1 |
| BLAKE2b-256 | 0619def3406f9fe8635f123bd90ca386c73eb295636f66c80ec46559b331d69d |
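The published digests can be checked locally before installing a downloaded file. The sketch below hashes a file in chunks with the standard `hashlib` module; the BLAKE2b-256 column corresponds to BLAKE2b with a 32-byte digest. `pip hash <file>` also prints a SHA256 digest. `file_digest` and `verify` are hypothetical helper names, and the expected value is the SHA256 listed above for the `py3-none-any` wheel.

```python
import hashlib
from pathlib import Path

# SHA256 published above for desktop_access_mcp_server-0.1.0-py3-none-any.whl
EXPECTED_SHA256 = "4b438212cb3d6270a17f0ac3f86a793ef0f7751376385215c21a475e28e0c217"


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks; 'blake2b-256' means BLAKE2b with a 32-byte digest."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, expected, algorithm="sha256"):
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected.lower()


if __name__ == "__main__":
    wheel = Path("desktop_access_mcp_server-0.1.0-py3-none-any.whl")
    if wheel.exists():
        print("OK" if verify(wheel, EXPECTED_SHA256) else "MISMATCH")
```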