Windows desktop-control MCP server: screenshot, window mgmt, mouse/keyboard input, and screen-recording. Config-gated tool groups (observe/window/input/record) -- input disabled by default. Local-only, single-machine.
desktop-mcp
Windows desktop-control MCP server: screenshot, window management, mouse/keyboard input, and ffmpeg screen-recording -- built to the same standard as mcp-factory and rag-mcp (own pyproject, fastmcp server, honest README, real test suite). Config-gated tool groups, input disabled by default.
Tool groups
| Group | Tools | Default state |
|---|---|---|
| observe | screenshot, list_windows, get_active_window, window_info | always on |
| window | focus_window, move_resize_window, minimize_window, restore_window | env-gated, off unless DESKTOP_MCP_ENABLE_WINDOW=1 |
| input | mouse_move, mouse_click, mouse_drag, mouse_scroll, key_press, hotkey, type_text | env-gated, OFF by default -- requires DESKTOP_MCP_ENABLE_INPUT=1 |
| record | record_start, record_status, record_stop | env-gated, off unless DESKTOP_MCP_ENABLE_RECORD=1 |
A disabled group returns a structured policy_refusal error (never a silent
no-op, never a crash). Input actions are additionally rate-capped (default 60
actions/min, tunable via DESKTOP_MCP_RATE_LIMIT_PER_MIN) -- exceeding the
cap returns a structured rate_limited error.
This is defense-in-depth: harness-level permission prompts are the first gate, but the server itself refuses input/window/record actions unless its own env explicitly enables them, so a misconfigured or overly-permissive harness can't turn on capabilities the operator didn't opt into for this process.
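The gating described above can be sketched as a small decorator. This is a minimal illustration, not the project's actual code: the env var names and the policy_refusal fields come from this README, while the GROUP_ENV dict, the function bodies, and the optional env parameter (added so the check can be exercised without touching the real environment) are assumptions.

```python
import functools
import os
from typing import Any, Callable, Mapping, Optional

# Illustrative mapping of gated groups to the env vars listed in the table above.
GROUP_ENV = {
    "window": "DESKTOP_MCP_ENABLE_WINDOW",
    "input": "DESKTOP_MCP_ENABLE_INPUT",
    "record": "DESKTOP_MCP_ENABLE_RECORD",
}


def group_enabled(group: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """observe is always on; any other group needs its env var set to "1"."""
    env = os.environ if env is None else env
    if group == "observe":
        return True
    var = GROUP_ENV.get(group)
    return var is not None and env.get(var) == "1"


def gated(group: str, env: Optional[Mapping[str, str]] = None) -> Callable:
    """Wrap a tool so a disabled group yields a structured refusal, not a no-op."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Checked on every call, so the tool body never runs while disabled.
            if not group_enabled(group, env):
                return {
                    "ok": False,
                    "error": {
                        "type": "policy_refusal",
                        "group": group,
                        "required_env": GROUP_ENV.get(group),
                    },
                }
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

Returning a dict rather than raising keeps the refusal visible to the MCP host as ordinary tool output, which matches the "never a crash" wording above.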
Honest-capabilities table
Every claim below maps to the file that implements it and the test(s) that verify it -- no capability is asserted without a corresponding implementation + test.
| Claim | Implementation | Verified by |
|---|---|---|
| Capture a screenshot (monitor / region / window) as PNG | desktop_mcp/groups/observe.py::screenshot | tests/test_observe.py::TestScreenshot, live: tests/test_live_smoke.py::test_live_screenshot_real_png |
| Enumerate windows / get active window / look up by title | desktop_mcp/groups/observe.py | tests/test_observe.py::TestListWindows, TestGetActiveWindow, TestWindowInfo |
| Focus / move+resize / minimize / restore a window | desktop_mcp/groups/window.py | tests/test_window.py |
| Mouse move/click/drag/scroll, key press/hotkey, type text | desktop_mcp/groups/input_tools.py | tests/test_input.py (mocked pyautogui only -- see Limitations) |
| Screen recording via ffmpeg gdigrab, hard duration cap, graceful stop | desktop_mcp/groups/record.py | tests/test_record.py, live: tests/test_live_smoke.py::test_live_record_real_mp4 |
| Input group OFF by default, structured refusal when disabled | desktop_mcp/config.py::group_enabled, gated | tests/test_config.py::TestGroupEnabled, tests/test_input.py::TestGateDisabledByDefault |
| Rate cap on input actions (default 60/min) | desktop_mcp/config.py::TokenBucket, RateLimiterRegistry | tests/test_config.py::TestTokenBucket, tests/test_input.py::TestRateLimit |
| DPI-awareness bootstrap (per-monitor-v2) so mss/pyautogui coords agree on scaled displays | desktop_mcp/config.py::ensure_dpi_awareness | tests/test_config.py::TestDpiAwareness (idempotency only; visual coord-agreement is not automated -- see Limitations) |
| Coordinate validation against virtual-desktop bounds before any mouse action | desktop_mcp/groups/input_tools.py::_validate_point | tests/test_input.py (out-of-bounds cases) |
| Orphan-guard: a stale recorder from a crashed process gets killed before a new one starts | desktop_mcp/groups/record.py::_orphan_guard | tests/test_record.py::TestRecordStart::test_orphan_guard_kills_stale_recorder |
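As an illustration of the coordinate-validation row, here is a minimal sketch of a bounds check against the virtual desktop. The Bounds type, the validate_point name, and the out_of_bounds error type are hypothetical; the real _validate_point may differ. The one detail worth showing is that the virtual desktop's origin can be negative when a monitor sits left of or above the primary one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bounds:
    """Virtual-desktop rectangle; left/top may be negative on multi-monitor setups."""
    left: int
    top: int
    width: int
    height: int


def validate_point(x: int, y: int, bounds: Bounds) -> Optional[dict]:
    """Return None when (x, y) is inside bounds, else a structured error dict."""
    right = bounds.left + bounds.width    # exclusive edge
    bottom = bounds.top + bounds.height   # exclusive edge
    if bounds.left <= x < right and bounds.top <= y < bottom:
        return None
    return {
        "ok": False,
        "error": {
            "type": "out_of_bounds",  # hypothetical error name
            "point": [x, y],
            "bounds": [bounds.left, bounds.top, right, bottom],
        },
    }
```

For example, with a 1920-pixel-wide monitor to the left of a 3840-pixel-wide primary, Bounds(-1920, 0, 5760, 1080) accepts (-100, 500) and rejects (4000, 500).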
Limitations (read before relying on this)
- UIPI (User Interface Privilege Isolation). A medium-integrity process (this server, unless you elevate it) cannot send input to or manipulate windows owned by a higher-integrity (elevated/admin) process. Window and input tools surface this as a structured window_action_failed / input_failed error naming UIPI, never a silent no-op -- but there is no workaround short of running the server elevated, which this project does not do or recommend.
- DPI scaling. The server sets per-monitor-v2 DPI awareness at startup so mss pixel coordinates and pyautogui point coordinates should agree on scaled displays. This bootstrap is unit-tested for idempotency/no-crash only -- actual coordinate agreement on a live multi-DPI multi-monitor setup has not been automated-tested and should be spot-checked if you're targeting a non-100%-scaled monitor.
- UAC secure desktop. When Windows switches to the secure desktop (UAC elevation prompts, Ctrl+Alt+Del, lock screen), no process running on the regular desktop -- including this server -- can see or interact with it. Screenshots will show whatever was on the regular desktop before the switch; input calls will not reach the secure desktop at all.
- Single machine, local only. No network transport, no remote control. stdio only, spawned by the MCP host on the same machine.
- Input group is off by default in this repo's own registration. See ~/.claude.json's desktop-mcp entry -- DESKTOP_MCP_ENABLE_INPUT is not set there. Enabling it is a deliberate per-registration operator choice, not a code change.
- pyautogui failsafe. FAILSAFE=True is intentional: slamming the cursor into a screen corner mid-action raises inside pyautogui and aborts the call. This can interrupt an in-flight mouse_drag. Treated as acceptable v1 behavior (see plan's Open questions) -- it's a deliberate human kill-switch, not a bug.
- No OCR / vision analysis. Screenshots are raw PNGs; interpreting their content is the consumer's job, not this server's.
- No clipboard tools. Credential-adjacent surface, deferred to a v2 with its own safety design.
- Not registered with the mcp-factory hub. This ships as a standalone repo (own pyproject, own venv-free system-Python312 install), matching the rag-mcp model. Hub/registry integration is a v2 candidate.
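For the DPI scaling item above, a per-monitor-v2 bootstrap could look roughly like the sketch below. It assumes the Win32 SetProcessDpiAwarenessContext call (available from Windows 10 version 1703) is used through ctypes; the function name is borrowed from the capabilities table, but the body and the module-level flag are illustrative, not the project's code.

```python
import ctypes
import sys

# Windows defines DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2 as the pseudo-handle -4.
_PER_MONITOR_AWARE_V2 = -4
_done = False  # remembers a successful call so later calls are no-ops


def ensure_dpi_awareness() -> bool:
    """Return True if this call, or an earlier one, set per-monitor-v2 awareness."""
    global _done
    if _done:
        return True
    if sys.platform != "win32":
        return False  # nothing to do off Windows
    try:
        ok = ctypes.windll.user32.SetProcessDpiAwarenessContext(
            ctypes.c_void_p(_PER_MONITOR_AWARE_V2)
        )
    except (AttributeError, OSError):
        return False  # API not present on this Windows build, or the call failed
    _done = bool(ok)
    return _done
```

The flag matters because Windows rejects a second attempt to change process DPI awareness once it has been set, so a naive repeat call would report failure even though the process is already DPI-aware.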
Env vars
| Var | Effect | Default |
|---|---|---|
| DESKTOP_MCP_ENABLE_WINDOW | enable the window tool group | unset (off) |
| DESKTOP_MCP_ENABLE_INPUT | enable the input tool group | unset (off) |
| DESKTOP_MCP_ENABLE_RECORD | enable the record tool group | unset (off) |
| DESKTOP_MCP_RATE_LIMIT_PER_MIN | input-group rate cap | 60 |
| DESKTOP_MCP_SCRATCH_DIR | where screenshots/recordings/pidfiles are written | %TEMP%\desktop-mcp-scratch |
| DESKTOP_MCP_LIVE | 1 to run real-hardware smoke tests (see Testing) | unset (skip) |
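To make the DESKTOP_MCP_RATE_LIMIT_PER_MIN setting concrete, here is a minimal token-bucket sketch. The class name echoes the TokenBucket mentioned in the capabilities table, but the implementation, the try_acquire method, the injectable clock, and the fall-back-to-default handling of invalid values are all assumptions for illustration.

```python
import os
import time
from typing import Callable, Mapping, Optional


class TokenBucket:
    """Allows bursts up to per_minute actions, refilling at per_minute / 60 per second."""

    def __init__(self, per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = float(per_minute)
        self.refill_per_s = per_minute / 60.0
        self.tokens = self.capacity   # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the cap is exceeded."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def rate_limit_from_env(env: Optional[Mapping[str, str]] = None, default: int = 60) -> int:
    """Read the cap from the env; missing, non-numeric, or non-positive values use the default."""
    env = os.environ if env is None else env
    try:
        value = int(env.get("DESKTOP_MCP_RATE_LIMIT_PER_MIN", ""))
    except ValueError:
        return default
    return value if value > 0 else default
```

A caller would check try_acquire() before each input action and return the structured rate_limited error when it comes back False.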
Usage examples
// A tool call from the MCP host, illustrative -- not a shell command.
{"tool": "screenshot", "arguments": {"monitor": 0}}
// -> {"ok": true, "path": "C:\\Users\\...\\Temp\\desktop-mcp-scratch\\screenshot-....png", "w": 3840, "h": 1080, "monitor": 0}
{"tool": "record_start", "arguments": {"fps": 30, "max_duration_s": 30}}
// -> {"ok": true, "path": "...\\recording-....mp4", "pid": 12345, "fps": 30, "max_duration_s": 30}
{"tool": "record_stop", "arguments": {}}
// -> {"ok": true, "path": "...\\recording-....mp4", "bytes": 800560, "duration_s": 3.13}
// input group disabled (default):
{"tool": "mouse_click", "arguments": {"x": 500, "y": 500}}
// -> {"ok": false, "error": {"type": "policy_refusal", "group": "input", "required_env": "DESKTOP_MCP_ENABLE_INPUT", ...}}
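For orientation, the record_start / record_stop pair above could map onto ffmpeg's gdigrab input device roughly as follows. This is a sketch under stated assumptions: the helper names, the 300-second hard cap, and the exact flag set are hypothetical and not taken from record.py. The flags themselves (-f gdigrab, -framerate, -t, -i desktop, -pix_fmt) are standard ffmpeg options, and ffmpeg finalizes its output when it reads q on stdin.

```python
import subprocess
from typing import List, Optional


def build_record_argv(out_path: str, fps: int = 30, max_duration_s: int = 30,
                      hard_cap_s: int = 300) -> List[str]:
    """Build an ffmpeg command that captures the whole desktop via gdigrab."""
    duration = min(max_duration_s, hard_cap_s)  # hard_cap_s is a hypothetical ceiling
    return [
        "ffmpeg", "-y",
        "-f", "gdigrab",           # Windows GDI screen-capture input device
        "-framerate", str(fps),
        "-t", str(duration),       # ffmpeg stops by itself after this many seconds
        "-i", "desktop",           # gdigrab name for the entire desktop
        "-pix_fmt", "yuv420p",     # widely playable pixel format for mp4
        out_path,
    ]


def stop_recording(proc: subprocess.Popen, timeout_s: float = 10.0) -> Optional[int]:
    """Ask ffmpeg to quit by sending "q" on stdin; kill it only if it does not exit.

    Assumes proc was started with stdin=subprocess.PIPE.
    """
    try:
        proc.communicate(input=b"q", timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode
```

Passing the duration to ffmpeg itself means the cap holds even if the server process dies before it can call stop.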
Testing
# unit suite (mocked backends, no real screen/input/recording touched)
python -m pytest -q
# handshake check -- prints every registered tool name
python scripts/list_tools.py
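# real-hardware smokes (real screenshot PNG, real ~3s screen recording;
# never input-injection -- see safety rails above).
# The VAR=1 prefix is POSIX-shell syntax (e.g. Git Bash); in PowerShell, run
# $env:DESKTOP_MCP_LIVE = "1" first, then the pytest command without the prefix.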
DESKTOP_MCP_LIVE=1 python -m pytest -q -k live_screenshot
DESKTOP_MCP_LIVE=1 python -m pytest -q -k live_record
Install
pip install -r requirements.txt # or: pip install .
# deps: fastmcp==3.4.2, mss==10.2.0, pyautogui==0.9.54, PyGetWindow==0.0.9, pywin32==312
# also requires ffmpeg + ffprobe on PATH for the record group
Registered in ~/.claude.json as desktop-mcp (stdio, system Python312,
observe+window+record groups enabled, input group absent from env).
Commercial support
Maintained by Jaimen Bell. For production MCP integrations, custom servers, or agent-reliability work, see jaimenbell.dev or sponsor ongoing maintenance via GitHub Sponsors.
mcp-name: io.github.jaimenbell/desktop-mcp