Local-first MCP server for desktop automation: screenshots, mouse, keyboard
Project description
vadgr-computer-use
Local MCP server for desktop automation. 13 tools for capture, mouse, keyboard, and platform introspection. The calling agent takes a screenshot, reasons over the pixels, and drives mouse/keyboard through the server.
Tested with Claude Code, Codex CLI, and Gemini CLI (same server, same tools, same prompt).
Platforms: works on Linux (X11 and Wayland incl. GNOME), Windows native, and WSL2. macOS support is a work in progress and not usable yet. See Platform support for detail.
Install
pip install vadgr-computer-use
That ships a console script called vadgr-cua. Verify:
vadgr-cua doctor
# {"daemon_running": false, "windows_python": null, "port": 19542, ...}
On WSL2, the bridge daemon auto-launches the first time a tool is called. On other platforms it's a no-op; direct backends handle everything.
Wire it into your agent
Pick your client. The server command is vadgr-cua --transport stdio in every case. Each agent launches that stdio process itself, so it needs the full path to the binary unless vadgr-cua is already on the agent's PATH.
First, find the path:
which vadgr-cua
# global install: /home/you/.local/bin/vadgr-cua
# venv install: /path/to/.venv/bin/vadgr-cua
Substitute that path in each config below.
Claude Code
Project-level (.mcp.json at the repo root you want to automate from):
{
"mcpServers": {
"vadgr-computer-use": {
"type": "stdio",
"command": "/path/to/vadgr-cua",
"args": ["--transport", "stdio"]
}
}
}
User-level (add to ~/.claude.json under mcpServers with the same shape).
Verify: claude mcp list should print vadgr-computer-use: ... ✓ Connected.
Codex CLI
Add to ~/.codex/config.toml:
[mcp_servers.vadgr-computer-use]
command = "/path/to/vadgr-cua"
args = ["--transport", "stdio"]
Verify: codex mcp list should list vadgr-computer-use with status enabled.
Gemini CLI
gemini mcp add --scope user --trust \
vadgr-computer-use /path/to/vadgr-cua \
-- --transport stdio
That writes ~/.gemini/settings.json. Verify by running an interactive session: Gemini shows MCP tool calls inline.
Try it
Once the wire-up is done, any of these commands launch the client, which starts vadgr-cua --transport stdio in the background via MCP, and drives your desktop. Same prompt, same tools: pick the client you already use.
Sanity check (focus + Ctrl+A):
Take a screenshot, tell me in one sentence what application is in focus,
then press Ctrl+A and take another screenshot to confirm the action.
Claude Code
Interactive (most common):
claude --dangerously-skip-permissions
# then paste the prompt at the > cursor
Headless one-shot:
claude --dangerously-skip-permissions -p \
"Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again."
Codex CLI
Headless one-shot (the usual way to drive Codex):
codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \
"Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again."
Expected output (abbreviated):
mcp: vadgr-computer-use/screenshot (completed)
mcp: vadgr-computer-use/key_press (completed)
mcp: vadgr-computer-use/screenshot (completed)
The focused app is <...>; Ctrl+A selected its content.
Gemini CLI
Works end-to-end, but pixel grounding on full-screen shots is weaker than Claude/Codex: first-attempt clicks on small targets can miss by 20-60 px (the model usually recovers via screenshot_region crops). Pass the model explicitly, since the default may silently fall back to an older Gemini on some accounts:
gemini -m gemini-3.1-pro-preview -p \
"Use only vadgr-computer-use tools. Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again." \
-y --allowed-mcp-server-names vadgr-computer-use
Fuller example: play a song on YouTube Music (Codex)
A Chrome window is already open with a "YouTube Music" tab. One call:
codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \
"Use only vadgr-computer-use MCP tools. In the already-open Chrome,
switch to the YouTube Music tab, search 'Space Oddity David Bowie',
and play the first result."
Real transcript (trimmed):
mcp: vadgr-computer-use/screenshot (completed)
mcp: vadgr-computer-use/click (completed) # YouTube Music tab
mcp: vadgr-computer-use/click (completed) # search box
mcp: vadgr-computer-use/type_text (completed)
mcp: vadgr-computer-use/key_press (completed) # enter
mcp: vadgr-computer-use/click (completed) # first result
mcp: vadgr-computer-use/click (completed) # dismiss ad overlay
mcp: vadgr-computer-use/screenshot (completed) # verify now-playing bar
Yes, "Space Oddity" by David Bowie is now playing.
How it works
The LLM owns the "where to click" decision; the server owns "how to click it precisely". No other abstraction in between.
Platform support
| Platform | Screenshots | Mouse / keyboard | Install notes |
|---|---|---|---|
| Linux / X11 | mss |
xdotool |
apt install xdotool (or distro equivalent) |
| Linux / Wayland (GNOME) | gnome-screenshot |
Mutter RemoteDesktop via jeepney |
nothing extra; pre-installed on stock GNOME, deps pulled by pip |
| Linux / Wayland (Sway, Hyprland, wlroots) | grim |
evdev |
apt install grim; sudo usermod -aG input $USER then re-login |
| Windows native | Win32 GDI | SendInput | nothing extra |
| WSL2 → Windows host | TCP bridge daemon (mss on Windows) |
TCP bridge daemon (Win32 SendInput) |
bridge daemon auto-launches |
| macOS | screencapture |
osascript / cliclick |
WIP, not functional yet |
pip install vadgr-computer-use pulls jeepney and evdev automatically on Linux (both are pure-Python or shipped as wheels, no libdbus-1-dev or compilation needed). Foreground-window detection on Wayland uses AT-SPI2 if available; install with pip install vadgr-computer-use[linux-atspi] to enable it.
If the WSL2 daemon can't start (e.g. no Windows Python available), the server falls back to a slower PowerShell path. See Daemon management below.
MCP tools (13)
Capture (2)
screenshot(): full screen, downscaled toCU_MAX_WIDTH(auto-picks 1024 / 1280 / 1366).screenshot_region(x, y, w, h): cropped region.
Input (8)
click(x, y)/double_click(x, y)/right_click(x, y)move_mouse(x, y)/drag(start_x, start_y, end_x, end_y, duration=0.5)scroll(x, y, amount): positive = up, negative = downtype_text(text)/key_press(keys): keys likectrl+s,alt+tab,enter
Platform info (3)
get_platform()/get_platform_info()/get_screen_size()
Daemon management (WSL2)
Most users never touch this. For when you do:
vadgr-cua doctor # JSON: platform, Windows Python, daemon state, port, hash
vadgr-cua install-daemon # Eager deploy + launch
vadgr-cua stop-daemon # Kill the running daemon
vadgr-cua restart-daemon # Stop then start
The daemon file is deployed to %USERPROFILE%\vadgr\daemon.py and listens on TCP 127.0.0.1:19542. After pip install -U vadgr-computer-use, the next MCP session detects the version-hash drift via a ping handshake and redeploys the daemon automatically.
Library usage
from computer_use import ComputerUseEngine
engine = ComputerUseEngine()
shot = engine.screenshot()
engine.click(500, 300)
engine.type_text("hello")
The library is just the input/capture primitives, no LLM or agent loop inside. To drive it with a model, point an MCP client (Claude Code, Codex, Gemini, or your own) at the vadgr-cua server as shown above.
Environment
| Variable | Purpose |
|---|---|
CU_MAX_WIDTH |
Override screenshot downscale target (default: auto 1024/1280/1366) |
CUE_BRIDGE_PORT |
Override WSL2 bridge daemon TCP port (default: 19542) |
VADGR_DEBUG |
Set to 1 to dump screenshots to <package>/.debug/ |
Tests
pip install -e ".[dev]"
pytest computer_use/tests -q
License
Apache 2.0. See LICENSE.
Part of Vadgr
- vadgr: workflow engine (brain)
- vadgr-computer-use: desktop automation MCP (eyes)
- vadgr-agent-os: containerized agent runtime
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vadgr_computer_use-0.1.2.tar.gz.
File metadata
- Download URL: vadgr_computer_use-0.1.2.tar.gz
- Upload date:
- Size: 66.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
61549486075dce405ed61d4b2019c92d7bc54b24e99065a6db33815afa059f73
|
|
| MD5 |
d210107a2e1e99c04a2cdb3c99be1f1b
|
|
| BLAKE2b-256 |
4d61341e54a046ccb091dc2de3402db89dfcbba2018738797c1cfff1c27a7356
|
Provenance
The following attestation bundles were made for vadgr_computer_use-0.1.2.tar.gz:
Publisher:
publish.yml on MONTBRAIN/vadgr-computer-use
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
vadgr_computer_use-0.1.2.tar.gz -
Subject digest:
61549486075dce405ed61d4b2019c92d7bc54b24e99065a6db33815afa059f73 - Sigstore transparency entry: 1376396706
- Sigstore integration time:
-
Permalink:
MONTBRAIN/vadgr-computer-use@6055361be2d0551ed4ce6d2b061bb6b3a31ed3a1 -
Branch / Tag:
refs/tags/v0.1.2 - Owner: https://github.com/MONTBRAIN
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@6055361be2d0551ed4ce6d2b061bb6b3a31ed3a1 -
Trigger Event:
push
-
Statement type:
File details
Details for the file vadgr_computer_use-0.1.2-py3-none-any.whl.
File metadata
- Download URL: vadgr_computer_use-0.1.2-py3-none-any.whl
- Upload date:
- Size: 76.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1451e6ce669f24b55a244517b2f4979fd00bf3093231bdef7034c403c6037d3e
|
|
| MD5 |
d4a44d6c63ca1c604be8e68a0a2e2cc4
|
|
| BLAKE2b-256 |
5616c5e1e0cbb13843b4f9e29df89fd17bd501d7b5bcd0e0cdf4b3f65fbed9a9
|
Provenance
The following attestation bundles were made for vadgr_computer_use-0.1.2-py3-none-any.whl:
Publisher:
publish.yml on MONTBRAIN/vadgr-computer-use
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
vadgr_computer_use-0.1.2-py3-none-any.whl -
Subject digest:
1451e6ce669f24b55a244517b2f4979fd00bf3093231bdef7034c403c6037d3e - Sigstore transparency entry: 1376396713
- Sigstore integration time:
-
Permalink:
MONTBRAIN/vadgr-computer-use@6055361be2d0551ed4ce6d2b061bb6b3a31ed3a1 -
Branch / Tag:
refs/tags/v0.1.2 - Owner: https://github.com/MONTBRAIN
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@6055361be2d0551ed4ce6d2b061bb6b3a31ed3a1 -
Trigger Event:
push
-
Statement type: