Local-first MCP server for desktop automation: screenshots, mouse, keyboard

vadgr-computer-use

Local MCP server for desktop automation. 13 tools for capture, mouse, keyboard, and platform introspection. The calling agent takes a screenshot, reasons over the pixels, and drives mouse/keyboard through the server.

Tested with Claude Code, Codex CLI, and Gemini CLI (same server, same tools, same prompt).

Platforms: Linux (X11, and Wayland including GNOME and wlroots compositors), native Windows, WSL2, and macOS. On first run, macOS asks for Accessibility and Screen Recording permissions; see First run on macOS. See Platform support for details.

Install

pip install vadgr-computer-use

This installs a console script named vadgr-cua. Verify the install:

vadgr-cua doctor
# {"daemon_running": false, "windows_python": null, "port": 19542, ...}

On WSL2, the bridge daemon auto-launches the first time a tool is called. Other platforms don't use the daemon; their direct backends handle everything.
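If you want to script that check (for example in a setup script or CI), a minimal Python sketch follows. It assumes doctor prints a single JSON object on stdout, as the sample above suggests; keys beyond the ones shown are not documented here, and parse_doctor / run_doctor are hypothetical helpers, not part of the package.

```python
import json
import shutil
import subprocess
from typing import Optional


def parse_doctor(output: str) -> dict:
    """Parse the JSON object printed by `vadgr-cua doctor` (assumed: one object on stdout)."""
    return json.loads(output)


def run_doctor() -> Optional[dict]:
    """Run `vadgr-cua doctor` if the console script is on PATH; return None otherwise."""
    exe = shutil.which("vadgr-cua")
    if exe is None:
        return None
    result = subprocess.run([exe, "doctor"], capture_output=True, text=True, check=True)
    return parse_doctor(result.stdout)


if __name__ == "__main__":
    report = run_doctor()
    if report is None:
        print("vadgr-cua is not on PATH")
    else:
        print("daemon_running:", report.get("daemon_running"))
```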

Wire it into your agent

Pick your client. The server command is vadgr-cua --transport stdio in every case. Each agent launches that stdio process itself, so it needs the full path to the binary unless vadgr-cua is already on the agent's PATH.

First, find the path:

which vadgr-cua
# global install: /home/you/.local/bin/vadgr-cua
# venv install:  /path/to/.venv/bin/vadgr-cua

Substitute that path in each config below.

Claude Code

Project-level (.mcp.json at the repo root you want to automate from):

{
  "mcpServers": {
    "vadgr-computer-use": {
      "type": "stdio",
      "command": "/path/to/vadgr-cua",
      "args": ["--transport", "stdio"]
    }
  }
}

User-level (add to ~/.claude.json under mcpServers with the same shape).

Verify: claude mcp list should print vadgr-computer-use: ... ✓ Connected.
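To add the project-level entry from a script instead of by hand, a sketch like the following merges the same mcpServers shape into an existing .mcp.json without touching other servers. It assumes vadgr-cua is on the PATH of the process running the script; add_server and write_project_config are hypothetical helper names.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Dict

SERVER_NAME = "vadgr-computer-use"


def add_server(config: Dict[str, Any], command: str) -> Dict[str, Any]:
    """Return a copy of `config` with the vadgr-computer-use stdio entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers[SERVER_NAME] = {
        "type": "stdio",
        "command": command,
        "args": ["--transport", "stdio"],
    }
    return {**config, "mcpServers": servers}


def write_project_config(repo_root: Path) -> Path:
    """Create or update <repo_root>/.mcp.json, keeping any servers already listed."""
    command = shutil.which("vadgr-cua")
    if command is None:
        raise FileNotFoundError("vadgr-cua is not on PATH; edit .mcp.json by hand with its full path")
    path = repo_root / ".mcp.json"
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_server(existing, command), indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print("wrote", write_project_config(Path.cwd()))
```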

Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.vadgr-computer-use]
command = "/path/to/vadgr-cua"
args = ["--transport", "stdio"]

Verify: codex mcp list should list vadgr-computer-use with status enabled.

Gemini CLI

gemini mcp add --scope user --trust \
  vadgr-computer-use /path/to/vadgr-cua \
  -- --transport stdio

That writes ~/.gemini/settings.json. Verify by running an interactive session: Gemini shows MCP tool calls inline.

Try it

Once the wiring is done, each command below launches the client, which starts vadgr-cua --transport stdio in the background over MCP and drives your desktop. Same prompt, same tools: pick the client you already use.

Sanity check (focus + Ctrl+A):

Take a screenshot, tell me in one sentence what application is in focus,
then press Ctrl+A and take another screenshot to confirm the action.

Claude Code

Interactive (most common):

claude --dangerously-skip-permissions
# then paste the prompt at the > cursor

Headless one-shot:

claude --dangerously-skip-permissions -p \
  "Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again."

Codex CLI

Headless one-shot (the usual way to drive Codex):

codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \
  "Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again."

Expected output (abbreviated):

mcp: vadgr-computer-use/screenshot (completed)
mcp: vadgr-computer-use/key_press (completed)
mcp: vadgr-computer-use/screenshot (completed)
The focused app is <...>; Ctrl+A selected its content.

Gemini CLI

Works end-to-end, but pixel grounding on full-screen screenshots is weaker than with Claude or Codex: first-attempt clicks on small targets can miss by 20-60 px (the model usually recovers by zooming in with screenshot_region crops). Pass the model explicitly, because on some accounts the default may silently fall back to an older Gemini model:

gemini -m gemini-3.1-pro-preview -p \
  "Use only vadgr-computer-use tools. Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again." \
  -y --allowed-mcp-server-names vadgr-computer-use

Fuller example: play a song on YouTube Music (Codex)

A Chrome window is already open with a "YouTube Music" tab. One call:

codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \
  "Use only vadgr-computer-use MCP tools. In the already-open Chrome,
   switch to the YouTube Music tab, search 'Space Oddity David Bowie',
   and play the first result."

Real transcript (trimmed):

mcp: vadgr-computer-use/screenshot (completed)
mcp: vadgr-computer-use/click (completed)        # YouTube Music tab
mcp: vadgr-computer-use/click (completed)        # search box
mcp: vadgr-computer-use/type_text (completed)
mcp: vadgr-computer-use/key_press (completed)    # enter
mcp: vadgr-computer-use/click (completed)        # first result
mcp: vadgr-computer-use/click (completed)        # dismiss ad overlay
mcp: vadgr-computer-use/screenshot (completed)   # verify now-playing bar
Yes, "Space Oddity" by David Bowie is now playing.
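When you run headless calls from scripts, the "mcp: <server>/<tool> (<status>)" lines are an easy way to check which tools ran. A minimal parser, assuming Codex keeps printing exactly that line format (it is Codex's output, not something this package controls, and may change between Codex versions); any trailing text such as the "# ..." notes above is ignored. tool_calls is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches e.g. "mcp: vadgr-computer-use/click (completed)        # search box"
CALL_RE = re.compile(r"^mcp: (?P<server>[\w-]+)/(?P<tool>\w+) \((?P<status>\w+)\)")


def tool_calls(transcript: str) -> List[Tuple[str, str]]:
    """Return (tool, status) pairs for every MCP call line, in order of appearance."""
    calls = []
    for line in transcript.splitlines():
        match = CALL_RE.match(line.strip())
        if match:
            calls.append((match.group("tool"), match.group("status")))
    return calls


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): codex exec ... 2>&1 | python parse_calls.py
    for tool, status in tool_calls(sys.stdin.read()):
        print(tool, status)
```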

How it works

The LLM owns the "where to click" decision; the server owns "how to click it precisely". No other abstraction in between.
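One part of "how to click it precisely" is coordinate mapping: the capture tool downscales screenshots to CU_MAX_WIDTH, so a point the model picks on the image has to be scaled back to physical pixels. The README does not spell out how the server does this; the sketch below assumes uniform, aspect-preserving downscaling and click coordinates given in screenshot space, and ScaleMap is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScaleMap:
    """Hypothetical mapping between a downscaled screenshot and the physical screen."""

    screen_w: int
    screen_h: int
    max_width: int  # e.g. the CU_MAX_WIDTH value in effect

    @property
    def factor(self) -> float:
        # Never upscale: a screen narrower than max_width is captured 1:1.
        return min(1.0, self.max_width / self.screen_w)

    def shot_size(self) -> Tuple[int, int]:
        """Screenshot size after aspect-preserving downscaling."""
        return round(self.screen_w * self.factor), round(self.screen_h * self.factor)

    def to_screen(self, x: int, y: int) -> Tuple[int, int]:
        """Map a point picked on the screenshot back to physical pixels, clamped on-screen."""
        sx = min(self.screen_w - 1, max(0, round(x / self.factor)))
        sy = min(self.screen_h - 1, max(0, round(y / self.factor)))
        return sx, sy


if __name__ == "__main__":
    m = ScaleMap(screen_w=2560, screen_h=1440, max_width=1280)
    print(m.shot_size(), m.to_screen(640, 360))
```

For a 2560x1440 screen and a 1280-pixel target, the factor is 0.5, so the screenshot is 1280x720 and a click the model places at (640, 360) lands at (1280, 720) on the physical screen.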

Platform support

Platform	Screenshots	Mouse / keyboard	Install notes
Linux / X11	`mss`	`xdotool`	`apt install xdotool` (or distro equivalent)
Linux / Wayland (GNOME)	`gnome-screenshot`	Mutter RemoteDesktop via `jeepney`	nothing extra; pre-installed on stock GNOME, deps pulled by pip
Linux / Wayland (Sway, Hyprland, wlroots)	`grim`	`evdev`	`apt install grim`; `sudo usermod -aG input $USER` then re-login
Windows native	Win32 GDI	SendInput	nothing extra
WSL2 → Windows host	TCP bridge daemon (`mss` on Windows)	TCP bridge daemon (Win32 `SendInput`)	bridge daemon auto-launches
macOS	`mss`	Quartz `CGEvent` (via `pyobjc`)	nothing extra; deps pulled by pip. Grant Accessibility + Screen Recording on first run

pip install vadgr-computer-use pulls jeepney and evdev automatically on Linux (both are pure-Python or shipped as wheels, so no libdbus-1-dev or compilation is needed). Foreground-window detection on Wayland uses AT-SPI2 if available; to enable it, install the extra with pip install "vadgr-computer-use[linux-atspi]" (the quotes keep shells such as zsh from treating the brackets as a glob pattern).

On macOS, pip install vadgr-computer-use pulls pyobjc-framework-Quartz and pyobjc-framework-ApplicationServices (wheel install, no compilation). No Homebrew packages required.
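The table boils down to a few environment checks. A hypothetical heuristic for picking a row (not the package's actual detection code) might look like this; it treats every non-GNOME Wayland session as wlroots-based, which is a simplification (KDE Plasma, for instance, is not wlroots-based).

```python
import os
import platform
from typing import Mapping


def detect_backend(system: str, env: Mapping[str, str], kernel_release: str = "") -> str:
    """Pick a row of the platform table (hypothetical heuristic)."""
    if system == "Windows":
        return "windows"
    if system == "Darwin":
        return "macos"
    if system == "Linux":
        # WSL kernels include "microsoft" in `uname -r`, e.g. "5.15.153.1-microsoft-standard-WSL2".
        if "microsoft" in kernel_release.lower():
            return "wsl2"
        if env.get("XDG_SESSION_TYPE") == "wayland" or "WAYLAND_DISPLAY" in env:
            desktop = env.get("XDG_CURRENT_DESKTOP", "").lower()
            # Simplification: every non-GNOME Wayland session is treated as wlroots-based.
            return "wayland-gnome" if "gnome" in desktop else "wayland-wlroots"
        return "x11"
    raise RuntimeError(f"unsupported platform: {system}")


if __name__ == "__main__":
    print(detect_backend(platform.system(), os.environ, platform.release()))
```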

First run on macOS

You can pre-grant permissions before connecting an agent:

vadgr-cua setup

That fires the Accessibility and Screen Recording prompts and prints the current grant state as JSON. Toggle the entries on in System Settings when prompted. If you skip this, the same prompts fire on the first MCP tool call from your agent.

The first time the MCP server captures the screen or injects an input event, macOS asks you to grant two permissions to the running Python interpreter, each in its own System Settings pane:

Privacy & Security -> Screen Recording (required for screenshot() / screenshot_region()).
Privacy & Security -> Accessibility (required for clicking, typing, scrolling, and dragging).

Toggle both for the python binary that runs vadgr-cua (e.g. /path/to/.venv/bin/python or /opt/homebrew/bin/python3.12). The grant is per-interpreter and persists, so you will not be asked again for that interpreter unless you revoke it. Verify status:

vadgr-cua doctor
# {... "macos_accessibility_granted": true, "macos_screen_recording_granted": true,
#      "python_executable": "/opt/homebrew/bin/python3.12" }

Apple enforces these prompts at the OS level for every screen-capture / input-injection API; they cannot be skipped.

If you later revoke either permission in System Settings, the next MCP tool call detects it via CGPreflightScreenCaptureAccess() / AXIsProcessTrusted(), opens System Settings to the right pane, and returns a structured error to the agent. Toggle the entry back on and the next call works. No silent black screenshots, no hunting through System Settings.
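You can query the same two checks yourself from the interpreter that runs vadgr-cua. A minimal sketch using the pyobjc bindings the package already installs on macOS; macos_permissions is a hypothetical helper, and it only reports state (neither call triggers a prompt).

```python
import sys
from typing import Dict, Optional


def macos_permissions() -> Optional[Dict[str, bool]]:
    """Report the TCC grants seen by this process; None when not running on macOS."""
    if sys.platform != "darwin":
        return None
    # Provided by pyobjc-framework-ApplicationServices and pyobjc-framework-Quartz.
    from ApplicationServices import AXIsProcessTrusted
    from Quartz import CGPreflightScreenCaptureAccess

    return {
        "accessibility": bool(AXIsProcessTrusted()),  # needed for input injection
        "screen_recording": bool(CGPreflightScreenCaptureAccess()),  # needed for capture
    }


if __name__ == "__main__":
    print(sys.executable, macos_permissions())
```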

On WSL2, if the bridge daemon can't start (e.g. no Windows Python is available), the server falls back to a slower PowerShell path. See Daemon management below.

MCP tools (13)

Capture (2)

screenshot(): full screen, downscaled to a width of CU_MAX_WIDTH pixels (auto-picks 1024, 1280, or 1366 by default).
screenshot_region(x, y, w, h): cropped region.

Input (8)

click(x, y) / double_click(x, y) / right_click(x, y)
move_mouse(x, y) / drag(start_x, start_y, end_x, end_y, duration=0.5)
scroll(x, y, amount): positive = up, negative = down
type_text(text) / key_press(keys): keys like ctrl+s, alt+tab, enter

Platform info (3)

get_platform() / get_platform_info() / get_screen_size()
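Each tool above is invoked with an MCP tools/call request over the stdio transport: newline-delimited JSON-RPC 2.0 on the server's stdin/stdout. A minimal sketch of the messages a client would send, assuming the tool argument names match the signatures listed above; the client name and protocol revision string are illustrative, and real clients negotiate the revision during initialize. In practice an MCP client library handles this for you.

```python
import json
from typing import Any, Dict

PROTOCOL_VERSION = "2024-11-05"  # one published MCP revision; negotiated during initialize


def frame(message: Dict[str, Any]) -> bytes:
    """Encode one JSON-RPC message as a single UTF-8 line, as MCP's stdio transport expects."""
    # json.dumps without indent never emits raw newlines, so each message stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize(req_id: int) -> Dict[str, Any]:
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},  # illustrative
        },
    }


def initialized() -> Dict[str, Any]:
    # A notification (no "id"), sent once the server has answered `initialize`.
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def call_tool(req_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # The first three lines a client might write to `vadgr-cua --transport stdio`.
    for msg in (initialize(1), initialized(), call_tool(2, "click", {"x": 500, "y": 300})):
        print(frame(msg).decode("utf-8"), end="")
```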

Daemon management (WSL2)

Most users never touch this. For when you do:

vadgr-cua doctor           # JSON: platform, Windows Python, daemon state, port, hash
vadgr-cua install-daemon   # Eager deploy + launch
vadgr-cua stop-daemon      # Kill the running daemon
vadgr-cua restart-daemon   # Stop then start

The daemon file is deployed to %USERPROFILE%\vadgr\daemon.py and listens on TCP 127.0.0.1:19542. After pip install -U vadgr-computer-use, the next MCP session detects the version-hash drift via a ping handshake and redeploys the daemon automatically.
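To check whether anything is listening on the bridge port, a plain TCP connect is enough. This sketch only proves a listener exists; it does not speak the daemon's ping/version handshake, whose wire format is not documented here. Run it on the Windows side, or pass a different host if your WSL2 networking setup needs another address to reach the Windows host; port_open is a hypothetical helper.

```python
import os
import socket


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # CUE_BRIDGE_PORT overrides the default port, as documented under Environment.
    port = int(os.environ.get("CUE_BRIDGE_PORT", "19542"))
    print(f"127.0.0.1:{port} listening:", port_open(port))
```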

Library usage

from computer_use import ComputerUseEngine

engine = ComputerUseEngine()
shot = engine.screenshot()       # capture the full screen
engine.click(500, 300)           # click at x=500, y=300
engine.type_text("hello")        # type into the focused window

The library is just the input/capture primitives, no LLM or agent loop inside. To drive it with a model, point an MCP client (Claude Code, Codex, Gemini, or your own) at the vadgr-cua server as shown above.

Environment

Variable	Purpose
`CU_MAX_WIDTH`	Override screenshot downscale target (default: auto 1024/1280/1366)
`CUE_BRIDGE_PORT`	Override WSL2 bridge daemon TCP port (default: 19542)
`VADGR_DEBUG`	Set to `1` to dump screenshots to `<package>/.debug/`
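If you wrap the library or server in your own tooling, reading the same variables keeps behaviour consistent. A hypothetical loader that applies the documented defaults: treating only the exact value 1 as "debug on" is an assumption (the table does not say whether other truthy values are accepted), and None stands in for the auto-pick rule, which is not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    max_width: Optional[int]  # None = let the server auto-pick 1024 / 1280 / 1366
    bridge_port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read CU_MAX_WIDTH, CUE_BRIDGE_PORT, and VADGR_DEBUG with the documented defaults."""
    raw_width = env.get("CU_MAX_WIDTH")
    return Settings(
        max_width=int(raw_width) if raw_width else None,
        bridge_port=int(env.get("CUE_BRIDGE_PORT", "19542")),
        debug=env.get("VADGR_DEBUG") == "1",
    )


if __name__ == "__main__":
    print(load_settings())
```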

Tests

pip install -e ".[dev]"
pytest computer_use/tests -q

License

Apache 2.0. See LICENSE.

Part of Vadgr

vadgr: workflow engine (brain)
vadgr-computer-use: desktop automation MCP (eyes)
vadgr-agent-os: containerized agent runtime

Project details

Maintainers: santiagomd11

Release history

0.1.5: Apr 26, 2026
0.1.4: Apr 26, 2026 (this version)
0.1.3: Apr 25, 2026
0.1.2: Apr 25, 2026
0.1.1: Apr 22, 2026
0.1.0: Apr 21, 2026

Download files

Source Distribution

vadgr_computer_use-0.1.4.tar.gz (70.3 kB)

Uploaded Apr 26, 2026 Source

Built Distribution

vadgr_computer_use-0.1.4-py3-none-any.whl (80.2 kB)

Uploaded Apr 26, 2026 Python 3

File details

Details for the file vadgr_computer_use-0.1.4.tar.gz.

File metadata

Download URL: vadgr_computer_use-0.1.4.tar.gz
Upload date: Apr 26, 2026
Size: 70.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vadgr_computer_use-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`6b887e26dc5b3dac9c49a5377fdc77c871338b4c01d115c763172e48b04cf79f`
MD5	`3107b9cc1752e6b6c5d357110e5c79fd`
BLAKE2b-256	`a1e2982d97ff5aeba6e8f37faad11533fdf6b80d945fd69b73f052ba1a7f06fa`

Provenance

The following attestation bundles were made for vadgr_computer_use-0.1.4.tar.gz:

Publisher: publish.yml on MONTBRAIN/vadgr-computer-use

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vadgr_computer_use-0.1.4.tar.gz
- Subject digest: 6b887e26dc5b3dac9c49a5377fdc77c871338b4c01d115c763172e48b04cf79f
- Sigstore transparency entry: 1391460709
- Sigstore integration time: Apr 26, 2026
Source repository:
- Permalink: MONTBRAIN/vadgr-computer-use@47c545658668d946ba6ea69f69fa2f2ef59ef083
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/MONTBRAIN
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@47c545658668d946ba6ea69f69fa2f2ef59ef083
- Trigger Event: push

File details

Details for the file vadgr_computer_use-0.1.4-py3-none-any.whl.

File metadata

Download URL: vadgr_computer_use-0.1.4-py3-none-any.whl
Upload date: Apr 26, 2026
Size: 80.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vadgr_computer_use-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5ae107ff0bb878956e87e6761a335276a1ba79f9b492070486e688230cf2cc32`
MD5	`c97b967e5547706b406510e349f59cfe`
BLAKE2b-256	`d329327c056672d4e76bd4693dac07d0ca5d102afd7fea01afae746f35a526ba`

Provenance

The following attestation bundles were made for vadgr_computer_use-0.1.4-py3-none-any.whl:

Publisher: publish.yml on MONTBRAIN/vadgr-computer-use

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vadgr_computer_use-0.1.4-py3-none-any.whl
- Subject digest: 5ae107ff0bb878956e87e6761a335276a1ba79f9b492070486e688230cf2cc32
- Sigstore transparency entry: 1391460710
- Sigstore integration time: Apr 26, 2026
Source repository:
- Permalink: MONTBRAIN/vadgr-computer-use@47c545658668d946ba6ea69f69fa2f2ef59ef083
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/MONTBRAIN
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@47c545658668d946ba6ea69f69fa2f2ef59ef083
- Trigger Event: push

vadgr-computer-use 0.1.4
