Pixel-level browser automation MCP server for WSL2 — drive a real Chrome with screenshot + xdotool, no CDP.

hermes-computer-use

Scope: Windows 11 + WSL2 Ubuntu 22.04 / 24.04 only. This project intentionally limits its support matrix — native Linux / macOS / Windows are not targets. See docs/WSL_SETUP.md for why and for the full setup walkthrough.

Pixel-level browser automation MCP server. Gives any MCP-speaking agent (hermes-agent, Claude Code, Codex, …) 21 tools to drive a real Chrome browser running in an Xvfb display: screenshots as vision input, OS-level mouse/keyboard as output. No CDP. No navigator.webdriver. No DOM shortcuts.

Think of it as the Linux-side reproduction of Anthropic's computer-use-demo — but exposed over stdio MCP so you can pair it with any agent runtime and any vision-capable model.

agent ── stdio MCP ──▶ hermes_computer_use.server ── subprocess ──▶ xdotool / scrot
                                                                          │
                                                                          ▼
                                                                      Xvfb :99
                                                                          │
                                                          ┌───────────────┴────────────────┐
                                                          ▼                                ▼
                                                    x11vnc :5900              websockify + noVNC :6080
                                                (native VNC clients)            (browser viewer)

See docs/ARCHITECTURE.md for the longer version.
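
The subprocess hop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in hermes_computer_use.server: the helper names (x_env, click_argv, click, take_screenshot) are hypothetical, and it assumes xdotool and scrot are on PATH and that Xvfb is serving display :99.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def x_env(display: int = 99) -> dict:
    """Copy the current environment and point DISPLAY at the Xvfb server."""
    env = dict(os.environ)
    env["DISPLAY"] = f":{display}"
    return env


def click_argv(x: int, y: int, button: int = 1) -> list:
    """Build an xdotool command chain: move the pointer, then click."""
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def click(x: int, y: int, button: int = 1, display: int = 99) -> None:
    """Send an OS-level click to the virtual display (hypothetical helper)."""
    subprocess.run(click_argv(x, y, button), env=x_env(display), check=True)


def take_screenshot(display: int = 99) -> bytes:
    """Capture the virtual screen with scrot and return the PNG bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "shot.png"
        subprocess.run(["scrot", str(target)], env=x_env(display), check=True)
        return target.read_bytes()
```

Keeping the argv builder separate from the call that runs it makes the command easy to inspect without a live X server.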

Why

Aspect	Playwright / CDP	hermes-computer-use
`navigator.webdriver`	`true` (detectable)	`false`, as in any stock Chrome
CDP endpoint	open	none
DOM access	direct (fast, brittle to markup changes)	screenshot only (slower, resilient to selector renames)
Anti-bot footprint	large, constantly patched	near-zero: stock Chrome, stock X11 input
Best for	reliable flows on sites you own	agents operating unfamiliar sites like a human

If your automation has to walk a login funnel on a site protected by Cloudflare, Kasada, or reCAPTCHA, this stack usually passes where Playwright gets stopped, because the browser is a stock Chrome receiving input through a stock X server, with no automation flag or CDP endpoint for a detector to find.

Install

Prerequisites (Windows host): Windows 11, WSL2 with an Ubuntu 22.04 or 24.04 distro, and systemd enabled in WSL. Full walkthrough in docs/WSL_SETUP.md.

Everything below runs inside the WSL shell, not in PowerShell.

git clone https://github.com/Noah3521/hermes-computer-use.git ~/hermes-computer-use
cd ~/hermes-computer-use

# 1. System packages (sudo): Xvfb, fluxbox, x11vnc, xdotool, ydotool, scrot,
#    ImageMagick, CJK fonts, Google Chrome, plus uinput if available.
bash scripts/setup.sh

# 2. Python package
python3 -m venv .venv
. .venv/bin/activate
pip install -e ".[novnc]"       # omit [novnc] if you don't want the web viewer

# 3. Optional browser-based observer at http://localhost:6080/vnc.html
bash scripts/install-novnc.sh

# 4. Persistent services
mkdir -p ~/.config/systemd/user
cp systemd/computer-use.service.example ~/.config/systemd/user/computer-use.service
cp systemd/novnc.service.example        ~/.config/systemd/user/novnc.service
sudo loginctl enable-linger "$USER"
systemctl --user daemon-reload
systemctl --user enable --now computer-use.service novnc.service

Smoke test:

python examples/smoke_test.py
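
If the smoke test fails early, a quick preflight check can tell you whether the system packages are actually on PATH. The sketch below is not examples/smoke_test.py; the function name and the tool list are assumptions drawn from the packages named in step 1.

```python
import shutil

# Binaries mentioned in the install step; adjust to taste (assumed list).
REQUIRED_TOOLS = ("Xvfb", "xdotool", "scrot", "x11vnc")


def missing_tools(names=REQUIRED_TOOLS, which=shutil.which) -> list:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


missing = missing_tools()
print("missing:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so the lookup can be swapped out in tests.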

Wire to hermes-agent

Paste the contents of config/hermes.yaml.example into your ~/.hermes/config.yaml under the mcp_servers: key, then restart the gateway with hermes gateway run --replace. The model immediately gets the full tool surface.

The same config shape works for any stdio-MCP client (Claude Code, mcp-inspector, custom runners).
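
For clients that read a JSON file with an mcpServers map (the layout Claude Code uses for .mcp.json), the entry might look like the output of the sketch below. The launch command is an assumption: it supposes the server starts with python -m hermes_computer_use.server from the project's virtualenv, and the interpreter path and the "computer-use" key are placeholders. Check config/hermes.yaml.example for the command the project actually ships.

```python
import json


def stdio_server_entry(python_bin: str, display: int = 99) -> dict:
    """Describe how an MCP client should spawn the server over stdio."""
    return {
        "command": python_bin,
        # Assumed entry point; confirm against config/hermes.yaml.example.
        "args": ["-m", "hermes_computer_use.server"],
        "env": {"CU_DISPLAY": str(display)},
    }


config = {
    "mcpServers": {
        # Placeholder interpreter path inside the cloned repo's virtualenv.
        "computer-use": stdio_server_entry(
            "/home/me/hermes-computer-use/.venv/bin/python"
        )
    }
}
print(json.dumps(config, indent=2))
```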

Tools

Category	Tools
Status	`screen_info`, `cursor_position`
Capture	`screenshot` (base64 PNG)
Pointer	`move`, `left_click`, `right_click`, `double_click`, `middle_click`, `drag`, `scroll`
Keyboard	`type_text`, `press_key`, `hold_key`
Timing	`wait`
Browser	`open_url`, `new_tab`, `close_tab`, `back`, `forward`, `reload`
Escape hatch	`run_shell`
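
Since screenshot returns a base64 PNG, a client can sanity-check the payload before handing it to a vision model. The sketch below assumes you have already pulled the base64 string out of the tool result; png_size is a hypothetical helper that reads the width and height from the PNG header.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(b64_png: str) -> tuple:
    """Decode a base64 PNG and return (width, height) from its IHDR chunk."""
    raw = base64.b64decode(b64_png)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type.
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("payload is not a PNG image")
    # Bytes 16-23: width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", raw[16:24])
    return width, height
```

With the default CU_WIDTH and CU_HEIGHT, a full-screen capture would be expected to report 1440 by 900.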

Full signatures live in src/hermes_computer_use/server.py and are discoverable via MCP tools/list.
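
On the wire, discovery and invocation are JSON-RPC 2.0 messages, sent one per line over the server's stdin. The sketch below only builds those messages; a real session starts with an initialize handshake, which an MCP client library normally handles. The x and y argument names for left_click are assumptions, so check tools/list for the real schema.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # monotonically increasing request ids


def request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools() -> str:
    return request("tools/list")


def call_tool(name: str, arguments: dict) -> str:
    return request("tools/call", {"name": name, "arguments": arguments})


print(list_tools())
print(call_tool("left_click", {"x": 720, "y": 450}))  # argument names assumed
```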

Demo prompts

examples/demo_prompts.md ships ten graduated prompts from a 5-second sanity check to a 5-hop Google → external site → SSO-login flow that passes without captchas. Open the noVNC tab while running them — watching the pointer interpolate through Google's search box is surprisingly compelling.
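
The pointer interpolation you see in the viewer can be approximated with straight-line interpolation between two points. This is an assumption about the general technique, not a copy of the project's code, which may add easing or jitter; interpolate is a hypothetical name, and the default of 18 steps mirrors CU_MOVE_STEPS.

```python
def interpolate(start: tuple, end: tuple, steps: int = 18) -> list:
    """Return `steps` evenly spaced points from just after start up to end."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (
            round(x0 + (x1 - x0) * i / steps),
            round(y0 + (y1 - y0) * i / steps),
        )
        for i in range(1, steps + 1)
    ]


path = interpolate((0, 0), (180, 90))
print(path[0], path[-1])
```

Each point would then become one pointer move, so more steps give a smoother but slower motion.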

Configuration

All runtime behaviour is controlled by env vars. Sensible defaults everywhere.

Var	Default	Meaning
`CU_DISPLAY`	`99`	X display number
`CU_WIDTH` / `CU_HEIGHT`	`1440` / `900`	Virtual screen size
`CU_VNC_PORT`	`5900`	x11vnc listen port
`CU_STATE_DIR`	`/tmp/hermes-computer-use`	Logs, PID files
`CU_PROFILE_DIR`	`$CU_STATE_DIR/chrome-profile`	Persistent Chrome profile (cookies survive restarts)
`CU_START_URL`	`about:blank`	First URL Chrome opens
`CU_INPUT`	`xdotool`	Set to `ydotool` for kernel `/dev/uinput` input
`CU_KEY_DELAY_MS`	`25`	Inter-keystroke delay
`CU_MOVE_STEPS`	`18`	Interpolation steps for `move(human=True)` and `drag`
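
A loader for these variables might look like the sketch below. The defaults come from the table above; the Settings class and load_settings function are hypothetical names, not the project's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    display: int
    width: int
    height: int
    vnc_port: int
    state_dir: str
    profile_dir: str
    start_url: str
    input_backend: str
    key_delay_ms: int
    move_steps: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read CU_* variables, falling back to the documented defaults."""
    state_dir = env.get("CU_STATE_DIR", "/tmp/hermes-computer-use")
    return Settings(
        display=int(env.get("CU_DISPLAY", "99")),
        width=int(env.get("CU_WIDTH", "1440")),
        height=int(env.get("CU_HEIGHT", "900")),
        vnc_port=int(env.get("CU_VNC_PORT", "5900")),
        state_dir=state_dir,
        # The profile default is derived from the state directory.
        profile_dir=env.get("CU_PROFILE_DIR", f"{state_dir}/chrome-profile"),
        start_url=env.get("CU_START_URL", "about:blank"),
        input_backend=env.get("CU_INPUT", "xdotool"),
        key_delay_ms=int(env.get("CU_KEY_DELAY_MS", "25")),
        move_steps=int(env.get("CU_MOVE_STEPS", "18")),
    )
```

Passing a plain dict instead of os.environ makes it easy to check how one override, such as CU_STATE_DIR, flows into the derived profile path.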

Troubleshooting

See docs/TROUBLESHOOTING.md. The usual suspects:

scrot: Can't open X display → Xvfb died. Run systemctl --user restart computer-use.service.
Chrome immediately exits → sandbox or /dev/shm issue. The scripts/display.sh launcher already sets the right flags; if you hand-roll the launch command, copy them from there.
Stack dies on logout → lingering is not enabled. Run sudo loginctl enable-linger "$USER".
Google flags "unusual traffic" → the cause is IP reputation, not behaviour. Use a residential proxy or prewarm the profile with a manual login via VNC.
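
To tell a dead Xvfb apart from other failures, you can check for the display's Unix socket, which X servers conventionally create under /tmp/.X11-unix. The sketch below is a heuristic with hypothetical helper names: a stale socket file can outlive a crashed server, so a True result is not proof that Xvfb is healthy.

```python
from pathlib import Path


def x_socket_path(display: int = 99, socket_dir: str = "/tmp/.X11-unix") -> Path:
    """Return the conventional socket path for an X display number."""
    return Path(socket_dir) / f"X{display}"


def display_is_up(display: int = 99, socket_dir: str = "/tmp/.X11-unix") -> bool:
    """Report whether the socket file for the display exists."""
    return x_socket_path(display, socket_dir).exists()


print("display :99 socket present:", display_is_up(99))
```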

Security

This is an LLM with hands. Read docs/SECURITY.md before pointing it at anything you care about. At minimum:

Run in an isolated WSL distro or VM — never your daily driver.
Remove the run_shell tool if the agent does not need a shell.
Do not persist real credentials in CU_PROFILE_DIR.
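
If you cannot edit the server, a client-side filter is a weaker but still useful second line of defence. The sketch below assumes your agent runtime lets you post-process the tools/list result and intercept calls; allowed_tools and guard_call are hypothetical names, and this does not replace removing run_shell on the server side.

```python
BLOCKED_TOOLS = frozenset({"run_shell"})


def allowed_tools(tools: list, blocked=BLOCKED_TOOLS) -> list:
    """Drop blocked entries from a tools/list result before the model sees it."""
    return [tool for tool in tools if tool.get("name") not in blocked]


def guard_call(name: str, blocked=BLOCKED_TOOLS) -> None:
    """Refuse to forward a call to a blocked tool."""
    if name in blocked:
        raise PermissionError(f"tool {name!r} is disabled by policy")


advertised = [{"name": "screenshot"}, {"name": "left_click"}, {"name": "run_shell"}]
print([tool["name"] for tool in allowed_tools(advertised)])
```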

Contributing

See CONTRIBUTING.md. Scope guardrails are strict: no DOM selectors, no OCR, no anti-detection arms race. The thesis is "emit no abnormal signals" > "emit clever evasions".

License

MIT. See LICENSE.

Acknowledgements

anthropic-quickstarts/computer-use-demo for the reference loop.
x11vnc + noVNC for the observer pipeline.
Model Context Protocol for making "tool surface you can point any agent at" a real thing.

Version 0.1.0, released Apr 20, 2026.

Download files

File	Type	Size	Uploaded	Uploaded via
hermes_computer_use-0.1.0.tar.gz	Source distribution	27.5 kB	Apr 20, 2026	twine/6.2.0 CPython/3.11.15
hermes_computer_use-0.1.0-py3-none-any.whl	Built distribution (Python 3)	10.9 kB	Apr 20, 2026	twine/6.2.0 CPython/3.11.15

Neither file was uploaded using Trusted Publishing.

File hashes

File	Algorithm	Hash digest
hermes_computer_use-0.1.0.tar.gz	SHA256	`4544118ffe76c4bc34500e7b715ffa6b4529a0af56827f571ad3d755c1d8a257`
hermes_computer_use-0.1.0.tar.gz	MD5	`0d95a81713182c342d6a73fb26edd4a8`
hermes_computer_use-0.1.0.tar.gz	BLAKE2b-256	`c95a5739d46d098978dc20050d5a54ce8c8b008139d96838973221259f53cfaf`
hermes_computer_use-0.1.0-py3-none-any.whl	SHA256	`159d27c135403451633d79ce4700eb8ae50b287bc7235e2e8333bc14703cb9b2`
hermes_computer_use-0.1.0-py3-none-any.whl	MD5	`3e594de980ca161566e7fab8e3c6324b`
hermes_computer_use-0.1.0-py3-none-any.whl	BLAKE2b-256	`402feae4bf91c91bfe09fefea778db81f21584322c589cc0e8e55e1f81c6f3c5`
