Layout-aware VNC/RFB remote control CLI: type, key, click, screenshot, against any VNC/RFB server.
vnc-remote-control
A small command-line tool to drive a VNC/RFB server: type text, press named keys, click, and grab a screenshot. It speaks plain RFB (None or VNC-password security, raw encoding) and sends literal keysyms like a standard VNC client, so it works with any VNC/RFB server; the guest keyboard layout is a server-side setting, not something this client compensates for. A layout-aware server such as openvmm is the ideal target (see the keyboard section), but it is not required.
The CLI is styled with rich-click; the RFB protocol and pixel drawing are pure standard library, and screenshots are written as PNG with Pillow. The tool also needs one external program: tesseract, used for OCR. See the Install section.
When this is useful
The connection is to a VNC/RFB server, so nothing runs on the target machine. That makes the tool a good fit when:
- A machine has no network of its own. A VM that isn't on the network yet (or never will be) can still be driven through its hypervisor's VNC console, the same screen you'd click in the web UI. This is handy for the first-boot setup where you configure networking before the guest can reach anything.
- You want Claude to operate a machine with no footprint on it. Nothing is installed on the guest: no agent, no service, no open port on the target, nothing visible in its process list. The control happens entirely over the host's VNC.
- A box is only reachable over VNC. For remote administration where VNC is the one door you have, this drives it the same way a person at the console would.
- Legacy desktop software has only a GUI. Old line-of-business apps, dated installers, and vendor tools often have no API, CLI, or accessibility tree to automate against, only a window. Reading the screen with OCR and clicking by label drives them when nothing else can, and legacy UIs rarely change layout, so the coordinates stay stable.
What it does
- type: types a string into the focused guest field (literal keysyms).
- key: presses a single named key (enter, tab, esc, arrows, function keys).
- click: sends a left-button click at a pixel position.
- screenshot: writes the native-resolution framebuffer to a PNG file, with optional crosshair and grid overlays.
- ocr: lists the words on screen with their click centers and confidence.
- click-text: clicks the first on-screen word matching a pattern.
Install
Where to install it
vnc-remote-control is a pure client: install it only on the machine you drive it
from (your control or development box), never on the targets. It connects out to a
target's VNC/RFB port over the network, directly or through an SSH tunnel to a
console port. Nothing runs on the machine you control: no agent, no service, no open
port on the target, nothing in its process list. So one machine has the tool, and
any number of targets are driven over VNC while staying untouched.
Prerequisite: tesseract
tesseract is a required system dependency. OCR is core to the tool (it is how an
LLM finds what to click via ocr and click-text), so install it first:
# Debian/Ubuntu
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
# Windows
choco install tesseract
The package
With pip:
pip install vnc-remote-control
With uv:
uv tool install vnc-remote-control
From a checkout:
pip install -e .
For alternative install paths (pip, pipx, uv, uvx, source builds), see
INSTALL.md. Every supported method registers the
vnc-remote-control command on your PATH.
Usage
The CLI uses rich-click, so help output and
validation errors render with Rich styling while keeping the familiar click
ergonomics. Every command needs --port. The host defaults to 127.0.0.1.
# type a command and press Return
vnc-remote-control --port 5901 type "chkdsk c: /f" --enter
# type any characters; the server maps them to the guest layout
vnc-remote-control --port 5901 type "user@host"
# press a single key
vnc-remote-control --port 5901 key enter
vnc-remote-control --port 5901 key esc
# click at a pixel position
vnc-remote-control --port 5901 click 640 480
# screenshot to a PNG (prints "resolution: WxH" on stdout)
vnc-remote-control --port 5901 screenshot /tmp/guest.png
# screenshot with a crosshair through a candidate click point
vnc-remote-control --port 5901 screenshot /tmp/guest.png --mark 640,480
# screenshot with a 50px coordinate grid
vnc-remote-control --port 5901 screenshot /tmp/guest.png --grid 50
# list on-screen words with their click centers
vnc-remote-control --port 5901 ocr
vnc-remote-control --port 5901 ocr --grep "Sign in"
# click the first on-screen word matching a pattern
vnc-remote-control --port 5901 click-text "Next"
# against a remote host
vnc-remote-control --host 10.0.0.5 --port 5901 key f8
You can also run it as a module: python -m vnc_remote_control --port 5901 key enter.
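For scripted use, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of the package: `build_argv` and `parse_resolution` are hypothetical helper names, and the parser assumes the `resolution: WxH` line uses decimal integers for W and H.

```python
import re
import shlex


def build_argv(port, *command, host=None):
    """Assemble an argv list: global options first, then the subcommand."""
    argv = ["vnc-remote-control"]
    if host is not None:
        # Omitting --host leaves the CLI on its 127.0.0.1 default.
        argv += ["--host", host]
    argv += ["--port", str(port)]
    argv += [str(part) for part in command]
    return argv


def parse_resolution(stdout):
    """Return (width, height) from a 'resolution: WxH' line, or None."""
    match = re.search(r"resolution:\s*(\d+)x(\d+)", stdout)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    argv = build_argv(5901, "type", "chkdsk c: /f", "--enter")
    # shlex.join quotes the arguments the way a POSIX shell expects them.
    print(shlex.join(argv))
    print(parse_resolution("resolution: 1024x768"))
```

Passing the list to `subprocess.run(argv, capture_output=True, text=True)` avoids shell quoting problems with text that contains spaces or quotes.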
Authentication
Servers offering None security (an openvmm/hypervisor console on localhost is the
typical case) need nothing. For a server that requires a VNC password, pass one
with the global --password option; the client then does the standard VNC DES
challenge-response.
A password on the command line is visible in the process list, so prefer the
VNC_REMOTE_CONTROL_PASSWORD environment variable, which --password reads by
default:
# preferred: password via environment
VNC_REMOTE_CONTROL_PASSWORD=secret vnc-remote-control --port 5901 key enter
# or explicitly (visible in `ps`)
vnc-remote-control --port 5901 --password secret screenshot /tmp/guest.png
Apple Remote Desktop and TLS/VeNCrypt auth are not supported.
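The standard VNC challenge-response mentioned above has one well-known quirk: the password is truncated or zero-padded to 8 bytes, and the bits of each byte are reversed before the result is used as a DES key. The server's 16-byte challenge is then encrypted as two 8-byte DES blocks in ECB mode. The sketch below covers only the key preparation, since DES itself is not in the Python standard library. `vnc_des_key` is a hypothetical name, and the Latin-1 encoding of the password is an assumption; this is not necessarily how the tool implements it.

```python
def vnc_des_key(password: str) -> bytes:
    """Derive the 8-byte DES key used by VNC password authentication."""
    # Assumption: the password is encoded as Latin-1. Only the first
    # 8 bytes are significant; shorter passwords are padded with zeros.
    raw = password.encode("latin-1")[:8].ljust(8, b"\x00")
    # VNC reverses the bit order of every key byte before running DES.
    return bytes(int(f"{byte:08b}"[::-1], 2) for byte in raw)


if __name__ == "__main__":
    print(vnc_des_key("secret").hex())
```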
Timing (sluggish guests)
Each key and click is sent as a down edge, a short delay, then an up edge. The default delays are tuned so a normal guest registers every event, but a sluggish guest (an old desktop, a loaded VM, legacy software that repaints slowly) can drop events that arrive too fast. There are two ways to slow things down:
- Quick knob: the global --delay-scale option multiplies every delay. For a guest that misses keystrokes, try doubling them:
vnc-remote-control --port 5901 --delay-scale 2 type "slow guest"
- Per-delay config: the [vnc] section sets the individual delays (seconds). Set them in a config file, via environment variables, or with --set:
vnc-remote-control --port 5901 --set vnc.key_up_gap=0.2 type "hi"
The keys are key_down_hold, key_up_gap, click_move_gap, click_hold, and click_release_gap; see CONFIG.md and the bundled defaults for the documented values. --delay-scale applies on top of whatever the config resolves to.
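To make the down edge, delay, up edge sequence concrete, here is a minimal sketch of how one key press could be laid out on the wire. The RFB KeyEvent layout (message type 4, down flag, two padding bytes, 32-bit keysym) comes from the RFB specification. `press_plan` is a hypothetical helper, the delay values in the usage are illustrative rather than the bundled defaults, and reading `key_down_hold` as the pause after the down edge and `key_up_gap` as the pause after the up edge is an assumption.

```python
import struct

KEY_EVENT = 4  # RFB client-to-server message type for KeyEvent


def key_event(keysym: int, down: bool) -> bytes:
    """Pack one RFB KeyEvent: type, down flag, 2 padding bytes, keysym."""
    return struct.pack("!BBxxI", KEY_EVENT, 1 if down else 0, keysym)


def press_plan(keysym, key_down_hold, key_up_gap, delay_scale=1.0):
    """Return (message, seconds_to_sleep_afterwards) pairs for one press."""
    return [
        (key_event(keysym, True), key_down_hold * delay_scale),
        (key_event(keysym, False), key_up_gap * delay_scale),
    ]


if __name__ == "__main__":
    # 0xFF0D is the X11 keysym for Return; the delays are made-up values.
    for message, pause in press_plan(0xFF0D, 0.25, 0.25, delay_scale=2):
        print(message.hex(), pause)
```

A sender would write each message to the socket and then sleep for the paired duration, which is where a larger scale factor gives a slow guest more time to register each edge.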
Driving with an LLM (Claude)
The point of this tool is to let an LLM see the guest screen and click the right place every time. The key fact: RFB pointer coordinates are absolute framebuffer pixels, the same pixels in a native-resolution screenshot. So a coordinate read off the screenshot is the exact coordinate to click. Past "wrong pixel" failures came from reading coordinates off a scaled image; this tool never scales.
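The sketch below illustrates why no scale factor is involved. An RFB PointerEvent (message type 5, button mask, 16-bit x, 16-bit y, per the RFB specification) carries the framebuffer pixel directly. The move, press, release sequence is an assumption suggested by the delay names in the timing section, and `left_click_messages` is a hypothetical helper, not the tool's API.

```python
import struct

POINTER_EVENT = 5   # RFB client-to-server message type for PointerEvent
LEFT_BUTTON = 0x01  # bit 0 of the button mask is the left button


def pointer_event(x: int, y: int, button_mask: int) -> bytes:
    """Pack one RFB PointerEvent with absolute framebuffer coordinates."""
    return struct.pack("!BBHH", POINTER_EVENT, button_mask, x, y)


def left_click_messages(x: int, y: int) -> list:
    """Move to (x, y), press the left button, then release it."""
    return [
        pointer_event(x, y, 0),            # move with no buttons held
        pointer_event(x, y, LEFT_BUTTON),  # down edge
        pointer_event(x, y, 0),            # up edge
    ]


if __name__ == "__main__":
    for message in left_click_messages(640, 480):
        print(message.hex())
```

The x and y read off a native-resolution screenshot go into the message unchanged. A viewer that showed the image at half size would report (320, 240) for the same target, which is the "wrong pixel" failure described above.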
The loop:
- Capture and look. Run screenshot out.png. The command prints resolution: WxH on stdout. View out.png at native size (do not let your viewer downscale it).
- Read coordinates directly. Any (x, y) you read off out.png is the click coordinate. There is no scale factor to undo.
- Verify a coordinate before committing. Run screenshot out.png --mark X,Y and check the crosshair crosses exactly on the target. --grid 50 adds a coordinate grid if you want a ruler.
- Click. Either click the verified pixel with click X Y, or, more reliably, click by label: run ocr --grep "Sign in" to get the word's center <cx> <cy> and click <cx> <cy>, or do it in one shot with click-text "Sign in". Clicking by label avoids guessing pixels entirely.
- Focus, then type. A window you just opened or launched (taskbar search, the Run box, a freshly launched app) often does NOT have keyboard focus yet. Click into the target text field first, then type. Skipping the click is the most common reason text appears to go nowhere: a freshly opened editor showed nothing typed until it was clicked to focus. Then use type (with --enter) and key.
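The click-by-label step in the loop above can be sketched as follows. This is an illustration of the idea, not the tool's implementation: it assumes word boxes in tesseract's TSV output format (columns `left`, `top`, `width`, `height`, `conf`, `text`, with word rows at `level` 5), and `word_centers`, `first_match`, and `SAMPLE_TSV` are hypothetical names. The sketch matches single words only.

```python
import csv
import io
import re

# A tiny hand-written sample in tesseract's TSV layout (made-up values).
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "4\t1\t1\t1\t1\t0\t100\t200\t140\t20\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t100\t200\t60\t20\t96.5\tNext\n"
    "5\t1\t1\t1\t1\t2\t180\t200\t60\t20\t91.0\tCancel\n"
)


def word_centers(tsv_text):
    """Return (text, cx, cy, conf) for every recognized word."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if row.get("level") != "5" or not text:
            continue  # skip page, block, paragraph, and line rows
        left, top = int(row["left"]), int(row["top"])
        width, height = int(row["width"]), int(row["height"])
        # The box center is the pixel to click, in framebuffer coordinates.
        center = (left + width // 2, top + height // 2)
        words.append((text, center[0], center[1], float(row["conf"])))
    return words


def first_match(words, pattern):
    """Return the (cx, cy) of the first word matching the regex, or None."""
    regex = re.compile(pattern)
    for text, cx, cy, _conf in words:
        if regex.search(text):
            return cx, cy
    return None


if __name__ == "__main__":
    print(first_match(word_centers(SAMPLE_TSV), "Next"))
```

Because the OCR runs on the native-resolution framebuffer, the computed center can be handed to click without any conversion.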
Focus before typing
Typing goes to whatever has keyboard focus, which is not always the window you
just opened. After you open or launch anything (a search result, the Run box, an
application), click into the actual text field before you type. If text seems to
vanish, the field was not focused: screenshot, click the field, and type again.
Keyboard layout (server-side)
The caller never needs to know the guest's keyboard layout. Neither does an LLM driving this tool. You pass the text you want typed, and that is all the knowledge required on this side.
Here is why. There is no local keyboard in the loop: the type argument is
already a string of Unicode characters, and the client puts each character's
code point straight on the wire as its keysym (to type | it sends 0x7C, to
type ä it sends 0xE4), exactly like a standard VNC client. Your own laptop's
layout is irrelevant; the same bytes go out whether you are on a US, German, or
Dvorak keyboard. The client does NOT reverse-map characters or juggle AltGr.
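A minimal sketch of the character-to-keysym step follows. For printable Latin-1 characters the keysym equals the code point, which matches the two examples above. For characters beyond Latin-1 the sketch uses the X11 convention of adding 0x01000000 to the code point; whether this tool handles those characters the same way is an assumption. `char_to_keysym` is a hypothetical name.

```python
def char_to_keysym(ch: str) -> int:
    """Map one character to a literal keysym, with no layout knowledge."""
    code_point = ord(ch)
    if 0x20 <= code_point <= 0x7E or 0xA0 <= code_point <= 0xFF:
        # Printable Latin-1: the keysym is the code point itself.
        return code_point
    if code_point >= 0x100:
        # X11 convention for other Unicode characters (assumed here).
        return 0x01000000 | code_point
    # Control characters need named keys (enter, tab, ...), not text.
    raise ValueError(f"no literal keysym for control character {ch!r}")


if __name__ == "__main__":
    for ch in "|\u00e4\u20ac":
        print(repr(ch), hex(char_to_keysym(ch)))
```

Nothing in the mapping depends on the local or guest keyboard layout, which is why the layout question can be left entirely to the server.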
The layout lives in exactly one place: the server. A layout-aware RFB server maps
each keysym to the guest's configured layout. openvmm is the concrete example: its
--vnc-keyboard-layout flag (on Proxmox the ovm shim derives it from the VM's
keyboard: key) tells it the guest's layout, and it works backwards from the
character you asked for to the physical keypress that produces it on that layout.
So as long as the server's configured layout matches the guest's actual layout,
every character types correctly, and there is nothing to detect or compensate for
on this side.
It only goes wrong if the server is told the wrong layout (its setting does not match the guest). The fix is never on the client: correct the server's layout to match the guest. A driving LLM would see the wrong characters land in a screenshot and can flag it, but it does not, and should not, try to compensate per keystroke.
Plainer VNC servers that map keysyms assuming a fixed (often US) layout still work for any text that layout can produce; a layout-aware server such as openvmm is what makes arbitrary characters and non-US layouts type reliably.
Further Documentation
- Install Guide
- Development Handbook
- Contributor Guide
- Security Policy
- Changelog
- Module Reference
- License
AI transparency
This project was built with AI assistance. See ai-disclosure.md for exactly how, and ai-stance.md for the reasoning behind it.
License
MIT. See LICENSE.