Visual Window Control
MCP server & CLI for controlling windows visually — capture screenshots, extract text via OCR (Tesseract), and send keyboard/mouse input to any target window. Designed for remote desktop workflows (RDP, etc.) but works with any window.
Requirements
- Windows 10/11
- Python 3.10+
- Tesseract OCR
Installation
# Install Tesseract OCR (via Chocolatey or manual download)
choco install tesseract
# Install the package in editable mode (run from the repository root)
pip install -e .
Usage
CLI (vwctl)
# List all visible windows
vwctl list-windows
# Capture and OCR a window (by title)
vwctl -w "Remote Desktop" ocr
# Type text with inline tags
vwctl -w "Remote Desktop" type "ls -la{enter}"
# Send a special key with modifiers
vwctl -w "Remote Desktop" key c -m ctrl
# Click at coordinates relative to window
vwctl -w "Remote Desktop" click 400 300
# Execute a command and read output via OCR
vwctl -w "Remote Desktop" exec "ls -la" -W 2.0
# Capture screenshot to file (default: JPEG quality 85)
vwctl -w "Remote Desktop" capture
# → Saved: 2026-03-07_22-24-00_vwctl.jpg (1920x1080)
# Capture with custom filename (use .png extension for PNG output)
vwctl -w "Remote Desktop" capture -o screen.png
# Capture occluded window without bringing to foreground
# (uses PrintWindow API; may produce black images for hardware-accelerated apps)
vwctl -w "Remote Desktop" capture -b
# Use hwnd instead of title (faster, no search overhead)
vwctl -H 1234567 ocr
# Send input without stealing focus (works with cmd.exe, Git Bash, PuTTY, etc.)
vwctl -w "Command Prompt" -n type "dir{enter}"
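The commands above can also be driven from a script. The following is a minimal sketch, assuming vwctl is on PATH and that global options (-w, -H, -n) go before the subcommand as in the examples above. The helper names build_vwctl_args and run_vwctl are hypothetical and not part of the package.

```python
import subprocess


def build_vwctl_args(subcommand, *args, window=None, hwnd=None, no_focus=False):
    """Assemble an argv list for vwctl (hypothetical helper, not shipped with the package)."""
    argv = ["vwctl"]
    # Global options come before the subcommand, mirroring the CLI examples.
    if hwnd is not None:
        argv += ["-H", str(hwnd)]
    elif window is not None:
        argv += ["-w", window]
    if no_focus:
        argv.append("-n")
    argv.append(subcommand)
    # Remaining arguments (text, coordinates, subcommand flags) are passed through as strings.
    argv += [str(a) for a in args]
    return argv


def run_vwctl(argv):
    """Run vwctl and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_vwctl(build_vwctl_args("exec", "ls -la", "-W", 2.0, window="Remote Desktop")) would issue the same exec command as shown above. Passing the arguments as a list avoids shell quoting problems with text that contains braces or quotes.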
Subcommands
| Command | Description |
|---|---|
| list-windows | List all visible windows with hwnd and title |
| type TEXT | Type text with inline {tag} support |
| key KEY [-m MOD] | Send a single key press with optional modifiers |
| keys JSON | Send a key sequence from JSON array |
| click X Y [-b] | Click at position relative to window |
| move X Y [-r] | Move mouse cursor (absolute or relative) |
| drag X1 Y1 X2 Y2 | Drag mouse from start to end position |
| scroll AMOUNT | Scroll mouse wheel (+up, -down) |
| capture [-o FILE] [-b] | Capture window to JPEG file or base64 stdout (.png extension for PNG) |
| ocr [-b] | Capture window and extract text via OCR |
| exec CMD [-W SEC] | Type command, Enter, wait, then OCR output |
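The capture naming shown in the usage examples (a timestamped .jpg by default, PNG when the filename ends in .png) can be sketched as follows. This is an illustration rather than the package's code: the function names are hypothetical, and the use of local time and a case-insensitive extension check are assumptions.

```python
from datetime import datetime
from pathlib import Path


def default_capture_name(now):
    """Build a default filename in the pattern shown above, e.g. 2026-03-07_22-24-00_vwctl.jpg."""
    return now.strftime("%Y-%m-%d_%H-%M-%S") + "_vwctl.jpg"


def output_format(filename):
    """Pick the image format from the extension: .png selects PNG, anything else JPEG."""
    # Assumption: the extension check ignores case.
    return "PNG" if Path(filename).suffix.lower() == ".png" else "JPEG"
```

Calling default_capture_name(datetime.now()) yields a name in the same pattern as the "Saved:" line in the capture example.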
Global Options
| Option | Description |
|---|---|
| -w, --window TITLE | Target window by title (partial match) |
| -H, --hwnd HWND | Target window by handle directly |
| -c, --config FILE | Config file path |
| -n, --no-focus | Send input via PostMessage without stealing focus |
Configuration
Settings can be provided via config file, environment variables, or CLI arguments. Priority: CLI args > env vars > config file.
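The priority rule can be pictured as a layered lookup. The sketch below is a minimal illustration, assuming each source has already been loaded into a dict and that an unset value is represented as None. The function name resolve_settings is hypothetical.

```python
from collections import ChainMap


def resolve_settings(cli_args, env_vars, config_file):
    """Merge three setting layers; earlier layers win (CLI args > env vars > config file)."""
    # Drop unset (None) values so an empty CLI option does not mask a lower layer.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli_args, env_vars, config_file)
    ]
    # ChainMap searches its mappings left to right, so the first layer has priority.
    return dict(ChainMap(*layers))
```

For instance, resolve_settings({"window": None}, {"window": "Env"}, {"window": "File"}) resolves window to "Env", because the CLI layer left it unset and the environment outranks the config file.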
Config File
TOML format. Search order (first found wins):
1. --config FILE / VWCTL_CONFIG env var
2. ./vwctl.toml (current directory)
3. ~/.config/vwctl/config.toml (Linux) / %APPDATA%\vwctl\config.toml (Windows)
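The search order can be sketched as a list of candidate paths checked in turn. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, and the behaviour of falling through to the next location when an explicitly given file is missing is a guess.

```python
import os
from pathlib import Path


def candidate_config_paths(cli_config=None, environ=None):
    """Return config locations in search order (first found wins)."""
    environ = os.environ if environ is None else environ
    candidates = []
    # 1. Explicit path: --config takes precedence over VWCTL_CONFIG.
    explicit = cli_config or environ.get("VWCTL_CONFIG")
    if explicit:
        candidates.append(Path(explicit))
    # 2. vwctl.toml in the current directory.
    candidates.append(Path.cwd() / "vwctl.toml")
    # 3. Per-user location, which depends on the platform.
    appdata = environ.get("APPDATA")
    if os.name == "nt" and appdata:
        candidates.append(Path(appdata) / "vwctl" / "config.toml")
    else:
        candidates.append(Path(os.path.expanduser("~")) / ".config" / "vwctl" / "config.toml")
    return candidates


def find_config(cli_config=None, environ=None):
    """Return the first candidate that exists as a file, or None."""
    for path in candidate_config_paths(cli_config, environ):
        if path.is_file():
            return path
    return None
```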
Example vwctl.toml:
window = "Remote Desktop"
ocr_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
capture_log_dir = "./captures"
no_focus = false
Environment Variables
| Variable | Description |
|---|---|
| VWCTL_WINDOW | Default target window title |
| VWCTL_HWND | Default target window handle |
| VWCTL_OCR_CMD | Tesseract executable path |
| VWCTL_CAPTURE_LOG_DIR | Default directory for capture output |
| VWCTL_NO_FOCUS | Send input via PostMessage without stealing focus (1/true) |
| VWCTL_CONFIG | Config file path |
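As a rough sketch of how these variables might map onto settings, the snippet below reads them into a dict. The setting keys window, ocr_cmd, capture_log_dir and no_focus follow the example vwctl.toml. The hwnd key, the integer conversion, the case-insensitive boolean parsing, and the function name are assumptions made for illustration.

```python
import os

# Environment variable -> setting key (hwnd is an assumed key name).
ENV_TO_SETTING = {
    "VWCTL_WINDOW": "window",
    "VWCTL_HWND": "hwnd",
    "VWCTL_OCR_CMD": "ocr_cmd",
    "VWCTL_CAPTURE_LOG_DIR": "capture_log_dir",
    "VWCTL_NO_FOCUS": "no_focus",
}


def settings_from_env(environ=None):
    """Collect vwctl settings from environment variables, skipping unset or empty ones."""
    environ = os.environ if environ is None else environ
    settings = {}
    for var, key in ENV_TO_SETTING.items():
        raw = environ.get(var)
        if raw is None or raw == "":
            continue
        if key == "no_focus":
            # The table lists 1/true as enabling values; anything else counts as off here.
            settings[key] = raw.strip().lower() in ("1", "true")
        elif key == "hwnd":
            # Base 0 accepts decimal as well as 0x-prefixed hexadecimal handles.
            settings[key] = int(raw, 0)
        else:
            settings[key] = raw
    return settings
```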
MCP Server
Add to your MCP client configuration (e.g. .claude.json):
{
"mcpServers": {
"visual-window-control": {
"type": "stdio",
"command": "mcp-visual-window-control"
}
}
}
The MCP server exposes the same functionality as the CLI as tools: list_windows, set_target_window, get_screen_text, get_screen_image, send_keys, send_special_key, send_key_sequence, click, mouse_move, mouse_drag, mouse_scroll, execute_and_read, list_child_windows, get_focus_info.
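MCP clients invoke these tools with JSON-RPC 2.0 "tools/call" requests. The sketch below builds such a request for send_keys. The argument names text and raw are taken from the raw-mode example in this README; whether send_keys accepts further arguments is not stated here, and the helper name tool_call_request is hypothetical.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Example: ask the server to type a command followed by Enter.
request = tool_call_request(1, "send_keys", {"text": "ls -la{enter}"})
```

In practice an MCP client library handles this framing, along with the initialization handshake that must precede any tool call, so hand-built messages are mainly useful for debugging.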
Inline Tags (send_keys / type)
Text input supports {tag} syntax for special keys:
"ls -la{enter}" → types "ls -la" then presses Enter
"awk '{print $1}' file.txt{enter}" → braces pass through (not a known tag)
"echo {{enter}}" → types "echo {enter}" (escaped)
"{ctrl+c}" → sends Ctrl+C
Whitelist-based: Only recognized key names are interpreted as tags. Unknown {content} passes through literally, so code with curly braces (awk, Python, shell) works without escaping.
Supported keys: {enter}, {tab}, {escape}, {backspace}, {delete}, {up}, {down}, {left}, {right}, {home}, {end}, {pageup}, {pagedown}, {space}, {f1}–{f12}
Modifiers: {ctrl+c}, {alt+f4}, {shift+tab}
Escaping: {{ → literal {, }} → literal }
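The whitelist rules above can be illustrated with a small tokenizer. This is a sketch of the described behaviour, not the package's parser: the names KEYS, MODIFIERS, parse_tag and tokenize are hypothetical, matching is assumed to be case-sensitive, and a modifier combination is assumed to accept either a listed key or a single character.

```python
import re

KEYS = {
    "enter", "tab", "escape", "backspace", "delete", "up", "down", "left",
    "right", "home", "end", "pageup", "pagedown", "space",
} | {f"f{n}" for n in range(1, 13)}
MODIFIERS = {"ctrl", "alt", "shift"}

# Escapes ({{ and }}) are tried first, then a single {content} group.
TOKEN = re.compile(r"\{\{|\}\}|\{([^{}]*)\}")


def parse_tag(content):
    """Return (modifiers, key) for a recognized tag, or None for anything else."""
    *mods, key = content.split("+")
    if not all(mod in MODIFIERS for mod in mods):
        return None
    if key in KEYS or (mods and len(key) == 1):
        return tuple(mods), key
    return None


def tokenize(text):
    """Split text into ("text", str) and ("key", modifiers, key) tokens."""
    tokens, literal, pos = [], [], 0

    def flush():
        if literal:
            joined = "".join(literal)
            if joined:
                tokens.append(("text", joined))
            literal.clear()

    for match in TOKEN.finditer(text):
        literal.append(text[pos:match.start()])
        pos = match.end()
        raw = match.group(0)
        if raw == "{{":
            literal.append("{")
        elif raw == "}}":
            literal.append("}")
        else:
            tag = parse_tag(match.group(1))
            if tag is None:
                literal.append(raw)  # unknown {content} passes through literally
            else:
                flush()
                tokens.append(("key", tag[0], tag[1]))
    literal.append(text[pos:])
    flush()
    return tokens
```

With this sketch, tokenize("awk '{print $1}' file.txt{enter}") keeps the awk braces as text and yields a single Enter key token at the end, which is the pass-through behaviour described above.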
Raw Mode
Disable all tag interpretation. Newline characters (\n) are sent as Enter key presses.
# CLI
vwctl -w "Remote Desktop" type -r "echo hello
echo world
"
# MCP: {"text": "echo hello\necho world\n", "raw": true}
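Raw mode can be pictured as a plain split on newlines. The sketch below is illustrative only: the function name raw_events is hypothetical, and normalizing Windows-style \r\n line endings is an assumption.

```python
def raw_events(text):
    """Convert raw text to events: each newline becomes an Enter press, braces stay literal."""
    events = []
    # Assumption: \r\n line endings are treated like a single newline.
    lines = text.replace("\r\n", "\n").split("\n")
    for index, line in enumerate(lines):
        if index > 0:
            events.append(("key", "enter"))
        if line:
            events.append(("text", line))
    return events
```

Because no tag parsing happens in this mode, raw_events("{enter}") produces only a text event containing the braces.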
Limitations
- Focus stealing: When sending input to the target window, focus is moved to that window by default. This is required for the input to be received by the target application.
- No-focus mode (-n/--no-focus): An option exists to send input via PostMessage without stealing focus, but this only works with certain native Windows applications (e.g. cmd.exe, Git Bash, PuTTY). Remote desktop applications (RDP, Guacamole, VNC, etc.) do not support no-focus input; they require the window to be focused and in the foreground to receive keyboard/mouse events.
- Admin privileges: When the target application runs as admin, the controlling process must also run as admin due to Windows UIPI restrictions.
OCR Tips
- Use monospace fonts (JetBrains Mono, Hack, Fira Code) at 24pt+
- Use high-contrast terminal themes
- Larger window sizes improve accuracy
For LLM Agents
See LLM.md for a CLI reference designed for LLM agents — recommended workflow, command examples, and common patterns.
Tested With
- Windows 11, Python 3.12+, Tesseract 5.x
- Remote Desktop (mstsc.exe)