Visual Window Control

An MCP server and CLI for controlling windows visually: capture screenshots, extract text via OCR (Tesseract), and send keyboard/mouse input to any target window. Designed for remote desktop workflows (RDP, etc.) but works with any window.

Requirements

Installation

# Install Tesseract OCR (via Chocolatey or manual download)
choco install tesseract

# Install the package
pip install -e .

Usage

CLI (vwctl)

# List all visible windows
vwctl list-windows

# Capture and OCR a window (by title)
vwctl -w "Remote Desktop" ocr

# Type text with inline tags
vwctl -w "Remote Desktop" type "ls -la{enter}"

# Send a special key with modifiers
vwctl -w "Remote Desktop" key c -m ctrl

# Click at coordinates relative to window
vwctl -w "Remote Desktop" click 400 300

# Execute a command and read output via OCR
vwctl -w "Remote Desktop" exec "ls -la" -W 2.0

# Capture screenshot to file (default: JPEG quality 85)
vwctl -w "Remote Desktop" capture
# → Saved: 2026-03-07_22-24-00_vwctl.jpg (1920x1080)

# Capture with custom filename (use .png extension for PNG output)
vwctl -w "Remote Desktop" capture -o screen.png

# Capture occluded window without bringing to foreground
# (uses PrintWindow API; may produce black images for hardware-accelerated apps)
vwctl -w "Remote Desktop" capture -b

# Use hwnd instead of title (faster, no search overhead)
vwctl -H 1234567 ocr

# Send input without stealing focus (works with cmd.exe, Git Bash, PuTTY, etc.)
vwctl -w "Command Prompt" -n type "dir{enter}"

Subcommands

Command                  Description
list-windows             List all visible windows with hwnd and title
type TEXT                Type text with inline {tag} support
key KEY [-m MOD]         Send a single key press with optional modifiers
keys JSON                Send a key sequence from JSON array
click X Y [-b]           Click at position relative to window
move X Y [-r]            Move mouse cursor (absolute or relative)
drag X1 Y1 X2 Y2         Drag mouse from start to end position
scroll AMOUNT            Scroll mouse wheel (+up, -down)
capture [-o FILE] [-b]   Capture window to JPEG file or base64 stdout (.png extension for PNG)
ocr [-b]                 Capture window and extract text via OCR
exec CMD [-W SEC]        Type command, Enter, wait, then OCR output

Global Options

Option               Description
-w, --window TITLE   Target window by title (partial match)
-H, --hwnd HWND      Target window by handle directly
-c, --config FILE    Config file path
-n, --no-focus       Send input via PostMessage without stealing focus
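
When driving vwctl from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of the package: build_vwctl_argv is a hypothetical helper that uses only the global options documented above, and its refusal to accept both a title and a handle is a choice of this sketch rather than documented vwctl behavior.

```python
from typing import List, Optional


def build_vwctl_argv(
    subcommand: List[str],
    window: Optional[str] = None,
    hwnd: Optional[int] = None,
    no_focus: bool = False,
    config: Optional[str] = None,
) -> List[str]:
    """Assemble a vwctl command line: global options first, then the subcommand.

    Hypothetical helper; only -c, -H, -w and -n from the README are used.
    """
    # Assumption of this sketch: a target is given by title or by handle, not both.
    if window is not None and hwnd is not None:
        raise ValueError("pass either window or hwnd, not both")

    argv = ["vwctl"]
    if config is not None:
        argv += ["-c", config]
    if hwnd is not None:
        argv += ["-H", str(hwnd)]
    elif window is not None:
        argv += ["-w", window]
    if no_focus:
        argv.append("-n")
    return argv + list(subcommand)


if __name__ == "__main__":
    # Mirrors: vwctl -w "Command Prompt" -n type "dir{enter}"
    print(build_vwctl_argv(["type", "dir{enter}"], window="Command Prompt", no_focus=True))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems with window titles that contain spaces.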

Configuration

Settings can be provided via config file, environment variables, or CLI arguments. Priority: CLI args > env vars > config file.
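
The priority rule can be illustrated with a short Python sketch. This is not the package's actual code: resolve_setting and the dictionaries passed to it are hypothetical, and the variable names are the ones listed in the Environment Variables section. Values are returned as found, without type conversion.

```python
import os
from typing import Any, Mapping, Optional

# Setting name -> environment variable (names taken from this README)
ENV_NAMES = {
    "window": "VWCTL_WINDOW",
    "hwnd": "VWCTL_HWND",
    "ocr_cmd": "VWCTL_OCR_CMD",
    "capture_log_dir": "VWCTL_CAPTURE_LOG_DIR",
    "no_focus": "VWCTL_NO_FOCUS",
}


def resolve_setting(
    key: str,
    cli_args: Mapping[str, Any],
    file_config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the value for `key` using CLI args > env vars > config file."""
    env = os.environ if env is None else env

    # 1. An explicit CLI argument wins.
    if cli_args.get(key) is not None:
        return cli_args[key]

    # 2. Otherwise fall back to the environment variable, if set.
    env_value = env.get(ENV_NAMES[key])
    if env_value is not None:
        return env_value

    # 3. Finally use the config file value (None when absent).
    return file_config.get(key)


if __name__ == "__main__":
    file_config = {"window": "Remote Desktop"}
    env = {"VWCTL_WINDOW": "PuTTY"}
    print(resolve_setting("window", {"window": None}, file_config, env))
```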

Config File

TOML format. Search order (first found wins):

  1. --config FILE / VWCTL_CONFIG env var
  2. ./vwctl.toml (current directory)
  3. ~/.config/vwctl/config.toml (Linux) / %APPDATA%\vwctl\config.toml (Windows)
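
The search order above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: candidate_config_paths and find_config are hypothetical names, the sketch checks --config before VWCTL_CONFIG within step 1, and it treats an explicit path that does not exist as "not found" and moves on. The real tool may instead report an error in that case.

```python
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_config_paths(
    cli_config: Optional[str],
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
    windows: bool,
) -> List[Path]:
    """List config file candidates in the documented search order."""
    candidates: List[Path] = []

    # 1. --config FILE / VWCTL_CONFIG (CLI first is an assumption of this sketch)
    if cli_config:
        candidates.append(Path(cli_config))
    if env.get("VWCTL_CONFIG"):
        candidates.append(Path(env["VWCTL_CONFIG"]))

    # 2. ./vwctl.toml in the current directory
    candidates.append(cwd / "vwctl.toml")

    # 3. Per-user location, depending on platform
    if windows:
        appdata = env.get("APPDATA")
        if appdata:
            candidates.append(Path(appdata) / "vwctl" / "config.toml")
    else:
        candidates.append(home / ".config" / "vwctl" / "config.toml")
    return candidates


def find_config(candidates: List[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a file (first found wins)."""
    for path in candidates:
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    import os
    import sys

    paths = candidate_config_paths(
        None, os.environ, Path.cwd(), Path.home(), sys.platform == "win32"
    )
    print(find_config(paths))
```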

Example vwctl.toml:

window = "Remote Desktop"
ocr_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
capture_log_dir = "./captures"
no_focus = false

Environment Variables

Variable                Description
VWCTL_WINDOW            Default target window title
VWCTL_HWND              Default target window handle
VWCTL_OCR_CMD           Tesseract executable path
VWCTL_CAPTURE_LOG_DIR   Default directory for capture output
VWCTL_NO_FOCUS          Send input via PostMessage without stealing focus (1/true)
VWCTL_CONFIG            Config file path

MCP Server

Add to your MCP client configuration (e.g. .claude.json):

{
  "mcpServers": {
    "visual-window-control": {
      "type": "stdio",
      "command": "mcp-visual-window-control"
    }
  }
}

The MCP server exposes the CLI functionality as the following tools: list_windows, set_target_window, get_screen_text, get_screen_image, send_keys, send_special_key, send_key_sequence, click, mouse_move, mouse_drag, mouse_scroll, execute_and_read, list_child_windows, get_focus_info.

Inline Tags (send_keys / type)

Text input supports {tag} syntax for special keys:

"ls -la{enter}"                     → types "ls -la" then presses Enter
"awk '{print $1}' file.txt{enter}"  → braces pass through (not a known tag)
"echo {{enter}}"                    → types "echo {enter}" (escaped)
"{ctrl+c}"                          → sends Ctrl+C

Whitelist-based: Only recognized key names are interpreted as tags. Unknown {content} passes through literally, so code with curly braces (awk, Python, shell) works without escaping.

Supported keys: {enter}, {tab}, {escape}, {backspace}, {delete}, {up}, {down}, {left}, {right}, {home}, {end}, {pageup}, {pagedown}, {space}, {f1} through {f12}

Modifiers: {ctrl+c}, {alt+f4}, {shift+tab}

Escaping: {{ → literal {, }} → literal }
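
To make the whitelist behavior concrete, here is a minimal tokenizer sketch in Python. It is an illustration of the rules above, not the package's parser: tokenize, parse_tag, and the token tuples are hypothetical. It assumes tag names are matched in lowercase only, that the modifiers are limited to ctrl, alt, and shift, and that a single character is accepted as the key only when at least one modifier is present.

```python
from typing import List, Optional, Tuple

# Key names from the "Supported keys" list, plus f1..f12
KEYS = {
    "enter", "tab", "escape", "backspace", "delete", "up", "down", "left",
    "right", "home", "end", "pageup", "pagedown", "space",
} | {f"f{n}" for n in range(1, 13)}
MODIFIERS = {"ctrl", "alt", "shift"}  # assumption: only these three

# (kind, value, modifiers): kind is "text" or "key"
Token = Tuple[str, str, Tuple[str, ...]]


def parse_tag(content: str) -> Optional[Token]:
    """Return a key token if `content` is a recognized tag, else None."""
    *mods, key = content.split("+")
    if any(mod not in MODIFIERS for mod in mods):
        return None
    if key in KEYS or (mods and len(key) == 1):
        return ("key", key, tuple(mods))
    return None


def tokenize(text: str) -> List[Token]:
    """Split input into literal text runs and key presses."""
    tokens: List[Token] = []
    buf: List[str] = []  # pending literal characters

    def flush() -> None:
        if buf:
            tokens.append(("text", "".join(buf), ()))
            buf.clear()

    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "}}"):  # escaped brace -> one literal brace
            buf.append(pair[0])
            i += 2
            continue
        if text[i] == "{":
            close = text.find("}", i + 1)
            tag = parse_tag(text[i + 1:close]) if close != -1 else None
            if tag is not None:
                flush()
                tokens.append(tag)
                i = close + 1
                continue
        # Unknown {content} and ordinary characters pass through literally.
        buf.append(text[i])
        i += 1
    flush()
    return tokens


if __name__ == "__main__":
    print(tokenize("awk '{print $1}' file.txt{enter}"))
```

Because unknown brace content falls through to the literal branch one character at a time, shell and awk snippets need no escaping unless they happen to contain a recognized tag name.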

Raw Mode

Disable all tag interpretation. Newline characters (\n) are sent as Enter key presses.

# CLI
vwctl -w "Remote Desktop" type -r "echo hello
echo world
"

# MCP: {"text": "echo hello\necho world\n", "raw": true}
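
As a rough illustration of raw mode, the sketch below turns a raw string into literal text runs and Enter presses. raw_tokens and its tuple format are hypothetical and not the package's API; handling of \r\n line endings is not covered because the README does not describe it.

```python
from typing import List, Tuple

# (kind, value, modifiers): kind is "text" or "key"
Token = Tuple[str, str, Tuple[str, ...]]


def raw_tokens(text: str) -> List[Token]:
    """Treat every character literally, except \n, which becomes Enter."""
    tokens: List[Token] = []
    lines = text.split("\n")
    for index, line in enumerate(lines):
        if line:
            # Braces are not interpreted here: "{enter}" stays plain text.
            tokens.append(("text", line, ()))
        if index < len(lines) - 1:
            # One Enter per newline that separated this piece from the next.
            tokens.append(("key", "enter", ()))
    return tokens


if __name__ == "__main__":
    print(raw_tokens("echo hello\necho world\n"))
```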

Limitations

  • Focus stealing: When sending input to the target window, focus is moved to that window by default. This is required for the input to be received by the target application.
  • No-focus mode (-n / --no-focus): An option exists to send input via PostMessage without stealing focus, but this only works with certain native Windows applications (e.g. cmd.exe, Git Bash, PuTTY). Remote desktop applications (RDP, Guacamole, VNC, etc.) do not support no-focus input — they require the window to be focused and in the foreground to receive keyboard/mouse events.
  • Admin privileges: When the target application runs as admin, the controlling process must also run as admin due to Windows UIPI restrictions.

OCR Tips

  • Use monospace fonts (JetBrains Mono, Hack, Fira Code) at 24pt+
  • Use high-contrast terminal themes
  • Larger window sizes improve accuracy

For LLM Agents

See LLM.md for a CLI reference designed for LLM agents — recommended workflow, command examples, and common patterns.

Tested With

  • Windows 11, Python 3.12+, Tesseract 5.x
  • Remote Desktop (mstsc.exe)
