MCP server & CLI for controlling windows visually — capture screenshots, OCR text extraction, and keyboard/mouse input

Visual Window Control

MCP server & CLI for controlling windows visually — capture screenshots, extract text via OCR (Tesseract), and send keyboard/mouse input to any target window. Designed for remote desktop workflows (RDP, etc.) but works with any window.

Requirements

Windows 10/11
Python 3.10+
Tesseract OCR

Installation

# Install Tesseract OCR (via Chocolatey or manual download)
choco install tesseract

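# Install the released package from PyPI
pip install visual-window-control

# Or install from a source checkout in editable mode
pip install -e .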

Usage

CLI (`vwctl`)

# List all visible windows
vwctl list-windows

# Capture and OCR a window (by title)
vwctl -w "Remote Desktop" ocr

# Type text with inline tags
vwctl -w "Remote Desktop" type "ls -la{enter}"

# Type from stdin (pipe or streaming)
echo "ls -la{enter}" | vwctl -w "Remote Desktop" type

# Send a special key with modifiers
vwctl -w "Remote Desktop" key c -m ctrl

# Send a key with a custom delay (wait 800 ms after the key press)
vwctl -w "Remote Desktop" key f -m alt -d 800

# Send a key sequence with per-step timing control (delay_ms in ms)
vwctl -w "Remote Desktop" keys '[{"key":"tab"},{"key":"enter","delay_ms":500}]'

# Click at coordinates relative to window
vwctl -w "Remote Desktop" click 400 300

# Execute a command and read output via OCR
vwctl -w "Remote Desktop" exec "ls -la" -W 2.0

# Capture screenshot to file (default: JPEG quality 85)
vwctl -w "Remote Desktop" capture
# → Saved: 2026-03-07_22-24-00_vwctl.jpg (1920x1080)

# Capture with custom JPEG quality 1-95 (default: 85)
vwctl -w "Remote Desktop" capture -q 60

# Capture with custom filename (use .png extension for PNG output)
vwctl -w "Remote Desktop" capture -o screen.png

# Capture occluded window without bringing to foreground
# (uses PrintWindow API; may produce black images for hardware-accelerated apps)
vwctl -w "Remote Desktop" capture -b

# Use hwnd instead of title (faster, no search overhead)
vwctl -H 1234567 ocr

# Send input without stealing focus (works with cmd.exe, Git Bash, PuTTY, etc.)
vwctl -w "Command Prompt" -n type "dir{enter}"

Subcommands

Command	Description
`list-windows`	List all visible windows with hwnd and title
`type [TEXT] [-f FILE] [-r | -t]`	Type text with inline `{tag}` support (reads from stdin if TEXT is omitted; `-f FILE` reads from a file, `-f -` reads stdin explicitly; `-r` forces raw mode, `-t` enables tag interpretation)
`key KEY [-m MOD] [-d MS]`	Send a single key press with optional modifiers and delay
`keys JSON`	Send a key sequence from JSON array (per-step `key`, `modifiers`, `delay_ms`)
`click X Y [-b]`	Click at position relative to window
`move X Y [-r]`	Move mouse cursor (absolute or relative)
`drag X1 Y1 X2 Y2`	Drag mouse from start to end position
`scroll AMOUNT`	Scroll mouse wheel (+up, -down)
`capture [-o FILE] [-q Q] [-b]`	Capture window to JPEG file or base64 stdout (`.png` extension for PNG)
`ocr [-b]`	Capture window and extract text via OCR
`exec CMD [-W SEC]`	Type command, Enter, wait, then OCR output
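
When the JSON array for `keys` is generated by a script, building it with a JSON library avoids hand-quoting mistakes. The following is a minimal sketch and not part of vwctl: `keys_argv` is a hypothetical helper, and it uses only the `key` and `delay_ms` fields that appear in the CLI examples. It returns an argument list rather than a shell string, which sidesteps shell quoting (cmd.exe, for instance, does not treat single quotes as quoting characters).

```python
import json
from typing import Any, Dict, List


def keys_argv(hwnd: int, steps: List[Dict[str, Any]]) -> List[str]:
    """Build the argument list for `vwctl -H HWND keys JSON` (hypothetical helper)."""
    # Compact separators keep the payload short; dict order is preserved.
    payload = json.dumps(steps, separators=(",", ":"))
    # Global options such as -H come before the subcommand, as in the CLI examples.
    return ["vwctl", "-H", str(hwnd), "keys", payload]


argv = keys_argv(1234567, [{"key": "tab"}, {"key": "enter", "delay_ms": 500}])
```

Passing `argv` to `subprocess.run(argv, check=True)` would then invoke the CLI, assuming `vwctl` is on `PATH`.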

Global Options

Option	Description
`-w, --window TITLE`	Target window by title (partial match)
`-H, --hwnd HWND`	Target window by handle directly
`-c, --config FILE`	Config file path
`-n, --no-focus`	Send input via PostMessage without stealing focus

Configuration

Settings can be provided via config file, environment variables, or CLI arguments. Priority: CLI args > env vars > config file.
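
The precedence rule can be illustrated with a short sketch. This is not vwctl's actual implementation: `resolve_setting` and its parameters are hypothetical names, and the three sources are modeled as plain dictionaries. It assumes each config key maps to an environment variable named `VWCTL_` plus the upper-cased key, which matches the variables documented in this README.

```python
from typing import Any, Mapping, Optional


def resolve_setting(
    name: str,
    cli_args: Mapping[str, Any],
    env: Mapping[str, str],
    config: Mapping[str, Any],
    default: Optional[Any] = None,
) -> Any:
    """Return the value for `name`, honoring CLI args > env vars > config file."""
    # 1. A value given on the command line always wins.
    if cli_args.get(name) is not None:
        return cli_args[name]
    # 2. Next, look for the VWCTL_<NAME> environment variable.
    env_key = f"VWCTL_{name.upper()}"
    if env_key in env:
        return env[env_key]  # note: environment values are always strings
    # 3. Finally, fall back to the config file, then to the default.
    return config.get(name, default)


# Plain dictionaries standing in for the three sources.
cli = {"window": None, "jpeg_quality": 60}
env = {"VWCTL_WINDOW": "Remote Desktop", "VWCTL_JPEG_QUALITY": "70"}
cfg = {"window": "Command Prompt", "jpeg_quality": 85, "no_focus": False}

window = resolve_setting("window", cli, env, cfg)         # taken from env
quality = resolve_setting("jpeg_quality", cli, env, cfg)  # taken from CLI
no_focus = resolve_setting("no_focus", cli, env, cfg)     # taken from config
```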

Config File

TOML format. Search order (first found wins):

1. --config FILE or the VWCTL_CONFIG environment variable
2. ./vwctl.toml (current directory)
3. ~/.config/vwctl/config.toml (Linux) / %APPDATA%\vwctl\config.toml (Windows)
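
A "first found wins" lookup of this kind might be sketched as follows. The function names are hypothetical and the sketch only locates the file; it does not parse TOML. How vwctl treats an explicit path that does not exist is not documented here, so falling through to the next candidate is an assumption of this sketch.

```python
from pathlib import Path
from typing import List, Optional


def candidate_paths(
    explicit: Optional[str], cwd: Path, user_dir: Optional[Path]
) -> List[Path]:
    """List config locations in the documented search order."""
    paths: List[Path] = []
    if explicit:                      # --config FILE or VWCTL_CONFIG
        paths.append(Path(explicit))
    paths.append(cwd / "vwctl.toml")  # current directory
    if user_dir is not None:          # e.g. %APPDATA%\vwctl on Windows
        paths.append(user_dir / "config.toml")
    return paths


def find_config(paths: List[Path]) -> Optional[Path]:
    """Return the first path that exists as a file, or None if none do."""
    for path in paths:
        if path.is_file():
            return path
    return None
```

On Windows, `user_dir` could be built as `Path(os.environ["APPDATA"]) / "vwctl"`.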

Example vwctl.toml:

window = "Remote Desktop"
ocr_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
capture_log_dir = "./captures"
jpeg_quality = 85
no_focus = false

Environment Variables

Variable	Description
`VWCTL_WINDOW`	Default target window title
`VWCTL_HWND`	Default target window handle
`VWCTL_OCR_CMD`	Tesseract executable path
`VWCTL_CAPTURE_LOG_DIR`	Default directory for capture output
`VWCTL_JPEG_QUALITY`	JPEG quality 1-95 (default: 85)
`VWCTL_NO_FOCUS`	Send input via PostMessage without stealing focus (`1`/`true`)
`VWCTL_CONFIG`	Config file path

MCP Server

Add to your MCP client configuration (e.g. .claude.json):

{
  "mcpServers": {
    "visual-window-control": {
      "type": "stdio",
      "command": "mcp-visual-window-control"
    }
  }
}

The MCP server exposes the same functionality as the CLI as tools: list_windows, set_target_window, get_screen_text, get_screen_image, send_keys, send_special_key, send_key_sequence, click, mouse_move, mouse_drag, mouse_scroll, execute_and_read, list_child_windows, get_focus_info.

send_keys and send_key_sequence automatically detect focus loss: if the target window loses foreground focus during input, the operation is aborted and the tool returns an "Aborted: target window lost focus (sent X/Y ...)" message instead of the normal result.

Inline Tags (`send_keys` / `type`)

Text input supports {tag} syntax for special keys:

"ls -la{enter}"                     → types "ls -la" then presses Enter
"awk '{print $1}' file.txt{enter}"  → braces pass through (not a known tag)
"echo {{enter}}"                    → types "echo {enter}" (escaped)
"{ctrl+c}"                          → sends Ctrl+C

Whitelist-based: Only recognized key names are interpreted as tags. Unknown {content} passes through literally, so code with curly braces (awk, Python, shell) works without escaping.

Supported keys: {enter}, {tab}, {escape}, {backspace}, {delete}, {up}, {down}, {left}, {right}, {home}, {end}, {pageup}, {pagedown}, {space}, {f1}–{f12}

Modifiers: {ctrl+c}, {alt+f4}, {shift+tab}

Escaping: {{ → literal {, }} → literal }
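
To make the whitelist behavior concrete, here is a minimal tokenizer sketch. It is an illustration of the rules above, not vwctl's parser: `tokenize` and `parse_tag` are hypothetical names, tag names are matched case-sensitively, and a single character is accepted as a key only when combined with a modifier (both are assumptions).

```python
import re
from typing import List, Optional, Tuple

NAMED_KEYS = {
    "enter", "tab", "escape", "backspace", "delete", "up", "down", "left",
    "right", "home", "end", "pageup", "pagedown", "space",
} | {f"f{n}" for n in range(1, 13)}
MODIFIERS = {"ctrl", "alt", "shift"}
TAG_RE = re.compile(r"\{([^{}]+)\}")


def parse_tag(body: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (key, modifiers) if `body` is a recognized tag, else None."""
    *mods, key = body.split("+")
    if any(mod not in MODIFIERS for mod in mods):
        return None
    if key in NAMED_KEYS or (mods and len(key) == 1):
        return key, tuple(mods)
    return None


def tokenize(text: str) -> List[tuple]:
    """Split text into ("text", str) and ("key", name, modifiers) tokens."""
    tokens: List[tuple] = []
    literal: List[str] = []

    def flush() -> None:
        if literal:
            tokens.append(("text", "".join(literal)))
            literal.clear()

    i = 0
    while i < len(text):
        if text.startswith(("{{", "}}"), i):  # escaped brace -> one literal brace
            literal.append(text[i])
            i += 2
            continue
        match = TAG_RE.match(text, i)
        tag = parse_tag(match.group(1)) if match else None
        if tag is not None:                   # recognized tag -> key press
            flush()
            tokens.append(("key", tag[0], tag[1]))
            i = match.end()
        else:                                 # anything else passes through
            literal.append(text[i])
            i += 1
    flush()
    return tokens
```

With this sketch, `tokenize("awk '{print $1}' file.txt{enter}")` yields one text token containing the braces unchanged, followed by an Enter key token.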

Supported Characters

Each mode accepts a specific set of characters. Text containing unsupported characters (e.g. escape sequences, null bytes) will be rejected with an error before any keystrokes are sent.

Mode	Accepted characters	Special keys
Tag mode (default for text arg)	Printable characters (U+0020–U+007E, U+0080+)	Via `{tag}` syntax: `{enter}`, `{tab}`, `{ctrl+c}`, etc.
Raw mode (`-r`, default for stdin/file)	Printable characters + `\t` (Tab) + line endings (`\n`, `\r\n`, `\r` → Enter)	None (modifier combos like Ctrl+C not available)

Choosing a mode: Use raw mode (-r) for multi-line or long text input where modifier key combinations are not needed. Use tag mode for interactive sequences that require special keys or modifiers.
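
The accepted character sets in the table can be expressed as a small pre-flight check. This is a sketch under the ranges stated above, with hypothetical function names; the wording of the error message is invented for illustration.

```python
def is_accepted(ch: str, raw: bool) -> bool:
    """Check one character against the accepted set for the given mode."""
    code = ord(ch)
    # Printable range from the table: U+0020-U+007E and U+0080 upward.
    if 0x20 <= code <= 0x7E or code >= 0x80:
        return True
    # Raw mode additionally accepts Tab and the line-ending characters.
    return raw and ch in "\t\n\r"


def validate(text: str, raw: bool = False) -> None:
    """Raise ValueError for the first unsupported character, before sending anything."""
    for index, ch in enumerate(text):
        if not is_accepted(ch, raw):
            raise ValueError(
                f"unsupported character U+{ord(ch):04X} at index {index}"
            )
```

Validating the whole string up front, rather than while typing, is what allows the input to be rejected before any keystroke reaches the target window.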

Sending arbitrary data: If your text contains control characters or escape sequences (e.g. ANSI codes), encode it as base64 and decode on the remote side:

# Encode locally, type via raw mode, decode on remote
base64 -w0 binary_file.dat | vwctl -H HWND type -f -
# Then on the remote side: echo "<pasted>" | base64 -d > file
# Or as a single pipeline command:
echo "echo '$(base64 -w0 binary_file.dat)' | base64 -d > /tmp/file{enter}" | vwctl -H HWND type -t

Raw Mode

Raw mode (-r) disables all tag interpretation. Line endings (\n, \r\n, \r) are sent as Enter key presses, and tab characters (\t) are sent as Tab. It is recommended for multi-line or long text input where modifier key combinations (e.g. {ctrl+c}) are not needed.

# CLI
vwctl -w "Remote Desktop" type -r "echo hello
echo world
"

# MCP: {"text": "echo hello\necho world\n", "raw": true}
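
The raw-mode mapping described above can be sketched as a function that turns text into a sequence of events. The event tuples and the name `raw_events` are hypothetical; the sketch only shows that each line ending (including a two-character `\r\n`) becomes a single Enter and that braces are left untouched.

```python
import re
from typing import List, Tuple

# \r\n is listed first so it is consumed as one line ending, not two.
_SPLIT_RE = re.compile(r"(\r\n|\r|\n|\t)")


def raw_events(text: str) -> List[Tuple[str, str]]:
    """Convert raw-mode text into ("text", s) and ("key", name) events."""
    events: List[Tuple[str, str]] = []
    for part in _SPLIT_RE.split(text):
        if not part:
            continue  # re.split yields empty strings around adjacent separators
        if part == "\t":
            events.append(("key", "tab"))
        elif part in ("\r\n", "\r", "\n"):
            events.append(("key", "enter"))
        else:
            events.append(("text", part))
    return events
```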

Stdin and File Input

When the text argument is omitted, type reads from stdin line by line. Use --file/-f to read from a file, or -f - for explicit stdin.

Stdin and file input default to raw mode (no tag interpretation), since the typical use case is piping file/program output. Use -t/--tags to enable tag interpretation for these sources.

# Pipe from another command (raw by default)
echo "ls -la" | vwctl -w "Remote Desktop" type

# Explicit stdin with "-f -"
cat commands.txt | vwctl -w "Remote Desktop" type -f -

# Read from a file directly (raw by default)
vwctl -w "Remote Desktop" type -f commands.txt

# File input with tag interpretation
vwctl -w "Remote Desktop" type -f commands.txt -t

# Streaming (line-by-line as data arrives)
tail -f commands.fifo | vwctl -w "Remote Desktop" type

If both a text argument and stdin are present, the text argument wins (stdin is ignored).

-r/--raw and -t/--tags are mutually exclusive.
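
The rules for choosing the input source and the default mode can be summarized in a short `argparse` sketch. It mirrors the flags documented above but is not the real `vwctl` parser; `plan_input` is a hypothetical helper, and letting a text argument take precedence over `-f FILE` is an assumption (the README only states that it wins over stdin).

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="type-sketch")
    parser.add_argument("text", nargs="?")
    parser.add_argument("-f", "--file")
    # -r/--raw and -t/--tags cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-r", "--raw", action="store_true")
    mode.add_argument("-t", "--tags", action="store_true")
    return parser


def plan_input(argv: List[str]) -> Tuple[str, str]:
    """Return (source, mode) for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.text is not None:
        source = "argument"          # text argument wins over stdin
    elif args.file not in (None, "-"):
        source = "file"
    else:
        source = "stdin"             # omitted text, or explicit "-f -"
    if args.raw:
        mode = "raw"
    elif args.tags:
        mode = "tags"
    else:
        # Text arguments default to tag mode; stdin and files default to raw.
        mode = "tags" if source == "argument" else "raw"
    return source, mode
```

For example, `plan_input(["-f", "commands.txt", "-t"])` selects file input with tag interpretation, while `plan_input([])` selects stdin in raw mode.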

Focus Loss Detection

The type and keys commands (and MCP send_keys / send_key_sequence tools) monitor whether the target window remains in the foreground during input. If another window takes focus, input is immediately aborted:

# type command
Aborted: target window lost focus (typed 42 characters)

# keys command
Aborted: target window lost focus (sent 2/5 key steps)

This prevents keystrokes from being sent to an unintended window. Focus checking is disabled in no-focus mode (-n for CLI, no_focus: true for MCP).
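
The abort behavior can be pictured as a focus check before every keystroke. The sketch below is illustrative only: `send_with_focus_guard` is a hypothetical name, the `send` and `has_focus` callbacks are injected so the logic can run anywhere, and the success message is invented (only the abort message format comes from the example above).

```python
from typing import Callable


def send_with_focus_guard(
    text: str,
    send: Callable[[str], None],
    has_focus: Callable[[], bool],
    check_focus: bool = True,
) -> str:
    """Send characters one by one, aborting as soon as focus is lost."""
    sent = 0
    for ch in text:
        # In no-focus mode the check is skipped entirely.
        if check_focus and not has_focus():
            return f"Aborted: target window lost focus (typed {sent} characters)"
        send(ch)
        sent += 1
    return f"Typed {sent} characters"


# Simulate a window that loses focus after three characters.
typed = []
focus_states = iter([True, True, True, False, False])
result = send_with_focus_guard("hello", typed.append, lambda: next(focus_states))
```

On Windows, a `has_focus` callback could plausibly compare `ctypes.windll.user32.GetForegroundWindow()` with the target hwnd; that wiring is left out here to keep the sketch platform-independent.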

Key Delay (`delay_ms`)

After each key press, vwctl waits for a configurable delay before proceeding to the next action. This gives the target application time to process the keystroke (especially important for remote desktop apps, menus, and GUI transitions).

Default delays (when delay_ms is not specified):

Context	Default delay
`key` / `keys` commands (focus mode)	600 ms
`key` / `keys` commands (no-focus mode, `-n`)	100 ms
Inline `{tag}` in `type` command	100 ms
Plain text in `type` command	20 ms (per character)

Overriding the delay:

keys command (CLI): set delay_ms per step in the JSON array.

vwctl -H HWND keys '[{"key":"alt+f","delay_ms":800},{"key":"s","delay_ms":200}]'

send_special_key MCP tool: set the delay_ms parameter directly.
send_key_sequence MCP tool: set delay_ms per step in the steps array.
key command (CLI): set --delay/-d in milliseconds.

vwctl -H HWND key f -m alt -d 800

When to adjust: Increase the delay for slow UI transitions (e.g. menu opening, dialog loading). Decrease it for fast sequential keypresses where the default 600 ms is too slow.
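
To estimate how long a key sequence will take, the default delays from the table can be combined with any per-step overrides. The helper names and the context labels (`"key"`, `"tag"`, `"char"`) below are hypothetical; the numbers are the documented defaults.

```python
from typing import Any, Dict, List, Optional


def default_delay_ms(context: str, no_focus: bool = False) -> int:
    """Documented default delay for a context: "key", "tag", or "char"."""
    if context == "key":   # `key` / `keys` commands
        return 100 if no_focus else 600
    if context == "tag":   # inline {tag} in the `type` command
        return 100
    if context == "char":  # plain text in the `type` command, per character
        return 20
    raise ValueError(f"unknown context: {context!r}")


def effective_delay_ms(
    context: str, no_focus: bool = False, delay_ms: Optional[int] = None
) -> int:
    """An explicit delay_ms (even 0) overrides the default."""
    return default_delay_ms(context, no_focus) if delay_ms is None else delay_ms


def total_delay_ms(steps: List[Dict[str, Any]], no_focus: bool = False) -> int:
    """Sum the delays of a `keys` sequence."""
    return sum(
        effective_delay_ms("key", no_focus, step.get("delay_ms")) for step in steps
    )


steps = [{"key": "tab"}, {"key": "enter", "delay_ms": 500}]
focus_total = total_delay_ms(steps)                    # 600 + 500
no_focus_total = total_delay_ms(steps, no_focus=True)  # 100 + 500
```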

Limitations

Focus stealing: When sending input to the target window, focus is moved to that window by default. This is required for the input to be received by the target application.
No-focus mode (-n / --no-focus): An option exists to send input via PostMessage without stealing focus, but this only works with certain native Windows applications (e.g. cmd.exe, Git Bash, PuTTY). Remote desktop applications (RDP, Guacamole, VNC, etc.) do not support no-focus input — they require the window to be focused and in the foreground to receive keyboard/mouse events.
Admin privileges: When the target application runs as admin, the controlling process must also run as admin due to Windows UIPI restrictions.

OCR Tips

Use monospace fonts (JetBrains Mono, Hack, Fira Code) at 24pt+
Use high-contrast terminal themes
Larger window sizes improve accuracy

For LLM Agents

See LLM.md for a CLI reference designed for LLM agents — recommended workflow, command examples, and common patterns.

Tested With

Windows 11, Python 3.12+, Tesseract 5.x
Remote Desktop (mstsc.exe)

Project details

Maintainers

sunasaji

Release history

0.3.0 (this version): Mar 20, 2026
0.2.0: Mar 13, 2026
0.1.0: Mar 7, 2026

Download files

Source Distribution

visual_window_control-0.3.0.tar.gz (41.2 kB), uploaded Mar 20, 2026, Source

Built Distribution

visual_window_control-0.3.0-py3-none-any.whl (29.5 kB), uploaded Mar 20, 2026, Python 3

File details

Details for the file visual_window_control-0.3.0.tar.gz.

File metadata

Download URL: visual_window_control-0.3.0.tar.gz
Upload date: Mar 20, 2026
Size: 41.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for visual_window_control-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`0378a2677aecb6a53b81080b26bc00748caae05ac82f00cae59ff123952a3667`
MD5	`5511797c3ea03f9fc26ec7f42e1f73ce`
BLAKE2b-256	`4e4affb6eeb66baf9286648b23d3f43e31b8ffa9cad85f4ada5e3570990e7a3a`

Provenance

The following attestation bundles were made for visual_window_control-0.3.0.tar.gz:

Publisher: publish.yml on sunasaji/visual-window-control

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: visual_window_control-0.3.0.tar.gz
- Subject digest: 0378a2677aecb6a53b81080b26bc00748caae05ac82f00cae59ff123952a3667
- Sigstore transparency entry: 1148180173
- Sigstore integration time: Mar 20, 2026
Source repository:
- Permalink: sunasaji/visual-window-control@57b2a411865f5fec682716cf3745ecd99ac21b7e
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/sunasaji
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@57b2a411865f5fec682716cf3745ecd99ac21b7e
- Trigger Event: release

File details

Details for the file visual_window_control-0.3.0-py3-none-any.whl.

File metadata

Download URL: visual_window_control-0.3.0-py3-none-any.whl
Upload date: Mar 20, 2026
Size: 29.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for visual_window_control-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`91339f5e2b5fe2f80b6a06f1fb283790eff916f416fd599be782253bdc680300`
MD5	`b09f20c3428a4c5de780d14cfdec0aad`
BLAKE2b-256	`64d60a688644c262b967fde819dd184cf09b96d79fa061650e4809e10d439c31`

Provenance

The following attestation bundles were made for visual_window_control-0.3.0-py3-none-any.whl:

Publisher: publish.yml on sunasaji/visual-window-control

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: visual_window_control-0.3.0-py3-none-any.whl
- Subject digest: 91339f5e2b5fe2f80b6a06f1fb283790eff916f416fd599be782253bdc680300
- Sigstore transparency entry: 1148180638
- Sigstore integration time: Mar 20, 2026
Source repository:
- Permalink: sunasaji/visual-window-control@57b2a411865f5fec682716cf3745ecd99ac21b7e
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/sunasaji
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@57b2a411865f5fec682716cf3745ecd99ac21b7e
- Trigger Event: release
