MCP server for controlling VirtualBox VMs — screenshots, keyboard input, PowerShell, vagrant, WinRM, podman, and CI build pipelines
vm-flightsimulator
A Claude Code plugin that gives AI agents a complete control surface for VM automation: screenshot-driven GUI interaction, structured background task orchestration, non-intrusive progress observation, and frame-accurate session recording. Backend-agnostic by design — VirtualBox is the current implementation.
What It Does
You tell the agent "install this software on the VM" and the plugin handles everything: starting Vagrant, taking screenshots to verify state, typing into windows, running PowerShell over WinRM, polling scheduled tasks to completion, tailing logs, and returning a structured result you can act on.
The plugin enforces a clear separation of concerns:
Skills → define approved loops and tooling the orchestrator follows
Agents → take actions (vm-pilot) or observe state (vm-pilot-inspector)
MCP → executes tool calls against the actual VM
Nothing is a black box. Every action goes through the loop. Every result comes back structured.
Quick Start
1. Install the MCP server

    # Run on demand
    uvx mcp-vm-blackbox

    # Or add persistently to Claude Code
    claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

2. Install the plugin
Option A — Marketplace (Claude Code only):

    claude plugin marketplace add bitflight-devops/vm-flightsimulator

Then open /plugins in Claude Code and install vm-flightsimulator.
Option B — vm-blackbox-installer (Claude, OpenCode, Gemini CLI, Codex):

    # Install for all platforms, globally (~/.claude, ~/.gemini, etc.)
    uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

    # Or pick specific platforms
    uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

    # Or install locally to the current project directory
    uvx --from mcp-vm-blackbox vm-blackbox-installer --all --local

The installer copies skills/ and agents/ to each platform's plugin directory and registers the mcp-vm-blackbox MCP server in the platform's config file. See Installer Reference for full details.
3. Start working
"Take a screenshot of my-vm and describe what's on screen"
"Run the installer on my-vm and tell me when it's done"
"Record the boot sequence of my-vm for the next 2 minutes"
The plugin automatically selects the right skill and agent for each task.
Prerequisites
| Requirement | Version |
|---|---|
| Python | 3.11+ |
| uv | latest |
| VirtualBox | 7.1+ |
| Vagrant | 2.3+ |
| Packer (for VM builds) | 1.10+ |
| tmux (for detached builds) | any |
| WinRM on guest | configured for Windows VMs |
Architecture
The plugin uses a three-layer architecture. Each layer has a single job.

    ┌─────────────────────────────────────────────────────────┐
    │  Skills                                                 │
    │  vm-vision-control  vm-ground-control  vm-radio-control │
    │  vm-blackbox-record                                     │
    │  (define approved loops and tooling)                    │
    └────────────────────┬────────────────────────────────────┘
                         │ dispatches
    ┌────────────────────▼────────────────────────────────────┐
    │  Agents                                                 │
    │  vm-pilot             vm-pilot-inspector                │
    │  (acts on the VM)     (observes VM state)               │
    └────────────────────┬────────────────────────────────────┘
                         │ calls
    ┌────────────────────▼────────────────────────────────────┐
    │  MCP Server                                             │
    │  vm_screenshot  vm_powershell  vm_type  vm_key          │
    │  vm_mouse_click  vagrant_*  ci_*  podman_*  build_*     │
    │  (executes against real infrastructure)                 │
    └─────────────────────────────────────────────────────────┘

Skills are instruction logic — they define the approved loop and tooling the orchestrator follows. They do not take actions directly.
Agents take actions. vm-pilot drives the VM. vm-pilot-inspector reads state without touching anything.
The MCP server (mcp-vm-blackbox on PyPI) executes tool calls. It connects to VirtualBox via vboxapi (local XPCOM) and VBoxManage (local or SSH), to Windows guests via WinRM, and to remote CI hosts via SSH tunnels.
Skills
vm-vision-control — GUI Interaction Loop
The mandatory entry point for any task that touches a VM's desktop. Clicking, typing, reading the screen — all of it goes through this skill first.
The loop is strict:
1. Screenshot → vm_screenshot
2. Read image → Read tool on the saved_to path
3. Decide → Analyse screen, determine next action
4. Act → vm_mouse_click / vm_type / vm_key / vm_powershell
5. Repeat → Return to step 1
Never act without a fresh screenshot. Never skip the read step.
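The loop above can be sketched as a small driver. This is an illustration only, not the skill's implementation: the four callables are hypothetical stand-ins for vm_screenshot, the Read tool, the agent's own reasoning, and the action tools, and max_steps is an assumed safety limit that the source does not mention.

```python
from typing import Callable, Optional


def run_vision_loop(
    take_screenshot: Callable[[], str],
    read_image: Callable[[str], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 20,
) -> int:
    """Run screenshot -> read -> decide -> act until decide() returns None.

    Returns the number of actions taken. All callables are hypothetical
    stand-ins; max_steps is an assumed guard against endless loops.
    """
    actions = 0
    for _ in range(max_steps):
        path = take_screenshot()   # step 1: always a fresh capture
        screen = read_image(path)  # step 2: the read step is never skipped
        action = decide(screen)    # step 3: None means the goal state is reached
        if action is None:
            return actions
        act(action)                # step 4
        actions += 1               # step 5: loop back to a new screenshot
    raise RuntimeError("max_steps reached without reaching the goal state")
```

Because every iteration starts with a capture and a read, an action can never be issued against a stale view of the screen.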
Natural language triggers: "click on the VM", "type into the VM", "what's on the screen", "navigate the installer", "take a screenshot"
Timing to observe between steps:
| Operation | Wait before next screenshot |
|---|---|
| Click a button | 0.5 – 1 s |
| Open an application | 3 – 5 s |
| Launch an installer | 10 – 15 s |
| Installer panel transition | 2 – 3 s |
| Installer completion | 30 – 60 s |
| VM boot | 60 – 120 s |
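One way to apply the timing table is a lookup that returns how long to wait before the next screenshot. The operation keys and the choice to default to the upper bound are assumptions made for this sketch; only the second ranges come from the table.

```python
import time
from typing import Callable

# (min, max) seconds before the next screenshot, taken from the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
SETTLE_SECONDS = {
    "click": (0.5, 1.0),
    "open_application": (3.0, 5.0),
    "launch_installer": (10.0, 15.0),
    "installer_panel": (2.0, 3.0),
    "installer_completion": (30.0, 60.0),
    "vm_boot": (60.0, 120.0),
}


def settle_delay(operation: str, conservative: bool = True) -> float:
    """Return the upper bound of the range by default, the lower bound otherwise."""
    low, high = SETTLE_SECONDS[operation]
    return high if conservative else low


def wait_for(operation: str, sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for the settle delay of an operation and return the delay used."""
    delay = settle_delay(operation)
    sleep(delay)
    return delay
```

Passing sleep as a parameter keeps the helper testable without real waiting.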
Full reference: docs/skills/vm-vision-control.md
vm-ground-control — Orchestrator Coordination
Use for any VM operation that will take more than ~30 seconds. Dispatches vm-pilot as a background Task and gives you a structured return block to parse.

    agent_id = Task(
        description="Run the installer",
        subagent_type="vm-pilot",
        prompt="""
        GOAL:
          Run the silent installer via scheduled task and report whether it succeeded.
        STEPS:
          1. Invoke vm-vision-control skill.
          2. Test-NetConnection <HOST> -Port <PORT> -InformationLevel Quiet
          3. Register-ScheduledTask ...
          4. Poll every 30 s until State = Ready or 15 min elapsed
          5. Read the install log
          6. Take a screenshot; describe what is on screen.
        EVIDENCE TO COLLECT:
          - Log file full contents
          - Artefact path existence (yes/no)
          - Final VM screen description
        RETURN FORMAT:
          STATUS: SUCCESS | FAILED | PARTIAL | TIMEOUT
          SUMMARY: <2-4 sentences>
          FILES_READ:
            install.log: <contents or "not found">
          SCREEN_STATE: <description>
          ISSUES: <or "none">
          NEXT_STEP: <recommended action>
        """,
        run_in_background=True,
    )

Store the agent ID. You need it for progress checks and resumption.
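Step 4 of the prompt (poll every 30 s until State = Ready or 15 min elapsed) is a deadline-bounded polling loop. A minimal sketch follows, assuming a hypothetical get_state callable that returns the scheduled task's state string; in the plugin that value would come from PowerShell on the guest, which is not modelled here.

```python
import time
from typing import Callable


def poll_until_ready(
    get_state: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it returns "Ready" or the timeout would be exceeded.

    Returns "READY" or "TIMEOUT". get_state is a hypothetical stand-in for a
    scheduled-task state query; sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == "Ready":
            return "READY"
        # Stop if another full interval would run past the deadline.
        if clock() + interval_s > deadline:
            return "TIMEOUT"
        sleep(interval_s)
```

Using a monotonic clock avoids surprises if the host's wall clock is adjusted while the task is running.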
The pilot owns the VM for the duration. The orchestrator does not call vm_screenshot or vm_powershell while a pilot task is running.
Routing on STATUS:
| STATUS | Meaning | Action |
|---|---|---|
| SUCCESS | Task completed, artefact confirmed | Proceed |
| FAILED | Task failed with known cause | Check ISSUES, fix and re-dispatch |
| PARTIAL | Evidence incomplete | Resume pilot to collect missing evidence |
| TIMEOUT | Poll limit reached | Check SCREEN_STATE + FILES_READ |
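An orchestrator could parse the return block and route on STATUS roughly as follows. This is a sketch under assumptions: the parser treats any line starting with one of the six field names from the RETURN FORMAT as a new field and folds indented lines into the preceding field. The plugin's own parsing rules are not documented here.

```python
import re

TOP_LEVEL_KEYS = ("STATUS", "SUMMARY", "FILES_READ", "SCREEN_STATE", "ISSUES", "NEXT_STEP")

# Actions copied from the routing table above.
ROUTES = {
    "SUCCESS": "proceed",
    "FAILED": "check ISSUES, fix and re-dispatch",
    "PARTIAL": "resume pilot to collect missing evidence",
    "TIMEOUT": "check SCREEN_STATE and FILES_READ",
}

_KEY_RE = re.compile(r"^(%s):[ \t]*(.*)$" % "|".join(TOP_LEVEL_KEYS))


def parse_return_block(text: str) -> dict[str, str]:
    """Split a pilot return block into its top-level fields."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = _KEY_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            # Continuation line (for example an entry under FILES_READ).
            fields[current] = (fields[current] + "\n" + line.strip()).strip()
    return fields


def route(fields: dict[str, str]) -> str:
    """Return the action for the block's STATUS; reject unknown values."""
    status = fields.get("STATUS", "").strip()
    if status not in ROUTES:
        raise ValueError(f"unexpected STATUS: {status!r}")
    return ROUTES[status]
```

Rejecting unknown STATUS values keeps a malformed report from being silently treated as success.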
Built-in templates: installer via scheduled task, task poll, config file read, network connectivity check.
Full reference: docs/skills/vm-ground-control.md
vm-radio-control — Progress Observer
Check what a running pilot is doing without interrupting it. Dispatches vm-pilot-inspector as a foreground Task that reads the pilot's transcript and queries VM state independently.

    Task(
        description="Check installer progress",
        subagent_type="vm-pilot-inspector",
        prompt="""
        output_type: progress
        pilot_agent_id: <agent-id-from-ground-control>
        vm_name: <vm-name>
        project_path: /absolute/path/to/project
        """,
        run_in_background=False,
    )

Output types:
| output_type | Collects | Use when |
|---|---|---|
| quick | STATUS + SCREEN_STATE only | Fast pulse check, context is tight |
| progress | Full 6-step report (default) | Normal progress check |
| screenshot | Full report + UI element coordinates | Need to verify exact screen state |
| transcript | Full report + last 10 pilot turns | Pilot appears stuck |
Structured report fields: STATUS, TASK_STATE, LOG_TAIL, PATH_EXISTS, SCREEN_STATE, PILOT_PROGRESS, ELAPSED_ESTIMATE, ISSUES
Route on STATUS only — not on SCREEN_STATE or PILOT_PROGRESS.
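The inspector prompt is a short key/value block, so it can be generated and validated before dispatch. The helper below is hypothetical: the four field names and the four output types come from the example and table above, while the validation rules are assumptions of this sketch.

```python
from pathlib import PurePosixPath

OUTPUT_TYPES = ("quick", "progress", "screenshot", "transcript")


def build_inspector_prompt(
    pilot_agent_id: str,
    vm_name: str,
    project_path: str,
    output_type: str = "progress",
) -> str:
    """Build the prompt body for a vm-pilot-inspector Task (hypothetical helper)."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"output_type must be one of {OUTPUT_TYPES}")
    # The documented example uses an absolute path, so relative paths are rejected.
    if not PurePosixPath(project_path).is_absolute():
        raise ValueError("project_path must be absolute")
    return "\n".join(
        [
            f"output_type: {output_type}",
            f"pilot_agent_id: {pilot_agent_id}",
            f"vm_name: {vm_name}",
            f"project_path: {project_path}",
        ]
    )
```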
Full reference: docs/skills/vm-radio-control.md
vm-blackbox-record — Session Recording
Record VM screen sessions as WebM/VP8 video and extract frames at specific timestamps. Recording runs entirely on the host via VBoxManage — no guest changes required.

    # Start recording
    uv run skills/vm-blackbox-record/scripts/vm_capture.py record start "my-vm"
    # → Recording: scratch/recordings/my-vm-20260305-143022-screen0.webm

    # Run your operation (VM is live while recording)

    # Stop recording
    uv run skills/vm-blackbox-record/scripts/vm_capture.py record stop "my-vm"

    # Extract frames for inspection
    uv run skills/vm-blackbox-record/scripts/extract_frames.py \
        scratch/recordings/my-vm-20260305-143022-screen0.webm \
        --interval 30 \
        --outdir /tmp/frames

Frame extraction uses PyAV directly — no ffmpeg binary required.
Recording parameters are locked once recording starts (VirtualBox 7.1 constraint). Configure resolution, bitrate, and frame rate before enabling. vm_capture.py handles the correct sequence automatically.
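The recording path in the example appears to encode the VM name, a start timestamp, and a screen index. The sketch below parses that shape; the pattern is inferred from the single example filename above and is an assumption, not a documented naming contract.

```python
import re
from datetime import datetime
from pathlib import PurePath
from typing import NamedTuple


class RecordingName(NamedTuple):
    vm_name: str
    started: datetime
    screen: int


# Assumed shape: <vm>-YYYYMMDD-HHMMSS-screen<N>.webm (inferred from one example).
_PATTERN = re.compile(
    r"^(?P<vm>.+)-(?P<date>\d{8})-(?P<time>\d{6})-screen(?P<screen>\d+)\.webm$"
)


def parse_recording_name(path: str) -> RecordingName:
    """Extract VM name, start time, and screen index from a recording path."""
    name = PurePath(path).name
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised recording name: {name!r}")
    started = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return RecordingName(match["vm"], started, int(match["screen"]))
```

A parsed start time makes it possible to map a wall-clock event to an offset inside the video before extracting a frame.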
Backend table:
| Backend | Status | Notes |
|---|---|---|
| vboxmanage | Ready | VirtualBox 7.1+. Default. |
| ffmpeg | Planned | v4l2/X11 capture |
| mcp | Planned | MCP screenshot sequences assembled into video |
| winrm | Planned | PowerShell-based capture over WinRM |
Full reference: docs/skills/vm-blackbox-record.md
Agents
vm-pilot
Hands-and-eyes agent. Takes screenshots, runs PowerShell via WinRM, sends keystrokes, and returns structured results. Dispatched by vm-ground-control.
Exactly five tools:
| Tool | Does |
|---|---|
| vm_screenshot | Capture screen, return base64_png + saved_to |
| vm_powershell | Run PowerShell; return stdout/stderr/exit_code |
| vm_type | Type text (256-char limit per call) |
| vm_key | Send enter/tab/escape/space/backspace |
| vm_info | Return VM hardware metadata and state |
The pilot acts and observes — it never analyses, plans, or recommends. When it cannot proceed, it populates ISSUES in the return block and returns STATUS: PARTIAL or STATUS: FAILED. The orchestrator decides next steps.
Full reference: docs/agents/vm-pilot.md
vm-pilot-inspector
Observer agent. Reads the pilot's transcript, queries VM state via WinRM, takes a screenshot or extracts a recording frame, and returns one structured report. Dispatched by vm-radio-control.
Six-step workflow:
1. Read pilot transcript (~/.claude/projects/<encoded_path>/<agent-id>.jsonl)
2. Check process or scheduled task state via WinRM
3. Tail the 3 most recent log files (last 10 lines each)
4. Check sentinel path existence
5. Take a live screenshot or extract a recording frame
6. Compose and return the structured report
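The log-tailing step (3 most recent files, last 10 lines each) is simple selection logic. The inspector performs it on the guest via WinRM; the sketch below expresses the same selection against a local directory purely for illustration, and the *.log glob pattern is an assumption.

```python
from pathlib import Path


def tail_recent_logs(
    log_dir: str,
    max_files: int = 3,
    max_lines: int = 10,
    pattern: str = "*.log",  # assumed pattern; the source does not name one
) -> dict[str, list[str]]:
    """Return the last max_lines lines of the max_files most recently modified logs."""
    files = sorted(
        Path(log_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    result: dict[str, list[str]] = {}
    for path in files[:max_files]:
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        result[path.name] = lines[-max_lines:]
    return result
```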
Never takes control actions. No vm_type, no vm_key, no process invocation.
Full reference: docs/agents/vm-pilot-inspector.md
MCP Tools
The vm-blackbox MCP server exposes the 25 tools listed below, grouped into six domains.
VM Inspection (4 tools)
| Tool | Description |
|---|---|
| vm_list | List all VMs with running state |
| vm_info | Return memory, CPU, VRAM, Guest Additions, state |
| vm_screenshot | Capture screen; image embedded inline in response |
| vm_screenshot_api | Capture via vboxapi (no subprocess, local only) |
VM Interaction (4 tools)
| Tool | Description |
|---|---|
| vm_powershell | Run PowerShell via WinRM; SSH tunnel for remotes |
| vm_type | Type text; 256-char limit; handles chunking |
| vm_key | Send enter/tab/escape/space/backspace |
| vm_mouse_click | Click at absolute coordinates via vboxapi (local only) |
Vagrant (5 tools)
| Tool | Description |
|---|---|
| vagrant_status | Show Vagrantfile VM states |
| vagrant_up | Start a VM |
| vagrant_provision | Run provisioners |
| vagrant_destroy | Destroy a VM (optionally in tmux) |
| vagrant_winrm | Run a command via WinRM on a Windows Vagrant VM |
Build Orchestration (3 tools)
| Tool | Description |
|---|---|
| build_start | Start a background build (Packer, tmux session) |
| build_watch | Tail build log until pattern matches or timeout |
| build_status | Check if build is running; return last N log lines |
CI Tools (4 tools)
| Tool | Description |
|---|---|
| ci_check | SSH connectivity + host stats |
| ci_run | Run a shell command on the CI host |
| ci_pipeline_status | Get GitLab pipeline status for a project |
| ci_preflight | Verify required tools are installed on CI host |
Podman / Containers (5 tools)
| Tool | Description |
|---|---|
| podman_ps | List containers |
| podman_exec | Run a command inside a container |
| podman_logs | Fetch last N log lines |
| podman_restart | Restart a container |
| podman_service_status | Check systemd service status for a Podman service |
Target Parameter
All tools accept a target parameter:
| Value | Connects to |
|---|---|
| "local" | Local host (default) |
| "ci" | Named target from server config |
| "user@host" | Raw SSH string |
| "user@host:port" | Raw SSH string with explicit port |
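The four target forms can be told apart with plain string handling. The sketch below is one possible classification, not the server's parser: the Target dataclass and the rule "anything without @ that is not local is a named target" are assumptions, and IPv6 literals are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    kind: str  # "local", "named", or "ssh"
    name: Optional[str] = None
    user: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None  # None means no explicit port was given


def parse_target(value: str = "local") -> Target:
    """Classify a target string into local, named, or raw SSH form."""
    if value == "local":
        return Target(kind="local")
    if "@" not in value:
        # Assumed rule: a bare word refers to a named target in server config.
        return Target(kind="named", name=value)
    user, _, rest = value.partition("@")
    host, sep, port = rest.rpartition(":")
    if sep:  # "user@host:port"
        return Target(kind="ssh", user=user, host=host, port=int(port))
    return Target(kind="ssh", user=user, host=rest)
```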
Full signatures: docs/mcp-tools.md
Skill-to-Task Decision Guide
| You want to... | Use |
|---|---|
| Click a button / type text / read the screen | vm-vision-control |
| Run a multi-step operation (>30 seconds) | vm-ground-control → dispatches vm-pilot |
| Check on a running background task | vm-radio-control → dispatches vm-pilot-inspector |
| Continue a completed pilot with more work | vm-ground-control with resume=agent_id |
| Record an operation as video | vm-blackbox-record |
| Extract frames from a recording | vm-blackbox-record extract_frames.py |
| Get the current screen without interrupting pilot | vm-radio-control with output_type: screenshot |
Hard Constraints
These are not suggestions — they are required by the plugin architecture:
- vm-vision-control is mandatory before any GUI interaction. Do not call vm_screenshot, vm_mouse_click, vm_type, or vm_key directly from the orchestrator.
- MCP is the only approved path. Do not use raw Bash for vagrant, VBoxManage, podman, or WinRM; always go through mcp__vm-blackbox__* tools.
- The orchestrator does not call VM tools while a pilot is running. The pilot owns the VM. Interrupt only by resuming the agent.
- Skills are scoped. Each skill has a single responsibility. Do not combine vision-control and recording in one invocation.
Conventions
- vm_type has a hard 256-character limit per call. Chunk long text across multiple calls.
- Password fields may double-type. Clear with Ctrl+A → backspace before typing.
- vm_mouse_click and vm_screenshot_api require target="local" (vboxapi uses local XPCOM only).
- Recording parameters lock when recording is enabled: configure before starting, not after.
- The pilot's transcript lives at ~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl.
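Chunking text for the 256-character vm_type limit can be done with fixed-size slices. A minimal sketch: chunk_text is a hypothetical helper, and each returned chunk would be sent as one vm_type call.

```python
def chunk_text(text: str, limit: int = 256) -> list[str]:
    """Split text into consecutive pieces of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [text[i : i + limit] for i in range(0, len(text), limit)]
```

Slicing by character count preserves the text exactly, so joining the chunks in order reproduces the original string.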
Installer Reference
vm-blackbox-installer installs skills, agents, and MCP server registration across AI coding tools in one command.
Supported Platforms
| Flag | Tool | Config directory |
|---|---|---|
| --claude | Claude Code | ~/.claude/ or ./.claude/ |
| --opencode | OpenCode | ~/.config/opencode/ or ./.opencode/ |
| --gemini | Gemini CLI | ~/.gemini/ or ./.gemini/ |
| --codex | Codex | ~/.codex/ or ./.codex/ |
Usage

    # Install for all platforms globally
    uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

    # Install for specific platforms
    uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

    # Install locally (current directory instead of home)
    uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --local

--global and --local are mutually exclusive. --global is the default when neither is specified.
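The flag behaviour described above (per-platform switches, --all, and a mutually exclusive --global/--local pair defaulting to global) maps naturally onto argparse. The following is a sketch of that behaviour, not the installer's source; build_parser and selected_platforms are hypothetical names.

```python
import argparse

PLATFORMS = ("claude", "opencode", "gemini", "codex")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vm-blackbox-installer")
    for name in PLATFORMS:
        parser.add_argument(f"--{name}", action="store_true")
    parser.add_argument("--all", action="store_true")
    # --global and --local write to the same destination and cannot be combined.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--global", dest="scope", action="store_const", const="global")
    scope.add_argument("--local", dest="scope", action="store_const", const="local")
    parser.set_defaults(scope="global")  # default when neither flag is given
    return parser


def selected_platforms(args: argparse.Namespace) -> list[str]:
    """Expand --all, otherwise return the platforms whose flags were set."""
    if args.all:
        return list(PLATFORMS)
    return [name for name in PLATFORMS if getattr(args, name)]
```

With this setup, passing both --global and --local makes argparse exit with a usage error instead of silently preferring one.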
What the Installer Does
For each selected platform:
- Copies skills/ → <config>/plugins/vm-flightsimulator/skills/
- Copies agents/ → <config>/plugins/vm-flightsimulator/agents/
- Registers { "command": "uvx", "args": ["mcp-vm-blackbox"] } in the platform's MCP config file
MCP config files written:
| Platform | Config file |
|---|---|
| Claude | ~/.claude.json (global) or .claude.json (local) |
| OpenCode | ~/.config/opencode/opencode.json |
| Gemini | ~/.gemini/settings.json |
| Codex | ~/.codex/config.toml |
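For the JSON-based config files, registration amounts to merging one entry into an existing document without disturbing other settings. The sketch below illustrates that merge. The top-level key mcpServers and the server name vm-blackbox are assumptions (each platform may use a different key or shape), and the TOML file used by Codex is not covered.

```python
import json
from pathlib import Path

# The entry the installer is documented to register.
SERVER_ENTRY = {"command": "uvx", "args": ["mcp-vm-blackbox"]}


def register_server(
    config_path: str,
    name: str = "vm-blackbox",  # assumed server name
    key: str = "mcpServers",    # assumed top-level key; varies by platform
) -> dict:
    """Merge the MCP server entry into a JSON config, keeping existing keys."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault(key, {})[name] = dict(SERVER_ENTRY)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Running the function twice leaves the file unchanged after the first call, which keeps repeated installs safe.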
Gemini frontmatter transformation: The installer converts agent .md frontmatter for Gemini's schema — removes the color: field and converts comma-separated tools: and skills: strings to YAML list format.
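The frontmatter transformation can be illustrated with a line-based rewrite. This sketch assumes flat, single-line key: value frontmatter and does not use a YAML parser; the installer's actual implementation may differ, and the field values in any example input are hypothetical.

```python
def transform_frontmatter(frontmatter: str) -> str:
    """Drop `color:` and expand comma-separated `tools:`/`skills:` into YAML lists."""
    out: list[str] = []
    for line in frontmatter.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key == "color":
            continue  # field removed for the target schema
        if sep and key in ("tools", "skills") and value.strip():
            out.append(f"{key}:")
            out.extend(
                f"  - {item.strip()}" for item in value.split(",") if item.strip()
            )
        else:
            out.append(line)
    return "\n".join(out)
```

For example, a line such as tools: vm_screenshot, vm_type becomes a tools: key followed by one indented list item per tool.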
Local Development

    # Install dependencies
    uv sync

    # Run tests
    uv run pytest

    # Run a single test file
    uv run pytest packages/mcp_vm_blackbox/tests/test_vm_interaction.py

    # Format
    uv run ruff format

    # Lint
    uv run ruff check

    # Type check
    uv run ty check packages/

    # Test the plugin locally in Claude Code
    claude --plugin-dir ./

Coverage threshold: 60%. Modules requiring live VMs (WinRM, SSH tunnel, VBoxManage backends) are excluded from CI coverage.
Installation Reference

    # Marketplace
    claude plugin marketplace add bitflight-devops/vm-flightsimulator

    # MCP server only (PyPI)
    uvx mcp-vm-blackbox

    # Persistent MCP registration
    claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

License
MIT — see LICENSE