MCP server for controlling VirtualBox VMs — screenshots, keyboard input, PowerShell, vagrant, WinRM, podman, and CI build pipelines
vm-flightsimulator
A Claude Code plugin that gives AI agents the ability to see and act on real virtual machines, turning them into a unified machine control substrate for validating software across complete workflows.
The plugin provides a collaborative testing partner: describe what you want to test, and AI-driven pilots autonomously work through it with full observability. A real-time dashboard lets you watch progress and steer agents mid-flight.
What This Is
Not a specific-scenario tool. A toolkit for long-running, AI-driven validation work on any virtual machine.
You ask: "Install this app and verify it works end-to-end." The plugin provides the control surface (screenshot, keyboard, mouse, PowerShell, WinRM). Pilots fly the VM, inspectors analyze what happened, and a distributed coordinator keeps everything moving without blocking.
Currently targets VirtualBox + Vagrant. Designed for expansion to AWS, OpenTofu, Cloudflare Workers, bare metal, and beyond.
The Actor Model
Understanding who does what:
Pilots
AI agents that act on VMs. They see (screenshots), type (keyboard/mouse), run commands (PowerShell/SSH), and report outcomes. They follow a tight declare-execute-reflect loop: log intent → perform action → log outcome. Pilots are autonomous and do not return to the orchestrator after every action.
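The declare-execute-reflect loop can be pictured as a small wrapper around each action. The sketch below is illustrative only: `run_step`, the JSON-lines log layout, and the field names are assumptions made for this example, not the plugin's actual step-log format.

```python
import json
import time
from pathlib import Path
from typing import Callable


def run_step(log_path: Path, intent: str, action: Callable[[], str]) -> str:
    """Log the intent, perform the action, then log the outcome."""

    def append(record: dict) -> None:
        # One JSON object per line so the log can be read incrementally.
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    # Declare: the intent is written before anything happens.
    append({"ts": time.time(), "phase": "declare", "intent": intent})
    try:
        # Execute: the action itself (hypothetical callable).
        outcome = action()
    except Exception as exc:
        # Reflect on failure: the error is recorded, then re-raised.
        append({"ts": time.time(), "phase": "reflect", "intent": intent,
                "ok": False, "outcome": repr(exc)})
        raise
    # Reflect on success.
    append({"ts": time.time(), "phase": "reflect", "intent": intent,
            "ok": True, "outcome": outcome})
    return outcome
```

Because the intent is written before the action runs, a reader of the log can tell what was being attempted even if the action never completed.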
Inspectors
AI agents that analyze without touching. Dispatched on-demand to read pilot transcripts, cross-reference recordings, detect stalls, and produce classified reports. Can work even when the pilot is gone — they reconstruct what happened from durable evidence. Enable scheduled monitoring and post-hoc analysis.
Mechanics
Handle the infrastructure layer: cross-VM networking, database setup, VM snapshots, Vagrant provisioning, port forwarding. Run as loadable skills (for quick infra tasks) or dedicated agents (for complex multi-VM setup). Set up the stage; pilots perform on it; inspectors review the performance.
Ground Control
The orchestrator's coordination skill. Dispatches pilots in the background, monitors progress via inspectors, and never blocks on any single task. Routes work, handles steering, schedules follow-ups.
Dashboard
Real-time UI showing fleet status, job timelines, screenshots, and video playback. Two-way channel: operators watch pilots and can send steering prompts mid-flight. TODO list rendered live — pilots and engineers both add work as needed.
Core Principles
The harness enables, it doesn't do. It provides generic control (keyboard/mouse), coordination (job tracking, scheduling), and observability (recordings, dashboards, step logs). The AI figures out the task-specific steps.
Orchestrator is never blocked. Pilots run asynchronously; communication is file- and database-mediated. When a pilot is stuck, the inspector detects it and reports without interrupting the pilot.
Progressive disclosure. The orchestrator knows WHERE things are (file paths, job IDs), not WHAT they contain (full logs, recordings). Reads only what it needs to make routing decisions. Raw evidence flows through inspectors.
Hook-based steering. Pilots check for incoming messages via a PreToolUse hook on every tool call. Messages from the orchestrator or dashboard inject course corrections in real time. The pilot does not poll — the hook fires automatically.
Onboarding Experience
The plugin opens with a conversation:
"What VM scenarios will we be piloting in this simulator? Let's chat. Tell me what you have already, your ideas, your constraints and goals, and we will plan out the testing system for it."
The AI listens, discovers what infrastructure you have, helps you plan the environment (loading reference skills for Vagrant, Packer, Docker, PostgreSQL, Windows Server, etc.), and then builds it. You never write a Vagrantfile or answer a configuration form.
Current Capabilities
VirtualBox & Vagrant
- VM lifecycle (start, stop, snapshot, restore)
- Provisioning and configuration
- Cross-VM networking
Windows & Linux Interaction
- GUI automation (screenshot, keyboard, mouse) via vision-control
- PowerShell scripts and commands via WinRM
- SSH / Bash on Linux VMs
- Hardware control (keyboard combos, clipboard)
Observation & Coordination
- Session recording (WebM/VP8) via VBoxManage
- Frame extraction at specific timestamps
- Job tracking with step-level audit logs
- Issue register for classified problems
- Real-time dashboard with live video playback
Fleet Management
- Multi-VM coordination
- Job store (durable across sessions)
- Destroy guards (prevent accidents)
- Scheduled task support for unattended operation
Future Direction
The unified control model stays the same; the adapters grow:
- OpenTofu — Infrastructure as code for cloud resources
- AWS — EC2 instances, service discovery, networking
- Cloudflare Workers — Edge compute validation
- Bare metal — Physical machine provisioning and testing
One set of pilot/mechanic/inspector/dashboard code. Different adapters underneath. The plugin becomes a true unified machine control substrate.
Quick Start
1. Install the MCP server
# Run on demand
uvx mcp-vm-blackbox
# Or add persistently to Claude Code
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox
2. Install the plugin
Option A — Marketplace (Claude Code only):
claude plugin marketplace add bitflight-devops/vm-flightsimulator
Then open /plugins and install vm-flightsimulator.
Option B — vm-blackbox-installer (all platforms):
# Install for all platforms globally
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global
# Or pick specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global
# Or install locally to the current project directory
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --local
The installer copies skills and agents to your plugin directory and registers the MCP server.
3. Start collaborating
"I need a two-VM setup with PostgreSQL on Ubuntu and a Java webapp on Windows. Let's plan it."
"Install this app on my-vm and record the full process."
"Check on the installer progress — is it done yet?"
"What went wrong in the last run? Give me a timeline."
The plugin automatically loads the right skill and agent for the task.
Configuration
Paths, durable job state, and optional overrides are controlled by environment variables (for example where .mcp-vm-blackbox resolves on disk). See docs/configuration.md for the full list, data-root resolution order, and troubleshooting notes for operators and AI-assisted debugging.
Prerequisites
| Requirement | Version |
|---|---|
| Python | 3.11+ |
| uv | latest |
| VirtualBox | 7.1+ |
| Vagrant | 2.3+ |
| Packer (for VM builds) | 1.10+ |
| tmux (for detached builds) | any |
| WinRM on guest | for Windows VMs |
Architecture
Three-layer cooperative design:
┌─────────────────────────────────────────────────────────┐
│ Skills │
│ vm-vision-control vm-ground-control vm-radio-control │
│ vm-blackbox-record │
│ (define approved loops and tooling) │
└────────────────────┬────────────────────────────────────┘
│ dispatches
┌────────────────────▼────────────────────────────────────┐
│ Agents │
│ vm-pilot vm-pilot-inspector vm-mechanic │
│ (acts) (observes + reports) (infrastructure) │
└────────────────────┬────────────────────────────────────┘
│ calls
┌────────────────────▼────────────────────────────────────┐
│ MCP Server │
│ vm_screenshot vm_powershell vm_type vm_key │
│ vm_mouse_click vagrant_* ci_* podman_* build_* │
│ (executes against real infrastructure) │
└─────────────────────────────────────────────────────────┘
Skills define the approved loop. They do not take actions.
Agents take actions. Pilots drive VMs. Inspectors read state. Mechanics set up infrastructure.
MCP server executes tool calls against VirtualBox, Vagrant, WinRM, SSH, CI hosts, and container runtimes.
Skills
vm-vision-control — GUI Interaction
The mandatory entry point for any task that touches a VM's desktop. The loop is strict:
1. Screenshot → vm_screenshot
2. Read image → Read tool on the saved_to path
3. Decide → Analyse screen, determine next action
4. Act → vm_mouse_click / vm_type / vm_key / vm_powershell
5. Repeat → Return to step 1
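The five steps above can be sketched as a loop over injected callables. Everything here is a stand-in: `vision_loop`, the `Action` dictionary shape, and the `max_steps` cap are assumptions for illustration, and the callables would wrap the real tools (vm_screenshot, the Read tool, and so on) in practice.

```python
from typing import Callable, Optional

# Hypothetical action shape, for example {"tool": "vm_key"}.
Action = dict


def vision_loop(
    screenshot: Callable[[], str],
    read_image: Callable[[str], str],
    decide: Callable[[str], Optional[Action]],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Run screenshot -> read -> decide -> act until decide returns None."""
    steps = 0
    for _ in range(max_steps):
        saved_to = screenshot()        # 1. capture the screen, get a path
        screen = read_image(saved_to)  # 2. read the saved image
        action = decide(screen)        # 3. analyse and pick the next action
        if action is None:             # nothing left to do
            break
        act(action)                    # 4. perform it
        steps += 1                     # 5. loop back to a fresh screenshot
    return steps
```

The point of the structure is that no action is taken without a fresh screenshot immediately before it.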
Natural language triggers: "click on the VM", "type into the VM", "what's on the screen", "navigate the installer"
Timing between steps:
| Operation | Wait |
|---|---|
| Click a button | 0.5 – 1 s |
| Open an application | 3 – 5 s |
| Launch an installer | 10 – 15 s |
| Installer panel transition | 2 – 3 s |
| Installer completion | 30 – 60 s |
| VM boot | 60 – 120 s |
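The timing table can be carried into code as a lookup. The values below are copied from the table; the key names and the `wait_for` helper are made up for this sketch.

```python
# (low, high) wait in seconds for each operation in the timing table.
WAIT_SECONDS = {
    "click_button": (0.5, 1.0),
    "open_application": (3.0, 5.0),
    "launch_installer": (10.0, 15.0),
    "installer_panel_transition": (2.0, 3.0),
    "installer_completion": (30.0, 60.0),
    "vm_boot": (60.0, 120.0),
}


def wait_for(operation: str, conservative: bool = True) -> float:
    """Return how long to wait before the next screenshot.

    conservative=True picks the upper bound of the range, which trades
    speed for fewer screenshots of a half-rendered screen.
    """
    low, high = WAIT_SECONDS[operation]
    return high if conservative else low
```

A caller would typically pass the result to `time.sleep` between the act and screenshot steps.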
Full reference: docs/skills/vm-vision-control.md
vm-ground-control — Orchestration
Use for any operation taking more than ~30 seconds. Dispatches vm-pilot in the background and returns a structured block to parse.
agent_id = Task(
description="Run the installer",
subagent_type="vm-pilot",
prompt="""
GOAL: Run silent installer and report success.
STEPS:
1. Invoke vm-vision-control skill.
2. Test-NetConnection <HOST> -Port <PORT>
3. Register-ScheduledTask ...
4. Poll every 30s until State = Ready or 15 min elapsed
5. Read install log; take screenshot.
RETURN FORMAT:
JOB_ID: <uuid>
STATUS: SUCCESS | FAILED | BLOCKED | IN_PROGRESS
SCOPE: <vm_name> / <task_description>
OUTCOME: <2-4 sentences>
ISSUES: <count> (<classifications>) | none
BLOCKED_BY: <description> | —
DETAIL:
steps: <path_to_step_log>
issues: <path_to_issue_register> | none
recording: <path_to_recording> | none
screenshots: <glob_pattern> | none
video: <true | false>
pilot.screen_state: <description>
pilot.files_read: <comma-separated filenames>
NEXT: <recommended action>
""",
run_in_background=True,
)
Store the agent ID — you need it to check progress and resume.
Route on STATUS:
| STATUS | Action |
|---|---|
| SUCCESS | Proceed |
| FAILED | Check ISSUES, fix and re-dispatch |
| BLOCKED | Read BLOCKED_BY, resolve, re-dispatch |
| IN_PROGRESS | Wait and re-check via radio-control |
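Routing on STATUS can be sketched as a small parser plus a lookup. This assumes each field of the return block sits on a single `KEY: value` line; `parse_report`, `route`, and the `ROUTES` table are hypothetical helpers, not part of the plugin, and multi-line OUTCOME text is not handled.

```python
import re

# Matches lines such as "STATUS: BLOCKED" or "  steps: /path/to/log".
_FIELD = re.compile(r"^\s*([A-Za-z_.]+):\s*(.*)$")

# Next action for each STATUS value, taken from the routing table.
ROUTES = {
    "SUCCESS": "proceed",
    "FAILED": "check ISSUES, fix and re-dispatch",
    "BLOCKED": "read BLOCKED_BY, resolve, re-dispatch",
    "IN_PROGRESS": "wait and re-check via radio-control",
}


def parse_report(text: str) -> dict[str, str]:
    """Collect KEY: value lines from a pilot's structured return block."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def route(report: dict[str, str]) -> str:
    """Map the report's STATUS to the orchestrator's next action."""
    status = report.get("STATUS", "")
    if status not in ROUTES:
        raise ValueError(f"unrecognised STATUS: {status!r}")
    return ROUTES[status]
```

Raising on an unknown STATUS keeps a malformed report from being silently treated as success.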
Full reference: docs/skills/vm-ground-control.md
vm-radio-control — Progress Observation
Check a running pilot without interrupting it. Dispatches vm-pilot-inspector to read transcripts and query VM state.
Task(
description="Check installer progress",
subagent_type="vm-pilot-inspector",
prompt="""
output_type: progress
pilot_agent_id: <agent-id-from-ground-control>
vm_name: <vm-name>
project_path: /absolute/path/to/project
""",
run_in_background=False,
)
Output types:
| Type | Returns | Use when |
|---|---|---|
| quick | JOB_ID + STATUS + SCOPE + OUTCOME | Fast pulse, context is tight |
| progress | Full canonical template | Normal progress check (default) |
| screenshot | Report + UI coordinates | Need exact screen state |
| transcript | Report + last 10 turns | Pilot appears stuck |
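For the transcript output type, reading the tail of a pilot transcript might look like the sketch below. It assumes the .jsonl transcript holds one JSON object per line and treats each line as one turn; both are assumptions for illustration, and `last_turns` is a hypothetical helper rather than the inspector's actual code.

```python
import json
from collections import deque
from pathlib import Path


def last_turns(transcript: Path, n: int = 10) -> list[dict]:
    """Return the last n non-empty JSON lines of a transcript file."""
    with transcript.open(encoding="utf-8") as fh:
        # deque with maxlen keeps only the tail while streaming the file,
        # so a long transcript is never held in memory all at once.
        tail = deque((line for line in fh if line.strip()), maxlen=n)
    return [json.loads(line) for line in tail]
```

Streaming the file this way matters for long-running pilots, whose transcripts can grow large.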
Full reference: docs/skills/vm-radio-control.md
vm-blackbox-record — Session Recording
Record VM screen as WebM/VP8 video and extract frames.
# Start recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record start "my-vm"
# Run your operation (VM is live)
# Stop recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record stop "my-vm"
# Extract frames — MCP only: vm_extract_frames (see docs/skills/vm-blackbox-record.md)
Recording runs on the host via VirtualBox — no guest changes needed.
Full reference: docs/skills/vm-blackbox-record.md
Agents
vm-pilot
Hands-and-eyes agent. Takes screenshots, runs PowerShell, sends keystrokes. Returns structured results.
Five tools:
| Tool | Does |
|---|---|
| vm_screenshot | Capture screen, return PNG + path |
| vm_powershell | Run PowerShell; return output |
| vm_type | Type text (256 char limit) |
| vm_key | Send enter/tab/escape/space |
| vm_info | Return VM hardware + state |
Follows the declare-execute-reflect loop. When blocked, it populates ISSUES and BLOCKED_BY and returns STATUS: BLOCKED or FAILED.
Full reference: docs/agents/vm-pilot.md
vm-pilot-inspector
Observer agent. Reads pilot transcript, queries VM state, takes screenshots, analyzes recordings. Returns structured reports.
Never takes control actions. No typing, no key sends, no process invocation.
Full reference: docs/agents/vm-pilot-inspector.md
MCP Tools (30 total)
VM Inspection (4 tools)
vm_list, vm_info, vm_screenshot, vm_screenshot_api
VM Interaction (5 tools)
vm_powershell, vm_type, vm_key, vm_key_combo, vm_mouse_click
VM Lifecycle (3 tools)
vm_start, vm_stop, vm_wait_ready
Vagrant (5 tools)
vagrant_status, vagrant_up, vagrant_provision, vagrant_destroy, vagrant_winrm
Build Orchestration (3 tools)
build_start, build_watch, build_status
CI Tools (4 tools)
ci_check, ci_run, ci_pipeline_status, ci_preflight
Podman / Containers (4 tools)
podman_ps, podman_exec, podman_logs, podman_service_status
KVM Tools (2 tools)
kvm_unload, kvm_reload
Full signatures: docs/mcp-tools.md
Hard Constraints
These are required by the plugin architecture:
- vm-vision-control is mandatory before any GUI interaction. Do not call vm_screenshot, vm_mouse_click, vm_type, or vm_key directly from the orchestrator.
- MCP is the only approved path. Do not use raw Bash for vagrant, VBoxManage, podman, or WinRM; always go through mcp__plugin_vm-flightsimulator_vm-blackbox__* tools.
- The orchestrator does not call VM tools while a pilot is running. The pilot owns the VM. Interrupt only by resuming the agent.
- Skills are scoped. Each skill has a single responsibility. Do not combine vision-control and recording in one invocation.
Conventions
- vm_type has a 256-character limit per call. Chunk long text across multiple calls.
- Password fields may double-type. Clear with Ctrl+A → backspace before typing.
- vm_mouse_click and vm_screenshot_api require target="local" (vboxapi uses local XPCOM only).
- Recording parameters lock when enabled: configure before starting, not after.
- The pilot's transcript lives at ~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl.
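Chunking text for the 256-character vm_type limit can be done with a plain slice loop. `chunk_text` is a hypothetical helper written for this example; each returned piece would be sent in its own vm_type call.

```python
def chunk_text(text: str, limit: int = 256) -> list[str]:
    """Split text into consecutive pieces of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Slicing by a fixed stride preserves order and drops nothing,
    # so joining the pieces gives back the original text.
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```

The split is purely by length, so a chunk boundary can fall mid-word; that is harmless as long as the chunks are typed back to back into the same field.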
Skill-to-Task Decision Guide
| You want to... | Use |
|---|---|
| Click a button / type text / read the screen | vm-vision-control |
| Run a multi-step operation (>30 seconds) | vm-ground-control → dispatches vm-pilot |
| Check on a running background task | vm-radio-control → dispatches vm-pilot-inspector |
| Continue a completed pilot with more work | vm-ground-control with resume=agent_id |
| Record an operation as video | vm-blackbox-record |
| Extract frames from a recording | MCP vm_extract_frames |
| Get the current screen without interrupting pilot | vm-radio-control with output_type: screenshot |
Local Development
# Install dependencies
uv sync
# Run all tests
uv run pytest
# Format
uv run ruff format
# Lint
uv run ruff check
# Type check
uv run ty check packages/
# Test the plugin locally in Claude Code
claude --plugin-dir ./
Coverage threshold: 60%. Modules requiring live VMs (WinRM, SSH tunnel, VBoxManage) are excluded from CI coverage.
Installation Reference
# Marketplace
claude plugin marketplace add bitflight-devops/vm-flightsimulator
# MCP server only (PyPI)
uvx mcp-vm-blackbox
# Persistent MCP registration
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox
License
MIT — see LICENSE