MCP server for controlling VirtualBox VMs — screenshots, keyboard input, PowerShell, vagrant, WinRM, podman, and CI build pipelines

vm-flightsimulator

A Claude Code plugin that gives AI agents the ability to see and act on real virtual machines, turning them into a unified machine control substrate for validating software across complete workflows.

The plugin provides a collaborative testing partner: describe what you want to test, and AI-driven pilots autonomously work through it with full observability. A real-time dashboard lets you watch progress and steer agents mid-flight.


What This Is

Not a specific-scenario tool. A toolkit for long-running, AI-driven validation work on any virtual machine.

You ask: "Install this app and verify it works end-to-end." The plugin provides the control surface (screenshot, keyboard, mouse, PowerShell, WinRM). Pilots fly the VM, inspectors analyze what happened, and a distributed coordinator keeps everything moving without blocking.

Currently targets VirtualBox + Vagrant. Designed for expansion to AWS, OpenTofu, Cloudflare Workers, bare metal, and beyond.


The Actor Model

Understanding who does what:

Pilots

AI agents that act on VMs. They see (screenshots), type (keyboard/mouse), run commands (PowerShell/SSH), and report outcomes. They follow a tight declare-execute-reflect loop: log intent → perform action → log outcome. They are autonomous and do not return to the orchestrator after every action.
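To make the declare-execute-reflect loop concrete, here is a minimal sketch in Python. It is an illustration, not the plugin's implementation: `run_step`, the log entry fields, and the in-memory list used as a step log are all assumptions made for this example.

```python
import time
from typing import Any, Callable


def run_step(log: list[dict[str, Any]], intent: str, action: Callable[[], Any]) -> Any:
    """Run one declare-execute-reflect cycle, appending two entries to `log`."""
    # Declare: record what is about to happen before touching the VM.
    log.append({"phase": "intent", "text": intent, "ts": time.time()})
    try:
        # Execute: `action` is a hypothetical callable wrapping one VM operation.
        result = action()
    except Exception as exc:
        # Reflect: a failure is still an outcome and is logged before re-raising.
        log.append({"phase": "outcome", "ok": False, "text": repr(exc), "ts": time.time()})
        raise
    # Reflect: record what actually happened.
    log.append({"phase": "outcome", "ok": True, "text": repr(result), "ts": time.time()})
    return result


# Usage with a stand-in action instead of a real VM call.
step_log: list[dict[str, Any]] = []
run_step(step_log, "Check that the installer window is open", lambda: "window found")
for entry in step_log:
    print(entry["phase"], entry["text"])
```

Logging the intent before the action means an inspector reading the log can tell what a pilot was attempting even if the action never returned.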

Inspectors

AI agents that analyze without touching. They are dispatched on demand to read pilot transcripts, cross-reference recordings, detect stalls, and produce classified reports. They can work even when the pilot is gone, because they reconstruct what happened from durable evidence. This enables scheduled monitoring and post-hoc analysis.

Mechanics

Mechanics handle the infrastructure layer: cross-VM networking, database setup, VM snapshots, Vagrant provisioning, and port forwarding. They run as loadable skills (for quick infra tasks) or dedicated agents (for complex multi-VM setup). Mechanics set up the stage; pilots perform on it; inspectors review the performance.

Ground Control

The orchestrator's coordination skill. It dispatches pilots in the background, monitors progress via inspectors, and never blocks on any single task. It routes work, handles steering, and schedules follow-ups.

Dashboard

Real-time UI showing fleet status, job timelines, screenshots, and video playback. Two-way channel: operators watch pilots and can send steering prompts mid-flight. TODO list rendered live — pilots and engineers both add work as needed.


Core Principles

The harness enables, it doesn't do. It provides generic control (keyboard/mouse), coordination (job tracking, scheduling), and observability (recordings, dashboards, step logs). The AI figures out the task-specific steps.

Orchestrator is never blocked. Pilots run asynchronously; communication is file- and database-mediated. When a pilot is stuck, the inspector detects it and reports without interrupting the pilot.

Progressive disclosure. The orchestrator knows WHERE things are (file paths, job IDs), not WHAT they contain (full logs, recordings). Reads only what it needs to make routing decisions. Raw evidence flows through inspectors.

Hook-based steering. Pilots check for incoming messages via a PreToolUse hook on every tool call. Messages from the orchestrator or dashboard inject course corrections in real time. The pilot does not poll — the hook fires automatically.


Onboarding Experience

The plugin opens with a conversation:

"What VM scenarios will we be piloting in this simulator? Let's chat. Tell me what you have already, your ideas, your constraints and goals, and we will plan out the testing system for it."

The AI listens, discovers what infrastructure you have, helps you plan the environment (loading reference skills for Vagrant, Packer, Docker, PostgreSQL, Windows Server, etc.), and then builds it. You never write a Vagrantfile or answer a configuration form.


Current Capabilities

VirtualBox & Vagrant

  • VM lifecycle (start, stop, snapshot, restore)
  • Provisioning and configuration
  • Cross-VM networking

Windows & Linux Interaction

  • GUI automation (screenshot, keyboard, mouse) via vision-control
  • PowerShell scripts and commands via WinRM
  • SSH / Bash on Linux VMs
  • Hardware control (keyboard combos, clipboard)

Observation & Coordination

  • Session recording (WebM/VP8) via VBoxManage
  • Frame extraction at specific timestamps
  • Job tracking with step-level audit logs
  • Issue register for classified problems
  • Real-time dashboard with live video playback

Fleet Management

  • Multi-VM coordination
  • Job store (durable across sessions)
  • Destroy guards (prevent accidents)
  • Scheduled task support for unattended operation

Future Direction

The unified control model stays the same; the adapters grow:

  • OpenTofu — Infrastructure as code for cloud resources
  • AWS — EC2 instances, service discovery, networking
  • Cloudflare Workers — Edge compute validation
  • Bare metal — Physical machine provisioning and testing

One set of pilot/mechanic/inspector/dashboard code. Different adapters underneath. The plugin becomes a true unified machine control substrate.


Quick Start

1. Install the MCP server

# Run on demand
uvx mcp-vm-blackbox

# Or add persistently to Claude Code
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

2. Install the plugin

Option A — Marketplace (Claude Code only):

claude plugin marketplace add bitflight-devops/vm-flightsimulator

Then open /plugins and install vm-flightsimulator.

Option B — vm-blackbox-installer (all platforms):

# Install for all platforms globally
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

# Or pick specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

# Or install locally to the current project directory
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --local

The installer copies skills and agents to your plugin directory and registers the MCP server.

3. Start collaborating

"I need a two-VM setup with PostgreSQL on Ubuntu and a Java webapp on Windows. Let's plan it."
"Install this app on my-vm and record the full process."
"Check on the installer progress — is it done yet?"
"What went wrong in the last run? Give me a timeline."

The plugin automatically loads the right skill and agent for the task.


Configuration

Paths, durable job state, and optional overrides are controlled by environment variables (for example where .mcp-vm-blackbox resolves on disk). See docs/configuration.md for the full list, data-root resolution order, and troubleshooting notes for operators and AI-assisted debugging.


Prerequisites

Requirement                   Version
Python                        3.11+
uv                            latest
VirtualBox                    7.1+
Vagrant                       2.3+
Packer (for VM builds)        1.10+
tmux (for detached builds)    any
WinRM on guest                for Windows VMs

Architecture

Three-layer cooperative design:

┌─────────────────────────────────────────────────────────┐
│                      Skills                              │
│  vm-vision-control  vm-ground-control  vm-radio-control  │
│  vm-blackbox-record                                      │
│         (define approved loops and tooling)              │
└────────────────────┬────────────────────────────────────┘
                     │ dispatches
┌────────────────────▼────────────────────────────────────┐
│                      Agents                              │
│  vm-pilot     vm-pilot-inspector     vm-mechanic         │
│  (acts)       (observes + reports)   (infrastructure)    │
└────────────────────┬────────────────────────────────────┘
                     │ calls
┌────────────────────▼────────────────────────────────────┐
│                   MCP Server                             │
│   vm_screenshot  vm_powershell  vm_type  vm_key          │
│   vm_mouse_click  vagrant_*  ci_*  podman_*  build_*     │
│              (executes against real infrastructure)      │
└─────────────────────────────────────────────────────────┘

Skills define the approved loop. They do not take actions.

Agents take actions. Pilots drive VMs. Inspectors read state. Mechanics set up infrastructure.

MCP server executes tool calls against VirtualBox, Vagrant, WinRM, SSH, CI hosts, and container runtimes.


Skills

vm-vision-control — GUI Interaction

The mandatory entry point for any task that touches a VM's desktop. The loop is strict:

1. Screenshot     →  vm_screenshot
2. Read image     →  Read tool on the saved_to path
3. Decide         →  Analyse screen, determine next action
4. Act            →  vm_mouse_click / vm_type / vm_key / vm_powershell
5. Repeat         →  Return to step 1
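The five steps above can be sketched as a plain Python loop. This is an illustration under stated assumptions: the four callables stand in for the real tools (`vm_screenshot`, the Read tool, the agent's own reasoning, and the action tools), and the `max_iterations` bound is added here for safety and does not come from the source.

```python
from typing import Callable, Optional


def vision_loop(
    take_screenshot: Callable[[], str],
    read_image: Callable[[str], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_iterations: int = 20,
) -> int:
    """Run screenshot, read, decide, act until `decide` returns None.

    Returns the number of actions performed.
    """
    actions = 0
    for _ in range(max_iterations):
        path = take_screenshot()      # 1. Screenshot
        screen = read_image(path)     # 2. Read image from the saved path
        next_action = decide(screen)  # 3. Decide
        if next_action is None:       # Goal reached: stop looping
            return actions
        act(next_action)              # 4. Act
        actions += 1                  # 5. Repeat
    raise RuntimeError("vision loop did not finish within max_iterations")


# Usage with canned screen descriptions instead of a real VM.
screens = iter(["welcome page", "license page", "finished"])
performed: list[str] = []
count = vision_loop(
    take_screenshot=lambda: "/tmp/shot.png",
    read_image=lambda _path: next(screens),
    decide=lambda screen: None if screen == "finished" else f"click Next on {screen}",
    act=performed.append,
)
print(count, performed)
```

The point of the strict ordering is that every action is preceded by a fresh observation, so the agent never acts on a stale view of the screen.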

Natural language triggers: "click on the VM", "type into the VM", "what's on the screen", "navigate the installer"

Timing between steps:

Operation                     Wait
Click a button                0.5 – 1 s
Open an application           3 – 5 s
Launch an installer           10 – 15 s
Installer panel transition    2 – 3 s
Installer completion          30 – 60 s
VM boot                       60 – 120 s
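One way to apply the timing table above is a small lookup that sleeps for the relevant range. This is a sketch, not part of the plugin: the dictionary keys, `wait_for`, and the choice to default to the upper bound are assumptions made for illustration. Only the numeric ranges come from the table.

```python
import time
from typing import Callable

# Wait ranges in seconds (low, high), copied from the timing table.
WAIT_SECONDS: dict[str, tuple[float, float]] = {
    "click_button": (0.5, 1.0),
    "open_application": (3.0, 5.0),
    "launch_installer": (10.0, 15.0),
    "installer_panel_transition": (2.0, 3.0),
    "installer_completion": (30.0, 60.0),
    "vm_boot": (60.0, 120.0),
}


def wait_for(
    operation: str,
    conservative: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for the upper (or lower) bound of the range and return the seconds waited."""
    low, high = WAIT_SECONDS[operation]
    seconds = high if conservative else low
    sleep(seconds)
    return seconds


# Usage: inject a recorder instead of time.sleep so the example returns immediately.
recorded: list[float] = []
wait_for("click_button", sleep=recorded.append)
wait_for("vm_boot", conservative=False, sleep=recorded.append)
print(recorded)
```

Passing `sleep` as a parameter keeps the helper testable without real delays.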

Full reference: docs/skills/vm-vision-control.md


vm-ground-control — Orchestration

Use for any operation taking more than ~30 seconds. Dispatches vm-pilot in background and returns a structured block to parse.

agent_id = Task(
    description="Run the installer",
    subagent_type="vm-pilot",
    prompt="""
GOAL: Run silent installer and report success.

STEPS:
1. Invoke vm-vision-control skill.
2. Test-NetConnection <HOST> -Port <PORT>
3. Register-ScheduledTask ...
4. Poll every 30s until State = Ready or 15 min elapsed
5. Read install log; take screenshot.

RETURN FORMAT:
JOB_ID: <uuid>
STATUS: SUCCESS | FAILED | BLOCKED | IN_PROGRESS
SCOPE: <vm_name> / <task_description>
OUTCOME: <2-4 sentences>
ISSUES: <count> (<classifications>) | none
BLOCKED_BY: <description> | —
DETAIL:
  steps: <path_to_step_log>
  issues: <path_to_issue_register> | none
  recording: <path_to_recording> | none
  screenshots: <glob_pattern> | none
  video: <true | false>
  pilot.screen_state: <description>
  pilot.files_read: <comma-separated filenames>
NEXT: <recommended action>
""",
    run_in_background=True,
)

Store the agent ID — you need it to check progress and resume.

Route on STATUS:

STATUS         Action
SUCCESS        Proceed
FAILED         Check ISSUES, fix and re-dispatch
BLOCKED        Read BLOCKED_BY, resolve, re-dispatch
IN_PROGRESS    Wait and re-check via radio-control
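Routing on STATUS amounts to parsing the top-level `KEY: value` lines of the returned block and looking up the action. The sketch below assumes the block arrives as plain text in the RETURN FORMAT shown earlier; `parse_report`, `route`, and the sample values (job ID, paths) are placeholders invented for this example.

```python
def parse_report(text: str) -> dict[str, str]:
    """Parse the top-level `KEY: value` lines of a pilot report.

    Indented lines (the DETAIL sub-fields) are skipped in this sketch.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Actions copied from the STATUS table.
ROUTES = {
    "SUCCESS": "proceed",
    "FAILED": "check ISSUES, fix and re-dispatch",
    "BLOCKED": "read BLOCKED_BY, resolve, re-dispatch",
    "IN_PROGRESS": "wait and re-check via radio-control",
}


def route(report: str) -> str:
    """Return the next action for a report, failing loudly on an unknown STATUS."""
    status = parse_report(report).get("STATUS", "")
    if status not in ROUTES:
        raise ValueError(f"unrecognised STATUS: {status!r}")
    return ROUTES[status]


# Usage with a made-up report.
sample = """JOB_ID: 1234
STATUS: BLOCKED
SCOPE: my-vm / run installer
BLOCKED_BY: UAC prompt requires credentials
DETAIL:
  steps: logs/steps.jsonl
NEXT: supply credentials and resume
"""
print(route(sample))
```

Raising on an unrecognised STATUS is a deliberate choice in this sketch: a malformed report should stop the orchestrator rather than be treated as success.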

Full reference: docs/skills/vm-ground-control.md


vm-radio-control — Progress Observation

Check a running pilot without interrupting. Dispatches vm-pilot-inspector to read transcripts and query VM state.

Task(
    description="Check installer progress",
    subagent_type="vm-pilot-inspector",
    prompt="""
output_type: progress
pilot_agent_id: <agent-id-from-ground-control>
vm_name: <vm-name>
project_path: /absolute/path/to/project
""",
    run_in_background=False,
)

Output types:

Type          Returns                              Use when
quick         JOB_ID + STATUS + SCOPE + OUTCOME    Fast pulse, context is tight
progress      Full canonical template              Normal progress check (default)
screenshot    Report + UI coordinates              Need exact screen state
transcript    Report + last 10 turns               Pilot appears stuck

Full reference: docs/skills/vm-radio-control.md


vm-blackbox-record — Session Recording

Record VM screen as WebM/VP8 video and extract frames.

# Start recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record start "my-vm"

# Run your operation (VM is live)

# Stop recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record stop "my-vm"

# Extract frames — MCP only: vm_extract_frames (see docs/skills/vm-blackbox-record.md)

Recording runs on the host via VirtualBox — no guest changes needed.

Full reference: docs/skills/vm-blackbox-record.md


Agents

vm-pilot

Hands-and-eyes agent. Takes screenshots, runs PowerShell, sends keystrokes. Returns structured results.

Five tools:

Tool             Does
vm_screenshot    Capture screen, return PNG + path
vm_powershell    Run PowerShell; return output
vm_type          Type text (256 char limit)
vm_key           Send enter/tab/escape/space
vm_info          Return VM hardware + state

Follows the declare-execute-reflect loop. When blocked, it populates ISSUES and BLOCKED_BY and returns STATUS: BLOCKED or FAILED.

Full reference: docs/agents/vm-pilot.md


vm-pilot-inspector

Observer agent. Reads pilot transcript, queries VM state, takes screenshots, analyzes recordings. Returns structured reports.

Never takes control actions. No typing, no key sends, no process invocation.

Full reference: docs/agents/vm-pilot-inspector.md


MCP Tools (30 listed below)

VM Inspection (4 tools)

vm_list, vm_info, vm_screenshot, vm_screenshot_api

VM Interaction (5 tools)

vm_powershell, vm_type, vm_key, vm_key_combo, vm_mouse_click

VM Lifecycle (3 tools)

vm_start, vm_stop, vm_wait_ready

Vagrant (5 tools)

vagrant_status, vagrant_up, vagrant_provision, vagrant_destroy, vagrant_winrm

Build Orchestration (3 tools)

build_start, build_watch, build_status

CI Tools (4 tools)

ci_check, ci_run, ci_pipeline_status, ci_preflight

Podman / Containers (4 tools)

podman_ps, podman_exec, podman_logs, podman_service_status

KVM Tools (2 tools)

kvm_unload, kvm_reload

Full signatures: docs/mcp-tools.md


Hard Constraints

These are required by the plugin architecture:

  1. vm-vision-control is mandatory before any GUI interaction. Do not call vm_screenshot, vm_mouse_click, vm_type, or vm_key directly from the orchestrator.

  2. MCP is the only approved path. Do not use raw Bash for vagrant, VBoxManage, podman, or WinRM — always go through mcp__plugin_vm-flightsimulator_vm-blackbox__* tools.

  3. The orchestrator does not call VM tools while a pilot is running. The pilot owns the VM. Interrupt only by resuming the agent.

  4. Skills are scoped. Each skill has a single responsibility. Do not combine vision-control and recording in one invocation.
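Constraint 3 (the pilot owns the VM while it runs) can be pictured as a simple ownership check. The sketch below is purely illustrative: the source does not say how ownership is tracked, so `VmOwnership` and its methods are hypothetical and kept in memory only.

```python
class VmOwnership:
    """Track which pilot, if any, currently owns each VM."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}

    def acquire(self, vm_name: str, agent_id: str) -> None:
        """Give `agent_id` ownership, refusing if another pilot already holds the VM."""
        owner = self._owners.get(vm_name)
        if owner is not None and owner != agent_id:
            raise RuntimeError(f"{vm_name} is owned by pilot {owner}")
        self._owners[vm_name] = agent_id

    def release(self, vm_name: str, agent_id: str) -> None:
        """Release ownership; a release by a non-owner is ignored."""
        if self._owners.get(vm_name) == agent_id:
            del self._owners[vm_name]

    def check_orchestrator_call(self, vm_name: str) -> None:
        """Raise if the orchestrator tries to use a VM tool on an owned VM."""
        owner = self._owners.get(vm_name)
        if owner is not None:
            raise RuntimeError(f"{vm_name} is owned by pilot {owner}; resume the agent instead")


# Usage: the orchestrator may act before dispatch and after release, not in between.
ownership = VmOwnership()
ownership.check_orchestrator_call("my-vm")
ownership.acquire("my-vm", "pilot-1")
ownership.release("my-vm", "pilot-1")
ownership.check_orchestrator_call("my-vm")
```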


Conventions

  • vm_type has a 256-character limit per call. Chunk long text across multiple calls.
  • Password fields may double-type. Clear with Ctrl+A → backspace before typing.
  • vm_mouse_click and vm_screenshot_api require target="local" (vboxapi uses local XPCOM only).
  • Recording parameters lock when enabled — configure before starting, not after.
  • The pilot's transcript lives at ~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl.
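The 256-character limit on vm_type means long text has to be split before sending. A minimal sketch of that chunking follows; `chunk_text` and `type_long_text` are hypothetical helpers, and `send` is a stand-in for whatever issues the actual vm_type call.

```python
from typing import Callable

VM_TYPE_LIMIT = 256  # Per-call character limit from the conventions list.


def chunk_text(text: str, limit: int = VM_TYPE_LIMIT) -> list[str]:
    """Split `text` into consecutive pieces of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return [text[i : i + limit] for i in range(0, len(text), limit)]


def type_long_text(text: str, send: Callable[[str], None]) -> int:
    """Send `text` through `send` one chunk at a time and return the number of calls."""
    chunks = chunk_text(text)
    for chunk in chunks:
        send(chunk)
    return len(chunks)


# Usage: collect the chunks in a list instead of typing into a VM.
sent: list[str] = []
calls = type_long_text("x" * 600, sent.append)
print(calls, [len(chunk) for chunk in sent])
```

Splitting on a fixed character count keeps the sketch simple; a real caller might prefer to break at line boundaries so that a command is never split mid-token.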

Skill-to-Task Decision Guide

You want to...                                       Use
Click a button / type text / read the screen         vm-vision-control
Run a multi-step operation (>30 seconds)             vm-ground-control → dispatches vm-pilot
Check on a running background task                   vm-radio-control → dispatches vm-pilot-inspector
Continue a completed pilot with more work            vm-ground-control with resume=agent_id
Record an operation as video                         vm-blackbox-record
Extract frames from a recording                      MCP vm_extract_frames
Get the current screen without interrupting pilot    vm-radio-control with output_type: screenshot

Local Development

# Install dependencies
uv sync

# Run all tests
uv run pytest

# Format
uv run ruff format

# Lint
uv run ruff check

# Type check
uv run ty check packages/

# Test the plugin locally in Claude Code
claude --plugin-dir ./

Coverage threshold: 60%. Modules requiring live VMs (WinRM, SSH tunnel, VBoxManage) are excluded from CI coverage.


Installation Reference

# Marketplace
claude plugin marketplace add bitflight-devops/vm-flightsimulator

# MCP server only (PyPI)
uvx mcp-vm-blackbox

# Persistent MCP registration
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

License

MIT — see LICENSE
