MCP server for controlling VirtualBox VMs: screenshots, keyboard input, PowerShell, Vagrant, WinRM, Podman, and CI build pipelines

Maintainers

jamie-bitflight

vm-flightsimulator

A Claude Code plugin that gives AI agents the ability to see and act on real virtual machines, turning them into a unified machine control substrate for validating software across complete workflows.

The plugin provides a collaborative testing partner: describe what you want to test, and AI-driven pilots autonomously work through it with full observability. A real-time dashboard lets you watch progress and steer agents mid-flight.

What This Is

Not a specific-scenario tool. A toolkit for long-running, AI-driven validation work on any virtual machine.

You ask: "Install this app and verify it works end-to-end." The plugin provides the control surface (screenshot, keyboard, mouse, PowerShell, WinRM). Pilots fly the VM, inspectors analyze what happened, and a distributed coordinator keeps everything moving without blocking.

Currently targets VirtualBox + Vagrant. Designed for expansion to AWS, OpenTofu, Cloudflare Workers, bare metal, and beyond.

The Actor Model

Understanding who does what:

Pilots

AI agents that act on VMs. They see (screenshots), act (keyboard and mouse input), run commands (PowerShell/SSH), and report outcomes. They follow a tight declare-execute-reflect loop: log intent → perform action → log outcome. Pilots are autonomous; they do not return to the orchestrator after every action.
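The declare-execute-reflect loop can be pictured as a wrapper that writes the intent before the action runs and the outcome after it. The following is a minimal sketch, not the plugin's implementation: `StepLog` and `run_step` are hypothetical names, and the in-memory list stands in for whatever durable step log the plugin actually writes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepLog:
    """In-memory stand-in for a durable step log (hypothetical)."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def write(self, phase: str, message: str) -> None:
        self.entries.append((phase, message))


def run_step(log: StepLog, intent: str, action: Callable[[], str]) -> str:
    """Declare the intent, execute the action, then record the outcome."""
    log.write("intent", intent)       # declare
    try:
        outcome = action()            # execute
    except Exception as exc:          # a failed action is still an outcome
        outcome = f"error: {exc}"
    log.write("outcome", outcome)     # reflect
    return outcome


log = StepLog()
run_step(log, "Open the Start menu", lambda: "Start menu visible")
```

Because the intent is logged before the action runs, an inspector reading the log can tell which step a pilot was attempting even if the pilot never got to record an outcome.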

Inspectors

AI agents that analyze without touching. They are dispatched on demand to read pilot transcripts, cross-reference recordings, detect stalls, and produce classified reports. They can work even after the pilot has exited, because they reconstruct what happened from durable evidence. This enables scheduled monitoring and post-hoc analysis.

Mechanics

Mechanics handle the infrastructure layer: cross-VM networking, database setup, VM snapshots, Vagrant provisioning, port forwarding. They run as loadable skills (for quick infra tasks) or dedicated agents (for complex multi-VM setup). Mechanics set up the stage; pilots perform on it; inspectors review the performance.

Ground Control

The orchestrator's coordination skill. Dispatches pilots in the background, monitors progress via inspectors, and never blocks on any single task. Routes work, handles steering, schedules follow-ups.

Dashboard

Real-time UI showing fleet status, job timelines, screenshots, and video playback. It is a two-way channel: operators watch pilots and can send steering prompts mid-flight. A TODO list is rendered live; pilots and engineers both add work as needed.

Core Principles

The harness enables, it doesn't do. It provides generic control (keyboard/mouse), coordination (job tracking, scheduling), and observability (recordings, dashboards, step logs). The AI figures out the task-specific steps.

Orchestrator is never blocked. Pilots run asynchronously; communication is file- and database-mediated. When a pilot is stuck, the inspector detects it and reports without interrupting the pilot.

Progressive disclosure. The orchestrator knows WHERE things are (file paths, job IDs), not WHAT they contain (full logs, recordings). It reads only what it needs to make routing decisions. Raw evidence flows through inspectors.

Hook-based steering. Pilots check for incoming messages via a PreToolUse hook on every tool call. Messages from the orchestrator or dashboard inject course corrections in real time. The pilot does not poll — the hook fires automatically.
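One way to picture the message check is a file-based mailbox that a hook script drains on each invocation. This is a sketch under stated assumptions: the mailbox directory, the `*.json` naming, and the `text` field are all hypothetical, and the wiring into an actual PreToolUse hook is not shown.

```python
import json
import tempfile
from pathlib import Path


def drain_mailbox(mailbox: Path) -> list[str]:
    """Return pending steering messages, oldest first, and delete them."""
    messages = []
    for path in sorted(mailbox.glob("*.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        messages.append(payload["text"])
        path.unlink()  # consume it so the next tool call does not see it again
    return messages


# Usage: a sender drops a message; the next drain picks it up exactly once.
with tempfile.TemporaryDirectory() as tmp:
    box = Path(tmp)
    (box / "0001.json").write_text(
        json.dumps({"text": "Skip the EULA screen"}), encoding="utf-8"
    )
    first = drain_mailbox(box)
    second = drain_mailbox(box)
```

Deleting each file after reading it is what keeps the pilot from polling: the work happens only when the hook fires, and a message is delivered once.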

Onboarding Experience

The plugin opens with a conversation:

"What VM scenarios will we be piloting in this simulator? Let's chat. Tell me what you have already, your ideas, your constraints and goals, and we will plan out the testing system for it."

The AI listens, discovers what infrastructure you have, helps you plan the environment (loading reference skills for Vagrant, Packer, Docker, PostgreSQL, Windows Server, etc.), and then builds it. You never write a Vagrantfile or answer a configuration form.

Current Capabilities

VirtualBox & Vagrant

VM lifecycle (start, stop, snapshot, restore)
Provisioning and configuration
Cross-VM networking

Windows & Linux Interaction

GUI automation (screenshot, keyboard, mouse) via vision-control
PowerShell scripts and commands via WinRM
SSH / Bash on Linux VMs
Hardware control (keyboard combos, clipboard)

Observation & Coordination

Session recording (WebM/VP8) via VBoxManage
Frame extraction at specific timestamps
Job tracking with step-level audit logs
Issue register for classified problems
Real-time dashboard with live video playback

Fleet Management

Multi-VM coordination
Job store (durable across sessions)
Destroy guards (prevent accidents)
Scheduled task support for unattended operation

Future Direction

The unified control model stays the same; the adapters grow:

OpenTofu — Infrastructure as code for cloud resources
AWS — EC2 instances, service discovery, networking
Cloudflare Workers — Edge compute validation
Bare metal — Physical machine provisioning and testing

One set of pilot/mechanic/inspector/dashboard code. Different adapters underneath. The plugin becomes a true unified machine control substrate.

Quick Start

1. Install the MCP server

# Run on demand
uvx mcp-vm-blackbox

# Or add persistently to Claude Code
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

2. Install the plugin

Option A — Marketplace (Claude Code only):

claude plugin marketplace add bitflight-devops/vm-flightsimulator

Then open /plugins and install vm-flightsimulator.

Option B — vm-blackbox-installer (all platforms):

# Install for all platforms globally
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

# Or pick specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

# Or install locally to the current project directory
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --local

The installer copies skills and agents to your plugin directory and registers the MCP server.

3. Start collaborating

"I need a two-VM setup with PostgreSQL on Ubuntu and a Java webapp on Windows. Let's plan it."
"Install this app on my-vm and record the full process."
"Check on the installer progress — is it done yet?"
"What went wrong in the last run? Give me a timeline."

The plugin automatically loads the right skill and agent for the task.

Configuration

Paths, durable job state, and optional overrides are controlled by environment variables (for example, where .mcp-vm-blackbox resolves on disk). See docs/configuration.md for the full list, the data-root resolution order, and troubleshooting notes for operators and AI-assisted debugging.

Prerequisites

Requirement	Version / notes
Python	3.11+
uv	latest
VirtualBox	7.1+
Vagrant	2.3+
Packer	1.10+ (for VM builds)
tmux	any (for detached builds)
WinRM on guest	required for Windows VMs

Architecture

Three-layer cooperative design:

┌─────────────────────────────────────────────────────────┐
│                      Skills                              │
│  vm-vision-control  vm-ground-control  vm-radio-control  │
│  vm-blackbox-record                                      │
│         (define approved loops and tooling)              │
└────────────────────┬────────────────────────────────────┘
                     │ dispatches
┌────────────────────▼────────────────────────────────────┐
│                      Agents                              │
│  vm-pilot     vm-pilot-inspector     vm-mechanic         │
│  (acts)       (observes + reports)   (infrastructure)    │
└────────────────────┬────────────────────────────────────┘
                     │ calls
┌────────────────────▼────────────────────────────────────┐
│                   MCP Server                             │
│   vm_screenshot  vm_powershell  vm_type  vm_key          │
│   vm_mouse_click  vagrant_*  ci_*  podman_*  build_*     │
│              (executes against real infrastructure)      │
└─────────────────────────────────────────────────────────┘

Skills define the approved loop. They do not take actions.

Agents take actions. Pilots drive VMs. Inspectors read state. Mechanics set up infrastructure.

The MCP server executes tool calls against VirtualBox, Vagrant, WinRM, SSH, CI hosts, and container runtimes.

Skills

vm-vision-control — GUI Interaction

The mandatory entry point for any task that touches a VM's desktop. The loop is strict:

1. Screenshot     →  vm_screenshot
2. Read image     →  Read tool on the saved_to path
3. Decide         →  Analyse screen, determine next action
4. Act            →  vm_mouse_click / vm_type / vm_key / vm_powershell
5. Repeat         →  Return to step 1
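The loop above can be sketched as a small driver with the tool calls injected as callables. This is illustrative only: `vision_loop` and its three parameters are hypothetical stand-ins for the agent's screenshot, analysis, and action steps, and the `max_steps` cap is an added safety assumption rather than something the skill specifies.

```python
from typing import Callable, Optional

# Each callable is a hypothetical stand-in for a tool call made by the agent.
Screenshot = Callable[[], str]             # returns the saved image path
Decide = Callable[[str], Optional[str]]    # image path -> next action, None when done
Act = Callable[[str], None]


def vision_loop(screenshot: Screenshot, decide: Decide, act: Act,
                max_steps: int = 20) -> int:
    """Run screenshot -> read/decide -> act until decide() returns None."""
    for step in range(max_steps):
        image_path = screenshot()      # steps 1-2: capture, then read the image
        action = decide(image_path)    # step 3: analyse the screen
        if action is None:
            return step                # finished after `step` actions
        act(action)                    # step 4: act, then loop back (step 5)
    raise TimeoutError(f"no terminal state after {max_steps} steps")


# Usage with fakes: two actions, then the decision step reports completion.
performed: list[str] = []
plan = iter(["click Next", "click Install", None])
steps = vision_loop(lambda: "/tmp/shot.png", lambda _path: next(plan), performed.append)
```

Every action is preceded by a fresh screenshot, so the decision step never acts on a stale view of the screen.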

Natural language triggers: "click on the VM", "type into the VM", "what's on the screen", "navigate the installer"

Timing between steps:

Operation	Wait
Click a button	0.5 – 1 s
Open an application	3 – 5 s
Launch an installer	10 – 15 s
Installer panel transition	2 – 3 s
Installer completion	30 – 60 s
VM boot	60 – 120 s

Full reference: docs/skills/vm-vision-control.md

vm-ground-control — Orchestration

Use for any operation taking more than ~30 seconds. Dispatches vm-pilot in the background; the pilot returns a structured block for the orchestrator to parse.

agent_id = Task(
    description="Run the installer",
    subagent_type="vm-pilot",
    prompt="""
GOAL: Run silent installer and report success.

STEPS:
1. Invoke vm-vision-control skill.
2. Test-NetConnection <HOST> -Port <PORT>
3. Register-ScheduledTask ...
4. Poll every 30s until State = Ready or 15 min elapsed
5. Read install log; take screenshot.

RETURN FORMAT:
JOB_ID: <uuid>
STATUS: SUCCESS | FAILED | BLOCKED | IN_PROGRESS
SCOPE: <vm_name> / <task_description>
OUTCOME: <2-4 sentences>
ISSUES: <count> (<classifications>) | none
BLOCKED_BY: <description> | —
DETAIL:
  steps: <path_to_step_log>
  issues: <path_to_issue_register> | none
  recording: <path_to_recording> | none
  screenshots: <glob_pattern> | none
  video: <true | false>
  pilot.screen_state: <description>
  pilot.files_read: <comma-separated filenames>
NEXT: <recommended action>
""",
    run_in_background=True,
)

Store the agent ID — you need it to check progress and resume.

Route on STATUS:

STATUS	Action
`SUCCESS`	Proceed
`FAILED`	Check ISSUES, fix and re-dispatch
`BLOCKED`	Read BLOCKED_BY, resolve, re-dispatch
`IN_PROGRESS`	Wait and re-check via radio-control
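Routing on STATUS amounts to parsing the top-level `KEY: value` lines of the returned block and looking up the action. The sketch below assumes the block follows the RETURN FORMAT shown earlier; `parse_report`, `route`, and the sample report values are hypothetical.

```python
import re

ROUTES = {
    "SUCCESS": "proceed",
    "FAILED": "check ISSUES, fix, re-dispatch",
    "BLOCKED": "read BLOCKED_BY, resolve, re-dispatch",
    "IN_PROGRESS": "wait, then re-check via radio-control",
}

# Top-level fields start at column 0; indented DETAIL entries do not match.
FIELD = re.compile(r"^([A-Z_]+):[ \t]*(.*)$", re.MULTILINE)


def parse_report(text: str) -> dict[str, str]:
    """Collect the top-level KEY: value lines of a pilot report."""
    return {key: value.strip() for key, value in FIELD.findall(text)}


def route(text: str) -> str:
    """Map the report's STATUS to the documented follow-up action."""
    status = parse_report(text).get("STATUS", "")
    if status not in ROUTES:
        raise ValueError(f"unrecognised STATUS: {status!r}")
    return ROUTES[status]


# Sample report with made-up values, for illustration only.
report = """JOB_ID: example-job
STATUS: BLOCKED
SCOPE: my-vm / run installer
BLOCKED_BY: UAC prompt needs credentials
DETAIL:
  steps: /tmp/steps.log
NEXT: supply credentials and resume
"""
next_action = route(report)
```

Rejecting an unknown STATUS explicitly, rather than defaulting to "proceed", keeps a malformed report from being treated as a success.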

Full reference: docs/skills/vm-ground-control.md

vm-radio-control — Progress Observation

Check a running pilot without interrupting it. Dispatches vm-pilot-inspector to read transcripts and query VM state.

Task(
    description="Check installer progress",
    subagent_type="vm-pilot-inspector",
    prompt="""
output_type: progress
pilot_agent_id: <agent-id-from-ground-control>
vm_name: <vm-name>
project_path: /absolute/path/to/project
""",
    run_in_background=False,
)

Output types:

Type	Returns	Use when
`quick`	JOB_ID + STATUS + SCOPE + OUTCOME	Fast pulse, context is tight
`progress`	Full canonical template	Normal progress check (default)
`screenshot`	Report + UI coordinates	Need exact screen state
`transcript`	Report + last 10 turns	Pilot appears stuck
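Since the inspector prompt is a short list of `key: value` lines, a small builder can validate the output type before dispatch. This is a sketch only: `inspector_prompt` is a hypothetical helper, and the four accepted values are taken from the output types table.

```python
OUTPUT_TYPES = ("quick", "progress", "screenshot", "transcript")


def inspector_prompt(pilot_agent_id: str, vm_name: str, project_path: str,
                     output_type: str = "progress") -> str:
    """Build the key: value prompt body, rejecting unknown output types."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError(
            f"output_type must be one of {OUTPUT_TYPES}, got {output_type!r}"
        )
    return (
        f"output_type: {output_type}\n"
        f"pilot_agent_id: {pilot_agent_id}\n"
        f"vm_name: {vm_name}\n"
        f"project_path: {project_path}\n"
    )


prompt = inspector_prompt("agent-123", "my-vm", "/absolute/path/to/project")
```

The returned string would be passed as the `prompt` argument of the inspector dispatch; `progress` is the default here because the table marks it as the normal check.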

Full reference: docs/skills/vm-radio-control.md

vm-blackbox-record — Session Recording

Record VM screen as WebM/VP8 video and extract frames.

# Start recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record start "my-vm"

# Run your operation (VM is live)

# Stop recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record stop "my-vm"

# Extract frames — MCP only: vm_extract_frames (see docs/skills/vm-blackbox-record.md)

Recording runs on the host via VirtualBox — no guest changes needed.

Full reference: docs/skills/vm-blackbox-record.md

Agents

vm-pilot

Hands-and-eyes agent. Takes screenshots, runs PowerShell, sends keystrokes. Returns structured results.

Five tools:

Tool	Does
`vm_screenshot`	Capture screen, return PNG + path
`vm_powershell`	Run PowerShell; return output
`vm_type`	Type text (256 char limit)
`vm_key`	Send enter/tab/escape/space
`vm_info`	Return VM hardware + state

Follows the declare-execute-reflect loop. When blocked, it populates ISSUES and BLOCKED_BY and returns STATUS: BLOCKED or STATUS: FAILED.

Full reference: docs/agents/vm-pilot.md

vm-pilot-inspector

Observer agent. Reads pilot transcript, queries VM state, takes screenshots, analyzes recordings. Returns structured reports.

Never takes control actions. No typing, no key sends, no process invocation.
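Reading a transcript is a read-only operation, which fits the inspector's constraints. The sketch below tails a JSONL file and returns the last few records. The `last_turns` helper is hypothetical, and the record shape (`{"turn": i}`) is made up for the demo; the real transcript schema is not described in this document.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


def last_turns(transcript: Path, count: int = 10) -> list[dict]:
    """Return the last `count` JSON objects from a JSONL file, oldest first."""
    tail: deque[str] = deque(maxlen=count)
    with transcript.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():       # ignore blank lines
                tail.append(line)
    return [json.loads(line) for line in tail]


# Usage: write 12 fake records, then read back the last 10.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "agent.jsonl"
    path.write_text(
        "".join(json.dumps({"turn": i}) + "\n" for i in range(12)),
        encoding="utf-8",
    )
    recent = last_turns(path)
```

The bounded `deque` keeps memory use constant regardless of transcript length, and only the retained lines are parsed as JSON.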

Full reference: docs/agents/vm-pilot-inspector.md

MCP Tools (30 listed)

VM Inspection (4 tools)

vm_list, vm_info, vm_screenshot, vm_screenshot_api

VM Interaction (5 tools)

vm_powershell, vm_type, vm_key, vm_key_combo, vm_mouse_click

VM Lifecycle (3 tools)

vm_start, vm_stop, vm_wait_ready

Vagrant (5 tools)

vagrant_status, vagrant_up, vagrant_provision, vagrant_destroy, vagrant_winrm

Build Orchestration (3 tools)

build_start, build_watch, build_status

CI Tools (4 tools)

ci_check, ci_run, ci_pipeline_status, ci_preflight

Podman / Containers (4 tools)

podman_ps, podman_exec, podman_logs, podman_service_status

KVM Tools (2 tools)

kvm_unload, kvm_reload

Full signatures: docs/mcp-tools.md

Hard Constraints

These are required by the plugin architecture:

vm-vision-control is mandatory before any GUI interaction. Do not call vm_screenshot, vm_mouse_click, vm_type, or vm_key directly from the orchestrator.
MCP is the only approved path. Do not use raw Bash for vagrant, VBoxManage, podman, or WinRM — always go through mcp__plugin_vm-flightsimulator_vm-blackbox__* tools.
The orchestrator does not call VM tools while a pilot is running. The pilot owns the VM. Interrupt only by resuming the agent.
Skills are scoped. Each skill has a single responsibility. Do not combine vision-control and recording in one invocation.

Conventions

vm_type has a 256-character limit per call. Chunk long text across multiple calls.
Password fields may double-type. Clear the field with Ctrl+A, then Backspace, before typing.
vm_mouse_click and vm_screenshot_api require target="local" (vboxapi uses local XPCOM only).
Recording parameters are locked once recording is enabled; configure them before starting, not after.
The pilot's transcript lives at ~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl.
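The 256-character limit on vm_type means long input has to be split by the caller. A minimal sketch of that chunking follows; `chunk_text` is a hypothetical helper, and it assumes the limit counts characters of the Python string rather than encoded bytes.

```python
VM_TYPE_LIMIT = 256  # per-call limit stated in the conventions


def chunk_text(text: str, limit: int = VM_TYPE_LIMIT) -> list[str]:
    """Split text into consecutive pieces of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


chunks = chunk_text("a" * 600)
```

Each chunk would then be sent in its own vm_type call, in order. Joining the chunks reproduces the original text exactly, so nothing is dropped at the boundaries.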

Skill-to-Task Decision Guide

You want to...	Use
Click a button / type text / read the screen	`vm-vision-control`
Run a multi-step operation (>30 seconds)	`vm-ground-control` → dispatches `vm-pilot`
Check on a running background task	`vm-radio-control` → dispatches `vm-pilot-inspector`
Continue a completed pilot with more work	`vm-ground-control` with `resume=agent_id`
Record an operation as video	`vm-blackbox-record`
Extract frames from a recording	MCP `vm_extract_frames`
Get the current screen without interrupting pilot	`vm-radio-control` with `output_type: screenshot`

Local Development

# Install dependencies
uv sync

# Run all tests
uv run pytest

# Format
uv run ruff format

# Lint
uv run ruff check

# Type check
uv run ty check packages/

# Test the plugin locally in Claude Code
claude --plugin-dir ./

Coverage threshold: 60%. Modules requiring live VMs (WinRM, SSH tunnel, VBoxManage) are excluded from CI coverage.

Installation Reference

# Marketplace
claude plugin marketplace add bitflight-devops/vm-flightsimulator

# MCP server only (PyPI)
uvx mcp-vm-blackbox

# Persistent MCP registration
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

License

MIT — see LICENSE

This version: 0.11.3 (Mar 31, 2026)