MCP server for controlling VirtualBox VMs — screenshots, keyboard input, PowerShell, vagrant, WinRM, podman, and CI build pipelines

vm-flightsimulator

A Claude Code plugin that gives AI agents a complete control surface for VM automation: screenshot-driven GUI interaction, structured background task orchestration, non-intrusive progress observation, and frame-accurate session recording. Backend-agnostic by design — VirtualBox is the current implementation.


What It Does

You tell the agent "install this software on the VM" and the plugin handles everything: starting Vagrant, taking screenshots to verify state, typing into windows, running PowerShell over WinRM, polling scheduled tasks to completion, tailing logs, and returning a structured result you can act on.

The plugin enforces a clear separation of concerns:

Skills  →  define approved loops and tooling the orchestrator follows
Agents  →  take actions (vm-pilot) or observe state (vm-pilot-inspector)
MCP     →  executes tool calls against the actual VM

Nothing is a black box. Every action goes through the loop. Every result comes back structured.


Quick Start

1. Install the MCP server

# Run on demand
uvx mcp-vm-blackbox

# Or add persistently to Claude Code
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

2. Install the plugin

Option A — Marketplace (Claude Code only):

claude plugin marketplace add bitflight-devops/vm-flightsimulator

Then open /plugins in Claude Code and install vm-flightsimulator.

Option B — vm-blackbox-installer (Claude, OpenCode, Gemini CLI, Codex):

# Install for all platforms, globally (~/.claude, ~/.gemini, etc.)
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

# Or pick specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

# Or install locally to the current project directory
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --local

The installer copies skills/ and agents/ to each platform's plugin directory and registers the mcp-vm-blackbox MCP server in the platform's config file. See Installer Reference for full details.

3. Start working

"Take a screenshot of my-vm and describe what's on screen"
"Run the installer on my-vm and tell me when it's done"
"Record the boot sequence of my-vm for the next 2 minutes"

The plugin automatically selects the right skill and agent for each task.


Prerequisites

Requirement                 Version
Python                      3.11+
uv                          latest
VirtualBox                  7.1+
Vagrant                     2.3+
Packer (for VM builds)      1.10+
tmux (for detached builds)  any
WinRM on guest              configured for Windows VMs

Architecture

The plugin uses a three-layer architecture. Each layer has a single job.

┌─────────────────────────────────────────────────────────┐
│                      Skills                              │
│  vm-vision-control  vm-ground-control  vm-radio-control  │
│  vm-blackbox-record                                      │
│         (define approved loops and tooling)              │
└────────────────────┬────────────────────────────────────┘
                     │ dispatches
┌────────────────────▼────────────────────────────────────┐
│                      Agents                              │
│     vm-pilot               vm-pilot-inspector            │
│  (acts on the VM)         (observes VM state)            │
└────────────────────┬────────────────────────────────────┘
                     │ calls
┌────────────────────▼────────────────────────────────────┐
│                   MCP Server                             │
│   vm_screenshot  vm_powershell  vm_type  vm_key          │
│   vm_mouse_click  vagrant_*  ci_*  podman_*  build_*     │
│              (executes against real infrastructure)      │
└─────────────────────────────────────────────────────────┘

Skills are instruction logic — they define the approved loop and tooling the orchestrator follows. They do not take actions directly.

Agents take actions. vm-pilot drives the VM. vm-pilot-inspector reads state without touching anything.

The MCP server (mcp-vm-blackbox on PyPI) executes tool calls. It connects to VirtualBox via vboxapi (local XPCOM) and VBoxManage (local or SSH), to Windows guests via WinRM, and to remote CI hosts via SSH tunnels.


Skills

vm-vision-control — GUI Interaction Loop

The mandatory entry point for any task that touches a VM's desktop. Clicking, typing, reading the screen — all of it goes through this skill first.

The loop is strict:

1. Screenshot     →  vm_screenshot
2. Read image     →  Read tool on the saved_to path
3. Decide         →  Analyse screen, determine next action
4. Act            →  vm_mouse_click / vm_type / vm_key / vm_powershell
5. Repeat         →  Return to step 1

Never act without a fresh screenshot. Never skip the read step.
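
As a rough illustration, the loop can be pictured as a small driver function. This is a sketch only: `take_screenshot`, `read_image`, `decide`, and `act` are hypothetical callables standing in for vm_screenshot, the Read tool, the agent's own analysis, and the action tools. None of these Python names come from the plugin.

```python
from typing import Callable, Optional


def vision_loop(
    take_screenshot: Callable[[], str],
    read_image: Callable[[str], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 20,
) -> int:
    """Run screenshot -> read -> decide -> act until decide() returns None.

    All four callables are hypothetical stand-ins, not plugin APIs.
    Returns the number of actions taken.
    """
    actions = 0
    for _ in range(max_steps):
        path = take_screenshot()   # 1. fresh screenshot on every iteration
        screen = read_image(path)  # 2. always read the image before deciding
        action = decide(screen)    # 3. pick the next action, or None when done
        if action is None:
            break
        act(action)                # 4. act, then return to step 1
        actions += 1
    return actions
```

The `max_steps` cap is an addition of this sketch, included so a confused decision step cannot loop forever.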

Natural language triggers: "click on the VM", "type into the VM", "what's on the screen", "navigate the installer", "take a screenshot"

Timing to observe between steps:

Operation                   Wait before next screenshot
Click a button              0.5 – 1 s
Open an application         3 – 5 s
Launch an installer         10 – 15 s
Installer panel transition  2 – 3 s
Installer completion        30 – 60 s
VM boot                     60 – 120 s
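
One way to keep these waits consistent is a small lookup helper. The sketch below copies the ranges from the table. The dictionary keys and the `wait_for` function are hypothetical names chosen for illustration, not part of the plugin.

```python
import time
from typing import Callable

# Wait ranges in seconds (low, high), copied from the timing table.
# The key names are illustrative only.
WAIT_SECONDS = {
    "click_button": (0.5, 1.0),
    "open_application": (3.0, 5.0),
    "launch_installer": (10.0, 15.0),
    "installer_panel_transition": (2.0, 3.0),
    "installer_completion": (30.0, 60.0),
    "vm_boot": (60.0, 120.0),
}


def wait_for(
    operation: str,
    conservative: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for the upper (or lower) bound of the range and return the delay."""
    low, high = WAIT_SECONDS[operation]
    delay = high if conservative else low
    sleep(delay)
    return delay
```

Passing `sleep` as a parameter makes the helper testable without real delays.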

Full reference: docs/skills/vm-vision-control.md


vm-ground-control — Orchestrator Coordination

Use for any VM operation that will take more than ~30 seconds. Dispatches vm-pilot as a background Task and gives you a structured return block to parse.

agent_id = Task(
    description="Run the installer",
    subagent_type="vm-pilot",
    prompt="""
GOAL:
Run the silent installer via scheduled task and report whether it succeeded.

STEPS:
1. Invoke vm-vision-control skill.
2. Test-NetConnection <HOST> -Port <PORT> -InformationLevel Quiet
3. Register-ScheduledTask ...
4. Poll every 30 s until State = Ready or 15 min elapsed
5. Read the install log
6. Take a screenshot; describe what is on screen.

EVIDENCE TO COLLECT:
- Log file full contents
- Artefact path existence (yes/no)
- Final VM screen description

RETURN FORMAT:
STATUS: SUCCESS | FAILED | PARTIAL | TIMEOUT
SUMMARY: <2-4 sentences>
FILES_READ:
  install.log: <contents or "not found">
SCREEN_STATE: <description>
ISSUES: <or "none">
NEXT_STEP: <recommended action>
""",
    run_in_background=True,
)

Store the agent ID. You need it for progress checks and resumption.

The pilot owns the VM for the duration. The orchestrator does not call vm_screenshot or vm_powershell while a pilot task is running.

Routing on STATUS:

STATUS   Meaning                             Action
SUCCESS  Task completed, artefact confirmed  Proceed
FAILED   Task failed with known cause        Check ISSUES, fix and re-dispatch
PARTIAL  Evidence incomplete                 Resume pilot to collect missing evidence
TIMEOUT  Poll limit reached                  Check SCREEN_STATE + FILES_READ
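
A minimal sketch of how an orchestrator might parse the return block and route on STATUS. It assumes the block follows the RETURN FORMAT shown in the prompt above: upper-case keys at column 0, with indented lines belonging to the preceding key. `parse_return_block` and `route` are hypothetical helpers, not plugin functions.

```python
VALID_STATUSES = {"SUCCESS", "FAILED", "PARTIAL", "TIMEOUT"}

# Actions paraphrased from the routing table.
NEXT_ACTION = {
    "SUCCESS": "proceed",
    "FAILED": "check ISSUES, fix and re-dispatch",
    "PARTIAL": "resume pilot to collect missing evidence",
    "TIMEOUT": "check SCREEN_STATE and FILES_READ",
}


def parse_return_block(text: str) -> dict[str, str]:
    """Parse 'KEY: value' lines; other lines extend the previous key."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        head, sep, rest = line.partition(":")
        # A new field starts at column 0 with an upper-case key.
        if sep and not line[0].isspace() and head.isupper():
            current = head.strip()
            fields[current] = rest.strip()
        elif current is not None:
            fields[current] = (fields[current] + "\n" + line.strip()).strip()
    return fields


def route(text: str) -> str:
    """Return the follow-up action for the block's STATUS."""
    status = parse_return_block(text).get("STATUS", "")
    if status not in VALID_STATUSES:
        raise ValueError(f"unrecognised STATUS: {status!r}")
    return NEXT_ACTION[status]
```

Raising on an unknown STATUS, rather than guessing, keeps a malformed pilot reply from being treated as success.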

Built-in templates: installer via scheduled task, task poll, config file read, network connectivity check.
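
The task-poll pattern from the example prompt (poll every 30 s until State = Ready or 15 minutes have elapsed) can be sketched as below. This is not the template's actual implementation. `get_state` is a hypothetical callable that would wrap whatever query returns the scheduled task's state.

```python
import time
from typing import Callable


def poll_until_ready(
    get_state: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 15 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once get_state() reports "Ready", False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if get_state() == "Ready":
            return True
        # Stop if another full interval would overshoot the deadline.
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)
```

A False result corresponds to the TIMEOUT case. Reaching "Ready" only means the task stopped running, so the log and artefact checks still decide between SUCCESS and FAILED.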

Full reference: docs/skills/vm-ground-control.md


vm-radio-control — Progress Observer

Check what a running pilot is doing without interrupting it. Dispatches vm-pilot-inspector as a foreground Task that reads the pilot's transcript and queries VM state independently.

Task(
    description="Check installer progress",
    subagent_type="vm-pilot-inspector",
    prompt="""
output_type: progress
pilot_agent_id: <agent-id-from-ground-control>
vm_name: <vm-name>
project_path: /absolute/path/to/project
""",
    run_in_background=False,
)

Output types:

output_type  Collects                              Use when
quick        STATUS + SCREEN_STATE only            Fast pulse check, context is tight
progress     Full 6-step report (default)          Normal progress check
screenshot   Full report + UI element coordinates  Need to verify exact screen state
transcript   Full report + last 10 pilot turns     Pilot appears stuck

Structured report fields: STATUS, TASK_STATE, LOG_TAIL, PATH_EXISTS, SCREEN_STATE, PILOT_PROGRESS, ELAPSED_ESTIMATE, ISSUES

Route on STATUS only — not on SCREEN_STATE or PILOT_PROGRESS.
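
If the inspector prompt is assembled programmatically, a small builder can reject unknown output types before dispatch. The sketch below reproduces the four-field prompt shown above. `build_inspector_prompt` is a hypothetical helper, not something the plugin ships.

```python
# The four output types from the table above.
OUTPUT_TYPES = ("quick", "progress", "screenshot", "transcript")


def build_inspector_prompt(
    pilot_agent_id: str,
    vm_name: str,
    project_path: str,
    output_type: str = "progress",
) -> str:
    """Return the prompt body for a vm-pilot-inspector Task."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown output_type: {output_type!r}")
    lines = [
        f"output_type: {output_type}",
        f"pilot_agent_id: {pilot_agent_id}",
        f"vm_name: {vm_name}",
        f"project_path: {project_path}",
    ]
    return "\n".join(lines) + "\n"
```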

Full reference: docs/skills/vm-radio-control.md


vm-blackbox-record — Session Recording

Record VM screen sessions as WebM/VP8 video and extract frames at specific timestamps. Recording runs entirely on the host via VBoxManage — no guest changes required.

# Start recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record start "my-vm"
# → Recording: scratch/recordings/my-vm-20260305-143022-screen0.webm

# Run your operation (VM is live while recording)

# Stop recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record stop "my-vm"

# Extract frames for inspection
uv run skills/vm-blackbox-record/scripts/extract_frames.py \
  scratch/recordings/my-vm-20260305-143022-screen0.webm \
  --interval 30 \
  --outdir /tmp/frames

Frame extraction uses PyAV directly — no ffmpeg binary required.

Recording parameters are locked once recording starts (VirtualBox 7.1 constraint). Configure resolution, bitrate, and frame rate before enabling. vm_capture.py handles the correct sequence automatically.
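
Recording paths can be turned back into structured metadata when matching a frame to a point in the session. The sketch below assumes every recording follows the `<vm>-<YYYYMMDD>-<HHMMSS>-screen<N>.webm` pattern. That pattern is inferred from the single example path above and is not documented behaviour. `parse_recording_name` is a hypothetical helper.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Pattern inferred from the example path; treat it as an assumption.
_RECORDING = re.compile(
    r"^(?P<vm>.+)-(?P<stamp>\d{8}-\d{6})-screen(?P<screen>\d+)\.webm$"
)


def parse_recording_name(path: str) -> tuple[str, datetime, int]:
    """Return (vm_name, start_time, screen_index) for a recording path."""
    name = PurePosixPath(path).name
    match = _RECORDING.match(name)
    if match is None:
        raise ValueError(f"unexpected recording name: {name!r}")
    started = datetime.strptime(match["stamp"], "%Y%m%d-%H%M%S")
    return match["vm"], started, int(match["screen"])
```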

Backend table:

Backend     Status   Notes
vboxmanage  Ready    VirtualBox 7.1+. Default.
ffmpeg      Planned  v4l2/X11 capture
mcp         Planned  MCP screenshot sequences assembled into video
winrm       Planned  PowerShell-based capture over WinRM

Full reference: docs/skills/vm-blackbox-record.md


Agents

vm-pilot

Hands-and-eyes agent. Takes screenshots, runs PowerShell via WinRM, sends keystrokes, and returns structured results. Dispatched by vm-ground-control.

Exactly five tools:

Tool           Does
vm_screenshot  Capture screen, return base64_png + saved_to
vm_powershell  Run PowerShell; return stdout/stderr/exit_code
vm_type        Type text (256-char limit per call)
vm_key         Send enter/tab/escape/space/backspace
vm_info        Return VM hardware metadata and state

The pilot acts and observes — it never analyses, plans, or recommends. When it cannot proceed, it populates ISSUES in the return block and returns STATUS: PARTIAL or STATUS: FAILED. The orchestrator decides next steps.

Full reference: docs/agents/vm-pilot.md


vm-pilot-inspector

Observer agent. Reads the pilot's transcript, queries VM state via WinRM, takes a screenshot or extracts a recording frame, and returns one structured report. Dispatched by vm-radio-control.

Six-step workflow:

  1. Read pilot transcript (~/.claude/projects/<encoded_path>/<agent-id>.jsonl)
  2. Check process or scheduled task state via WinRM
  3. Tail the 3 most recent log files (last 10 lines each)
  4. Check sentinel path existence
  5. Take a live screenshot or extract a recording frame
  6. Compose and return the structured report
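
Step 3 can be illustrated with a local-filesystem analogue. The inspector itself reads guest logs over WinRM, so the sketch below only shows the selection logic (three most recently modified files, last ten lines each). `tail_recent_logs` and the `*.log` glob are assumptions made for the example.

```python
from pathlib import Path


def tail_recent_logs(
    log_dir: str, n_files: int = 3, n_lines: int = 10
) -> dict[str, list[str]]:
    """Return the last n_lines of the n_files most recently modified logs."""
    logs = sorted(
        Path(log_dir).glob("*.log"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    tails: dict[str, list[str]] = {}
    for path in logs[:n_files]:
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        tails[path.name] = lines[-n_lines:]
    return tails
```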

Never takes control actions. No vm_type, no vm_key, no process invocation.

Full reference: docs/agents/vm-pilot-inspector.md


MCP Tools

The vm-blackbox MCP server exposes the 25 tools listed below, grouped into six domains.

VM Inspection (4 tools)

Tool               Description
vm_list            List all VMs with running state
vm_info            Return memory, CPU, VRAM, Guest Additions, state
vm_screenshot      Capture screen; image embedded inline in response
vm_screenshot_api  Capture via vboxapi (no subprocess, local only)

VM Interaction (4 tools)

Tool            Description
vm_powershell   Run PowerShell via WinRM; SSH tunnel for remotes
vm_type         Type text; 256-char limit; handles chunking
vm_key          Send enter/tab/escape/space/backspace
vm_mouse_click  Click at absolute coordinates via vboxapi (local only)

Vagrant (5 tools)

Tool               Description
vagrant_status     Show Vagrantfile VM states
vagrant_up         Start a VM
vagrant_provision  Run provisioners
vagrant_destroy    Destroy a VM (optionally in tmux)
vagrant_winrm      Run a command via WinRM on a Windows Vagrant VM

Build Orchestration (3 tools)

Tool          Description
build_start   Start a background build (Packer, tmux session)
build_watch   Tail build log until pattern matches or timeout
build_status  Check if build is running; return last N log lines

CI Tools (4 tools)

Tool                Description
ci_check            SSH connectivity + host stats
ci_run              Run a shell command on the CI host
ci_pipeline_status  Get GitLab pipeline status for a project
ci_preflight        Verify required tools are installed on CI host

Podman / Containers (5 tools)

Tool                   Description
podman_ps              List containers
podman_exec            Run a command inside a container
podman_logs            Fetch last N log lines
podman_restart         Restart a container
podman_service_status  Check systemd service status for a Podman service

Target Parameter

All tools accept a target parameter:

Value             Connects to
"local"           Local host (default)
"ci"              Named target from server config
"user@host"       Raw SSH string
"user@host:port"  Raw SSH string with explicit port

Full signatures: docs/mcp-tools.md
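
The four target forms listed under Target Parameter could be distinguished along these lines. This is a sketch of the classification only and not the server's actual parser. `Target` and `parse_target` are hypothetical names, and IPv6 hosts are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    kind: str                   # "local", "named", or "ssh"
    name: Optional[str] = None  # set for named targets such as "ci"
    user: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None


def parse_target(value: str = "local") -> Target:
    """Classify a target string into local, named, or raw SSH."""
    if value == "local":
        return Target(kind="local")
    if "@" not in value:
        # No user@ part: treat it as a named target from server config.
        return Target(kind="named", name=value)
    user, _, rest = value.partition("@")
    host, sep, port = rest.rpartition(":")
    if sep:
        return Target(kind="ssh", user=user, host=host, port=int(port))
    return Target(kind="ssh", user=user, host=rest)
```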


Skill-to-Task Decision Guide

You want to...                                     Use
Click a button / type text / read the screen       vm-vision-control
Run a multi-step operation (>30 seconds)           vm-ground-control → dispatches vm-pilot
Check on a running background task                 vm-radio-control → dispatches vm-pilot-inspector
Continue a completed pilot with more work          vm-ground-control with resume=agent_id
Record an operation as video                       vm-blackbox-record
Extract frames from a recording                    vm-blackbox-record extract_frames.py
Get the current screen without interrupting pilot  vm-radio-control with output_type: screenshot

Hard Constraints

These are not suggestions — they are required by the plugin architecture:

  1. vm-vision-control is mandatory before any GUI interaction. Do not call vm_screenshot, vm_mouse_click, vm_type, or vm_key directly from the orchestrator.

  2. MCP is the only approved path. Do not use raw Bash for vagrant, VBoxManage, podman, or WinRM — always go through mcp__vm-blackbox__* tools.

  3. The orchestrator does not call VM tools while a pilot is running. The pilot owns the VM. Interrupt only by resuming the agent.

  4. Skills are scoped. Each skill has a single responsibility. Do not combine vision-control and recording in one invocation.


Conventions

  • vm_type has a hard 256-character limit per call. Chunk long text across multiple calls.
  • Password fields may double-type. Clear with Ctrl+A → backspace before typing.
  • vm_mouse_click and vm_screenshot_api require target="local" (vboxapi uses local XPCOM only).
  • Recording parameters lock when recording is enabled — configure before starting, not after.
  • The pilot's transcript lives at ~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl.
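
The 256-character convention for vm_type amounts to slicing the text before sending it. A minimal sketch follows, assuming the limit counts Python string characters (code points). `chunk_text` is a hypothetical helper, and each returned chunk would be passed to a separate vm_type call.

```python
VM_TYPE_LIMIT = 256  # per-call limit stated in the conventions above


def chunk_text(text: str, limit: int = VM_TYPE_LIMIT) -> list[str]:
    """Split text into consecutive pieces of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```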

Installer Reference

vm-blackbox-installer installs skills, agents, and MCP server registration across AI coding tools in one command.

Supported Platforms

Flag        Tool         Config directory
--claude    Claude Code  ~/.claude/ or ./.claude/
--opencode  OpenCode     ~/.config/opencode/ or ./.opencode/
--gemini    Gemini CLI   ~/.gemini/ or ./.gemini/
--codex     Codex        ~/.codex/ or ./.codex/

Usage

# Install for all platforms globally
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

# Install for specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

# Install locally (current directory instead of home)
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --local

--global and --local are mutually exclusive. --global is the default when neither is specified.
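
For readers who want to see how such a flag contract is commonly expressed, here is a sketch using argparse's mutually exclusive groups. It mirrors the documented flags but is not the installer's actual source. `build_parser` and the `scope` attribute are names invented for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented installer flags."""
    parser = argparse.ArgumentParser(prog="vm-blackbox-installer")
    for flag in ("--claude", "--opencode", "--gemini", "--codex", "--all"):
        parser.add_argument(flag, action="store_true")
    # --global and --local write to the same destination and cannot be combined.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--global", dest="scope", action="store_const", const="global")
    scope.add_argument("--local", dest="scope", action="store_const", const="local")
    # --global is the default when neither flag is given.
    parser.set_defaults(scope="global")
    return parser
```

With this layout, passing both `--global` and `--local` makes argparse exit with a usage error instead of silently picking one.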

What the Installer Does

For each selected platform:

  1. Copies skills/ → <config>/plugins/vm-flightsimulator/skills/
  2. Copies agents/ → <config>/plugins/vm-flightsimulator/agents/
  3. Registers { "command": "uvx", "args": ["mcp-vm-blackbox"] } in the platform's MCP config file

MCP config files written:

Platform  Config file
Claude    ~/.claude.json (global) or .claude.json (local)
OpenCode  ~/.config/opencode/opencode.json
Gemini    ~/.gemini/settings.json
Codex     ~/.codex/config.toml
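
For the JSON-based config files, registration amounts to merging one entry into the existing document without disturbing other settings. The sketch below is illustrative and rests on two assumptions that the text above does not confirm: that servers live under an `mcpServers` key, and that the entry is named `vm-blackbox`. Codex uses TOML and is not covered here. `register_server` is a hypothetical helper.

```python
import json


def register_server(
    config_text: str,
    name: str = "vm-blackbox",  # assumed entry name
    key: str = "mcpServers",    # assumed top-level key
) -> str:
    """Return config_text with the MCP server entry added or replaced."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault(key, {})
    # Entry contents as stated in the installer steps above.
    servers[name] = {"command": "uvx", "args": ["mcp-vm-blackbox"]}
    return json.dumps(config, indent=2) + "\n"
```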

Gemini frontmatter transformation: The installer converts agent .md frontmatter for Gemini's schema — removes the color: field and converts comma-separated tools: and skills: strings to YAML list format.
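
The transformation can be sketched on already-parsed frontmatter, which sidesteps the question of which YAML library the installer uses. `to_gemini_frontmatter` is a hypothetical function operating on a plain dict; it illustrates the two documented rules and is not the installer's code.

```python
def to_gemini_frontmatter(meta: dict) -> dict:
    """Drop `color` and turn comma-separated tools/skills into lists."""
    converted = {}
    for key, value in meta.items():
        if key == "color":
            continue  # removed for Gemini's schema
        if key in ("tools", "skills") and isinstance(value, str):
            value = [item.strip() for item in value.split(",") if item.strip()]
        converted[key] = value
    return converted
```

The input dict is left unmodified, so the same parsed frontmatter can still be written unchanged for the other platforms.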


Local Development

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run a single test file
uv run pytest packages/mcp_vm_blackbox/tests/test_vm_interaction.py

# Format
uv run ruff format

# Lint
uv run ruff check

# Type check
uv run ty check packages/

# Test the plugin locally in Claude Code
claude --plugin-dir ./

Coverage threshold: 60%. Modules requiring live VMs (WinRM, SSH tunnel, VBoxManage backends) are excluded from CI coverage.


Installation Reference

# Marketplace
claude plugin marketplace add bitflight-devops/vm-flightsimulator

# MCP server only (PyPI)
uvx mcp-vm-blackbox

# Persistent MCP registration
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox

License

MIT — see LICENSE
