

vmcluster-mcp

An MCP server for autonomous multi-VM cluster orchestration on libvirt/QEMU. Manages the full lifecycle of KVM virtual machine clusters — provisioning, starting, stopping, snapshotting, SSH execution, artifact distribution, and fault injection — through a structured tool interface designed for AI agents.

Overview

vmcluster-mcp is a general-purpose MCP server. It manages clusters of KVM/QEMU virtual machines and produces a ClusterHandle — a typed descriptor passed to downstream consumers for direct SSH access. The server has no knowledge of what runs inside VMs; it knows nodes, networks, snapshots, and artifacts.

Design principles:

  • Topology-as-data — cluster shape is declared in a YAML file, not constructed imperatively
  • Structured outputs — all tools return typed Pydantic models serialized as JSON; no free-text parsing
  • Stateless server — all persistent state lives in libvirt and on disk; safe to restart at any time
  • Idempotent operations — cluster_define and related tools are safe to call multiple times

Prerequisites

  • Linux host with KVM/QEMU and libvirt installed (libvirtd running)
  • Python 3.11+
  • qemu-img available in PATH
  • genisoimage or mkisofs for cloud-init ISO generation
  • iptables, tc (from iproute2), and rsync for fault/artifact tools
  • Permission to run libvirt and host network commands (sudo access is usually required)
  • uv (recommended) or pip for installation

Typical package set on Ubuntu/Debian:

sudo apt-get update
sudo apt-get install -y \
  qemu-kvm libvirt-daemon-system libvirt-clients \
  qemu-utils cloud-image-utils genisoimage \
  iproute2 iptables rsync
# Verify libvirt access
virsh list --all

# Verify qemu-img
qemu-img --version

# Verify tc and iptables
tc -V
iptables --version

Installation

From PyPI (recommended)

Install the latest release:

pip install vmcluster-mcp

Or run directly without installing (via uv):

uvx vmcluster-mcp

Prerequisite: libvirt-python requires system-level development headers. Install them before running pip install:

# Ubuntu/Debian
sudo apt-get install -y libvirt-dev pkg-config gcc

# Fedora/RHEL
sudo dnf install -y libvirt-devel pkgconf-pkg-config gcc

# Arch Linux
sudo pacman -S libvirt pkgconf gcc

From source (development)

git clone https://github.com/chompinbits/vmcluster-mcp.git
cd vmcluster-mcp

# Create virtual environment and install
uv venv
uv pip install -e .

Quick Start (First 15 Minutes)

This path is for first-time setup on a single Linux host.

  1. Create required directories and SSH key:
sudo mkdir -p /etc/vmcluster/topologies /etc/vmcluster/ssh
sudo mkdir -p /var/lib/vmcluster/{overlays,artifacts/trees,faults}
sudo ssh-keygen -t ed25519 -f /etc/vmcluster/ssh/vmcluster_id_ed25519 -N ""
  2. Create /etc/vmcluster/config.yaml:
topology_dir: /etc/vmcluster/topologies
overlay_dir: /var/lib/vmcluster/overlays
artifact_registry: /var/lib/vmcluster/artifacts/registry.json
artifact_store_dir: /var/lib/vmcluster/artifacts/trees
fault_registry: /var/lib/vmcluster/faults/registry.json
ssh_key_path: /etc/vmcluster/ssh/vmcluster_id_ed25519
ssh_user: root
libvirt_uri: qemu:///system
log_level: INFO
  3. Prepare a base image used by the example topology:
sudo mkdir -p /var/lib/vmcluster/images
sudo wget -O /tmp/ubuntu-24.04-server-cloudimg-amd64.img \
  https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
sudo qemu-img convert -f qcow2 -O qcow2 \
  /tmp/ubuntu-24.04-server-cloudimg-amd64.img \
  /var/lib/vmcluster/images/ubuntu-6.8-base.qcow2
sudo qemu-img info /var/lib/vmcluster/images/ubuntu-6.8-base.qcow2
  4. Add your first topology file in /etc/vmcluster/topologies/ (see example below).

  5. Run the server locally to verify it starts:

VMCLUSTER_CONFIG=/etc/vmcluster/config.yaml .venv/bin/python -m vmcluster_mcp
  6. Connect from your MCP client (VS Code or Claude) and run this smoke flow:
cluster_define("example-3node")
cluster_start("example-3node", wait_for_ssh=True)
cluster_status("example-3node")
node_exec("example-3node", "controller", "uname -r")
snapshot_create("example-3node", "baseline")
cluster_stop("example-3node")
  7. Clean up when finished:
cluster_destroy("example-3node", remove_overlays=True)

Configuration

The server can be configured via a YAML file and/or environment variables. Environment variables take precedence over the config file, which takes precedence over defaults.

Config file

Default location: /etc/vmcluster/config.yaml. Override with VMCLUSTER_CONFIG env var.

# /etc/vmcluster/config.yaml

topology_dir: /etc/vmcluster/topologies      # Where topology YAML files live
overlay_dir: /var/lib/vmcluster/overlays     # Where per-node qcow2 overlays are created
artifact_registry: /var/lib/vmcluster/artifacts/registry.json
artifact_store_dir: /var/lib/vmcluster/artifacts/trees
fault_registry: /var/lib/vmcluster/faults/registry.json
ssh_key_path: /etc/vmcluster/ssh/vmcluster_id_ed25519
ssh_user: root
libvirt_uri: qemu:///system
log_level: INFO

Environment variables

Each variable overrides the corresponding config key; defaults are shown in parentheses:

  • VMCLUSTER_CONFIG — path to the config file itself (default: /etc/vmcluster/config.yaml)
  • VMCLUSTER_TOPOLOGY_DIR — topology_dir (default: /etc/vmcluster/topologies)
  • VMCLUSTER_OVERLAY_DIR — overlay_dir (default: /var/lib/vmcluster/overlays)
  • VMCLUSTER_ARTIFACT_REGISTRY — artifact_registry (default: /var/lib/vmcluster/artifacts/registry.json)
  • VMCLUSTER_ARTIFACT_STORE_DIR — artifact_store_dir (default: /var/lib/vmcluster/artifacts/trees)
  • VMCLUSTER_FAULT_REGISTRY — fault_registry (default: /var/lib/vmcluster/faults/registry.json)
  • VMCLUSTER_SSH_KEY_PATH — ssh_key_path (default: /etc/vmcluster/ssh/vmcluster_id_ed25519)
  • VMCLUSTER_SSH_USER — ssh_user (default: root)
  • VMCLUSTER_LIBVIRT_URI — libvirt_uri (default: qemu:///system)
  • VMCLUSTER_LOG_LEVEL — log_level (default: INFO)
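
As a reading of the precedence rule above (environment variable over config file over built-in default), here is a minimal resolution sketch; the helper name and structure are illustrative, not the server's actual loader in config.py:

import os
import yaml

DEFAULTS = {"topology_dir": "/etc/vmcluster/topologies", "log_level": "INFO"}

def resolve(key: str, env_var: str, config_path: str | None = None) -> str:
    """Resolve one setting: environment variable > config file > built-in default."""
    if env_var in os.environ:
        return os.environ[env_var]
    config_path = config_path or os.environ.get("VMCLUSTER_CONFIG", "/etc/vmcluster/config.yaml")
    if os.path.exists(config_path):
        with open(config_path) as f:
            file_cfg = yaml.safe_load(f) or {}
        if key in file_cfg:
            return str(file_cfg[key])
    return DEFAULTS[key]

print(resolve("topology_dir", "VMCLUSTER_TOPOLOGY_DIR"))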

Quick setup

# Create directories
sudo mkdir -p /etc/vmcluster/topologies /etc/vmcluster/ssh
sudo mkdir -p /var/lib/vmcluster/{overlays,artifacts/trees,faults}

# Generate SSH key for VM access
sudo ssh-keygen -t ed25519 -f /etc/vmcluster/ssh/vmcluster_id_ed25519 -N ""

Topology Files

Topology files are YAML files placed in topology_dir. The agent references topologies by filename (without .yaml).

# /etc/vmcluster/topologies/example-3node.yaml

cluster_name: example-3node
base_image: /var/lib/vmcluster/images/ubuntu-6.8-base.qcow2
overlay_dir: /var/lib/vmcluster/overlays/

network:
  name: clusternet-example
  bridge: virbr-example0
  subnet: 192.168.100.0/24

nodes:
  - name: controller
    role: control
    vcpus: 2
    memory_mb: 2048
    ip: 192.168.100.10
    extra_disks:
      - path: /var/lib/vmcluster/disks/data0.qcow2
        size_gb: 20
        bus: virtio

  - name: worker-0
    role: worker
    vcpus: 2
    memory_mb: 2048
    ip: 192.168.100.11

  - name: client-0
    role: client
    vcpus: 2
    memory_mb: 1024
    ip: 192.168.100.20

ssh:
  key_path: /etc/vmcluster/ssh/vmcluster_id_ed25519
  user: root
  connect_timeout_s: 30

snapshots:
  baseline: clean-boot   # Logical name for snapshot_revert("baseline")

Node IPs are configured statically via cloud-init — no DHCP is used. Each node gets a NoCloud ISO injected at first boot. The libvirt bridge named under network.bridge is created when cluster_define defines the topology network, so it does not need to pre-exist on the host.
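
For reference, a NoCloud seed ISO of the kind described above can be produced with a short script. This is a hedged sketch of the general technique only — the file contents, interface name, and output path are illustrative, and it is not the server's cloud_init.py implementation:

import subprocess
from pathlib import Path

def build_seed_iso(workdir: Path, hostname: str, ip: str, gateway: str, ssh_pubkey: str) -> Path:
    """Write NoCloud user-data/meta-data/network-config and pack them into a 'cidata' ISO."""
    workdir.mkdir(parents=True, exist_ok=True)
    (workdir / "meta-data").write_text(f"instance-id: {hostname}\nlocal-hostname: {hostname}\n")
    (workdir / "user-data").write_text(
        "#cloud-config\n"
        "ssh_authorized_keys:\n"
        f"  - {ssh_pubkey}\n"
    )
    (workdir / "network-config").write_text(
        "version: 2\n"
        "ethernets:\n"
        "  eth0:\n"
        "    dhcp4: false\n"
        f"    addresses: [{ip}/24]\n"
        f"    gateway4: {gateway}\n"
    )
    iso = workdir / f"{hostname}-seed.iso"
    # genisoimage with volume id "cidata" is what cloud-init's NoCloud datasource expects
    subprocess.run(
        ["genisoimage", "-output", str(iso), "-volid", "cidata", "-joliet", "-rock",
         str(workdir / "user-data"), str(workdir / "meta-data"), str(workdir / "network-config")],
        check=True,
    )
    return iso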


Integration: VS Code (GitHub Copilot)

Add the server to your VS Code MCP configuration. Open Settings → MCP or edit .vscode/mcp.json in your workspace (or ~/.vscode/mcp.json globally).

If you installed from source into a virtualenv (recommended):

{
  "servers": {
    "vmcluster-mcp": {
      "type": "stdio",
      "command": "/path/to/vmcluster-mcp/.venv/bin/python",
      "args": ["-m", "vmcluster_mcp"],
      "env": {
        "VMCLUSTER_CONFIG": "/etc/vmcluster/config.yaml"
      }
    }
  }
}

If you prefer ephemeral launch with uv run --with:

{
  "servers": {
    "vmcluster-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "run",
        "--with", "git+https://github.com/chompinbits/vmcluster-mcp.git",
        "python", "-m", "vmcluster_mcp"
      ],
      "env": {
        "VMCLUSTER_TOPOLOGY_DIR": "/etc/vmcluster/topologies",
        "VMCLUSTER_OVERLAY_DIR": "/var/lib/vmcluster/overlays",
        "VMCLUSTER_SSH_KEY_PATH": "/etc/vmcluster/ssh/vmcluster_id_ed25519",
        "VMCLUSTER_LIBVIRT_URI": "qemu:///system"
      }
    }
  }
}

After saving, restart the MCP server from the VS Code MCP panel. The tools will appear in Copilot Chat under the vmcluster-mcp server.


Integration: Claude CLI

claude (Anthropic Claude CLI / Claude Desktop)

Add to ~/.claude/claude_desktop_config.json (Claude Desktop) or ~/.config/claude/config.json (Claude CLI):

{
  "mcpServers": {
    "vmcluster-mcp": {
      "command": "python",
      "args": ["-m", "vmcluster_mcp"],
      "env": {
        "VMCLUSTER_CONFIG": "/etc/vmcluster/config.yaml"
      }
    }
  }
}

If using a virtualenv:

{
  "mcpServers": {
    "vmcluster-mcp": {
      "command": "/path/to/vmcluster-mcp/.venv/bin/python",
      "args": ["-m", "vmcluster_mcp"],
      "env": {
        "VMCLUSTER_CONFIG": "/etc/vmcluster/config.yaml"
      }
    }
  }
}

For the claude CLI (interactive terminal), register the server persistently with the mcp subcommand:

claude mcp add vmcluster-mcp -- python -m vmcluster_mcp

Verify the server is loaded:

claude mcp list

Available Tools

All tools return ToolResult[T] — a structured JSON object with success: bool, result: T | null, and error: { code, message, recoverable } | null.
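
Since tool outputs are typed Pydantic models, the envelope maps naturally onto a generic model. A sketch of the shape, assuming exactly the field names described above (not necessarily the classes defined in models.py):

from typing import Generic, Optional, TypeVar
from pydantic import BaseModel

T = TypeVar("T")

class ToolError(BaseModel):
    code: str
    message: str
    recoverable: bool

class ToolResult(BaseModel, Generic[T]):
    # success=False implies error is populated and result is null
    success: bool
    result: Optional[T] = None
    error: Optional[ToolError] = None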

Cluster Lifecycle and Recovery

  • cluster_define(topology_name) — Provision a cluster from a topology file: create the network, per-node overlay disks, cloud-init ISOs, and libvirt domain definitions. Idempotent.
  • cluster_start(cluster_name, wait_for_ssh, ssh_timeout_s) — Boot all stopped nodes. Optionally waits for SSH on all nodes (strict: one failure means success=False).
  • cluster_stop(cluster_name, mode) — Stop all running nodes. mode="shutdown" (ACPI) or mode="destroy" (force-off).
  • cluster_destroy(cluster_name, remove_overlays) — Undefine all domains and destroy the network. Optionally delete overlay disk files.
  • cluster_status(cluster_name) — Return per-node domain state and SSH reachability. SSH is checked in parallel, and only for running nodes.
  • cluster_handle(cluster_name) — Return a ClusterHandle with node SSH descriptors, artifact_path, and kernel_version (fetched via SSH). Requires a running cluster.
  • node_crash(cluster_name, node, restart_after, wait_for_ssh, ssh_timeout_s) — Simulate an unclean node failure (virsh destroy) and optionally restart and wait for SSH.
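
For example, an unclean failure and recovery of a single node, written in the same pseudo-notation used by the workflow sections below (cluster and node names come from the example topology):

# Kill worker-0 uncleanly, bring it back, and confirm the cluster is healthy again
node_crash("example-3node", "worker-0", restart_after=True, wait_for_ssh=True)
cluster_status("example-3node")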

Remote Command Execution

  • node_exec(cluster_name, node_name, command, timeout_s) — Run a command on one node and return structured stdout/stderr/exit metadata.
  • node_exec_all(cluster_name, command, nodes, require_all, timeout_s) — Run a command on many nodes in parallel with per-node results and a failure map.
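
For example, a fan-out over a subset of nodes in the same pseudo-notation (the command string is arbitrary):

# Assumed semantics: with require_all=False, per-node failures land in the failure map
# without failing the whole call
node_exec_all("example-3node", "uname -r", nodes=["worker-0", "client-0"], require_all=False, timeout_s=30)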

Snapshot Management

  • snapshot_create(cluster_name, snapshot_name, include_memory) — Create disk snapshots for all nodes in the cluster.
  • snapshot_list(cluster_name) — List snapshots with per-node disk metadata.
  • snapshot_revert(cluster_name, snapshot_name, restart_after, wait_for_ssh, ssh_timeout_s) — Revert all nodes to a named snapshot and optionally restart and verify SSH.
  • snapshot_delete(cluster_name, snapshot_name) — Delete a named snapshot across all nodes (best effort, with per-node status).
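
A typical checkpoint/rollback loop around a destructive test, again in pseudo-notation:

snapshot_create("example-3node", "pre-test")
# ... run the destructive experiment ...
snapshot_revert("example-3node", "pre-test", restart_after=True, wait_for_ssh=True)
snapshot_delete("example-3node", "pre-test")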

Artifact Management

  • artifact_register(source_path, build_type, kernel_version, metadata) — Register a local build tree and get a content-addressed artifact id.
  • artifact_list() — List registered artifacts.
  • artifact_diff(artifact_id_a, artifact_id_b) — Diff modules/binaries between two artifacts.
  • artifact_sync(cluster_name, artifact_id, nodes, force, dest_base) — Sync artifact content to target nodes over SSH/rsync.
  • artifact_install(cluster_name, artifact_id, nodes, install_mode, dest_base) — Install synced artifacts on nodes with structured per-node install status.
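
"Content-addressed" here means the artifact id is derived from the contents of the registered tree, so identical trees map to the same id. A minimal sketch of how such an id could be computed (illustrative only, not registry.py's actual scheme):

import hashlib
from pathlib import Path

def content_id(tree: Path) -> str:
    """Hash every file's relative path and bytes in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in tree.rglob("*") if p.is_file()):
        digest.update(str(path.relative_to(tree)).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()[:16]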

Network Fault Injection

  • net_partition(cluster_name, partition_a, partition_b) — Insert symmetric iptables partition rules between node groups.
  • net_impair(cluster_name, source_node, target_node, latency_ms, jitter_ms, loss_pct, corrupt_pct, reorder_pct) — Apply tc netem impairment on a source node's tap interface.
  • net_heal(cluster_name, fault_handle) — Remove a specific fault and deregister its handle.
  • net_heal_all(cluster_name) — Remove all active faults for a cluster.
  • net_fault_list(cluster_name) — List all active fault handles and parameters from the fault registry.
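
The extended workflow further down shows net_impair; a partition and its cleanup look like this in the same pseudo-notation (node names from the example topology):

# Isolate the controller from the worker and client nodes, then remove all active faults
net_partition("example-3node", partition_a=["controller"], partition_b=["worker-0", "client-0"])
net_heal_all("example-3node")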

Kernel Observability

  • dmesg_mark(cluster_name, nodes) — Write a shared marker into /dev/kmsg on target nodes.
  • dmesg_collect(cluster_name, nodes, since_marker, filter_level) — Collect and classify dmesg lines (all, warn+, err+).

Return types

ClusterStatus — returned by cluster_define, cluster_start, cluster_stop, cluster_destroy, cluster_status:

{
  "cluster_name": "example-3node",
  "network_active": true,
  "nodes": [
    {
      "name": "controller",
      "role": "control",
      "ip": "192.168.100.10",
      "domain_state": "running",
      "ssh_reachable": true
    }
  ]
}

ClusterHandle — returned by cluster_handle:

{
  "cluster_name": "example-3node",
  "artifact_path": "/opt/vmcluster/artifacts",
  "kernel_version": "6.8.0-51-generic",
  "nodes": [
    {
      "name": "controller",
      "role": "control",
      "ip": "192.168.100.10",
      "ssh_port": 22,
      "ssh_user": "root",
      "ssh_key_path": "/etc/vmcluster/ssh/vmcluster_id_ed25519"
    }
  ]
}

Most non-lifecycle tools follow the same envelope with their own typed result payload (for example ExecResult, SnapshotInfo, NetFaultInfo, SyncStatus).
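
A downstream consumer can use the ClusterHandle fields directly for SSH. A minimal sketch, assuming the handle JSON above has been saved to a local file (the file path and command are illustrative):

import json
import subprocess

with open("handle.json") as f:
    handle = json.load(f)

# Pick a node by name and run a command over SSH using the handle's connection fields
node = next(n for n in handle["nodes"] if n["name"] == "controller")
subprocess.run(
    ["ssh",
     "-i", node["ssh_key_path"],
     "-p", str(node["ssh_port"]),
     "-o", "StrictHostKeyChecking=no",
     f"{node['ssh_user']}@{node['ip']}",
     "uname -r"],
    check=True,
)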


Canonical Agent Workflow

# 1. Define the cluster (idempotent — safe to call multiple times)
cluster_define("example-3node")

# 2. Start all nodes and wait for SSH
cluster_start("example-3node", wait_for_ssh=True)

# 3. Get cluster handle for downstream SSH use
handle = cluster_handle("example-3node")

# 4. Check status at any time
cluster_status("example-3node")

# 5. Graceful shutdown
cluster_stop("example-3node", mode="shutdown")

# 6. Full teardown (remove overlays too)
cluster_destroy("example-3node", remove_overlays=True)

Extended workflow (artifacts + faults + observability, pseudo-notation)

The flow below shows the intended sequence of tool calls.

# Register and deploy build artifacts
artifact_id = artifact_register("/path/to/build/tree").result.artifact_id
artifact_sync("example-3node", artifact_id)
artifact_install("example-3node", artifact_id)

# Add a network impairment and inspect active faults
fault = net_impair("example-3node", source_node="worker-0", latency_ms=150)
net_fault_list("example-3node")

# Mark and collect dmesg around your test window
markers = dmesg_mark("example-3node")
dmesg_collect("example-3node", since_marker=markers["worker-0"], filter_level="warn+")

# Heal injected faults
net_heal("example-3node", fault.result.handle_id)

Troubleshooting

cluster_define fails creating overlays

  • Ensure base image path in topology exists and is readable.
  • Validate host tool availability: qemu-img --version.
  • Confirm overlay directory is writable by the user running the MCP server.

SSH timeouts in cluster_start or snapshot_revert

  • Confirm cloud-init configured the static IPs expected by the topology.
  • Verify key/user pair: VMCLUSTER_SSH_KEY_PATH, VMCLUSTER_SSH_USER.
  • Increase ssh_timeout_s for cold boots.

Fault tools fail (iptables/tc errors)

  • Ensure the MCP process has required privileges for host networking commands.
  • Confirm iptables and tc are installed and executable.
  • Validate libvirt bridge name in topology matches the active host interface.

artifact_sync or artifact_install partial failures

  • Use node_exec_all(..., command="df -h") to verify remote disk space.
  • Verify SSH connectivity and remote path permissions under dest_base.
  • Re-run with narrowed nodes=[...] to isolate problematic hosts.

Snapshot delete blocked

  • snapshot_delete refuses to remove active backing snapshots by design.
  • Revert or switch active disk chain first, then delete snapshot.

Useful host checks

virsh list --all
virsh net-list --all
ip -br link
sudo iptables -S | head
sudo tc qdisc show

Development

# Clone and install with dev dependencies
git clone https://github.com/chompinbits/vmcluster-mcp.git
cd vmcluster-mcp
uv venv && uv pip install -e '.[dev]'

# Run tests
.venv/bin/pytest

# Lint
.venv/bin/ruff check vmcluster_mcp/

# Run the server directly (stdio mode)
.venv/bin/python -m vmcluster_mcp

Project structure

vmcluster_mcp/
  cluster/          # Cluster lifecycle tools (define, start, stop, destroy, status, handle, crash)
    libvirt_client.py   # Thread-safe async libvirt wrapper
    domain_builder.py   # KVM domain XML generation
    network_builder.py  # libvirt NAT network XML generation
    cloud_init.py       # cloud-init NoCloud ISO generation
  exec/             # Remote command execution tools (node_exec, node_exec_all)
    ssh.py          # SSH client and connection pool management
  snapshot/         # Snapshot tools (create, list, revert, delete)
    manager.py      # Snapshot operations
  artifact/         # Artifact tools (register, list, diff, sync, install)
    installer.py    # Remote artifact installation
    registry.py     # Content-addressed artifact registry
    syncer.py       # rsync-based artifact synchronization
  net/              # Network fault tools (partition, impair, heal, list)
    fault_registry.py   # Persistent fault registry
    fault.py        # iptables/tc fault implementation
  observe/          # Kernel observability tools (dmesg_mark, dmesg_collect)
    classifier.py   # dmesg line classification
    dmesg.py        # dmesg collection and parsing
  topology/         # Topology YAML parsing and schema
    parser.py       # Topology loader
    schema.py       # Topology models
  models.py         # Shared Pydantic models (ToolResult, ClusterStatus, ClusterHandle, …)
  config.py         # Configuration loading (YAML + env vars)
  server.py         # FastMCP server instance and structured_tool_handler
