kwin-mcp

Model Context Protocol server for Linux desktop GUI automation on KDE Plasma 6 Wayland

A Model Context Protocol (MCP) server that enables AI agents (Claude Code, Cursor, and other MCP clients) to launch, interact with, and observe any Wayland application in a fully isolated virtual KWin session -- without affecting the user's desktop. With 29 MCP tools covering mouse, keyboard, touch, clipboard, accessibility tree inspection, screenshot capture, and window management, kwin-mcp provides everything needed for end-to-end GUI testing and desktop automation on Linux.

Why kwin-mcp?

  • Isolated sessions -- Each session runs in its own dbus-run-session + kwin_wayland --virtual sandbox. Your host desktop is never affected.
  • No screenshots required for interaction -- The AT-SPI2 accessibility tree gives the AI agent structured widget data (roles, names, coordinates, states, available actions), so it can interact with UI elements without relying solely on vision.
  • Zero authorization prompts -- Uses KWin's private EIS (Emulated Input Server) D-Bus interface directly, bypassing the XDG RemoteDesktop portal. No user confirmation dialogs.
  • Works with any Wayland app -- Anything that runs on KDE Plasma 6 Wayland works: Qt, GTK, Electron, and more. Input is injected via the standard libei protocol.
  • Full input coverage -- Mouse, keyboard, multi-touch, and clipboard -- all injected through the isolated session for complete desktop automation.

Use Cases

Automated GUI Testing

Run end-to-end GUI tests for KDE/Qt/GTK applications in headless isolated sessions. kwin-mcp launches each app in its own virtual KWin compositor, interacts via mouse, keyboard, and touch input, then verifies results through screenshots and the accessibility tree -- all without a physical display.

AI-Driven Desktop Automation

Let AI agents like Claude Code autonomously operate desktop applications. The agent reads the accessibility tree to understand the UI, performs actions through 29 MCP tools, and observes the results via screenshots -- creating a complete feedback loop for any Wayland application.

Headless GUI Testing in CI/CD

Integrate Linux desktop GUI testing into CI/CD pipelines. kwin-mcp's virtual sessions need neither an X11 server nor a physical display, which makes them suitable for headless environments such as GitHub Actions or GitLab CI runners on Linux.

Quick Start

Requires KDE Plasma 6 on Wayland. See System Requirements for details.

1. Install

# Using uv (recommended)
uv tool install kwin-mcp

# Or using pip
pip install kwin-mcp

2. Configure Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "uvx",
      "args": ["kwin-mcp"]
    }
  }
}

3. Use it

Ask Claude Code to launch and interact with any GUI application:

Start a KWin session, launch kcalc, and press the buttons to calculate 2 + 3.

Claude Code will autonomously start an isolated session, launch the app, read the accessibility tree to find buttons, click them, and take a screenshot to verify the result.
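
The flow described above can be sketched as the sequence of MCP `tools/call` requests a client might send. This is only an illustration: the tool names and parameters come from the tool tables in this README, while the click coordinates and the `tool_call` helper are hypothetical (a real agent would take coordinates from the `find_ui_elements` result).

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` JSON-RPC request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical coordinates; in practice they come from the accessibility tree.
requests = [
    tool_call("session_start", app_command="kcalc"),
    tool_call("find_ui_elements", query="2", app_name="kcalc"),
    tool_call("mouse_click", x=100, y=300),
    tool_call("screenshot"),
    tool_call("session_stop"),
]

for request in requests:
    print(json.dumps(request))
```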

Configuration

Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "uvx",
      "args": ["kwin-mcp"]
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "kwin-mcp"
    }
  }
}
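
If a project already has an .mcp.json with other servers, the entry can be merged in rather than pasted by hand. The helper below is a minimal sketch, not part of kwin-mcp; `add_kwin_mcp` is a hypothetical name, and it assumes the file uses the `mcpServers` layout shown above.

```python
import json
from pathlib import Path


def add_kwin_mcp(config_path: Path, command: str = "uvx",
                 args: tuple[str, ...] = ("kwin-mcp",)) -> dict:
    """Add or update the kwin-mcp entry, keeping any other configured servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    servers["kwin-mcp"] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_kwin_mcp(Path(".mcp.json"))` writes the uvx variant; `add_kwin_mcp(Path(".mcp.json"), command="kwin-mcp", args=())` writes the globally installed variant.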

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "uvx",
      "args": ["kwin-mcp"]
    }
  }
}

Running Directly

# As an installed script
kwin-mcp

# As a Python module
python -m kwin_mcp

# Interactive CLI (REPL for rapid testing)
kwin-mcp-cli

Available Tools

Session Management (2 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| session_start | app_command? str, screen_width? int (1920), screen_height? int (1080), enable_clipboard? bool (false), env? dict | Start an isolated KWin Wayland session, optionally launching an app. Set enable_clipboard=true to enable clipboard tools (requires wl-clipboard). Pass extra environment variables via env. |
| session_stop | (none) | Stop the session and clean up all processes |

Observation (3 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| screenshot | include_cursor? bool (false) | Capture a screenshot of the virtual display (saved as PNG, returns file path) |
| accessibility_tree | app_name? str, max_depth? int (15) | Get the AT-SPI2 widget tree with roles, names, states, and coordinates |
| find_ui_elements | query str, app_name? str | Search for UI elements by name, role, or description (case-insensitive) |

Mouse Input (6 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| mouse_click | x int, y int, button? str ("left"), double? bool, triple? bool, modifiers? list[str], hold_ms? int (0), screenshot_after_ms? list[int] | Click at coordinates. Supports left/right/middle, single/double/triple click, modifier keys (e.g. ["ctrl", "shift"]), and long-press via hold_ms. |
| mouse_move | x int, y int, screenshot_after_ms? list[int] | Move the cursor (hover) to coordinates without clicking |
| mouse_scroll | x int, y int, delta int, horizontal? bool, discrete? bool, steps? int (1) | Scroll at coordinates. delta positive = down/right, negative = up/left. Use discrete=true for wheel ticks, steps to split into smooth increments. |
| mouse_drag | from_x int, from_y int, to_x int, to_y int, button? str ("left"), modifiers? list[str], waypoints? list[[x,y,dwell_ms]], screenshot_after_ms? list[int] | Drag from one point to another with smooth interpolation. Supports custom waypoints for complex drag paths. |
| mouse_button_down | x int, y int, button? str ("left") | Press a mouse button at coordinates without releasing. Use with mouse_button_up for manual drag control. |
| mouse_button_up | x int, y int, button? str ("left") | Release a previously pressed mouse button at coordinates |
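
To illustrate what "steps to split into smooth increments" could mean for mouse_scroll, here is a small sketch that divides a delta into increments summing exactly to the original value. How kwin-mcp actually distributes the delta is not documented here, so the even split and the `split_scroll` helper are assumptions.

```python
def split_scroll(delta: int, steps: int = 1) -> list[int]:
    """Split a scroll delta into `steps` increments that sum exactly to `delta`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    sign = 1 if delta >= 0 else -1
    base, remainder = divmod(abs(delta), steps)
    # Spread the remainder over the first increments so no scroll distance is lost.
    return [sign * (base + (1 if i < remainder else 0)) for i in range(steps)]
```

For example, a delta of 10 in 3 steps becomes increments of 4, 3, and 3; a negative delta (scroll up/left) keeps its sign in every increment.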

Keyboard Input (5 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| keyboard_type | text str, screenshot_after_ms? list[int] | Type a string of text character by character (US QWERTY layout) |
| keyboard_type_unicode | text str, screenshot_after_ms? list[int] | Type arbitrary Unicode text (Korean, CJK, etc.) via wtype or clipboard fallback (wl-copy + Ctrl+V). Requires wtype or wl-clipboard installed. |
| keyboard_key | key str, screenshot_after_ms? list[int] | Press a key or key combination (e.g., Return, ctrl+c, alt+F4, shift+Tab) |
| keyboard_key_down | key str | Press and hold a key without releasing. Useful for holding modifiers across multiple actions (e.g., hold Ctrl while clicking items). |
| keyboard_key_up | key str | Release a previously held key |
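
The difference between keyboard_type (per-character typing on a US QWERTY layout) and keyboard_key (named keys and combinations) can be sketched as follows. The keycodes are the Linux evdev values from linux/input-event-codes.h; the table covers only a handful of keys, and the helpers `key_events` and `parse_combo` are hypothetical names for illustration, not kwin-mcp's internal API.

```python
# Linux evdev keycodes for a small subset of keys (illustrative, not complete).
KEY_LEFTSHIFT = 42
KEYCODES = {
    "a": 30, "b": 48, "c": 46, "h": 35, "i": 23,
    "1": 2, "2": 3, "3": 4, " ": 57, "=": 13,
}
# Characters that need Shift on US QWERTY, mapped to their unshifted key.
SHIFTED = {"!": "1", "@": "2", "#": "3", "+": "="}


def key_events(text: str) -> list[tuple[int, bool]]:
    """Translate text into (keycode, pressed) events, wrapping shifted keys in Shift."""
    events = []
    for char in text:
        base = SHIFTED.get(char, char.lower())
        needs_shift = char in SHIFTED or char.isupper()
        if base not in KEYCODES:
            raise ValueError(f"no mapping in this sketch for {char!r}")
        if needs_shift:
            events.append((KEY_LEFTSHIFT, True))
        events.append((KEYCODES[base], True))
        events.append((KEYCODES[base], False))
        if needs_shift:
            events.append((KEY_LEFTSHIFT, False))
    return events


def parse_combo(combo: str) -> tuple[list[str], str]:
    """Split a combination such as 'ctrl+c' into (modifiers, key)."""
    *modifiers, key = combo.split("+")
    return [m.lower() for m in modifiers], key
```

Characters outside the table raise an error in this sketch, which mirrors why non-ASCII text needs keyboard_type_unicode instead.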

Touch Input (4 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| touch_tap | x int, y int, hold_ms? int (0), screenshot_after_ms? list[int] | Tap at coordinates. Use hold_ms for long-press gestures. |
| touch_swipe | from_x int, from_y int, to_x int, to_y int, duration_ms? int (300), screenshot_after_ms? list[int] | Swipe from one point to another with configurable duration |
| touch_pinch | center_x int, center_y int, start_distance int, end_distance int, duration_ms? int (500), screenshot_after_ms? list[int] | Two-finger pinch gesture. end_distance < start_distance = pinch in, end_distance > start_distance = pinch out. |
| touch_multi_swipe | from_x int, from_y int, to_x int, to_y int, fingers? int (3), duration_ms? int (300), screenshot_after_ms? list[int] | Multi-finger swipe gesture (2-5 fingers) for system gestures like workspace switching |
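
The touch_pinch parameters describe two fingers whose separation changes from start_distance to end_distance around a center point. A rough sketch of the per-step finger positions is below; it assumes the fingers move along a horizontal line through the center and that the distance changes linearly, neither of which is confirmed by this README. `pinch_frames` is a hypothetical helper.

```python
def pinch_frames(center_x: int, center_y: int, start_distance: int,
                 end_distance: int, steps: int = 10):
    """Return per-step positions of two fingers placed symmetrically about the center."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for i in range(steps + 1):
        t = i / steps
        # Linearly interpolate the finger separation, then halve it per finger.
        half = (start_distance + (end_distance - start_distance) * t) / 2
        left = (round(center_x - half), center_y)
        right = (round(center_x + half), center_y)
        frames.append((left, right))
    return frames
```

With end_distance smaller than start_distance the fingers converge (pinch in); with it larger they move apart (pinch out).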

Clipboard (2 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| clipboard_get | (none) | Read the current clipboard text content. Requires enable_clipboard=true in session_start and wl-clipboard installed. |
| clipboard_set | text str | Set the clipboard text content. Same requirements as clipboard_get. |

Window Management (3 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| launch_app | command str, env? dict | Launch an application inside the running session. Returns PID and log path. |
| list_windows | (none) | List all accessible application windows in the session via AT-SPI2 |
| focus_window | app_name str | Focus a window by application name (case-insensitive match) |

UI Polling (1 tool)

| Tool | Parameters | Description |
| --- | --- | --- |
| wait_for_element | query str, app_name? str, timeout_ms? int (5000), poll_interval_ms? int (200) | Poll the accessibility tree until an element matching the query appears or timeout expires. Useful for waiting on loading states and async UI updates. |
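
The polling behaviour of wait_for_element can be sketched as a generic loop with the same timeout_ms and poll_interval_ms defaults. The `find` callable stands in for an accessibility-tree query; the `wait_for` helper and its injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def wait_for(find: Callable[[], Optional[object]], timeout_ms: int = 5000,
             poll_interval_ms: int = 200, clock=time.monotonic, sleep=time.sleep):
    """Call `find` repeatedly until it returns a truthy result or the timeout expires."""
    deadline = clock() + timeout_ms / 1000
    while True:
        result = find()
        if result:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            return None  # timed out without a match
        # Never sleep past the deadline.
        sleep(min(poll_interval_ms / 1000, remaining))
```

The query always runs at least once, so an element that is already present is returned without any delay.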

Advanced (3 tools)

| Tool | Parameters | Description |
| --- | --- | --- |
| dbus_call | service str, path str, interface str, method str, args? list[str] | Call any D-Bus method in the isolated session. Useful for controlling KWin scripting, app-specific D-Bus APIs, and system services. |
| read_app_log | pid int, last_n_lines? int (50) | Read stdout/stderr output of a launched app by PID. Set last_n_lines=0 for all output. |
| wayland_info | filter_protocol? str | List Wayland protocols available in the session. Useful for verifying protocol access (e.g., plasma_window_management). |
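
The dbus_call parameters map naturally onto a dbus-send command line, which is what the architecture diagram in this README names as the backend. The sketch below only builds the argument list; the exact flags kwin-mcp passes are an assumption, and `dbus_send_argv` is a hypothetical helper. Arguments use dbus-send's own `type:value` syntax (for example `string:hello`).

```python
def dbus_send_argv(service: str, path: str, interface: str, method: str,
                   args: tuple[str, ...] = ()) -> list[str]:
    """Build a dbus-send command that calls a method on the session bus and prints the reply."""
    return [
        "dbus-send",
        "--session",          # talk to the session bus (the isolated one, when run inside it)
        "--print-reply",      # wait for and print the method's return value
        f"--dest={service}",
        path,
        f"{interface}.{method}",
        *args,
    ]


# Example target: KWin's supportInformation method on the org.kde.KWin service.
print(" ".join(dbus_send_argv("org.kde.KWin", "/KWin", "org.kde.KWin", "supportInformation")))
```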

Frame capture: Many action tools accept an optional screenshot_after_ms parameter (e.g., [0, 50, 100, 200, 500]) that captures screenshots at specified delays (in milliseconds) after the action completes. This is useful for observing transient UI states like hover effects, click animations, and menu transitions without extra MCP round-trips. Frame capture uses the fast KWin ScreenShot2 D-Bus interface (~30-70ms per frame).
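
As a rough model of how screenshot_after_ms delays could be scheduled, the sketch below measures every delay from the moment the action completed, so the time spent capturing one frame does not push later frames back. The `capture_frames` helper and the injected `capture` callable are hypothetical; the real implementation may differ.

```python
import time


def capture_frames(delays_ms, capture, clock=time.monotonic, sleep=time.sleep):
    """Call `capture()` at each delay (in ms) measured from when the action completed."""
    start = clock()
    frames = []
    for delay in sorted(delays_ms):
        wait = start + delay / 1000 - clock()
        if wait > 0:
            sleep(wait)
        # If a capture overran the next slot, the next frame is taken immediately.
        frames.append((delay, capture()))
    return frames
```

Because each capture is reported to take roughly 30-70ms, delays spaced closer than that would effectively be captured back to back.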

How It Works

Claude Code / AI Agent
  |
  |  MCP (stdio)
  v
kwin-mcp server  (29 tools)       kwin-mcp-cli (interactive REPL)
  |                                  |
  +--- both delegate to AutomationEngine (core.py) ---+
  |
  |-- session_start / stop -----> dbus-run-session
  |                                 |-- at-spi-bus-launcher
  |                                 +-- kwin_wayland --virtual
  |                                       +-- [your app]
  |
  |-- screenshot ---------------> spectacle (CLI)
  |
  |-- accessibility_tree -------> AT-SPI2 (via PyGObject)
  |-- find_ui_elements ---------> AT-SPI2 (via PyGObject)
  |-- wait_for_element ---------> AT-SPI2 (polling)
  |
  |-- mouse_* ------------------> KWin EIS D-Bus --> libei
  |-- keyboard_* ---------------> KWin EIS D-Bus --> libei
  |-- touch_* ------------------> KWin EIS D-Bus --> libei
  |    +-- screenshot_after_ms -> KWin ScreenShot2 D-Bus (fast frame capture)
  |
  |-- keyboard_type_unicode ----> wtype / wl-copy + Ctrl+V
  |-- clipboard_* --------------> wl-copy / wl-paste (wl-clipboard)
  |
  |-- launch_app / list_windows / focus_window
  |                                |-- subprocess spawn
  |                                +-- AT-SPI2 (via PyGObject)
  |
  |-- dbus_call ----------------> dbus-send (generic D-Bus)
  |-- read_app_log -------------> log file read
  +-- wayland_info -------------> wayland-info

Triple Isolation

kwin-mcp provides three layers of isolation from the host desktop:

  1. D-Bus isolation -- dbus-run-session creates a private session bus. The isolated session's services (KWin, AT-SPI2, portals) are invisible to the host.
  2. Display isolation -- kwin_wayland --virtual creates its own Wayland compositor with a virtual framebuffer. No windows appear on the host display.
  3. Input isolation -- Input events are injected through KWin's EIS interface into the isolated compositor only. The host desktop receives no input from kwin-mcp.

Input Injection

Mouse, keyboard, and touch events are injected through KWin's private org.kde.KWin.EIS.RemoteDesktop D-Bus interface. The interface returns a libei file descriptor that allows low-level input emulation without going through the XDG RemoteDesktop portal (which would show a user authorization dialog). The connection uses:

  • Absolute pointer positioning for precise coordinate-based interaction
  • evdev keycodes with full US QWERTY mapping for keyboard input
  • Smooth drag interpolation (10+ intermediate steps) for realistic drag operations
  • EIS touch emulation for multi-touch gestures (tap, swipe, pinch, multi-finger swipe)
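
The "smooth drag interpolation" item can be pictured as linear interpolation of absolute pointer positions along the drag path. The sketch below uses 10 steps per segment and accepts optional waypoints; it ignores the dwell_ms component of mouse_drag waypoints, and `drag_path` is a hypothetical helper rather than kwin-mcp's actual code.

```python
def drag_path(start: tuple[int, int], end: tuple[int, int],
              waypoints: tuple[tuple[int, int], ...] = (),
              steps_per_segment: int = 10) -> list[tuple[int, int]]:
    """Linearly interpolate pointer positions along start -> waypoints -> end."""
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be at least 1")
    points = [start, *waypoints, end]
    path = [start]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for i in range(1, steps_per_segment + 1):
            t = i / steps_per_segment
            path.append((round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)))
    return path
```

Each position would be sent as an absolute pointer motion between the button press and release, so applications see continuous movement rather than a jump.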

Screenshot Capture

The screenshot tool uses the spectacle CLI for reliable full-screen capture. For action tools called with the screenshot_after_ms parameter, frames are captured directly through KWin's org.kde.KWin.ScreenShot2 D-Bus interface, which is much faster (~30-70ms vs ~200-300ms per frame) because it avoids process spawn overhead. Raw ARGB pixel data is read from a pipe and converted to PNG using Pillow.
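
To make the raw-pixels-to-PNG step concrete, here is a dependency-free sketch. kwin-mcp itself uses Pillow for this conversion; the standard-library encoder below is a swapped-in technique so the example runs anywhere. It assumes the buffer is tightly packed with 4 bytes per pixel in B, G, R, A order (how a 32-bit ARGB value is laid out in memory on little-endian machines) and does not handle premultiplied alpha or row padding.

```python
import struct
import zlib


def bgra_to_png(data: bytes, width: int, height: int) -> bytes:
    """Encode tightly packed BGRA pixels as an 8-bit RGBA PNG."""
    if len(data) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")
    rgba = bytearray(data)
    rgba[0::4], rgba[2::4] = data[2::4], data[0::4]  # swap the B and R channels
    stride = width * 4
    # Each PNG scanline starts with a filter-type byte; 0 means "no filter".
    raw = b"".join(b"\x00" + bytes(rgba[y * stride:(y + 1) * stride]) for y in range(height))

    def chunk(kind: bytes, payload: bytes) -> bytes:
        crc = zlib.crc32(kind + payload)
        return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)

    # Bit depth 8, color type 6 (RGBA), default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
```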

Accessibility Tree

The AT-SPI2 accessibility bus within the isolated session is queried via PyGObject (gi.repository.Atspi). This provides a structured tree of all UI widgets with their roles (button, text field, menu item, etc.), names, states (focused, enabled, visible, etc.), screen coordinates, and available actions (click, toggle, etc.).
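
The kind of search that find_ui_elements performs over this tree (case-insensitive matching on name, role, or description) can be sketched on a plain in-memory tree. The `Widget` dataclass below is a hypothetical stand-in for AT-SPI2 accessible objects, and substring matching is an assumption; the real tool queries gi.repository.Atspi.

```python
from dataclasses import dataclass, field


@dataclass
class Widget:
    """Simplified stand-in for an accessible object in the widget tree."""
    role: str
    name: str = ""
    description: str = ""
    children: list["Widget"] = field(default_factory=list)


def find_widgets(root: Widget, query: str, max_depth: int = 15) -> list[Widget]:
    """Depth-first, case-insensitive search over name, role, and description."""
    needle = query.casefold()
    matches = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if any(needle in text.casefold() for text in (node.name, node.role, node.description)):
            matches.append(node)
        if depth < max_depth:
            # Reverse so children are visited in their original order.
            stack.extend((child, depth + 1) for child in reversed(node.children))
    return matches
```

Limiting the depth, as accessibility_tree does with max_depth, keeps the walk cheap for applications with very deep widget hierarchies.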

System Requirements

| Requirement | Details |
| --- | --- |
| OS | Linux with KDE Plasma 6 (Wayland session) |
| Python | 3.12 or later |
| KWin | kwin_wayland with --virtual flag support (KDE Plasma 6.x) |
| libei | Usually bundled with KWin 6.x (EIS input emulation) |
| spectacle | KDE screenshot tool (CLI mode) |
| AT-SPI2 | at-spi2-core for accessibility tree support |
| PyGObject | GObject introspection Python bindings |
| D-Bus | dbus-python bindings |

Optional dependencies:

| Package | Required for |
| --- | --- |
| wl-clipboard (wl-copy, wl-paste) | clipboard_get, clipboard_set, and keyboard_type_unicode clipboard fallback |
| wtype | keyboard_type_unicode (preferred over clipboard fallback) |
| wayland-utils (wayland-info) | wayland_info tool |
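
A quick way to see which optional features will be unavailable on a machine is to look for the binaries on PATH. This is a small sketch based on the table above; `missing_tools` is a hypothetical helper, not something kwin-mcp ships.

```python
import shutil
from typing import Callable, Optional

# Feature -> binaries it needs, taken from the optional-dependency table above.
OPTIONAL_TOOLS = {
    "clipboard_get / clipboard_set": ["wl-copy", "wl-paste"],
    "keyboard_type_unicode (preferred path)": ["wtype"],
    "wayland_info": ["wayland-info"],
}


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> dict[str, list[str]]:
    """Return the features whose binaries are not found on PATH."""
    missing = {}
    for feature, binaries in OPTIONAL_TOOLS.items():
        absent = [binary for binary in binaries if which(binary) is None]
        if absent:
            missing[feature] = absent
    return missing
```

An empty result from `missing_tools()` means every optional tool in the table was found.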

Installing System Dependencies

Arch Linux / Manjaro

sudo pacman -S kwin spectacle at-spi2-core python-gobject dbus-python-common

# Optional: for clipboard, Unicode input, and wayland_info
sudo pacman -S wl-clipboard wtype wayland-utils

Fedora (KDE Spin)

sudo dnf install kwin-wayland spectacle at-spi2-core python3-gobject dbus-python

# Optional: for clipboard, Unicode input, and wayland_info
sudo dnf install wl-clipboard wtype wayland-utils

openSUSE (KDE)

sudo zypper install kwin6 spectacle at-spi2-core python3-gobject python3-dbus-python

# Optional: for clipboard, Unicode input, and wayland_info
sudo zypper install wl-clipboard wtype wayland-utils

Kubuntu / KDE Neon

sudo apt install kwin-wayland spectacle at-spi2-core python3-gi gir1.2-atspi-2.0 python3-dbus

# Optional: for clipboard, Unicode input, and wayland_info
sudo apt install wl-clipboard wtype wayland-utils

Installation

Using uv (recommended)

uv tool install kwin-mcp

Using pip

pip install kwin-mcp

From source

git clone https://github.com/isac322/kwin-mcp.git
cd kwin-mcp
uv sync
uv run kwin-mcp

Limitations

  • US QWERTY keyboard layout only -- keyboard_type maps characters to US QWERTY keys, so other layouts are not supported. For non-ASCII text (Korean, CJK, etc.), use keyboard_type_unicode, which requires wtype or wl-clipboard to be installed.
  • KDE Plasma 6+ required -- Older KDE Plasma versions and other Wayland compositors (for example GNOME's Mutter or Sway) are not supported.
  • AT-SPI2 availability varies -- Some applications may not fully expose their widget tree via AT-SPI2.
  • Touch input is EIS-emulated -- Touch events are emulated through KWin's EIS interface, not from a real touchscreen device. Most applications handle emulated touch correctly, but some may behave differently from physical touch.
  • Clipboard requires opt-in -- Clipboard tools (clipboard_get, clipboard_set) are disabled by default because wl-copy can hang in isolated sessions. Enable with enable_clipboard=true in session_start, and ensure wl-clipboard is installed.

Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

git clone https://github.com/isac322/kwin-mcp.git
cd kwin-mcp
uv sync
uv run ruff check src/
uv run ruff format --check src/
uv run ty check src/

License

MIT
