MCP server for Linux desktop GUI automation on KDE Plasma 6 Wayland via isolated KWin virtual sessions

kwin-mcp

A Model Context Protocol (MCP) server for Linux desktop GUI automation on KDE Plasma 6 Wayland. It lets AI agents like Claude Code launch, interact with, and observe any Wayland application (Qt, GTK, Electron) in a fully isolated virtual KWin session — without touching the user's desktop.

Why kwin-mcp?

  • Isolated sessions — Each session runs in its own dbus-run-session + kwin_wayland --virtual sandbox. Your host desktop is never affected.
  • No screenshots required for interaction — The AT-SPI2 accessibility tree gives the AI agent structured widget data (roles, names, coordinates, states, available actions), so it can interact with UI elements without relying solely on vision.
  • Zero authorization prompts — Uses KWin's private EIS (Emulated Input Server) D-Bus interface directly, bypassing the XDG RemoteDesktop portal. No user confirmation dialogs.
  • Works with any Wayland app — Anything that runs on KDE Plasma 6 Wayland works: Qt, GTK, Electron, and more. Input is injected via the standard libei protocol.

Quick Start

Requires KDE Plasma 6 on Wayland. See System Requirements for details.

1. Install

# Using uv (recommended)
uv tool install kwin-mcp

# Or using pip
pip install kwin-mcp

2. Configure Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "uvx",
      "args": ["kwin-mcp"]
    }
  }
}

3. Use it

Ask Claude Code to launch and interact with any GUI application:

Start a KWin session, launch kcalc, and press the buttons to calculate 2 + 3.

Claude Code will autonomously start an isolated session, launch the app, read the accessibility tree to find buttons, click them, and take a screenshot to verify the result.

Features

  • Session management — Start and stop isolated KWin Wayland sessions with configurable screen resolution
  • Screenshot capture — Capture the virtual display as PNG via KWin's ScreenShot2 D-Bus interface
  • Accessibility tree — Read the full AT-SPI2 widget tree with roles, names, states, coordinates, and available actions
  • Element search — Find UI elements by name, role, or description (case-insensitive)
  • Mouse input — Click (left/right/middle, single/double), move, scroll (vertical/horizontal), and drag with smooth interpolation
  • Keyboard input — Type text (full US QWERTY layout) and press key combinations with modifier support (Ctrl, Alt, Shift, Super)

System Requirements

Requirement  Details
OS           Linux with KDE Plasma 6 (Wayland session)
Python       3.12 or later
KWin         kwin_wayland with --virtual flag support (KDE Plasma 6.x)
libei        Usually bundled with KWin 6.x (EIS input emulation)
spectacle    KDE screenshot tool (CLI mode)
AT-SPI2      at-spi2-core for accessibility tree support
PyGObject    GObject introspection Python bindings
D-Bus        dbus-python bindings

Installing System Dependencies

Arch Linux / Manjaro

sudo pacman -S kwin spectacle at-spi2-core python-gobject dbus-python-common

Fedora (KDE Spin)

sudo dnf install kwin-wayland spectacle at-spi2-core python3-gobject dbus-python

openSUSE (KDE)

sudo zypper install kwin6 spectacle at-spi2-core python3-gobject python3-dbus-python

Kubuntu / KDE Neon

sudo apt install kwin-wayland spectacle at-spi2-core python3-gi gir1.2-atspi-2.0 python3-dbus

Installation

Using uv (recommended)

uv tool install kwin-mcp

Using pip

pip install kwin-mcp

From source

git clone https://github.com/isac322/kwin-mcp.git
cd kwin-mcp
uv sync
uv run kwin-mcp

Configuration

Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "uvx",
      "args": ["kwin-mcp"]
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "kwin-mcp"
    }
  }
}

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "kwin-mcp": {
      "command": "uvx",
      "args": ["kwin-mcp"]
    }
  }
}
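
If a config file already lists other MCP servers, the kwin-mcp entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, using only the two entry shapes shown in this section. The function name add_kwin_mcp and the use_uvx flag are hypothetical and not part of kwin-mcp.

```python
import json


def add_kwin_mcp(config: dict, use_uvx: bool = True) -> dict:
    """Return a copy of an MCP config with a kwin-mcp entry added.

    Hypothetical helper: it only reproduces the two entry shapes
    documented above and leaves every other server untouched.
    """
    servers = dict(config.get("mcpServers", {}))
    if use_uvx:
        servers["kwin-mcp"] = {"command": "uvx", "args": ["kwin-mcp"]}
    else:
        # Globally installed script, no launcher needed.
        servers["kwin-mcp"] = {"command": "kwin-mcp"}
    return {**config, "mcpServers": servers}


# "other" stands in for any server that is already configured.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
merged = add_kwin_mcp(existing)
print(json.dumps(merged, indent=2))
```

The input dictionary is not modified, so the result can be inspected before it is written back to .mcp.json or claude_desktop_config.json.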

Running Directly

# As an installed script
kwin-mcp

# As a Python module
python -m kwin_mcp

Available Tools

Session Management

Tool           Parameters                                   Description
session_start  app_command?, screen_width?, screen_height?  Start an isolated KWin Wayland session, optionally launching an app
session_stop   (none)                                       Stop the session and clean up all processes

Observation

Tool                Parameters             Description
screenshot          include_cursor?        Capture a screenshot of the virtual display (saved as PNG, returns file path)
accessibility_tree  app_name?, max_depth?  Get the AT-SPI2 widget tree with roles, names, states, and coordinates
find_ui_elements    query, app_name?       Search for UI elements by name, role, or description (case-insensitive)

Mouse Input

Tool          Parameters                                        Description
mouse_click   x, y, button?, double?, screenshot_after_ms?      Click at coordinates (left/right/middle, single/double)
mouse_move    x, y, screenshot_after_ms?                        Move the cursor to coordinates without clicking
mouse_scroll  x, y, delta, horizontal?                          Scroll at coordinates (positive = down/right, negative = up/left)
mouse_drag    from_x, from_y, to_x, to_y, screenshot_after_ms?  Drag from one point to another with smooth interpolation

Keyboard Input

Tool           Parameters                  Description
keyboard_type  text, screenshot_after_ms?  Type a string of text character by character (US QWERTY layout)
keyboard_key   key, screenshot_after_ms?   Press a key or key combination (e.g., Return, ctrl+c, alt+F4, shift+Tab)
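
The key combinations accepted by keyboard_key (Return, ctrl+c, alt+F4, shift+Tab) suggest a "modifiers, then one key, joined by +" syntax. The sketch below shows one way such a string could be split. It is an illustration under that assumption, not the parser kwin-mcp ships; the name parse_key_combo and the case-insensitive handling of modifiers are hypothetical.

```python
# Modifier names taken from the feature list (Ctrl, Alt, Shift, Super).
MODIFIERS = {"ctrl", "alt", "shift", "super"}


def parse_key_combo(combo: str) -> tuple[list[str], str]:
    """Split 'ctrl+shift+Tab' into (['ctrl', 'shift'], 'Tab').

    Assumption: everything before the last '+' is a modifier and the
    last token is the key itself, kept with its original spelling.
    """
    parts = [part.strip() for part in combo.split("+")]
    if any(not part for part in parts):
        raise ValueError(f"empty token in key combination: {combo!r}")
    *raw_modifiers, key = parts
    modifiers = []
    for raw in raw_modifiers:
        name = raw.lower()
        if name not in MODIFIERS:
            raise ValueError(f"unknown modifier: {raw!r}")
        modifiers.append(name)
    return modifiers, key


print(parse_key_combo("Return"))     # ([], 'Return')
print(parse_key_combo("ctrl+c"))     # (['ctrl'], 'c')
print(parse_key_combo("shift+Tab"))  # (['shift'], 'Tab')
```

In an input pipeline the modifiers would be pressed in order, the key pressed and released, and the modifiers released in reverse order.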

Frame capture: Action tools accept an optional screenshot_after_ms parameter (e.g., [0, 50, 100, 200, 500]) that captures screenshots at specified delays (in milliseconds) after the action completes. This is useful for observing transient UI states like hover effects, click animations, and menu transitions without extra MCP round-trips.
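
To make the timing concrete: the delays in screenshot_after_ms are measured from the end of the action, so a capture loop has to wait only for the difference between consecutive delays. The sketch below computes those waits. The function name capture_waits is hypothetical, and the sketch ignores the time each capture itself takes, which a real implementation would likely correct for with a monotonic clock.

```python
def capture_waits(delays_ms: list[int]) -> list[int]:
    """Convert delays measured from the action into waits between captures.

    Assumption: delays are non-negative and are served in ascending
    order regardless of the order in which they were passed.
    """
    waits = []
    previous = 0
    for delay in sorted(delays_ms):
        if delay < 0:
            raise ValueError("delays must be non-negative")
        waits.append(delay - previous)
        previous = delay
    return waits


print(capture_waits([0, 50, 100, 200, 500]))  # [0, 50, 50, 100, 300]
```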

How It Works

Claude Code / AI Agent
  │
  │  MCP (stdio)
  ▼
kwin-mcp server
  │
  ├── session_start ─────────► dbus-run-session
  │                               ├── at-spi-bus-launcher
  │                               └── kwin_wayland --virtual
  │                                      └── [your app]
  │
  ├── screenshot ────────────► spectacle (via D-Bus)
  │
  ├── accessibility_tree ────► AT-SPI2 (via PyGObject)
  ├── find_ui_elements ──────► AT-SPI2 (via PyGObject)
  │
  ├── mouse_* / keyboard_* ─► KWin EIS D-Bus ──► libei
  │   └── screenshot_after_ms ► KWin ScreenShot2 D-Bus (fast frame capture)

Triple Isolation

kwin-mcp provides three layers of isolation from the host desktop:

  1. D-Bus isolation: dbus-run-session creates a private session bus. The isolated session's services (KWin, AT-SPI2, portals) are invisible to the host.
  2. Display isolation: kwin_wayland --virtual creates its own Wayland compositor with a virtual framebuffer. No windows appear on the host display.
  3. Input isolation: Input events are injected through KWin's EIS interface into the isolated compositor only. The host desktop receives no input from kwin-mcp.
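
The first two layers come down to how the compositor process is started. The sketch below only builds the command line and does not run it. It assumes that kwin_wayland accepts --width and --height together with --virtual, and the 1920x1080 default is an illustrative value; the function name build_session_argv is hypothetical. Starting at-spi-bus-launcher and the target application inside the session is left out.

```python
def build_session_argv(width: int = 1920, height: int = 1080) -> list[str]:
    """Build the argv for a private bus plus a virtual compositor.

    dbus-run-session starts a fresh session bus and runs the command
    after '--' on it; kwin_wayland --virtual renders to a virtual
    framebuffer instead of a real output. The size flags are an
    assumption about the kwin_wayland command line.
    """
    if width <= 0 or height <= 0:
        raise ValueError("screen size must be positive")
    return [
        "dbus-run-session", "--",
        "kwin_wayland", "--virtual",
        "--width", str(width),
        "--height", str(height),
    ]


print(" ".join(build_session_argv(1280, 720)))
```

A list like this can be handed to subprocess.Popen without a shell, which keeps the session a direct child process that is easy to terminate on session_stop.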

Input Injection

Mouse and keyboard events are injected through KWin's private org.kde.KWin.EIS.RemoteDesktop D-Bus interface. This returns a libei file descriptor that allows low-level input emulation without requiring the XDG RemoteDesktop portal (which would show a user authorization dialog). The connection uses:

  • Absolute pointer positioning for precise coordinate-based interaction
  • evdev keycodes with full US QWERTY mapping for keyboard input
  • Smooth drag interpolation (10+ intermediate steps) for realistic drag operations
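
As an illustration of the drag interpolation mentioned in the list above, the following sketch generates evenly spaced absolute pointer positions between two points. Plain linear interpolation and the name interpolate_drag are assumptions; the source only states that at least 10 intermediate steps are used.

```python
def interpolate_drag(
    start: tuple[int, int],
    end: tuple[int, int],
    steps: int = 10,
) -> list[tuple[int, int]]:
    """Return `steps` pointer positions from just after start up to end.

    Assumption: positions are spaced linearly and rounded to whole
    pixels; the final position is always exactly `end`.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        points.append((round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)))
    return points


path = interpolate_drag((0, 0), (100, 50))
print(path[0], path[-1])  # (10, 5) (100, 50)
```

A drag would then be a button press at the start point, one absolute pointer motion per entry in the path, and a button release at the end point.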

Screenshot Capture

The screenshot tool uses spectacle CLI for reliable full-screen capture. For action tools with the screenshot_after_ms parameter, screenshots are captured directly via the KWin org.kde.KWin.ScreenShot2 D-Bus interface, which is much faster (~30-70ms vs ~200-300ms per frame) because it avoids process spawn overhead. Raw ARGB pixel data is read from a pipe and converted to PNG using Pillow.
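
The conversion step can be sketched without any third-party dependency. The example below swaps in the standard library (zlib and struct) in place of Pillow so that it runs anywhere, and it assumes the raw ARGB data arrives as little-endian 32-bit pixels, that is, bytes in B, G, R, A order, without premultiplied alpha. Both the byte order and the function name bgra_to_png are assumptions, not confirmed details of kwin-mcp.

```python
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Frame one PNG chunk: length, tag, payload, CRC over tag + payload."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def bgra_to_png(pixels: bytes, width: int, height: int) -> bytes:
    """Encode raw 32-bit pixels (assumed B, G, R, A byte order) as PNG."""
    if len(pixels) != width * height * 4:
        raise ValueError("pixel buffer does not match width * height * 4")
    rgba = bytearray(pixels)
    # Swap the blue and red channels to get the R, G, B, A order PNG expects.
    rgba[0::4], rgba[2::4] = pixels[2::4], pixels[0::4]
    stride = width * 4
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(rgba[y * stride:(y + 1) * stride]) for y in range(height)
    )
    # 8 bits per channel, color type 6 (RGBA), default compression/filter/interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


# One opaque blue pixel in B, G, R, A order.
png = bgra_to_png(bytes([255, 0, 0, 255]), 1, 1)
print(len(png) > 8 and png[:8] == b"\x89PNG\r\n\x1a\n")
```

With Pillow, the same conversion is typically a single call that reads the buffer in a raw BGRA mode and saves it as PNG.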

Accessibility Tree

The AT-SPI2 accessibility bus within the isolated session is queried via PyGObject (gi.repository.Atspi). This provides a structured tree of all UI widgets with their roles (button, text field, menu item, etc.), names, states (focused, enabled, visible, etc.), screen coordinates, and available actions (click, toggle, etc.).
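
Once the tree has been read, a search like find_ui_elements is a traversal with a case-insensitive comparison. The sketch below runs on a plain in-memory tree instead of live Atspi objects so that it works without a desktop session. The Widget class, the find_elements name, and the choice of substring matching are hypothetical and only mirror the fields named above.

```python
from dataclasses import dataclass, field


@dataclass
class Widget:
    """Hypothetical stand-in for one node of the accessibility tree."""
    role: str
    name: str = ""
    description: str = ""
    children: list["Widget"] = field(default_factory=list)


def find_elements(root: Widget, query: str) -> list[Widget]:
    """Return widgets whose name, role, or description contains the query.

    Matching is case-insensitive; results are in depth-first pre-order.
    """
    needle = query.casefold()
    matches = []
    stack = [root]
    while stack:
        node = stack.pop()
        texts = (node.name, node.role, node.description)
        if any(needle in text.casefold() for text in texts):
            matches.append(node)
        # Reverse so the first child is visited first.
        stack.extend(reversed(node.children))
    return matches


tree = Widget("frame", "Calculator", children=[
    Widget("push button", "5"),
    Widget("push button", "Plus", description="Add"),
])
print([w.name for w in find_elements(tree, "BUTTON")])  # ['5', 'Plus']
```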

Limitations

  • US QWERTY keyboard layout only — Other keyboard layouts are not yet supported for text typing.
  • KDE Plasma 6+ required — Older KDE versions or other Wayland compositors (GNOME, Sway) are not supported.
  • AT-SPI2 availability varies — Some applications may not fully expose their widget tree via AT-SPI2.

Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

git clone https://github.com/isac322/kwin-mcp.git
cd kwin-mcp
uv sync
uv run ruff check src/
uv run ruff format --check src/
uv run ty check src/

License

MIT

