AI-powered Selenium browser automation MCP server with test agents (Planner, Generator, Healer)

Selenium MCP Server

A Model Context Protocol (MCP) server that provides browser automation capabilities using Selenium WebDriver. This server offers 40+ comprehensive tools for web automation, testing, and interaction, plus AI-powered test agents for automated test planning, generation, and healing.

Features

  • 🤖 AI Test Agents - Planner, Generator, and Healer agents for automated testing workflows
  • 40+ Browser Automation Tools covering all aspects of web interaction
  • Pure MCP Protocol Implementation - compatible with MCP clients
  • Selenium WebDriver Backend with automatic driver management
  • Multi-Framework Code Generation - pytest, Robot Framework, unittest
  • Element-based Interactions using DOM snapshots and references
  • Verification & Assertions - Built-in tools for test validation
  • Self-Healing Tests - Automatically fix failing tests with the Healer agent
  • Cross-Platform Support - Works with any Selenium-compatible automation framework

Quick Start

Installation

Option 1: Install from PyPI (Python Users)

pip install ai-agent-selenium

Option 2: Install from npm (JavaScript/TypeScript Users)

npm install ai-agent-selenium
# Auto-installs Python package via post-install script

Option 3: Install from Source (Development)

# Clone the repository
git clone https://github.com/yourusername/selenium-mcp-server.git
cd selenium-mcp-server

# Install in development mode
pip install -e .

# Or install with Robot Framework support
pip install -e ".[robot]"

# Or install with all optional dependencies
pip install -e ".[all]"

Option 4: Install from GitHub (Latest)

pip install git+https://github.com/yourusername/selenium-mcp-server.git

Option 5: Install via npm (Node.js Projects)

For Node.js/JavaScript developers:

npm install selenium-mcp-server
# Then run with:
npx selenium-mcp

See the npm wrapper documentation for Node.js-specific usage.

Running the Server

Via command-line (after installation):

selenium-mcp

Or directly with Python:

python mcp_server.py

Or with uvx (no installation needed):

uvx selenium-mcp-server

Configure MCP Client

📖 Complete Setup Guide Available

See MCP_CLIENT_SETUP.md for detailed configuration instructions for:

  • Claude Desktop
  • Cursor
  • Cline (VSCode)
  • Continue.dev
  • Other MCP clients

Quick Setup for Claude Desktop:

Config file location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "selenium-mcp"
    }
  }
}

Quick Setup for Cursor:

Settings → MCP Servers → Add:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "selenium-mcp"
    }
  }
}

For npm installation, use:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["ai-agent-selenium"]
    }
  }
}

Or run the server module through the Python interpreter (use the full path to python if it is not on your PATH):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "python",
      "args": ["-m", "mcp_server"]
    }
  }
}
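If the client config file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The sketch below shows one way to do that merge in Python. The helper name add_mcp_server and the "other-server" entry are hypothetical, and reading or writing the actual config file is left out.

```python
import json


def add_mcp_server(config, name, command, args=None):
    """Return a copy of `config` with one entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# "other-server" is a made-up existing entry that must survive the merge.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = add_mcp_server(existing, "selenium-mcp", "npx", ["ai-agent-selenium"])
print(json.dumps(updated, indent=2))
```

The function copies the dictionaries before changing them, so the original config stays untouched if the result is discarded.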

Available Tools

Navigation (3 tools)

  • navigate_to - Navigate to a URL
  • go_back - Go back to previous page
  • go_forward - Go forward to next page

Page Analysis (1 tool)

  • capture_page - Capture DOM snapshot for element analysis

Element Interactions (4 tools)

  • click_element - Click elements using page references
  • hover_element - Hover over elements
  • select_option - Select dropdown options
  • input_text - Type text into input fields

Mouse Operations (3 tools)

  • mouse_move_xy - Move mouse to coordinates
  • mouse_click_xy - Click at coordinates
  • mouse_drag_xy - Drag between coordinates

Keyboard Operations (1 tool)

  • press_key - Press keyboard keys (Enter, Tab, arrows, etc.)

Tab Management (4 tools)

  • list_tabs - List all browser tabs
  • switch_to_tab - Switch between tabs
  • open_new_tab - Open new tabs
  • close_tab - Close tabs

JavaScript & Advanced (8 tools)

  • evaluate_js - Execute JavaScript code
  • handle_dialog - Handle alerts/confirms/prompts
  • drag_and_drop - Drag elements between locations
  • upload_file - Upload files through input elements
  • get_console_messages - Monitor console logs
  • get_network_requests - Monitor network activity
  • save_as_pdf - Generate PDF of current page
  • wait_for_element - Wait for element conditions

Browser Management (3 tools)

  • take_screenshot - Capture page screenshots
  • set_window_size - Resize browser window
  • close_session - Close browser session

MCP Protocol

The server implements the MCP protocol via JSON-RPC over stdin/stdout:

{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "navigate_to", "arguments": {"url": "https://example.com"}}}
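As an illustration, the two request lines above can be built in Python with nothing but the standard library. This is a minimal sketch: the helper names rpc_request and tool_call are hypothetical, and writing the lines to the server's stdin is not shown. The MCP specification also defines an initialize handshake that clients normally perform first, which is omitted here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids: 1, 2, 3, ...


def rpc_request(method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single line of text."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params or {},
    }
    return json.dumps(message)


def tool_call(name, arguments):
    """Build a tools/call request for the named tool."""
    return rpc_request("tools/call", {"name": name, "arguments": arguments})


list_line = rpc_request("tools/list")
nav_line = tool_call("navigate_to", {"url": "https://example.com"})
print(list_line)
print(nav_line)
```

Each request is one line of JSON, which suits a line-oriented stdin/stdout transport.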

Architecture

  • Context Management - Handles browser sessions and page state
  • Tool-based Design - Each capability is a discrete, callable tool
  • Element References - Uses DOM snapshots with element references for interactions
  • Lazy Initialization - Browser starts only when needed
  • Robust Error Handling - Graceful fallbacks and error reporting
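The points above can be illustrated with a small Python sketch. It is not the server's actual code: the class BrowserContext, its methods, and the e1/e2 reference format are assumptions made for illustration, and a fake driver factory stands in for a real Selenium driver.

```python
class BrowserContext:
    """Hypothetical context object: owns the driver and the latest snapshot refs."""

    def __init__(self, driver_factory):
        self._driver_factory = driver_factory
        self._driver = None
        self._refs = {}

    @property
    def driver(self):
        # Lazy initialization: the browser is created on first access only.
        if self._driver is None:
            self._driver = self._driver_factory()
        return self._driver

    def snapshot(self, elements):
        """Assign a short reference (e1, e2, ...) to each element of a new snapshot."""
        self._refs = {f"e{i}": el for i, el in enumerate(elements, start=1)}
        return list(self._refs)

    def resolve(self, ref):
        # Error handling: an unknown ref yields a readable message, not a bare KeyError.
        try:
            return self._refs[ref]
        except KeyError:
            raise LookupError(
                f"Unknown element reference {ref!r}; capture a new snapshot"
            ) from None


started = []


def fake_driver_factory():
    # Stand-in for something like webdriver.Chrome(); records each start.
    started.append("browser")
    return object()


ctx = BrowserContext(fake_driver_factory)
refs = ctx.snapshot(["<button id='login'>", "<input id='username'>"])
print(refs, started)
```

Taking a snapshot does not start the browser in this sketch. Only the first access to ctx.driver does.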

Selenium Test Agents

The Selenium MCP Server now includes AI-powered test agents modeled after Playwright's agent architecture. These specialized agents help you create test plans, generate test code, and fix failing tests automatically.

Available Agents

🟢 Planner Agent (selenium-test-planner)

Purpose: Explore web applications and create comprehensive test plans

What it does:

  • Navigates through your application to understand workflows
  • Identifies critical user journeys and edge cases
  • Creates detailed, step-by-step test plans in markdown format
  • Organizes scenarios with prerequisites, steps, and expected results
  • ⚠️ STOPS after saving the plan and waits for your review before code generation

Key Tools:

  • planner_setup_page - Initialize planning session
  • planner_save_plan - Save test plan to markdown file
  • All browser navigation and interaction tools

Usage Example:

Ask your AI: "Use the planner agent to create a test plan for the login feature at https://example.com"

The agent will:
1. Navigate and explore the login page
2. Identify test scenarios (valid login, invalid credentials, password reset, etc.)
3. Create a comprehensive test plan document
4. Save it to test-plans/login-feature.plan.md
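To make the shape of a plan concrete, here is a sketch that renders one scenario with prerequisites, steps, and an expected result as markdown. The layout and the function render_scenario are assumptions. The planner's real output format is not documented here and may differ.

```python
def render_scenario(title, prerequisites, steps, expected):
    """Render one test scenario as a markdown section (hypothetical layout)."""
    lines = [f"## {title}", "", "Prerequisites:"]
    lines += [f"- {item}" for item in prerequisites]
    lines += ["", "Steps:"]
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    lines += ["", f"Expected result: {expected}"]
    return "\n".join(lines)


plan = render_scenario(
    "Valid login",
    ["A registered test account exists"],
    ["Open https://example.com/login", "Enter valid credentials", "Submit the form"],
    "The user lands on the welcome page",
)
print(plan)
```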

🔵 Generator Agent (selenium-test-generator)

Purpose: Transform approved test plans into executable Selenium test code

What it does:

  • Reads approved test plans and understands requirements
  • Executes each test step in a real browser to validate interactions
  • Records all actions and generates reliable test code
  • Supports pytest, unittest, and Robot Framework
  • ⚠️ Requires an approved test plan from the Planner agent before starting

Key Tools:

  • generator_setup_page - Initialize test generation session
  • generator_read_log - Get recorded action history
  • generator_write_test - Save generated test code
  • Verification tools for assertions

Usage Example:

Ask your AI: "Use the generator agent to create pytest tests from the login test plan"

The agent will:
1. Read the test plan
2. Execute each step in real browser
3. Record successful interactions
4. Generate clean, maintainable pytest code
5. Save to tests/test_login.py

Generated Test Example:

import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class TestLogin:
    @pytest.fixture
    def driver(self):
        driver = webdriver.Chrome()
        yield driver
        driver.quit()

    def test_valid_login(self, driver):
        """Test successful login with valid credentials"""
        driver.get("https://example.com/login")

        # Enter username
        username_field = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "username"))
        )
        username_field.send_keys("testuser")

        # Enter password
        driver.find_element(By.ID, "password").send_keys("password123")

        # Click login button
        driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()

        # Verify successful login
        assert WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.XPATH, "//div[@class='welcome']"))
        )

🔴 Healer Agent (selenium-test-healer)

Purpose: Debug and fix failing Selenium tests automatically

What it does:

  • Runs your test suite and identifies failures
  • Debugs each failing test with enhanced logging
  • Investigates root causes (selector issues, timing, data problems)
  • Applies fixes to make tests reliable
  • Re-runs tests to verify fixes work

Key Tools:

  • healer_run_tests - Execute test suite
  • healer_debug_test - Debug specific failing test
  • healer_fix_test - Apply fixes to test code
  • browser_generate_locator - Find better selectors

Common Fixes:

  • Updates brittle selectors (XPath → CSS → ID)
  • Adds proper wait conditions
  • Handles stale element references
  • Fixes timing and race conditions
  • Updates assertions for changed application behavior
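One of these fixes, handling stale element references, usually means re-running the lookup and the action when the element has been detached from the DOM. The sketch below shows that retry pattern without importing Selenium. StaleElementError is a local stand-in for Selenium's StaleElementReferenceException, and retry_on_stale is a hypothetical helper, not something the Healer is known to emit.

```python
import time


class StaleElementError(Exception):
    """Local stand-in for selenium's StaleElementReferenceException."""


def retry_on_stale(action, attempts=3, delay=0.0):
    """Call `action` again when it raises StaleElementError, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except StaleElementError:
            if attempt == attempts:
                raise  # Out of attempts: surface the original error.
            time.sleep(delay)


calls = {"n": 0}


def flaky_click():
    # Simulates an element that is stale on the first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise StaleElementError("element detached from DOM")
    return "clicked"


result = retry_on_stale(flaky_click)
print(result, calls["n"])
```

In a real test, the action passed in would locate the element again on every call (for example with driver.find_element) and then click it, so each retry works on a fresh reference.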

Usage Example:

Ask your AI: "Use the healer agent to fix failing tests in tests/test_login.py"

The agent will:
1. Run all tests and collect failures
2. Debug each failure individually
3. Identify root causes (e.g., selector changed, timing issue)
4. Apply appropriate fixes
5. Verify tests now pass
6. Create backups of original files

Agent Workflow: End-to-End Test Automation

Complete Workflow Example

# Step 1: Create Test Plan
"Use the planner agent to create a comprehensive test plan for the e-commerce
checkout flow at https://myshop.com, covering product selection, cart management,
and payment processing"

→ Output: test-plans/checkout-flow.plan.md

# Step 2: Generate Tests
"Use the generator agent to create pytest tests from the checkout flow test plan"

→ Output: tests/test_checkout.py

# Step 3: Run and Fix
"Run the checkout tests. If any fail, use the healer agent to fix them"

→ Output: Fixed tests with proper waits, selectors, and assertions

Agent-Specific Tools

Planner Tools

  • planner_setup_page(url, feature) - Initialize planning session
  • planner_save_plan(plan_content, filename) - Save test plan

Generator Tools

  • generator_setup_page(url, test_plan) - Start test generation
  • generator_read_log() - Get recorded actions
  • generator_write_test(test_code, filename, framework) - Save test code

Healer Tools

  • healer_run_tests(test_path, framework) - Execute tests
  • healer_debug_test(test_name, test_path) - Debug specific test
  • healer_fix_test(test_path, fixed_code, fix_description) - Apply fix

Verification Tools (for all agents)

  • browser_verify_element_visible(element, ref) - Check element visibility
  • browser_verify_text_visible(text) - Verify text on page
  • browser_verify_value(element, ref, expected_value) - Check input values
  • browser_verify_list_visible(items) - Verify multiple items
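As a sketch, the signatures above can be turned into tools/call parameters with a small helper that checks the argument names. The parameter names come from the list above. The helper verification_call, the assumption that every parameter is required, and the sample value are hypothetical.

```python
# Parameter names copied from the signatures listed above.
VERIFICATION_TOOLS = {
    "browser_verify_element_visible": ("element", "ref"),
    "browser_verify_text_visible": ("text",),
    "browser_verify_value": ("element", "ref", "expected_value"),
    "browser_verify_list_visible": ("items",),
}


def verification_call(tool, **arguments):
    """Build the params object of a tools/call request for a verification tool."""
    expected = VERIFICATION_TOOLS[tool]
    # Assumption: every listed parameter is required and no others are accepted.
    if set(arguments) != set(expected):
        raise TypeError(f"{tool} expects arguments {expected}, got {tuple(arguments)}")
    return {"name": tool, "arguments": arguments}


params = verification_call("browser_verify_text_visible", text="Welcome")
print(params)
```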

How to Use Agents

With Claude Code or Compatible AI Clients

Simply describe what you want the agent to do:

# For Planning
"Create a test plan for the user registration feature"
"Plan tests for the shopping cart functionality at https://shop.example.com"

# For Test Generation
"Generate pytest tests from the registration test plan"
"Create Robot Framework tests for the login scenarios"

# For Healing/Fixing
"Fix the failing tests in tests/test_checkout.py"
"Debug why test_user_login is failing and fix it"

The AI will automatically:

  1. Select the appropriate agent
  2. Use the right tools in the correct sequence
  3. Provide you with the results

Agent Configuration Files

Agent definitions are located in /agents/:

  • selenium-test-planner.agent.md - Planner configuration
  • selenium-test-generator.agent.md - Generator configuration
  • selenium-test-healer.agent.md - Healer configuration

These files define:

  • Agent purpose and model
  • Available tools
  • Instructions and methodology
  • Best practices

Benefits of Using Agents

  • ✅ Faster Test Creation - Agents explore, plan, and generate tests automatically
  • ✅ Higher Quality - Tests follow best practices with proper waits and selectors
  • ✅ Self-Healing - Automatically fix failing tests instead of manual debugging
  • ✅ Comprehensive Coverage - Planner identifies edge cases you might miss
  • ✅ Maintainable Code - Generated code is clean, documented, and follows patterns
  • ✅ Multi-Framework - Generate tests for pytest, unittest, or Robot Framework


Installation Guide - Selenium MCP Server

Automatic Installation (Recommended)

Prerequisites

  • Python 3.8 or newer
  • Chrome browser installed

Quick Install

  1. Download or clone the project:

    git clone <your-repo-url>
    cd selenium-mcp-server
    
  2. Run the automatic installer:

    python3 install.py
    

The installer will:

  • Create a virtual environment
  • Install all dependencies
  • Configure MCP settings for Claude Desktop
  • Create Cursor configuration file
  • Test the installation

That's it! Your MCP server is ready to use.

Manual Installation

If you prefer manual setup:

1. Clone and Setup

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
# macOS/Linux:
source venv/bin/activate
# Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Test Installation

python3 test_mcp_direct.py

You should see "All tests passed! MCP server is ready for use."

Client Configuration

After Automatic Installation

Claude Desktop: The installer automatically configures Claude Desktop. Just restart the app.

Cursor: Copy the configuration from the generated cursor_mcp_config.json file to your Cursor settings:

  1. Open Cursor Settings (Cmd+, or Ctrl+,)
  2. Search for "MCP" or "Model Context Protocol"
  3. Copy the content from cursor_mcp_config.json
  4. Restart Cursor completely

Manual Configuration

If you installed manually, configure your client:

Cursor

  1. Open Cursor Settings (Cmd+, or Ctrl+,)
  2. Search for "MCP" or "Model Context Protocol"
  3. Add this configuration (replace paths with your actual paths):
{
  "mcpServers": {
    "selenium-mcp": {
      "command": "/absolute/path/to/selenium-mcp-server/venv/bin/python",
      "args": ["mcp_server.py"],
      "cwd": "/absolute/path/to/selenium-mcp-server"
    }
  }
}

Claude Desktop

  1. Find your Claude Desktop config file:

    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
    • Linux: ~/.config/Claude/claude_desktop_config.json
  2. Add the server configuration (replace paths with your actual paths):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "/absolute/path/to/selenium-mcp-server/venv/bin/python",
      "args": ["mcp_server.py"],
      "cwd": "/absolute/path/to/selenium-mcp-server"
    }
  }
}
  3. Restart Claude Desktop

VS Code

  1. Install the MCP extension from marketplace
  2. Open VS Code Settings (Cmd+, or Ctrl+,)
  3. Search for "MCP"
  4. Add to settings.json:
{
  "mcp.servers": {
    "selenium-mcp": {
      "command": "/path/to/selenium-mcp-server/venv/bin/python",
      "args": ["./mcp_server.py"],
      "cwd": "/path/to/selenium-mcp-server"
    }
  }
}
  5. Reload VS Code window

Important Notes

Using the Automatic Installer

  • Run python3 install.py from the project root directory
  • The installer handles all path configuration automatically
  • Works on macOS, Windows, and Linux
  • No need to manually edit paths or create virtual environments

Path Configuration (Manual Setup Only)

  • Always use absolute paths in the configuration
  • Windows users: use forward slashes or double backslashes in JSON
  • Point to the Python executable inside your virtual environment:
    • macOS/Linux: venv/bin/python
    • Windows: venv\Scripts\python.exe
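The path rules above can be sketched in Python with pathlib. The functions venv_python and server_entry and the example project roots are hypothetical. The output mirrors the manual configuration entries shown earlier and uses forward slashes on Windows so the value is safe to paste into JSON.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def venv_python(project_root, windows=False):
    """Return the absolute path of the virtual environment's Python executable."""
    path_type = PureWindowsPath if windows else PurePosixPath
    root = path_type(project_root)
    if not root.is_absolute():
        raise ValueError(f"project root must be an absolute path: {project_root!r}")
    if windows:
        # as_posix() turns backslashes into forward slashes.
        return (root / "venv" / "Scripts" / "python.exe").as_posix()
    return str(root / "venv" / "bin" / "python")


def server_entry(project_root, windows=False):
    """Build the "selenium-mcp" config entry for a manual installation."""
    path_type = PureWindowsPath if windows else PurePosixPath
    return {
        "command": venv_python(project_root, windows),
        "args": ["mcp_server.py"],
        "cwd": path_type(project_root).as_posix(),
    }


# Both project roots below are example locations, not real defaults.
posix_entry = server_entry("/opt/selenium-mcp-server")
windows_entry = server_entry(r"C:\tools\selenium-mcp-server", windows=True)
print(json.dumps({"mcpServers": {"selenium-mcp": windows_entry}}, indent=2))
```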

Verification

After installation, you should see:

  • Green connection status in your MCP client
  • 40+ tools available (27 browser automation tools plus the agent and verification tools)
  • Tools like navigate_to, click_element, generate_script etc.

Troubleshooting

macOS Chrome Issues

If Chrome fails to start:

xattr -cr ~/.wdm/drivers/chromedriver/

Connection Issues

  1. Check logs: tail -f mcp_server.log
  2. Verify paths are absolute
  3. Ensure virtual environment is activated
  4. Test manually: python3 mcp_server.py

Reset Browser Session

If the automation browser gets stuck, use the reset_automation_session tool in your MCP client.

Next Steps

Once installed, try:

  1. "Navigate to example.com"
  2. "Start recording, go to google.com, search for 'selenium', stop recording, generate pytest script"
  3. "Reset automation session" (if browser gets stuck)
