MCP server for browser automation with Set of Marks (SoM) - AI agents can see and interact with web pages using numbered element IDs
BrowserControl
Give your AI agent real browser superpowers.
Vision-first browser automation for any MCP-compatible AI agent.
Quick Start • How It Works • Tools • Configuration • Examples • Contributing
Ever wished Claude or Gemini could actually browse the web? Not just fetch URLs, but truly see, click, type, and interact with any website like a human?
BrowserControl is an MCP server that gives your AI agent full browser access with a vision-first approach—no CSS selectors, no XPath, no guessing. Just point at numbers.
✨ What Makes This Different
🎯 The Secret: Set of Marks (SoM)
Every screenshot comes annotated with numbered red boxes on interactive elements:
Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
[4] a - Pricing
[5] button - Get Started
Your agent sees the numbers and simply calls click(1) to sign in. No CSS selectors. No XPath. No guessing.
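To make the idea concrete, here is a minimal sketch of how a client could turn a listing like the one above into a lookup table and pick an ID by its label. It assumes the text output keeps the `[id] tag - label` shape shown in this README; `Element`, `parse_element_list`, and `find_by_label` are hypothetical helper names, not part of BrowserControl.

```python
import re
from typing import NamedTuple, Optional


class Element(NamedTuple):
    element_id: int
    tag: str
    label: str


# Matches lines such as "[1] button - Sign In" (format assumed from the README listing)
_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+-\s+(.*)$")

SAMPLE = """\
Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
[4] a - Pricing
[5] button - Get Started
"""


def parse_element_list(text: str) -> dict[int, Element]:
    """Build an {id: Element} map, skipping lines that are not element rows."""
    elements: dict[int, Element] = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            element_id = int(match.group(1))
            elements[element_id] = Element(
                element_id, match.group(2), match.group(3).strip()
            )
    return elements


def find_by_label(elements: dict[int, Element], needle: str) -> Optional[int]:
    """Return the ID of the first element whose label contains `needle`."""
    needle = needle.lower()
    for element in elements.values():
        if needle in element.label.lower():
            return element.element_id
    return None
```

With this in place, `find_by_label(parse_element_list(SAMPLE), "sign in")` yields the ID an agent would pass to `click`.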
🚀 Quick Start
Installation
# Using pip
pip install browsercontrol
# Or with uv (recommended for faster installs)
uv add browsercontrol
# Chromium is auto-installed on first run—no extra steps needed!
Run the Server
# Using the CLI
browsercontrol
# Or as a Python module
python -m browsercontrol
# Or with FastMCP
fastmcp run browsercontrol.server:mcp
Connect to Your AI Agent
BrowserControl works with any MCP-compatible AI agent or IDE. Choose your platform:
Claude Desktop
Add to your Claude configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "browsercontrol": {
      "command": "browsercontrol"
    }
  }
}
Restart Claude Desktop, then ask:
"Go to GitHub and star the browsercontrol repo"
Gemini CLI / Google AI Studio
If using the Gemini CLI or Google AI Studio with MCP support:
# Set up MCP configuration
export MCP_SERVERS='{"browsercontrol": {"command": "browsercontrol"}}'
# Or add to your Gemini config file
For Google AI Studio, configure in the MCP settings panel.
🔧 Cline (VS Code Extension)
- Install the Cline extension
- Open Cline settings (gear icon)
- Navigate to "MCP Servers"
- Add a new server:
{
  "browsercontrol": {
    "command": "browsercontrol"
  }
}
🤖 Continue.dev (VS Code/JetBrains)
Add to your Continue configuration (~/.continue/config.json):
{
  "mcpServers": [
    {
      "name": "browsercontrol",
      "command": "browsercontrol"
    }
  ]
}
🎯 Cursor IDE
- Open Cursor Settings
- Navigate to "Features" → "Model Context Protocol"
- Add server configuration:
{
  "browsercontrol": {
    "command": "browsercontrol"
  }
}
🔌 Zed Editor
Add to your Zed settings (~/.config/zed/settings.json):
{
  "context_servers": {
    "browsercontrol": {
      "command": {
        "path": "browsercontrol"
      }
    }
  }
}
🐍 Custom Python Integration
Use the MCP Python SDK to integrate BrowserControl into your own agent:
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch BrowserControl as a subprocess and talk to it over stdio
server_params = StdioServerParameters(
    command="browsercontrol",
    args=[],
)


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Perform the MCP handshake
            await session.initialize()

            # List available tools
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call a tool
            result = await session.call_tool("navigate_to", {
                "url": "https://github.com"
            })
            print(result.content)


asyncio.run(main())
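A tool result from the MCP Python SDK carries a list of content blocks. The sketch below separates text blocks from image blocks so an agent can keep the element list and save the screenshot. It assumes BrowserControl returns the annotated screenshot as an image block with base64 data and the element list as a text block; `split_tool_result` is a hypothetical helper name.

```python
import base64


def split_tool_result(content) -> tuple[str, list[bytes]]:
    """Split MCP content blocks into joined text and decoded image bytes.

    Blocks are read by attribute (`type`, `text`, `data`), which matches the
    SDK's text and image content types; other block types are ignored.
    """
    texts: list[str] = []
    images: list[bytes] = []
    for block in content:
        kind = getattr(block, "type", None)
        if kind == "text":
            texts.append(block.text)
        elif kind == "image":
            # Image payloads are base64-encoded strings
            images.append(base64.b64decode(block.data))
    return "\n".join(texts), images
```

Inside a session, this could be used as `text, images = split_tool_result(result.content)`, after which each entry in `images` can be written to a file.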
🚀 Using with uv or pipx
If you prefer not to install BrowserControl globally, let uvx or pipx launch it. With uvx:
{
  "mcpServers": {
    "browsercontrol": {
      "command": "uvx",
      "args": ["browsercontrol"]
    }
  }
}
Or with pipx:
{
  "mcpServers": {
    "browsercontrol": {
      "command": "pipx",
      "args": ["run", "browsercontrol"]
    }
  }
}
🔧 Advanced Configuration
You can pass environment variables to customize BrowserControl:
{
  "mcpServers": {
    "browsercontrol": {
      "command": "browsercontrol",
      "env": {
        "BROWSER_HEADLESS": "false",
        "BROWSER_VIEWPORT_WIDTH": "1920",
        "BROWSER_VIEWPORT_HEIGHT": "1080",
        "LOG_LEVEL": "DEBUG"
      }
    }
  }
}
See Configuration for all available options.
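If you would rather script the change than edit JSON by hand, the following sketch merges a server entry into a config file that uses the `mcpServers` object layout shown above, leaving existing entries in place. `add_mcp_server` is a hypothetical helper, not something BrowserControl ships, and it only suits clients whose config uses that layout.

```python
import json
from pathlib import Path
from typing import Mapping, Optional, Sequence


def add_mcp_server(
    config_path: Path,
    name: str,
    command: str,
    args: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Insert or replace one entry under "mcpServers" and write the file back."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Keep whatever servers are already configured
    servers = config.setdefault("mcpServers", {})
    entry: dict = {"command": command}
    if args:
        entry["args"] = list(args)
    if env:
        entry["env"] = dict(env)
    servers[name] = entry

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_mcp_server(path, "browsercontrol", "browsercontrol", env={"BROWSER_HEADLESS": "false"})` writes the equivalent of the advanced configuration block, where `path` points at your client's config file.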
🥊 Head-to-Head Comparison
| Feature | BrowserControl | Playwright MCP | Stagehand | Browser-Use | AgentQL |
|---|---|---|---|---|---|
| Vision-First (SoM) | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
| Multi-Tab Support | ✅ Full control | ⚠️ Implicit | ⚠️ Implicit | ⚠️ Basic | ❌ None |
| Cookie Management | ✅ Direct tools | ⚠️ JS only | ⚠️ JS only | ⚠️ Basic | ❌ None |
| File Uploads | ✅ Native tool | ⚠️ Manual | ❌ No | ❌ No | ❌ No |
| Developer Tools | ✅ 8 tools | ❌ None | ❌ None | ❌ None | ❌ None |
| Session Recording | ✅ Built-in | ⚠️ Manual | ❌ None | ❌ None | ❌ None |
| Persistent Sessions | ✅ Automatic | ⚠️ Manual | ❌ None | ❌ None | ❌ None |
| Token Efficiency | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
| 100% Local/Offline | ✅ Yes | ✅ Yes | ❌ Needs LLM API | ❌ Needs LLM API | ❌ Cloud only |
| Monthly Cost (1k actions) | $0 | $0 | ~$30-50 | ~$20-40 | ~$50+ |
💪 Key Advantages
1. Multi-Tab Orchestration
Other tools can get "lost" when a new window opens. BrowserControl exposes tabs directly:
- list_tabs(): See every open page, title, and URL
- switch_tab(index): Multitask between different sites
- create_tab(url): Open references or parallel workflows
2. Session & Cookie Management
Stop fighting with login forms. Inject or inspect session state directly:
- set_cookie(): Log in instantly by injecting an auth token
- get_cookies(): Debug session issues or export state
- clear_cookies(): Fresh start without clearing the whole profile
3. Reliable File Uploads
Most AI agents fail when they hit an <input type="file">. BrowserControl uses native browser engine hooks:
- upload_file(element_id, file_path): Just point at the button and the local file
4. Developer Tools Suite
Debug like a pro with tools no one else provides:
get_console_logs() # See browser errors
get_network_requests() # Monitor API calls
get_page_errors() # Catch JS exceptions
run_in_console(code) # Debug in real-time
inspect_element(5) # Get computed styles
get_page_performance() # Core Web Vitals
5. Session Recording
start_recording() → Browse around → stop_recording()
↓
session_20260202.zip
(View with Playwright trace viewer)
6. Dynamic Viewport Control
Test responsive designs or emulate mobile screens on the fly:
- set_viewport(width, height): Change resolution without restarting
7. True Persistence
| What Persists | BrowserControl | Others |
|---|---|---|
| Cookies | ✅ | ❌ |
| localStorage | ✅ | ❌ |
| Session tokens | ✅ | ❌ |
| Login state | ✅ | ❌ |
| Browser history | ✅ | ❌ |
Result: Log in once, stay logged in across sessions.
🛠️ Available Tools
Navigation
| Tool | Description |
|---|---|
| navigate_to(url) | Go to a URL |
| go_back() | Navigate back |
| go_forward() | Navigate forward |
| refresh_page() | Reload the page |
| scroll(direction, amount) | Scroll up/down/left/right |
Interaction
| Tool | Description |
|---|---|
| click(element_id) | Click element by number |
| click_at(x, y) | Click at coordinates |
| type_text(element_id, text) | Type into input field |
| press_key(key) | Press keyboard key (Enter, Tab, etc.) |
| hover(element_id) | Hover over element |
| scroll_to_element(element_id) | Scroll element into view |
| wait(seconds) | Wait for page loading |
Tab Management
| Tool | Description |
|---|---|
| create_tab(url) | Open a new browser tab |
| switch_tab(index) | Switch to a tab by its index |
| close_tab(index) | Close a specific tab |
| list_tabs() | List all open tabs and URLs |
Forms
| Tool | Description |
|---|---|
| select_option(element_id, option) | Select dropdown option |
| check_checkbox(element_id) | Toggle checkbox |
| upload_file(element_id, file_path) | Upload file to input |
Content Extraction
| Tool | Description |
|---|---|
| get_page_content() | Get page as markdown |
| get_text(element_id) | Get element text |
| get_page_info() | Get URL and title |
| run_javascript(script) | Execute JavaScript |
| screenshot(annotate, full_page) | Take screenshot |
Developer Tools
| Tool | Description |
|---|---|
| get_console_logs() | Browser console output |
| get_network_requests() | API calls and responses |
| get_page_errors() | JavaScript errors |
| run_in_console(code) | Execute JS in console |
| inspect_element(id) | Element styles/properties |
| get_cookies() | List browser cookies |
| set_cookie(name, value, ...) | Set a cookie |
| delete_cookie(name) | Remove a cookie |
| clear_cookies() | Clear all cookies |
| set_viewport(width, height) | Change window size |
| get_page_performance() | Load times, Web Vitals |
Recording
| Tool | Description |
|---|---|
| start_recording() | Begin session recording |
| stop_recording() | Save recording |
| take_snapshot() | Save screenshot + HTML |
| list_recordings() | View saved sessions |
⚙️ Configuration
Configure via environment variables:
| Variable | Default | Description |
|---|---|---|
| BROWSER_HEADLESS | true | Run without visible window |
| BROWSER_VIEWPORT_WIDTH | 1280 | Viewport width in pixels |
| BROWSER_VIEWPORT_HEIGHT | 720 | Viewport height in pixels |
| BROWSER_TIMEOUT | 30000 | Navigation timeout (ms) |
| BROWSER_USER_DATA_DIR | ~/.browsercontrol/user_data | Browser profile path |
| BROWSER_EXTENSION_PATH | (none) | Path to browser extension |
| LOG_LEVEL | INFO | Logging verbosity |
Examples:
# Run with visible browser (for debugging)
BROWSER_HEADLESS=false browsercontrol
# Mobile viewport emulation
BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol
# Verbose logging
LOG_LEVEL=DEBUG browsercontrol
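The variables and defaults in the table can be mirrored in your own launcher or test harness. The sketch below reads them into a typed settings object; it is an illustration only and not the project's actual config.py. `BrowserSettings`, `load_settings`, and the set of strings accepted as "true" are assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrowserSettings:
    headless: bool
    viewport_width: int
    viewport_height: int
    timeout_ms: int
    user_data_dir: Path
    extension_path: Optional[Path]
    log_level: str


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; the real server may accept a different set
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> BrowserSettings:
    """Read the documented variables, falling back to the documented defaults."""
    extension = env.get("BROWSER_EXTENSION_PATH")
    return BrowserSettings(
        headless=_as_bool(env.get("BROWSER_HEADLESS", "true")),
        viewport_width=int(env.get("BROWSER_VIEWPORT_WIDTH", "1280")),
        viewport_height=int(env.get("BROWSER_VIEWPORT_HEIGHT", "720")),
        timeout_ms=int(env.get("BROWSER_TIMEOUT", "30000")),
        user_data_dir=Path(
            env.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
        ).expanduser(),
        extension_path=Path(extension).expanduser() if extension else None,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without touching the real environment.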
📖 Examples
Web Research
You: "Go to Wikipedia and find out when Python was created"
Claude: *navigates to wikipedia.org*
*types "Python programming language" in search*
*clicks search result*
*reads the page*
"Python was created by Guido van Rossum and first released in 1991."
Debug a Web App
You: "Go to my app at localhost:3000 and check for any errors"
Claude: *navigates to localhost:3000*
*calls get_console_logs()*
"I found 2 errors:
[ERROR] Uncaught TypeError: Cannot read property 'map' of undefined
[ERROR] Failed to load resource: 404 /api/users"
*calls get_network_requests()*
"The /api/users endpoint returned 404. Check your API route."
Automated Testing with Recording
You: "Test the login flow on my site. Start recording."
Claude: *calls start_recording()*
*navigates to site*
*types email in element 3*
*types password in element 4*
*clicks Login button (element 5)*
*verifies successful redirect*
*calls stop_recording()*
"Login test passed! Recording saved to session_20260202.zip"
Form Automation
You: "Fill out the contact form on example.com"
Claude: *navigates to example.com/contact*
*types name in element 2*
*types email in element 3*
*types message in element 4*
*clicks Submit (element 5)*
"Form submitted successfully!"
🏗️ Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐
│ AI Agent │────▶│ BrowserControl │────▶│ Browser │
│ (Claude/Gemini) │◀────│ MCP Server │◀────│ (Chromium) │
└─────────────────┘ └──────────────────┘ └─────────────┘
│ │ │
│ "click(5)" │ mouse.click() │
│◀───────────────────────│◀─────────────────────│
│ [annotated │ [screenshot + │
│ screenshot] │ element map] │
How It Works
- AI sends command: click(5)
- Server finds element: Looks up element #5 from the last screenshot
- Browser acts: Clicks at the element's coordinates
- Capture state: Takes a new screenshot and detects elements
- Annotate: Draws numbered boxes on interactive elements
- Return to AI: Sends annotated image + element list
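The lookup step can be pictured as a small map from element numbers to bounding boxes that is rebuilt after every screenshot. The sketch below is a simplified model of that idea, not BrowserControl's actual code; `Box`, `ElementMap`, and clicking at the box center are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of an interactive element, in viewport pixels."""
    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


class ElementMap:
    """Maps the numbers drawn on the last screenshot to element boxes."""

    def __init__(self) -> None:
        self._boxes: dict[int, Box] = {}

    def update(self, boxes: list[Box]) -> None:
        # IDs start at 1 and are reassigned on every new screenshot
        self._boxes = {i: box for i, box in enumerate(boxes, start=1)}

    def resolve(self, element_id: int) -> tuple[float, float]:
        """Return the point to click for `element_id`."""
        try:
            box = self._boxes[element_id]
        except KeyError:
            raise ValueError(
                f"Unknown element {element_id}; take a new screenshot first"
            ) from None
        return box.center()
```

Because the map is replaced on every capture, an ID is only meaningful relative to the most recent annotated screenshot, which is why each action returns a fresh one.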
📁 Project Structure
browsercontrol/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── server.py # MCP server setup
├── browser.py # BrowserManager with SoM
├── config.py # Environment configuration
└── tools/
├── navigation.py # Navigation tools
├── interaction.py # Click, type, hover tools
├── forms.py # Form handling tools
├── content.py # Content extraction tools
├── devtools.py # Developer tools
├── recording.py # Session recording tools
└── tabs.py # Tab management tools
🔧 Troubleshooting
"Missing X server" Error
Set BROWSER_HEADLESS=true or run with xvfb:
xvfb-run browsercontrol
Browser Not Starting
Chromium auto-installs on first run. If it fails, install manually:
python -m playwright install chromium
Session Not Persisting
Check that BROWSER_USER_DATA_DIR is writable:
ls -la ~/.browsercontrol/
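The same check can be done from Python, which is handy in a setup script. This is a minimal sketch; `check_profile_dir` is a hypothetical helper, and the path you pass should be whatever BROWSER_USER_DATA_DIR is set to (or the default shown in the Configuration table).

```python
import os
from pathlib import Path


def check_profile_dir(path: str) -> str:
    """Report whether the browser profile directory can be written to."""
    profile = Path(path).expanduser()
    if not profile.exists():
        return "missing"
    if not profile.is_dir():
        return "not a directory"
    # Creating files in a directory needs both write and search permission
    if not os.access(profile, os.W_OK | os.X_OK):
        return "not writable"
    return "ok"
```

Calling `check_profile_dir("~/.browsercontrol/user_data")` returns one of the four status strings, which narrows down whether the directory is absent or has a permissions problem.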
Connection Refused
Ensure no other instance is running:
pkill -f browsercontrol
browsercontrol
View Session Recordings
Open recordings in the Playwright trace viewer:
npx playwright show-trace ~/.browsercontrol/recordings/session.zip
🤝 Contributing
Contributions are welcome! Check out our Contributing Guide for details.
Ideas for contributions:
- Firefox/WebKit support
- DOM diffing (detect changes)
- Accessibility audit tools
- Mobile emulation presets
- Cookie import/export files
# Clone and install
git clone https://github.com/adityasasidhar/browsercontrol
cd browsercontrol
uv sync
# Run tests
uv run pytest
# Run in development
uv run fastmcp dev browsercontrol/server.py
📄 License
MIT License — Use it however you want.
🙏 Acknowledgments
- Vision-first approach inspired by Google's AntiGravity IDE
- Built with FastMCP and Playwright
- Thanks to the MCP community for making AI-tool integration accessible
Built for AI agents that need to see the web.