MCP server for browser automation with Set of Marks (SoM): AI agents can see and interact with web pages using numbered element IDs
BrowserControl
Give your AI agent real browser superpowers.
Ever wished Claude, Gemini, or your custom AI agent could actually browse the web? Not just fetch URLs, but truly see, click, type, and interact with any website like a human?
BrowserControl is an MCP server that gives your AI agent full browser access with a vision-first approach inspired by Google's AntiGravity IDE.
What Makes This Different
| Traditional Web Access | BrowserControl |
|---|---|
| Fetch static HTML | See the rendered page |
| Parse complex DOM | Point at numbered elements |
| Guess at selectors | Just say "click 5" |
| No JavaScript support | Full dynamic content |
| No login persistence | Persistent sessions |
| No debugging tools | Console, Network, Errors |
The Secret: Set of Marks (SoM)
Every screenshot comes annotated with numbered red boxes on interactive elements:
Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
[4] a - Pricing
[5] button - Get Started
Your agent sees the numbers and simply calls click(1) to sign in. No CSS selectors. No XPath. No guessing.
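On the agent side, that listing is easy to turn into a lookup table. The sketch below is a hypothetical helper, not part of BrowserControl: `parse_marks` and `find_mark` are made-up names, and the line format is assumed from the sample output above rather than from a published spec.

```python
from __future__ import annotations

import re

# Matches lines like "[1] button - Sign In" (format assumed from the sample listing).
MARK_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+-\s+(.*)$")


def parse_marks(listing: str) -> dict[int, tuple[str, str]]:
    """Map each numbered mark to its (tag, label) pair; non-matching lines are ignored."""
    marks: dict[int, tuple[str, str]] = {}
    for line in listing.splitlines():
        match = MARK_LINE.match(line.strip())
        if match:
            marks[int(match.group(1))] = (match.group(2), match.group(3).strip())
    return marks


def find_mark(marks: dict[int, tuple[str, str]], label: str) -> int | None:
    """Return the lowest-numbered mark whose label contains `label` (case-insensitive)."""
    needle = label.lower()
    for mark_id, (_tag, text) in sorted(marks.items()):
        if needle in text.lower():
            return mark_id
    return None


listing = """Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
[4] a - Pricing
[5] button - Get Started"""

marks = parse_marks(listing)
print(find_mark(marks, "sign in"))  # → 1, so the agent would call click(1)
```

The point of the numbered marks is that this lookup is plain string matching: no selector engine or vision model is involved once the listing exists.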
Why BrowserControl Beats Every Alternative
Head-to-Head Comparison
| Feature | BrowserControl | Playwright MCP | Stagehand | Browser-Use | AgentQL |
|---|---|---|---|---|---|
| Vision-First (SoM) | Numbered boxes | Text tree | ⚠️ AI vision | ⚠️ AI vision | Selectors |
| No Extra AI Calls | Zero | Parses tree | GPT-4V per action | Vision model | Query model |
| Developer Tools | 6 tools | None | None | None | None |
| Session Recording | Built-in | Manual | None | None | None |
| Persistent Sessions | Automatic | ⚠️ Manual setup | None | None | None |
| MCP Native | FastMCP | Official | Python SDK | ⚠️ Custom | REST API |
| Install Complexity | pip install | ⚠️ npx + config | Docker + setup | ⚠️ Docker | Cloud signup |
| Token Efficiency | Tiny IDs | ⚠️ Large tree | Full images | Full images | ⚠️ Query results |
| Cost per Action | $0 | $0 | ~$0.01-0.05 | ~$0.01-0.05 | API fees |
| Offline/Local | 100% local | Local | ⚠️ Needs LLM API | ⚠️ Needs LLM API | Cloud only |
Key Advantages
1. Token Efficiency = Faster + Cheaper
| Other tools send | BrowserControl sends |
|---|---|
| Full DOM tree (5,000+ tokens) | "click(5)" (3 tokens) |
| or a Base64 screenshot (10,000+ tokens) | Element ID + summary (100 tokens) |
Result: 50-100x fewer tokens per action (roughly 5,000-10,000 tokens down to about 100), which means faster responses and lower costs.
2. No Extra AI Calls Required
| Tool | AI Calls per Click |
|---|---|
| BrowserControl | 0 (just click(5)) |
| Stagehand | 1-2 (vision + action) |
| Browser-Use | 1-2 (vision + planning) |
| AgentQL | 1 (query interpretation) |
Result: No vision API costs, no rate limits, works offline.
3. Developer Tools No One Else Has
# Only BrowserControl can do this:
get_console_logs() # See browser errors
get_network_requests() # Monitor API calls
get_page_errors() # Catch JS exceptions
run_in_console(code) # Debug in real-time
inspect_element(5) # Get computed styles
get_page_performance() # Core Web Vitals
Other tools: Navigate, click, type... that's it.
4. Session Recording Built-In
start_recording() → browse around → stop_recording() → session_20260108.zip (view it with the Playwright trace viewer)
Other tools: No recording. Debug from memory.
5. True Persistence
| What Persists | BrowserControl | Others |
|---|---|---|
| Cookies | ✅ | ❌ |
| localStorage | ✅ | ❌ |
| Session tokens | ✅ | ❌ |
| Login state | ✅ | ❌ |
| Browser history | ✅ | ❌ |
Result: Log in once, stay logged in across sessions.
6. Simpler Mental Model
❌ Other tools:
"Find the button with class 'btn-primary' that contains text 'Submit' and is a descendant of form#contact-form..."
✅ BrowserControl:
"click(7)"
Real-World Performance
| Scenario | BrowserControl | Vision-Based Tools |
|---|---|---|
| Click a button | ~50ms | ~2-5 seconds |
| Fill a form (5 fields) | ~500ms | ~15-30 seconds |
| Navigate + act | ~1 second | ~5-10 seconds |
| Debug console errors | Instant | Not possible |
Cost Comparison (1000 actions/month)
| Tool | Monthly Cost |
|---|---|
| BrowserControl | $0 (fully local) |
| Stagehand (GPT-4V) | ~$30-50 |
| Browser-Use (Claude Vision) | ~$20-40 |
| AgentQL | ~$50+ (API fees) |
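The monthly figures follow from the per-action prices in the comparison table. This Python sketch is back-of-the-envelope arithmetic that multiplies an action count by the ~$0.01-0.05 per-action range quoted there; `monthly_cost_range` is a made-up helper, and the output is not a measured price for any specific tool.

```python
def monthly_cost_range(actions: int, low_per_action: float, high_per_action: float) -> tuple[float, float]:
    """Return the (low, high) monthly cost in dollars, rounded to cents."""
    return round(actions * low_per_action, 2), round(actions * high_per_action, 2)


# 1,000 actions at the ~$0.01-0.05 per-action range from the comparison table
print(monthly_cost_range(1000, 0.01, 0.05))  # → (10.0, 50.0)

# A fully local tool with no per-action API fee
print(monthly_cost_range(1000, 0.0, 0.0))  # → (0.0, 0.0)
```

The Stagehand and Browser-Use estimates above both fall inside this $10-50 band.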
Quick Start
Installation
# Install with pip
pip install browsercontrol
# Or with uv (recommended)
uv add browsercontrol
# That's it! Chromium is auto-installed on first run
Run the Server
# Using the CLI
browsercontrol
# Or as a module
python -m browsercontrol
# Or with FastMCP
fastmcp run browsercontrol.server:mcp
Connect to Claude Desktop
Add to claude_desktop_config.json (~/Library/Application Support/Claude/ on macOS, %APPDATA%\Claude\ on Windows, or ~/.config/Claude/ on Linux):
{
"mcpServers": {
"browsercontrol": {
"command": "browsercontrol"
}
}
}
Then just ask Claude:
"Go to GitHub and star the browsercontrol repo"
Claude will navigate, find the star button, and click it, showing you screenshots along the way!
Features
1. Set of Marks (SoM) - Vision-First Interaction
Every action returns an annotated screenshot with numbered elements. Your AI agent can:
- See the page exactly as a human would
- Identify clickable elements by number
- Act with simple commands like click(5)
2. Developer Tools
Built-in debugging tools for web development:
| Tool | Description |
|---|---|
| get_console_logs() | Capture browser console (errors, warnings, logs) |
| get_network_requests() | Monitor API calls, status codes, timing |
| get_page_errors() | See JavaScript exceptions and crashes |
| run_in_console(code) | Execute JS in the browser console |
| inspect_element(id) | Get computed styles, dimensions, properties |
| get_page_performance() | Page load time, Core Web Vitals, memory |
3. Session Recording
Record browser sessions for debugging and documentation:
| Tool | Description |
|---|---|
| start_recording() | Begin recording the session |
| stop_recording() | Save recording (Playwright trace format) |
| take_snapshot() | Save screenshot + HTML + URL |
| list_recordings() | View all saved sessions |
View recordings with:
npx playwright show-trace ~/.browsercontrol/recordings/session.zip
4. Persistent Sessions
- Cookies, localStorage, and session data persist across restarts
- Stay logged into websites
- Maintain shopping carts, preferences, etc.
Available Tools
Navigation
| Tool | Description |
|---|---|
| navigate_to(url) | Go to a URL |
| go_back() | Navigate back |
| go_forward() | Navigate forward |
| refresh_page() | Reload the page |
| scroll(direction, amount) | Scroll the page |
Interaction
| Tool | Description |
|---|---|
| click(element_id) | Click element by number |
| click_at(x, y) | Click at coordinates |
| type_text(element_id, text) | Type into input |
| press_key(key) | Press keyboard key (Enter, Tab, etc.) |
| hover(element_id) | Hover over element |
| scroll_to_element(element_id) | Scroll element into view |
| wait(seconds) | Wait for loading |
Forms
| Tool | Description |
|---|---|
| select_option(element_id, option) | Select dropdown option |
| check_checkbox(element_id) | Toggle checkbox |
Content
| Tool | Description |
|---|---|
| get_page_content() | Get page as markdown |
| get_text(element_id) | Get element text |
| get_page_info() | Get URL and title |
| run_javascript(script) | Execute JavaScript |
| screenshot(annotate, full_page) | Take screenshot |
Developer Tools
| Tool | Description |
|---|---|
| get_console_logs() | Browser console output |
| get_network_requests() | API calls and responses |
| get_page_errors() | JavaScript errors |
| run_in_console(code) | Execute JS in console |
| inspect_element(id) | Element styles/properties |
| get_page_performance() | Load times, Web Vitals |
Recording
| Tool | Description |
|---|---|
| start_recording() | Begin session recording |
| stop_recording() | Save recording |
| take_snapshot() | Save screenshot + HTML |
| list_recordings() | View saved sessions |
Configuration
Configure via environment variables:
| Variable | Default | Description |
|---|---|---|
| BROWSER_HEADLESS | true | Run without a visible window |
| BROWSER_VIEWPORT_WIDTH | 1280 | Viewport width in pixels |
| BROWSER_VIEWPORT_HEIGHT | 720 | Viewport height in pixels |
| BROWSER_TIMEOUT | 30000 | Navigation timeout (ms) |
| BROWSER_USER_DATA_DIR | ~/.browsercontrol/user_data | Browser profile path |
| BROWSER_EXTENSION_PATH | - | Path to browser extension |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
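To illustrate how these variables map onto typed settings, here is a hypothetical Python sketch that reads them with the defaults listed in the table. `BrowserSettings` and `load_settings` are made-up names; BrowserControl's own config.py may parse them differently (for example, in which strings it treats as true).

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BrowserSettings:
    headless: bool
    viewport_width: int
    viewport_height: int
    timeout_ms: int
    user_data_dir: Path
    extension_path: Path | None
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> BrowserSettings:
    """Read BrowserControl-style variables, falling back to the documented defaults."""
    # Assumption: "true", "1" or "yes" (any case) mean true; anything else means false.
    headless = env.get("BROWSER_HEADLESS", "true").strip().lower() in {"true", "1", "yes"}
    extension = env.get("BROWSER_EXTENSION_PATH")  # no default: unset means "no extension"
    return BrowserSettings(
        headless=headless,
        viewport_width=int(env.get("BROWSER_VIEWPORT_WIDTH", "1280")),
        viewport_height=int(env.get("BROWSER_VIEWPORT_HEIGHT", "720")),
        timeout_ms=int(env.get("BROWSER_TIMEOUT", "30000")),
        user_data_dir=Path(env.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")).expanduser(),
        extension_path=Path(extension).expanduser() if extension else None,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


# Same settings as: BROWSER_HEADLESS=false BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812
mobile = load_settings(
    {"BROWSER_HEADLESS": "false", "BROWSER_VIEWPORT_WIDTH": "375", "BROWSER_VIEWPORT_HEIGHT": "812"}
)
print(mobile.headless, mobile.viewport_width, mobile.viewport_height)  # → False 375 812
```

Passing the environment in as a mapping keeps the parsing testable without touching the real process environment.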
Examples
# Run with visible browser (for debugging)
BROWSER_HEADLESS=false browsercontrol
# Custom viewport for mobile testing
BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol
# Verbose logging
LOG_LEVEL=DEBUG browsercontrol
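When Claude Desktop launches the server instead of a shell, the same variables can be supplied through the env field of the server entry in claude_desktop_config.json (MCP client configs accept command, args, and env for each server). This Python sketch merges such an entry into an existing config dict; the helper name add_browsercontrol_server is hypothetical.

```python
from __future__ import annotations

import copy
import json


def add_browsercontrol_server(config: dict, env: dict[str, str] | None = None) -> dict:
    """Return a copy of a Claude Desktop config with a "browsercontrol" server entry added."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    server: dict = {"command": "browsercontrol"}
    if env:
        # Environment values are strings, so coerce anything else (e.g. 375 -> "375").
        server["env"] = {key: str(value) for key, value in env.items()}
    updated.setdefault("mcpServers", {})["browsercontrol"] = server
    return updated


config = add_browsercontrol_server({}, {"BROWSER_HEADLESS": "false", "LOG_LEVEL": "DEBUG"})
print(json.dumps(config, indent=2))
```

Existing entries under mcpServers are preserved; only the browsercontrol key is added or replaced.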
Examples
Example 1: Web Research
You: "Go to Wikipedia and find out when Python was created"
Claude: *navigates to wikipedia.org*
*types "Python programming language" in search*
*clicks search result*
*reads the page*
"Python was created by Guido van Rossum and first released in 1991."
Example 2: Debug a Web App
You: "Go to my app at localhost:3000 and check for any errors"
Claude: *navigates to localhost:3000*
*calls get_console_logs()*
"I found 2 errors:
[ERROR] Uncaught TypeError: Cannot read property 'map' of undefined
[ERROR] Failed to load resource: 404 /api/users"
*calls get_network_requests()*
"The /api/users endpoint returned 404. Check your API route."
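The diagnosis in this example boils down to filtering captured requests by HTTP status. Here is a small Python sketch of that step; the record shape (method, url, status keys) is an assumption for illustration, not the documented output format of get_network_requests().

```python
from __future__ import annotations

from collections.abc import Iterable


def failed_requests(requests: Iterable[dict], min_status: int = 400) -> list[str]:
    """Summarize requests whose HTTP status is at or above `min_status`."""
    return [
        f"{req['method']} {req['url']} -> {req['status']}"
        for req in requests
        # Requests without a status (e.g. never completed) are skipped here.
        if req.get("status") is not None and req["status"] >= min_status
    ]


captured = [
    {"method": "GET", "url": "http://localhost:3000/", "status": 200},
    {"method": "GET", "url": "http://localhost:3000/api/users", "status": 404},
    {"method": "GET", "url": "http://localhost:3000/app.js", "status": 304},
]
print(failed_requests(captured))  # → ['GET http://localhost:3000/api/users -> 404']
```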
Example 3: Automated Testing
You: "Test the login flow on my site. Start recording."
Claude: *calls start_recording()*
*navigates to site*
*types email in element 3*
*types password in element 4*
*clicks Login button (element 5)*
*verifies successful redirect*
*calls stop_recording()*
"Login test passed! Recording saved to session_20260108.zip"
Example 4: Form Automation
You: "Fill out the contact form on example.com"
Claude: *navigates to example.com/contact*
*types name in element 2*
*types email in element 3*
*types message in element 4*
*clicks Submit (element 5)*
"Form submitted successfully!"
Architecture
AI Agent (Claude/Gemini) → BrowserControl MCP Server → Browser (Chromium): the agent's "click(5)" becomes mouse.click()
Browser (Chromium) → BrowserControl MCP Server → AI Agent: the browser's [screenshot + element map] becomes an [annotated screenshot]
How It Works
- AI sends command: click(5)
- Server finds element: looks up element #5 from the last screenshot
- Browser acts: clicks at the element's coordinates
- Capture state: takes a new screenshot and detects elements
- Annotate: draws numbered boxes on interactive elements
- Return to AI: sends the annotated image + element list
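The key detail in these steps is that an ID only has meaning relative to the most recent capture. This self-contained Python sketch models that lookup with made-up names (Mark, MarkRegistry); it does not drive a real browser and is not BrowserControl's BrowserManager, but it shows how click(5) can resolve to coordinates and why a stale ID should be rejected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    tag: str
    label: str
    x: float  # left edge of the bounding box, in CSS pixels
    y: float  # top edge
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


class MarkRegistry:
    """Holds the element map from the latest annotated screenshot (hypothetical)."""

    def __init__(self) -> None:
        self._marks: dict[int, Mark] = {}

    def capture(self, elements: list[Mark]) -> str:
        # "Capture state" / "Annotate" / "Return to AI": renumber from 1 and build the listing.
        self._marks = {i: el for i, el in enumerate(elements, start=1)}
        lines = [f"Found {len(self._marks)} interactive elements:"]
        lines += [f"[{i}] {m.tag} - {m.label}" for i, m in self._marks.items()]
        return "\n".join(lines)

    def resolve_click(self, element_id: int) -> tuple[float, float]:
        # "Server finds element": only IDs from the latest capture are valid.
        if element_id not in self._marks:
            raise KeyError(f"element {element_id} is not in the latest screenshot")
        return self._marks[element_id].center()


registry = MarkRegistry()
print(registry.capture([
    Mark("button", "Sign In", 1100, 20, 80, 32),
    Mark("input", "Search...", 400, 20, 300, 32),
]))
print(registry.resolve_click(1))  # → (1140.0, 36.0)
```

In a real server, the returned coordinates would then be handed to something like Playwright's page.mouse.click(x, y), the mouse.click() call named in the architecture overview.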
Project Structure
browsercontrol/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── server.py # MCP server setup
├── browser.py # BrowserManager with SoM
├── config.py # Environment configuration
└── tools/
    ├── navigation.py # Navigation tools
    ├── interaction.py # Click, type, hover tools
    ├── forms.py # Form handling tools
    ├── content.py # Content extraction tools
    ├── devtools.py # Developer tools
    └── recording.py # Session recording tools
Troubleshooting
"Missing X server" Error
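This error appears when a visible browser (BROWSER_HEADLESS=false) is launched on a machine without a display. Keep BROWSER_HEADLESS=true (the default) or run under xvfb: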
xvfb-run browsercontrol
Browser Not Starting
Chromium auto-installs on first run. If it fails, install manually:
python -m playwright install chromium
Session Not Persisting
Check that BROWSER_USER_DATA_DIR is writable:
ls -la ~/.browsercontrol/
Connection Refused
Ensure no other instance is running:
pkill -f browsercontrol
browsercontrol
Contributing
Contributions are welcome! Some ideas:
- Multi-tab support
- Firefox/WebKit support
- DOM diffing (detect changes)
- Accessibility audit
- Mobile emulation presets
- Cookie import/export
# Clone and install
git clone https://github.com/adityasasidhar/browsercontrol
cd browsercontrol
uv sync
# Run tests
uv run pytest
# Run in development
uv run fastmcp dev browsercontrol/server.py
License
MIT License - Use it however you want.
Acknowledgments
- Inspired by the browser control capabilities in Google's AntiGravity IDE
- Built with FastMCP and Playwright
- Thanks to the MCP community for making AI-tool integration accessible
Built with ❤️ for the AI agent community.
Download files
File details: browsercontrol-0.1.1.tar.gz
- Download URL: browsercontrol-0.1.1.tar.gz
- Size: 543.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.15 (uv publish on Kali GNU/Linux 2025.4)
| Algorithm | Hash digest |
|---|---|
| SHA256 | bea31f9dfca8ddc6d566a5a34e5b6ae0feb51673cbef2a248288f2a2a26fd773 |
| MD5 | ee1e165b11676736e86a1f36364137d5 |
| BLAKE2b-256 | 49fc4cb2da56fef0ee6c0eff1b1287e07d462fd579648426c99cb3eaed4dcd0d |
File details: browsercontrol-0.1.1-py3-none-any.whl
- Download URL: browsercontrol-0.1.1-py3-none-any.whl
- Size: 27.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.15 (uv publish on Kali GNU/Linux 2025.4)
| Algorithm | Hash digest |
|---|---|
| SHA256 | b03ecea0f5b4398b71bfc16f77339966bb7070f61f3192d53c1bec7ea630ef16 |
| MD5 | 4f3d040adcc00dfdbfb69a0005ac9b75 |
| BLAKE2b-256 | b2afde6b905064c662cd466e9a1bc785e119561bc15b2243007e8a36b21a2463 |