Visual browser MCP server with Set-of-Mark labeling and humanized interactions

🌐 atlas-browser-mcp

Visual web browsing for AI agents via the Model Context Protocol (MCP).

✨ Features

  • 📸 Visual-First: Navigate the web through screenshots, not DOM parsing
  • 🏷️ Set-of-Mark: Interactive elements labeled with clickable [0], [1], [2]... markers
  • 🎭 Humanized: Bezier curve mouse movements, natural typing rhythms
  • 🧩 CAPTCHA-Ready: Multi-click support for image selection challenges
  • 🛡️ Anti-Detection: Built-in measures to avoid bot detection
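The README does not say how its Bezier mouse movements are generated. As a hedged illustration of the general technique only, the sketch below samples a cubic Bezier curve between two screen points, with the two inner control points jittered at random so no two paths are identical. The function name `bezier_path` and the jitter scheme are assumptions, not the package's code.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, steps: int = 25,
                rng: Optional[random.Random] = None) -> List[Point]:
    """Sample `steps + 1` points on a cubic Bezier curve from start to end.

    Hypothetical helper; the jitter scheme below is an assumption.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = rng if rng is not None else random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    # Jitter the two inner control points (placed near 1/3 and 2/3 of the way)
    # so every path bends a little differently, like a hand-guided mouse.
    spread = 0.25 * max(abs(dx), abs(dy), 1.0)
    x1 = x0 + dx / 3 + rng.uniform(-spread, spread)
    y1 = y0 + dy / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * dx / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * dy / 3 + rng.uniform(-spread, spread)

    points: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Bernstein form: B(t) = u^3*P0 + 3u^2*t*P1 + 3u*t^2*P2 + t^3*P3
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points


path = bezier_path((100, 200), (640, 360), steps=20, rng=random.Random(42))
print(path[0], path[-1], len(path))
```

The curve always starts and ends exactly on the given points; only the middle bends. Each point could then be replayed with Playwright's `page.mouse.move(x, y)` and a short pause between moves.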

🚀 Quick Start

Installation

pip install atlas-browser-mcp
playwright install chromium

Use with Claude Desktop

Add to your Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}

Then ask Claude:

"Navigate to https://news.ycombinator.com and tell me the top 3 stories"

🛠️ Available Tools

Tool          Description
navigate      Go to URL, returns labeled screenshot
screenshot    Capture current page with labels
click         Click element by label ID [N]
multi_click   Click multiple elements (for CAPTCHA)
type          Type text, optionally press Enter
scroll        Scroll page up or down
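An MCP client such as Claude Desktop launches `atlas-browser-mcp` as a subprocess and invokes these tools with JSON-RPC 2.0 `tools/call` requests over stdio. The sketch below only builds such request messages; a real session first performs the MCP `initialize` handshake, which is omitted here. Argument names are taken from the usage examples in this README (`url`, `label_id`, `label_ids`, `text`, `submit`); the arguments of `scroll` are not documented, so it is left out. The helper `tool_call` is hypothetical.

```python
import json
from itertools import count
from typing import Any, Dict

# MCP requires request IDs not to be reused within a session.
_request_ids = count(1)


def tool_call(name: str, **arguments: Any) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


print(tool_call("navigate", url="https://news.ycombinator.com"))
print(tool_call("click", label_id=3))
print(tool_call("type", text="MCP protocol", submit=True))
print(tool_call("multi_click", label_ids=[2, 5, 8]))
```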

📖 Usage Examples

Basic Navigation

User: Go to google.com
AI: [calls navigate(url="https://google.com")]
AI: I see the Google homepage. The search box is labeled [3].

User: Search for "MCP protocol"
AI: [calls click(label_id=3)]
AI: [calls type(text="MCP protocol", submit=true)]
AI: Here are the search results...
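The README says typing follows natural rhythms but does not describe how. One common approach, sketched here as an assumption rather than the package's method, is to give every keystroke its own randomized delay, with slightly longer pauses between words. The name `typing_schedule` and its default timings are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def typing_schedule(text: str, base_ms: float = 90.0, jitter_ms: float = 40.0,
                    rng: Optional[random.Random] = None) -> List[Tuple[str, float]]:
    """Pair each character of `text` with a randomized delay (ms) to wait before typing it.

    Hypothetical helper; the default timings are illustrative assumptions.
    """
    rng = rng if rng is not None else random.Random()
    schedule: List[Tuple[str, float]] = []
    for ch in text:
        delay = base_ms + rng.uniform(-jitter_ms, jitter_ms)
        if ch == " ":
            # People tend to pause a little longer between words.
            delay += rng.uniform(20.0, 80.0)
        schedule.append((ch, max(delay, 10.0)))  # never drop below 10 ms
    return schedule


for ch, delay in typing_schedule("MCP protocol", rng=random.Random(7)):
    print(repr(ch), round(delay))
```

Playwright's `keyboard.type(text, delay=...)` applies one fixed delay to every key; a per-key schedule like this would instead be replayed one character at a time with `keyboard.type(ch)` and a sleep in between.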

CAPTCHA Handling

User: Select all images with traffic lights
AI: [Looking at the CAPTCHA grid]
AI: I can see traffic lights in images [2], [5], and [8].
AI: [calls multi_click(label_ids=[2, 5, 8])]
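The README does not specify how `multi_click` sequences its clicks. A plausible design, sketched here under that assumption, is to check every requested label against the labels on the current screenshot before clicking anything, so a mistaken ID fails fast instead of leaving the challenge half-answered, and then to insert randomized pauses between clicks. `plan_multi_click` and its pause range are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Tuple


def plan_multi_click(label_ids: Iterable[int], known_labels: Iterable[int],
                     min_pause_ms: float = 300.0, max_pause_ms: float = 900.0,
                     rng: Optional[random.Random] = None) -> List[Tuple[int, float]]:
    """Return (label_id, pause_before_ms) pairs, or raise if any label is unknown.

    Hypothetical helper; the pause range is an illustrative assumption.
    """
    rng = rng if rng is not None else random.Random()
    requested = list(dict.fromkeys(label_ids))  # drop repeats, keep order
    known = set(known_labels)
    missing = [label for label in requested if label not in known]
    if missing:
        # Fail before clicking anything, so the challenge isn't half-answered.
        raise ValueError(f"unknown label ids: {missing}")
    # No pause before the first click; randomized gaps before the rest.
    return [(label, 0.0 if n == 0 else rng.uniform(min_pause_ms, max_pause_ms))
            for n, label in enumerate(requested)]


print(plan_multi_click([2, 5, 8], known_labels=range(9), rng=random.Random(3)))
```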

🔧 Configuration

Headless Mode

For servers without a display:

from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser(
    headless=True,   # No visible browser window
    humanize=False   # Faster, less human-like
)
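If the same script runs both on a desktop and on a display-less server, you could choose the mode at runtime. The helper below is a hypothetical sketch, not part of the package: on Linux it treats a missing `DISPLAY` (X11) and `WAYLAND_DISPLAY` (Wayland) as "no display", and it assumes macOS and Windows desktop sessions always have one.

```python
import os
import sys
from typing import Mapping, Optional


def should_run_headless(env: Optional[Mapping[str, str]] = None,
                        platform: str = sys.platform) -> bool:
    """Guess whether the browser must run headless (no display available).

    Hypothetical helper; the platform rules are simplifying assumptions.
    """
    env = os.environ if env is None else env
    if platform.startswith("linux"):
        # X11 sessions set DISPLAY; Wayland sessions set WAYLAND_DISPLAY.
        return not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    # Desktop macOS and Windows sessions normally have a display.
    return False


print(should_run_headless())
```

The result can then be passed straight through, e.g. `VisualBrowser(headless=should_run_headless())`.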

Custom Viewport

browser = VisualBrowser()
browser.VIEWPORT = {"width": 1920, "height": 1080}

🏗️ How It Works

  1. Navigate: Browser loads the page
  2. Inject SoM: JavaScript labels all interactive elements
  3. Screenshot: Capture the labeled page
  4. AI Sees: The screenshot shows [0], [1], [2]... on buttons, links, inputs
  5. AI Acts: "Click [5]" → Browser clicks the element at that position
  6. Repeat: New screenshot with updated labels
┌─────────────────────────────────────┐
│  [0] Logo    [1] Search   [2] Menu  │
│                                     │
│  [3] Article Title                  │
│  [4] Read More                      │
│                                     │
│  [5] Subscribe    [6] Share         │
└─────────────────────────────────────┘
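The package performs steps 2 and 5 with JavaScript injected into the page. The Python sketch below only illustrates the bookkeeping, assuming each element's tag, ARIA role, and bounding box have already been extracted: visible interactive elements are numbered [0], [1], ... in DOM order, and each label maps to the center of its box, which is where a click would land. The `Element` type, the tag and role lists, and DOM-order numbering are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed criteria for "interactive"; the package's real selector list may differ.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
INTERACTIVE_ROLES = {"button", "link", "checkbox", "tab", "menuitem"}


@dataclass
class Element:
    tag: str      # lowercase tag name, e.g. "button"
    role: str     # ARIA role attribute, "" if absent
    x: float      # bounding box in viewport CSS pixels
    y: float
    width: float
    height: float


def assign_labels(elements: List[Element]) -> Dict[int, Tuple[float, float]]:
    """Number visible interactive elements [0], [1], ... and map each label to a click point."""
    labels: Dict[int, Tuple[float, float]] = {}
    for el in elements:  # DOM order
        interactive = el.tag in INTERACTIVE_TAGS or el.role in INTERACTIVE_ROLES
        visible = el.width > 0 and el.height > 0
        if interactive and visible:
            # Clicking the center of the box is the simplest way to hit the element.
            labels[len(labels)] = (el.x + el.width / 2, el.y + el.height / 2)
    return labels


page = [
    Element("a", "", 10, 10, 80, 20),         # logo link   -> [0]
    Element("div", "", 0, 0, 1280, 720),      # layout only -> skipped
    Element("input", "", 120, 10, 300, 24),   # search box  -> [1]
    Element("button", "", 0, 900, 0, 0),      # collapsed   -> skipped
]
labels = assign_labels(page)
print(labels)
```

To act on "Click [1]", the browser would click the point `labels[1]`, for instance with Playwright's `page.mouse.click(x, y)`.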

🤝 Integration

With Cline (VS Code)

{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}

Programmatic Use

from atlas_browser_mcp.browser import VisualBrowser

browser = VisualBrowser()

# Navigate
result = browser.execute("navigate", url="https://example.com")
print(f"Page title: {result.data['title']}")
print(f"Found {result.data['element_count']} interactive elements")

# Click element [0]
result = browser.execute("click", label_id=0)

# Type in focused field
result = browser.execute("type", text="Hello world", submit=True)

# Cleanup
browser.execute("close")
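In the example above, `close` is reached only if every earlier step succeeds; an exception would leave Chromium running. A small context manager guarantees cleanup. The name `browser_session` is hypothetical and not part of the package; it relies only on the `execute(action, **kwargs)` call shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def browser_session(browser: Any) -> Iterator[Any]:
    """Yield `browser`, then always send it the "close" action.

    Works with any object exposing `execute(action, **kwargs)`, such as the
    VisualBrowser used above. Hypothetical helper, not part of the package.
    """
    try:
        yield browser
    finally:
        # Runs on normal exit and when a navigate/click/type step raises.
        browser.execute("close")
```

With it, the example becomes `with browser_session(VisualBrowser()) as browser:` followed by the same `execute` calls, minus the explicit `close`.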

📋 Requirements

  • Python 3.10+
  • Playwright with Chromium

🐛 Troubleshooting

"Playwright not installed"

pip install playwright
playwright install chromium

"Browser closed unexpectedly"

Try running with headless=False to see what's happening:

browser = VisualBrowser(headless=False)

Elements not being detected

Some dynamic pages need more time to render. The browser waits 1.5 s after navigation; complex single-page applications (SPAs) that render content later may need longer.
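The README does not document a setting for that 1.5 s wait. If you drive the browser programmatically, one workaround, sketched here as an assumption rather than a package feature, is to poll until the number of detected elements stops changing. `wait_until_stable` is a hypothetical helper that takes any zero-argument counting function.

```python
import time
from typing import Callable


def wait_until_stable(count_elements: Callable[[], int], interval_s: float = 0.5,
                      stable_rounds: int = 2, timeout_s: float = 10.0) -> int:
    """Poll until the element count stays unchanged for `stable_rounds` polls in a row.

    Returns the last count seen; stops early (returning the latest count) at `timeout_s`.
    Hypothetical helper, not part of the package.
    """
    deadline = time.monotonic() + timeout_s
    last = count_elements()
    unchanged = 0
    while unchanged < stable_rounds and time.monotonic() < deadline:
        time.sleep(interval_s)
        current = count_elements()
        unchanged = unchanged + 1 if current == last else 0
        last = current
    return last


# Simulated page that keeps rendering for a few polls, then settles.
readings = iter([3, 7, 12, 12, 12])
print(wait_until_stable(lambda: next(readings), interval_s=0.01))
```

With the programmatic API, `count_elements` might be `lambda: browser.execute("screenshot").data["element_count"]`; that assumes the `screenshot` result carries `element_count` as the `navigate` result does, which this README does not confirm.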

📄 License

MIT License - see LICENSE

🙏 Credits

Built for Atlas, an autonomous AI agent.

Inspired by:

Download files

Download the file for your platform.

Source Distribution

atlas_browser_mcp-0.1.0.tar.gz (12.1 kB)

Built Distribution

atlas_browser_mcp-0.1.0-py3-none-any.whl (13.0 kB)

File details

Details for the file atlas_browser_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: atlas_browser_mcp-0.1.0.tar.gz
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for atlas_browser_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c7c59ccd86718201ef721dc36c461c1a8d5d7f7abb164e0bccd43eaf50adb605
MD5 d711282170f8c7a01e51d76ff1939807
BLAKE2b-256 8b9d0232c6ce4e01b64bd304f5bdfbff4dff945935b5b78eea0faa798cd6a1d3

File details

Details for the file atlas_browser_mcp-0.1.0-py3-none-any.whl.

File hashes

Hashes for atlas_browser_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8e485b19facfc14512fe8cf83c428a5ef306c9f89fda1002a956ec8ada38adf3
MD5 a6c43ec021d49035b52f8b4861d67a26
BLAKE2b-256 b19ba822ddec025f1286ebc61e898be9e0a2ca0e606350957351c679e7cfe4d2
