Agentic browser automation using LangGraph and raw CDP

OpenBrowser

Automating Walmart Product Scraping:

https://github.com/user-attachments/assets/c517c739-9199-47b0-bac7-c2c642a21094

OpenBrowserAI Automatic Flight Booking:

https://github.com/user-attachments/assets/632128f6-3d09-497f-9e7d-e29b9cb65e0f

Python 3.12+ | MIT License

AI-powered browser automation using CodeAgent and CDP (Chrome DevTools Protocol)

OpenBrowser is a framework for intelligent browser automation. It combines direct CDP communication with a CodeAgent architecture, in which the LLM writes Python code that runs in a persistent namespace, to navigate, interact with, and extract information from web pages autonomously.

Documentation

Full documentation: https://docs.openbrowser.me

Key Features

  • CodeAgent Architecture - LLM writes Python code in a persistent Jupyter-like namespace for browser automation
  • Raw CDP Communication - Direct Chrome DevTools Protocol for maximum control and speed
  • Vision Support - Screenshot analysis for visual understanding of pages
  • 12+ LLM Providers - OpenAI, Anthropic, Google, Groq, AWS Bedrock, Azure OpenAI, Ollama, and more
  • MCP Server - Model Context Protocol support for Claude Desktop integration
  • CLI Daemon - Persistent browser daemon with -c flag for direct code execution from Bash
  • Video Recording - Record browser sessions as video files

Installation

Quick install (macOS / Linux)

curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh

Quick install (Windows PowerShell)

irm https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.ps1 | iex

The install script detects uv, pipx, or pip and uses whichever is available to install OpenBrowser.

Install to ~/.local/bin without sudo:

curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh -s -- --local

Homebrew (macOS / Linux)

brew tap billy-enrizky/openbrowser
brew install openbrowser-ai

pip

pip install openbrowser-ai

uv (recommended)

uv pip install openbrowser-ai

uvx (zero install)

Run directly without installing -- uvx downloads and caches the package automatically:

# MCP server mode
uvx openbrowser-ai --mcp

# CLI daemon mode
uvx openbrowser-ai -c "await navigate('https://example.com')"

pipx

pipx install openbrowser-ai

From source

git clone https://github.com/billy-enrizky/openbrowser-ai.git
cd openbrowser-ai
uv pip install -e ".[agent]"

Optional Dependencies

pip install openbrowser-ai[agent]      # LLM agent support (langgraph, langchain, litellm)
pip install openbrowser-ai[all]        # All LLM providers
pip install openbrowser-ai[anthropic]  # Anthropic Claude
pip install openbrowser-ai[groq]       # Groq
pip install openbrowser-ai[ollama]     # Ollama (local models)
pip install openbrowser-ai[aws]        # AWS Bedrock
pip install openbrowser-ai[azure]      # Azure OpenAI
pip install openbrowser-ai[video]      # Video recording support

No separate browser install needed. OpenBrowser auto-detects any installed Chromium-based browser (Chrome, Edge, Brave, Chromium) and uses it directly. If none is found and uvx is available, Chromium is installed automatically on first run. To pre-install manually (requires uvx): openbrowser-ai install

Quick Start

Basic Usage

import asyncio
from openbrowser import CodeAgent, ChatGoogle

async def main():
    agent = CodeAgent(
        task="Go to google.com and search for 'Python tutorials'",
        llm=ChatGoogle(model="gemini-3-flash"),
    )

    result = await agent.run()
    print(f"Result: {result}")

asyncio.run(main())

With Different LLM Providers

from openbrowser import CodeAgent, ChatOpenAI, ChatAnthropic, ChatGoogle

# OpenAI
agent = CodeAgent(task="...", llm=ChatOpenAI(model="gpt-5.2"))

# Anthropic
agent = CodeAgent(task="...", llm=ChatAnthropic(model="claude-sonnet-4-6"))

# Google Gemini
agent = CodeAgent(task="...", llm=ChatGoogle(model="gemini-3-flash"))

Using Browser Session Directly

import asyncio
from openbrowser import BrowserSession, BrowserProfile

async def main():
    profile = BrowserProfile(
        headless=True,
        viewport_width=1920,
        viewport_height=1080,
    )
    
    session = BrowserSession(browser_profile=profile)
    await session.start()
    
    await session.navigate_to("https://example.com")
    screenshot = await session.screenshot()
    
    await session.stop()

asyncio.run(main())

Configuration

Environment Variables

# Google (recommended)
export GOOGLE_API_KEY="..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Groq
export GROQ_API_KEY="gsk_..."

# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

BrowserProfile Options

from openbrowser import BrowserProfile

profile = BrowserProfile(
    headless=True,
    viewport_width=1280,
    viewport_height=720,
    disable_security=False,
    extra_chromium_args=["--disable-gpu"],
    record_video_dir="./recordings",
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    },
)

Supported LLM Providers

  • Google (ChatGoogle): gemini-3-flash, gemini-3-pro
  • OpenAI (ChatOpenAI): gpt-5.2, o4-mini, o3
  • Anthropic (ChatAnthropic): claude-sonnet-4-6, claude-opus-4-6
  • Groq (ChatGroq): llama-4-scout, qwen3-32b
  • AWS Bedrock (ChatAWSBedrock): anthropic.claude-sonnet-4-6, amazon.nova-pro
  • AWS Bedrock via Anthropic SDK (ChatAnthropicBedrock): Claude models via the Anthropic Bedrock SDK
  • Azure OpenAI (ChatAzureOpenAI): any Azure-deployed model
  • OpenRouter (ChatOpenRouter): any model on openrouter.ai
  • DeepSeek (ChatDeepSeek): deepseek-chat, deepseek-r1
  • Cerebras (ChatCerebras): llama-4-scout, qwen-3-235b
  • Ollama (ChatOllama): llama-4-scout, deepseek-r1 (local)
  • OCI (ChatOCIRaw): Oracle Cloud GenAI models
  • Browser-Use (ChatBrowserUse): external LLM service

Claude Code Plugin

Install OpenBrowser as a Claude Code plugin:

# Add the marketplace (one-time)
claude plugin marketplace add billy-enrizky/openbrowser-ai

# Install the plugin
claude plugin install openbrowser@openbrowser-ai

This installs the MCP server and 6 built-in skills:

  • web-scraping: extract structured data, handle pagination
  • form-filling: fill forms, login flows, multi-step wizards
  • e2e-testing: test web apps by simulating user interactions
  • page-analysis: analyze page content, structure, metadata
  • accessibility-audit: audit pages for WCAG compliance
  • file-download: download files (PDFs, CSVs) using the browser session

See plugin/README.md for detailed tool parameter documentation.

Codex

OpenBrowser works with OpenAI Codex via native skill discovery.

Quick Install

Tell Codex:

Fetch and follow instructions from https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/refs/heads/main/.codex/INSTALL.md

Manual Install

# Clone the repository
git clone https://github.com/billy-enrizky/openbrowser-ai.git ~/.codex/openbrowser

# Symlink skills for native discovery
mkdir -p ~/.agents/skills
ln -s ~/.codex/openbrowser/plugin/skills ~/.agents/skills/openbrowser

# Restart Codex

Then configure the MCP server in your project (see MCP Server below).

Detailed docs: .codex/INSTALL.md

OpenCode

OpenBrowser works with OpenCode.ai via plugin and skill symlinks.

Quick Install

Tell OpenCode:

Fetch and follow instructions from https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/refs/heads/main/.opencode/INSTALL.md

Manual Install

# Clone the repository
git clone https://github.com/billy-enrizky/openbrowser-ai.git ~/.config/opencode/openbrowser

# Create directories
mkdir -p ~/.config/opencode/plugins ~/.config/opencode/skills

# Symlink plugin and skills
ln -s ~/.config/opencode/openbrowser/.opencode/plugins/openbrowser.js ~/.config/opencode/plugins/openbrowser.js
ln -s ~/.config/opencode/openbrowser/plugin/skills ~/.config/opencode/skills/openbrowser

# Restart OpenCode

Then configure the MCP server in your project (see MCP Server below).

Detailed docs: .opencode/INSTALL.md

OpenClaw

OpenClaw supports OpenBrowser via the CLI daemon. Install OpenBrowser, then use openbrowser-ai -c from the Bash tool:

openbrowser-ai -c "await navigate('https://example.com')"
openbrowser-ai -c "print(await evaluate('document.title'))"

The daemon starts automatically on first use and persists variables across calls.

For OpenClaw plugin documentation, see docs.openclaw.ai/tools/plugin.

MCP Server

MCP Registry

OpenBrowser includes an MCP (Model Context Protocol) server that exposes browser automation as tools for AI assistants like Claude. Listed on the MCP Registry as me.openbrowser/openbrowser-ai. No external LLM API keys required -- the MCP client provides the intelligence.

Quick Setup

Claude Code: add to your project's .mcp.json:

{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai", "--mcp"]
    }
  }
}

Claude Desktop: add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai", "--mcp"],
      "env": {
        "OPENBROWSER_HEADLESS": "true"
      }
    }
  }
}

Run directly:

uvx openbrowser-ai --mcp

Tool

The MCP server exposes a single execute_code tool that runs Python code in a persistent namespace with browser automation functions. The LLM writes Python code to navigate, interact, and extract data, returning only what was explicitly requested.

Available functions (all async, use await):

  • Navigation: navigate(url, new_tab), go_back(), wait(seconds)
  • Interaction: click(index), input_text(index, text, clear), scroll(down, pages, index), send_keys(keys), upload_file(index, path)
  • Dropdowns: select_dropdown(index, text), dropdown_options(index)
  • Tabs: switch(tab_id), close(tab_id)
  • JavaScript: evaluate(code) runs JS in the page context and returns Python objects
  • Downloads: download_file(url, filename) downloads a file using browser cookies; list_downloads() lists downloaded files
  • State: browser.get_browser_state_summary() returns page metadata and interactive elements
  • CSS: get_selector_from_index(index) returns the CSS selector for an element
  • Completion: done(text, success) signals task completion

Pre-imported libraries: json, csv, re, datetime, asyncio, Path, requests, numpy, pandas, matplotlib, BeautifulSoup
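
The efficiency of this design comes from batching several browser actions into one code execution, so the model sees a single small result instead of a page dump per step. The snippet below is an illustrative sketch of that pattern with stubbed stand-ins for navigate/evaluate (in real use those functions are provided by the server's namespace, not defined by the agent):

```python
import asyncio

# Stubs standing in for the server-provided namespace functions.
async def navigate(url: str) -> None:
    print(f"navigated to {url}")

async def evaluate(code: str):
    # The real evaluate() runs JS in the page; this stub returns canned data.
    return ["Python tutorials", "Learn Python"]

async def batched_task():
    # One execute_code call chains navigation and extraction, returning
    # only the values the task actually asked for.
    await navigate("https://example.com")
    titles = await evaluate("[...document.querySelectorAll('h1')].map(h => h.textContent)")
    return {"count": len(titles), "first": titles[0]}

print(asyncio.run(batched_task()))
```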

Configuration

  • OPENBROWSER_HEADLESS: run the browser without a GUI (default: false)
  • OPENBROWSER_ALLOWED_DOMAINS: comma-separated domain whitelist (default: none)
  • OPENBROWSER_COMPACT_DESCRIPTION: use a minimal tool description of ~500 tokens (default: false)
  • OPENBROWSER_MAX_OUTPUT: maximum output characters per execution (default: 10000)
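
These values arrive as strings, so they must be parsed into booleans and integers. A minimal sketch of typical parsing with defaults (assumed for illustration, not the library's actual code):

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable ('1'/'true'/'yes'/'on' are truthy)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")

def env_int(name: str, default: int) -> int:
    """Parse an integer environment variable, falling back on missing/bad input."""
    try:
        return int(os.environ.get(name, ""))
    except ValueError:
        return default

os.environ["OPENBROWSER_HEADLESS"] = "true"
os.environ.pop("OPENBROWSER_MAX_OUTPUT", None)
print(env_bool("OPENBROWSER_HEADLESS"))          # True
print(env_int("OPENBROWSER_MAX_OUTPUT", 10000))  # 10000 (unset, default used)
```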

Benchmark: Token Efficiency

CLI Benchmark: 4-Way Comparison (6 Tasks, N=3 runs)

Four CLI tools were compared, each driven through a single Bash tool, using Claude Sonnet 4.6 on Bedrock with randomized task order. All four achieve 100% accuracy.

CLI Benchmark: Token Usage vs Duration

CLI Tool Duration (mean +/- std) Tool Calls Bedrock API Tokens Response Chars
openbrowser-ai 84.8 +/- 10.9s 15.3 +/- 2.3 36,010 +/- 6,063 9,452 +/- 472
browser-use 106.0 +/- 9.5s 20.7 +/- 6.4 77,123 +/- 33,354 36,241 +/- 12,940
agent-browser 99.0 +/- 6.8s 25.0 +/- 4.0 90,107 +/- 3,698 56,009 +/- 39,733
playwright-cli 118.3 +/- 21.4s 25.7 +/- 8.1 94,130 +/- 35,982 84,065 +/- 49,713

openbrowser-ai uses 2.1-2.6x fewer tokens than all competitors via Python code batching and compact DOM representation.
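
The 2.1-2.6x range follows directly from the token column in the table above:

```python
# Mean Bedrock API tokens per run, taken from the benchmark table.
tokens = {
    "openbrowser-ai": 36_010,
    "browser-use": 77_123,
    "agent-browser": 90_107,
    "playwright-cli": 94_130,
}

base = tokens["openbrowser-ai"]
for tool, t in tokens.items():
    if tool != "openbrowser-ai":
        print(f"{tool}: {t / base:.2f}x")
# browser-use: 2.14x, agent-browser: 2.50x, playwright-cli: 2.61x
```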

Per-Task Token Usage

Task openbrowser-ai browser-use playwright-cli agent-browser
fact_lookup 2,504 4,710 16,857 9,676
form_fill 7,887 15,811 31,757 19,226
multi_page_extract 2,354 2,405 8,886 8,117
search_navigate 16,539 47,936 27,779 44,367
deep_navigation 2,178 3,747 4,705 5,534
content_analysis 4,548 2,515 4,147 3,189

openbrowser-ai wins 5 of 6 tasks. The advantage is largest on complex pages (search_navigate: 2.9x fewer tokens than browser-use) where code batching avoids repeated page state dumps.

Cost per Benchmark Run (6 Tasks)

Model openbrowser-ai browser-use playwright-cli agent-browser
Claude Sonnet 4.6 ($3/$15 per M) $0.12 $0.24 $0.29 $0.27
Claude Opus 4.6 ($5/$25 per M) $0.24 $0.45 $0.56 $0.51

Raw results are in benchmarks/e2e_4way_cli_results.json; see the full 4-way comparison for the methodology.

E2E LLM Benchmark: MCP Server Comparison (6 Tasks, N=5 runs)

MCP Server Pass Rate Duration (mean +/- std) Tool Calls Bedrock API Tokens
Playwright MCP (Microsoft) 100% 62.7 +/- 4.8s 9.4 +/- 0.9 158,787
Chrome DevTools MCP (Google) 100% 103.4 +/- 2.7s 19.4 +/- 0.5 299,486
OpenBrowser MCP 100% 77.0 +/- 6.7s 13.8 +/- 2.0 50,195

OpenBrowser uses 3.2x fewer tokens than Playwright and 6.0x fewer than Chrome DevTools. MCP response sizes: Playwright 1,132,173 chars, Chrome DevTools 1,147,244 chars, OpenBrowser 7,853 chars -- a 144x difference.

See the full MCP comparison for the methodology.

CLI Usage

# Run a browser automation task with an LLM agent
uvx openbrowser-ai -p "Search for Python tutorials on Google"

# Execute code directly via persistent daemon
uvx openbrowser-ai -c "await navigate('https://example.com')"
uvx openbrowser-ai -c "print(await evaluate('document.title'))"

# Daemon management
uvx openbrowser-ai daemon start     # Start daemon (auto-starts on first -c call)
uvx openbrowser-ai daemon stop      # Stop daemon and browser
uvx openbrowser-ai daemon status    # Show daemon info
uvx openbrowser-ai daemon restart   # Restart daemon

# Install browser
uvx openbrowser-ai install

# Run MCP server
uvx openbrowser-ai --mcp

The -c flag connects to a persistent browser daemon over a Unix socket (localhost TCP on Windows). Variables persist across calls while the daemon is running. The daemon starts automatically on first use and shuts down after 10 minutes of inactivity.
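
The persistence model can be sketched as code strings executed against one long-lived namespace dict, roughly how a Jupyter-style session keeps variables alive between -c calls. This is a simplification of the actual daemon, which also runs an event loop and the browser:

```python
# Each -c invocation sends a code string to the daemon; the daemon execs it
# in a single shared namespace, so names defined in call 1 exist in call 2.
namespace: dict = {}

def run_in_daemon(code: str) -> None:
    exec(code, namespace)

run_in_daemon("title = 'Example Domain'")   # first call defines a variable
run_in_daemon("print(title.upper())")       # a later call still sees it
# prints "EXAMPLE DOMAIN"
```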

Project Structure

openbrowser-ai/
├── .claude-plugin/            # Claude Code marketplace config
├── .codex/                    # Codex integration
│   └── INSTALL.md
├── .opencode/                 # OpenCode integration
│   ├── INSTALL.md
│   └── plugins/openbrowser.js
├── plugin/                    # Plugin package (skills + MCP config)
│   ├── .claude-plugin/
│   ├── .mcp.json
│   └── skills/                # 6 browser automation skills
├── src/openbrowser/
│   ├── __init__.py            # Main exports
│   ├── cli.py                 # CLI commands
│   ├── config.py              # Configuration
│   ├── actor/                 # Element interaction
│   ├── agent/                 # LangGraph agent
│   ├── browser/               # CDP browser control
│   ├── code_use/              # Code agent + shared executor
│   ├── daemon/                # Persistent browser daemon (Unix socket)
│   ├── dom/                   # DOM extraction
│   ├── llm/                   # LLM providers
│   ├── mcp/                   # MCP server
│   └── tools/                 # Action registry
├── benchmarks/                # MCP benchmarks and E2E tests
│   ├── playwright_benchmark.py
│   ├── cdp_benchmark.py
│   ├── openbrowser_benchmark.py
│   └── e2e_published_test.py
└── tests/                     # Test suite

Testing

# Run unit tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# E2E test the MCP server against the published PyPI package
uv run python benchmarks/e2e_published_test.py

Benchmarks

Run individual MCP server benchmarks (JSON-RPC stdio, 5-step Wikipedia workflow):

uv run python benchmarks/openbrowser_benchmark.py   # OpenBrowser MCP
uv run python benchmarks/playwright_benchmark.py     # Playwright MCP
uv run python benchmarks/cdp_benchmark.py            # Chrome DevTools MCP

Raw results are in benchmarks/e2e_4way_cli_results.json. See full comparison for methodology.

Backend and Frontend Deployment

The project includes a FastAPI backend and a Next.js frontend, both containerized with Docker.

Prerequisites

  • Docker and Docker Compose
  • A .env file in the project root with POSTGRES_PASSWORD and any LLM API keys (see backend/env.example)

Local Development (Docker Compose)

# Start backend + PostgreSQL (frontend runs locally)
docker-compose -f docker-compose.dev.yml up --build

# In a separate terminal, start the frontend
cd frontend && npm install && npm run dev

Service URL Description
Backend http://localhost:8000 FastAPI + WebSocket + VNC
Frontend http://localhost:3000 Next.js dev server
PostgreSQL localhost:5432 Chat persistence
VNC ws://localhost:6080 Live browser view

The dev compose mounts backend/app/ and src/ as volumes for hot-reload. API keys are loaded from backend/.env via env_file. The POSTGRES_PASSWORD is read from the root .env file.

Full Stack (Docker Compose)

# Start all services (backend + frontend + PostgreSQL)
docker-compose up --build

This builds and runs both the backend and frontend containers together with PostgreSQL.

Backend

The backend is a FastAPI application in backend/ with a Dockerfile at backend/Dockerfile. It includes:

  • REST API on port 8000
  • WebSocket endpoint at /ws for real-time agent communication
  • VNC support (Xvfb + x11vnc + websockify) for live browser viewing on ports 6080-6090
  • Kiosk security: Openbox window manager, Chromium enterprise policies, X11 key grabber daemon
  • Health check at /health

# Build the backend image
docker build -f backend/Dockerfile -t openbrowser-backend .

# Run standalone
docker run -p 8000:8000 -p 6080:6080 \
  --env-file backend/.env \
  -e VNC_ENABLED=true \
  -e AUTH_ENABLED=false \
  --shm-size=2g \
  openbrowser-backend

Frontend

The frontend is a Next.js application in frontend/ with a Dockerfile at frontend/Dockerfile.

# Build the frontend image
cd frontend && docker build -t openbrowser-frontend .

# Run standalone
docker run -p 3000:3000 \
  -e NEXT_PUBLIC_API_URL=http://localhost:8000 \
  -e NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws \
  openbrowser-frontend

Environment Variables

Key environment variables for the backend (see backend/env.example for the full list):

  • GOOGLE_API_KEY: Google/Gemini API key (required)
  • DEFAULT_LLM_MODEL: default model for agents (default: gemini-3-flash-preview)
  • AUTH_ENABLED: enable Cognito JWT auth (default: false)
  • VNC_ENABLED: enable VNC browser viewing (default: true)
  • DATABASE_URL: PostgreSQL connection string (optional)
  • POSTGRES_PASSWORD: PostgreSQL password, set in the root .env (required for compose)

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with love for the AI automation community
