Agentic browser automation using LangGraph and raw CDP

OpenBrowser

Automating Walmart Product Scraping:

https://github.com/user-attachments/assets/ae5d74ce-0ac6-46b0-b02b-ff5518b4b20d

OpenBrowserAI Automatic Flight Booking:

https://github.com/user-attachments/assets/632128f6-3d09-497f-9e7d-e29b9cb65e0f

AI-powered browser automation using CodeAgent and CDP (Chrome DevTools Protocol)

OpenBrowser is a framework for intelligent browser automation. It combines direct CDP communication with a CodeAgent architecture: the LLM writes Python code that executes in a persistent namespace to navigate, interact with, and extract information from web pages autonomously.

Documentation

Full documentation: https://docs.openbrowser.me

Key Features

  • CodeAgent Architecture - LLM writes Python code in a persistent Jupyter-like namespace for browser automation
  • Raw CDP Communication - Direct Chrome DevTools Protocol for maximum control and speed
  • Vision Support - Screenshot analysis for visual understanding of pages
  • 12+ LLM Providers - OpenAI, Anthropic, Google, Groq, AWS Bedrock, Azure OpenAI, Ollama, and more
  • MCP Server - Model Context Protocol support for Claude Desktop integration
  • Video Recording - Record browser sessions as video files

Installation

pip install openbrowser-ai

With Optional Dependencies

# Install with all LLM providers
pip install "openbrowser-ai[all]"

# Install specific providers (quotes keep shells like zsh from expanding the brackets)
pip install "openbrowser-ai[anthropic]"  # Anthropic Claude
pip install "openbrowser-ai[groq]"       # Groq
pip install "openbrowser-ai[ollama]"     # Ollama (local models)
pip install "openbrowser-ai[aws]"        # AWS Bedrock
pip install "openbrowser-ai[azure]"      # Azure OpenAI

# Install with video recording support
pip install "openbrowser-ai[video]"

Install Browser

uvx openbrowser-ai install
# or
playwright install chromium

Quick Start

Basic Usage

import asyncio
from openbrowser import CodeAgent, ChatGoogle

async def main():
    agent = CodeAgent(
        task="Go to google.com and search for 'Python tutorials'",
        llm=ChatGoogle(model="gemini-2.0-flash"),
    )

    result = await agent.run()
    print(f"Result: {result}")

asyncio.run(main())

With Different LLM Providers

from openbrowser import CodeAgent, ChatOpenAI, ChatAnthropic, ChatGoogle

# OpenAI
agent = CodeAgent(task="...", llm=ChatOpenAI(model="gpt-4o"))

# Anthropic
agent = CodeAgent(task="...", llm=ChatAnthropic(model="claude-sonnet-4-6"))

# Google Gemini
agent = CodeAgent(task="...", llm=ChatGoogle(model="gemini-2.0-flash"))

Using Browser Session Directly

import asyncio
from openbrowser import BrowserSession, BrowserProfile

async def main():
    profile = BrowserProfile(
        headless=True,
        viewport_width=1920,
        viewport_height=1080,
    )

    session = BrowserSession(browser_profile=profile)
    await session.start()
    try:
        await session.navigate_to("https://example.com")
        screenshot = await session.screenshot()
    finally:
        # Always release the browser, even if navigation fails
        await session.stop()

asyncio.run(main())

Configuration

Environment Variables

# Google (recommended)
export GOOGLE_API_KEY="..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Groq
export GROQ_API_KEY="gsk_..."

# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

BrowserProfile Options

from openbrowser import BrowserProfile

profile = BrowserProfile(
    headless=True,
    viewport_width=1280,
    viewport_height=720,
    disable_security=False,
    extra_chromium_args=["--disable-gpu"],
    record_video_dir="./recordings",
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass",
    },
)

Supported LLM Providers

Provider Class Models
Google ChatGoogle gemini-2.5-flash, gemini-2.5-pro
OpenAI ChatOpenAI gpt-4.1, o4-mini, o3
Anthropic ChatAnthropic claude-sonnet-4-6, claude-opus-4-6
Groq ChatGroq llama-4-scout, qwen3-32b
AWS Bedrock ChatAWSBedrock anthropic.claude-sonnet-4-6, amazon.nova-pro
AWS Bedrock (Anthropic) ChatAnthropicBedrock Claude models via Anthropic Bedrock SDK
Azure OpenAI ChatAzureOpenAI Any Azure-deployed model
OpenRouter ChatOpenRouter Any model on openrouter.ai
DeepSeek ChatDeepSeek deepseek-chat, deepseek-reasoner
Cerebras ChatCerebras llama3.1-8b, qwen-3-coder-480b
Ollama ChatOllama llama3.1, deepseek-r1 (local)
OCI ChatOCIRaw Oracle Cloud GenAI models
Browser-Use ChatBrowserUse External LLM service

Claude Code Plugin

Install OpenBrowser as a Claude Code plugin:

# Add the marketplace (one-time)
claude plugin marketplace add billy-enrizky/openbrowser-ai

# Install the plugin
claude plugin install openbrowser@openbrowser-ai

This installs the MCP server and 5 built-in skills:

Skill Description
web-scraping Extract structured data, handle pagination
form-filling Fill forms, login flows, multi-step wizards
e2e-testing Test web apps by simulating user interactions
page-analysis Analyze page content, structure, metadata
accessibility-audit Audit pages for WCAG compliance

See plugin/README.md for detailed tool parameter documentation.

Codex

OpenBrowser works with OpenAI Codex via native skill discovery.

Quick Install

Tell Codex:

Fetch and follow instructions from https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/refs/heads/main/.codex/INSTALL.md

Manual Install

# Clone the repository
git clone https://github.com/billy-enrizky/openbrowser-ai.git ~/.codex/openbrowser

# Symlink skills for native discovery
mkdir -p ~/.agents/skills
ln -s ~/.codex/openbrowser/plugin/skills ~/.agents/skills/openbrowser

# Restart Codex

Then configure the MCP server in your project (see MCP Server below).

Detailed docs: .codex/INSTALL.md

OpenCode

OpenBrowser works with OpenCode.ai via plugin and skill symlinks.

Quick Install

Tell OpenCode:

Fetch and follow instructions from https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/refs/heads/main/.opencode/INSTALL.md

Manual Install

# Clone the repository
git clone https://github.com/billy-enrizky/openbrowser-ai.git ~/.config/opencode/openbrowser

# Create directories
mkdir -p ~/.config/opencode/plugins ~/.config/opencode/skills

# Symlink plugin and skills
ln -s ~/.config/opencode/openbrowser/.opencode/plugins/openbrowser.js ~/.config/opencode/plugins/openbrowser.js
ln -s ~/.config/opencode/openbrowser/plugin/skills ~/.config/opencode/skills/openbrowser

# Restart OpenCode

Then configure the MCP server in your project (see MCP Server below).

Detailed docs: .opencode/INSTALL.md

OpenClaw

OpenClaw does not natively support MCP servers, but the community openclaw-mcp-adapter plugin bridges MCP servers to OpenClaw agents.

  1. Install the MCP adapter plugin (see its README for setup).

  2. Add OpenBrowser as an MCP server in ~/.openclaw/openclaw.json:

{
  "plugins": {
    "entries": {
      "mcp-adapter": {
        "enabled": true,
        "config": {
          "servers": [
            {
              "name": "openbrowser",
              "transport": "stdio",
              "command": "uvx",
              "args": ["openbrowser-ai[mcp]", "--mcp"]
            }
          ]
        }
      }
    }
  }
}

The execute_code tool will be registered as a native OpenClaw agent tool.

For OpenClaw plugin documentation, see docs.openclaw.ai/tools/plugin.

MCP Server

OpenBrowser includes an MCP (Model Context Protocol) server that exposes browser automation as tools for AI assistants like Claude. No external LLM API keys required. The MCP client (Claude) provides the intelligence.

Quick Setup

Claude Code: add to your project's .mcp.json:

{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai[mcp]", "--mcp"]
    }
  }
}

Claude Desktop: add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "openbrowser": {
      "command": "uvx",
      "args": ["openbrowser-ai[mcp]", "--mcp"],
      "env": {
        "OPENBROWSER_HEADLESS": "true"
      }
    }
  }
}

Run directly:

uvx "openbrowser-ai[mcp]" --mcp

Tool

The MCP server exposes a single execute_code tool that runs Python code in a persistent namespace with browser automation functions. The LLM writes Python code to navigate, interact, and extract data, returning only what was explicitly requested.

Available functions (all async, use await):

Category Functions
Navigation navigate(url, new_tab), go_back(), wait(seconds)
Interaction click(index), input_text(index, text, clear), scroll(down, pages, index), send_keys(keys), upload_file(index, path)
Dropdowns select_dropdown(index, text), dropdown_options(index)
Tabs switch(tab_id), close(tab_id)
JavaScript evaluate(code): run JS in page context, returns Python objects
State browser.get_browser_state_summary(): get page metadata and interactive elements
CSS get_selector_from_index(index): get CSS selector for an element
Completion done(text, success): signal task completion

Pre-imported libraries: json, csv, re, datetime, asyncio, Path, requests, numpy, pandas, matplotlib, BeautifulSoup
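To illustrate the flow, here is a hedged sketch of the kind of snippet an LLM might submit to execute_code. In the real namespace, navigate(), evaluate(), and done() are provided by the MCP server; they are stubbed out below only so the sketch runs standalone, and the return values are invented for illustration.

```python
import asyncio

# Hypothetical stand-ins: the real execute_code namespace injects these.
async def navigate(url, new_tab=False):
    pass  # real version drives the browser over CDP

async def evaluate(code):
    # Real version runs the JS string in the page and returns Python objects.
    return {"title": "Example Domain", "links": 1}

async def done(text, success):
    return text

async def task():
    await navigate("https://example.com")
    # Process page state in code; only the extracted summary goes back to the LLM.
    state = await evaluate(
        "({title: document.title, links: document.querySelectorAll('a').length})"
    )
    summary = f"{state['title']} has {state['links']} link(s)"
    return await done(summary, success=True)

result = asyncio.run(task())
print(result)
```

The point of the pattern is that the full page content stays server-side; only the short summary string crosses back into the LLM context.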

Configuration

Environment Variable Description Default
OPENBROWSER_HEADLESS Run browser without GUI false
OPENBROWSER_ALLOWED_DOMAINS Comma-separated domain whitelist (none)
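For example, to run the server headless and restrict navigation to a whitelist (the domain values here are illustrative, not defaults):

```shell
export OPENBROWSER_HEADLESS=true
# Comma-separated whitelist; these domains are examples
export OPENBROWSER_ALLOWED_DOMAINS="example.com,en.wikipedia.org"
uvx "openbrowser-ai[mcp]" --mcp
```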

MCP Benchmark: Why OpenBrowser

E2E LLM Benchmark (6 Real-World Tasks, N=5 runs)

Six real-world browser tasks run through Claude Sonnet 4.6 on AWS Bedrock (Converse API) with a server-agnostic system prompt. The LLM autonomously decides which tools to call and when the task is complete. 5 runs per server with 10,000-sample bootstrap CIs. All tasks run against live websites.

# Task Description Target Site
1 fact_lookup Navigate to a Wikipedia article and extract specific facts (creator and year) en.wikipedia.org
2 form_fill Fill out a multi-field form (text input, radio button, checkbox) and submit httpbin.org/forms/post
3 multi_page_extract Extract the titles of the top 5 stories from a dynamic page news.ycombinator.com
4 search_navigate Search Wikipedia, click a result, and extract specific information en.wikipedia.org
5 deep_navigation Navigate to a GitHub repo and find the latest release version number github.com
6 content_analysis Analyze page structure: count headings, links, and paragraphs example.com

E2E LLM Benchmark: MCP Server Comparison

MCP Server Pass Rate Duration (mean +/- std) Tool Calls Bedrock API Tokens
Playwright MCP (Microsoft) 100% 92.2 +/- 11.4s 11.0 +/- 1.4 150,248
Chrome DevTools MCP (Google) 100% 128.8 +/- 6.2s 19.8 +/- 0.4 310,856
OpenBrowser MCP 100% 103.1 +/- 16.4s 15.0 +/- 3.9 49,423

OpenBrowser uses 3x fewer tokens than Playwright and 6.3x fewer than Chrome DevTools, measured via Bedrock Converse API usage field (the actual billed tokens including system prompt, tool schemas, conversation history, and tool results).
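The quoted multiples follow directly from the token counts in the table above:

```python
# Bedrock API token totals from the benchmark table above
playwright_tokens = 150_248
chrome_devtools_tokens = 310_856
openbrowser_tokens = 49_423

vs_playwright = playwright_tokens / openbrowser_tokens
vs_chrome = chrome_devtools_tokens / openbrowser_tokens

print(f"vs Playwright: {vs_playwright:.1f}x")
print(f"vs Chrome DevTools: {vs_chrome:.1f}x")
```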

Cost per Benchmark Run (6 Tasks)

Based on Bedrock API token usage (input + output tokens at respective rates).

Model Playwright MCP Chrome DevTools MCP OpenBrowser MCP
Claude Sonnet 4.6 ($3/$15 per M) $0.47 $0.96 $0.18
Claude Opus 4.6 ($5/$25 per M) $0.78 $1.59 $0.30

Why the Difference

Playwright and Chrome DevTools return full page accessibility snapshots as tool output (~124K-135K tokens for Wikipedia). The LLM reads the entire snapshot to find what it needs.

OpenBrowser uses a CodeAgent architecture (single execute_code tool). The LLM writes Python code that processes browser state server-side and returns only extracted results (~30-1,000 chars per call). The full page content never enters the LLM context window.

Playwright: navigate to Wikipedia -> 478,793 chars (full a11y tree returned to LLM)
OpenBrowser: navigate to Wikipedia -> 42 chars (page title only, state processed in code)
             evaluate JS for infobox -> 896 chars (just the extracted data)

Full comparison with methodology

CLI Usage

# Run a browser automation task
uvx openbrowser-ai -p "Search for Python tutorials on Google"

# Install browser
uvx openbrowser-ai install

# Run MCP server
uvx "openbrowser-ai[mcp]" --mcp

Project Structure

openbrowser-ai/
├── .claude-plugin/            # Claude Code marketplace config
├── .codex/                    # Codex integration
│   └── INSTALL.md
├── .opencode/                 # OpenCode integration
│   ├── INSTALL.md
│   └── plugins/openbrowser.js
├── plugin/                    # Plugin package (skills + MCP config)
│   ├── .claude-plugin/
│   ├── .mcp.json
│   └── skills/                # 5 browser automation skills
├── src/openbrowser/
│   ├── __init__.py            # Main exports
│   ├── cli.py                 # CLI commands
│   ├── config.py              # Configuration
│   ├── actor/                 # Element interaction
│   ├── agent/                 # LangGraph agent
│   ├── browser/               # CDP browser control
│   ├── code_use/              # Code agent
│   ├── dom/                   # DOM extraction
│   ├── llm/                   # LLM providers
│   ├── mcp/                   # MCP server
│   └── tools/                 # Action registry
├── benchmarks/                # MCP benchmarks and E2E tests
│   ├── playwright_benchmark.py
│   ├── cdp_benchmark.py
│   ├── openbrowser_benchmark.py
│   └── e2e_published_test.py
└── tests/                     # Test suite

Testing

# Run unit tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# E2E test the MCP server against the published PyPI package
uv run python benchmarks/e2e_published_test.py

Benchmarks

Run individual MCP server benchmarks (JSON-RPC stdio, 5-step Wikipedia workflow):

uv run python benchmarks/openbrowser_benchmark.py   # OpenBrowser MCP
uv run python benchmarks/playwright_benchmark.py     # Playwright MCP
uv run python benchmarks/cdp_benchmark.py            # Chrome DevTools MCP

Results are written to benchmarks/*_results.json. See full comparison for methodology.

Production deployment

AWS production infrastructure (VPC, EC2 backend, API Gateway, Cognito, DynamoDB, ECR, S3 + CloudFront) is defined in Terraform. See infra/production/terraform/README.md for architecture, prerequisites, and step-by-step deploy (ECR -> build/push image -> terraform apply).

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact


Made with love for the AI automation community
