AI-friendly browser automation via CDP with profile-based login persistence

harness-browser

AI-friendly browser automation via Chrome DevTools Protocol (CDP).

An agent-first browser runtime built on pure CDP. Predictable DOM snapshots, persistent profile sessions, and a typed Python API designed for LLM tool-calling — no Playwright, no driver layer in between.

Why harness-browser

Concern | How we address it
--- | ---
Token cost | 4-level DOM with an interactive mode (~200–500 tokens) that returns only clickable/typeable elements with stable refs, not raw HTML
Stable element targeting | Refs (btn_2, inp_search) survive layout reflows and are auto-invalidated on navigation, so the agent never points at a stale node
Login persistence | One Chrome user-data-dir per profile under ~/.harness-browser/profiles/<name>/: log in once, and every subsequent run reuses cookies and storage
No Playwright tax | Pure CDP over WebSocket (websockets>=12.0); no bundled browser binaries, no patched Chromium, no driver layer
Observability | Every action emits ActionMetrics (duration_ms, dom_nodes_scanned, estimated_tokens, screenshot_size_kb), plus before_action / after_action / action_error / page_navigated hooks
Configuration | Ten BROWSER_USE_* environment variables cover paths, ports, timeouts, launch mode, and remote/Docker Chrome (via BROWSER_USE_CDP_WS_URL), so dev, CI, and prod need no code changes
Agent integrations | Stateless browser_tool(action=..., profile=...) for any framework, an MCP server (python -m harness_browser.mcp_server), and a ready-to-copy Claude Code skill in skills/

Features

  • Pure CDP: direct WebSocket connection, no Playwright dependency
  • Profile-based login persistence: Chrome user-data-dir per profile, cookies/sessions reused across runs
  • 4-level DOM output: minimal (~50 tokens), interactive (~200–500 tokens), full (~1000–3000 tokens), structured (JSON)
  • Ref system: stable element references across actions, invalidated on navigation
  • Hook system: before_action, after_action, action_error, page_navigated
  • Per-action metrics: duration_ms, dom_nodes_scanned, estimated_tokens, screenshot_size_kb
  • Environment-variable configuration: all paths, ports, and timeouts configurable without code changes
  • Remote/Docker Chrome support: bypass the launcher via BROWSER_USE_CDP_WS_URL
  • MCP server: expose actions as MCP tools for Claude Code and other MCP clients
  • Drop-in Claude Code skill: copy skills/harness-browser/ into any agent project
  • Strict typing: mypy strict, ruff clean, 34 unit tests covering DOM, refs, hooks, settings, and CDP framing

Requirements

  • Python 3.11+
  • Chrome or Chromium
# Debian
sudo apt install chromium

# Ubuntu (chromium-browser pulls in the Chromium snap)
sudo apt install chromium-browser

# macOS
brew install --cask google-chrome

Installation

pip install harness-browser

Quick Start

Python API

import asyncio
from harness_browser import BrowserSession

async def main():
    async with await BrowserSession.create(profile="default") as sess:
        await sess.navigate("https://example.com")
        result = await sess.dom_tree(level="interactive")
        print(result.content)
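        # Illustrative output for a page with a search form
        # (example.com itself has no form, so real refs will differ):
        # → [ref=inp_1] input[text] placeholder="Search"
        # → [ref=btn_2] button "Go"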
        await sess.click(ref="btn_2")

asyncio.run(main())

AI Framework Usage (stateless)

import asyncio
from harness_browser import browser_tool

async def main():
    # All calls route to the same session by profile name
    result = await browser_tool(action="navigate", url="https://github.com", profile="work")
    result = await browser_tool(action="dom_tree", level="interactive", profile="work")
    # btn_search is illustrative: use a ref taken from the dom_tree output
    result = await browser_tool(action="click", ref="btn_search", profile="work")
    result = await browser_tool(action="type", text="harness", profile="work")

asyncio.run(main())

DOM Levels

Level | Tokens | Use case
--- | --- | ---
minimal | ~50 | Confirm the page loaded; check title/URL
interactive | ~200–500 | Find clickable/typeable elements (default)
full | ~1000–3000 | Read page content
structured | varies | JSON for programmatic processing
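The interactive level saves tokens by dropping everything an agent cannot act on. The sketch below is a hypothetical renderer that filters a flat node list down to actionable tags and prints them in the [ref=...] line format shown in the Quick Start. The Node type, the PREFIX table, and the 4-characters-per-token estimate are illustrative assumptions, not harness-browser's actual logic.

```python
import math
from dataclasses import dataclass, field

# Hypothetical ref prefixes per tag; anything not listed counts as non-interactive.
PREFIX = {"input": "inp", "textarea": "inp", "select": "sel", "button": "btn", "a": "lnk"}


@dataclass
class Node:
    tag: str
    text: str = ""
    attrs: dict[str, str] = field(default_factory=dict)


def render_interactive(nodes: list[Node]) -> str:
    lines: list[str] = []
    for node in nodes:
        prefix = PREFIX.get(node.tag)
        if prefix is None:
            continue  # headings, paragraphs and layout nodes are dropped
        ref = f"{prefix}_{len(lines) + 1}"
        if node.tag == "input":
            line = f"[ref={ref}] input[{node.attrs.get('type', 'text')}]"
            if "placeholder" in node.attrs:
                line += f' placeholder="{node.attrs["placeholder"]}"'
        else:
            line = f'[ref={ref}] {node.tag} "{node.text}"'
        lines.append(line)
    return "\n".join(lines)


def estimate_tokens(text: str) -> int:
    # Rough rule of thumb (~4 characters per token), not a real tokenizer.
    return math.ceil(len(text) / 4)


page = [
    Node("h1", "Welcome"),
    Node("p", "Long body copy that the interactive level leaves out entirely."),
    Node("input", attrs={"type": "text", "placeholder": "Search"}),
    Node("button", "Go"),
]
out = render_interactive(page)
print(out)
# [ref=inp_1] input[text] placeholder="Search"
# [ref=btn_2] button "Go"
```

The heading and paragraph vanish entirely, which is why the interactive level stays in the hundreds of tokens even on text-heavy pages.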

Login State Reuse

Profiles persist Chrome sessions in ~/.harness-browser/profiles/<name>/:

# First run (headed, e.g. BROWSER_USE_MODE=headed): navigate to the login page and log in manually
await browser_tool(action="navigate", url="https://github.com/login", profile="github")

# All future runs: login state reused automatically
await browser_tool(action="navigate", url="https://github.com/settings", profile="github")
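Per-profile persistence comes down to giving each profile its own Chrome user-data-dir. A hypothetical launcher might build the command line as below. The flags --user-data-dir, --remote-debugging-port, --no-first-run, --no-default-browser-check and --headless=new are real Chrome flags, but the function name and the rule "port = BROWSER_USE_CDP_PORT_START + profile index" are assumptions for illustration, not the library's documented behaviour.

```python
import os
from pathlib import Path


def chrome_argv(chrome_bin: str, profile: str, index: int, *, headless: bool) -> list[str]:
    """Hypothetical sketch: one user-data-dir and one debug port per profile."""
    root = Path(os.environ.get("BROWSER_USE_PROFILES_DIR", "~/.harness-browser/profiles")).expanduser()
    port_start = int(os.environ.get("BROWSER_USE_CDP_PORT_START", "9222"))
    argv = [
        chrome_bin,
        # Cookies, localStorage and IndexedDB live here, so they survive restarts.
        f"--user-data-dir={root / profile}",
        # Assumed rule: the profile at position `index` gets port_start + index.
        f"--remote-debugging-port={port_start + index}",
        "--no-first-run",
        "--no-default-browser-check",
    ]
    if headless:
        argv.append("--headless=new")
    return argv


print(chrome_argv("google-chrome", "github", 0, headless=False))
```

Since the login lives in the directory rather than in the process, any later launch that points at the same user-data-dir picks up the saved cookies.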

Hook System

import asyncio
from harness_browser import BrowserSession

async def main():
    async with await BrowserSession.create(profile="work") as sess:
        @sess.on("before_action")
        async def log_action(event):
            print(f"[{event['action']}] starting")

        @sess.on("after_action")
        async def log_metrics(metrics):
            print(f"  done in {metrics.duration_ms}ms (~{metrics.estimated_tokens} tokens)")

        await sess.navigate("https://example.com")

asyncio.run(main())
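One way to picture the hook mechanism is a registry that maps event names to async callbacks, with each action wrapped so that before_action, after_action and action_error fire around it. The Hooks class and timed_action helper below are hypothetical. In particular they pass a plain dict to after_action, whereas the example above receives a metrics object with attributes such as duration_ms.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable
from typing import Any

Handler = Callable[[Any], Awaitable[None]]


class Hooks:
    """Hypothetical event registry using the same decorator style as sess.on(...)."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        def register(fn: Handler) -> Handler:
            self._handlers[event].append(fn)
            return fn  # returned unchanged so the decorated name stays usable
        return register

    async def emit(self, event: str, payload: Any) -> None:
        for fn in self._handlers[event]:  # handlers run in registration order
            await fn(payload)


async def timed_action(hooks: Hooks, name: str, action: Callable[[], Awaitable[Any]]) -> Any:
    loop = asyncio.get_running_loop()
    await hooks.emit("before_action", {"action": name})
    start = loop.time()
    try:
        result = await action()
    except Exception as exc:
        await hooks.emit("action_error", {"action": name, "error": exc})
        raise
    duration_ms = (loop.time() - start) * 1000
    await hooks.emit("after_action", {"action": name, "duration_ms": duration_ms})
    return result


async def demo() -> list[str]:
    hooks, seen = Hooks(), []

    @hooks.on("before_action")
    async def before(event: dict[str, Any]) -> None:
        seen.append(f"before:{event['action']}")

    @hooks.on("after_action")
    async def after(event: dict[str, Any]) -> None:
        seen.append(f"after:{event['action']}")

    async def fake_navigate() -> str:
        await asyncio.sleep(0)
        return "ok"

    await timed_action(hooks, "navigate", fake_navigate)
    return seen


print(asyncio.run(demo()))  # ['before:navigate', 'after:navigate']
```

Re-raising after action_error keeps hooks purely observational: a logging or metrics handler can never swallow a failure the agent needs to see.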

MCP Server

python -m harness_browser.mcp_server

Add to Claude Code settings.json:

{
  "mcpServers": {
    "harness-browser": {
      "command": "python",
      "args": ["-m", "harness_browser.mcp_server"],
      "env": {
        "BROWSER_USE_MODE": "auto",
        "BROWSER_USE_PROFILES_DIR": "/data/browser-profiles"
      }
    }
  }
}

All BROWSER_USE_* environment variables can be passed through the MCP env block — this is the recommended way to configure mode, profile location, and remote CDP endpoints for an MCP-hosted browser.

Available MCP tools: browser_navigate, browser_dom_tree, browser_screenshot, browser_click, browser_type, browser_eval_js.
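An MCP client invokes these tools with JSON-RPC 2.0 tools/call requests; over the stdio transport each message is a single line of JSON. The sketch below builds such a request. The argument name url for browser_navigate is an assumption (check the input schema the server advertises), and a real client must complete MCP's initialize handshake before calling any tool.

```python
import itertools
import json
from typing import Any

_request_ids = itertools.count(1)


def tools_call(name: str, arguments: dict[str, Any]) -> str:
    """Build one MCP tools/call request, framed as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes any newline inside strings, so the frame stays one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


line = tools_call("browser_navigate", {"url": "https://example.com"})  # "url" is assumed
print(line, end="")
# {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"browser_navigate","arguments":{"url":"https://example.com"}}}
```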

Screenshots

screenshot writes a PNG to disk and returns its path — never raw base64. That keeps token usage flat regardless of image size and lets dashboards preview the file directly.

# default: timestamped file in BROWSER_USE_SCREENSHOTS_DIR
result = await sess.screenshot()
print(result.content)
# → /home/user/.harness-browser/screenshots/harness-1779462725763.png

# full scrollable page (uses Page.getLayoutMetrics + captureBeyondViewport)
await sess.screenshot(full_page=True)

# crop to a single element discovered via dom_tree
await sess.screenshot(element_ref="btn_2")

# pin the file path — every call overwrites the same file
await sess.screenshot(path="/tmp/latest.png")

result.metadata carries the page url / title / width / height / size_kb / full_page so callers can render context without an extra Runtime.evaluate.

Claude Code Skill

A ready-to-use skill ships under skills/:

# Copy into another agent project's skills directory
# (.codebuddy/skills/ for CodeBuddy; Claude Code reads project skills from .claude/skills/)
cp -r skills/harness-browser /path/to/other-project/.codebuddy/skills/
# or the Chinese variant
cp -r skills/harness-browser-zh /path/to/other-project/.codebuddy/skills/

The skill teaches the agent the standard navigate → dom_tree → click/type loop and the ref discipline (always re-fetch DOM after navigation).

Actions Reference

Action | Parameters
--- | ---
navigate | url (required)
dom_tree | level (optional, default: interactive)
screenshot | element_ref, full_page, path (all optional)
click | one of: ref, selector, x+y (required)
type | text (required), ref (optional)
scroll | direction, amount
hover | ref
eval_js | expression
go_back | none
go_forward | none
reload | none
list_tabs | none
new_tab | url
switch_tab | tab_id
close_tab | tab_id
close_session | none

Configuration

All settings can be configured via environment variables. No code changes required.

Environment variable | Default | Description
--- | --- | ---
BROWSER_USE_PROFILES_DIR | ~/.harness-browser/profiles | Root directory for Chrome user-data-dirs
BROWSER_USE_SCREENSHOTS_DIR | ~/.harness-browser/screenshots | Directory where the screenshot action writes PNG files
BROWSER_USE_CDP_HOST | localhost | Host or IP serving Chrome's CDP HTTP/WebSocket endpoint
BROWSER_USE_CDP_PORT_START | 9222 | First CDP debug port assigned to profiles
BROWSER_USE_MODE | auto | Launch mode: auto, headed, or headless. auto picks headed when DISPLAY or WAYLAND_DISPLAY is set (or on macOS/Windows), otherwise headless
BROWSER_USE_CHROME_BIN | auto-detect | Absolute path to the Chrome/Chromium executable
BROWSER_USE_CDP_TIMEOUT | 30.0 | Seconds to wait for a CDP command response
BROWSER_USE_LAUNCH_RETRIES | 20 | Number of times to poll Chrome after launch
BROWSER_USE_LAUNCH_DELAY | 0.25 | Seconds between launch poll attempts
BROWSER_USE_CDP_WS_URL | (unset) | Direct connect: bypass the launcher and connect to this WebSocket URL

Common scenarios

Custom profile storage:

export BROWSER_USE_PROFILES_DIR=/data/browser-profiles

Force headed or headless mode (the default, auto, decides based on DISPLAY/WAYLAND_DISPLAY and the OS):

export BROWSER_USE_MODE=headless   # always headless (CI, containers)
export BROWSER_USE_MODE=headed     # always headed (force a window even without DISPLAY)
# unset / "auto" → headed when a desktop is detected, headless otherwise
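The auto rule can be written as a small pure function. This is an illustrative sketch of the behaviour described in the configuration table (headed on macOS/Windows or when DISPLAY or WAYLAND_DISPLAY is set, headless otherwise); resolve_mode is a hypothetical name, and the library's actual checks may differ in detail.

```python
import os
import sys
from collections.abc import Mapping


def resolve_mode(mode: str = "auto", env: Mapping[str, str] = os.environ, platform: str = sys.platform) -> str:
    """Hypothetical helper: turn BROWSER_USE_MODE into 'headed' or 'headless'."""
    if mode in ("headed", "headless"):
        return mode  # explicit settings always win
    if mode != "auto":
        raise ValueError(f"unknown mode {mode!r}")
    if platform == "darwin" or platform.startswith("win"):
        return "headed"  # desktop OSes are assumed to have a display
    # Linux/BSD: only go headed when an X11 or Wayland display is available.
    return "headed" if env.get("DISPLAY") or env.get("WAYLAND_DISPLAY") else "headless"


print(resolve_mode("auto", {}, "linux"))  # headless
print(resolve_mode("auto", {"WAYLAND_DISPLAY": "wayland-0"}, "linux"))  # headed
```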

Non-standard Chrome path:

export BROWSER_USE_CHROME_BIN=/opt/google/chrome/chrome

Connect to a remote or Docker Chrome (bypasses launcher entirely):

# Start Chrome with --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0
export BROWSER_USE_CDP_WS_URL="ws://remote-host:9222/devtools/browser/xxxxxxxx"

Talk to Chrome on another host or container (keeps the attach/launcher logic, just changes the host):

# Chrome already running with --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0
export BROWSER_USE_CDP_HOST=10.0.0.42
# harness will query http://10.0.0.42:9222/json/version and connect to the WebSocket URL (webSocketDebuggerUrl) it reports

Override settings in code (useful for testing or multi-instance setups):

import asyncio
from harness_browser import BrowserSession, HarnessSettings

cfg = HarnessSettings(
    cdp_port_start=9300,
    cdp_timeout=60.0,
    profiles_dir="/data/profiles",
)

async def main():
    async with await BrowserSession.create(profile="work", settings=cfg) as sess:
        await sess.navigate("https://example.com")

asyncio.run(main())

Development

# Clone
git clone https://git.woa.com/orcakit/browser-use.git
cd browser-use

# Install with dev extras
uv sync --extra dev

# Install pre-commit hooks
pre-commit install

# Run tests
make test

# Lint + type check
make lint

# Format
make format

# Build wheel
make build

Contributing

See CONTRIBUTING.md.

License

MIT

Download files

Source distribution: harness_browser-0.1.2.tar.gz (215.5 kB)
Built distribution: harness_browser-0.1.2-py3-none-any.whl (37.4 kB, Python 3)

Both files were uploaded with uv 0.10.2 (not via Trusted Publishing).

Hashes for harness_browser-0.1.2.tar.gz

Algorithm | Hash digest
--- | ---
SHA256 | df022ce84700d255244b41e436a74d84422f0f3e80061c456d7ca354fff0f080
MD5 | 5fe1572966e73987d78d16cac35f6e42
BLAKE2b-256 | 5b93755840587be443866669abec092c0f94aedf7f26855bb9731176d37e1f99

Hashes for harness_browser-0.1.2-py3-none-any.whl

Algorithm | Hash digest
--- | ---
SHA256 | 5e179482b86abef545f082211fbf6c227f2882d9b78a554988be623cc6cd4944
MD5 | 488a64328637dd5e8ce2cfa369c98954
BLAKE2b-256 | acb34ee302816b539302fd2861463563500da56efbdd8bc8240ea31b2846586c
