AI-friendly browser automation via CDP with profile-based login persistence
harness-browser
AI-friendly browser automation via Chrome DevTools Protocol (CDP).
An agent-first browser runtime built on pure CDP. Predictable DOM snapshots, persistent profile sessions, and a typed Python API designed for LLM tool-calling — no Playwright, no driver layer in between.
Why harness-browser
| Concern | How we address it |
|---|---|
| Token cost | 4-level DOM output; the interactive level (~200–500 tokens) returns only clickable/typeable elements with stable refs, not raw HTML |
| Stable element targeting | Refs (btn_2, inp_search) survive layout reflows and are auto-invalidated on navigation, so the agent never points at a stale node |
| Login persistence | One Chrome user-data-dir per profile under ~/.harness-browser/profiles/<name>/ — log in once, every subsequent run reuses cookies and storage |
| No Playwright tax | Pure CDP over WebSocket (websockets>=12.0); no browser binaries shipped, no patched Chromium, no driver layer |
| Observability | Every action emits ActionMetrics (duration_ms, dom_nodes_scanned, estimated_tokens, screenshot_size_kb) plus before_action / after_action / action_error / page_navigated hooks |
| Configuration | Ten BROWSER_USE_* env vars cover paths, ports, timeouts, launch mode, and remote/Docker Chrome via BROWSER_USE_CDP_WS_URL; no code changes between dev, CI, and prod |
| Agent integrations | Stateless browser_tool(action=..., profile=...) for any framework, MCP server (python -m harness_browser.mcp_server), and a ready-to-copy Claude Code skill in skills/ |
Features
- Pure CDP: direct WebSocket connection, no Playwright dependency
- Profile-based login persistence: Chrome user-data-dir per profile, cookies/sessions reused across runs
- 4-level DOM output: minimal (~50 tokens), interactive (~200–500 tokens), full (~1000–3000 tokens), structured (JSON)
- Ref system: stable element references across actions, invalidated on navigation
- Hook system: before_action, after_action, action_error, page_navigated
- Per-action metrics: duration_ms, estimated_tokens, screenshot_size_kb
- Environment-variable configuration: all paths, ports, and timeouts configurable without code changes
- Remote/Docker Chrome support: bypass launcher via BROWSER_USE_CDP_WS_URL
- MCP Server: expose actions as MCP tools for Claude Code and other MCP clients
- Drop-in Claude Code skill: copy skills/harness-browser/ into any agent project
- Strict typing: mypy strict, ruff clean, 34 unit tests covering DOM, refs, hooks, settings, and CDP framing
Requirements
- Python 3.11+
- Chrome or Chromium
# Ubuntu/Debian
sudo apt install chromium-browser
# macOS
brew install --cask google-chrome
Installation
pip install harness-browser
Quick Start
Python API
import asyncio

from harness_browser import BrowserSession


async def main():
    async with await BrowserSession.create(profile="default") as sess:
        await sess.navigate("https://example.com")
        result = await sess.dom_tree(level="interactive")
        print(result.content)
        # → [ref=inp_1] input[text] placeholder="Search"
        # → [ref=btn_2] button "Go"
        await sess.click(ref="btn_2")


asyncio.run(main())
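The interactive output shown in the comments above is line-oriented, so an agent can pull refs out of it with a small parser. The following is a minimal sketch, assuming every line has the `[ref=<id>] <description>` shape seen in the two sample lines; `parse_refs` is a hypothetical helper and not part of harness-browser.

```python
import re

# Matches the "[ref=<id>] <description>" shape from the sample output.
# The full output grammar is an assumption based on those two lines.
REF_LINE = re.compile(r"^\[ref=(?P<ref>[A-Za-z0-9_]+)\]\s+(?P<desc>.+)$")


def parse_refs(content: str) -> dict[str, str]:
    """Map each ref to the description text that follows it."""
    refs: dict[str, str] = {}
    for line in content.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            refs[match["ref"]] = match["desc"]
    return refs


sample = '[ref=inp_1] input[text] placeholder="Search"\n[ref=btn_2] button "Go"'
print(parse_refs(sample))
```

Lines that do not match the pattern are skipped, so the helper degrades to an empty mapping if the format differs from this assumption.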
AI Framework Usage (stateless)
from harness_browser import browser_tool
# Run inside an async function; all calls route to the same session by profile name
result = await browser_tool(action="navigate", url="https://github.com", profile="work")
result = await browser_tool(action="dom_tree", level="interactive", profile="work")
result = await browser_tool(action="click", ref="btn_search", profile="work")
result = await browser_tool(action="type", text="harness", profile="work")
DOM Levels
| Level | Tokens | Use case |
|---|---|---|
| minimal | ~50 | Confirm page loaded, check title/URL |
| interactive | ~200–500 | Find clickable/typeable elements (default) |
| full | ~1000–3000 | Read page content |
| structured | varies | JSON for programmatic processing |
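An agent with a fixed context budget can use the estimates in this table to choose a level before calling dom_tree. A minimal sketch, assuming the upper bound of each range is a safe planning figure; `richest_level_within` is a hypothetical helper, not a library function.

```python
from typing import Optional

# Upper token estimates copied from the table, in ascending order.
# "structured" is left out because its size varies with the page.
LEVEL_MAX_TOKENS = {"minimal": 50, "interactive": 500, "full": 3000}


def richest_level_within(budget_tokens: int) -> Optional[str]:
    """Return the most detailed level whose upper estimate fits the budget."""
    fitting = [name for name, cost in LEVEL_MAX_TOKENS.items() if cost <= budget_tokens]
    return fitting[-1] if fitting else None


print(richest_level_within(600))
```

With a budget of 600 tokens the helper settles on interactive; below 50 it returns None, which a caller could treat as "skip the DOM read".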
Login State Reuse
Profiles persist Chrome sessions in ~/.harness-browser/profiles/<name>/:
# First run: navigate to login page, log in manually
await browser_tool(action="navigate", url="https://github.com/login", profile="github")
# All future runs: login state reused automatically
await browser_tool(action="navigate", url="https://github.com/settings", profile="github")
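The persistence comes from Chrome itself: a user-data-dir holds cookies and storage, so pointing each profile at its own directory keeps logins apart. The sketch below shows how a profile name could map to a directory and to Chrome command-line flags. `--user-data-dir` and `--remote-debugging-port` are standard Chrome flags, but how harness-browser actually builds its launch command is an assumption here, and both helper names are hypothetical.

```python
from pathlib import Path


def profile_dir(name: str, root: str = "~/.harness-browser/profiles") -> Path:
    """Directory that holds one profile's Chrome state."""
    return Path(root).expanduser() / name


def chrome_args(name: str, port: int, root: str = "~/.harness-browser/profiles") -> list[str]:
    """Flags that bind a Chrome instance to one profile directory and one debug port."""
    return [
        f"--user-data-dir={profile_dir(name, root)}",
        f"--remote-debugging-port={port}",
    ]


print(chrome_args("github", 9222, root="/data/browser-profiles"))
```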
Hook System
async with await BrowserSession.create(profile="work") as sess:

    @sess.on("before_action")
    async def log_action(event):
        print(f"[{event['action']}] starting")

    @sess.on("after_action")
    async def log_metrics(metrics):
        print(f"  done in {metrics.duration_ms}ms (~{metrics.estimated_tokens} tokens)")

    await sess.navigate("https://example.com")
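For readers curious how a decorator-based hook API of this shape can work, here is a self-contained sketch of an async event bus. `HookBus` is a hypothetical stand-in written for illustration; it is not how harness-browser implements `sess.on`, and the payload shape is assumed.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]


class HookBus:
    """Tiny async event bus: register handlers per event name, await them in order."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers `handler` for `event` and returns it unchanged."""

        def register(handler: Handler) -> Handler:
            self._handlers[event].append(handler)
            return handler

        return register

    async def emit(self, event: str, payload: Any) -> None:
        """Await every handler registered for `event`, in registration order."""
        for handler in self._handlers[event]:
            await handler(payload)


bus = HookBus()
seen: list[str] = []


@bus.on("before_action")
async def record(event: dict) -> None:
    seen.append(event["action"])


asyncio.run(bus.emit("before_action", {"action": "navigate"}))
print(seen)
```

Awaiting handlers sequentially keeps ordering predictable; a real implementation might also need to isolate handler exceptions from the action itself.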
MCP Server
python -m harness_browser.mcp_server
Add to Claude Code settings.json:
{
  "mcpServers": {
    "harness-browser": {
      "command": "python",
      "args": ["-m", "harness_browser.mcp_server"],
      "env": {
        "BROWSER_USE_MODE": "auto",
        "BROWSER_USE_PROFILES_DIR": "/data/browser-profiles"
      }
    }
  }
}
All BROWSER_USE_* environment variables can be passed through the MCP env block —
this is the recommended way to configure mode, profile location, and remote CDP
endpoints for an MCP-hosted browser.
Available MCP tools: browser_navigate, browser_dom_tree, browser_screenshot,
browser_click, browser_type, browser_eval_js.
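When the same MCP entry is needed for several environments, the JSON can be generated instead of edited by hand. A small sketch that builds the structure shown above with a different env block; `mcp_entry` is a hypothetical helper and the remote WebSocket URL is a placeholder.

```python
import json


def mcp_entry(env: dict[str, str]) -> dict:
    """Build the mcpServers entry for harness-browser with the given env block."""
    return {
        "mcpServers": {
            "harness-browser": {
                "command": "python",
                "args": ["-m", "harness_browser.mcp_server"],
                "env": env,
            }
        }
    }


config = mcp_entry(
    {
        "BROWSER_USE_MODE": "headless",
        # Placeholder endpoint; substitute the URL of your own remote Chrome.
        "BROWSER_USE_CDP_WS_URL": "ws://remote-host:9222/devtools/browser/xxxxxxxx",
    }
)
print(json.dumps(config, indent=2))
```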
Screenshots
screenshot writes a PNG to disk and returns its path — never raw base64.
That keeps token usage flat regardless of image size and lets dashboards
preview the file directly.
# default: timestamped file in BROWSER_USE_SCREENSHOTS_DIR
result = await sess.screenshot()
print(result.content)
# → /home/user/.harness-browser/screenshots/harness-1779462725763.png
# full scrollable page (uses Page.getLayoutMetrics + captureBeyondViewport)
await sess.screenshot(full_page=True)
# crop to a single element discovered via dom_tree
await sess.screenshot(element_ref="btn_2")
# pin the file path — every call overwrites the same file
await sess.screenshot(path="/tmp/latest.png")
result.metadata carries the page url / title / width / height /
size_kb / full_page so callers can render context without an extra
Runtime.evaluate.
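The default file name in the example looks like a millisecond Unix timestamp with a `harness-` prefix. The sketch below reproduces that naming under this assumption; the library's real naming logic is not shown in this document, and `default_screenshot_path` is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import Optional


def default_screenshot_path(directory: str, now_ms: Optional[int] = None) -> Path:
    """Build '<directory>/harness-<epoch milliseconds>.png'."""
    stamp = int(time.time() * 1000) if now_ms is None else now_ms
    return Path(directory).expanduser() / f"harness-{stamp}.png"


print(default_screenshot_path("~/.harness-browser/screenshots"))
```

Passing `now_ms` explicitly makes the name deterministic, which is convenient in tests.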
Claude Code Skill
A ready-to-use skill ships under skills/:
# Copy into another agent project as a Claude Code skill
cp -r skills/harness-browser /path/to/other-project/.codebuddy/skills/
# or the Chinese variant
cp -r skills/harness-browser-zh /path/to/other-project/.codebuddy/skills/
The skill teaches the agent the standard navigate → dom_tree → click/type loop
and the ref discipline (always re-fetch DOM after navigation).
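The ref discipline is easier to follow with a concrete model of invalidation. The sketch below is a hypothetical ref table, not the library's internals: refs resolve to node ids until a navigation clears the table, after which the old ref raises instead of pointing at a stale node.

```python
class StaleRefError(KeyError):
    """Raised when a ref is unknown or was issued before the last navigation."""


class RefTable:
    """Hypothetical ref registry; node ids are plain ints for illustration."""

    def __init__(self) -> None:
        self._nodes: dict[str, int] = {}

    def register(self, ref: str, node_id: int) -> None:
        self._nodes[ref] = node_id

    def resolve(self, ref: str) -> int:
        try:
            return self._nodes[ref]
        except KeyError:
            raise StaleRefError(f"{ref!r} is unknown or stale; fetch dom_tree again") from None

    def invalidate_all(self) -> None:
        """Call on navigation so refs from the previous page can no longer resolve."""
        self._nodes.clear()


table = RefTable()
table.register("btn_2", 42)
print(table.resolve("btn_2"))
table.invalidate_all()  # simulate a navigation
```

Failing loudly on a stale ref is what makes "always re-fetch DOM after navigation" enforceable rather than a convention.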
Actions Reference
| Action | Required | Optional |
|---|---|---|
| navigate | url | |
| dom_tree | | level (default: interactive) |
| screenshot | | element_ref, full_page, path |
| click | one of: ref, selector, x+y | |
| type | text | ref |
| scroll | direction, amount | |
| hover | ref | |
| eval_js | expression | |
| go_back | | |
| go_forward | | |
| reload | | |
| list_tabs | | |
| new_tab | url | |
| switch_tab | tab_id | |
| close_tab | tab_id | |
| close_session | | |
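A tool-calling layer can check arguments against this table before dispatching an action. The following sketch covers only a subset of actions whose required arguments are unambiguous in the table; `REQUIRED_ARGS` and `missing_args` are hypothetical names, not part of harness-browser.

```python
# Required arguments for a subset of actions, copied from the table.
REQUIRED_ARGS: dict[str, set[str]] = {
    "navigate": {"url"},
    "type": {"text"},
    "eval_js": {"expression"},
    "switch_tab": {"tab_id"},
}


def missing_args(action: str, **kwargs: object) -> set[str]:
    """Return the required argument names that were not supplied."""
    supplied = set(kwargs)
    if action == "click":
        # click accepts any one of: ref, selector, or an x+y pair.
        has_target = "ref" in supplied or "selector" in supplied or {"x", "y"} <= supplied
        return set() if has_target else {"ref | selector | x+y"}
    return REQUIRED_ARGS.get(action, set()) - supplied


print(missing_args("navigate"))
print(missing_args("click", x=10, y=20))
```

Actions that are absent from `REQUIRED_ARGS` are treated as having no required arguments in this sketch.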
Configuration
All settings can be configured via environment variables. No code changes required.
| Environment Variable | Default | Description |
|---|---|---|
| BROWSER_USE_PROFILES_DIR | ~/.harness-browser/profiles | Root directory for Chrome user-data-dirs |
| BROWSER_USE_SCREENSHOTS_DIR | ~/.harness-browser/screenshots | Directory where the screenshot action writes PNG files |
| BROWSER_USE_CDP_HOST | localhost | Host or IP serving Chrome's CDP HTTP/WebSocket endpoint |
| BROWSER_USE_CDP_PORT_START | 9222 | First CDP debug port assigned to profiles |
| BROWSER_USE_MODE | auto | Launch mode: auto / headed / headless. auto picks headed when DISPLAY/WAYLAND_DISPLAY is set (or on macOS/Windows), else headless |
| BROWSER_USE_CHROME_BIN | auto-detect | Absolute path to Chrome/Chromium executable |
| BROWSER_USE_CDP_TIMEOUT | 30.0 | Seconds to wait for a CDP command response |
| BROWSER_USE_LAUNCH_RETRIES | 20 | Times to poll Chrome after launch |
| BROWSER_USE_LAUNCH_DELAY | 0.25 | Seconds between launch poll attempts |
| BROWSER_USE_CDP_WS_URL | (none) | Direct connect: bypass launcher, connect to this WebSocket URL |
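To make the defaults concrete, here is a sketch that reads a handful of these variables with the documented fallbacks. `EnvSettings` and `load_env_settings` are hypothetical and unrelated to the library's own settings class; the derived `max_launch_wait` simply multiplies the two launch values to show that the defaults allow roughly five seconds of polling.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container; field names are illustrative only."""

    cdp_host: str
    cdp_port_start: int
    cdp_timeout: float
    launch_retries: int
    launch_delay: float

    @property
    def max_launch_wait(self) -> float:
        """Approximate seconds spent sleeping between polls: retries times delay."""
        return self.launch_retries * self.launch_delay


def load_env_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Read a few BROWSER_USE_* variables, falling back to the documented defaults."""
    source = os.environ if env is None else env
    return EnvSettings(
        cdp_host=source.get("BROWSER_USE_CDP_HOST", "localhost"),
        cdp_port_start=int(source.get("BROWSER_USE_CDP_PORT_START", "9222")),
        cdp_timeout=float(source.get("BROWSER_USE_CDP_TIMEOUT", "30.0")),
        launch_retries=int(source.get("BROWSER_USE_LAUNCH_RETRIES", "20")),
        launch_delay=float(source.get("BROWSER_USE_LAUNCH_DELAY", "0.25")),
    )


defaults = load_env_settings({})
print(defaults.max_launch_wait)  # 20 polls x 0.25 s
```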
Common scenarios
Custom profile storage:
export BROWSER_USE_PROFILES_DIR=/data/browser-profiles
Force headed or headless mode (default is auto, which picks based on DISPLAY):
export BROWSER_USE_MODE=headless # always headless (CI, containers)
export BROWSER_USE_MODE=headed # always headed (force a window even without DISPLAY)
# unset / "auto" → headed when a desktop is detected, headless otherwise
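The auto rule can be written as a small pure function. This is a sketch of the behavior as described in the configuration table (headed when DISPLAY or WAYLAND_DISPLAY is set, or on macOS/Windows; headless otherwise); `resolve_mode` is a hypothetical name and the library's actual detection code may differ.

```python
import sys
from typing import Mapping


def resolve_mode(mode: str, env: Mapping[str, str], platform: str = sys.platform) -> str:
    """Turn 'auto' into 'headed' or 'headless'; explicit modes pass through unchanged."""
    if mode in ("headed", "headless"):
        return mode
    # sys.platform is 'darwin' on macOS and 'win32' on Windows.
    if platform == "darwin" or platform.startswith("win"):
        return "headed"
    has_display = bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return "headed" if has_display else "headless"


print(resolve_mode("auto", {"DISPLAY": ":0"}, platform="linux"))
print(resolve_mode("auto", {}, platform="linux"))
```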
Non-standard Chrome path:
export BROWSER_USE_CHROME_BIN=/opt/google/chrome/chrome
Connect to a remote or Docker Chrome (bypasses launcher entirely):
# Start Chrome with --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0
export BROWSER_USE_CDP_WS_URL="ws://remote-host:9222/devtools/browser/xxxxxxxx"
Talk to Chrome on another host or container (keeps the attach/launcher logic, just changes the host):
# Chrome already running with --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0
export BROWSER_USE_CDP_HOST=10.0.0.42
# harness will hit http://10.0.0.42:9222/json/version and use the WebSocket URL it returns
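Chrome's `/json/version` endpoint returns a JSON object whose `webSocketDebuggerUrl` field holds the browser-level WebSocket URL. The sketch below shows the discovery step on a sample payload without touching the network; the helper names are hypothetical and the payload values are placeholders.

```python
import json


def version_endpoint(host: str, port: int) -> str:
    """HTTP URL of Chrome's version endpoint for a given host and debug port."""
    return f"http://{host}:{port}/json/version"


def ws_url_from_version(payload: str) -> str:
    """Extract the browser WebSocket URL from a /json/version response body."""
    return json.loads(payload)["webSocketDebuggerUrl"]


# Placeholder response body; a real one carries more fields and a real browser id.
sample_payload = json.dumps(
    {
        "Browser": "Chrome/0.0.0.0",
        "webSocketDebuggerUrl": "ws://10.0.0.42:9222/devtools/browser/example-id",
    }
)

print(version_endpoint("10.0.0.42", 9222))
print(ws_url_from_version(sample_payload))
```

The extracted URL is the same kind of value that BROWSER_USE_CDP_WS_URL expects when the launcher is bypassed.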
Override settings in code (useful for testing or multi-instance setups):
from harness_browser import BrowserSession, HarnessSettings
cfg = HarnessSettings(
cdp_port_start=9300,
cdp_timeout=60.0,
profiles_dir="/data/profiles",
)
sess = await BrowserSession.create(profile="work", settings=cfg)
Development
# Clone
git clone https://git.woa.com/orcakit/browser-use.git
cd browser-use
# Install with dev extras
uv sync --extra dev
# Install pre-commit hooks
pre-commit install
# Run tests
make test
# Lint + type check
make lint
# Format
make format
# Build wheel
make build
Contributing
See CONTRIBUTING.md.
License
MIT