
Browser Bridge server and CLI for controlling a Chrome extension over WebSocket


Browser-Agent Bridge - Ultra-Fast Browser Control for Agents

WebSocket-only, HTML-first browser bridge for remotely controlling a local Chrome extension, built as a fast alternative to traditional vision-based browser control systems.

Why This Exists

Traditional browser relays often rely on LLM vision to understand web pages at each step. In practice, that approach is:

  1. Expensive: it consumes many tokens to repeatedly analyze visual page state.
  2. Slow: repeated visual analysis adds latency at every interaction step.
  3. Error-prone: visual perception includes noise that is less relevant than structured HTML for deterministic control.

This project exists as an HTML-first relay: the browser-side extension exposes structured observations and preprocessed HTML, so remote agents can interact with websites with lower cost, lower latency, and more reliable control.

Architecture (WS-only)

Operator CLI (remote/local)
    |
    |  ws(s)://.../ws/operator   (auth)
    v
Bridge Server
    ^
    |  ws(s)://.../ws/client     (auth)
    |
Chrome Extension (local browser)
    |
    +-- content script commands: observe/click/type/get_html/ping_tab/etc.

The extension connects outbound to the server. The operator sends commands through the server to a specific (instance_id, client_id) pair.

Protocol

Client -> Server

  • auth: {kind, instance_id, client_id, token}
  • result: {kind, command_id, ok, result|error}
  • ping

Server -> Client

  • auth_ok / auth_error
  • command: {kind, command_id, type, payload, request_id, sent_at}
  • pong

Operator -> Server

  • auth: {kind, token}
  • list_clients
  • connect_status: {kind, instance_id, client_id}
  • send_command: {kind, instance_id, client_id, type, payload, timeout_s, request_id}
  • ping

Server -> Operator

  • auth_ok / auth_error
  • clients
  • connect_status
  • command_result
  • pong
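As a rough illustration of the operator-side messages listed above, the following sketch builds an auth message and a send_command message as plain dictionaries and serializes one to JSON. It assumes that the kind field carries the message name and that messages travel as JSON text; the 30-second timeout and the helper names are hypothetical, not part of the package.

```python
import json
import uuid


def operator_auth(token: str) -> dict:
    # Operator -> Server auth message: {kind, token}.
    # Assumption: "kind" carries the message name.
    return {"kind": "auth", "token": token}


def send_command(instance_id, client_id, command_type, payload,
                 timeout_s=30.0, request_id=None) -> dict:
    # Operator -> Server send_command message with the fields listed above.
    # timeout_s=30.0 is a hypothetical default chosen for this sketch.
    return {
        "kind": "send_command",
        "instance_id": instance_id,
        "client_id": client_id,
        "type": command_type,
        "payload": payload,
        "timeout_s": timeout_s,
        # A fresh request_id per command keeps the dedup window meaningful.
        "request_id": request_id or uuid.uuid4().hex,
    }


msg = send_command("local-instance", "chrome-main", "get_html",
                   {"max_chars": 40000}, request_id="req-1")
wire = json.dumps(msg)
print(wire)
```

Generating a new request_id for each command, rather than reusing one, avoids tripping the replay guard described under Security Hardening.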

Auth Modes

Set BRIDGE_AUTH_MODE:

  • static (default): compare token against BRIDGE_SHARED_TOKEN (for clients) and BRIDGE_OPERATOR_TOKEN (for operator; defaults to shared token).
    • BRIDGE_OPERATOR_TOKEN must be at least 16 chars and include lowercase, uppercase, digit, and symbol.
  • jwt: validate JWT with BRIDGE_JWT_SECRET/BRIDGE_JWT_ALG.
    • Client JWT should include matching instance_id and client_id claims.
    • Operator JWT should include role=operator.
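The operator token rule in static mode can be checked before starting the server. This is a minimal sketch, assuming that "symbol" means ASCII punctuation; the server's own check may define it differently, and the function name is hypothetical.

```python
import string


def is_strong_operator_token(token: str) -> bool:
    # At least 16 chars with lowercase, uppercase, digit, and symbol.
    # Assumption: "symbol" means any character in string.punctuation.
    return (
        len(token) >= 16
        and any(c.islower() for c in token)
        and any(c.isupper() for c in token)
        and any(c.isdigit() for c in token)
        and any(c in string.punctuation for c in token)
    )


print(is_strong_operator_token("Str0ng!Operator#42"))
print(is_strong_operator_token("change-me-strong-token"))
```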

Production safety

  • BRIDGE_ENV=production enforces strong auth config:
    • static mode: BRIDGE_SHARED_TOKEN must not be empty/dev default.
    • jwt mode: BRIDGE_JWT_SECRET must not be default.

Install (pipx recommended)

python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install browser-agent-bridge

Quick Start

1) (Optional) Generate local JWT secret file

browser-bridge setup-secret

If BRIDGE_AUTH_MODE=jwt and BRIDGE_JWT_SECRET is still the default, server startup automatically loads or creates a local secret file (~/.browser_bridge/jwt_secret, or the path in BRIDGE_JWT_SECRET_FILE).
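For jwt mode, tokens carrying the claims described under Auth Modes might be minted along the following lines. This sketch uses only the standard library and assumes BRIDGE_JWT_ALG is HS256, which the text above does not confirm; the secret value is a placeholder, and no expiry claim is set. A maintained JWT library such as PyJWT would be the usual choice in practice.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_hs256_jwt(claims: dict, secret: bytes) -> str:
    # Assumption: the server is configured for HS256.
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


SECRET = b"example-secret-not-for-production"  # placeholder value

# Client JWT: instance_id and client_id should match the extension settings.
client_token = mint_hs256_jwt(
    {"instance_id": "local-instance", "client_id": "chrome-main"}, SECRET)

# Operator JWT: role=operator.
operator_token = mint_hs256_jwt({"role": "operator"}, SECRET)
print(client_token)
```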

2) Start server

# static mode example
export BRIDGE_AUTH_MODE=static
export BRIDGE_SHARED_TOKEN='change-me-strong-token'
export BRIDGE_OPERATOR_TOKEN='Str0ng!Operator#42'
browser-bridge-server

3) Load extension

  1. Open chrome://extensions
  2. Enable Developer mode
  3. Click Load unpacked and select the extension/ directory
  4. In the popup, fill in:
    • Bridge Server WS URL: ws://127.0.0.1:8765/ws/client (or wss://.../ws/client)
    • Instance ID: e.g. local-instance
    • Client ID: e.g. chrome-main
    • Auth Token / JWT: client token
  5. Save + Connect

[Image: Connected tab preview]

4) Operator CLI usage

browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' list-clients
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' connect-status --instance-id local-instance --client-id chrome-main
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' ping-tab --instance-id local-instance --client-id chrome-main
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' observe --instance-id local-instance --client-id chrome-main
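When driving the CLI from a script, the repeated flags above can be assembled once. The helper below is a hypothetical convenience, not part of the package; it only builds the argument list using the flags and subcommands shown in the commands above.

```python
import shlex


def bridge_argv(server_ws_url, token, subcommand,
                instance_id=None, client_id=None):
    # Global flags come first, then the subcommand, then target flags.
    argv = ["browser-bridge", "--server-ws-url", server_ws_url,
            "--token", token, subcommand]
    if instance_id is not None:
        argv += ["--instance-id", instance_id]
    if client_id is not None:
        argv += ["--client-id", client_id]
    return argv


URL = "ws://127.0.0.1:8765/ws/operator"
argv = bridge_argv(URL, "Str0ng!Operator#42", "observe",
                   "local-instance", "chrome-main")
# shlex.join produces a shell-safe command line for logging or copy-paste.
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run as-is, which avoids shell quoting issues with symbols in the token.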

observe now returns stable references per node:

  • ref: stable element reference for follow-up actions
  • click_ref: reference biased toward a clickable ancestor (row/link/button)
  • clickable_selector: selector for the chosen clickable ancestor

You can pass these back to click via send-command payload using ref/click_ref and optional guardrails:

  • prefer: control (default), row, or link
  • avoid_roles: e.g. ["checkbox", "menuitem"]
  • avoid_tags: e.g. ["input"]
  • avoid_input_types: e.g. ["checkbox", "radio"]
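A click payload built from an observe node could look like the sketch below. The node values ("n-17", "n-12", "tr.row") are made up for illustration, the helper name is hypothetical, and the exact shape of an observe node is an assumption based on the fields listed above.

```python
import json

ALLOWED_PREFER = {"control", "row", "link"}


def click_payload(node, prefer="control", avoid_roles=None,
                  avoid_tags=None, avoid_input_types=None,
                  use_click_ref=True):
    if prefer not in ALLOWED_PREFER:
        raise ValueError(f"prefer must be one of {sorted(ALLOWED_PREFER)}")
    # Use click_ref when the node has one, otherwise fall back to ref.
    key = "click_ref" if use_click_ref and node.get("click_ref") else "ref"
    payload = {key: node[key], "prefer": prefer}
    # Only include guardrails that were actually supplied.
    for name, value in (("avoid_roles", avoid_roles),
                        ("avoid_tags", avoid_tags),
                        ("avoid_input_types", avoid_input_types)):
        if value:
            payload[name] = list(value)
    return payload


# Hypothetical node as it might appear in an observe result.
node = {"ref": "n-17", "click_ref": "n-12", "clickable_selector": "tr.row"}
payload = click_payload(node, prefer="row",
                        avoid_roles=["checkbox", "menuitem"],
                        avoid_input_types=["checkbox", "radio"])
print(json.dumps(payload))
```

The printed JSON is what would be passed to send-command with --type click through --payload or --payload-file.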

Raw command:

browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type get_html --payload '{"max_chars":40000}'

You can also avoid shell JSON escaping with --payload-file:

cat > /tmp/cmd.json <<'JSON'
{"selector":"input[name=\"q\"]","text":"openclaw"}
JSON

browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type type --payload-file /tmp/cmd.json

get_html result includes:

  • html: captured DOM text (possibly truncated)
  • truncated: whether output was cut to payload.max_chars
  • notes: actionable recommendations (for example, increase max_chars when truncated, or set preprocess=false for rawer DOM)
  • preprocess and removed_nodes: preprocessing mode and removed-node count
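One way to act on the truncated flag is to retry with a larger max_chars. The following sketch shows that decision only; the doubling strategy and the 400000 ceiling are assumptions made for this example, and the real upper bound depends on BRIDGE_MAX_MESSAGE_BYTES.

```python
from typing import Optional


def next_max_chars(result: dict, requested: int,
                   ceiling: int = 400_000) -> Optional[int]:
    # Return a larger max_chars to retry with, or None if no retry is needed.
    # ceiling is a hypothetical client-side limit, not a server setting.
    if not result.get("truncated"):
        return None
    if requested >= ceiling:
        return None
    return min(requested * 2, ceiling)


# Hypothetical get_html result containing the fields listed above.
result = {"html": "<main>...</main>", "truncated": True,
          "notes": [], "preprocess": True, "removed_nodes": 12}
print(next_max_chars(result, 40000))
```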

Adaptive load wait (navigate, click, type, press_key):

  • The extension waits for tab load completion before replying, for at most 10s (adaptive: it returns immediately if the tab is already complete).
  • Override per command payload:
    • wait_for_load (default true)
    • wait_for_load_ms (default 10000, capped at 10000)
  • Command result includes load_wait diagnostics: waited_ms, completed, timed_out, final_status, enabled, max_wait_ms.
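The load-wait options and diagnostics above can be handled on the operator side roughly as follows. The helper names are hypothetical, and the value types of the load_wait fields (booleans and integers) are assumptions; only the field names and the 10000 ms cap come from the list above.

```python
MAX_WAIT_MS = 10_000  # cap stated above


def load_wait_payload(url, wait_for_load=True, wait_for_load_ms=10_000):
    # Clamp locally so the payload reflects what the extension will honor.
    clamped = max(0, min(wait_for_load_ms, MAX_WAIT_MS))
    return {"url": url, "wait_for_load": wait_for_load,
            "wait_for_load_ms": clamped}


def describe_load_wait(load_wait: dict) -> str:
    # Summarize the load_wait diagnostics from a command result.
    if not load_wait.get("enabled", True):
        return "load wait disabled"
    waited = load_wait.get("waited_ms", 0)
    if load_wait.get("timed_out"):
        status = load_wait.get("final_status")
        return f"timed out after {waited} ms (final_status={status})"
    if load_wait.get("completed"):
        return f"load completed after {waited} ms"
    return "load state unknown"


print(load_wait_payload("https://example.com", wait_for_load_ms=60_000))
print(describe_load_wait({"enabled": True, "completed": True,
                          "timed_out": False, "waited_ms": 820}))
```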

Example:

browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type navigate --payload '{"url":"https://example.com","wait_for_load_ms":4000}'

Human-like typing (type):

  • type now simulates typing character-by-character by default to better match human input behavior.
  • Optional payload fields:
    • human_like (default true)
    • clear_first (default true)
    • keystroke_delay_ms (default 45)
    • keystroke_jitter_ms (default 30)

Example:

browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type type --payload '{"selector":"input[name=\"q\"]","text":"hello world","keystroke_delay_ms":70,"keystroke_jitter_ms":45}'
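Because human-like typing takes time proportional to the text length, it can help to estimate the duration when choosing a command timeout. The sketch below assumes each keystroke waits between delay minus jitter and delay plus jitter milliseconds, floored at zero. How the extension actually applies jitter is not documented here, so treat the bounds as a rough guide.

```python
def typing_time_bounds_ms(text, keystroke_delay_ms=45, keystroke_jitter_ms=30):
    # Assumption: per-key delay lies in [delay - jitter, delay + jitter].
    n = len(text)
    low = n * max(0, keystroke_delay_ms - keystroke_jitter_ms)
    high = n * (keystroke_delay_ms + keystroke_jitter_ms)
    return low, high


low, high = typing_time_bounds_ms("hello world",
                                  keystroke_delay_ms=70,
                                  keystroke_jitter_ms=45)
print(low, high)
```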

Special keys (press_key):

  • press_key is a first-class command for non-text keyboard actions such as submit, focus traversal, and Escape handling.
  • Supported keys: Enter, Tab, Escape, Backspace, Delete, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, Home, End, PageUp, PageDown, Space.
  • Key aliases are accepted for common variants like return, esc, del, up, down, left, right, and spacebar.
  • Optional payload fields:
    • any element targeting field already supported by actions: selector, ref, click_ref, or locator
    • modifier flags: alt_key, ctrl_key, meta_key, shift_key
    • focus (default true) to focus the target before dispatch
    • repeat (default false) to mark the event as an auto-repeat keypress
  • If no target is provided, press_key uses the current document.activeElement.
  • Enter, Tab, Backspace, Delete, and Space include browser-like default handling when the page does not cancel the keyboard event.

Examples:

browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type press_key --payload '{"key":"Enter","selector":"input[name=\"q\"]"}'
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type press_key --payload '{"key":"Tab","shift_key":true}'
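Client code that accepts key names from users may want to normalize them before building a press_key payload. The alias table below is inferred from the supported keys and aliases listed above. The exact mapping and the case-insensitive matching are assumptions, since the extension performs its own alias handling.

```python
SUPPORTED_KEYS = (
    "Enter", "Tab", "Escape", "Backspace", "Delete",
    "ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight",
    "Home", "End", "PageUp", "PageDown", "Space",
)

# Assumed alias targets, inferred from the names in the list above.
ALIASES = {
    "return": "Enter", "esc": "Escape", "del": "Delete",
    "up": "ArrowUp", "down": "ArrowDown",
    "left": "ArrowLeft", "right": "ArrowRight",
    "spacebar": "Space",
}

_CANONICAL = {key.lower(): key for key in SUPPORTED_KEYS}


def normalize_key(key: str) -> str:
    lowered = key.strip().lower()
    if lowered in ALIASES:
        return ALIASES[lowered]
    if lowered in _CANONICAL:
        return _CANONICAL[lowered]
    raise ValueError(f"unsupported key: {key!r}")


payload = {"key": normalize_key("esc"), "shift_key": False}
print(payload)
```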

Security Hardening

  • Use TLS in non-local deployments (wss://).
  • Use strong static tokens or JWT secret. Operator static token must include mixed-case letters, digits, symbols, and be 16+ chars.
  • Optional command allowlist: BRIDGE_COMMAND_ALLOWLIST=observe,ping_tab,get_html.
  • Optional client allowlist in static mode: BRIDGE_ALLOWED_CLIENTS=instance1:client1,instance2:client2.
  • Request idempotency and replay protection are enforced by a request_id dedup window.
  • Max payload limit is enforced by BRIDGE_MAX_MESSAGE_BYTES.
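To make the two allowlist formats concrete, here is a sketch of how such values could be parsed and checked. It is an illustration of the comma-separated and instance:client formats shown above, not the server's actual parsing code; the function names and the whitespace handling are assumptions.

```python
def parse_command_allowlist(value: str) -> frozenset:
    # "observe,ping_tab,get_html" -> {"observe", "ping_tab", "get_html"}
    return frozenset(item.strip() for item in value.split(",") if item.strip())


def parse_allowed_clients(value: str) -> frozenset:
    # "instance1:client1,instance2:client2" -> {(instance_id, client_id), ...}
    pairs = set()
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        instance_id, sep, client_id = item.partition(":")
        if not sep or not instance_id or not client_id:
            raise ValueError(f"expected instance_id:client_id, got {item!r}")
        pairs.add((instance_id, client_id))
    return frozenset(pairs)


commands = parse_command_allowlist("observe,ping_tab,get_html")
clients = parse_allowed_clients("instance1:client1,instance2:client2")
print("get_html" in commands, ("instance1", "client1") in clients)
```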

Extension Connection Stability

  • The Chrome extension runs as a Manifest V3 service worker.
  • The client now sends periodic websocket ping messages after auth_ok so Chrome does not suspend an otherwise idle remote bridge connection.

Testing

pytest -v

Coverage includes WS auth success/failure, command routing, disconnect handling, wrong target routing, CLI failure paths, and reconnect replacement behavior.

Contributing

Contributions are very welcome.

If you want to help, great places to start are:

  • bug fixes and reliability improvements
  • new command handlers and protocol hardening
  • better docs and examples
  • tests for real-world edge cases

Quick contributor workflow:

  1. Fork the repo and create a focused branch.
  2. Run tests locally (pytest -v).
  3. Open a PR with a clear description, motivation, and test notes.

For detailed guidelines, see CONTRIBUTING.md.

If you have ideas but no patch yet, opening an issue/discussion is also appreciated.

License

MIT (see LICENSE).


Created by the creator of openclaw-setup.me.
