Control your browser from the command line via a Chrome extension + WebSocket bridge

browser-ctl

Browser automation built for AI agents.
Give your LLM a real Chrome browser — with your sessions, cookies, and extensions — through simple CLI commands.

pip install browser-ctl

bctl go https://github.com
bctl click "a.search-button"
bctl type "input[name=q]" "browser-ctl"
bctl press Enter
bctl screenshot results.png

The Problem with Existing Browser Automation

Tools like browser-use, Playwright MCP, and Puppeteer are powerful, but they share a set of pain points when used with AI agents:

Pain point | Typical tools | browser-ctl
Heavy browser binaries: must download and manage a bundled Chromium (~400 MB) | Playwright, Puppeteer | Uses your existing Chrome; zero browser downloads
No access to real sessions: launches a fresh, empty browser with no cookies, logins, or extensions | browser-use, Playwright MCP | Controls your real Chrome; all sessions, cookies, and extensions intact
Anti-bot detection: headless browsers are flagged and blocked by many websites | Puppeteer, Playwright | Uses your real browser profile; indistinguishable from normal browsing
Complex SDK integration: requires importing libraries and writing async code | browser-use, Stagehand | Pure CLI with JSON output; any LLM can call bctl click "button"
Heavy dependencies: Playwright alone pulls ~50 MB of packages + browser binary | Playwright, Puppeteer | CLI is stdlib-only; server needs only aiohttp
Token-inefficient for LLMs: verbose API calls waste context window tokens | SDK-based tools | Concise commands: bctl text h1 vs pages of boilerplate

Designed for LLM Agents

browser-ctl is purpose-built for AI agent workflows:

  • Tool-calling ready — every command is a single shell call returning structured JSON, perfect for function-calling / tool-use patterns
  • Built-in AI skill — ships with SKILL.md that teaches AI agents (Cursor, OpenCode, etc.) the full command set and best practices
  • Real browser = real access — your LLM can operate on authenticated pages (Gmail, Jira, internal tools) without credential management
  • Deterministic output — JSON responses with CSS-selector-based queries, no vision model needed for most tasks
  • Minimal token cost: bctl select "a.link" -l 5 returns structured data in one call, versus multi-step screenshot → vision → parse loops

# Install the AI skill for Cursor IDE in one command
bctl setup cursor
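
Because every command is a single shell call, a tool-use integration can be little more than a mapping from a tool call to an argument list. The sketch below is one possible way to wire that up and is not part of browser-ctl: `TOOL_SPEC` and `to_argv` are hypothetical names, and the schema layout only imitates the JSON-schema style common to function-calling APIs. Only the `bctl` command names and the `-t` flag come from the command reference.

```python
import shlex

# Hypothetical tool description; the field names are illustrative and are
# not defined by browser-ctl.
TOOL_SPEC = {
    "name": "bctl",
    "description": "Run one browser-ctl command and return its JSON result.",
    "parameters": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "e.g. click, type, text"},
            "args": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["command"],
    },
}


def to_argv(call: dict) -> list[str]:
    """Turn a tool call such as {"command": "click", "args": [...]} into argv."""
    return ["bctl", call["command"], *call.get("args", [])]


call = {"command": "click", "args": ["button", "-t", "Sign in"]}
argv = to_argv(call)
# shlex.join gives a copy-pasteable form for logs; pass argv itself to subprocess.run.
print(shlex.join(argv))
```

Passing a list rather than a formatted string keeps selectors with spaces or quotes intact without any shell escaping.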

How It Works

AI Agent / Terminal  ──HTTP──▶  Bridge Server  ◀──WebSocket──  Chrome Extension
     (bctl CLI)                  (:19876)                      (your browser)

  1. CLI (bctl) sends commands via HTTP to a local bridge server
  2. Bridge server relays them over WebSocket to the Chrome extension
  3. Extension executes commands using Chrome APIs & content scripts in your real browser
  4. Results flow back the same path as JSON

The bridge server auto-starts on first command — no manual setup needed.
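
The request/response path above can be modelled in a few lines. This is a toy, in-process imitation that uses asyncio queues in place of HTTP and WebSocket; it is not the project's code. The `{"action": ...}` message shape and every function name here are assumptions made for illustration.

```python
import asyncio
import json


async def fake_extension(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Stand-in for the Chrome extension: handle each command, reply with JSON."""
    while True:
        raw = await inbox.get()
        if raw is None:  # shutdown signal used only by this toy model
            return
        command = json.loads(raw)
        if command["action"] == "status":
            reply = {"success": True,
                     "data": {"url": "https://example.com", "title": "Example"}}
        else:
            reply = {"success": False,
                     "error": f"Unknown action: {command['action']}"}
        await outbox.put(json.dumps(reply))


async def bridge(raw_request: str, to_ext: asyncio.Queue, from_ext: asyncio.Queue) -> str:
    """Stand-in for the bridge server: forward the request, await the reply."""
    await to_ext.put(raw_request)
    return await from_ext.get()


async def run_cli(actions: list[str]) -> list[dict]:
    """Stand-in for the CLI: send each action through the bridge in order."""
    to_ext, from_ext = asyncio.Queue(), asyncio.Queue()
    ext_task = asyncio.create_task(fake_extension(to_ext, from_ext))
    results = []
    for action in actions:
        raw = await bridge(json.dumps({"action": action}), to_ext, from_ext)
        results.append(json.loads(raw))
    await to_ext.put(None)
    await ext_task
    return results


results = asyncio.run(run_cli(["status", "teleport"]))
```

The point of the model is that the bridge holds no browser logic of its own: it relays a request and hands back whatever the extension answers.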


Installation

Step 1 — Install the Python package:

pip install browser-ctl

Step 2 — Load the Chrome extension:

bctl setup

Then in Chrome: chrome://extensions → Enable Developer mode → Load unpacked → select ~/.browser-ctl/extension/

Step 3 — Verify:

bctl ping
# {"success": true, "data": {"server": true, "extension": true}}
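
A script or agent can gate on this check before issuing other commands. The helper below is a minimal sketch that parses the reply shown above; `ping_ok` is a hypothetical name, and the shape of a failing reply (for example `"extension": false`) is an assumption, since only the healthy reply is documented here.

```python
import json


def ping_ok(stdout: str) -> bool:
    """Return True only if both the server and the extension report healthy."""
    try:
        reply = json.loads(stdout)
    except json.JSONDecodeError:
        return False
    if not isinstance(reply, dict):
        return False
    data = reply.get("data")
    if not isinstance(data, dict):
        return False
    return bool(reply.get("success") and data.get("server") and data.get("extension"))


# The healthy reply from the verification step above.
sample = '{"success": true, "data": {"server": true, "extension": true}}'
healthy = ping_ok(sample)
```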

Command Reference

Navigation

Command             | Description
bctl navigate <url> | Navigate to URL (aliases: nav, go; auto-prepends https://)
bctl back           | Go back in history
bctl forward        | Go forward (alias: fwd)
bctl reload         | Reload current page
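
The "auto-prepends https://" behavior can be pictured as a small normalization step. The function below is one plausible reading, not the project's implementation: `with_default_scheme` is a hypothetical name, and the real CLI may treat edge cases such as `about:blank` differently.

```python
def with_default_scheme(url: str, scheme: str = "https") -> str:
    """Prepend scheme:// when the input carries no scheme of its own."""
    if "://" in url:
        return url  # already explicit, e.g. http://localhost:3000
    return f"{scheme}://{url}"


normalized = with_default_scheme("github.com")
```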

Interaction

Command                            | Description
bctl click <sel> [-i N] [-t text]  | Click element; -t filters by visible text (substring)
bctl hover <sel> [-i N] [-t text]  | Hover over element; -t filters by visible text
bctl type <sel> <text>             | Type text into input/textarea (React-compatible)
bctl press <key>                   | Press key; Enter submits forms, Escape closes dialogs
bctl scroll <dir|sel> [px]         | Scroll: up / down / top / bottom or element into view
bctl select-option <sel> <val>     | Select dropdown option (alias: sopt) [--text]
bctl drag <src> [target]           | Drag to element or offset [--dx N --dy N]

DOM Query

Command                       | Description
bctl text [sel]               | Get text content (default: body)
bctl html [sel]               | Get innerHTML
bctl attr <sel> [name] [-i N] | Get attribute(s) of element
bctl select <sel> [-l N]      | List matching elements (alias: sel)
bctl count <sel>              | Count matching elements
bctl status                   | Current page URL and title

JavaScript

Command          | Description
bctl eval <code> | Execute JS in page context (auto-bypasses CSP)

Tabs

Command             | Description
bctl tabs           | List all tabs
bctl tab <id>       | Switch to tab by ID
bctl new-tab [url]  | Open new tab
bctl close-tab [id] | Close tab (default: active)

Screenshot & Files

Command                                 | Description
bctl screenshot [path]                  | Capture screenshot (alias: ss)
bctl download <target> [-o path] [-i N] | Download file/image (alias: dl; -o supports absolute paths)
bctl upload <sel> <files...>            | Upload file(s) to <input type="file">

Wait & Dialog

Command                                   | Description
bctl wait <sel|seconds> [timeout]         | Wait for element or sleep
bctl dialog [accept|dismiss] [--text val] | Handle next alert / confirm / prompt

Batch / Pipe

Command                          | Description
bctl pipe                        | Read commands from stdin, one per line (JSONL output). Consecutive DOM ops are auto-batched into a single browser call
bctl batch '<cmd1>' '<cmd2>' ... | Execute multiple commands in one call with smart batching
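
When driving pipe mode from a program, the two ends are plain text: one command per line going in, one JSON object per line coming out. The helpers below sketch both sides. `pipe_script` and `parse_jsonl` are hypothetical names, and the assumption that each command produces exactly one result line is not confirmed by this page.

```python
import json


def pipe_script(commands: list[str]) -> str:
    """Join commands into the one-per-line text that `bctl pipe` reads on stdin."""
    for command in commands:
        if "\n" in command:
            raise ValueError("each command must fit on one line")
    return "\n".join(commands) + "\n"


def parse_jsonl(stdout: str) -> list[dict]:
    """Decode JSONL output: one JSON object per non-empty line."""
    return [json.loads(line) for line in stdout.splitlines() if line.strip()]


script = pipe_script(['click "button" -t "Select tag"', "wait 1"])

# Sample output in the documented success/error shape, used here instead of a live run.
sample_output = (
    '{"success": true}\n'
    '{"success": false, "error": "Element not found: .missing"}\n'
)
replies = parse_jsonl(sample_output)
failed = [reply for reply in replies if not reply.get("success")]
```

The script string can be handed to `subprocess.run(["bctl", "pipe"], input=script, capture_output=True, text=True)`, after which `parse_jsonl` is applied to the captured stdout.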

Server

Command    | Description
bctl ping  | Check server & extension status
bctl serve | Start server in foreground
bctl stop  | Stop server
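
For a quick check that does not go through the CLI, a plain TCP probe shows whether anything is listening on the bridge port. This is a sketch under the assumption that the server uses the default port 19876 shown in this document; `bridge_listening` is a hypothetical helper, and an open port only proves that some process is listening, so `bctl ping` remains the authoritative check.

```python
import socket


def bridge_listening(host: str = "127.0.0.1", port: int = 19876,
                     timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```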

Examples

Search and extract

bctl go "https://news.ycombinator.com"
bctl select "a.titlelink" -l 5       # Top 5 links with text, href, etc.

Click by visible text (SPA-friendly)

bctl click "button" -t "Sign in"        # Click button containing "Sign in"
bctl click "a" -t "Settings"            # Click link containing "Settings"
bctl click "div[role=button]" -t "Save" # Works with any element + text filter

Fill a form

bctl type "input[name=email]" "user@example.com"
bctl type "input[name=password]" "hunter2"
bctl select-option "select#country" "US"
bctl upload "input[type=file]" ./resume.pdf
bctl click "button[type=submit]"

Scroll and screenshot

bctl go "https://en.wikipedia.org/wiki/Web_browser"
bctl scroll down 1000
bctl ss page.png

Handle dialogs

bctl dialog accept              # Set up handler BEFORE triggering
bctl click "#delete-button"     # This triggers a confirm() dialog

Drag and drop

bctl drag ".task-card" ".done-column"
bctl drag ".range-slider" --dx 50 --dy 0

Batch / Pipe (fast multi-step)

# Pipe mode: multiple commands in one call, auto-batched
bctl pipe <<'EOF'
click "button" -t "Select tag"
wait 1
type "input[placeholder='Search']" "v1.0.0"
wait 1
click "button" -t "Create new tag"
EOF

# Batch mode: same thing as arguments
bctl batch \
  'click "button" -t "Sign in"' \
  'wait 1' \
  'type "#email" "user@example.com"' \
  'type "#password" "secret"' \
  'click "button[type=submit]"'

Shell scripting

# Extract all image URLs from a page
bctl go "https://example.com"
bctl eval "JSON.stringify(Array.from(document.images).map(i=>i.src))"

# Wait for SPA content to load
bctl go "https://app.example.com/dashboard"
bctl wait ".dashboard-loaded" 15
bctl text ".metric-value"

Output Format

All commands return JSON to stdout:

// Success
{"success": true, "data": {"url": "https://example.com", "title": "Example"}}

// Error
{"success": false, "error": "Element not found: .missing"}

Non-zero exit code on errors — works naturally with set -e and && chains.
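
From Python, the same contract can be wrapped so that a failed command raises instead of being silently ignored. The sketch below assumes `bctl` is on `PATH` and that the error JSON is also written to stdout, as the examples above suggest; `BctlError`, `parse_result`, and `bctl` are hypothetical helper names, not part of the package's API.

```python
import json
import subprocess


class BctlError(RuntimeError):
    """Raised when a reply carries success: false (hypothetical helper)."""


def parse_result(stdout: str):
    """Decode one JSON reply; return its data or raise BctlError."""
    reply = json.loads(stdout)
    if not reply.get("success"):
        raise BctlError(reply.get("error", "unknown error"))
    return reply.get("data")


def bctl(*args: str):
    """Run one bctl command and return the decoded data field."""
    # Assumption: the JSON reply is on stdout even when the exit code is non-zero.
    proc = subprocess.run(["bctl", *args], capture_output=True, text=True)
    return parse_result(proc.stdout)


# Decoding the documented success reply without touching a browser.
page = parse_result(
    '{"success": true, "data": {"url": "https://example.com", "title": "Example"}}'
)
```

With the extension loaded, a call such as `bctl("status")` would return the page's URL and title as a dict, and a missing element would surface as a `BctlError` carrying the message from the `error` field.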


Architecture

┌─────────────────────────────────────────────────────┐
│  AI Agent / Terminal                                │
│  $ bctl click "button.submit"                       │
│       │                                             │
│       ▼  HTTP POST localhost:19876/command           │
│  ┌──────────────────────┐                           │
│  │   Bridge Server      │  (Python, aiohttp)        │
│  │   :19876             │                           │
│  └──────────┬───────────┘                           │
│             │  WebSocket                            │
│             ▼                                       │
│  ┌──────────────────────┐                           │
│  │  Chrome Extension    │  (Manifest V3)            │
│  │  Service Worker      │                           │
│  └──────────┬───────────┘                           │
│             │  chrome.scripting / chrome.debugger    │
│             ▼                                       │
│  ┌──────────────────────┐                           │
│  │  Your Real Browser   │  (sessions, cookies, etc) │
│  └──────────────────────┘                           │
└─────────────────────────────────────────────────────┘

Component     | Details
CLI           | Stdlib only, communicates via HTTP
Bridge Server | Async relay (aiohttp), auto-daemonizes
Extension     | MV3 service worker, auto-reconnects via chrome.alarms
Eval          | Dual strategy: MAIN-world injection (fast) + CDP fallback (CSP-safe)
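
The dual eval strategy follows a common try-then-fall-back pattern. The extension itself is JavaScript, so the Python below is only a model of the control flow. `CspBlocked`, `eval_with_fallback`, and the two stub strategies are hypothetical, and how the real extension detects a CSP refusal is not described on this page.

```python
from typing import Callable


class CspBlocked(Exception):
    """Hypothetical signal that page-level injection was refused by CSP."""


def eval_with_fallback(code: str,
                       fast: Callable[[str], object],
                       safe: Callable[[str], object]) -> tuple[str, object]:
    """Try the fast strategy first; fall back to the CSP-safe one if blocked."""
    try:
        return "main-world", fast(code)
    except CspBlocked:
        return "cdp", safe(code)


def blocked_injection(code: str) -> object:
    """Stub for MAIN-world injection on a page whose CSP refuses it."""
    raise CspBlocked("inline script refused")


def cdp_stub(code: str) -> object:
    """Stub for the debugger-based path; it echoes instead of evaluating."""
    return f"evaluated: {code}"


strategy, value = eval_with_fallback("1 + 1", blocked_injection, cdp_stub)
```

Only the specific "blocked" signal triggers the fallback, so genuine script errors from the fast path still propagate to the caller rather than being retried.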

Requirements

  • Python >= 3.11
  • Chrome / Chromium with the extension loaded
  • macOS, Linux, or Windows

Privacy

All communication is local (127.0.0.1). No analytics, no telemetry, no external servers. See PRIVACY.md.

License

MIT
