Control your browser from the command line via a Chrome extension + WebSocket bridge
browser-ctl
Browser automation built for AI agents.
Give your LLM a real Chrome browser — with your sessions, cookies, and extensions — through simple CLI commands.
pip install browser-ctl
bctl go https://github.com
bctl click "a.search-button"
bctl type "input[name=q]" "browser-ctl"
bctl press Enter
bctl screenshot results.png
The Problem with Existing Browser Automation
Tools like browser-use, Playwright MCP, and Puppeteer are powerful, but they share a set of pain points when used with AI agents:
| Pain point | Typical tools | browser-ctl |
|---|---|---|
| Heavy browser binaries — must download and manage a bundled Chromium (~400 MB) | Playwright, Puppeteer | Uses your existing Chrome — zero browser downloads |
| No access to real sessions — launches a fresh, empty browser with no cookies, logins, or extensions | browser-use, Playwright MCP | Controls your real Chrome — all sessions, cookies, and extensions intact |
| Anti-bot detection — headless browsers are flagged and blocked by many websites | Puppeteer, Playwright | Uses your real browser profile — indistinguishable from normal browsing |
| Complex SDK integration — requires importing libraries and writing async code | browser-use, Stagehand | Pure CLI with JSON output — any LLM can call bctl click "button" |
| Heavy dependencies — Playwright alone pulls ~50 MB of packages + browser binary | Playwright, Puppeteer | CLI is stdlib-only; server needs only aiohttp |
| Token-inefficient for LLMs — verbose API calls waste context window tokens | SDK-based tools | Concise commands: bctl text h1 vs pages of boilerplate |
Designed for LLM Agents
browser-ctl is purpose-built for AI agent workflows:
- Tool-calling ready: every command is a single shell call returning structured JSON, a good fit for function-calling / tool-use patterns
- Built-in AI skill: ships with `SKILL.md`, which teaches AI agents (Cursor, OpenCode, etc.) the full command set and best practices
- Real browser = real access: your LLM can operate on authenticated pages (Gmail, Jira, internal tools) without credential management
- Deterministic output: JSON responses with CSS-selector-based queries, no vision model needed for most tasks
- Minimal token cost: `bctl select "a.link" -l 5` returns structured data in one call, versus multi-step screenshot → vision → parse loops
# Install the AI skill for Cursor IDE in one command
bctl setup cursor
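As a rough illustration of the tool-calling pattern described above, an agent harness might translate a model's tool call into a `bctl` argument list. This is a minimal Python sketch: `ALLOWED_COMMANDS`, `tool_call_to_argv`, and the shape of the tool-call dict are hypothetical names chosen for illustration and are not part of browser-ctl. The allowlist contains only commands and aliases listed in this README.

```python
# Hypothetical helper (not part of browser-ctl): map an LLM tool call
# to an argv list that can be passed to subprocess.run without a shell.
ALLOWED_COMMANDS = {
    "go", "click", "type", "press", "text",
    "select", "snapshot", "screenshot", "status",
}


def tool_call_to_argv(call: dict) -> list[str]:
    """Validate the command name and build the bctl argument list."""
    command = call["command"]
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {command}")
    # Every argument is passed as a separate string, so selectors with
    # spaces or quotes need no extra shell escaping.
    return ["bctl", command, *[str(a) for a in call.get("args", [])]]


argv = tool_call_to_argv({"command": "select", "args": ["a.link", "-l", 5]})
print(argv)
```

Passing a list rather than a single command string keeps the model's output away from shell parsing, which is one reasonable way to limit what an agent can run.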
How It Works
AI Agent / Terminal ──HTTP──▶ Bridge Server ◀──WebSocket── Chrome Extension
    (bctl CLI)                  (:19876)                    (your browser)
- CLI (`bctl`) sends commands via HTTP to a local bridge server
- Bridge server relays them over WebSocket to the Chrome extension
- Extension executes commands using Chrome APIs & content scripts in your real browser
- Results flow back the same path as JSON
The bridge server auto-starts on first command — no manual setup needed.
Installation
Step 1 — Install the Python package:
pip install browser-ctl
Step 2 — Load the Chrome extension:
bctl setup
Then in Chrome: chrome://extensions → Enable Developer mode → Load unpacked → select ~/.browser-ctl/extension/
Step 3 — Verify:
bctl ping
# {"success": true, "data": {"server": true, "extension": true}}
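A script can run the same check before it starts issuing commands. The Python sketch below parses the `bctl ping` reply shown above and reports ready only when both flags are true. `bridge_ready` is a hypothetical helper name, and how `bctl ping` reports a missing extension is an assumption; the function simply treats anything other than the reply above as "not ready".

```python
import json


def bridge_ready(ping_stdout: str) -> bool:
    """Return True only if the reply says server and extension are both up."""
    try:
        reply = json.loads(ping_stdout)
    except json.JSONDecodeError:
        return False  # empty or non-JSON output counts as not ready
    if not isinstance(reply, dict):
        return False
    data = reply.get("data")
    if not isinstance(data, dict):
        return False
    return (
        reply.get("success") is True
        and data.get("server") is True
        and data.get("extension") is True
    )


sample = '{"success": true, "data": {"server": true, "extension": true}}'
print(bridge_ready(sample))  # True
```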
Command Reference
Navigation
| Command | Description |
|---|---|
| `bctl navigate <url>` | Navigate to URL (aliases: `nav`, `go`; auto-prepends `https://`) |
| `bctl back` | Go back in history |
| `bctl forward` | Go forward (alias: `fwd`) |
| `bctl reload` | Reload current page |
Interaction
All <sel> arguments accept CSS selectors or element refs from snapshot (e.g. e5).
| Command | Description |
|---|---|
| `bctl click <sel> [-i N] [-t text]` | Click element; `-t` filters by visible text (substring) |
| `bctl dblclick <sel> [-i N] [-t text]` | Double-click element |
| `bctl hover <sel> [-i N] [-t text]` | Hover over element; `-t` filters by visible text |
| `bctl focus <sel> [-i N] [-t text]` | Focus element |
| `bctl type <sel> <text>` | Type text into input/textarea (React-compatible, replaces value) |
| `bctl input-text <sel> <text>` | Char-by-char typing for rich text editors `[--clear] [--delay ms]` |
| `bctl press <key>` | Press key: Enter submits forms, Escape closes dialogs |
| `bctl check <sel> [-i N] [-t text]` | Check a checkbox or radio button |
| `bctl uncheck <sel> [-i N] [-t text]` | Uncheck a checkbox |
| `bctl scroll <dir\|sel> [px]` | Scroll: up / down / top / bottom or element into view |
| `bctl select-option <sel> <val>` | Select dropdown option (alias: `sopt`) `[--text]` |
| `bctl drag <src> [target]` | Drag to element or offset `[--dx N --dy N]` |
DOM Query
| Command | Description |
|---|---|
| `bctl snapshot [--all]` | List interactive elements with refs `e0`, `e1`, … (alias: `snap`) |
| `bctl text [sel]` | Get text content (default: `body`) |
| `bctl html [sel]` | Get innerHTML |
| `bctl attr <sel> [name] [-i N]` | Get attribute(s) of element |
| `bctl select <sel> [-l N]` | List matching elements (alias: `sel`) |
| `bctl count <sel>` | Count matching elements |
| `bctl status` | Current page URL and title |
| `bctl is-visible <sel> [-i N]` | Check if element is visible (returns bounding rect) |
| `bctl get-value <sel> [-i N]` | Get value of form element (input / select / textarea) |
JavaScript
| Command | Description |
|---|---|
| `bctl eval <code>` | Execute JS in page context (auto-bypasses CSP) |
Tabs
| Command | Description |
|---|---|
| `bctl tabs` | List all tabs |
| `bctl tab <id>` | Switch to tab by ID |
| `bctl new-tab [url]` | Open new tab |
| `bctl close-tab [id]` | Close tab (default: active) |
Screenshot & Files
| Command | Description |
|---|---|
| `bctl screenshot [path]` | Capture screenshot (alias: `ss`) |
| `bctl download <target> [-o path] [-i N]` | Download file/image (alias: `dl`; `-o` supports absolute paths) |
| `bctl upload <sel> <files...>` | Upload file(s) to `<input type="file">` |
Wait & Dialog
| Command | Description |
|---|---|
| `bctl wait <sel\|seconds> [timeout]` | Wait for element or sleep |
| `bctl dialog [accept\|dismiss] [--text val]` | Handle next alert / confirm / prompt |
Batch / Pipe
| Command | Description |
|---|---|
| `bctl pipe` | Read commands from stdin, one per line (JSONL output). Consecutive DOM ops are auto-batched into a single browser call |
| `bctl batch '<cmd1>' '<cmd2>' ...` | Execute multiple commands in one call with smart batching |
Server
| Command | Description |
|---|---|
| `bctl ping` | Check server & extension status |
| `bctl serve` | Start server in foreground |
| `bctl stop` | Stop server |
Examples
Snapshot workflow (recommended for AI agents)
bctl go "https://example.com"
bctl snapshot # List all interactive elements as e0, e1, …
bctl click e3 # Click by ref — no CSS selector needed
bctl type e5 "hello world" # Type into element by ref
bctl get-value e5 # Read form value
bctl is-visible e3 # Check visibility
Search and extract
bctl go "https://news.ycombinator.com"
bctl select ".titleline > a" -l 5 # Top 5 links with text, href, etc.
Click by visible text (SPA-friendly)
bctl click "button" -t "Sign in" # Click button containing "Sign in"
bctl click "a" -t "Settings" # Click link containing "Settings"
bctl click "div[role=button]" -t "Save" # Works with any element + text filter
Fill a form
bctl type "input[name=email]" "user@example.com"
bctl type "input[name=password]" "hunter2"
bctl select-option "select#country" "US"
bctl upload "input[type=file]" ./resume.pdf
bctl click "button[type=submit]"
Scroll and screenshot
bctl go "https://en.wikipedia.org/wiki/Web_browser"
bctl scroll down 1000
bctl ss page.png
Handle dialogs
bctl dialog accept # Set up handler BEFORE triggering
bctl click "#delete-button" # This triggers a confirm() dialog
Drag and drop
bctl drag ".task-card" ".done-column"
bctl drag ".range-slider" --dx 50 --dy 0
Batch / Pipe (fast multi-step)
# Pipe mode: multiple commands in one call, auto-batched
bctl pipe <<'EOF'
click "button" -t "Select tag"
wait 1
type "input[placeholder='Search']" "v1.0.0"
wait 1
click "button" -t "Create new tag"
EOF
# Batch mode: same thing as arguments
bctl batch \
'click "button" -t "Sign in"' \
'wait 1' \
'type "#email" "user@example.com"' \
'type "#password" "secret"' \
'click "button[type=submit]"'
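When pipe mode is driven from a program rather than a heredoc, the caller has to build the stdin script and read back the JSONL output. The Python sketch below shows one way to do that. `build_pipe_script`, `parse_jsonl`, and `first_failure` are hypothetical helper names, and it assumes one JSON reply per command, in order, which this README does not state explicitly. The sample replies follow the shape shown under Output Format and were not captured from a real run.

```python
import json


def build_pipe_script(commands: list[str]) -> str:
    """Join command lines into the text fed to `bctl pipe` on stdin."""
    return "\n".join(commands) + "\n"


def parse_jsonl(stdout: str) -> list[dict]:
    """Parse JSONL output: one JSON object per non-empty line."""
    return [json.loads(line) for line in stdout.splitlines() if line.strip()]


def first_failure(replies: list[dict]):
    """Return the index of the first failed reply, or None if all succeeded."""
    for index, reply in enumerate(replies):
        if not reply.get("success"):
            return index
    return None


script = build_pipe_script(['click "button" -t "Sign in"', "wait 1"])

# Illustrative replies only (assumed shape, not real bctl output).
sample_output = (
    '{"success": true, "data": {}}\n'
    '{"success": false, "error": "Element not found: .missing"}\n'
)
replies = parse_jsonl(sample_output)
print(first_failure(replies))  # 1
```

The script string could then be passed to `subprocess.run(["bctl", "pipe"], input=script, capture_output=True, text=True)` and the resulting stdout handed to `parse_jsonl`.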
Shell scripting
# Extract all image URLs from a page
bctl go "https://example.com"
bctl eval "JSON.stringify(Array.from(document.images).map(i=>i.src))"
# Wait for SPA content to load
bctl go "https://app.example.com/dashboard"
bctl wait ".dashboard-loaded" 15
bctl text ".metric-value"
Output Format
All commands return JSON to stdout:
// Success
{"success": true, "data": {"url": "https://example.com", "title": "Example"}}
// Error
{"success": false, "error": "Element not found: .missing"}
Commands exit with a non-zero code on error, so they work naturally with set -e and && chains.
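For callers that are not shell scripts, the same contract can be wrapped in a small function. The Python sketch below turns a reply into either its `data` payload or an exception. `BctlError`, `parse_reply`, and `run_bctl` are hypothetical names for illustration, not part of browser-ctl; only the `success`, `data`, and `error` keys come from the format shown above.

```python
import json
import subprocess


class BctlError(RuntimeError):
    """Raised when a reply has "success": false."""


def parse_reply(stdout: str):
    """Return the "data" payload, or raise BctlError with the "error" text."""
    reply = json.loads(stdout)  # raises JSONDecodeError on empty/invalid output
    if not reply.get("success"):
        raise BctlError(reply.get("error", "unknown error"))
    return reply.get("data")


def run_bctl(*args: str):
    """Run one bctl command (requires bctl on PATH) and parse its reply."""
    proc = subprocess.run(["bctl", *args], capture_output=True, text=True)
    return parse_reply(proc.stdout)


ok = '{"success": true, "data": {"url": "https://example.com", "title": "Example"}}'
print(parse_reply(ok)["title"])  # Example
```

With this in place, `run_bctl("status")` would return the page's URL and title as a dict, and a failed command surfaces as a Python exception instead of a return code the caller has to remember to check.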
Architecture
┌─────────────────────────────────────────────────────┐
│  AI Agent / Terminal                                │
│  $ bctl click "button.submit"                       │
│       │                                             │
│       ▼  HTTP POST localhost:19876/command          │
│  ┌──────────────────────┐                           │
│  │   Bridge Server      │  (Python, aiohttp)        │
│  │   :19876             │                           │
│  └──────────┬───────────┘                           │
│             │  WebSocket                            │
│             ▼                                       │
│  ┌──────────────────────┐                           │
│  │   Chrome Extension   │  (Manifest V3)            │
│  │   Service Worker     │                           │
│  └──────────┬───────────┘                           │
│             │  chrome.scripting / chrome.debugger   │
│             ▼                                       │
│  ┌──────────────────────┐                           │
│  │   Your Real Browser  │  (sessions, cookies, etc) │
│  └──────────────────────┘                           │
└─────────────────────────────────────────────────────┘
| Component | Details |
|---|---|
| CLI | Stdlib only, communicates via HTTP |
| Bridge Server | Async relay (aiohttp), auto-daemonizes |
| Extension | MV3 service worker, auto-reconnects via chrome.alarms |
| Eval | Dual strategy: MAIN-world injection (fast) + CDP fallback (CSP-safe) |
Requirements
- Python >= 3.11
- Chrome / Chromium with the extension loaded
- macOS, Linux, or Windows
Privacy
All communication is local (127.0.0.1). No analytics, no telemetry, no external servers. See PRIVACY.md.
License