
Browser automation daemon + CLI for coding agents. Persistent sessions, no MCP, no extensions.


Browser CLI

If you are an LLM, see AGENTS.md for quick setup and usage instructions.

A lightweight, self-hosted browser automation tool with a background daemon and CLI client. Enables authenticated web automation, screenshots, DOM snapshots, and page interactions via simple CLI commands. Share the SKILL.md file with your coding agent harness for seamless integration.

Why This Exists

Coding agents need to interact with authenticated web apps. Existing solutions all have tradeoffs:

  • Chrome DevTools MCP — requires Node.js, per-agent MCP server configuration, Google telemetry by default, and complex setup for each coding agent
  • BrowserMCP and similar tools — require installing Chrome extensions, tie into specific ecosystems, and use MCP which bloats the agent's context window with tool definitions and protocol overhead
  • Playwright/Puppeteer scripts — require writing code for every interaction, no persistent auth state
  • AI browser frameworks — heavy, opinionated, and framework-locked

Browser CLI solves this with a persistent daemon that any agent can call via subprocess. No extensions, no MCP config, no SDKs, no ecosystem lock-in. Sessions persist across agent calls so you only log in once.
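Because the CLI is invoked as a subprocess and returns JSON with a success field, an agent harness can wrap it in a few lines. The Python sketch below assumes the browser executable is on PATH and prints a single JSON object to stdout. The names run_browser and _run_cli, and the injectable runner parameter, are illustrative and not part of the package.

```python
import json
import subprocess
from typing import Callable, Sequence


def _run_cli(argv: list[str]) -> str:
    """Run the CLI and return whatever it wrote to stdout."""
    return subprocess.run(argv, capture_output=True, text=True).stdout


def run_browser(
    args: Sequence[str],
    runner: Callable[[list[str]], str] = _run_cli,
) -> dict:
    """Run `browser <args...>` and return its parsed JSON output.

    Assumption: the CLI prints exactly one JSON object to stdout.
    `runner` is injectable so the helper can be exercised without the CLI.
    """
    stdout = runner(["browser", *args])
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"non-JSON output from browser: {stdout!r}") from exc
    # Check the success field before trusting any other field.
    if not result.get("success"):
        raise RuntimeError(f"browser command failed: {result}")
    return result
```

A call such as run_browser(["<session_id>", "navigate", "https://github.com"]) would then return the parsed response or raise if the command failed.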

Install

pipx install browser-automation-cli
browser install

If commands are not found after install, add ~/.local/bin to your PATH:

export PATH="$HOME/.local/bin:$PATH"

Quick Start

1. Start the daemon

browser-daemon

A browser window will open. Keep this terminal running.

2. Create a session

browser create

This opens a fresh browser window. Manually log into any sites you need (GitHub, Jira, etc.).

3. Run browser actions

# Navigate to a site
browser <session_id> navigate https://github.com

# Get page elements and their CSS selectors
browser <session_id> snapshot

# Click an element using a CSS selector
browser <session_id> click "button.login-btn"

# Type text into an input
browser <session_id> type "input[name=search]" "query"

# Take a screenshot (JPEG, saved to /tmp)
browser <session_id> screenshot

4. Manage sessions

browser list          # List active sessions
browser delete <id>   # Delete a session

5. Stop the daemon

Press Ctrl+C in the terminal running browser-daemon.


Commands Reference

Standalone (No Daemon Required)

Quick screenshot capture using headless Playwright. Uses JPEG format for efficient file sizes.

browser capture <url> [options]

Options:

Flag                  Description
-f, --full-page       Capture full scrollable page (default: viewport only)
-o, --output <path>   Custom output path

Examples:

browser capture https://example.com
browser capture https://example.com -f
browser capture https://example.com -o ./screenshot.jpg
browser capture http://localhost:3000
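When capture is driven from a script, the argument list can be assembled programmatically. This is a minimal Python sketch using only the flags documented above; the function name capture_argv is hypothetical.

```python
from typing import Optional


def capture_argv(
    url: str,
    full_page: bool = False,
    output: Optional[str] = None,
) -> list[str]:
    """Build the argv for `browser capture` from the documented options."""
    argv = ["browser", "capture", url]
    if full_page:
        argv.append("--full-page")  # same as -f
    if output is not None:
        argv += ["--output", output]  # same as -o <path>
    return argv
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting issues with URLs and paths.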

Daemon Commands

Session commands (those that take an <id>) require browser-daemon to be running and an active session.

Command                                         Description
browser install                                 Install Chromium runtime
browser cleanup                                 Kill stale Chrome processes
browser create                                  Create new session (opens browser for login)
browser list                                    List active sessions
browser <id> navigate <url>                     Navigate to URL
browser <id> snapshot [selector]                Get page elements with CSS selectors
browser <id> click <selector>                   Click element
browser <id> type <selector> <text>             Type text into input
browser <id> hover <selector>                   Hover element
browser <id> select <selector> <value>          Select dropdown option
browser <id> press <key>                        Press keyboard key
browser <id> screenshot [selector] [-o <path>]  Take screenshot (full page or element)
browser <id> back                               Go back
browser <id> forward                            Go forward
browser <id> delete                             Delete session
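An agent that generates commands may benefit from validating them before spawning a process. The Python sketch below encodes the session commands from the table with their argument counts; the names SESSION_COMMANDS and session_argv are hypothetical, and the argument ranges are read off the table rather than taken from the CLI's source.

```python
# Allowed (min, max) argument counts per session command, per the table above.
SESSION_COMMANDS: dict[str, tuple[int, int]] = {
    "navigate": (1, 1),
    "snapshot": (0, 1),
    "click": (1, 1),
    "type": (2, 2),
    "hover": (1, 1),
    "select": (2, 2),
    "press": (1, 1),
    "screenshot": (0, 3),  # [selector] [-o <path>]
    "back": (0, 0),
    "forward": (0, 0),
    "delete": (0, 0),
}


def session_argv(session_id: str, command: str, *args: str) -> list[str]:
    """Build `browser <id> <command> [args...]`, rejecting unknown shapes."""
    if command not in SESSION_COMMANDS:
        raise ValueError(f"unknown session command: {command}")
    low, high = SESSION_COMMANDS[command]
    if not low <= len(args) <= high:
        raise ValueError(
            f"{command} takes {low} to {high} arguments, got {len(args)}"
        )
    return ["browser", session_id, command, *args]
```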

Architecture

  • Daemon (browser-daemon): Unix socket server managing persistent Playwright browser contexts. Each session is an isolated browser context with cookies/auth state.
  • CLI (browser): Sends commands to the daemon via Unix socket, or runs standalone capture directly via Playwright.
  • Session model: One session = one authenticated browser context. Sessions persist until deleted. Multiple sessions can run in parallel. Multiple agents can share the same session ID.
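To make the daemon/client split concrete, here is a toy Python sketch of one request and reply over a connected socket pair. The wire format is not documented in this README, so the newline-delimited JSON framing and the session and command request fields are assumptions for illustration only, not the project's actual protocol.

```python
import json
import socket
import threading


def send_request(sock: socket.socket, request: dict) -> dict:
    """Client side: send one JSON line, then read one JSON line back."""
    sock.sendall(json.dumps(request).encode() + b"\n")
    with sock.makefile("r") as reader:
        return json.loads(reader.readline())


def serve_once(sock: socket.socket, sessions: dict) -> None:
    """Daemon side: answer a single request against in-memory sessions."""
    with sock.makefile("r") as reader:
        request = json.loads(reader.readline())
    # A real daemon would dispatch to a browser context here.
    reply = {"success": request.get("session") in sessions}
    sock.sendall(json.dumps(reply).encode() + b"\n")


def demo(session_id: str) -> dict:
    """Round-trip one request; socketpair stands in for the Unix socket."""
    client, server = socket.socketpair()
    with client, server:
        worker = threading.Thread(target=serve_once, args=(server, {"s1": {}}))
        worker.start()
        reply = send_request(client, {"session": session_id, "command": "back"})
        worker.join()
    return reply
```

Because session state lives in the daemon process, any number of short-lived clients can refer to the same session ID, which is what lets several agents share one logged-in context.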

Anti-Detection

  • navigator.webdriver hidden via add_init_script
  • Explicit desktop Chrome user agent
  • 1920x1080 viewport to avoid mobile layouts
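The three measures above map onto standard Playwright context options. The Python sketch below shows one way they could be applied; the init script body and the user agent string are assumed values, since the README does not show what the daemon actually uses, and the function name open_context is hypothetical.

```python
# Assumed script: hides navigator.webdriver before any page script runs.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', { get: () => undefined });"
)

# Illustrative desktop Chrome user agent, not the daemon's documented value.
DESKTOP_CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

CONTEXT_OPTIONS = {
    "user_agent": DESKTOP_CHROME_UA,
    "viewport": {"width": 1920, "height": 1080},  # avoids mobile layouts
}


def open_context(playwright):
    """Launch Chromium and return (browser, context) with the settings above.

    `playwright` is the object yielded by `sync_playwright()`.
    """
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context(**CONTEXT_OPTIONS)
    context.add_init_script(HIDE_WEBDRIVER_JS)
    return browser, context
```

With Playwright installed, this would be called inside a `with sync_playwright() as p:` block (imported from playwright.sync_api), passing p as the argument.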

Output Format

All commands return JSON. Check the success field first.

Action response:

{
  "success": true,
  "url": "https://github.com",
  "title": "GitHub"
}

Snapshot response:

{
  "success": true,
  "url": "https://github.com",
  "title": "GitHub",
  "scrollY": 0,
  "viewportHeight": 1080,
  "documentHeight": 2400,
  "elements": [
    {
      "ref": "el_0",
      "tag": "a",
      "selector": "a.header-link",
      "text": "Pull requests",
      "interactive": true,
      "href": "https://github.com/pulls",
      "ariaLabel": null
    }
  ]
}
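A snapshot is typically consumed by picking a selector and passing it to click or type. The Python sketch below works on the response shape shown above; the helper names are hypothetical, and treating scrollY, viewportHeight, and documentHeight as pixel values measured from the top of the page is an assumption.

```python
from typing import Optional


def interactive_selectors(snapshot: dict) -> list[str]:
    """Return selectors of all elements flagged as interactive."""
    return [
        el["selector"]
        for el in snapshot.get("elements", [])
        if el.get("interactive")
    ]


def find_by_text(snapshot: dict, text: str) -> Optional[str]:
    """Return the selector of the first element whose text contains `text`."""
    needle = text.lower()
    for el in snapshot.get("elements", []):
        if needle in (el.get("text") or "").lower():
            return el["selector"]
    return None


def has_more_below(snapshot: dict) -> bool:
    """True if content extends below the current viewport (assumed pixels)."""
    bottom = snapshot["scrollY"] + snapshot["viewportHeight"]
    return bottom < snapshot["documentHeight"]
```

For the sample response above, find_by_text(snapshot, "pull requests") yields "a.header-link", which can then be passed to the click command.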

Screenshot response:

{
  "success": true,
  "path": "/tmp/browser_screenshot_1234567890.jpg",
  "format": "jpeg"
}

Using with Coding Agents

Share the SKILL.md file with your coding agent harness. It contains agent-specific instructions, workflow patterns, and decision guides for when to use standalone vs daemon commands.

See AGENTS.md for complete agent integration guide.

Troubleshooting

"Command not found: browser"

# Fix the current shell
export PATH="$HOME/.local/bin:$PATH"
# Persist the change for zsh (use ~/.bashrc instead for bash)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc

"Daemon not running"

browser-daemon

Browser doesn't open

browser install

Session not found

browser list

Stale Chrome processes

browser cleanup
