A robust browser automation tool for AI agents - control browsers via CLI or IPC

agent-browser

A robust browser automation tool designed for AI agents to control browsers via CLI commands.

License: GPL v3 | Python 3.9+

🎬 Feature Showcase

  • The Researcher (Claude): Autonomous research & data extraction.
  • The Architect (Gemini): Cross-page architectural audits.
  • The Data Op (Interpreter): Complex table scraping to structured data.

How to use this with Claude Code / Aider / ChatGPT

Copy-paste this prompt to let your AI pair-programmer drive agent-browser safely:

You can run shell commands on my machine. Use `agent-browser start <url> --session <name>` to launch a browser, then `agent-browser cmd <action> --session <name>` for steps like `screenshot`, `click`, `fill`, `assert_visible`, and `wait_for`. Keep sessions isolated by always passing `--session <name>` and stop them with `agent-browser stop --session <name>` when done. Screenshots land in ./screenshots. Avoid writing outside the project; use relative paths only. If you need to upload a file, ask me for the path first.

Why This Exists

AI agents (like Claude Code, Codex, GPT-based tools) need to interact with web applications for testing and automation. However, most browser automation tools require:

  • Programmatic API access within a running process
  • Complex async/await patterns
  • Persistent connections

agent-browser solves this by providing:

  • Simple CLI commands - Any process that can run shell commands can control a browser
  • File-based IPC - Stateless CLI commands control a stateful browser session
  • Multi-session support - Run multiple browser sessions concurrently
  • Built for AI - Screenshots auto-resize for vision models, assertions return clear PASS/FAIL

Installation

pip install ai-agent-browser
playwright install chromium

Quick Start

# Terminal 1: Start browser (blocks while running)
agent-browser start http://localhost:8080

# Terminal 2: Send commands
agent-browser cmd screenshot home
agent-browser cmd click "button[type='submit']"
agent-browser cmd fill "#email" "test@example.com"
agent-browser cmd assert_visible ".success-message"

# When done
agent-browser stop

Security Features

  • Path traversal protection on file paths (screenshots, uploads) to keep writes inside allowed directories.
  • Session isolation via explicit --session flags so concurrent agents stay sandboxed from each other.
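
The path traversal check described above can be illustrated with a small sketch. This is not agent-browser's actual implementation; the function name `resolve_inside` and the `screenshots` directory are assumptions chosen for the example. It only shows the general technique of resolving a user-supplied path and rejecting it when it escapes an allowed directory.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return user_path resolved under base_dir, or raise ValueError if it escapes.

    Hypothetical helper for illustration; not part of agent-browser.
    """
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    # relative_to raises ValueError when candidate is not located under base.
    candidate.relative_to(base)
    return candidate


print(resolve_inside("screenshots", "home.png").name)  # home.png

try:
    resolve_inside("screenshots", "../secrets.txt")
except ValueError:
    print("rejected")
```

Resolving both paths before comparing them is what catches `..` segments and absolute paths; comparing raw strings would miss them.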

Architecture

+-------------------+       +----------------------+       +------------------+
| AI Agent / LLM    | <-->  | CLI + IPC files      | <-->  | Browser (PW)     |
| (Claude, Codex)   |       | cmd.json / result    |       | Chromium/Playwr. |
+-------------------+       +----------------------+       +------------------+

The browser runs in one long-lived process and listens for commands via JSON files. Each CLI command writes to cmd.json; the browser processes it and writes the result to result.json. Because the two sides share only files, any process that can run a shell command can control the browser.
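
To make the file-based pattern concrete, here is a minimal, self-contained sketch of a command/result round trip over two JSON files. It is an illustration of the general idea only: the payload fields (`action`, `ok`, `echo`), the helper names, and the atomic-rename strategy are assumptions, and a thread stands in for the real browser process.

```python
import json
import os
import tempfile
import threading
import time
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file, then rename, so a reader never sees a partial file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    os.replace(tmp, path)


def wait_for_file(path: Path, timeout: float = 10.0, poll: float = 0.05) -> dict:
    """Poll until path exists, then consume it and return its JSON content."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            data = json.loads(path.read_text(encoding="utf-8"))
            path.unlink()
            return data
        time.sleep(poll)
    raise TimeoutError(f"no file at {path} after {timeout}s")


def fake_browser(cmd_path: Path, result_path: Path) -> None:
    """Stand-in for the long-running browser process: handle one command."""
    command = wait_for_file(cmd_path)
    write_json_atomic(result_path, {"ok": True, "echo": command["action"]})


def round_trip(action: str) -> dict:
    """Send one command through the file pair and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        cmd_path, result_path = Path(tmp) / "cmd.json", Path(tmp) / "result.json"
        worker = threading.Thread(target=fake_browser, args=(cmd_path, result_path))
        worker.start()
        write_json_atomic(cmd_path, {"action": action})  # the "CLI" side
        result = wait_for_file(result_path)
        worker.join()
        return result


print(round_trip("screenshot"))
```

The sender keeps no connection open: it writes one file, waits for another, and exits, which is why a stateless CLI can drive a stateful browser.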

Command Reference

Browser Control

| Command | Description | Example |
| --- | --- | --- |
| start <url> | Start browser session (blocks) | agent-browser start http://localhost:8080 |
| start <url> --visible | Start in headed mode | agent-browser start http://localhost:8080 --visible |
| stop | Close browser | agent-browser stop |
| status | Check if browser is running | agent-browser status |
| cmd reload | Reload current page | agent-browser cmd reload |
| cmd goto <url> | Navigate to URL | agent-browser cmd goto http://example.com |
| cmd back | Navigate back | agent-browser cmd back |
| cmd forward | Navigate forward | agent-browser cmd forward |
| cmd url | Print current URL | agent-browser cmd url |
| cmd viewport <w> <h> | Set viewport size | agent-browser cmd viewport 1920 1080 |

Screenshots

| Command | Description | Example |
| --- | --- | --- |
| cmd screenshot [name] | Full-page screenshot | agent-browser cmd screenshot checkout_page |
| cmd screenshot viewport [name] | Viewport only (faster) | agent-browser cmd screenshot viewport header |
| cmd ss [name] | Alias for screenshot | agent-browser cmd ss step1 |

Screenshots are automatically resized to fit within 1500x1500 pixels for AI vision model compatibility.
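
As a rough illustration of what a 1500x1500 cap means for image dimensions, the sketch below computes the scaled size for a given screenshot. It assumes the aspect ratio is preserved and that smaller images are left alone. Neither is confirmed by this README, and `fit_within` is a hypothetical helper, not part of agent-browser.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1500) -> Tuple[int, int]:
    """Scale (width, height) down so neither side exceeds max_side.

    Assumption: aspect ratio is preserved and small images are not enlarged.
    """
    if width <= max_side and height <= max_side:
        return width, height
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3000, 1500))  # (1500, 750)
print(fit_within(1280, 6000))  # (320, 1500)
print(fit_within(1280, 720))   # (1280, 720)
```

Under these assumptions, a long full-page capture shrinks a lot in width, so `screenshot viewport` may keep text more legible for a vision model.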

Interactions

| Command | Description | Example |
| --- | --- | --- |
| cmd click <selector> | Click element | agent-browser cmd click "#submit-btn" |
| cmd click_nth <selector> <n> | Click nth element (0-indexed) | agent-browser cmd click_nth ".item" 2 |
| cmd fill <selector> <text> | Fill input field | agent-browser cmd fill "#email" "test@example.com" |
| cmd type <selector> <text> | Type with key events | agent-browser cmd type "#search" "query" |
| cmd select <selector> <value> | Select dropdown option | agent-browser cmd select "#country" "US" |
| cmd press <key> | Press keyboard key | agent-browser cmd press Enter |
| cmd scroll <direction> | Scroll page | agent-browser cmd scroll down |
| cmd hover <selector> | Hover over element | agent-browser cmd hover ".tooltip-trigger" |
| cmd focus <selector> | Focus element | agent-browser cmd focus "#input" |
| cmd upload <selector> <path> | Upload file | agent-browser cmd upload "#file" ./doc.pdf |
| cmd dialog <action> [text] | Handle dialog | agent-browser cmd dialog accept |
| cmd clear | Clear localStorage/sessionStorage | agent-browser cmd clear |

Scroll directions: up, down, top, bottom, left, right

Dialog actions: accept, dismiss, accept <prompt_text>

Assertions

Every assertion prints a [PASS] or [FAIL] prefix for easy parsing.

| Command | Description | Example |
| --- | --- | --- |
| cmd assert_visible <selector> | Element is visible | agent-browser cmd assert_visible ".modal" |
| cmd assert_hidden <selector> | Element is hidden | agent-browser cmd assert_hidden ".loading" |
| cmd assert_text <selector> <text> | Element contains text | agent-browser cmd assert_text ".msg" "Success" |
| cmd assert_text_exact <sel> <text> | Text matches exactly | agent-browser cmd assert_text_exact ".count" "42" |
| cmd assert_value <selector> <value> | Input has value | agent-browser cmd assert_value "#email" "test@example.com" |
| cmd assert_checked <selector> | Checkbox is checked | agent-browser cmd assert_checked "#agree" |
| cmd assert_url <pattern> | URL contains pattern | agent-browser cmd assert_url "/dashboard" |
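
Because assertion output starts with a fixed prefix, a calling script can turn it into a boolean with a few lines of code. The sketch below is one possible way to do that. The helper name `assertion_passed` and the sample messages after the prefix are made up for illustration; only the [PASS]/[FAIL] prefixes come from this README.

```python
from typing import Optional


def assertion_passed(output: str) -> Optional[bool]:
    """Return True for [PASS], False for [FAIL], None if neither prefix appears."""
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("[PASS]"):
            return True
        if stripped.startswith("[FAIL]"):
            return False
    return None


# The text after each prefix is hypothetical sample output.
print(assertion_passed("[PASS] element is visible"))  # True
print(assertion_passed("[FAIL] element not found"))   # False
print(assertion_passed("Timeout waiting for result"))  # None
```

Returning `None` for unrecognized output lets the caller distinguish a failed assertion from a command that never produced a verdict, such as a timeout.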

Data Extraction

| Command | Description | Example |
| --- | --- | --- |
| cmd text <selector> | Get text content | agent-browser cmd text ".title" |
| cmd value <selector> | Get input value | agent-browser cmd value "#email" |
| cmd attr <selector> <attr> | Get attribute | agent-browser cmd attr "a" "href" |
| cmd count <selector> | Count matching elements | agent-browser cmd count ".item" |
| cmd eval <javascript> | Execute JavaScript | agent-browser cmd eval "document.title" |
| cmd cookies | Get all cookies (JSON) | agent-browser cmd cookies |
| cmd storage | Get localStorage (JSON) | agent-browser cmd storage |

Debugging

| Command | Description | Example |
| --- | --- | --- |
| cmd console | View JS console logs | agent-browser cmd console |
| cmd network | View network requests | agent-browser cmd network |
| cmd network_failed | View failed requests | agent-browser cmd network_failed |
| cmd clear_logs | Clear console/network logs | agent-browser cmd clear_logs |
| cmd wait <ms> | Wait milliseconds | agent-browser cmd wait 2000 |
| cmd wait_for <selector> [ms] | Wait for element | agent-browser cmd wait_for ".loaded" 15000 |
| cmd wait_for_text <text> | Wait for text | agent-browser cmd wait_for_text "Complete" |
| cmd help | Show help | agent-browser cmd help |

Flag Tips

  • cmd --timeout <seconds> overrides the IPC wait when sending commands (e.g., agent-browser cmd --timeout 30 wait_for ".loaded" 20000).
  • interact --headless runs the interactive REPL without opening a visible browser window (e.g., agent-browser interact http://localhost:8080 --headless).

Session Management

Run multiple browser sessions concurrently using session IDs:

# Start two sessions
agent-browser start http://localhost:8080 --session app1
agent-browser start http://localhost:9090 --session app2

# Send commands to specific sessions
agent-browser cmd screenshot home --session app1
agent-browser cmd click "#login" --session app2

# Check status
agent-browser status --session app1

# Stop specific session
agent-browser stop --session app1
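
When sessions are driven from a script, it helps to guarantee that each one is stopped even if a step fails. The sketch below wraps the start/cmd/stop commands shown above in a context manager. The name `browser_session` and the injectable `spawn`/`run` parameters are assumptions made for this example so it can be exercised without a browser; by default they map to `subprocess.Popen` and `subprocess.run`.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], object]


@contextmanager
def browser_session(url: str, name: str,
                    spawn: Runner = subprocess.Popen,
                    run: Runner = subprocess.run) -> Iterator[Callable[..., object]]:
    """Start a named session, yield a command sender, and always stop the session."""
    # `start` blocks while the browser runs, so it is spawned rather than awaited.
    spawn(["agent-browser", "start", url, "--session", name])
    try:
        yield lambda *args: run(["agent-browser", "cmd", *args, "--session", name])
    finally:
        run(["agent-browser", "stop", "--session", name])


# Dry run: record the command lines instead of executing them.
calls: List[List[str]] = []
with browser_session("http://localhost:8080", "app1",
                     spawn=calls.append, run=calls.append) as send:
    send("click", "#login")
for argv in calls:
    print(" ".join(argv))
```

With the real runners, give the browser a moment to come up (or poll `agent-browser status --session <name>`) before sending the first command.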

Configuration

Screenshot Output Directory

agent-browser start http://localhost:8080 --output-dir ./my-screenshots

Timeouts

Default timeouts:

  • Command timeout: 5 seconds (click, fill, etc.)
  • wait_for timeout: 10 seconds (can override: wait_for .element 15000)
  • IPC timeout: 10 seconds (waiting for browser response). Increase it with cmd --timeout <seconds> if your action needs more time.
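
The wait_for timeout is given in milliseconds while the IPC timeout is given in seconds, so a long wait needs a matching `--timeout` or the CLI gives up first. The sketch below derives one from the other. The helper name `wait_for_argv` and the 10-second safety margin are assumptions for illustration; the flag placement follows the example in Flag Tips.

```python
import math
from typing import List


def wait_for_argv(selector: str, wait_ms: int, margin_s: int = 10) -> List[str]:
    """Build a wait_for command whose IPC timeout outlasts the element wait.

    margin_s is an arbitrary safety margin chosen for this example.
    """
    ipc_timeout_s = math.ceil(wait_ms / 1000) + margin_s
    return ["agent-browser", "cmd", "--timeout", str(ipc_timeout_s),
            "wait_for", selector, str(wait_ms)]


print(wait_for_argv(".loaded", 20000))
# ['agent-browser', 'cmd', '--timeout', '30', 'wait_for', '.loaded', '20000']
```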

Selectors

Use standard Playwright/CSS selectors:

# CSS selectors
agent-browser cmd click ".btn-primary"
agent-browser cmd click "#submit"
agent-browser cmd click "button[type='submit']"
agent-browser cmd click "[data-testid='login-btn']"

# Text selectors
agent-browser cmd click "text='Sign In'"
agent-browser cmd click "text=Submit"

# Chained selectors
agent-browser cmd click ".card >> text='Edit'"
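
Selectors often contain quotes, spaces, and characters such as `>` that a shell would otherwise interpret. When an agent generates command lines as text, quoting them mechanically avoids such surprises. The sketch below uses Python's standard `shlex` module for POSIX shells; `click_argv` and `as_shell_line` are hypothetical helper names.

```python
import shlex
from typing import List


def click_argv(selector: str) -> List[str]:
    """Argument list for a click command; safe to pass to subprocess.run as-is."""
    return ["agent-browser", "cmd", "click", selector]


def as_shell_line(argv: List[str]) -> str:
    """Render an argument list as a single POSIX-shell-safe command line."""
    return shlex.join(argv)


selector = ".card >> text='Edit'"
line = as_shell_line(click_argv(selector))
print(line)

# Splitting the rendered line yields the original arguments again.
assert shlex.split(line) == click_argv(selector)
```

Passing the argument list directly to `subprocess.run` sidesteps shell quoting entirely; rendering a string is only needed when the command has to go through a shell.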

Interactive Mode

For manual testing with AI assistance:

agent-browser interact http://localhost:8080

Headless REPL run:

agent-browser interact http://localhost:8080 --headless

This starts a REPL where you can type commands directly:

> ss initial
Screenshot saved: ./screenshots/interactive/step_01_initial.png
> click #login
Clicked: #login
> ss after_login
Screenshot saved: ./screenshots/interactive/step_02_after_login.png
> quit

Integration with AI Agents

Claude Code Example

# In Claude Code conversation:
# "Test the login flow on localhost:8080"

# Claude runs:
agent-browser start http://localhost:8080 --session test1 &
sleep 2
agent-browser cmd screenshot login_page --session test1
# Claude analyzes screenshot...
agent-browser cmd fill "#username" "testuser" --session test1
agent-browser cmd fill "#password" "testpass" --session test1
agent-browser cmd click "button[type='submit']" --session test1
agent-browser cmd wait_for ".dashboard" --session test1
agent-browser cmd assert_url "/dashboard" --session test1
agent-browser cmd screenshot success --session test1
agent-browser stop --session test1

Generic LLM Integration

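import shlex
import subprocess
import time

def browser_cmd(cmd: str, session: str = "default") -> str:
    """Send one command to a running session and return its stdout."""
    result = subprocess.run(
        # shlex.split honors shell-style quoting, so a quoted selector
        # containing spaces stays a single argument
        ["agent-browser", "cmd", *shlex.split(cmd), "--session", session],
        capture_output=True, text=True
    )
    return result.stdout.strip()

# Start browser (in separate process, because start blocks while running)
subprocess.Popen(["agent-browser", "start", "http://localhost:8080", "--session", "test"])
time.sleep(2)  # give the browser a moment to come up, as in the Claude Code example

# Send commands
browser_cmd("screenshot initial", "test")
browser_cmd("click #login", "test")
browser_cmd("assert_visible .dashboard", "test")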

File Locations

| File | Location | Purpose |
| --- | --- | --- |
| State | %TEMP%/agent_browser_{session}_state.json | Browser running state |
| Commands | %TEMP%/agent_browser_{session}_cmd.json | Pending command |
| Results | %TEMP%/agent_browser_{session}_result.json | Command result |
| Console logs | %TEMP%/agent_browser_{session}_console.json | JS console output |
| Network logs | %TEMP%/agent_browser_{session}_network.json | Network requests |
| Screenshots | ./screenshots/ (configurable) | Captured screenshots |
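
For debugging, it can be handy to compute where a session's files should be. The sketch below follows the naming pattern in the table and assumes that %TEMP% corresponds to whatever `tempfile.gettempdir()` returns on the current platform. That mapping and the `default` session name are assumptions, and `session_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Dict


def session_files(session: str = "default") -> Dict[str, Path]:
    """Map each IPC file kind to its expected path for the given session.

    Assumption: the tool's temp directory matches tempfile.gettempdir().
    """
    temp = Path(tempfile.gettempdir())
    kinds = ("state", "cmd", "result", "console", "network")
    return {kind: temp / f"agent_browser_{session}_{kind}.json" for kind in kinds}


for kind, path in session_files("app1").items():
    print(f"{kind:8} {path} exists={path.exists()}")
```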

Troubleshooting

| Problem | Solution |
| --- | --- |
| Timeout waiting for result | Browser may have crashed - run status to check |
| Element not found | Use count to verify selector matches elements |
| Browser not responding | Run status to ping the browser |
| Browser process has died | State was stale - run start <url> to restart |
| Complex selector failing | Use eval with JavaScript as fallback |

Debug Workflow

# 1. Check browser status
agent-browser status

# 2. Check for JS errors
agent-browser cmd console

# 3. Check for failed requests
agent-browser cmd network_failed

# 4. Take screenshot to see current state
agent-browser cmd screenshot debug

# 5. Count elements to verify selector
agent-browser cmd count ".my-selector"

Python API

You can also use agent-browser as a Python library:

from agent_browser import BrowserDriver

driver = BrowserDriver(session_id="test", output_dir="./screenshots")

# Start browser (blocking - run in thread/process)
# driver.start("http://localhost:8080")

# Or send commands to running browser
result = driver.send_command("screenshot home")
print(result)

status = driver.status()  # Returns True if running

Contributing

See CONTRIBUTING.md for guidelines.

License

GNU General Public License v3.0 - see LICENSE for details.
