A robust browser automation tool for AI agents - control browsers via CLI or IPC
agent-browser
A robust browser automation tool designed for AI agents to control browsers via CLI commands.
Why This Exists
AI agents (like Claude Code, Codex, GPT-based tools) need to interact with web applications for testing and automation. However, most browser automation tools require:
- Programmatic API access within a running process
- Complex async/await patterns
- Persistent connections
agent-browser solves this by providing:
- Simple CLI commands - Any process that can run shell commands can control a browser
- File-based IPC - Stateless CLI commands control a stateful browser session
- Multi-session support - Run multiple browser sessions concurrently
- Built for AI - Screenshots auto-resize for vision models, assertions return clear PASS/FAIL
Installation
pip install ai-agent-browser
playwright install chromium
Quick Start
# Terminal 1: Start browser (blocks while running)
agent-browser start http://localhost:8080
# Terminal 2: Send commands
agent-browser cmd screenshot home
agent-browser cmd click "button[type='submit']"
agent-browser cmd fill "#email" "test@example.com"
agent-browser cmd assert_visible ".success-message"
# When done
agent-browser stop
Architecture
┌─────────────────┐ ┌──────────────┐ ┌─────────────────┐
│ AI Agent │ │ IPC Files │ │ Browser │
│ (Claude Code, │────▶│ (temp dir) │────▶│ (Playwright) │
│ Codex, etc) │ │ │ │ │
│ │◀────│ cmd.json │◀────│ Chromium │
│ CLI commands │ │ result.json │ │ │
└─────────────────┘ └──────────────┘ └─────────────────┘
The browser runs in one process, listening for commands via JSON files. CLI commands write to cmd.json, the browser processes them and writes results to result.json. This decoupled architecture allows any process to control the browser.
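The request/response cycle described above can be sketched in a few lines of Python. This is only an illustration of the file-based pattern, not agent-browser's actual code: the JSON fields (`command`, `ok`, `echo`), the helper names, and the atomic-rename step are all assumptions made for the example, and the "browser" side is a stand-in that simply echoes the command back.

```python
import json
import os
import tempfile
import threading
import time
from pathlib import Path


def write_atomic(path: Path, payload: dict) -> None:
    """Write JSON to a temp file, then rename, so readers never see a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)


def send_command(ipc_dir: Path, command: str, timeout: float = 10.0) -> dict:
    """CLI side: write cmd.json, then poll until result.json appears."""
    result_file = ipc_dir / "result.json"
    result_file.unlink(missing_ok=True)  # drop any stale result first
    write_atomic(ipc_dir / "cmd.json", {"command": command})  # hypothetical schema
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if result_file.exists():
            return json.loads(result_file.read_text())
        time.sleep(0.05)
    raise TimeoutError(f"no result for {command!r} within {timeout}s")


def serve_one(ipc_dir: Path, timeout: float = 10.0) -> None:
    """Browser side (stand-in): wait for cmd.json, consume it, write result.json."""
    cmd_file = ipc_dir / "cmd.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cmd_file.exists():
            request = json.loads(cmd_file.read_text())
            cmd_file.unlink()
            # A real handler would drive the browser here; this one just echoes.
            write_atomic(ipc_dir / "result.json", {"ok": True, "echo": request["command"]})
            return
        time.sleep(0.05)


def roundtrip_demo() -> dict:
    """Run both sides in one process to show the request/response cycle."""
    with tempfile.TemporaryDirectory() as tmp:
        ipc_dir = Path(tmp)
        server = threading.Thread(target=serve_one, args=(ipc_dir,))
        server.start()
        result = send_command(ipc_dir, "screenshot home", timeout=5.0)
        server.join()
        return result
```

Because the only shared state is a pair of files, the sender needs no connection or handle to the browser process, which is what lets a stateless CLI call talk to a long-running session.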
Command Reference
Browser Control
| Command | Description | Example |
|---|---|---|
| `start <url>` | Start browser session (blocks) | `agent-browser start http://localhost:8080` |
| `start <url> --visible` | Start in headed mode | `agent-browser start http://localhost:8080 --visible` |
| `stop` | Close browser | `agent-browser stop` |
| `status` | Check if browser is running | `agent-browser status` |
| `cmd reload` | Reload current page | `agent-browser cmd reload` |
| `cmd goto <url>` | Navigate to URL | `agent-browser cmd goto http://example.com` |
| `cmd back` | Navigate back | `agent-browser cmd back` |
| `cmd forward` | Navigate forward | `agent-browser cmd forward` |
| `cmd url` | Print current URL | `agent-browser cmd url` |
| `cmd viewport <w> <h>` | Set viewport size | `agent-browser cmd viewport 1920 1080` |
Screenshots
| Command | Description | Example |
|---|---|---|
| `cmd screenshot [name]` | Full-page screenshot | `agent-browser cmd screenshot checkout_page` |
| `cmd screenshot viewport [name]` | Viewport only (faster) | `agent-browser cmd screenshot viewport header` |
| `cmd ss [name]` | Alias for screenshot | `agent-browser cmd ss step1` |
Screenshots are automatically resized to max 1500x1500 for AI vision model compatibility.
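As a rough illustration of what a 1500x1500 cap means for output dimensions, the sketch below computes a scaled size. It assumes the aspect ratio is preserved and that images already within the limit are left alone; the source does not state either, and `fit_within` is a hypothetical helper, not part of agent-browser.

```python
from __future__ import annotations


def fit_within(width: int, height: int, max_side: int = 1500) -> tuple[int, int]:
    """Scale (width, height) down to fit in a max_side x max_side box.

    Assumes aspect ratio is kept and that small images are never enlarged.
    """
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1920, 1080))  # (1500, 844)
print(fit_within(1280, 6000))  # (320, 1500), e.g. a tall full-page capture
print(fit_within(800, 600))    # (800, 600), already within the limit
```

Under these assumptions, tall full-page captures lose the most horizontal detail, which is one reason to prefer the viewport variant when small text matters.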
Interactions
| Command | Description | Example |
|---|---|---|
| `cmd click <selector>` | Click element | `agent-browser cmd click "#submit-btn"` |
| `cmd click_nth <selector> <n>` | Click nth element (0-indexed) | `agent-browser cmd click_nth ".item" 2` |
| `cmd fill <selector> <text>` | Fill input field | `agent-browser cmd fill "#email" "test@example.com"` |
| `cmd type <selector> <text>` | Type with key events | `agent-browser cmd type "#search" "query"` |
| `cmd select <selector> <value>` | Select dropdown option | `agent-browser cmd select "#country" "US"` |
| `cmd press <key>` | Press keyboard key | `agent-browser cmd press Enter` |
| `cmd scroll <direction>` | Scroll page | `agent-browser cmd scroll down` |
| `cmd hover <selector>` | Hover over element | `agent-browser cmd hover ".tooltip-trigger"` |
| `cmd focus <selector>` | Focus element | `agent-browser cmd focus "#input"` |
| `cmd upload <selector> <path>` | Upload file | `agent-browser cmd upload "#file" ./doc.pdf` |
| `cmd dialog <action> [text]` | Handle dialog | `agent-browser cmd dialog accept` |
| `cmd clear` | Clear localStorage/sessionStorage | `agent-browser cmd clear` |
Scroll directions: up, down, top, bottom, left, right
Dialog actions: accept, dismiss, accept <prompt_text>
Assertions
All assertions return [PASS] or [FAIL] prefix for easy parsing.
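A caller can branch on that prefix without parsing the rest of the line. The helper below is a minimal sketch: `parse_assertion` is a hypothetical name, and the message text in the examples is invented for illustration, since the exact wording after the prefix is not documented here.

```python
from __future__ import annotations


def parse_assertion(output: str) -> tuple[bool, str]:
    """Split an assertion line into (passed, message) based on its prefix."""
    text = output.strip()
    for prefix, passed in (("[PASS]", True), ("[FAIL]", False)):
        if text.startswith(prefix):
            return passed, text[len(prefix):].strip()
    # Anything else (e.g. a timeout or usage error) is not an assertion result.
    raise ValueError(f"not an assertion result: {output!r}")


# Example lines; the wording after the prefix is made up for this sketch.
print(parse_assertion("[PASS] .modal is visible"))   # (True, '.modal is visible')
print(parse_assertion("[FAIL] .modal is not visible"))  # (False, '.modal is not visible')
```

Raising on an unrecognized prefix keeps infrastructure failures from being silently counted as failed assertions.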
| Command | Description | Example |
|---|---|---|
| `cmd assert_visible <selector>` | Element is visible | `agent-browser cmd assert_visible ".modal"` |
| `cmd assert_hidden <selector>` | Element is hidden | `agent-browser cmd assert_hidden ".loading"` |
| `cmd assert_text <selector> <text>` | Element contains text | `agent-browser cmd assert_text ".msg" "Success"` |
| `cmd assert_text_exact <sel> <text>` | Text matches exactly | `agent-browser cmd assert_text_exact ".count" "42"` |
| `cmd assert_value <selector> <value>` | Input has value | `agent-browser cmd assert_value "#email" "test@example.com"` |
| `cmd assert_checked <selector>` | Checkbox is checked | `agent-browser cmd assert_checked "#agree"` |
| `cmd assert_url <pattern>` | URL contains pattern | `agent-browser cmd assert_url "/dashboard"` |
Data Extraction
| Command | Description | Example |
|---|---|---|
| `cmd text <selector>` | Get text content | `agent-browser cmd text ".title"` |
| `cmd value <selector>` | Get input value | `agent-browser cmd value "#email"` |
| `cmd attr <selector> <attr>` | Get attribute | `agent-browser cmd attr "a" "href"` |
| `cmd count <selector>` | Count matching elements | `agent-browser cmd count ".item"` |
| `cmd eval <javascript>` | Execute JavaScript | `agent-browser cmd eval "document.title"` |
| `cmd cookies` | Get all cookies (JSON) | `agent-browser cmd cookies` |
| `cmd storage` | Get localStorage (JSON) | `agent-browser cmd storage` |
Debugging
| Command | Description | Example |
|---|---|---|
| `cmd console` | View JS console logs | `agent-browser cmd console` |
| `cmd network` | View network requests | `agent-browser cmd network` |
| `cmd network_failed` | View failed requests | `agent-browser cmd network_failed` |
| `cmd clear_logs` | Clear console/network logs | `agent-browser cmd clear_logs` |
| `cmd wait <ms>` | Wait milliseconds | `agent-browser cmd wait 2000` |
| `cmd wait_for <selector> [ms]` | Wait for element | `agent-browser cmd wait_for ".loaded" 15000` |
| `cmd wait_for_text <text>` | Wait for text | `agent-browser cmd wait_for_text "Complete"` |
| `cmd help` | Show help | `agent-browser cmd help` |
Session Management
Run multiple browser sessions concurrently using session IDs:
# Start two sessions
agent-browser start http://localhost:8080 --session app1
agent-browser start http://localhost:9090 --session app2
# Send commands to specific sessions
agent-browser cmd screenshot home --session app1
agent-browser cmd click "#login" --session app2
# Check status
agent-browser status --session app1
# Stop specific session
agent-browser stop --session app1
Configuration
Screenshot Output Directory
agent-browser start http://localhost:8080 --output-dir ./my-screenshots
Timeouts
Default timeouts:
- Command timeout: 5 seconds (click, fill, etc.)
- wait_for timeout: 10 seconds (can override: wait_for .element 15000)
- IPC timeout: 10 seconds (waiting for browser response)
Selectors
Use standard Playwright/CSS selectors:
# CSS selectors
agent-browser cmd click ".btn-primary"
agent-browser cmd click "#submit"
agent-browser cmd click "button[type='submit']"
agent-browser cmd click "[data-testid='login-btn']"
# Text selectors
agent-browser cmd click "text='Sign In'"
agent-browser cmd click "text=Submit"
# Chained selectors
agent-browser cmd click ".card >> text='Edit'"
Interactive Mode
For manual testing with AI assistance:
agent-browser interact http://localhost:8080
This starts a REPL where you can type commands directly:
> ss initial
Screenshot saved: ./screenshots/interactive/step_01_initial.png
> click #login
Clicked: #login
> ss after_login
Screenshot saved: ./screenshots/interactive/step_02_after_login.png
> quit
Integration with AI Agents
Claude Code Example
# In Claude Code conversation:
# "Test the login flow on localhost:8080"
# Claude runs:
agent-browser start http://localhost:8080 --session test1 &
sleep 2
agent-browser cmd screenshot login_page --session test1
# Claude analyzes screenshot...
agent-browser cmd fill "#username" "testuser" --session test1
agent-browser cmd fill "#password" "testpass" --session test1
agent-browser cmd click "button[type='submit']" --session test1
agent-browser cmd wait_for ".dashboard" --session test1
agent-browser cmd assert_url "/dashboard" --session test1
agent-browser cmd screenshot success --session test1
agent-browser stop --session test1
Generic LLM Integration
import shlex
import subprocess
import time

def browser_cmd(cmd: str, session: str = "default") -> str:
    """Run one agent-browser command and return its stdout."""
    result = subprocess.run(
        # shlex.split keeps quoted arguments (e.g. "text='Sign In'") as one token
        ["agent-browser", "cmd", *shlex.split(cmd), "--session", session],
        capture_output=True, text=True
    )
    return result.stdout.strip()

# Start browser (in separate process)
subprocess.Popen(["agent-browser", "start", "http://localhost:8080", "--session", "test"])
time.sleep(2)  # give the browser time to start, as in the shell example above

# Send commands
browser_cmd("screenshot initial", "test")
browser_cmd("click #login", "test")
browser_cmd("assert_visible .dashboard", "test")
File Locations
| File | Location | Purpose |
|---|---|---|
| State | `%TEMP%/agent_browser_{session}_state.json` | Browser running state |
| Commands | `%TEMP%/agent_browser_{session}_cmd.json` | Pending command |
| Results | `%TEMP%/agent_browser_{session}_result.json` | Command result |
| Console logs | `%TEMP%/agent_browser_{session}_console.json` | JS console output |
| Network logs | `%TEMP%/agent_browser_{session}_network.json` | Network requests |
| Screenshots | `./screenshots/` (configurable) | Captured screenshots |
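For scripts that want to inspect these files directly, the naming pattern in the table can be expanded as follows. This is a sketch under assumptions: it treats `%TEMP%` as whatever `tempfile.gettempdir()` returns on the current platform, which may not match how agent-browser resolves the directory, and `ipc_paths` is a hypothetical helper.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# File kinds taken from the table above.
KINDS = ("state", "cmd", "result", "console", "network")


def ipc_paths(session: str) -> dict[str, Path]:
    """Map each file kind to its expected path for the given session ID."""
    base = Path(tempfile.gettempdir())  # assumption: stands in for %TEMP%
    return {kind: base / f"agent_browser_{session}_{kind}.json" for kind in KINDS}


for kind, path in ipc_paths("app1").items():
    print(f"{kind:8} {path}")
```

Since every file name embeds the session ID, concurrent sessions do not share any of these files.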
Troubleshooting
| Problem | Solution |
|---|---|
| `Timeout waiting for result` | Browser may have crashed - run `status` to check |
| `Element not found` | Use `count` to verify selector matches elements |
| `Browser not responding` | Run `status` to ping the browser |
| `Browser process has died` | State was stale - run `start <url>` to restart |
| Complex selector failing | Use `eval` with JavaScript as fallback |
Debug Workflow
# 1. Check browser status
agent-browser status
# 2. Check for JS errors
agent-browser cmd console
# 3. Check for failed requests
agent-browser cmd network_failed
# 4. Take screenshot to see current state
agent-browser cmd screenshot debug
# 5. Count elements to verify selector
agent-browser cmd count ".my-selector"
Python API
You can also use agent-browser as a Python library:
from agent_browser import BrowserDriver
driver = BrowserDriver(session_id="test", output_dir="./screenshots")
# Start browser (blocking - run in thread/process)
# driver.start("http://localhost:8080")
# Or send commands to running browser
result = driver.send_command("screenshot home")
print(result)
status = driver.status() # Returns True if running
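Because `driver.status()` returns True once the browser is running, a caller can poll it instead of sleeping for a fixed time after starting a session. The helper below is a generic sketch of that idea; `wait_until` and its default timeout and interval are assumptions for illustration, not part of the library.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Call check() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the driver from the example above, this would be used as `wait_until(driver.status)` before sending the first command, and the False return can be treated as a startup failure.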
Contributing
See CONTRIBUTING.md for guidelines.
License
GNU General Public License v3.0 - see LICENSE for details.