A robust browser automation tool for AI agents - control browsers via CLI or IPC

Maintainers

abhinav.nigam86

agent-browser

A robust browser automation tool designed for AI agents to control browsers via CLI commands.

Why This Exists

AI agents (such as Claude Code, Codex, and GPT-based tools) need to interact with web applications for testing and automation. However, most browser automation tools require:

Programmatic API access within a running process
Complex async/await patterns
Persistent connections

agent-browser solves this by providing:

Simple CLI commands - Any process that can run shell commands can control a browser
File-based IPC - Stateless CLI commands control a stateful browser session
Multi-session support - Run multiple browser sessions concurrently
Built for AI - Screenshots are auto-resized for vision models, and assertions return clear PASS/FAIL results

Installation

pip install ai-agent-browser
playwright install chromium

Quick Start

# Terminal 1: Start browser (blocks while running)
agent-browser start http://localhost:8080

# Terminal 2: Send commands
agent-browser cmd screenshot home
agent-browser cmd fill "#email" "test@example.com"
agent-browser cmd click "button[type='submit']"
agent-browser cmd assert_visible ".success-message"

# When done
agent-browser stop

Architecture

┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│   AI Agent      │     │  IPC Files   │     │  Browser        │
│  (Claude Code,  │────▶│  (temp dir)  │────▶│  (Playwright)   │
│   Codex, etc)   │     │              │     │                 │
│                 │◀────│  cmd.json    │◀────│  Chromium       │
│  CLI commands   │     │  result.json │     │                 │
└─────────────────┘     └──────────────┘     └─────────────────┘

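The browser runs in its own long-lived process and watches for commands delivered as JSON files. Each CLI invocation writes a command to cmd.json; the browser executes it and writes the outcome to result.json, which the CLI then reads. Because the CLI side holds no state, this decoupled design lets any process that can run a shell command control the browser.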
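The request/response cycle described above can be sketched in Python. This is a minimal illustration, not agent-browser's actual implementation: the file names follow the File Locations table, but the payload shapes (`{"command": ...}` and `{"result": ...}`), the polling interval, and the helpers `send_command` and `serve_one` are all hypothetical. Writing to a temporary file and then calling `os.replace` means a reader never sees a half-written JSON file.

```python
# Hypothetical sketch of file-based IPC. File names follow the File Locations
# table; the JSON payload shapes and polling interval are assumptions.
import json
import os
import tempfile
import threading
import time
from pathlib import Path
from typing import Callable, Optional, Tuple

POLL_INTERVAL = 0.05  # seconds between file checks (assumed value)


def ipc_paths(session: str, base: Optional[str] = None) -> Tuple[Path, Path]:
    """Return the (cmd.json, result.json) paths for a session."""
    root = Path(base or tempfile.gettempdir())
    return (root / f"agent_browser_{session}_cmd.json",
            root / f"agent_browser_{session}_result.json")


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a sibling temp file, then rename it over the target in one step."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    os.replace(tmp, path)


def send_command(command: str, session: str = "default",
                 timeout: float = 10.0, base: Optional[str] = None) -> dict:
    """CLI side: write the command file, then poll for the result file."""
    cmd_path, result_path = ipc_paths(session, base)
    result_path.unlink(missing_ok=True)  # drop a stale result from an earlier call
    write_json_atomic(cmd_path, {"command": command})
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if result_path.exists():
            result = json.loads(result_path.read_text(encoding="utf-8"))
            result_path.unlink()
            return result
        time.sleep(POLL_INTERVAL)
    raise TimeoutError("Timeout waiting for result")


def serve_one(session: str, handler: Callable[[str], str],
              base: Optional[str] = None, timeout: float = 10.0) -> None:
    """Browser side: wait for one command file, run it, and write the result."""
    cmd_path, result_path = ipc_paths(session, base)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cmd_path.exists():
            command = json.loads(cmd_path.read_text(encoding="utf-8"))["command"]
            cmd_path.unlink()
            write_json_atomic(result_path, {"result": handler(command)})
            return
        time.sleep(POLL_INTERVAL)
    raise TimeoutError("No command received")


if __name__ == "__main__":
    # Demo: a thread stands in for the browser process and answers one command.
    with tempfile.TemporaryDirectory() as tmp:
        browser = threading.Thread(target=serve_one,
                                   args=("demo", lambda c: f"handled {c}", tmp))
        browser.start()
        print(send_command("url", session="demo", base=tmp))
        browser.join()
```

The 10-second default in `send_command` mirrors the IPC timeout listed under Configuration; polling is the simplest portable choice, whereas a real implementation might use file-system notifications instead.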

Command Reference

Browser Control

Command	Description	Example
`start <url>`	Start browser session (blocks)	`agent-browser start http://localhost:8080`
`start <url> --visible`	Start in headed mode	`agent-browser start http://localhost:8080 --visible`
`stop`	Close browser	`agent-browser stop`
`status`	Check if browser is running	`agent-browser status`
`cmd reload`	Reload current page	`agent-browser cmd reload`
`cmd goto <url>`	Navigate to URL	`agent-browser cmd goto http://example.com`
`cmd back`	Navigate back	`agent-browser cmd back`
`cmd forward`	Navigate forward	`agent-browser cmd forward`
`cmd url`	Print current URL	`agent-browser cmd url`
`cmd viewport <w> <h>`	Set viewport size	`agent-browser cmd viewport 1920 1080`

Screenshots

Command	Description	Example
`cmd screenshot [name]`	Full-page screenshot	`agent-browser cmd screenshot checkout_page`
`cmd screenshot viewport [name]`	Viewport only (faster)	`agent-browser cmd screenshot viewport header`
`cmd ss [name]`	Alias for screenshot	`agent-browser cmd ss step1`

Screenshots are automatically resized to fit within 1500x1500 pixels for compatibility with AI vision models.
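A full-page screenshot of a long page can be much taller than it is wide, so fitting it into a 1500x1500 box only avoids distortion if the aspect ratio is preserved. The arithmetic below is a sketch of that scaling; whether agent-browser rounds the same way or leaves small images untouched is an assumption, and `fit_within` is a hypothetical helper, not part of the package.

```python
def fit_within(width: int, height: int, max_side: int = 1500) -> tuple:
    """Scale (width, height) so neither side exceeds max_side, keeping the aspect ratio.

    Hypothetical helper: assumes images already inside the box are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough
    # Scale both sides by the same factor so the longest side becomes max_side
    return (max(1, round(width * max_side / longest)),
            max(1, round(height * max_side / longest)))
```

Under this scheme a 1920x10000 full-page capture would shrink to 288x1500, which is one reason `screenshot viewport` can be more legible on long pages.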

Interactions

Command	Description	Example
`cmd click <selector>`	Click element	`agent-browser cmd click "#submit-btn"`
`cmd click_nth <selector> <n>`	Click nth element (0-indexed)	`agent-browser cmd click_nth ".item" 2`
`cmd fill <selector> <text>`	Fill input field	`agent-browser cmd fill "#email" "test@example.com"`
`cmd type <selector> <text>`	Type with key events	`agent-browser cmd type "#search" "query"`
`cmd select <selector> <value>`	Select dropdown option	`agent-browser cmd select "#country" "US"`
`cmd press <key>`	Press keyboard key	`agent-browser cmd press Enter`
`cmd scroll <direction>`	Scroll page	`agent-browser cmd scroll down`
`cmd hover <selector>`	Hover over element	`agent-browser cmd hover ".tooltip-trigger"`
`cmd focus <selector>`	Focus element	`agent-browser cmd focus "#input"`
`cmd upload <selector> <path>`	Upload file	`agent-browser cmd upload "#file" ./doc.pdf`
`cmd dialog <action> [text]`	Handle dialog	`agent-browser cmd dialog accept`
`cmd clear`	Clear localStorage/sessionStorage	`agent-browser cmd clear`

Scroll directions: up, down, top, bottom, left, right

Dialog actions: accept, dismiss, accept <prompt_text>

Assertions

Every assertion's output starts with a [PASS] or [FAIL] prefix, which makes it easy to parse.

Command	Description	Example
`cmd assert_visible <selector>`	Element is visible	`agent-browser cmd assert_visible ".modal"`
`cmd assert_hidden <selector>`	Element is hidden	`agent-browser cmd assert_hidden ".loading"`
`cmd assert_text <selector> <text>`	Element contains text	`agent-browser cmd assert_text ".msg" "Success"`
`cmd assert_text_exact <sel> <text>`	Text matches exactly	`agent-browser cmd assert_text_exact ".count" "42"`
`cmd assert_value <selector> <value>`	Input has value	`agent-browser cmd assert_value "#email" "test@example.com"`
`cmd assert_checked <selector>`	Checkbox is checked	`agent-browser cmd assert_checked "#agree"`
`cmd assert_url <pattern>`	URL contains pattern	`agent-browser cmd assert_url "/dashboard"`
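Because assertion results start with [PASS] or [FAIL], a harness can turn CLI output into a boolean with a prefix check. A minimal sketch, assuming the prefix is the first text on stdout; anything else (for example an error about a missing browser) is treated as neither a pass nor a fail. `parse_assertion` is a hypothetical helper.

```python
def parse_assertion(output: str) -> bool:
    """Return True for [PASS], False for [FAIL]; raise on anything else."""
    text = output.strip()
    if text.startswith("[PASS]"):
        return True
    if text.startswith("[FAIL]"):
        return False
    # Not an assertion result, e.g. the browser is not running
    raise ValueError(f"Not an assertion result: {text!r}")
```

Raising instead of returning False keeps infrastructure errors from being reported as test failures.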

Data Extraction

Command	Description	Example
`cmd text <selector>`	Get text content	`agent-browser cmd text ".title"`
`cmd value <selector>`	Get input value	`agent-browser cmd value "#email"`
`cmd attr <selector> <attr>`	Get attribute	`agent-browser cmd attr "a" "href"`
`cmd count <selector>`	Count matching elements	`agent-browser cmd count ".item"`
`cmd eval <javascript>`	Execute JavaScript	`agent-browser cmd eval "document.title"`
`cmd cookies`	Get all cookies (JSON)	`agent-browser cmd cookies`
`cmd storage`	Get localStorage (JSON)	`agent-browser cmd storage`
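The JSON printed by `cmd cookies` and `cmd storage` can be loaded directly rather than scraped as text. The exact output shape is not documented here, so this sketch assumes cookies arrive as a JSON array of objects with `name` and `value` keys (the shape Playwright's `BrowserContext.cookies()` returns) and storage as a flat JSON object. Both helper names are hypothetical.

```python
import json
from typing import Dict


def cookies_by_name(stdout: str) -> Dict[str, str]:
    """Map cookie name to value (assumes a JSON array of Playwright-style cookie objects)."""
    return {cookie["name"]: cookie["value"] for cookie in json.loads(stdout)}


def storage_items(stdout: str) -> Dict[str, str]:
    """Parse localStorage output (assumed to be a flat JSON object of key/value pairs)."""
    data = json.loads(stdout)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of localStorage entries")
    return data
```

If the CLI prints any text around the JSON, it would need to be stripped before calling these helpers.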

Debugging

Command	Description	Example
`cmd console`	View JS console logs	`agent-browser cmd console`
`cmd network`	View network requests	`agent-browser cmd network`
`cmd network_failed`	View failed requests	`agent-browser cmd network_failed`
`cmd clear_logs`	Clear console/network logs	`agent-browser cmd clear_logs`
`cmd wait <ms>`	Wait milliseconds	`agent-browser cmd wait 2000`
`cmd wait_for <selector> [ms]`	Wait for element	`agent-browser cmd wait_for ".loaded" 15000`
`cmd wait_for_text <text>`	Wait for text	`agent-browser cmd wait_for_text "Complete"`
`cmd help`	Show help	`agent-browser cmd help`

Session Management

Run multiple browser sessions concurrently using session IDs:

# Start two sessions
agent-browser start http://localhost:8080 --session app1
agent-browser start http://localhost:9090 --session app2

# Send commands to specific sessions
agent-browser cmd screenshot home --session app1
agent-browser cmd click "#login" --session app2

# Check status
agent-browser status --session app1

# Stop specific session
agent-browser stop --session app1
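Because `start` blocks, a script that fails partway through can easily leave a session running. One way to guarantee cleanup is a context manager that launches the session in the background and always calls `stop`. This is a sketch under assumptions: `browser_session`, the two-second startup delay (borrowed from the Claude Code example), and the `cli` parameter (which lets tests substitute another executable) are illustrative, not part of agent-browser.

```python
import subprocess
import time
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def browser_session(url: str, session: str, startup_delay: float = 2.0,
                    cli: Sequence[str] = ("agent-browser",)) -> Iterator[str]:
    """Start a named session in the background and always stop it on exit (hypothetical helper)."""
    proc = subprocess.Popen([*cli, "start", url, "--session", session])
    try:
        time.sleep(startup_delay)  # crude wait for the browser to launch
        yield session
    finally:
        subprocess.run([*cli, "stop", "--session", session], check=False)
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort if stop did not end the start process
```

Usage: `with browser_session("http://localhost:8080", "app1") as s:` followed by calls such as `subprocess.run(["agent-browser", "cmd", "screenshot", "home", "--session", s])`; `stop` runs even if one of those calls raises.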

Configuration

Screenshot Output Directory

agent-browser start http://localhost:8080 --output-dir ./my-screenshots

Timeouts

Default timeouts:

Command timeout: 5 seconds (click, fill, etc.)
wait_for timeout: 10 seconds (override per call in milliseconds: wait_for .element 15000)
IPC timeout: 10 seconds (waiting for browser response)

Selectors

Use standard Playwright/CSS selectors:

# CSS selectors
agent-browser cmd click ".btn-primary"
agent-browser cmd click "#submit"
agent-browser cmd click "button[type='submit']"
agent-browser cmd click "[data-testid='login-btn']"

# Text selectors
agent-browser cmd click "text='Sign In'"
agent-browser cmd click "text=Submit"

# Chained selectors
agent-browser cmd click ".card >> text='Edit'"
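Selectors often contain quotes and spaces, and `>>` is a redirection operator in POSIX shells, so a chained selector must be quoted as one word. When an agent emits shell text rather than an argument list, Python's `shlex.join` (Python 3.8+) can do the quoting. A sketch; `shell_command` is a hypothetical helper, and the quoting rules apply to POSIX shells, not cmd.exe or PowerShell.

```python
import shlex


def shell_command(*args: str) -> str:
    """Build a safely quoted agent-browser command line for a POSIX shell (hypothetical helper)."""
    return shlex.join(["agent-browser", *args])
```

For example, `shell_command("cmd", "click", ".card >> text='Edit'")` yields a command line in which the whole selector reaches agent-browser as a single argument instead of redirecting output to a file.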

Interactive Mode

For manual testing with AI assistance:

agent-browser interact http://localhost:8080

This starts a REPL where you can type commands directly:

> ss initial
Screenshot saved: ./screenshots/interactive/step_01_initial.png
> click #login
Clicked: #login
> ss after_login
Screenshot saved: ./screenshots/interactive/step_02_after_login.png
> quit

Integration with AI Agents

Claude Code Example

# In Claude Code conversation:
# "Test the login flow on localhost:8080"

# Claude runs:
agent-browser start http://localhost:8080 --session test1 &
sleep 2
agent-browser cmd screenshot login_page --session test1
# Claude analyzes screenshot...
agent-browser cmd fill "#username" "testuser" --session test1
agent-browser cmd fill "#password" "testpass" --session test1
agent-browser cmd click "button[type='submit']" --session test1
agent-browser cmd wait_for ".dashboard" --session test1
agent-browser cmd assert_url "/dashboard" --session test1
agent-browser cmd screenshot success --session test1
agent-browser stop --session test1

Generic LLM Integration

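import shlex
import subprocess
import time

def browser_cmd(cmd: str, session: str = "default") -> str:
    # shlex.split (unlike str.split) keeps a quoted selector containing spaces as one argument
    result = subprocess.run(
        ["agent-browser", "cmd", *shlex.split(cmd), "--session", session],
        capture_output=True, text=True
    )
    return result.stdout.strip()

# Start browser in a separate process (start blocks until the session is stopped)
browser = subprocess.Popen(["agent-browser", "start", "http://localhost:8080", "--session", "test"])
time.sleep(2)  # give the browser time to launch before sending commands

# Send commands
browser_cmd("screenshot initial", "test")
browser_cmd("click #login", "test")
browser_cmd("assert_visible .dashboard", "test")

# Stop the session and reap the background process
subprocess.run(["agent-browser", "stop", "--session", "test"])
browser.wait(timeout=10)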

File Locations

File	Location	Purpose
State	`%TEMP%/agent_browser_{session}_state.json`	Browser running state
Commands	`%TEMP%/agent_browser_{session}_cmd.json`	Pending command
Results	`%TEMP%/agent_browser_{session}_result.json`	Command result
Console logs	`%TEMP%/agent_browser_{session}_console.json`	JS console output
Network logs	`%TEMP%/agent_browser_{session}_network.json`	Network requests
Screenshots	`./screenshots/` (configurable)	Captured screenshots
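`%TEMP%` in the table denotes the system temporary directory; Python's `tempfile.gettempdir()` resolves the equivalent location on each platform (presumably where the files land on Linux and macOS as well). Since each session writes a state file, the naming pattern can reveal which sessions exist. A sketch, assuming the file names match the table exactly; `list_sessions` is hypothetical, and a state file can outlive a crashed browser, so confirm with `status --session <id>` before relying on it.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional

# Matches agent_browser_{session}_state.json and captures the session ID
STATE_FILE = re.compile(r"^agent_browser_(.+)_state\.json$")


def list_sessions(temp_dir: Optional[str] = None) -> List[str]:
    """Return the session IDs that have a state file (hypothetical helper)."""
    root = Path(temp_dir or tempfile.gettempdir())
    sessions = []
    for path in root.glob("agent_browser_*_state.json"):
        match = STATE_FILE.match(path.name)
        if match:
            sessions.append(match.group(1))
    return sorted(sessions)
```

The greedy capture keeps session IDs that themselves contain underscores (such as `my_app`) intact.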

Troubleshooting

Problem	Solution
`Timeout waiting for result`	Browser may have crashed - run `status` to check
`Element not found`	Use `count` to verify selector matches elements
`Browser not responding`	Run `status` to ping the browser
`Browser process has died`	State was stale - run `start <url>` to restart
`Complex selector failing`	Use `eval` with JavaScript as fallback

Debug Workflow

# 1. Check browser status
agent-browser status

# 2. Check for JS errors
agent-browser cmd console

# 3. Check for failed requests
agent-browser cmd network_failed

# 4. Take screenshot to see current state
agent-browser cmd screenshot debug

# 5. Count elements to verify selector
agent-browser cmd count ".my-selector"
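An agent can run the same five checks in one call and hand the combined output to a model. In this sketch `debug_report` and `run_cli` are hypothetical helpers; stderr is merged into each entry on the assumption that some error messages may not be printed to stdout, and the `run` parameter lets the function be exercised without a browser.

```python
import subprocess
from typing import Callable, Dict, List

# Steps 1-4 of the debug workflow as agent-browser argument lists
DEBUG_STEPS: Dict[str, List[str]] = {
    "status": ["status"],
    "console": ["cmd", "console"],
    "network_failed": ["cmd", "network_failed"],
    "screenshot": ["cmd", "screenshot", "debug"],
}


def run_cli(args: List[str]) -> str:
    """Run one agent-browser command and return its combined output."""
    result = subprocess.run(["agent-browser", *args], capture_output=True, text=True)
    return (result.stdout + result.stderr).strip()


def debug_report(selector: str, session: str = "default",
                 run: Callable[[List[str]], str] = run_cli) -> Dict[str, str]:
    """Run every workflow step for one session; step 5 counts the given selector."""
    steps = dict(DEBUG_STEPS, count=["cmd", "count", selector])
    return {name: run([*args, "--session", session]) for name, args in steps.items()}
```

The report is an ordinary dict keyed by step name, so it can be serialized with `json.dumps` and pasted into a prompt as-is.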

Python API

You can also use agent-browser as a Python library:

from agent_browser import BrowserDriver

driver = BrowserDriver(session_id="test", output_dir="./screenshots")

# Start browser (blocking - run in thread/process)
# driver.start("http://localhost:8080")

# Or send commands to running browser
result = driver.send_command("screenshot home")
print(result)

status = driver.status()  # Returns True if running

Contributing

See CONTRIBUTING.md for guidelines.

License

GNU General Public License v3.0 - see LICENSE for details.

Release history

0.2.7 (Jan 11, 2026)
0.2.6 (Jan 11, 2026)
0.2.5 (Jan 10, 2026)
0.2.4 (Jan 9, 2026)
0.2.3 (Jan 9, 2026)
0.2.2 (Jan 9, 2026)
0.2.1 (Jan 9, 2026)
0.2.0 (Jan 9, 2026)
0.1.9 (Jan 8, 2026)
0.1.8 (Jan 7, 2026)
0.1.7 (Jan 7, 2026)
0.1.6 (Dec 29, 2025)
0.1.5 (Dec 29, 2025)
0.1.4 (Dec 28, 2025)
0.1.3 (Dec 28, 2025)
0.1.2 (Dec 27, 2025)
0.1.1 (Dec 27, 2025), this version
0.1.0 (Dec 27, 2025)

Download files

Source Distribution

ai_agent_browser-0.1.1.tar.gz (23.9 kB view details)

Uploaded Dec 27, 2025 Source

Built Distribution

ai_agent_browser-0.1.1-py3-none-any.whl (20.7 kB view details)

Uploaded Dec 27, 2025 Python 3

File details

Details for the file ai_agent_browser-0.1.1.tar.gz.

File metadata

Download URL: ai_agent_browser-0.1.1.tar.gz
Upload date: Dec 27, 2025
Size: 23.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_agent_browser-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`e432b71bed4574755cd462a63fa342b60f6b9ece3a41fedb23b1fc5858ba3397`
MD5	`32c673125961534983026ccf4edc7618`
BLAKE2b-256	`750d6199e96ffdcf8c22c86fc982247be5b6ef8974443fb261142e495a3fc706`

Provenance

The following attestation bundles were made for ai_agent_browser-0.1.1.tar.gz:

Publisher: publish.yml on abhinav-nigam/agent-browser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_agent_browser-0.1.1.tar.gz
- Subject digest: e432b71bed4574755cd462a63fa342b60f6b9ece3a41fedb23b1fc5858ba3397
- Sigstore transparency entry: 780479618
- Sigstore integration time: Dec 27, 2025
Source repository:
- Permalink: abhinav-nigam/agent-browser@d78c3d00e5fdbf69542fc9160d923424a2ade60c
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/abhinav-nigam
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d78c3d00e5fdbf69542fc9160d923424a2ade60c
- Trigger Event: release

File details

Details for the file ai_agent_browser-0.1.1-py3-none-any.whl.

File metadata

Download URL: ai_agent_browser-0.1.1-py3-none-any.whl
Upload date: Dec 27, 2025
Size: 20.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_agent_browser-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`29c0dbd5cee6a3724215a8dbb61380f8cd1611da4cbb2aa56828a7e697ffc2c7`
MD5	`dd2ed8e6fe3931257c67122fe7292a51`
BLAKE2b-256	`be75071dac2b260acb08f7fbb6fa76f68ee1e8a8d9b75561cf22e2236bc2832f`

Provenance

The following attestation bundles were made for ai_agent_browser-0.1.1-py3-none-any.whl:

Publisher: publish.yml on abhinav-nigam/agent-browser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_agent_browser-0.1.1-py3-none-any.whl
- Subject digest: 29c0dbd5cee6a3724215a8dbb61380f8cd1611da4cbb2aa56828a7e697ffc2c7
- Sigstore transparency entry: 780479620
- Sigstore integration time: Dec 27, 2025
Source repository:
- Permalink: abhinav-nigam/agent-browser@d78c3d00e5fdbf69542fc9160d923424a2ade60c
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/abhinav-nigam
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d78c3d00e5fdbf69542fc9160d923424a2ade60c
- Trigger Event: release
