ai-dev-browser
A browser for AI to develop web automation — human-like automation that works seamlessly in a world designed for humans.
What is this?
ai-dev-browser is a browser that AI agents (Claude, GPT, etc.) use to see and interact with web pages — similar to how Claude in Chrome works, but headless-compatible and embeddable.
Two interaction modes:
- Accessibility tree (page_find): semantic element discovery with refs for clicking/typing
- Screenshots (page_screenshot + mouse_click --screenshot): visual coordinate-based interaction with automatic scaling
# AI discovers elements
python -m ai_dev_browser.tools.page_find
# AI clicks by ref (from accessibility tree)
python -m ai_dev_browser.tools.click_by_ref --ref "5#214"
# AI clicks by coordinates (from screenshot)
python -m ai_dev_browser.tools.mouse_click --x 105 --y 52 --screenshot screenshots/page.png
Screenshot Coordinate Alignment
Screenshots are automatically scaled to fit LLM vision limits (default: 1280px long edge for Claude). Scaling metadata is embedded in the PNG file. When you pass --screenshot to mouse tools, coordinates are auto-converted from screenshot space to CSS viewport space.
# Take screenshot (auto-scaled, metadata embedded in PNG)
python -m ai_dev_browser.tools.page_screenshot
# → screenshots/20260325_210000.png (1280x800)
# Click using coordinates from the screenshot — auto-scaled
python -m ai_dev_browser.tools.mouse_click --x 78 --y 117 --screenshot screenshots/20260325_210000.png
Configurable per model:
await screenshot(tab, max_long_edge=1280) # Claude (default)
await screenshot(tab, max_long_edge=2048) # GPT-4o
await screenshot(tab, max_long_edge=0) # Gemini (unlimited)
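The coordinate conversion described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper names screenshot_size and to_css_coords are hypothetical, and the sketch assumes the screenshot is a uniform downscale of the CSS viewport (device pixel ratio of 1), with max_long_edge=0 meaning no scaling.

```python
def screenshot_size(viewport_w: int, viewport_h: int, max_long_edge: int) -> tuple[int, int]:
    """Size of the screenshot after fitting its long edge to max_long_edge."""
    long_edge = max(viewport_w, viewport_h)
    # 0 (or a limit larger than the viewport) means the image is not scaled.
    if max_long_edge <= 0 or long_edge <= max_long_edge:
        return viewport_w, viewport_h
    return (
        round(viewport_w * max_long_edge / long_edge),
        round(viewport_h * max_long_edge / long_edge),
    )


def to_css_coords(x: int, y: int, shot_size: tuple[int, int],
                  viewport_size: tuple[int, int]) -> tuple[int, int]:
    """Map a point in screenshot space back to CSS viewport space."""
    shot_w, shot_h = shot_size
    view_w, view_h = viewport_size
    return round(x * view_w / shot_w), round(y * view_h / shot_h)


viewport = (1920, 1200)                      # assumed CSS viewport size
shot = screenshot_size(*viewport, 1280)      # long edge limited to 1280px
css_point = to_css_coords(640, 400, shot, viewport)
```

In the real tools the scaling metadata travels inside the PNG, so the caller only passes --screenshot; the viewport size is written out explicitly here to keep the sketch self-contained.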
CLI = Python (SSOT)
Every tool works as both a CLI command and a Python function. Parameters are defined once, in the core functions, and the CLI tools are auto-generated from them, so the two interfaces cannot drift apart. See cli-args-ssot.
python -m ai_dev_browser.tools.click_by_text --text "Sign in"
from ai_dev_browser.core import click_by_text
await click_by_text(tab, text="Sign in")
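One way to get a CLI from a function signature is to walk the parameters with inspect and feed them to argparse. The sketch below only illustrates the single-source-of-truth idea. It is not how ai-dev-browser generates its tools: build_parser is a hypothetical helper, and the click_by_text defined here is a synchronous stand-in with an assumed timeout parameter, not the real core function.

```python
import argparse
import inspect


def build_parser(func):
    """Build an argparse parser whose flags mirror func's parameters."""
    parser = argparse.ArgumentParser(prog=func.__name__)
    for name, param in inspect.signature(func).parameters.items():
        if name == "tab":  # runtime object supplied by the caller, not a flag
            continue
        flag = "--" + name.replace("_", "-")
        if param.default is inspect.Parameter.empty:
            parser.add_argument(flag, dest=name, required=True)
        elif isinstance(param.default, bool):
            parser.add_argument(flag, dest=name, action="store_true",
                                default=param.default)
        else:
            arg_type = str if param.default is None else type(param.default)
            parser.add_argument(flag, dest=name, type=arg_type,
                                default=param.default)
    return parser


# Hypothetical stand-in for a core function (the real ones are async).
def click_by_text(tab, text, timeout=5.0):
    return {"clicked": text, "timeout": timeout}


args = vars(build_parser(click_by_text).parse_args(["--text", "Sign in"]))
result = click_by_text(tab=None, **args)
```

Because the parser is derived from the signature, adding or renaming a parameter in the function changes the CLI without a second edit.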
49 tools covering: navigation, element interaction, mouse, tabs, screenshots, cookies, storage, window management, dialogs, downloads, raw CDP, and Cloudflare bypass.
ls ai_dev_browser/tools/ # See all available tools
Tool Naming Convention
Most element-targeting tools follow <verb>_by_<spec> — verb is the action,
spec is how you identify the element. LLM mental model: "I have an X, I want
to do Y → look for Y_by_X."
| Spec | Source | Example tool |
|---|---|---|
| _by_ref | ref returned by page_discover (AX tree) | click_by_ref |
| _by_text | visible text content | click_by_text |
| _by_html_id | id="..." HTML attribute (cross-frame) | click_by_html_id |
| _by_xpath | XPath expression (document.evaluate) | click_by_xpath |
Verbs currently in use: click, type, focus, hover, drag, highlight,
html (read), screenshot, select, upload, find.
page_* tools operate on the whole page (page_goto, page_screenshot,
page_discover, page_scroll). page_discover is broad exploration;
find_by_* is targeted single-element lookup.
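The "Y_by_X" lookup can be written down as a small helper. This is a sketch of the naming rule only: tool_name and cli_command are hypothetical helpers, and not every verb and spec combination is guaranteed to exist as a real tool, so check ai_dev_browser/tools/ before relying on a composed name.

```python
SPECS = {"ref", "text", "html_id", "xpath"}  # specs listed in the table above


def tool_name(verb: str, spec: str) -> str:
    """Compose a tool name following the <verb>_by_<spec> convention."""
    if spec not in SPECS:
        raise ValueError(f"unknown spec: {spec!r}")
    return f"{verb}_by_{spec}"


def cli_command(verb: str, spec: str, **params: str) -> list[str]:
    """Build the argv for running the composed tool as a module."""
    argv = ["python", "-m", f"ai_dev_browser.tools.{tool_name(verb, spec)}"]
    for key, value in params.items():
        argv += ["--" + key.replace("_", "-"), value]
    return argv


command = cli_command("click", "ref", ref="5#214")
```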
Docstring First-Line Convention
Every tool's docstring first sentence is a decision signal, not a description. Two halves, always in this order:
- Input (when to pick me) — the condition that makes this tool the right choice. "Use when: you know the html id…", "Use when: no specific tool fits — last resort…"
- Output (what the return unlocks) — what the caller does with the return value. "Returns {found, tag, …} you branch on — pair with click_by_html_id to act."
Why: LLMs ranking tools glance at the first line only. A pure
description ("Click an element located by html id, …") reads the same
as a lower-level alternative and gives no priority signal. A decision
signal ("Use when: you already know the html id. Prefer over click_by_ref when possible.") tells the LLM when to pick this tool
and what to do next. Measured effect on real LLM traces: the
intended tool goes from near-zero uptake to the obvious first choice
for its scenario.
When you add a new tool, write the first line in this shape before touching anything else. Everything after it (Args / Returns / Example) can stay conventional.
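As an illustration, here is a docstring written in that shape next to a purely descriptive one, with a small check that tells them apart. Everything in the sketch is hypothetical: the two functions are stand-ins with placeholder bodies, their docstrings are not copied from the library, and is_decision_signal is an assumed heuristic ("Use when:" first, then "Returns"), not a linter the project ships.

```python
import inspect


def first_line(func) -> str:
    """Return the first line of func's docstring, or an empty string."""
    doc = inspect.getdoc(func) or ""
    return doc.splitlines()[0] if doc else ""


def is_decision_signal(func) -> bool:
    """Heuristic: the input half comes first, then the output half."""
    line = first_line(func)
    return line.startswith("Use when:") and "Returns" in line


def find_by_html_id(tab, html_id):
    """Use when: you know the html id. Returns {found, tag} you branch on; pair with click_by_html_id to act.

    Args:
        tab: browser tab handle.
        html_id: value of the element's id attribute.
    """
    return {"found": False, "tag": None}  # placeholder body for the sketch


def click_descriptive(tab, html_id):
    """Click an element located by html id."""
    return None  # placeholder body for the sketch


signals = [is_decision_signal(f) for f in (find_by_html_id, click_descriptive)]
```

A check like this could run in CI so that a new tool without a decision-signal first line is caught before release.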
Quick Start
pip install ai-dev-browser
# or pin a specific version
pip install "ai-dev-browser>=0.5,<0.6"
# or with uv
uv add ai-dev-browser
Want the unreleased master or a specific commit?
pip install "ai-dev-browser @ git+https://github.com/sudoprivacy/ai-dev-browser.git@master"
from ai_dev_browser.core import goto, click_by_text, type_by_text, screenshot
await goto(tab, "https://example.com")
await type_by_text(tab, name="Email", text="user@example.com")
await click_by_text(tab, text="Sign in")
await screenshot(tab) # → screenshots/{timestamp}.png
Human-like Behavior
Events dispatched through CDP have isTrusted=true. Additional human-like features are optional and off by default:
from ai_dev_browser.core import human
human.configure(
    use_gaussian_path=True,   # Bezier mouse curves (+50ms)
    click_hold_enabled=True,  # Hold before release (+45ms)
    type_humanize=True,       # Typing delays (+35ms/char)
)
The one exception is click offset randomization, which costs nothing and is always on. Everything else is opt-in so that the default path stays fast.
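To make "Bezier mouse curves" concrete, the sketch below generates a cubic Bezier path whose two control points are displaced from the straight line by Gaussian noise. It is an assumption about what such a feature might look like, not the library's code: bezier_path, its spread parameter, and the choice of control points at one third and two thirds of the way are all hypothetical.

```python
import random


def bezier_path(start, end, steps=20, spread=0.15, rng=None):
    """Return steps + 1 points on a cubic Bezier curve from start to end."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    dist = (dx * dx + dy * dy) ** 0.5

    def control(t):
        # Point at fraction t of the straight line, nudged by Gaussian noise
        # whose standard deviation grows with the travel distance.
        return (x0 + dx * t + rng.gauss(0, spread * dist),
                y0 + dy * t + rng.gauss(0, spread * dist))

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Standard cubic Bezier polynomial in Bernstein form.
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        points.append((x, y))
    return points


path = bezier_path((10, 20), (300, 180), steps=20, rng=random.Random(0))
```

Each point would then be sent as a separate mouse-move event. Spreading the move over several events is where the extra latency noted in the configure comments comes from.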
Architecture
- CDP WebSocket transport (_transport.py): direct Chrome DevTools Protocol, no browser automation framework dependency
- Auto-reconnect: tab WebSocket reconnection with target re-discovery (handles Electron SPA navigation)
- Connection reuse: same host:port shares one BrowserClient instance across calls
- CDP module: generated from Google's official CDP spec via cdp-python
Environment Variables
| Variable | Purpose |
|---|---|
| AI_DEV_BROWSER_PORT | Default CDP port (skips auto-detection) |
| AI_DEV_BROWSER_HEADLESS | Default headless mode (1/true) |
| AI_DEV_BROWSER_REDIRECT | Block direct CLI, print redirect message |
| AI_DEV_BROWSER_OUTPUT_DIR | Default directory for page_screenshot (overrides ./screenshots/). Consumers like sudowork set this to inject a persistent output path so LLMs don't need to learn host-specific conventions. |
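A host application could resolve these variables roughly as follows. This is a sketch under stated assumptions: resolve_settings and env_flag are hypothetical helpers, and the parsing rules (only "1" and "true" count as truthy, an unset port means auto-detection) are inferred from the table rather than taken from the library's code. AI_DEV_BROWSER_REDIRECT is left out because the source does not describe its value format.

```python
import os
from pathlib import Path


def env_flag(name: str, environ=os.environ) -> bool:
    """Treat "1" and "true" (case-insensitive) as enabled."""
    return environ.get(name, "").strip().lower() in {"1", "true"}


def resolve_settings(environ=os.environ) -> dict:
    """Collect defaults from the environment variables in the table above."""
    port = environ.get("AI_DEV_BROWSER_PORT")
    return {
        "port": int(port) if port else None,  # None: fall back to auto-detection
        "headless": env_flag("AI_DEV_BROWSER_HEADLESS", environ),
        "output_dir": Path(environ.get("AI_DEV_BROWSER_OUTPUT_DIR", "./screenshots")),
    }


settings = resolve_settings({
    "AI_DEV_BROWSER_PORT": "9222",
    "AI_DEV_BROWSER_HEADLESS": "true",
})
```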
License
AGPL-3.0