ai-dev-browser

A browser for AI to develop web automation — human-like automation that works seamlessly in a world designed for humans.

What is this?

ai-dev-browser is a browser that AI agents (Claude, GPT, etc.) use to see and interact with web pages — similar to how Claude in Chrome works, but headless-compatible and embeddable.

Two interaction modes:

  • Accessibility tree (page_find): semantic element discovery with refs for clicking/typing
  • Screenshots (page_screenshot + mouse_click --screenshot): visual coordinate-based interaction with automatic scaling

# AI discovers elements
python -m ai_dev_browser.tools.page_find

# AI clicks by ref (from accessibility tree)
python -m ai_dev_browser.tools.click_by_ref --ref "5#214"

# AI clicks by coordinates (from screenshot)
python -m ai_dev_browser.tools.mouse_click --x 105 --y 52 --screenshot screenshots/page.png

Screenshot Coordinate Alignment

Screenshots are automatically scaled to fit LLM vision limits (default: 1280px long edge for Claude). Scaling metadata is embedded in the PNG file. When you pass --screenshot to mouse tools, coordinates are auto-converted from screenshot space to CSS viewport space.

# Take screenshot (auto-scaled, metadata embedded in PNG)
python -m ai_dev_browser.tools.page_screenshot
# → screenshots/20260325_210000.png (1280x800)

# Click using coordinates from the screenshot — auto-scaled
python -m ai_dev_browser.tools.mouse_click --x 78 --y 117 --screenshot screenshots/20260325_210000.png

Configurable per model:

await screenshot(tab, max_long_edge=1280)   # Claude (default)
await screenshot(tab, max_long_edge=2048)   # GPT-4o
await screenshot(tab, max_long_edge=0)      # Gemini (unlimited)
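
The scaling arithmetic described above can be sketched as follows. This is a minimal illustration, not the library's implementation: fit_long_edge and screenshot_to_css are hypothetical helper names, the 1920x1200 viewport is an invented example, and the sketch assumes the screenshot is captured at CSS-pixel resolution (no device pixel ratio correction).

```python
def fit_long_edge(width, height, max_long_edge=1280):
    """Return (scaled_width, scaled_height, scale) so the long edge fits the limit.

    max_long_edge=0 is treated as "no limit", matching the unlimited setting above.
    """
    long_edge = max(width, height)
    if max_long_edge <= 0 or long_edge <= max_long_edge:
        return width, height, 1.0
    scale = max_long_edge / long_edge
    return round(width * scale), round(height * scale), scale


def screenshot_to_css(x, y, scale):
    """Map a point in screenshot space back to CSS viewport space."""
    return x / scale, y / scale


# Hypothetical 1920x1200 CSS viewport, scaled down for a 1280px limit.
shot_w, shot_h, scale = fit_long_edge(1920, 1200, max_long_edge=1280)

# A point the model picked on the scaled screenshot, converted back for clicking.
css_x, css_y = screenshot_to_css(78, 117, scale)
print((shot_w, shot_h), (round(css_x, 1), round(css_y, 1)))
```

Because the scale factor travels with the screenshot, the model only ever reasons in screenshot coordinates; the conversion back to viewport coordinates happens at click time.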

CLI = Python (SSOT)

Every tool works as both a CLI command and a Python function. Parameters are defined once, in the core functions, and the CLI tools are auto-generated from them, so the two interfaces share a single source of truth (SSOT). See cli-args-ssot.

# CLI
python -m ai_dev_browser.tools.click_by_text --text "Sign in"

# Python equivalent
from ai_dev_browser.core import click_by_text
await click_by_text(tab, text="Sign in")
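
One way to picture the single-source-of-truth idea is to derive a CLI parser from a function signature. The sketch below uses only the standard library and is not the project's actual generator: build_parser and click_by_text_demo are hypothetical names, and mapping underscores in parameter names to dashes in flags is an assumption.

```python
import argparse
import inspect


def build_parser(func):
    """Derive an argparse parser from a function's parameters."""
    parser = argparse.ArgumentParser(prog=func.__name__, description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        if name == "tab":
            # The tab handle comes from the runtime, not from the command line.
            continue
        flag = "--" + name.replace("_", "-")
        if param.default is inspect.Parameter.empty:
            # No default in the signature means the flag is required.
            parser.add_argument(flag, dest=name, required=True)
        elif isinstance(param.default, bool):
            # Boolean parameters become on/off switches.
            parser.add_argument(flag, dest=name, action="store_true")
        else:
            # Reuse the default's type so "--timeout 2" parses as a float.
            arg_type = type(param.default) if param.default is not None else str
            parser.add_argument(flag, dest=name, type=arg_type, default=param.default)
    return parser


def click_by_text_demo(tab, text, timeout=5.0):
    """Hypothetical stand-in for a core function."""
    return {"text": text, "timeout": timeout}


parser = build_parser(click_by_text_demo)
args = parser.parse_args(["--text", "Sign in"])
print(vars(args))
```

With this shape, adding a parameter to the function is enough to expose it on the command line, which is the property the SSOT approach is after.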

49 tools covering: navigation, element interaction, mouse, tabs, screenshots, cookies, storage, window management, dialogs, downloads, raw CDP, and Cloudflare bypass.

ls ai_dev_browser/tools/  # See all available tools

Tool Naming Convention

Most element-targeting tools follow <verb>_by_<spec> — verb is the action, spec is how you identify the element. LLM mental model: "I have an X, I want to do Y → look for Y_by_X."

| Spec | Source | Example tool |
| --- | --- | --- |
| _by_ref | ref returned by page_discover (AX tree) | click_by_ref |
| _by_text | visible text content | click_by_text |
| _by_html_id | id="..." HTML attribute (cross-frame) | click_by_html_id |
| _by_xpath | XPath expression (document.evaluate) | click_by_xpath |

Verbs currently in use: click, type, focus, hover, drag, highlight, html (read), screenshot, select, upload, find.
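
The "Y_by_X" lookup can be sketched as a small helper that composes a tool name and its CLI invocation. The helper names are hypothetical, and because only most tools follow the convention, a composed name is a candidate to look for rather than a guarantee that the tool exists.

```python
# Verbs and specs as listed in this section.
VERBS = {"click", "type", "focus", "hover", "drag", "highlight",
         "html", "screenshot", "select", "upload", "find"}
SPECS = {"ref", "text", "html_id", "xpath"}


def tool_name(verb, spec):
    """Compose a candidate tool name following <verb>_by_<spec>."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if spec not in SPECS:
        raise ValueError(f"unknown spec: {spec}")
    return f"{verb}_by_{spec}"


def cli_command(verb, spec):
    """Build the matching `python -m` invocation for a candidate tool."""
    return f"python -m ai_dev_browser.tools.{tool_name(verb, spec)}"


print(tool_name("click", "ref"))
print(cli_command("type", "text"))
```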

page_* tools operate on the whole page (page_goto, page_screenshot, page_discover, page_scroll). page_discover is broad exploration; find_by_* is targeted single-element lookup.

Quick Start

pip install ai-dev-browser
# or pin a specific version
pip install "ai-dev-browser>=0.5,<0.6"
# or with uv
uv add ai-dev-browser

Want the unreleased master or a specific commit?

pip install "ai-dev-browser @ git+https://github.com/sudoprivacy/ai-dev-browser.git@master"

# Basic usage from Python (run inside an async function; `tab` is an already-open tab)
from ai_dev_browser.core import goto, click_by_text, type_by_text, screenshot

await goto(tab, "https://example.com")
await type_by_text(tab, name="Email", text="user@example.com")
await click_by_text(tab, text="Sign in")
await screenshot(tab)  # → screenshots/{timestamp}.png

Human-like Behavior

CDP-dispatched events produce isTrusted=true. The additional human-like features below are off by default; enable them individually:

from ai_dev_browser.core import human

human.configure(
    use_gaussian_path=True,    # Bezier mouse curves (+50ms)
    click_hold_enabled=True,   # Hold before release (+45ms)
    type_humanize=True,        # Typing delays (+35ms/char)
)

Default: click offset randomization (free, always on). Everything else is opt-in for speed.
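
To make the "Bezier mouse curves" option concrete, here is a minimal sketch of a cubic Bezier mouse path whose control points are nudged by Gaussian noise. It illustrates the general technique only; bezier_path, its parameters, and the noise model are assumptions, not the library's algorithm.

```python
import math
import random


def bezier_path(start, end, steps=20, spread=0.15, rng=None):
    """Return steps+1 points along a cubic Bezier curve from start to end.

    The two inner control points sit on the straight line and are then
    offset by Gaussian noise proportional to the travel distance.
    """
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    jitter = spread * math.hypot(dx, dy)

    def control(t):
        return (x0 + dx * t + rng.gauss(0, jitter),
                y0 + dy * t + rng.gauss(0, jitter))

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Standard cubic Bezier blend of the four control points.
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        points.append((x, y))
    return points


path = bezier_path((10, 10), (300, 200), steps=20, rng=random.Random(0))
print(len(path), path[0], path[-1])
```

Each intermediate point would be sent as a separate mouse-move event, which is where the extra latency of a curved path comes from.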

Architecture

  • CDP WebSocket transport (_transport.py): direct Chrome DevTools Protocol, no browser automation framework dependency
  • Auto-reconnect: tab WebSocket reconnection with target re-discovery (handles Electron SPA navigation)
  • Connection reuse: same host:port shares one BrowserClient instance across calls
  • CDP module: generated from Google's official CDP spec via cdp-python
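
Talking to CDP directly means exchanging JSON messages over a WebSocket: commands carry an id, responses echo that id, and events arrive without one. The sketch below shows only this framing, with no network I/O. CommandFramer is a hypothetical name and not the contents of _transport.py; Page.navigate and Page.loadEventFired are standard CDP names.

```python
import itertools
import json


class CommandFramer:
    """Encode CDP commands and classify incoming messages."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # message id -> method name awaiting a response

    def encode(self, method, **params):
        """Serialize a command and remember its id for response matching."""
        msg_id = next(self._ids)
        self._pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def decode(self, raw):
        """Return (kind, method, payload) for a raw JSON message."""
        msg = json.loads(raw)
        if "id" in msg:
            # Responses echo the id of the command they answer.
            method = self._pending.pop(msg["id"], None)
            return "response", method, msg.get("result", msg.get("error"))
        # Events have a method name but no id.
        return "event", msg.get("method"), msg.get("params", {})


framer = CommandFramer()
sent = json.loads(framer.encode("Page.navigate", url="https://example.com"))
reply = framer.decode('{"id": 1, "result": {"frameId": "F1"}}')
event = framer.decode('{"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}')
print(sent, reply, event)
```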

Environment Variables

| Variable | Purpose |
| --- | --- |
| AI_DEV_BROWSER_PORT | Default CDP port (skips auto-detection) |
| AI_DEV_BROWSER_HEADLESS | Default headless mode (1/true) |
| AI_DEV_BROWSER_REDIRECT | Block direct CLI, print redirect message |
| AI_DEV_BROWSER_OUTPUT_DIR | Default directory for page_screenshot (overrides ./screenshots/). Consumers like sudowork set this to inject a persistent output path so LLMs don't need to learn host-specific conventions. |
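
As a rough illustration of how a host application might interpret these variables, the sketch below reads three of them from a mapping. load_settings is a hypothetical helper; treating the headless value case-insensitively and returning None for an unset port are assumptions, not documented behavior.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Read the documented variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    port = env.get("AI_DEV_BROWSER_PORT")
    headless = env.get("AI_DEV_BROWSER_HEADLESS", "")
    return {
        # None stands for "fall back to auto-detection" in this sketch.
        "port": int(port) if port else None,
        # The table lists 1/true as the enabling values.
        "headless": headless.strip().lower() in {"1", "true"},
        # ./screenshots/ is the documented default output directory.
        "output_dir": Path(env.get("AI_DEV_BROWSER_OUTPUT_DIR") or "screenshots"),
    }


defaults = load_settings({})
custom = load_settings({
    "AI_DEV_BROWSER_PORT": "9222",
    "AI_DEV_BROWSER_HEADLESS": "true",
    "AI_DEV_BROWSER_OUTPUT_DIR": "/tmp/shots",
})
print(defaults, custom)
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.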

License

AGPL-3.0
