CLI tool for AI agents to observe and interact with Chrome via CDP

chrome-agent

A CLI tool that gives AI coding agents the ability to observe and interact with Chrome browsers via the Chrome DevTools Protocol.

Multiple agents and humans can share the same browser simultaneously. One agent drives while another observes. A human browses while an agent watches for errors. Four agents run a coordinated test suite against a single browser. The protocol supports all of it natively.

Why this exists

AI coding agents need to see and interact with browsers -- to test their code, debug automation, inspect page state. The standard approach (browser MCP tools) uses a persistent server with protocol negotiation and verbose response formatting. chrome-agent takes a different approach: direct access to Chrome's DevTools Protocol with no abstraction layer.

This means full CDP protocol access -- every command, every event, every domain Chrome exposes. Not a curated subset of capabilities, but the complete protocol. Agents compose interactions from CDP primitives the same way DevTools does.

Installation

uv tool install chrome-agent

Or add to a project:

uv add chrome-agent

Requires Google Chrome or Chromium installed on the system. No Playwright, no browser downloads.

Quick Start

# Launch a browser
chrome-agent launch

# Check it's running
chrome-agent status

# Read the page title
chrome-agent Runtime.evaluate '{"expression": "document.title", "returnByValue": true}'

# Navigate
chrome-agent Page.navigate '{"url": "https://example.com"}'

# Take a screenshot (returns base64 PNG in JSON)
chrome-agent Page.captureScreenshot '{"format": "png"}'

# Discover available commands
chrome-agent help Page
chrome-agent help Page.navigate
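
The screenshot command above returns the image as base64 text inside JSON, so it has to be decoded before it can be viewed. A minimal Python sketch, assuming the CLI prints the CDP result object unchanged (in CDP, the Page.captureScreenshot result carries the image in a "data" field). The helper name save_screenshot is hypothetical and not part of chrome-agent:

```python
import base64
import json
from pathlib import Path


def save_screenshot(response_text: str, out_path: str) -> int:
    """Decode a Page.captureScreenshot JSON response and write the image bytes.

    Assumption: the JSON has a top-level "data" field holding base64 text,
    as in the CDP result object. Returns the number of bytes written.
    """
    response = json.loads(response_text)
    image_bytes = base64.b64decode(response["data"])
    Path(out_path).write_bytes(image_bytes)
    return len(image_bytes)
```

One way to use it: redirect the command's output to a file such as shot.json, then call save_screenshot(Path("shot.json").read_text(), "shot.png").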

Two Modes

One-shot mode

Send a single CDP command. Each invocation connects, sends the command, prints the JSON response, and disconnects.

chrome-agent [--port PORT] Domain.method '{"param": "value"}'

Good for spot checks, screenshots, quick queries. ~350ms per call.

Session mode

Persistent CDP connection via stdin/stdout. Send commands, subscribe to events, get real-time notifications.

chrome-agent session [--port PORT]

Session protocol:

+Page.loadEventFired              # subscribe to event
+Page.frameNavigated              # subscribe to another
Page.navigate {"url": "https://example.com"}   # send command
-Page.loadEventFired              # unsubscribe

Responses and events are JSON lines on stdout. ~0.5ms per command.

Operational Commands

chrome-agent launch [--headless] [--fingerprint PATH] [--port PORT]
chrome-agent status [--port PORT]
chrome-agent session [--port PORT]
chrome-agent help [Domain | Domain.method]
chrome-agent cleanup

Command   Description
launch    Find Chrome, launch with CDP enabled. Refuses if port is occupied.
status    Check if a browser is running on the CDP port.
session   Start a persistent CDP session (stdin/stdout).
help      Query the browser's protocol schema. Lists domains, commands, events, parameters.
cleanup   Remove stale session directories from previous launches.
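
Because launch refuses an occupied port, a script may want to check the port itself before deciding whether to launch or reuse a browser. The sketch below is a generic TCP check using Python's standard library. How chrome-agent performs its own check is not stated here, and port_in_use is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

A True result only shows that some process is listening. Running chrome-agent status is still needed to confirm that the listener is a browser with CDP enabled.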

Interacting with Elements

Agents interact with page elements using a three-step pattern: locate, act, verify.

# Locate -- find element coordinates via JavaScript
chrome-agent Runtime.evaluate '{"expression": "(() => { const r = document.querySelector(\"#submit\").getBoundingClientRect(); return {x: r.x+r.width/2, y: r.y+r.height/2}; })()", "returnByValue": true}'

# Act -- dispatch real input events at those coordinates (400 and 300 stand in for the x and y returned by the locate step)
chrome-agent Input.dispatchMouseEvent '{"type": "mousePressed", "x": 400, "y": 300, "button": "left", "clickCount": 1}'
chrome-agent Input.dispatchMouseEvent '{"type": "mouseReleased", "x": 400, "y": 300, "button": "left", "clickCount": 1}'

# Verify -- confirm the action worked
chrome-agent Runtime.evaluate '{"expression": "document.title", "returnByValue": true}'

Chrome processes dispatched input events identically to physical input. A human watching the browser sees the cursor move, buttons depress, text highlight, and pages load in real time.
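
The locate and act steps can be generated rather than typed by hand, which avoids quoting mistakes in the selector. A minimal sketch with hypothetical helper names (locate_expression, click_events). It only builds the expression and the event parameters, and sending them is left to whichever mode is in use.

```python
import json


def locate_expression(selector: str) -> str:
    """JavaScript that returns the center point of the first element matching selector."""
    # json.dumps yields a valid JavaScript string literal, so quotes in the
    # selector are escaped correctly.
    quoted = json.dumps(selector)
    return (
        "(() => { const r = document.querySelector(" + quoted + ").getBoundingClientRect(); "
        "return {x: r.x + r.width / 2, y: r.y + r.height / 2}; })()"
    )


def click_events(x: float, y: float) -> list[dict]:
    """Params for the mousePressed / mouseReleased pair that makes up one left click."""
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [{"type": "mousePressed", **base}, {"type": "mouseReleased", **base}]
```

The expression goes into Runtime.evaluate with returnByValue set to true, and each dict from click_events goes to Input.dispatchMouseEvent. getBoundingClientRect reports viewport-relative CSS pixels, which is the coordinate space Input.dispatchMouseEvent expects. If the selector matches nothing, querySelector returns null and the expression throws, so the locate step doubles as an existence check.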

Python API

import asyncio

from chrome_agent.cdp_client import CDPClient, get_ws_url
from chrome_agent.domains.page import Page
from chrome_agent.domains.runtime import Runtime


async def main() -> None:
    # Connect to the browser listening on the CDP port (9222 here)
    async with CDPClient(ws_url=get_ws_url(port=9222)) as cdp:
        page = Page(client=cdp)
        runtime = Runtime(client=cdp)

        await page.navigate(url="https://example.com")
        result = await runtime.evaluate(expression="document.title", return_by_value=True)
        print(result["result"]["value"])


asyncio.run(main())

The package ships 54 typed domain classes with snake_case methods, generated from Chrome's protocol schema.

Browser Fingerprinting

For sites that detect automated browsers, launch with a fingerprint profile:

chrome-agent launch --fingerprint profile.json

{
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ...",
    "platform": "Linux x86_64",
    "vendor": "Google Inc.",
    "language": "en-US",
    "timezone": "America/Chicago",
    "viewport": {"width": 1920, "height": 1080}
}

The profile overrides the user agent (both the HTTP header and the JavaScript value), viewport, language, timezone, navigator.webdriver, navigator.platform, navigator.vendor, and window.chrome. The overrides persist across page navigations.
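
A profile file can also be generated from a script, for instance to vary the viewport between runs. The sketch below writes the same keys as the example profile above. write_profile is a hypothetical helper, the user agent string must be supplied by the caller, and whether every key is required is not stated here.

```python
import json
from pathlib import Path


def write_profile(
    path: str,
    user_agent: str,
    timezone: str = "America/Chicago",
    width: int = 1920,
    height: int = 1080,
) -> dict:
    """Write a fingerprint profile with the keys from the example and return it."""
    profile = {
        "userAgent": user_agent,
        "platform": "Linux x86_64",
        "vendor": "Google Inc.",
        "language": "en-US",
        "timezone": timezone,
        "viewport": {"width": width, "height": height},
    }
    Path(path).write_text(json.dumps(profile, indent=4) + "\n", encoding="utf-8")
    return profile
```

The resulting file is then passed to chrome-agent launch --fingerprint as shown above.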

For AI Agents

See AGENTS.md (the standard for AI agent tool documentation) for concise agent instructions. It covers commands, the session protocol, interaction patterns, and gotchas.

Collaboration

Multiple participants -- humans, AI agents, or both -- can share a browser simultaneously. Chrome's CDP multiplexes connections: events fan out to all subscribers, DOM mutations are cross-visible, and concurrent access is handled gracefully.

See docs/collaboration-guide.md for:

  • Human-agent collaboration patterns (you browse, agent watches)
  • Agent-driven workflows (agent drives, you supervise)
  • Multi-agent setups (actor + observers)
  • The observation gap (what CDP sees vs what it misses)
  • Full interaction observation via the binding bridge

For real-time observation using Claude Code's Monitor tool, see docs/monitor-integration.md. It includes a ready-to-use observer script with three verbosity tiers and rate limiting for noisy pages.

Requirements

  • Python >= 3.11
  • Google Chrome or Chromium (system-installed)
  • Linux with xdotool (optional, for virtual desktop pinning)

License

MIT

