CLI tool for AI agents to observe and interact with Chrome via CDP

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

chrome-agent

A CLI tool that gives AI coding agents the ability to observe and interact with Chrome browsers via the Chrome DevTools Protocol.

Multiple agents and humans can share the same browser simultaneously, each with isolated event subscriptions. One agent drives while another observes network traffic. A human browses while an agent watches for errors. Four agents run a coordinated test suite against a single browser. Each participant sees only the events they subscribed to -- no interference.

Why this exists

AI coding agents need to see and interact with browsers -- to test their code, debug automation, inspect page state. The standard approach (browser MCP tools) uses a persistent server with protocol negotiation and verbose response formatting. chrome-agent takes a different approach: direct access to Chrome's DevTools Protocol with no abstraction layer.

This means full CDP protocol access -- every command, every event, every domain Chrome exposes. Not a curated subset of capabilities, but the complete protocol. Agents compose interactions from CDP primitives the same way DevTools does.

Tracks the running browser, not its own version

Because there is no abstraction layer, chrome-agent tracks your browser rather than its own release. The CLI sends the method name and parameters you give it straight to Chrome and streams back the events you subscribe to -- nothing is validated against a bundled schema. So any command, event, or domain your installed Chrome supports just works, including protocol surface added after the version of chrome-agent you installed. There is no curated subset to fall behind.

For example, the CrashReportContext and WebMCP domains (added to CDP in later Chrome releases) are both absent from an older chrome-agent's typed bindings, yet a method on one returns a normal result through the CLI, with no change to chrome-agent:

chrome-agent myproject-01 CrashReportContext.getEntries
# {"entries": []}

The one point-in-time artifact is the typed Python classes (see Python API) -- an optional convenience layer that snapshots the schema at generation time. They never gate access: CDPClient.send(method=..., params=...) reaches any method regardless. And help reads the protocol schema live from the running browser, so its documentation is always as current as your Chrome.

Installation

uv tool install chrome-agent

Or add to a project:

uv add chrome-agent

Requires Google Chrome or Chromium installed on the system. Single runtime dependency (websockets). No Playwright, no browser downloads.

Quick Start

# Launch a browser -- auto-allocates a port and names the instance
chrome-agent launch
# {"name": "myproject-01", "port": 9222, "pid": 58469, "browser_version": "Chrome/147"}

# Check what's running
chrome-agent status
# myproject-01  port 9222
#   [1] 956FD3C2  https://example.com  "Example Domain"

# Read the page title
chrome-agent myproject-01 Runtime.evaluate '{"expression": "document.title", "returnByValue": true}'

# Navigate
chrome-agent myproject-01 Page.navigate '{"url": "https://example.com"}'

# Take a screenshot (returns base64 PNG in JSON)
chrome-agent myproject-01 Page.captureScreenshot '{"format": "png"}'

# Discover available commands
chrome-agent help myproject-01 Page
chrome-agent help myproject-01 Page.navigate

# Stop the browser when done
chrome-agent stop myproject-01

Two Channels

chrome-agent uses a two-channel pattern for browser interaction:

One-shot mode (commands)

Send a single CDP command. Connects, sends, prints JSON response, disconnects.

chrome-agent <instance> Domain.method '{"param": "value"}'

Good for spot checks, screenshots, quick queries. ~50-80ms per call. If only one instance is running, the instance name can be omitted.

Attach mode (events)

Persistent connection with isolated event subscriptions. Streams events to stdout as JSON lines.

chrome-agent attach <instance> +Page.loadEventFired +Network.requestWillBeSent

Run it in the background while sending one-shot commands:

# Background: observe events
chrome-agent attach myproject-01 +Page.loadEventFired +Network.requestWillBeSent > /tmp/events.jsonl &

# Foreground: send commands -- events appear in the attach stream
chrome-agent myproject-01 Page.navigate '{"url": "https://example.com"}'

Subscribe to exactly the events you need. Each attach session is isolated -- subscribing to Network events in one session does not affect other sessions.

Operational Commands

chrome-agent launch [--headless] [--fingerprint PATH] [--port PORT] [--no-window-border]
chrome-agent status [<instance>]
chrome-agent attach <instance> [+Event ...] [--target SPEC] [--url SUBSTRING]
chrome-agent stop <instance> [--target SPEC] [--url SUBSTRING]
chrome-agent help [<instance>] [Domain | Domain.method]
chrome-agent cleanup
chrome-agent --version

Command	Description
`launch`	Find Chrome, launch with CDP enabled. Auto-allocates a port and names the instance from the current directory.
`status`	List running instances with their page targets (IDs, URLs, titles).
`attach`	Persistent event observation with isolated subscriptions. Use `--target N` or `--url substring` for multi-tab browsers.
`stop`	Gracefully shut down a browser instance (`Browser.close`) or close a specific tab (`Target.closeTarget`). Use `--target` or `--url` to close a single tab without affecting the browser.
`help`	Query the browser's protocol schema. Lists domains, commands, events, parameters.
`cleanup`	Remove stale instances (dead browsers) and their session directories.
`--version`	Print the installed chrome-agent version (`-V` alias) and exit.

Instances are tracked in a registry at /tmp/chrome-agent/registry.json. A headed browser's instance is automatically removed from the registry when its window is closed (its session directory is cleaned up too), so status reflects what is actually running. Liveness is determined by whether the browser's CDP port is reachable -- not just the launched PID -- so browsers started via wrapper/snap launchers (which fork the real browser into another process) are reported correctly. A transient connection drop does not retire a live instance: a host suspend/resume severs the supervisor's CDP connection while Chrome keeps running, so the supervisor reconnects and keeps supervising; retirement happens only once the CDP port stops listening. cleanup removes any entries that remain (headless instances, or browsers that were killed abruptly).

Interacting with Elements

Agents interact with page elements using a three-step pattern: locate, act, verify.

# Locate -- find element coordinates via JavaScript
chrome-agent myproject-01 Runtime.evaluate '{"expression": "(() => { const r = document.querySelector(\"#submit\").getBoundingClientRect(); return {x: r.x+r.width/2, y: r.y+r.height/2}; })()", "returnByValue": true}'

# Act -- dispatch real input events at those coordinates
chrome-agent myproject-01 Input.dispatchMouseEvent '{"type": "mousePressed", "x": 400, "y": 300, "button": "left", "clickCount": 1}'
chrome-agent myproject-01 Input.dispatchMouseEvent '{"type": "mouseReleased", "x": 400, "y": 300, "button": "left", "clickCount": 1}'

# Verify -- confirm the action worked
chrome-agent myproject-01 Runtime.evaluate '{"expression": "document.title", "returnByValue": true}'

Chrome processes dispatched input events identically to physical input. A human watching the browser sees the cursor move, buttons depress, text highlight, and pages load in real time.

Python API

from chrome_agent.cdp_client import CDPClient, get_ws_url
from chrome_agent.domains.page import Page
from chrome_agent.domains.runtime import Runtime

async with CDPClient(ws_url=get_ws_url(port=9222)) as cdp:
    page = Page(client=cdp)
    runtime = Runtime(client=cdp)

    await page.navigate(url="https://example.com")
    result = await runtime.evaluate(expression="document.title", return_by_value=True)
    print(result["result"]["value"])

54 typed domain classes with snake_case methods, generated from Chrome's protocol schema. They are an optional convenience layer -- a point-in-time snapshot, not a gate. For any method newer than the snapshot, call CDPClient.send(method=..., params=...) directly (see Tracks the running browser, not its own version).

Window Border

So you can tell an agent-driven window apart from your own Chrome windows, every launched browser is marked by default: a colored border + corner badge around each tab (a stable, per-instance color derived from the instance name), and a title prefix so the window reads as 🤖 <instance> — <the page's own title> in the taskbar / Alt-Tab. The marker is drawn in a closed shadow DOM and adds no automation-detection signal (verified against bot.sannysoft.com and CreepJS).

chrome-agent launch                     # marked (default)
chrome-agent launch --no-window-border  # no marker

The marker is suppressed automatically when running --headless (no visible window) or with --fingerprint (the in-page marker is page-observable, and stealth is the point on the sites where fingerprinting is used).

Browser Fingerprinting

For sites that detect automated browsers, launch with a fingerprint profile:

chrome-agent launch --fingerprint profile.json

{
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ...",
    "platform": "Linux x86_64",
    "vendor": "Google Inc.",
    "language": "en-US",
    "timezone": "America/Chicago",
    "viewport": {"width": 1920, "height": 1080}
}

Spoofs the user agent (HTTP header and JavaScript), viewport, language, and timezone via Chrome launch flags -- persistent across navigations, with no JavaScript injection.

It deliberately does not patch navigator.webdriver, navigator.platform, navigator.vendor, or window.chrome. An empirical detection audit found those JS overrides are each independently detectable and make the browser more detectable, not less: they flip bot.sannysoft.com's WebDriver test from pass to fail (the override makes navigator.webdriver an own property) and raise CreepJS's headless score. A plain CDP-attached Chrome already reports the native navigator.webdriver === false and keeps the genuine window.chrome shape, so the cleanest profile is one that leaves the JS environment untouched. A profile's platform/vendor should match the host OS (they are retained in the schema but not spoofed). Note that WebRTC can still leak the real IP regardless of profile.

For AI Agents

See AGENTS.md for concise agent instructions (the standard for AI agent tool documentation). It covers the mental model (address an instance, send any CDP command), the sense ⇄ act loop, the two channels, the command reference, and gotchas.

AGENTS.md is a tailorable example, not gospel. Its mechanics are exact, but the operating judgment in it is general -- adapt it to your own sites, tasks, and constraints. A good pattern is to keep a private, project-specific layer on top -- site-specific field notes, extraction playbooks, hard-won gotchas -- that references this public manual and extends it, rather than forking a separate set of instructions. Grow yours the same way.

Collaboration

Multiple participants -- humans, AI agents, or both -- can share a browser simultaneously. Each participant creates an independent CDP session with isolated event subscriptions. One agent enabling Network observation does not flood another agent's event stream.

See docs/collaboration-guide.md for:

Human-agent collaboration patterns (you browse, agent watches)
Agent-driven workflows (agent drives, you supervise)
Multi-agent setups with isolated event subscriptions
The observation gap (what CDP sees vs what it misses)
Full interaction observation via the binding bridge

For real-time observation using Claude Code's Monitor tool, see docs/monitor-integration.md.

Requirements

Python >= 3.11
Google Chrome or Chromium (system-installed)
Linux with xdotool (optional, for virtual desktop pinning)

Releasing (maintainer)

Releases are cut from the project root with release X.Y.Z (or release for an interactive version prompt). The tool bumps pyproject.toml, commits, tags, pushes -- which triggers the PyPI publish workflow via GitHub Actions Trusted Publishing. Release notes auto-generate from commit messages between tags, so commits should read well as changelog entries.

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

captivus

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.5.3

Jul 13, 2026

0.5.2

Jul 9, 2026

0.5.1

Jun 18, 2026

0.5.0

Jun 17, 2026

0.4.1

Apr 22, 2026

0.4.0

Apr 15, 2026

0.3.0

Apr 14, 2026

0.2.1

Apr 11, 2026

0.2.0

Apr 11, 2026

0.1.0

Apr 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chrome_agent-0.5.3.tar.gz (115.1 kB view details)

Uploaded Jul 13, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

chrome_agent-0.5.3-py3-none-any.whl (151.0 kB view details)

Uploaded Jul 13, 2026 Python 3

File details

Details for the file chrome_agent-0.5.3.tar.gz.

File metadata

Download URL: chrome_agent-0.5.3.tar.gz
Upload date: Jul 13, 2026
Size: 115.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.28 {"installer":{"name":"uv","version":"0.11.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for chrome_agent-0.5.3.tar.gz
Algorithm	Hash digest
SHA256	`2c9d38f2e0dbc637066ccbc2b2d1e46d94097fba2f1d9861cde778012adc936d`
MD5	`ff193dc6ee6b351f0d754b63fe555d28`
BLAKE2b-256	`496b2aea052cf394798e76e929bcbd937f94e9dd0a0d34dc285852cee0ce38c6`

See more details on using hashes here.

File details

Details for the file chrome_agent-0.5.3-py3-none-any.whl.

File metadata

Download URL: chrome_agent-0.5.3-py3-none-any.whl
Upload date: Jul 13, 2026
Size: 151.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.28 {"installer":{"name":"uv","version":"0.11.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for chrome_agent-0.5.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`859336d82dbf7fe5f472ab7618a6406d33333dfdcb80cb11b0038facddb06929`
MD5	`d203ba4a5fa21f3f68261b80af4e3cac`
BLAKE2b-256	`0ee414bd64f438a7a5465f6dd6029f8e2b57059272559b1b4a76c0bc2a8b2668`

See more details on using hashes here.

chrome-agent 0.5.3

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

chrome-agent

Why this exists

Tracks the running browser, not its own version

Installation

Quick Start

Two Channels

One-shot mode (commands)

Attach mode (events)

Operational Commands

Interacting with Elements

Python API

Window Border

Browser Fingerprinting

For AI Agents

Collaboration

Requirements

Releasing (maintainer)

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes