Remote browser inspection and automation tool.
🫰 Saidkick
A self-hosted sidekick that lets your terminal drive your browser.
Saidkick is a small, opinionated tool that lets scripts, shells, and AI agents drive a real browser end-to-end (listing tabs, navigating, clicking, typing into any rich-text field, dispatching keyboard events, taking screenshots) without the overhead of a full headless automation framework. It uses a FastAPI server as the hub and a Chrome extension as the spoke: you run `saidkick start`, install the extension once, and from then on any program in any language with an HTTP client can talk to a real, logged-in Chrome session.
It's the right size for terminal-driven debugging, agent automation, and personal scripting. It is not quite Playwright (which needs its own browser, its own auth, and its own set of tricks to look "real"), and not quite a remote-control MCP (which is gated on a specific agent runtime). Saidkick lives in the middle: your browser, your session, your cookies, driven from anywhere with `curl`.
⚡ Features
- 🎯 Semantic locators. Target elements by what the user sees: `--by-text "Send"`, `--by-label "Password"`, `--by-placeholder "Search…"`. Falls back to CSS/XPath when you need precision.
- 🧭 Scroll-into-view. `saidkick scroll --tab $TAB --by-text "Chapter 3"` brings an element into the viewport: essential before screenshotting something offscreen, and handy for pulling more content on infinite-scroll pages.
- 🔴 Highlight. `saidkick highlight --tab $TAB --by-text "Deploy"` draws a temporary red ring around an element. Use it to point the user at exactly what to click when you're guiding them; pair it with `screenshot` and they see the ring in the image.
- Real keyboard events. `saidkick press Enter --tab $TAB` dispatches a native CDP `Input.dispatchKeyEvent`, so frameworks (Lexical, ProseMirror, React) treat it as a real keystroke rather than a synthesised blob.
- 📸 Screenshots. `saidkick screenshot --tab $TAB --output /tmp/shot.png` captures via CDP `Page.captureScreenshot`. An optional locator clips the capture to one element; `--full-page` captures beyond the viewport.
- ✏️ Rich-text input. `saidkick type` understands `contenteditable` via `document.execCommand("insertText", …)`, so it works on WhatsApp, Slack, Discord, Gmail compose, GitHub comments, Notion, and other editors built on Lexical, ProseMirror, Quill, Slate, or Draft.
- ⏳ Wait-for-element built in. `--wait-ms N` on every selector-using command polls the DOM until the selector resolves. The default of `0` preserves fail-fast behaviour.
- 🧵 Multi-browser, multi-tab. Each extension connection gets an ephemeral `br-XXXX` ID; commands address tabs as `br-XXXX:N` composites. Pipe the output of `saidkick open` straight into the next command.
- 🛡️ CSP bypass. Runs scripts via `chrome.debugger` on pages that block content-script injection.
- Pipe-friendly CLI. Stdout carries only the result (`saidkick open` prints the `br-XXXX:N` ID; `saidkick screenshot` emits raw PNG bytes), so everything composes in bash.
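The `--wait-ms` behaviour described above amounts to a poll-until-deadline loop. The sketch below illustrates the semantics in Python only; the real polling runs inside the extension, and the `poll_until` name, the 100 ms default interval, and the use of `None` for "not found yet" are assumptions. With `wait_ms=0` it makes exactly one attempt, matching the fail-fast default.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    resolve: Callable[[], Optional[T]],
    wait_ms: int = 0,
    interval_ms: int = 100,
) -> T:
    """Call `resolve` until it returns a non-None value or `wait_ms` elapses.

    Hypothetical helper mirroring the documented --wait-ms semantics:
    0 means a single attempt (fail fast).
    """
    deadline = time.monotonic() + wait_ms / 1000
    while True:
        result = resolve()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"element not found within {wait_ms} ms")
        # Sleep for one poll interval, but never past the deadline.
        time.sleep(min(interval_ms / 1000, remaining))
```

A resolver that succeeds on its third call returns as soon as the element "appears", while `wait_ms=0` raises `TimeoutError` after a single miss.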
Quickstart
Install
```bash
pip install saidkick
```
Or pull the latest from GitHub:
```bash
pip install git+https://github.com/apiad/saidkick.git
```
Load the extension
- Open `chrome://extensions/` in Chrome.
- Enable Developer mode.
- Click Load unpacked and point it at `src/saidkick/extension/` (inside the cloned repo, or inside your installed `saidkick` package; find it with `python -c "import saidkick, os; print(os.path.dirname(saidkick.__file__) + '/extension')"`).
The extension connects to ws://localhost:6992/ws and auto-reconnects every 5 seconds if the server comes and goes.
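If the extension never shows as connected, a quick first check is whether anything is listening on the hub's port at all. The sketch below uses only the standard library and tests a plain TCP connection to `localhost:6992` (the port from the WebSocket URL above); it does not speak the WebSocket protocol, and `hub_is_listening` is a hypothetical name.

```python
import socket


def hub_is_listening(host: str = "localhost", port: int = 6992, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("hub up" if hub_is_listening() else "hub down: run `saidkick start`")
```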
Start the server and drive
```bash
# Terminal 1: start the hub
$ saidkick start

# Terminal 2: list connected tabs
$ saidkick tabs
br-a1b2:12 https://example.com/ "Example Domain" (active)
br-a1b2:15 https://docs.python.org/ "Python 3.12 Docs"

# Open a new tab, talk to it, screenshot the result
$ BR=br-a1b2
$ TAB=$(saidkick open --browser "$BR" https://example.com/)
$ saidkick text --tab "$TAB" --css "h1"
Example Domain
$ saidkick screenshot --tab "$TAB" --output /tmp/shot.png
Wrote 28934 bytes to /tmp/shot.png
```
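Each line printed by `saidkick tabs` starts with the composite `br-XXXX:N` ID, so a script can split it back into the browser connection and Chrome's numeric tab ID. The parser below is a sketch that assumes the line layout shown in the sample output above (ID, URL, quoted title, optional `(active)`). That layout is not documented as a stable format, and `TabLine` and `parse_tab_line` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from the sample output:
#   br-a1b2:12 https://example.com/ "Example Domain" (active)
_TAB_LINE = re.compile(
    r'^(?P<browser>br-\w+):(?P<tab_id>\d+)\s+'
    r'(?P<url>\S+)\s+"(?P<title>.*)"'
    r'(?P<active>\s+\(active\))?\s*$'
)


@dataclass
class TabLine:
    browser: str  # e.g. "br-a1b2", the extension connection
    tab_id: int   # Chrome's native tab.id
    url: str
    title: str
    active: bool

    @property
    def composite(self) -> str:
        """The br-XXXX:N form that --tab expects."""
        return f"{self.browser}:{self.tab_id}"


def parse_tab_line(line: str) -> TabLine:
    m = _TAB_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised tabs line: {line!r}")
    return TabLine(
        browser=m["browser"],
        tab_id=int(m["tab_id"]),
        url=m["url"],
        title=m["title"],
        active=m["active"] is not None,
    )
```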
Driving a chat app end-to-end
No `exec`, no selector archaeology: just semantic locators and a keystroke.
```bash
TAB=br-a1b2:15
saidkick click --tab "$TAB" --by-text "Alice Chen"
saidkick type "Hello Alice" --tab "$TAB" --by-label "Type a message"
saidkick press Enter --tab "$TAB"
saidkick screenshot --tab "$TAB" --output /tmp/sent.png
```
That's WhatsApp Web, Slack, Discord, Gmail compose, or any similar app, in four lines.
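The same four steps can be driven from Python by shelling out to the CLI. This is a minimal sketch that assumes the `saidkick` executable is on `PATH` and the hub is running. `send_chat_message` is a hypothetical helper, and the `"Type a message"` label comes from the example above, so other apps will need their own label. The runner is injectable, which lets you inspect the command sequence without a browser. Each command is passed as an argv list, so messages with spaces or quotes need no shell escaping.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def _run_checked(argv: Sequence[str]) -> object:
    # check=True raises CalledProcessError on failure, so later steps never run.
    return subprocess.run(list(argv), check=True)


def send_chat_message(
    tab: str,
    contact: str,
    message: str,
    screenshot_path: str = "/tmp/sent.png",
    run: Runner = _run_checked,
) -> list[list[str]]:
    """Open a conversation, type a message, send it, and screenshot the result."""
    steps = [
        ["saidkick", "click", "--tab", tab, "--by-text", contact],
        # "Type a message" is the composer label from the example; app-specific.
        ["saidkick", "type", message, "--tab", tab, "--by-label", "Type a message"],
        ["saidkick", "press", "Enter", "--tab", tab],
        ["saidkick", "screenshot", "--tab", tab, "--output", screenshot_path],
    ]
    for argv in steps:
        run(argv)
    return steps
```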
🧭 Pointing the user at something
When an agent is guiding the user through an app, it often needs to say "click this button." Two primitives make that precise:
```bash
# Scroll the element into view (it may be offscreen)
saidkick scroll --tab "$TAB" --by-text "Deploy"
# Draw a temporary red ring around it (default 2s)
saidkick highlight --tab "$TAB" --by-text "Deploy"
# Screenshot so the user sees the ring in the image too
saidkick screenshot --tab "$TAB" --output /tmp/click-this.png
```
Good uses:
- "Click that button": highlight + screenshot + send the image to the user.
- "The error is in this field": `highlight --color "#f59e0b"` (amber) on a form field the user needs to correct.
- Pre-screenshot framing: `scroll` before `screenshot` so what you want to capture is actually in the viewport.
- Checklist walkthroughs: highlight each step as you narrate it; use `--duration-ms 0` to keep the ring up until you place the next one.
- Infinite-scroll content extraction: scroll to the last visible item, wait for more to load, repeat.

`scroll` takes `--block {center|start|end|nearest}` and `--behavior {auto|smooth}`. `highlight` takes `--color` (any CSS color) and `--duration-ms` (`0` = persist until page reload).
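When an agent generates these commands, validating option values before calling the CLI turns typos into clear local errors. The sketch below encodes only the choices listed in the paragraph above. `scroll_args` and `highlight_args` are hypothetical names. `center` is a default picked for illustration and is not necessarily saidkick's own. The `red`/2000 ms defaults follow the command reference and the 2 s default mentioned earlier.

```python
BLOCK_CHOICES = {"center", "start", "end", "nearest"}
BEHAVIOR_CHOICES = {"auto", "smooth"}


def scroll_args(tab: str, text: str, block: str = "center", behavior: str = "auto") -> list[str]:
    """argv for `saidkick scroll`, rejecting values outside the documented choices."""
    if block not in BLOCK_CHOICES:
        raise ValueError(f"--block must be one of {sorted(BLOCK_CHOICES)}, got {block!r}")
    if behavior not in BEHAVIOR_CHOICES:
        raise ValueError(f"--behavior must be one of {sorted(BEHAVIOR_CHOICES)}, got {behavior!r}")
    return ["saidkick", "scroll", "--tab", tab, "--by-text", text,
            "--block", block, "--behavior", behavior]


def highlight_args(tab: str, text: str, color: str = "red", duration_ms: int = 2000) -> list[str]:
    """argv for `saidkick highlight`; duration_ms=0 keeps the ring until page reload."""
    if duration_ms < 0:
        raise ValueError("--duration-ms must be >= 0")
    return ["saidkick", "highlight", "--tab", tab, "--by-text", text,
            "--color", color, "--duration-ms", str(duration_ms)]
```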
🧭 Command reference

| Command | What it does |
|---|---|
| `saidkick start` | Start the FastAPI hub (defaults to `0.0.0.0:6992`). |
| `saidkick tabs` | List tabs across connected browsers (`--active` filter). |
| `saidkick find --tab T --by-text X` | Return a JSON list of matching elements (for debugging). |
| `saidkick dom --tab T --css X` | Outer HTML of the matched element(s). |
| `saidkick text --tab T [--css X]` | `innerText` of the tab or a scoped region. |
| `saidkick click --tab T --by-text X` | Click an element. |
| `saidkick type "msg" --tab T --by-label X` | Type text (`contenteditable`-aware). |
| `saidkick select "value" --tab T --css X` | Select an `<option>`. |
| `saidkick press Enter --tab T [--mod ctrl,shift]` | Dispatch a keyboard event. |
| `saidkick scroll --tab T --by-text X [--block center\|start\|end\|nearest]` | Scroll an element into view. |
| `saidkick highlight --tab T --by-text X [--color red] [--duration-ms N]` | Draw a temporary ring around an element. |
| `saidkick screenshot --tab T [--output PATH]` | Capture a PNG. |
| `saidkick navigate URL --tab T [--wait dom\|full\|none]` | Navigate an existing tab to a new URL. |
| `saidkick open URL --browser BR` | Open a new tab; prints the composite `br-XXXX:N`. |
| `saidkick exec --tab T "return …"` | Run arbitrary JS via CDP (must return a value). |
| `saidkick logs [--grep X] [--browser BR]` | Show the console-log buffer. |
Every selector-using command accepts the same locator options: `--css`, `--xpath`, `--by-text`, `--by-label`, `--by-placeholder`, `--within-css`, `--nth`, `--exact`, `--regex`, `--wait-ms`. Exactly one locator must be set; otherwise the request fails with HTTP 400.
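The exactly-one rule is easy to enforce client-side before a request ever reaches the hub. The sketch below rests on several assumptions. `locator_args` is a hypothetical helper. `--css`, `--xpath`, `--by-text`, `--by-label` and `--by-placeholder` are treated as the mutually exclusive locators, with the remaining options as modifiers. `--exact` and `--regex` are assumed to be boolean switches.

```python
from typing import Optional

LOCATORS = ("css", "xpath", "by_text", "by_label", "by_placeholder")


def locator_args(
    *,
    css: Optional[str] = None,
    xpath: Optional[str] = None,
    by_text: Optional[str] = None,
    by_label: Optional[str] = None,
    by_placeholder: Optional[str] = None,
    within_css: Optional[str] = None,
    nth: Optional[int] = None,
    exact: bool = False,
    regex: bool = False,
    wait_ms: int = 0,
) -> list[str]:
    """Build locator flags, enforcing the exactly-one-locator rule locally."""
    given = {
        name: value
        for name, value in zip(LOCATORS, (css, xpath, by_text, by_label, by_placeholder))
        if value is not None
    }
    if len(given) != 1:
        raise ValueError(f"exactly one locator required, got {sorted(given) or 'none'}")
    [(name, value)] = given.items()
    args = ["--" + name.replace("_", "-"), value]
    if within_css is not None:
        args += ["--within-css", within_css]
    if nth is not None:
        args += ["--nth", str(nth)]
    if exact:
        args.append("--exact")
    if regex:
        args.append("--regex")
    if wait_ms:
        args += ["--wait-ms", str(wait_ms)]
    return args
```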
Python client
Everything the CLI does is also available as a library:
```python
import base64
from pathlib import Path

from saidkick.client import SaidkickClient

c = SaidkickClient()
tabs = c.list_tabs(active=True)
tab = tabs[0]["tab"]

# Search for something on DuckDuckGo (assumes the active tab is already on duckduckgo.com)
c.type(tab, "saidkick", css="input[name=q]")
c.press(tab, "Enter")

# Screenshot the results; the PNG comes back base64-encoded
shot = c.screenshot(tab)
Path("/tmp/ddg.png").write_bytes(base64.b64decode(shot["png_base64"]))
```
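Because `screenshot()` returns the image base64-encoded under `png_base64`, it can be worth checking what you decoded before writing it to disk. The sketch below assumes only that key. It verifies the standard 8-byte PNG signature so that an unexpected payload never ends up in a `.png` file, and `save_png` is a hypothetical helper.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(shot: dict, path: str) -> int:
    """Decode shot["png_base64"], verify it is a PNG, write it, and return the byte count."""
    data = base64.b64decode(shot["png_base64"], validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot payload is not a PNG")
    return Path(path).write_bytes(data)
```

With the client example above, `save_png(c.screenshot(tab), "/tmp/ddg.png")` replaces the last line.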
🧱 Architecture
Hub-and-spoke. The FastAPI server is the hub; the Chrome extension (MV3) is the spoke.
```
┌──────────────┐   REST             WebSocket   ┌────────────────────────────┐
│ Your CLI /   │ ─────────▶  hub  ◀───────────▶ │ Chrome MV3 extension       │
│ agent/script │                                │ • service worker           │
└──────────────┘                                │ • content + main-world     │
                                                │ • popup w/ reconnect       │
                                                └────────────────────────────┘
```
- The hub is stateless between restarts except for a circular log buffer and the set of live WebSocket connections.
- The spoke stores an ephemeral `br-XXXX` ID on handshake, runs content scripts in every tab on demand (with lazy injection fallback), and drives CDP via `chrome.debugger` for JS execution, keyboard events, screenshots, and page-load waits.
- Tabs are addressed by the composite `br-XXXX:N`: `br-XXXX` identifies the browser connection; `N` is Chrome's native `tab.id`.

The extension popup shows the current connection state and a reconnect button, which is useful when the MV3 service worker goes idle.
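Together with the `--grep` and `--browser` filters of `saidkick logs`, the hub's circular log buffer maps naturally onto a bounded `collections.deque`. This is a sketch of the idea, not the hub's actual code. The `LogEntry`/`LogBuffer` names, the 1,000-entry capacity, the entry fields, and treating `--grep` as a plain substring match are all assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    browser: str  # br-XXXX connection that produced the message
    level: str    # e.g. "log", "warn", "error"
    text: str


class LogBuffer:
    """Keep only the most recent `capacity` console messages (oldest dropped first)."""

    def __init__(self, capacity: int = 1000) -> None:
        self._entries: deque[LogEntry] = deque(maxlen=capacity)

    def append(self, entry: LogEntry) -> None:
        self._entries.append(entry)

    def query(self, grep: Optional[str] = None, browser: Optional[str] = None) -> list[LogEntry]:
        """Filter like `saidkick logs --grep X --browser BR` (substring match assumed)."""
        return [
            e for e in self._entries
            if (grep is None or grep in e.text)
            and (browser is None or e.browser == browser)
        ]
```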
Docs
- User Guide: full CLI / REST / client reference.
- Design Doc: architecture, error policy, protocol details.
- Deploy Guide: server + extension setup.
- SKILL.md: how an AI agent should use saidkick.
- CHANGELOG: release history.
Why saidkick (vs. …)
- vs. Playwright / Selenium. By default those launch their own browser with a fresh profile: no cookies, no logins, no browser extensions. Saidkick drives your Chrome, logged in, with the session state you already have. Trade-off: you're automating the real thing, so destructive actions are real.
- vs. `claude-in-chrome` / MCP browser tools. Saidkick is self-hosted and agent-agnostic. Anything with an HTTP client can use it: shell scripts, cron jobs, arbitrary Python, any LLM runtime. It is not gated on a specific agent host or credential.
- vs. raw Chrome DevTools Protocol. CDP is powerful but verbose. Saidkick wraps the patterns you actually use (locators, keyboard, screenshots, waits) behind one-line CLI commands.
🛠️ Development
```bash
git clone https://github.com/apiad/saidkick
cd saidkick
uv sync --all-groups
uv run pytest -m "not e2e"   # unit + integration
uv run saidkick start        # hub
```
License
MIT. See LICENSE if present; otherwise the standard MIT terms apply.
Download files
File details
Details for the file saidkick-0.4.3.tar.gz.

File metadata
- Download URL: saidkick-0.4.3.tar.gz
- Size: 132.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.7 (`uv publish` from CI on Ubuntu 24.04)

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f86baeebe7c421b40f12bc1664e75f8226203b5830011ec177d306a9f7e43007` |
| MD5 | `3ad4d846658e3c8ca01633f84e7b0378` |
| BLAKE2b-256 | `9c4e4cc54fb3fa5fce7edc43e81e4129288c03d7f1c54e161a39254049e0edb3` |
File details
Details for the file saidkick-0.4.3-py3-none-any.whl.

File metadata
- Download URL: saidkick-0.4.3-py3-none-any.whl
- Size: 26.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.7 (`uv publish` from CI on Ubuntu 24.04)

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b804e94c5432eb7bd4996d79517514be492db7a9d4ea46dc643e39d1ccf7a9b1` |
| MD5 | `28af8a019dc5e26e25f9999269575319` |
| BLAKE2b-256 | `8184b7de393e5d3878af49b940be50c9055730c0ee797d816cc3f5a55e2b1741` |