
Remote browser inspection and automation tool.


🫰 Saidkick

A self-hosted sidekick that lets your terminal drive your browser.



Saidkick is a small, opinionated tool that lets scripts, shells, and AI agents drive a real browser end-to-end (listing tabs, navigating, clicking, typing into any rich-text field, dispatching keyboard events, taking screenshots) without the overhead of a full headless automation framework. It uses a FastAPI server as the hub and a Chrome extension as the spoke: you run saidkick start, install the extension once, and from then on anything with an HTTP client, in any language, can talk to a real, logged-in Chrome session.

It's the right size for terminal-driven debugging, agent automation, and personal scripting. It is not quite Playwright (which needs its own browser, its own auth, its own set of tricks to look "real") and not quite a remote-control MCP (which is gated on a specific agent runtime). Saidkick lives in the middle: your browser, your session, your cookies, driven from anywhere with curl.

⚡ Features

  • 🎯 Semantic locators. Target elements by what the user sees: --by-text "Send", --by-label "Password", --by-placeholder "Search…". Falls back to CSS/XPath when you need precision.
  • 🧭 Scroll-into-view. saidkick scroll --tab $TAB --by-text "Chapter 3" brings an element into the viewport. Essential before screenshotting something offscreen, and handy for pulling more content on infinite-scroll pages.
  • 🔴 Highlight. saidkick highlight --tab $TAB --by-text "Deploy" draws a temporary red ring around an element. Use it to point the user at exactly what to click when you're guiding them; pair it with screenshot and they see the ring in the image.
  • 🔤 Real keyboard events. saidkick press Enter --tab $TAB dispatches a native CDP Input.dispatchKeyEvent, so frameworks (Lexical, ProseMirror, React) treat it as a real keystroke, not a synthesised blob.
  • 📸 Screenshots. saidkick screenshot --tab $TAB --output /tmp/shot.png via CDP Page.captureScreenshot. Optional locator clips to an element; --full-page captures beyond the viewport.
  • ✍️ Rich-text input. saidkick type understands contenteditable via document.execCommand("insertText", …), so it works on WhatsApp, Slack, Discord, Gmail compose, GitHub comments, Notion, and every other Lexical/ProseMirror/Quill/Slate/Draft-backed editor.
  • ⏳ Wait-for-element built in. --wait-ms N on every selector-using command polls the DOM until it resolves. Default 0 preserves fail-fast behaviour.
  • 🧵 Multi-browser, multi-tab. Each extension connection gets an ephemeral br-XXXX ID; commands address tabs as br-XXXX:N composites. Pipe the output of saidkick open straight into the next command.
  • 🛡️ CSP bypass. Runs scripts via chrome.debugger on pages that block content-script injection.
  • 🐚 Pipe-friendly CLI. Each command writes one result to stdout (saidkick open prints br-XXXX:N; saidkick screenshot emits raw PNG bytes), so everything composes in bash.
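
The --wait-ms behaviour described above amounts to polling with a deadline. The following is a minimal Python sketch of that idea, not Saidkick's actual implementation: poll_until, probe, and interval_ms are hypothetical names, and the probe is a stand-in for a DOM lookup.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(probe: Callable[[], Optional[T]], wait_ms: int = 0,
               interval_ms: int = 50) -> Optional[T]:
    """Call probe() until it returns a non-None value or wait_ms elapses.

    wait_ms=0 means exactly one attempt, i.e. fail-fast.
    (Hypothetical helper; the real polling happens inside the extension.)
    """
    deadline = time.monotonic() + wait_ms / 1000
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_ms / 1000)


# Stand-in probe that "finds" the element on the third attempt.
attempts = {"n": 0}


def fake_probe() -> Optional[str]:
    attempts["n"] += 1
    return "<button>" if attempts["n"] >= 3 else None


found = poll_until(fake_probe, wait_ms=1000, interval_ms=10)
missing = poll_until(lambda: None, wait_ms=0)  # single attempt, then give up
```

With wait_ms left at 0 the loop degenerates to a single lookup, which is why a default of 0 keeps commands fail-fast.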

🚀 Quickstart

Install

pip install saidkick

Or pull the latest from GitHub:

pip install git+https://github.com/apiad/saidkick.git

Load the extension

  1. Open chrome://extensions/ in Chrome.
  2. Enable Developer mode.
  3. Click Load unpacked and point at src/saidkick/extension/ (inside the cloned repo, or inside your installed saidkick package: python -c "import saidkick, os; print(os.path.dirname(saidkick.__file__) + '/extension')").

The extension connects to ws://localhost:6992/ws and auto-reconnects every 5 seconds if the server comes and goes.

Start the server and drive

# Terminal 1: start the hub
$ saidkick start

# Terminal 2: list connected tabs
$ saidkick tabs
br-a1b2:12  https://example.com/  "Example Domain"  (active)
br-a1b2:15  https://docs.python.org/  "Python 3.12 Docs"

# Open a new tab, talk to it, screenshot the result
$ BR=br-a1b2
$ TAB=$(saidkick open --browser "$BR" https://example.com/)
$ saidkick text --tab "$TAB" --css "h1"
Example Domain
$ saidkick screenshot --tab "$TAB" --output /tmp/shot.png
Wrote 28934 bytes to /tmp/shot.png
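
If a script needs the saidkick tabs listing as structured data, the lines can be parsed. This sketch assumes the output keeps exactly the shape shown in the sample above (composite ID, URL, quoted title, optional "(active)" marker); that format is inferred from the example, not a documented contract, and TabInfo and parse_tab_line are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output above; an assumption, not a spec.
TAB_LINE = re.compile(
    r'^(?P<browser>br-[^:\s]+):(?P<tab_id>\d+)\s+'
    r'(?P<url>\S+)\s+"(?P<title>.*)"(?P<active>\s+\(active\))?\s*$'
)


class TabInfo(NamedTuple):
    tab: str        # composite, e.g. "br-a1b2:12"
    browser: str    # e.g. "br-a1b2"
    tab_id: int     # e.g. 12
    url: str
    title: str
    active: bool


def parse_tab_line(line: str) -> Optional[TabInfo]:
    """Parse one line of `saidkick tabs` output; return None if it does not match."""
    m = TAB_LINE.match(line.strip())
    if m is None:
        return None
    return TabInfo(
        tab=f"{m['browser']}:{m['tab_id']}",
        browser=m["browser"],
        tab_id=int(m["tab_id"]),
        url=m["url"],
        title=m["title"],
        active=m["active"] is not None,
    )


sample = '''br-a1b2:12  https://example.com/  "Example Domain"  (active)
br-a1b2:15  https://docs.python.org/  "Python 3.12 Docs"'''
tabs = [t for t in map(parse_tab_line, sample.splitlines()) if t]
```

For anything beyond quick scripts, the Python client's list_tabs is likely the sturdier route, since it does not depend on the text layout.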

🎮 Driving a chat app end-to-end

No exec, no selector archaeology, just semantic locators and a keystroke:

TAB=br-a1b2:15
saidkick click  --tab "$TAB" --by-text "Alice Chen"
saidkick type   "Hello Alice" --tab "$TAB" --by-label "Type a message"
saidkick press  Enter --tab "$TAB"
saidkick screenshot --tab "$TAB" --output /tmp/sent.png

That's WhatsApp Web, Slack, Discord, Gmail compose, or any similar app, in four commands.
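
The same flow can be driven from Python by shelling out to the CLI. This is a sketch under assumptions: saidkick_argv, run, and send_message are hypothetical helpers, and only the commands and flags shown above (click, type, press, --tab, --by-text, --by-label) are used.

```python
import subprocess
from typing import List, Optional


def saidkick_argv(command: str, *args: str, tab: Optional[str] = None,
                  **options: str) -> List[str]:
    """Build an argv list such as
    ['saidkick', 'click', '--tab', 'br-a1b2:15', '--by-text', 'Alice Chen'].

    Keyword names map to flags by swapping '_' for '-' (by_text -> --by-text).
    """
    argv = ["saidkick", command, *args]
    if tab is not None:
        argv += ["--tab", tab]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def run(command: str, *args: str, **kwargs: str) -> str:
    """Run the CLI and return stripped stdout; raises CalledProcessError on failure.

    Only suitable for text output; screenshot bytes would need text=False.
    """
    result = subprocess.run(saidkick_argv(command, *args, **kwargs),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def send_message(tab: str, contact: str, message: str,
                 box_label: str = "Type a message") -> None:
    """Click a contact, type into the message box, and press Enter."""
    run("click", tab=tab, by_text=contact)
    run("type", message, tab=tab, by_label=box_label)
    run("press", "Enter", tab=tab)
```

Passing an argv list rather than a shell string avoids quoting problems when a message contains spaces or quotes.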

🧭 Pointing the user at something

When an agent is guiding the user through an app, it often needs to say "click this button." Two primitives make that precise:

# Scroll the element into view (it may be offscreen)
saidkick scroll --tab "$TAB" --by-text "Deploy"

# Draw a temporary red ring around it (default 2s)
saidkick highlight --tab "$TAB" --by-text "Deploy"

# Screenshot so the user sees the ring in the image too
saidkick screenshot --tab "$TAB" --output /tmp/click-this.png

Good uses:

  • "Click that button": highlight + screenshot + send the image to the user.
  • "The error is in this field": highlight --color "#f59e0b" (amber) on a form field the user needs to correct.
  • Pre-screenshot framing: scroll before screenshot so what you want to capture is actually in the viewport.
  • Checklist walkthroughs: highlight each step as you narrate it; use --duration-ms 0 to keep the ring up until you place the next one.
  • Infinite-scroll content extraction: scroll to the last visible item, wait for more to load, repeat.

scroll takes --block {center|start|end|nearest} and --behavior {auto|smooth}. highlight takes --color (any CSS color) and --duration-ms (0 = persist until page reload).
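
The infinite-scroll pattern from the list above (scroll to the last visible item, wait, repeat) can be sketched as a loop. Here read_items and scroll_to are hypothetical callables; in real use they would wrap something like saidkick text and saidkick scroll, with the waiting done inside them. FakeFeed is a stand-in page used only to make the sketch runnable.

```python
from typing import Callable, List


def collect_by_scrolling(read_items: Callable[[], List[str]],
                         scroll_to: Callable[[str], None],
                         max_rounds: int = 20) -> List[str]:
    """Collect items until a round yields nothing new or max_rounds is hit.

    Items are de-duplicated by text, so repeated identical entries collapse.
    """
    seen: List[str] = []
    for _ in range(max_rounds):
        new = [item for item in read_items() if item not in seen]
        if not new:
            break
        seen.extend(new)
        scroll_to(seen[-1])  # bring the last known item into view to trigger loading
    return seen


class FakeFeed:
    """Stand-in for a page that loads `page` more items when its last item is scrolled to."""

    def __init__(self, items: List[str], page: int = 3) -> None:
        self.items, self.page, self.loaded = items, page, page

    def read(self) -> List[str]:
        return self.items[: self.loaded]

    def scroll_to(self, text: str) -> None:
        if text == self.items[self.loaded - 1]:
            self.loaded = min(len(self.items), self.loaded + self.page)


feed = FakeFeed([f"post {i}" for i in range(8)])
collected = collect_by_scrolling(feed.read, feed.scroll_to)
```

The max_rounds cap is a design choice for feeds that never end; without it the loop would only stop when the page runs out of content.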

🧭 Command reference

Command                                                                 What it does
saidkick start                                                          Start the FastAPI hub (defaults to 0.0.0.0:6992).
saidkick tabs                                                           List tabs across connected browsers (--active filter).
saidkick find --tab T --by-text X                                       Return JSON list of matching elements (debug).
saidkick dom --tab T --css X                                            Outer-HTML of matched element(s).
saidkick text --tab T [--css X]                                         innerText of the tab or a scoped region.
saidkick click --tab T --by-text X                                      Click.
saidkick type "msg" --tab T --by-label X                                Type (contenteditable-aware).
saidkick select "value" --tab T --css X                                 Select an <option>.
saidkick press Enter --tab T [--mod ctrl,shift]                         Dispatch a keyboard event.
saidkick scroll --tab T --by-text X [--block center|start|end]          Scroll element into view.
saidkick highlight --tab T --by-text X [--color red] [--duration-ms N]  Temporary ring around an element.
saidkick screenshot --tab T [--output PATH]                             Capture PNG.
saidkick navigate URL --tab T [--wait dom|full|none]                    Redirect a tab.
saidkick open URL --browser BR                                          New tab; prints the composite br-XXXX:N.
saidkick exec --tab T "return …"                                        Arbitrary JS via CDP (must return a value).
saidkick logs [--grep X] [--browser BR]                                 Console-log buffer.

Every selector-using command accepts the same locator options: --css, --xpath, --by-text, --by-label, --by-placeholder, --within-css, --nth, --exact, --regex, --wait-ms. Exactly one locator must be set (400 otherwise).
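
A caller can check the "exactly one locator" rule before sending a request. This sketch assumes that --css, --xpath, --by-text, --by-label, and --by-placeholder are the locators proper and that the remaining options (--within-css, --nth, --exact, --regex, --wait-ms) are modifiers; that split is an inference from the list above, and pick_locator is a hypothetical helper.

```python
from typing import Any, Tuple

# Assumed set of mutually exclusive locators (inferred, not confirmed by the docs).
LOCATOR_KEYS = ("css", "xpath", "by_text", "by_label", "by_placeholder")


def pick_locator(**options: Any) -> Tuple[str, Any]:
    """Return (name, value) of the single locator given, or raise ValueError.

    Mirrors client-side the rule the hub is described as enforcing with a 400.
    """
    given = [key for key in LOCATOR_KEYS if options.get(key) is not None]
    if len(given) != 1:
        raise ValueError(
            f"exactly one locator must be set, got {len(given)}: {given}"
        )
    return given[0], options[given[0]]


name, value = pick_locator(by_text="Send", wait_ms=500)
```

Failing locally gives a clearer error than a round trip to the hub, though the server-side check remains the source of truth.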

🐍 Python client

Everything the CLI does is also available as a library:

import base64

from saidkick.client import SaidkickClient

c = SaidkickClient()

# Pick the first active tab
tabs = c.list_tabs(active=True)
tab = tabs[0]["tab"]

# Search for something on DuckDuckGo (assumes the active tab is already on DuckDuckGo)
c.type(tab, "saidkick", css="input[name=q]")
c.press(tab, "Enter")

# Screenshot the results; the PNG arrives base64-encoded
shot = c.screenshot(tab)
with open("/tmp/ddg.png", "wb") as f:
    f.write(base64.b64decode(shot["png_base64"]))
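
Since the screenshot comes back as base64 under the png_base64 key, a small helper can decode it and sanity-check the bytes before writing. This is a sketch: save_screenshot is a hypothetical name, and the fake payload stands in for a real c.screenshot(tab) result.

```python
import base64
import os
import tempfile
from pathlib import Path

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(shot: dict, path: str) -> int:
    """Decode shot['png_base64'], verify the PNG signature, write it, return byte count."""
    data = base64.b64decode(shot["png_base64"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with the PNG signature")
    Path(path).write_bytes(data)
    return len(data)


# Fake payload standing in for the dict returned by the client.
fake = {"png_base64": base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")}
target = os.path.join(tempfile.mkdtemp(), "shot.png")
written = save_screenshot(fake, target)
```

The signature check catches the case where an error payload or truncated response would otherwise be written to disk as a broken image.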

🧱 Architecture

Hub-and-spoke. The FastAPI server is the hub; the Chrome extension (MV3) is the spoke.

┌──────────────┐      WebSocket       ┌────────────────────────────┐
│ Your CLI/    │◀───────▶ hub ────────│ Chrome MV3 extension       │
│ agent/script │    REST              │  • service worker          │
└──────────────┘                      │  • content + main-world    │
                                      │  • popup w/ reconnect      │
                                      └────────────────────────────┘

  • The hub is stateless between restarts except for a circular log buffer and the set of live WebSocket connections.
  • The spoke stores an ephemeral br-XXXX ID on handshake, runs content scripts in every tab on demand (with lazy injection fallback), and drives CDP via chrome.debugger for JS execution, keyboard events, screenshots, and page-load waits.
  • Tabs are addressed by the composite br-XXXX:N, where br-XXXX identifies the browser connection and N is Chrome's native tab.id.
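
The composite addressing scheme is easy to handle in client code. A minimal sketch, assuming the composite always has the br-XXXX:N shape described above; split_tab and join_tab are hypothetical helpers, not part of the saidkick package.

```python
from typing import Tuple


def split_tab(composite: str) -> Tuple[str, int]:
    """Split 'br-XXXX:N' into (browser_id, tab_id); raise ValueError if malformed."""
    browser, sep, tab_id = composite.rpartition(":")
    if not sep or not browser.startswith("br-") or not tab_id.isdecimal():
        raise ValueError(f"not a br-XXXX:N composite: {composite!r}")
    return browser, int(tab_id)


def join_tab(browser: str, tab_id: int) -> str:
    """Inverse of split_tab."""
    return f"{browser}:{tab_id}"


browser, tab_id = split_tab("br-a1b2:15")
round_trip = join_tab(browser, tab_id)
```

Because the br-XXXX part is ephemeral per connection, a stored composite should be treated as valid only for the lifetime of that extension connection.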

The extension popup shows current connection state and a reconnect button, useful when the MV3 service worker goes idle.

📖 Docs

  • User Guide: full CLI / REST / client reference.
  • Design Doc: architecture, error policy, protocol details.
  • Deploy Guide: server + extension setup.
  • SKILL.md: how an AI agent should use saidkick.
  • CHANGELOG: release history.

🤝 Why saidkick (vs. …)

  • vs. Playwright / Selenium. By default those spawn their own browser with a fresh profile: no cookies, no logins, no browser extensions. Saidkick drives your Chrome, logged in, with the session state you already have. Trade-off: you're automating the real thing, so destructive actions are real.
  • vs. claude-in-chrome / MCP browser tools. Saidkick is self-hosted and agent-agnostic. Anything with an HTTP client can use it: shell scripts, cron jobs, arbitrary Python, any LLM runtime. Not gated on a specific agent host or credential.
  • vs. raw Chrome DevTools Protocol. CDP is powerful but verbose. Saidkick wraps the patterns you actually use (locators, keyboard, screenshots, waits) behind one-line CLI commands.

🛠️ Development

git clone https://github.com/apiad/saidkick
cd saidkick
uv sync --all-groups
uv run pytest -m "not e2e"   # unit + integration
uv run saidkick start        # hub

📜 License

MIT. See LICENSE if present; otherwise standard MIT terms apply.
