
Remote browser inspection and automation tool.


🫰 Saidkick

A self-hosted sidekick that lets your terminal drive your browser.



Saidkick is a small, opinionated tool that lets scripts, shells, and AI agents drive a real browser end-to-end (listing tabs, navigating, clicking, typing into any rich-text field, dispatching keyboard events, taking screenshots) without the overhead of a full headless automation framework. It uses a FastAPI server as the hub and a Chrome extension as the spoke: you run saidkick start, install the extension once, and from then on any program in any language with an HTTP client can talk to a real, logged-in Chrome session.

It's the right size for terminal-driven debugging, agent automation, and personal scripting. It is not quite Playwright (which needs its own browser, its own auth, and its own set of tricks to look "real") and not quite a remote-control MCP (which is gated on a specific agent runtime). Saidkick lives in the middle: your browser, your session, your cookies, driven from anywhere with curl.

⚡ Features

  • 🎯 Semantic locators. Target elements by what the user sees: --by-text "Send", --by-label "Password", --by-placeholder "Search…". CSS and XPath remain available when you need precision.
  • 🔤 Real keyboard events. saidkick press Enter --tab $TAB dispatches a native CDP Input.dispatchKeyEvent, so frameworks (Lexical, ProseMirror, React) treat it as a real keystroke rather than a synthesised blob.
  • 📸 Screenshots. saidkick screenshot --tab $TAB --output /tmp/shot.png captures the page via CDP Page.captureScreenshot. An optional locator clips the shot to one element; --full-page captures beyond the viewport.
  • ✏️ Rich-text input. saidkick type understands contenteditable via document.execCommand("insertText", …), so it works on WhatsApp, Slack, Discord, Gmail compose, GitHub comments, Notion, and every other Lexical/ProseMirror/Quill/Slate/Draft-backed editor.
  • ⏳ Built-in wait-for-element. --wait-ms N on every selector-using command polls the DOM until the selector resolves. The default of 0 preserves fail-fast behaviour.
  • 🧵 Multi-browser, multi-tab. Each extension connection gets an ephemeral br-XXXX ID, and commands address tabs as br-XXXX:N composites. Pipe the output of saidkick open straight into the next command.
  • 🛡️ CSP bypass. Runs scripts via chrome.debugger on pages that block content-script injection.
  • 🐚 Pipe-friendly CLI. Each command writes a single result to stdout (saidkick open prints br-XXXX:N; saidkick screenshot emits raw PNG bytes), so everything composes in bash.

🚀 Quickstart

Install

pip install saidkick

Or pull the latest from GitHub:

pip install git+https://github.com/apiad/saidkick.git

Load the extension

  1. Open chrome://extensions/ in Chrome.
  2. Enable Developer mode.
  3. Click Load unpacked and select src/saidkick/extension/ in the cloned repo, or the extension/ directory of your installed saidkick package. To print the latter's path, run python -c "import saidkick, os; print(os.path.join(os.path.dirname(saidkick.__file__), 'extension'))"

The extension connects to ws://localhost:6992/ws and auto-reconnects every 5 seconds if the server comes and goes.
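When the extension refuses to connect, it can help to confirm that something is actually listening on the port it dials. A standard-library sketch, assuming the hub runs on localhost:6992 as above; `hub_is_listening` is a hypothetical helper, and a successful TCP connect only proves that some process is listening there, not that it is saidkick or that a browser is attached.

```python
import socket


def hub_is_listening(host: str = "localhost", port: int = 6992, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it does not speak the saidkick protocol, it only
    checks that a listener exists.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable
        return False
```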

Start the server and drive

# Terminal 1: start the hub
$ saidkick start

# Terminal 2: list connected tabs
$ saidkick tabs
br-a1b2:12  https://example.com/  "Example Domain"  (active)
br-a1b2:15  https://docs.python.org/  "Python 3.12 Docs"

# Open a new tab, talk to it, screenshot the result
$ BR=br-a1b2
$ TAB=$(saidkick open --browser "$BR" https://example.com/)
$ saidkick text --tab "$TAB" --css "h1"
Example Domain
$ saidkick screenshot --tab "$TAB" --output /tmp/shot.png
Wrote 28934 bytes to /tmp/shot.png
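Because each command prints one result, the same open-then-read chain works from any language that can spawn a process. A sketch in Python using `subprocess`, assuming `saidkick` is on `PATH` and using only the flags shown above; `run_saidkick` and `open_and_read` are hypothetical helper names, and the `run` parameter exists only so the chain can be exercised without a live browser.

```python
import subprocess
from typing import Any, Callable


def run_saidkick(*args: str, run: Callable[..., Any] = subprocess.run) -> str:
    """Run one saidkick CLI command and return its stdout, stripped."""
    proc = run(["saidkick", *args], capture_output=True, text=True, check=True)
    return proc.stdout.strip()


def open_and_read(browser: str, url: str, css: str, run: Callable[..., Any] = subprocess.run) -> str:
    # `saidkick open` prints a single br-XXXX:N token, which feeds the next command.
    tab = run_saidkick("open", "--browser", browser, url, run=run)
    return run_saidkick("text", "--tab", tab, "--css", css, run=run)
```

With a real hub running, open_and_read("br-a1b2", "https://example.com/", "h1") mirrors the two shell commands above.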

🎮 Driving a chat app end-to-end

No exec and no selector archaeology, just semantic locators and a keystroke:

TAB=br-a1b2:15
saidkick click  --tab "$TAB" --by-text "Alice Chen"
saidkick type   "Hello Alice" --tab "$TAB" --by-label "Type a message"
saidkick press  Enter --tab "$TAB"
saidkick screenshot --tab "$TAB" --output /tmp/sent.png

That's WhatsApp Web, Slack, Discord, Gmail compose, or any similar app, in four lines.

🧭 Command reference

Command                                               What it does
saidkick start                                        Start the FastAPI hub (defaults to 0.0.0.0:6992).
saidkick tabs                                         List tabs across connected browsers (--active filter).
saidkick find --tab T --by-text X                     Return a JSON list of matching elements (for debugging).
saidkick dom --tab T --css X                          Outer HTML of the matched element(s).
saidkick text --tab T [--css X]                       innerText of the tab or a scoped region.
saidkick click --tab T --by-text X                    Click the matched element.
saidkick type "msg" --tab T --by-label X              Type text (contenteditable-aware).
saidkick select "value" --tab T --css X               Select an <option>.
saidkick press Enter --tab T [--mod ctrl,shift]       Dispatch a keyboard event.
saidkick screenshot --tab T [--output PATH]           Capture a PNG.
saidkick navigate URL --tab T [--wait dom|full|none]  Navigate a tab to URL.
saidkick open URL --browser BR                        Open a new tab; prints its composite br-XXXX:N ID.
saidkick exec --tab T "return …"                      Run arbitrary JS via CDP (must return a value).
saidkick logs [--grep X] [--browser BR]               Show the console-log buffer.
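The --mod flag on press ends up as modifier state on a CDP key event. In the Chrome DevTools Protocol, Input.dispatchKeyEvent encodes pressed modifiers as a bit field (Alt=1, Ctrl=2, Meta/Command=4, Shift=8). The sketch below shows that conversion for a comma-separated value such as ctrl,shift. `modifier_mask` is hypothetical, and accepting `alt` and `meta` alongside the documented `ctrl` and `shift` is an assumption about saidkick's flag.

```python
# Bit values of the `modifiers` field in CDP's Input.dispatchKeyEvent
CDP_MODIFIERS = {"alt": 1, "ctrl": 2, "meta": 4, "shift": 8}


def modifier_mask(mods: str) -> int:
    """Convert a comma-separated --mod value (e.g. "ctrl,shift") to a CDP bitmask.

    Hypothetical helper; names other than ctrl/shift are assumed.
    """
    mask = 0
    # Ignore empty entries so "" or "ctrl," still parse cleanly
    for name in filter(None, (m.strip().lower() for m in mods.split(","))):
        if name not in CDP_MODIFIERS:
            raise ValueError(f"unknown modifier: {name!r}")
        mask |= CDP_MODIFIERS[name]
    return mask
```

For example, ctrl,shift maps to 2 | 8 = 10.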

Every selector-using command accepts the same locator options: --css, --xpath, --by-text, --by-label, --by-placeholder, --within-css, --nth, --exact, --regex, --wait-ms. Exactly one locator must be set; otherwise the request is rejected with HTTP 400.
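The one-locator rule is easy to mirror client-side, for example to catch a bad invocation before it reaches the hub. A sketch, assuming that --css, --xpath, --by-text, --by-label and --by-placeholder are the mutually exclusive locators while --within-css, --nth, --exact, --regex and --wait-ms only refine them; `check_locator` and the snake_case option names are hypothetical.

```python
# Assumed split: options that pick the target element (exactly one allowed)...
LOCATORS = ("css", "xpath", "by_text", "by_label", "by_placeholder")
# ...and options that refine a locator rather than replace it.
MODIFIERS = ("within_css", "nth", "exact", "regex", "wait_ms")


def check_locator(**opts: object) -> str:
    """Return the name of the single locator that is set.

    Raises TypeError for unknown options and ValueError when zero or
    several locators are set (the hub answers HTTP 400 in that case).
    """
    unknown = set(opts) - set(LOCATORS) - set(MODIFIERS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    chosen = [name for name in LOCATORS if opts.get(name) is not None]
    if len(chosen) != 1:
        raise ValueError(f"exactly one locator required, got {chosen or 'none'}")
    return chosen[0]
```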

🐍 Python client

Everything the CLI does is also available as a library:

import base64

from saidkick.client import SaidkickClient

c = SaidkickClient()

# Use the first active tab
tabs = c.list_tabs(active=True)
tab = tabs[0]["tab"]

# Search for something (assumes that tab is showing DuckDuckGo)
c.type(tab, "saidkick", css="input[name=q]")
c.press(tab, "Enter")

# Screenshot the results; the PNG arrives base64-encoded
shot = c.screenshot(tab)
with open("/tmp/ddg.png", "wb") as fh:
    fh.write(base64.b64decode(shot["png_base64"]))
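Since the screenshot payload is base64 text, a small guard that checks the decoded bytes begin with the 8-byte PNG signature catches a wrong key or a truncated response before a broken file lands on disk. A sketch assuming only the `png_base64` field used above; `save_screenshot` is a hypothetical helper, not part of `saidkick.client`.

```python
import base64

# Every PNG file starts with these 8 bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshot(shot: dict, path: str) -> int:
    """Decode shot["png_base64"], verify the PNG signature, write it to path.

    Hypothetical helper. Returns the number of bytes written.
    """
    # validate=True rejects non-base64 characters instead of silently skipping them
    data = base64.b64decode(shot["png_base64"], validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```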

🧱 Architecture

Hub-and-spoke. The FastAPI server is the hub; the Chrome extension (MV3) is the spoke.

┌──────────────┐      WebSocket       ┌────────────────────────────┐
│ Your CLI/    │◀───────▶ hub ────────│ Chrome MV3 extension       │
│ agent/script │    REST              │  • service worker          │
└──────────────┘                      │  • content + main-world    │
                                      │  • popup w/ reconnect      │
                                      └────────────────────────────┘

  • The hub keeps no persistent state: apart from a circular log buffer and the set of live WebSocket connections, nothing survives between requests, and both of those are lost on restart.
  • The spoke stores an ephemeral br-XXXX ID on handshake, runs content scripts in every tab on demand (with lazy injection fallback), and drives CDP via chrome.debugger for JS execution, keyboard events, screenshots, and page-load waits.
  • Tabs are addressed by the composite br-XXXX:N, where br-XXXX identifies the browser connection and N is Chrome's native tab.id.
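Since every tab-scoped command takes this composite, scripts that juggle several browsers often need to split it back apart. A sketch using Python's `re` module, assuming the part after br- is alphanumeric (the examples on this page use four hex-like characters such as a1b2) and that N is a non-negative integer; `TabRef`, `parse_tab` and `format_tab` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed shape: "br-" + alphanumeric connection ID, ":", numeric Chrome tab.id
_TAB_RE = re.compile(r"(br-[0-9A-Za-z]+):(\d+)")


class TabRef(NamedTuple):
    browser: str  # e.g. "br-a1b2"
    tab_id: int   # Chrome's native tab.id


def parse_tab(ref: str) -> TabRef:
    m = _TAB_RE.fullmatch(ref.strip())
    if m is None:
        raise ValueError(f"not a br-XXXX:N tab reference: {ref!r}")
    return TabRef(m.group(1), int(m.group(2)))


def format_tab(tab: TabRef) -> str:
    return f"{tab.browser}:{tab.tab_id}"
```

Grouping the output of saidkick tabs by `parse_tab(ref).browser` is then enough to tell which connection each tab belongs to.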

The extension popup shows the current connection state and a reconnect button, which is useful when the MV3 service worker goes idle.

📖 Docs

  • User Guide: full CLI / REST / client reference.
  • Design Doc: architecture, error policy, protocol details.
  • Deploy Guide: server + extension setup.
  • SKILL.md: how an AI agent should use saidkick.
  • CHANGELOG: release history.

🤝 Why saidkick (vs. …)

  • vs. Playwright / Selenium. By default, those spawn their own browser with a fresh profile: no cookies, no logins, no browser extensions. Saidkick drives your Chrome, logged in, with the session state you already have. The trade-off: you're automating the real thing, so destructive actions are real.
  • vs. claude-in-chrome / MCP browser tools. Saidkick is self-hosted and agent-agnostic. Anything with an HTTP client can use it: shell scripts, cron jobs, arbitrary Python, any LLM runtime. It is not gated on a specific agent host or credential.
  • vs. raw Chrome DevTools Protocol. CDP is powerful but verbose. Saidkick wraps the patterns you actually use (locators, keyboard, screenshots, waits) behind one-line CLI commands.

🛠️ Development

git clone https://github.com/apiad/saidkick
cd saidkick
uv sync --all-groups
uv run pytest -m "not e2e"   # unit + integration
uv run saidkick start        # hub

📜 License

MIT. See LICENSE if present; otherwise the standard MIT terms apply.

Download files

Source Distribution

saidkick-0.4.2.tar.gz (128.5 kB)

Built Distribution

saidkick-0.4.2-py3-none-any.whl (24.5 kB)

File details

Details for the file saidkick-0.4.2.tar.gz.

File metadata

  • Download URL: saidkick-0.4.2.tar.gz
  • Upload date:
  • Size: 128.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.7 (publish subcommand, CI on Ubuntu 24.04)

File hashes

Hashes for saidkick-0.4.2.tar.gz
Algorithm    Hash digest
SHA256       dd7c31a14828e92537676d92937013737992be6f35ec43f0d6e5544f5b2b19b1
MD5          5251d1adf61570283e5752a2d08bdbbe
BLAKE2b-256  da08aacff07d4180a9ff94d52bf3d3febda410c5bbd04c7bd6018c9ddc874345

File details

Details for the file saidkick-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: saidkick-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 24.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.7 (publish subcommand, CI on Ubuntu 24.04)

File hashes

Hashes for saidkick-0.4.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       e95ec4fc510c9a39aba0e311ffe96a9cc16b965f30cc351e3b26456b776f14ec
MD5          d37b6668f18f4742cab53f173cc1afa8
BLAKE2b-256  fe1e8fb6b4ae74e1c483825a091a48a519bf2be285ac61264dd54250c1b7624e
