Skip to main content

OS-level macOS app testing via MCP — native apps, Electron apps, browsers, web pages, system dialogs

Project description

Klyk

⚠️ What klyk is — and what it isn't

Read this before you install.

klyk gives an AI agent real, OS-level control of your Mac — the same input a human has. It moves the cursor, fires keystrokes, and clicks what's on screen. In its default mode it does this invisibly and autonomously: no cursor movement you can watch, no confirmation prompts — acting on its own once the agent decides to.

This is powerful, and it is dangerous. Be clear-eyed about what that means:

  • It can take real, irreversible actions. A click is a click. klyk can press Buy, Send, Confirm Transfer, Delete, or Sign just as you could. The only guardrails are a window-bounds check and a text instruction asking the agent to confirm first — an instruction the agent can ignore. There is no sandbox and no spending limit.
  • It runs with your full user privileges. Anything you can do on your Mac, klyk can do. It does not isolate itself or drop privileges.
  • It is a prompt-injection target. If the agent driving klyk also reads untrusted content — a web page, an email, a document — a malicious instruction hidden there can become real clicks and keystrokes on your machine. "Reads the web" + "controls the Mac" is the high-risk combination. Run klyk only with an agent and a workflow you trust.
  • It relies on an undocumented Apple API. Invisible input uses Apple's private SkyLight framework. Apple does not support or guarantee it; a macOS update can change or break it without notice (klyk falls back to a visible cursor when it can).

What klyk is, honestly: an early, experimental, open-source tool built by a solo author — a business student, not a professional developer — working with AI, in good faith. It has not had an independent professional security audit. Treat it as unproven: it works in the cases its author has tested, and has not been verified end-to-end on every path.

No warranty. Use at your own risk. klyk is provided "as is" under the MIT license, with no warranty of any kind. You are responsible for what the agent does on your machine. Don't point it at anything — money, accounts, irreplaceable files — you aren't willing to have an autonomous agent touch.

OS-level macOS app testing for AI agents. Click real buttons, type real keys, see what actually rendered. Native apps, Electron apps, browsers, web pages, system dialogs — anything visible on the screen.

Status: Portfolio project. Showcases product thinking and shipped tooling. Bug reports won't be actively triaged. Well-scoped PRs are welcome — but expect a slow review cadence.


The problem

AI assistants are increasingly asked to test, validate, or operate desktop apps end-to-end. Today they can't. Existing automation tools either require deep app instrumentation (XCUITest, Appium) or simulate user input at a layer too brittle to be trusted (pixel-only click frameworks, headless DOM scrapers). The result: agents that can write apps faster than ever, but can't verify they actually work.

Klyk closes that gap. It gives an AI agent the same input channel a human has — real cursor moves via Apple's CoreGraphics API, real keystrokes posted to the HID event tap, real composited screenshots — and a clean MCP interface to drive it. The agent observes, decides, acts, verifies. The way a human would.

What it does

> screenshot the app, then click "Sign in"
[ Klyk takes a real screenshot via CoreGraphics, returns it + the AX tree ]
[ click_element finds "Sign in" via accessibility, falls back to OCR, then to template match ]
[ Real click fires through the HID event tap — indistinguishable from a human pressing the button ]

A flat, MECE tool surface across observation, interaction, evaluation, session management, and system operations. Three-tier click targeting (AX → on-device OCR → pixel template) so a label is reachable regardless of how the app exposes it. Cross-app drag, right-click-then-select, and multilingual OCR are all first-class. Per-call latency + reasoning-gap metrics so the agent can self-pace. Best-effort AX folded into screenshots so most tasks finish in one round-trip.

Install

Not yet on PyPI. The block below is the target experience. The code is written but not yet verified end-to-end on the current build (recent changes include removing the HTTP bridge). Treat klyk as experimental and unproven — see the warning above and the Disclaimer.

pipx install klyk        # isolated install — recommended
klyk install

Use pipx (or uv tool install klyk), not bare pip. Klyk pulls a modern NumPy; installing it into your global Python can clash with other packages pinned to older versions. pipx/uv give klyk its own environment while still putting klyk and klyk-call on your PATH — same commands, zero blast radius. Plain pip install klyk works only if you want it in the current environment and accept that risk.

klyk install is a turnkey first-run flow:

  1. Adds Klyk to ~/.claude.json (so it appears in every Claude Code session).
  2. Walks you through granting the two macOS permissions Klyk needs — opens the exact System Settings panes for Accessibility and Screen Recording, waits for you to add your terminal app, then verifies the grant actually came through before continuing.
  3. Runs a final klyk doctor pass to confirm every piece is green.
  4. Detects every other AI client on your Mac and offers to wire them all in one confirmation — permissions carry over, so it's a single prompt, not a setup pass per client.

For clients that read a natural-language context file (Gemini CLI's GEMINI.md), install also offers — opt-in, defaults to no — to add a short, clearly-marked klyk note there, so the agent can fall back to the klyk-call shell if its own MCP ever fails to surface klyk. It never edits that file without an explicit yes, merges around your existing content, and uninstall removes it.

Restart Claude Code (or whichever MCP client you use) and klyk is live. Try inspect Finder to see it in action. To wire every detected client up front in one shot: klyk install --all.

Troubleshooting: klyk doctor. Run it any time something's off. Reports every dependency, permission, and config grant klyk needs as ✓ / ⚠ / ✗ with the exact next step on anything that's not green. --json gives a structured payload for tooling.

One klyk per Mac. Klyk claims an exclusive fcntl.flock on ~/.klyk/server.lock at startup. If you configure klyk in two MCP clients (e.g. Claude Code and Cursor) and both are running, the second to launch exits cleanly with a "another klyk is already running" message — preventing duplicate menu-bar items and interleaved input collisions. Close one client to free the lock.

Use with other MCP clients

klyk install <client> auto-configures any supported client — same turnkey flow as Claude (writes the config, grants permissions, runs a health check):

klyk install cursor      # or: windsurf · continue · cline · codex · gemini · antigravity (agy)
klyk install --list      # show every supported client and its config path
Client Config file
Claude Code ~/.claude.json (the default: klyk install)
Cursor ~/.cursor/mcp.json
Windsurf ~/.codeium/windsurf/mcp_config.json
Continue ~/.continue/config.json
Cline VS Code globalStorage cline_mcp_settings.json
OpenAI Codex CLI ~/.codex/config.toml
Gemini CLI ~/.gemini/settings.json
Antigravity CLI (agy) ~/.gemini/antigravity-cli/mcp_config.json

Any other MCP client works too — klyk speaks MCP natively. Add this entry to its config wherever it lives — use the full path to the Python klyk is installed in as command (run python -c "import sys;print(sys.executable)" in that env; klyk install fills this in automatically). A bare python3 only works if klyk is in your global Python:

{ "mcpServers": { "klyk": { "command": "/path/to/python", "args": ["-m", "klyk.mcp_server"] } } }

Permissions, the singleton lock, and klyk doctor work identically regardless of which client launches klyk.

Drive klyk from any AI (no MCP integration required)

MCP support varies a lot between agent harnesses — some gate it, some implement it incompletely, some don't have it. So klyk ships an extra front door that works even when a client's own MCP plumbing doesn't. It reaches the same persistent klyk session.

klyk-call — one shell command, any tool. For any agent that can run a shell command (great for smaller models — no handshake to reason about):

klyk-call --list                              # all tools + their parameter names
klyk-call --schema inspect                    # full JSON schema for one tool
klyk-call --tool inspect --app Finder         # call any tool
klyk-call --tool screenshot --app Finder      # screenshot → saved to disk, path returned
echo '{"tool":"screen_info","args":{}}' | klyk-call --batch   # many calls, one session

Vision over the shell. When a tool returns a screenshot (screenshot, inspect, verdict, image-producing run steps), klyk-call writes the PNG to ~/.klyk/captures/ and returns its saved_path instead of dumping base64 — so the agent views the capture with its own image reader (Claude Code's file read, Gemini's @path, etc.) and keeps the same observe→act→verify loop the native MCP transport has, with no context-flooding payload. The cache keeps the most recent 20 captures.

Quick example

screenshot(app="Google Chrome")
# → returns the image plus the AX element list

run(app="Google Chrome", actions=[
    {"tool": "click", "x": 580, "y": 389},
    {"tool": "fill_field", "x": 580, "y": 389, "text": "This video is incredible!"},
    {"tool": "screenshot"}
])
# → executes the full sequence at OS speed, returns one batched response

verdict(app="Google Chrome", test_description="Posted a comment on a YouTube video")
# → returns final screenshot + logs + grading criteria for the agent to synthesize PASS/FAIL

Why these specific trade-offs

The interesting product decisions weren't tools to build but tools to not build:

  • No third-party computer-use libraries. Everything runs through Apple's CoreGraphics, Vision, and Accessibility frameworks via Python's ctypes. Keeps the dependency footprint tiny and the failure surface predictable.
  • No visual grounding models. UI-TARS and OmniParser would have closed the "no AX, no text" gap — at the cost of a multi-GB model download and 1–2s per call. Rejected. Template matching covers the same case at 30ms with zero ML dependencies.
  • No Chrome DevTools Protocol. It would have helped only Chromium browsers and broken the "like a human" model. Skipped in favor of forcing the renderer-accessibility flag, which gets the entire web AX tree for free.
  • macOS only. Cross-platform compromises every primitive. The honest framing: ship one OS well rather than three OSes badly.

Every tool is designed against the same set of failure modes — ambiguity, accidental retries, token bloat, lost reactivity from batching, and so on. The agent-facing contract for each tool lives in its description field in klyk/mcp_server.py.

What's inside

Module Purpose
mcp_server.py MCP server, tool definitions, dispatch
session.py Per-app session registry, auto-launch, template cache
computer.py CoreGraphics input synthesis (click, drag, keyboard, scroll, AX)
capture.py CoreGraphics screenshot capture (in-memory primary, screencapture fallback)
launcher.py App launch with browser-aware AX flag injection
ocr.py Apple Vision OCR (two-pass: fast then accurate)
matcher.py Pure-NumPy template matching (FFT + integral-image NCC) with template cache support
grader.py, reporter.py Verdict + UI grading helpers
keycodes.py, logs.py Low-level support

For the full tool reference and behavior contracts, see the tool description fields in klyk/mcp_server.py. For how the internals are shaped and why, see ARCHITECTURE.md.

Security model & trust scope

Klyk is a thin pipe between the agent and the OS. Its trust model is straightforward:

  • Local only. As of this release, klyk makes no network calls — your screen contents and inputs stay on your machine. Screenshots, OCR results, AX labels, and tool responses never leave your machine via klyk.
  • macOS permissions are the consent surface. Accessibility and Screen Recording must be granted explicitly via System Settings; klyk doctor shows the current state.
  • The agent controls every action. Klyk doesn't decide what to click or type — it executes what the agent asks. Run klyk only with agents you trust to act on your behalf.
  • Stderr from launched apps is captured for the verdict payload. klyk attempts to scrub common credential patterns (passwords, API keys, JWTs, AWS keys, bearer tokens) on a best-effort basis — it cannot catch every format, so do not rely on it as your only safeguard.
  • Private framework usage. Klyk uses Apple's private SkyLight framework for invisible input delivery (the "autonomous" mode). This is fine for CLI / PyPI distribution but is the reason klyk can't ship via the Mac App Store.
  • Reporting a vulnerability. See SECURITY.md.

License

MIT — see LICENSE.

Disclaimer — No Warranty

klyk is provided "AS IS", without warranty of any kind, express or implied, including merchantability, fitness for a particular purpose, and non-infringement (see LICENSE). To the maximum extent permitted by law, the author is not liable for any damage, data loss, financial loss, account action, privacy exposure, or other harm arising from the use, misuse, or malfunction of klyk — whether caused by the software, the AI agent driving it, or a third-party dependency or macOS framework it relies on. Its safety measures (credential scrubbing, bounds checks, the emergency-stop chord, confirm-destructive flags) are best-effort and agent-cooperative only — not a guarantee, and not to be relied upon as your sole safeguard. You run klyk at your own risk and are solely responsible for what you connect it to and what it does.


Designed and shipped using AI as implementation partner. The product decisions, scope choices, and trade-offs are mine.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

klyk-0.1.0.tar.gz (199.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

klyk-0.1.0-py3-none-any.whl (210.2 kB view details)

Uploaded Python 3

File details

Details for the file klyk-0.1.0.tar.gz.

File metadata

  • Download URL: klyk-0.1.0.tar.gz
  • Upload date:
  • Size: 199.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for klyk-0.1.0.tar.gz
Algorithm Hash digest
SHA256 79f7836d8113f63dfc1155b6079c9cfb0601c5232ca99f4baec62aaa3c82f19a
MD5 5655ec6b69dc78de9edbb709c14eb907
BLAKE2b-256 fed7b6519c5c673f6f8b2b516135434fda70fe14ade4b39504d491219392c95b

See more details on using hashes here.

Provenance

The following attestation bundles were made for klyk-0.1.0.tar.gz:

Publisher: publish-pypi.yml on legetdev/klyk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file klyk-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: klyk-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 210.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for klyk-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 53f4acc078098b1948d46842db3c62597f82db4ac32a5b6710da754931d9081f
MD5 f91034c27815281eaffe73c0a64881bb
BLAKE2b-256 bd2b231087815f9829ca93772da48fe458ebf6042bea1eacbcce08043f8661c2

See more details on using hashes here.

Provenance

The following attestation bundles were made for klyk-0.1.0-py3-none-any.whl:

Publisher: publish-pypi.yml on legetdev/klyk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page