
CapusQA: persona-driven LLM agent testing for macOS and web apps, served as a local MCP daemon

Project description

CapusQA

AI usability testing for real app workflows.

CapusQA lets Claude, Codex, Cursor, and other MCP-capable agents test local web apps and native macOS apps like realistic users: run persona sessions, click through workflows, file reproducible findings, and produce evidence bundles your coding agent can use to fix and verify issues.

Runs locally on 127.0.0.1. CapusQA stores artifacts, masks secrets, drives browsers or macOS windows, and does not make hidden LLM calls.

Why CapusQA

Traditional UI tests prove that selectors still work. CapusQA looks for the product failures scripted tests miss: dead controls, confusing flows, broken business rules, inconsistent copy, accessibility friction, and crashes.

Use CapusQA when you want an agent to explore the app like a user, collect evidence like a tester, and return findings a developer can reproduce.

Best for:

  • Local web apps, prototypes, dashboards, and product workflows.
  • MCP-driven testing with Claude, Codex, Cursor, or another coding agent.
  • Evidence-heavy usability, workflow, and business-rule checks.
  • Fast feedback before demos, releases, design reviews, and agent-assisted fix loops.

Not a replacement for:

  • Unit tests, API tests, or deterministic browser regression suites.
  • Production monitoring.
  • Unsupervised testing against live production accounts.

Guiding Principles

CapusQA is designed around a few constraints that make agent-driven UI testing useful, reproducible, and safe to hand to a coding agent:

  • Local-first: The daemon binds to 127.0.0.1 by default and stores run data on your machine.
  • Agent-native: Any MCP-capable coding agent can drive the same daemon, dashboard, traces, and reports.
  • Evidence-first: Findings are expected vs. observed behavior with screenshots, traces, oracle signals, and stable IDs.
  • Replayable: Traces are first-class artifacts so fixes can be checked against the workflow that found the issue.
  • No hidden reasoning: The daemon observes and acts. Your agent, or the optional runner, does the reasoning.

Quickstart

Install CapusQA with browser support:

uv tool install --python 3.12 'capusqa[browser]'
capusqa setup

Or the one-liner, which installs uv if needed and runs setup:

curl -fsSL https://raw.githubusercontent.com/DanielBirk04/capusqa/main/scripts/install.sh | sh

If you do not have uv yet:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool update-shell

Windows

CapusQA supports the web/URL testing path on Windows. Native macOS-app testing is macOS-only by nature, and its dependencies are skipped automatically on Windows. In PowerShell:

powershell -ExecutionPolicy Bypass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv tool install --python 3.12 'capusqa[browser]'
capusqa setup

Or the one-liner, which installs uv if needed and runs setup:

powershell -ExecutionPolicy Bypass -c "irm https://raw.githubusercontent.com/DanielBirk04/capusqa/main/scripts/install.ps1 | iex"

Open a new terminal if capusqa is not found after installation.

capusqa setup prepares browser support, can wire supported MCP clients, and normally starts the local daemon. To start it later:

capusqa serve --open

Dashboard:

http://127.0.0.1:7777/

MCP endpoint:

http://127.0.0.1:7777/mcp

Useful commands:

capusqa doctor                 # Check local setup.
capusqa capacity               # Estimate local browser capacity.
capusqa issues                 # List stored findings.
capusqa report RUN_ID          # Write report.html, report.md, feedback.json.
capusqa agents --run-id RUN_ID # Play queued sessions; needs Codex, Claude Code, OPENAI_API_KEY, or ANTHROPIC_API_KEY.

Tutorials And Recipes

Pick the path that matches what you are trying to do:

  • Run CapusQA for the first time: Quickstart.
  • Prove the browser pipeline works: try the invoice demo.
  • Connect Claude, Codex, Cursor, Cline, Windsurf, VS Code, or Zed: client/mcp/CONNECT.md.
  • Teach any MCP agent how to drive CapusQA: client/mcp/DRIVER.md.
  • Use CapusQA from Codex: client/codex/AGENTS.md.
  • Test a local web app: start CapusQA, then point a run at http://127.0.0.1:<port> or a file:// URL.
  • Test a native macOS app: read Targets, install the vision extra, and run capusqa doctor --request.
  • Hand findings to a coding agent: generate the evidence bundle.

Common first prompts:

Use CapusQA to test my local app at http://127.0.0.1:3000. Act as realistic users,
report reproducible findings, and generate the CapusQA report artifacts.
Use CapusQA to run the invoice demo in examples/invoice_web with the scenario pack
at examples/invoice_web/spec.yaml. Report every planted bug with evidence.

Try the Demo

The bundled invoice app is a fast end-to-end proof: CapusQA should find four planted product bugs and generate report artifacts for the run.

Clone the repository to use the demo files:

git clone https://github.com/DanielBirk04/capusqa.git
cd capusqa

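Demo files:

  • examples/invoice_web/index.html: the invoice app under test.
  • examples/invoice_web/spec.yaml: the scenario pack for the demo.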

Planted bugs:

  • Export PDF does nothing.
  • The promised 10 percent discount is never applied.
  • Sending an invoice confirms with the wrong message.
  • Invalid amounts are silently ignored.

From the repository root, print a copy-pasteable file:// URL for the dashboard:

python3 -c 'from pathlib import Path; print(Path("examples/invoice_web/index.html").resolve().as_uri())'

Or ask a connected agent:

Use CapusQA to test examples/invoice_web/index.html with the scenario pack in
examples/invoice_web/spec.yaml. Report the findings and generate the CapusQA
report artifacts.

Source checkout only:

capusqa dev test-run --out /tmp/capusqa-invoice-web

A useful run should produce findings for dead controls, rule violations, inconsistent confirmation copy, and missing validation.

Evidence You Can Hand To A Coding Agent

Every run can produce a fix-ready evidence bundle: screenshots, traces, findings, expected vs. observed behavior, and machine-readable feedback.json for follow-up automation.

Default storage:

~/.capusqa/
  capusqa.db
  artifacts/<run-id>/
    report.html
    report.md
    feedback.json
    screenshots
    traces

Core artifacts:

  • report.html: review screenshots, sessions, findings, and evidence in a browser.
  • report.md: share a compact developer report.
  • feedback.json: feed stable finding IDs, repro steps, expected vs. observed behavior, evidence, and status to a coding agent.
  • Traces: replay action histories and verify fixes.
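A coding agent, or a small script, can consume feedback.json directly. The sketch below is a hypothetical helper, not part of CapusQA: it assumes the file holds either a JSON list of findings or an object with a "findings" list, and that each finding uses the field names of the example finding shape in this section. The real schema may differ.

```python
import json
from collections import Counter
from pathlib import Path

# Field names taken from the example finding shape in this README (assumed schema).
REQUIRED_FIELDS = ("id", "kind", "title", "expected", "observed")


def parse_findings(text: str) -> list:
    """Parse feedback.json text into a list of findings, checking required fields."""
    data = json.loads(text)
    findings = data.get("findings", []) if isinstance(data, dict) else data
    for finding in findings:
        missing = [name for name in REQUIRED_FIELDS if name not in finding]
        if missing:
            raise ValueError(f"finding {finding.get('id', '?')} lacks {missing}")
    return findings


def load_findings(path) -> list:
    """Read and parse a feedback.json file from disk."""
    return parse_findings(Path(path).read_text(encoding="utf-8"))


def count_by_kind(findings: list) -> dict:
    """Count findings per kind for a quick overview of a run."""
    return dict(Counter(finding["kind"] for finding in findings))


def fix_request(finding: dict) -> str:
    """Render one finding as a short fix request for a coding agent."""
    return (
        f"[{finding['id']}] {finding['title']}\n"
        f"Expected: {finding['expected']}\n"
        f"Observed: {finding['observed']}"
    )
```

For a stored run, pass the path ~/.capusqa/artifacts/RUN_ID/feedback.json to load_findings and print fix_request for each finding.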

Example finding shape:

{
  "id": "CAP-001",
  "kind": "rule-violation",
  "title": "Volume discount is not applied above 100 EUR",
  "expected": "Subtotal above 100 EUR applies a 10 percent discount",
  "observed": "Subtotal and total remain identical after adding qualifying items",
  "evidence": ["screenshots", "repro_trace"]
}
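The rule in this example finding is concrete enough to check mechanically. Because the daemon does no reasoning, a check like this would live on the client side, for example before the client records a rule result. A minimal sketch, assuming amounts are read from the page as decimal EUR values; the function names are hypothetical and not part of CapusQA:

```python
from decimal import ROUND_HALF_UP, Decimal

# Rule from the example finding: a subtotal above 100 EUR gets 10 percent off.
DISCOUNT_THRESHOLD = Decimal("100")
DISCOUNT_RATE = Decimal("0.10")
CENT = Decimal("0.01")


def expected_total(subtotal: Decimal) -> Decimal:
    """Return the total the business rule promises for a given subtotal."""
    if subtotal > DISCOUNT_THRESHOLD:  # strictly "above" 100 EUR
        subtotal = subtotal * (1 - DISCOUNT_RATE)
    return subtotal.quantize(CENT, rounding=ROUND_HALF_UP)


def check_discount_rule(subtotal: Decimal, observed_total: Decimal):
    """Return None if the rule holds, otherwise a finding-shaped dict."""
    expected = expected_total(subtotal)
    if observed_total == expected:
        return None
    return {
        "kind": "rule-violation",
        "title": f"Volume discount is not applied above {DISCOUNT_THRESHOLD} EUR",
        "expected": f"Total {expected} EUR for subtotal {subtotal} EUR",
        "observed": f"Total {observed_total} EUR",
    }
```

Using Decimal instead of float keeps currency comparisons exact, so a correct 108.00 EUR total is never misreported because of rounding noise.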

Set CAPUSQA_DATA_DIR or pass --data-dir to store data somewhere else.
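Scripts that read run artifacts need to find the same data directory the daemon uses. The sketch below mirrors the storage layout shown above. The precedence (--data-dir first, then CAPUSQA_DATA_DIR, then ~/.capusqa) is an assumption based on common CLI conventions, not documented CapusQA behavior, and the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DATA_DIR = Path.home() / ".capusqa"


def resolve_data_dir(cli_value: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the data directory; assumed order: flag, env var, default."""
    env = os.environ if env is None else env
    if cli_value:
        return Path(cli_value).expanduser()
    if env.get("CAPUSQA_DATA_DIR"):
        return Path(env["CAPUSQA_DATA_DIR"]).expanduser()
    return DEFAULT_DATA_DIR


def run_artifacts(data_dir: Path, run_id: str) -> dict:
    """Map each stored artifact to its path, following the layout above."""
    run_dir = data_dir / "artifacts" / run_id
    return {
        "database": data_dir / "capusqa.db",
        "report.html": run_dir / "report.html",
        "report.md": run_dir / "report.md",
        "feedback.json": run_dir / "feedback.json",
        "screenshots": run_dir / "screenshots",
        "traces": run_dir / "traces",
    }
```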

Connect an Agent

CapusQA is built for MCP clients. Point your agent at:

http://127.0.0.1:7777/mcp

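Agent-specific guides:

  • client/mcp/CONNECT.md: connect Claude, Codex, Cursor, Cline, Windsurf, VS Code, or Zed.
  • client/mcp/DRIVER.md: teach any MCP agent how to drive CapusQA.
  • client/codex/AGENTS.md: use CapusQA from Codex.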

Claude Code and Codex users can run capusqa setup to register the same local MCP server. Claude Code also gets the optional /capusqa command menu; the main loop there is /capusqa:test, /capusqa:runs, and /capusqa:issues.
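Most users never speak the protocol by hand, but it can help to see what an MCP client sends first. A minimal sketch, assuming the daemon implements the standard MCP Streamable HTTP transport at /mcp: each JSON-RPC message is a POST, the client accepts either JSON or an SSE stream, and a session ID returned by the server is echoed on later requests. The protocol version string and client name are assumptions.

```python
import json
import urllib.request
from typing import Optional

MCP_URL = "http://127.0.0.1:7777/mcp"
# Assumption: an MCP protocol revision; use the one your daemon negotiates.
PROTOCOL_VERSION = "2025-06-18"


def initialize_message(request_id: int = 1) -> dict:
    """JSON-RPC 'initialize' request that opens an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client identity for this sketch.
            "clientInfo": {"name": "capusqa-sketch", "version": "0.1"},
        },
    }


def initialized_notification() -> dict:
    """Notification sent after a successful initialize; it carries no id."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tools_list_message(request_id: int) -> dict:
    """Ask the daemon which tools (run_create, observe, ...) it exposes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def build_post(message: dict, session_id: Optional[str] = None) -> urllib.request.Request:
    """Wrap one JSON-RPC message as a POST to the MCP endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP servers may answer with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(message).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

With the daemon running, urllib.request.urlopen(build_post(initialize_message())) sends the request; read the Mcp-Session-Id response header, if present, and pass it to later build_post calls. Parsing SSE response bodies is left out of this sketch.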

Targets

  • Web URL or file://: local web apps, demos, parallel runs, and CI-style checks. Setup: capusqa[browser]; no Screen Recording or Accessibility permissions needed.
  • Native macOS app: desktop workflows, AppKit/Cocoa targets, and real-window testing. Setup: advanced path; requires Screen Recording and Accessibility permissions.

Browser targets run in isolated Chromium contexts. Native targets use window screenshots, OCR/vision perception, and synthesized mouse and keyboard input.

For native macOS targets:

uv tool install --force --python 3.12 'capusqa[browser,vision]'
capusqa models download
capusqa doctor --request
export CAPUSQA_MACOS_EXPERIMENTAL=1
capusqa serve --open

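Keep the machine free during native runs: native targets drive real windows with synthesized mouse and keyboard input, so avoid using the mouse or keyboard until the run ends. Browser runs do not contend with your mouse.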

How It Works

persona goals or scenario specs
        |
        v
MCP client or optional capusqa agents runner
        |
        v
CapusQA daemon on 127.0.0.1
        |
        +-- browser driver: isolated Chromium sessions
        +-- macOS driver: native window screenshots and input
        |
        v
dashboard, SQLite store, reports, feedback.json, replayable traces

The core loop is:

run_create -> task_claim -> session_start
           -> observe -> click/type/scroll/press/wait
           -> finding_report / checkpoint_mark / rule_mark
           -> session_end -> report_generate

The split is deliberate:

  • The client decides what a persona should try and how to interpret evidence.
  • The daemon observes, actuates, stores, masks secrets, reports, and replays.
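Because the client owns the reasoning, it can sanity-check a planned sequence of tool calls against the core loop before sending them. A minimal sketch for a single session: only the tool names come from this README; the ordering rules are a hypothetical reading of the loop diagram, and tool arguments are not modeled.

```python
# Tool names copied from the core loop above; arguments are not modeled.
SETUP = ["run_create", "task_claim", "session_start"]
ACTIONS = {"click", "type", "scroll", "press", "wait"}
MARKS = {"finding_report", "checkpoint_mark", "rule_mark"}
TEARDOWN = ["session_end", "report_generate"]


def follows_core_loop(calls: list) -> bool:
    """Check that a single-session call plan matches the loop shape.

    Setup calls come first and in order, every action or mark happens
    after at least one observe, and the plan ends with session_end and
    report_generate.
    """
    if len(calls) < len(SETUP) + len(TEARDOWN):
        return False
    if calls[: len(SETUP)] != SETUP or calls[-len(TEARDOWN):] != TEARDOWN:
        return False
    seen_observe = False
    for call in calls[len(SETUP): -len(TEARDOWN)]:
        if call == "observe":
            seen_observe = True
        elif call in ACTIONS or call in MARKS:
            if not seen_observe:
                return False  # acting or marking before looking at the screen
        else:
            return False  # unknown tool, or setup/teardown out of place
    return True
```

A real run may contain several sessions; this check covers one session_start to session_end span.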

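Examples

The examples/ directory in the repository holds demo apps and scenario packs, including the invoice demo described above.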

Security and Privacy

CapusQA runs locally and binds to 127.0.0.1 by default. The dashboard and MCP server assume a localhost trust boundary.

Set CAPUSQA_DASHBOARD_TOKEN before exposing the dashboard beyond localhost. Mutating dashboard routes and sensitive reads honor it as a Bearer token when the token is set.
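When the token is set, a script that calls the dashboard has to send it in an Authorization header. A minimal standard-library sketch, assuming the same token is exported in the client's environment; any route below the dashboard root is hypothetical, and base_url should be replaced with the address you actually expose.

```python
import os
import secrets
import urllib.request
from typing import Optional

DASHBOARD_URL = "http://127.0.0.1:7777/"


def new_dashboard_token() -> str:
    """Generate a random, URL-safe value suitable for CAPUSQA_DASHBOARD_TOKEN."""
    return secrets.token_urlsafe(32)


def dashboard_request(path: str = "", token: Optional[str] = None,
                      method: str = "GET",
                      base_url: str = DASHBOARD_URL) -> urllib.request.Request:
    """Build a dashboard request that carries the token as a Bearer credential.

    Falls back to the CAPUSQA_DASHBOARD_TOKEN environment variable when no
    token is passed; an empty token sends no Authorization header.
    """
    if token is None:
        token = os.environ.get("CAPUSQA_DASHBOARD_TOKEN")
    request = urllib.request.Request(base_url + path.lstrip("/"), method=method)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request
```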

Credentials for test accounts live in a local SQLite vault. Fields whose names look secret, such as password, secret, token, pin, key, otp, or code, are masked in traces and reports as {{secret:...}}. Replay resolves them locally.
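The masking rule can be sketched as a name-based heuristic. This is a hypothetical illustration, not CapusQA's implementation: the hint list comes from the sentence above, the label inside the placeholder is elided in the docs so using the field name there is an assumption, and a plain dict stands in for the local SQLite vault.

```python
import re

# Name fragments this README lists as "looking secret".
SECRET_HINTS = ("password", "secret", "token", "pin", "key", "otp", "code")
# Hypothetical placeholder format: {{secret:<field name>}}.
PLACEHOLDER = re.compile(r"\{\{secret:([^}]+)\}\}")


def looks_secret(field_name: str) -> bool:
    """A field looks secret if its lowercased name contains any hint."""
    name = field_name.lower()
    return any(hint in name for hint in SECRET_HINTS)


def mask_fields(fields: dict) -> dict:
    """Replace secret-looking values with a placeholder before storing traces."""
    return {
        name: ("{{secret:" + name + "}}" if looks_secret(name) else value)
        for name, value in fields.items()
    }


def resolve(text: str, vault: dict) -> str:
    """At replay time, swap placeholders back to real values from a local vault."""
    return PLACEHOLDER.sub(lambda match: vault[match.group(1)], text)
```

Substring matching over-masks: a name such as shipping_address contains "pin" and gets hidden too. For test credentials that errs on the safe side.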

Use dedicated test accounts. Do not point CapusQA at production systems unless you have explicitly designed the run, data, and account permissions for that risk.

Generated reports and traces may contain app content. Attach only sanitized artifacts to public issues.

CapusQA Intelligence and CapusQA Atlas are optional retrieval and hosted-assistance features. They are off by default and require explicit environment configuration plus local consent:

capusqa intelligence status
capusqa intelligence accept
capusqa intelligence export
capusqa intelligence withdraw

Development

From a source checkout:

uv venv --python 3.12 .venv
uv pip install --python .venv/bin/python -e '.[browser]'
.venv/bin/playwright install chromium
.venv/bin/capusqa doctor
.venv/bin/capusqa serve --open

Repository map:

Path                 Purpose
capusqad/            Python daemon, MCP server, drivers, dashboard server, reports, and CLI.
client/              MCP prompts, connection guides, Codex guide, and Claude Code plugin assets.
examples/            Demo apps and scenario packs.
scripts/install.sh   Source-checkout installer and setup helper.
pyproject.toml       Package metadata, dependencies, extras, and build configuration.

Contributing

Keep contributions evidence-oriented:

  • Bug reports should include the target app, CapusQA version, install method, relevant run ID, logs or report artifacts, and expected vs. observed behavior.
  • Pull requests should include the smallest useful change plus the focused check or demo command that covers it.
  • Security-sensitive issues should not include live credentials, production data, or unredacted reports.

License

Apache-2.0. OmniParser v2 icon-detector weights are AGPL-3.0; review their license before redistributing a package or service that includes those weights.

Download files

Source Distribution

capusqa-2.3.0.tar.gz (1.3 MB)

Uploaded Source

Built Distribution

capusqa-2.3.0-py3-none-any.whl (1.4 MB)

Uploaded Python 3

File details

Details for the file capusqa-2.3.0.tar.gz.

File metadata

  • Download URL: capusqa-2.3.0.tar.gz
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for capusqa-2.3.0.tar.gz
Algorithm    Hash digest
SHA256       db946df95cb2d6214bf38577790fb058ff29b174a4430fd3e9d7280f84530cb7
MD5          6a05eee320be9b1b0035d980dc64407a
BLAKE2b-256  6de8f864408f6062b6c529a5bc281f799d915df08a1754cd4104ef8862c8eea7

Provenance

The following attestation bundles were made for capusqa-2.3.0.tar.gz:

Publisher: release.yml on DanielBirk04/capusqa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file capusqa-2.3.0-py3-none-any.whl.

File metadata

  • Download URL: capusqa-2.3.0-py3-none-any.whl
  • Size: 1.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for capusqa-2.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       ff542cfdce60357eaff4daaf0eede8badfa6951af6dd0299ed1fd3f778c186a7
MD5          025ee35b1e20816701788da00911273f
BLAKE2b-256  d51d5eecd2a77cdb40d78f5a6fa1cf7cc007f5cec3bf7721d15da3dc307d0c61

Provenance

The following attestation bundles were made for capusqa-2.3.0-py3-none-any.whl:

Publisher: release.yml on DanielBirk04/capusqa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
