Skip to main content

MCP server + cross-host agent skill (Claude Code / Codex / OpenClaw / Hermes via agentskills.io) for running tests across pytest / Jest / Cypress / Go / Maestro / Schemathesis / Newman, with self-improvement analytics, bilingual EN/zh-TW QA knowledge, the v0.7 AI Visual Challenge Solver (reCAPTCHA / hCaptcha), the v0.8 OWASP API Top 10 (2023) security scanner, and the v0.9 plan + verify bookend pattern (AI 測試大師)

Project description

mk-qa-master logo

MK QA Master

AI 測試大師 — your AI QA loop, from analyze to advise.

English · 繁體中文

PyPI CI Glama score License: MIT Buy Me a Coffee

Universal MCP server for running tests across pytest / Jest / Cypress / Go, with built-in DOM analyzer, run history, and a self-improvement coach.

A Model Context Protocol server that lets Claude Desktop / Cursor / any MCP client drive your test suite end-to-end: run tests, inspect failures (screenshot + video + trace), analyze a live URL to draft test cases, and — after each run — produce a prioritized action plan telling you exactly what to fix or write next.

QA_RUNNER Framework Language Target
pytest / pytest-playwright / playwright pytest + Playwright Python Web
jest Jest JavaScript Web
cypress Cypress JavaScript Web
go / go-test go test Go Backend
maestro / mobile Maestro YAML iOS + Android
schemathesis / api Schemathesis OpenAPI 3.x / Swagger 2.0 API (since v0.6.0)
newman / postman Newman Postman collection v2.x API (since v0.6.1)

Full design notes: docs/framework.md.


What's in the box

  • Run tests across multiple frameworks (web + mobile + API) via a single MCP surface

  • Mobile via Maestro (since v0.3.0): same MCP tools, iOS Simulator / Android Emulator / real device; YAML flows; cross-platform without rewrites

  • Native API testing — two runners (since v0.6.0 / v0.6.1): two peers now share the API testing slot, each fed by the artifact your team already maintains.

    • Schemathesis (QA_RUNNER=schemathesis, since v0.6.0): point at an OpenAPI 3.x / Swagger 2.0 URL or file:// schema and get property-based fuzzed tests covering status codes, response schemas, content types, and 5xx-on-fuzz violations.
    • Newman (QA_RUNNER=newman, since v0.6.1): point at an exported Postman 2.x collection (plus optional environment / globals files) and Newman replays every request, runs the embedded pm.test(...) assertions, and returns one mk-qa-master nodeid per assertion. Newman is a system prerequisite (npm install -g newman) — it's an npm package, not pip, so it doesn't ship as a Python extra.

    Both drop into the same MCP tool surface as the web / mobile runners, and both feed the same report.json / history / flake / optimizer pipeline. Existing API tests written in pytest+httpx, Jest+supertest, Cypress cy.request(), or Go net/http/httptest still ride their existing runners — no migration needed. Pact provider verification stays on the v0.7.0 conditional roadmap.

  • Failure artifacts: screenshot (base64-inlined), video, Playwright trace.zip / Maestro recordings

  • Run history: every run snapshotted; HTML report shows a sparkline trend

  • DOM / Screen analyzeranalyze_url for web (forms / nav / dialogs / CTAs + the API endpoints the page hits) and analyze_screen for mobile (maestro hierarchy → form / cta / tab_bar modules)

  • Smart test generation (generate_test): hand it an analyzer module and it writes a runnable Playwright .py or Maestro .yaml with concrete selectors, not # TODO stubs

  • Auto-retry flakes — pytest side via pytest-rerunfailures; Maestro side via custom retry wrapper (no native --reruns); flaky tests surfaced separately from real failures

  • Self-improvement coach (get_optimization_plan): post-run analysis across three lenses — suite quality, MCP usability, AI generation effectiveness

  • JUnit XML output for CI integrations (GitHub Actions / Jenkins / GitLab)


Install

Two paths — pick the one that matches how you'll use it.

A. Run via uvx (zero install, recommended for end users)

Add mk-qa-master to your client config without installing anything globally; uv fetches and runs it in an ephemeral environment per session:

{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": { "QA_RUNNER": "pytest", "QA_PROJECT_ROOT": "/path/to/your-test-project" }
    }
  }
}

That's the whole setup. First call downloads the package; subsequent calls are cached. Switching versions: uvx mk-qa-master@0.4.1 ....

B. Install into a project venv (for contributors / hacking)

pip install mk-qa-master       # or: pip install -e . from a clone
playwright install                # only if you use pytest-playwright
pip install pytest-rerunfailures  # optional, enables auto-retry

Then point your client config at the same Python interpreter:

"command": "/path/to/.venv/bin/python",
"args": ["-m", "mk_qa_master.server"]

Runner-specific prerequisites

QA_RUNNER You also need
pytest / pytest-playwright pip install pytest-playwright + playwright install chromium
jest A Node project with jest installed (npm i -D jest)
cypress A Node project with cypress installed (npm i -D cypress)
go Go toolchain on PATH
maestro Maestro CLI + a booted simulator / emulator / device (or BlueStacks reachable via adb connect)
schemathesis / api pip install 'mk-qa-master[api]' (pulls in schemathesis>=3.0,<4)
newman / postman npm install -g newman (Newman is an npm package, not pip — no extra to install)

API testing (QA_RUNNER=schemathesis)

Point the runner at any OpenAPI 3.x / Swagger 2.0 schema and Schemathesis generates property-based test cases per operation — covering response schema conformance, status code conformance, content-type checks, and 5xx-on-fuzz. Results flow through the same report.json / history / flake / optimizer pipeline as your UI tests.

End-to-end walkthrough lives in docs/walkthrough-api.md; a self-contained 3-endpoint sample lives at examples/sample_api_project/.

5-line config

"env": {
  "QA_RUNNER": "schemathesis",
  "QA_OPENAPI_URL": "https://api.example.com/openapi.json"
}

Environment variables

Variable Required Default What it does
QA_OPENAPI_URL yes OpenAPI URL. http(s)://... for live schemas, file://... for local files. Plain filesystem paths are not accepted — they need the file:// prefix.
QA_SCHEMATHESIS_CHECKS no all Comma-separated subset: response_schema_conformance,status_code_conformance,not_a_server_error,content_type_conformance,response_headers_conformance.
QA_SCHEMATHESIS_AUTH no Authorization header value. Sent as -H "Authorization: <value>". Never logged; redacted from archived reports.
QA_SCHEMATHESIS_MAX_EXAMPLES no 20 Hypothesis examples per operation. Higher = deeper fuzz, slower run.
QA_SCHEMATHESIS_DRY_RUN no 0 Set to 1 to plan-without-HTTP — useful for safety preview against production, or CI smoke against a schema-only artifact.
QA_NO_REDACT no 0 Disables secret redaction in archived reports. Default redacts Authorization: Bearer …, "password": …, "token" / "api_key" / "secret" / "access_token" / "refresh_token": ….

Standard QA_TIMEOUT_SECONDS still applies (default 600s).

API testing (QA_RUNNER=newman)

Point the runner at any exported Postman 2.x collection and Newman 6.x replays every request, runs the embedded pm.test(...) assertions, and returns one mk-qa-master "test" per assertion. Results flow through the same report.json / history / flake / optimizer pipeline as the Schemathesis and UI runners.

System prerequisite: Newman ships via npm, not pip. Install once:

npm install -g newman

There's no pip install 'mk-qa-master[postman]' extra — the runner just shells out to the newman binary on PATH. If it's missing, the runner raises a clear ImportError pointing at the npm install line.

The same 3-endpoint Library API that the OpenAPI sample targets ships as a Postman collection at examples/sample_api_project/postman-collection.json — pair it with prism mock examples/sample_api_project/openapi.yaml for a fully self-contained dev loop, or point at your own staging server.

5-line config

"env": {
  "QA_RUNNER": "newman",
  "QA_POSTMAN_COLLECTION": "/absolute/path/to/your-collection.json"
}

Environment variables

Variable Required Default What it does
QA_POSTMAN_COLLECTION yes Plain filesystem path to a Postman 2.x collection JSON. No file:// prefix — Newman doesn't need scheme disambiguation since collections are always local artifacts.
QA_POSTMAN_ENVIRONMENT no Plain path to a Postman environment file (-e <path>). Provides values for {{var_name}} placeholders in the collection.
QA_POSTMAN_GLOBALS no Plain path to a Postman globals file (-g <path>). Same shape as the environment, globally scoped.
QA_POSTMAN_ITERATIONS no 1 Replay the whole collection N times (-n <N>). Useful for soak tests and flake detection.
QA_POSTMAN_FOLDER no CSV of Postman folder names to restrict the run to (repeated --folder flags). run_failed also uses folder-scoping when failures cluster in known folders.
QA_POSTMAN_TIMEOUT_REQUEST_MS no 30000 Per-request HTTP timeout in milliseconds (--timeout-request). Distinct from QA_TIMEOUT_SECONDS, which caps the whole subprocess.
QA_NO_REDACT no 0 Same redaction policy as the Schemathesis runner — disable only for short debug sessions.

Standard QA_TIMEOUT_SECONDS still applies (default 600s).

AI Visual Challenge Solver (v0.7.0)

When backend bypass isn't an option: Claude looks at the CAPTCHA, mk-qa-master does the clicks.

Supports reCAPTCHA v2 (since v0.7.0) and hCaptcha (since v0.7.1).

The first capability in the family where the AI client's vision is load-bearing, not optional. Two new MCP tools (inspect_visual_challenge + solve_visual_challenge) detect a reCAPTCHA v2 or hCaptcha image-grid challenge on the active Playwright page, screenshot it for the multimodal AI client, accept the tile-selection the AI returns, and execute the click chain. The runner is the eyes and hands; the AI client (Claude / Cursor / Gemini / GPT-4o) is the actual solver.

When to use this — Tier 1 vs Tier 3

The built-in QA knowledge layer (get_qa_context section="CAPTCHA") codifies three tiers. Reach for them in order:

Tier Approach When
1 — bypass reCAPTCHA test keys, feature flags, IP allowlist, test-mode headers Default. Covers ~90% of cases.
2 — degrade Mark as external_dependency, skip downstream assertions When you can't change the backend but the test isn't about the CAPTCHA itself.
3 — AI visual judgment This feature. Only when 1 + 2 don't fit (client sites with authorization but no backend access, staging that mirrors prod CAPTCHA, mobile webviews where IP allowlist isn't reachable).

Consent gate

The solver does nothing until you explicitly opt in. Two env vars drive it:

Variable Required Default What it does
QA_VISUAL_CHALLENGE_CONSENT yes false Must be set to true for either tool to function. Without it, both tools return a consent_required error carrying the full legal disclaimer (the AI client surfaces this to the user).
QA_VISUAL_CHALLENGE_AUTHORIZED_DOMAINS no (recommended) Comma-separated allowlist of domains where the tool may operate. When SET, refuses any other domain. When UNSET, warn-only — proceeds but stamps the response with a warning telling you to set one. Recommended for shared CI / multi-tenant environments.
QA_VISUAL_CHALLENGE_TIMEOUT no 120 Wall-clock budget in seconds for the inspect→solve cycle. Honors QA_TIMEOUT_SECONDS as a hard ceiling.

Quick start

"env": {
  "QA_RUNNER": "pytest",
  "QA_PROJECT_ROOT": "/path/to/project",
  "QA_VISUAL_CHALLENGE_CONSENT": "true",
  "QA_VISUAL_CHALLENGE_AUTHORIZED_DOMAINS": "client-staging.example.com"
}

Then, when a run_tests call surfaces an external_dependency failure that points at a CAPTCHA, the AI client can escalate:

mk-qa-master.inspect_visual_challenge()  # screenshot + tile grid
→ AI vision picks tiles [0, 4, 7]
mk-qa-master.solve_visual_challenge(
    challenge_id="...", selected_tile_indices=[0, 4, 7], confirm=true,
)
→ status: "passed", token: "...", hint: "CAPTCHA verified. Resume your test."

Full walkthrough lives in docs/walkthrough-visual-challenge.md. PRD: docs/prd-v0.7-visual-challenge.md.

Hard-stop domains

Regardless of consent or allowlist, the solver refuses to operate on known third-party identity providers (accounts.google.com, login.microsoftonline.com, id.apple.com, facebook.com, login.live.com, etc.). No legitimate QA scenario justifies a CAPTCHA solver against someone else's login portal.

Privacy

No screenshot retention beyond the active inspect→solve cycle. Telemetry logs the boolean outcome only — never the screenshot, never the challenge text, never the tile selection. The 5-minute LRU cache holds at most 10 outstanding challenges per process and never touches disk.

Success rate caveat

The AI client's vision model does the actual judging — Claude Sonnet 4, GPT-4o, and Gemini 2.5 all ship with native vision but their accuracy on a 3x3 reCAPTCHA varies. Plan for at least one retry per challenge (reCAPTCHA gives you three before locking out). get_telemetry will eventually surface aggregate pass-rate so you can size that expectation per-client.

Scope: reCAPTCHA v2 image-grid only in v0.7.0. hCaptcha lands in v0.7.1. reCAPTCHA v3 / Cloudflare Turnstile are permanently out of scope — they don't surface a visible challenge to inspect.

OWASP API Security scanning (v0.8.0)

Schemathesis catches correctness drift. v0.8.0 adds the layer that catches the security drift hiding behind a passing schema.

v0.8.0 ships an OWASP API Security Top 10 (2023) rule-based scanner as a new MCP tool: run_api_security_scan. It loads an OpenAPI 3.x spec, walks each (path × method), and dispatches five purely-HTTP- observable rules:

OWASP # Rule Severity when triggered
API1 BOLA / IDOR — alice's token reads bob's object via path-id tampering CRITICAL
API2 Broken Authentication — server accepts alg:none, malformed, or wrong-signature JWTs MEDIUM / HIGH / CRITICAL by probe
API3 Mass Assignment — server persists dangerous extra fields like role: admin, is_verified: true HIGH
API5 Function-Level Authz — non-admin user accesses admin-shaped endpoints HIGH
API8 Security Misconfiguration — missing HSTS/CSP/X-Frame headers, wildcard CORS with credentials LOW / MEDIUM / HIGH

API4 (rate limit DoS risk), API6 (business flow modeling), API7 (SSRF callback infra), API9 (prod recon), API10 (upstream APIs) are deferred — see docs/prd-v0.8-api-security.md §3.

Consent + authorization gates

Mirrors the v0.7 visual-challenge consent model:

Variable Required What it does
QA_API_SECURITY_CONSENT yes Must be true. Without it, returns consent_required.
QA_API_SECURITY_AUTHORIZED_DOMAINS yes for external hosts Comma-separated allowlist. Localhost / 127.0.0.1 are implicitly authorized.

The mass_assignment rule mutates server state — it's excluded from default categories. Callers must opt in: categories=["headers", "broken_auth", "bola", "function_authz", "mass_assignment"].

Quick start

"env": {
  "QA_RUNNER": "pytest",
  "QA_PROJECT_ROOT": "/path/to/project",
  "QA_API_SECURITY_CONSENT": "true",
  "QA_API_SECURITY_AUTHORIZED_DOMAINS": "api.staging.example.com"
}

Then ask the AI client to scan:

mk-qa-master.run_api_security_scan(
    spec_url="https://api.staging.example.com/openapi.yaml",
    auth={
        "token": "alice's bearer token",
        "alt_user_token": "bob's bearer token",
        "bola_test_ids": {"user_a": [101, 103], "user_b": [202]}
    },
    severity_threshold="medium"
)

Returns the v0.8 security report block:

{
  "scan_id": "a3f8d1c9b7e2",
  "spec_url": "...",
  "base_url": "https://api.staging.example.com",
  "categories_run": ["headers", "broken_auth", "bola", "function_authz"],
  "rules_ran": ["OWASP-API8-Headers", "OWASP-API2-BrokenAuth", ...],
  "ops_scanned": 23,
  "severity_threshold": "medium",
  "findings": [
    {
      "rule_id": "OWASP-API1-BOLA-CrossUserDataExposure",
      "severity": "critical",
      "endpoint": "GET /orders/{id}",
      "title": "user_a can read user_b's object id=202 — missing object-level authorization check",
      "evidence": {"actor": "user_a", "target_owner": "user_b", "target_id": 202, "probed_path": "/orders/202", "status_code": 200, ...},
      "remediation_hint": "Compare the caller's identity to the object's owner before returning..."
    },
    ...
  ],
  "summary": {"total": 7, "by_severity": {"critical": 2, "high": 4, "medium": 1, "low": 0, "info": 0}}
}

The Tier 1 ground truth

examples/sample_vulnerable_api/ ships a deliberately-vulnerable Flask app where every in-scope OWASP category has a vuln/safe endpoint pair. Run it locally to see what each rule looks like in action:

cd examples/sample_vulnerable_api
pip install -r requirements.txt
python app.py  # binds 127.0.0.1:5099
# Then from another shell, point run_api_security_scan at
# http://127.0.0.1:5099 + the bundled openapi.yaml

The scanner finds all 5 categories on /vuln/* and produces zero false positives on /safe/*. That property is enforced by the Tier 1 dogfood tests on every PR.

Security note

The scanner runs adversarial test cases. Do not point it at production systems you don't own, and do not point it at any system where you don't have authorization. The two env vars above are the contract.

PRD: docs/prd-v0.8-api-security.md. The earlier v0.8 mobile attempt was parked — see docs/v0.8-mobile-postmortem.md for what we learned and how it shaped the API-security PRD's testing gates.

Use as a Claude Code / Codex / Hermes / OpenClaw skill (v0.9.0)

Same skill folder loads in four different agent hosts via the agentskills.io convention.

v0.9.0 packages mk-qa-master as a cross-host agent skill in addition to its MCP-server form. The skills/mk-qa-master/ folder is the single source of truth — the same SKILL.md, slash commands, and reference docs load into:

  • Claude Code — via .claude-plugin/plugin.json (this repo is a plugin marketplace).
  • OpenAI Codex — via .codex-plugin/plugin.json (Codex reads Claude- style marketplaces).
  • OpenClaw — install from local checkout: openclaw plugins install /path/to/mk-qa-master.
  • Hermes Agent — symlink the skill folder into ~/.hermes/skills/.

Quick install (Claude Code)

# Inside Claude Code:
/plugin marketplace add kao273183/mk-qa-master
/plugin install mk-qa-master@mk-qa-master

Restart Claude Code so the skill registers. Then any QA testing prompt auto-activates the skill — or explicitly invoke a slash command:

/mk-qa-master:run-tests login
/mk-qa-master:generate https://staging.example.com
/mk-qa-master:api-security https://api.staging.example.com/openapi.yaml

What the skill does

The skill is a single-file operating contract that teaches the host how to drive mk-qa-master's 19 MCP tools coherently. It encodes:

  • When to auto-activate — phrases like "run my tests", "why did this test fail", "scan this API for OWASP issues" trigger it.
  • Five flows — run tests / generate tests / debug failures / solve CAPTCHAs / scan APIs.
  • Hard rules — surface consent errors verbatim, don't silently re-run with relaxed filters, confirm before destructive runs.

Full reference at skills/mk-qa-master/SKILL.md.

Why a skill on top of an MCP server?

The MCP server makes the 19 tools callable by any client. The skill makes them discoverable + governed: it gives the host's skill router enough context to decide when to use the tools and which flow to follow. Inspired by microsoft/Webwright, which uses the same pattern.

Wire into Claude Desktop (legacy MCP-only path)

If you prefer the bare MCP-server wiring (no plugin/skill layer), copy examples/configs/claude_desktop_config.example.json to:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Two environment variables drive the runtime:

Variable Example What it does
QA_RUNNER pytest / jest / cypress / go / maestro / schemathesis / newman Selects which test framework
QA_PROJECT_ROOT /path/to/your/project Points at the project under test
QA_ANDROID_HOST (optional) 127.0.0.1:5555 Remote-ADB endpoint for BlueStacks / Genymotion / Nox / cloud Android. When set, the Maestro runner auto-runs adb connect <host> before each test / analyze_screen call. Requires adb on PATH.
QA_TIMEOUT_SECONDS (optional) 600 (default) Hard ceiling on any single subprocess invocation (pytest / jest / cypress / go test / maestro). Returns exit_code=124 with a [TIMEOUT…] tag in stderr when exceeded, so the AI client can react cleanly instead of hanging the MCP server forever.

Per-runner snippet

pytest-playwright:

"env": { "QA_RUNNER": "pytest", "QA_PROJECT_ROOT": "/path/to/python-project" }

Jest:

"env": { "QA_RUNNER": "jest", "QA_PROJECT_ROOT": "/path/to/node-project" }

Cypress:

"env": { "QA_RUNNER": "cypress", "QA_PROJECT_ROOT": "/path/to/cypress-project" }

Go test:

"env": { "QA_RUNNER": "go", "QA_PROJECT_ROOT": "/path/to/go-project" }

Schemathesis (API):

"env": {
  "QA_RUNNER": "schemathesis",
  "QA_OPENAPI_URL": "https://api.example.com/openapi.json"
}

Newman (Postman):

"env": {
  "QA_RUNNER": "newman",
  "QA_POSTMAN_COLLECTION": "/absolute/path/to/collection.json"
}

Other MCP clients

MCP is an open protocol — this server isn't Claude-only. The same Python process talks to any MCP client over JSON-RPC stdio. What differs across clients is (1) the config file format and (2) how reliably the underlying model auto-chains tool calls.

Client Config Format Model Tool-chain quality
Claude Desktop / Cursor ~/Library/Application Support/Claude/...json · ~/.cursor/mcp.json JSON Claude Opus / Sonnet Best tested
Codex CLI ~/.codex/config.toml TOML GPT-5 family Strong (well-trained on tool chaining)
Gemini CLI ~/.gemini/settings.json JSON Gemini 3.1 Pro / Flash Works; prefers explicit prompts ("first analyze, then write")
Cline / Continue / Zed each has its own MCP config slot varies varies depends on configured model

Example configs ship in the repo: codex-config.example.toml · gemini-config.example.json · claude_desktop_config.example.json.

Codex (TOML):

[mcp_servers.mk-qa-master]
command = "/path/to/.venv/bin/python"
args = ["-m", "mk_qa_master.server"]
cwd = "/path/to/mk-qa-master"
[mcp_servers.mk-qa-master.env]
QA_RUNNER = "pytest"
QA_PROJECT_ROOT = "/path/to/your-test-project"

Gemini (JSON, same shape as Claude Desktop):

{
  "mcpServers": {
    "mk-qa-master": {
      "command": "/path/to/.venv/bin/python",
      "args": ["-m", "mk_qa_master.server"],
      "cwd": "/path/to/mk-qa-master",
      "env": {
        "QA_RUNNER": "pytest",
        "QA_PROJECT_ROOT": "/path/to/your-test-project"
      }
    }
  }
}

Tool descriptions already nudge the recommended chains (analyze_url → generate_test, get_qa_context before generating domain tests). Clients with weaker tool-selection benefit most from explicit prompts that name the steps.


Tool surface

Shared across all runners (some tools degrade gracefully on non-pytest runners):

Tool Purpose
get_runner_info Which runner is active + all available ones
list_tests Enumerate tests in the project
run_tests Run tests (filter / headed / browser; last two pytest-playwright only)
run_failed Re-run last failures (pytest --lf)
get_test_report Summary (pass / fail / skipped / duration / flaky-in-run)
get_failure_details Per-failure message + screenshot / trace / video paths
generate_test Test skeleton; with module from analyze_url/analyze_screen, a runnable one (Playwright .py or Maestro .yaml)
auto_generate_tests One-shot: analyze URL → generate one test per discovered module
codegen Launch Playwright codegen (web) / hint to maestro studio (mobile)
generate_html_report Render the latest run as self-contained HTML
get_test_history Last N archived run summaries (for trend / flake debugging)
analyze_url Web: DOM probe → modules + selectors + candidate TCs + API endpoints + layout overflow warnings
analyze_screen Mobile: maestro hierarchy → form / cta / tab_bar modules + candidate TCs (noise-filtered)
init_qa_knowledge / get_qa_context Scaffold + read the project's QA knowledge layer (methodology + domain). Bilingual since v0.6.2 — methodology ships in English by default (QA_LANG=en) or Traditional Chinese (QA_LANG=zh-tw); same 13 sections in both, the four newest cover API testing methodology, flakiness root-cause taxonomy, test doubles (mock / stub / fake / spy), and test data management. Domain example: docs/qa-knowledge-en.example.md (zh-TW: docs/qa-knowledge.example.md).
get_optimization_plan Three-layer self-improvement coach (suite / MCP / AI strategy)
inspect_visual_challenge / solve_visual_challenge v0.7.0 AI Visual Challenge Solver — detect a reCAPTCHA v2 image-grid challenge, screenshot it, accept the AI client's tile selection, execute the click chain. Gated by QA_VISUAL_CHALLENGE_CONSENT=true + per-call confirm=true. See the dedicated section above.
run_api_security_scan v0.8.0 OWASP API Security Top 10 (2023) rule-based scanner — load an OpenAPI 3.x spec, walk path × method, dispatch 5 in-scope rules (API1 BOLA, API2 Broken Auth, API3 Mass Assignment, API5 Function-Level Authz, API8 Misconfig). Gated by QA_API_SECURITY_CONSENT=true + QA_API_SECURITY_AUTHORIZED_DOMAINS. See the dedicated section above.

Resources

URI What
report://html Live-rendered HTML report (dark mode, self-contained)
report://json Raw pytest-json-report JSON
report://optimization Latest optimization-plan.md

Self-improvement loop

After every run, _archive_report() snapshots report.json into test-results/history/ and writes a fresh optimization-plan.md covering:

  1. Suite quality — outcomes string per test (PFPFP); transitions → flake score; 3+ identical-signature fails → broken; rerun-passed → flaky-in-run
  2. MCP usability — top tools, error rates, repeat-arg patterns, common A→B chains (from telemetry JSONL logs)
  3. AI strategy — adoption rate of generate_test outputs, coverage gaps from analyze_url modules with no matching test files

The plan emits prioritized actions (high / medium / low) each with target + evidence + suggestion + optional auto_action_hint the MCP client can chain into the next tool call.


Project layout

mk-qa-master/
├── pyproject.toml
├── src/mk_qa_master/
│   ├── server.py            # MCP entry (tool routing + telemetry wrap)
│   ├── config.py            # Paths + env vars
│   ├── runners/             # Per-framework plugins
│   │   ├── base.py          # TestRunner abstract interface
│   │   ├── pytest_playwright.py
│   │   ├── jest.py
│   │   ├── cypress.py
│   │   └── go_test.py
│   ├── reporters/
│   │   └── html.py          # Self-contained HTML render
│   └── tools/               # Thin shims + analyzer + optimizer + telemetry
└── tests_project/           # Example project under test

Adding a runner

  1. Create src/mk_qa_master/runners/your_runner.py, subclass TestRunner, implement the abstract methods
  2. Register the name in runners/__init__.py's REGISTRY
  3. Done

End-to-end workflow

The intended pipeline — from a URL to "what should I improve next time":

flowchart LR
    URL[URL] -->|analyze_url| MOD[modules<br/>+ candidate TCs<br/>+ API endpoints]
    MOD -->|generate_test<br/>module=...| TEST[tests/test_*.py<br/>runnable skeleton]
    TEST -->|run_tests| RES[report.json<br/>+ screenshots<br/>+ trace.zip<br/>+ junit.xml]
    RES -->|auto archive| HIST[history/ snapshot]
    RES -->|generate_html_report| HTML[HTML report<br/>self-contained]
    HIST -->|auto write| PLAN[optimization-plan.md]
    PLAN -.->|next session reads| URL

The loop is the point: every run feeds the optimizer, the optimizer points at the weakest link, the next run hits that link first.

Walkthrough — testing a login page

In a Claude / Cursor session:

You: 分析 https://shop.example/login,幫我寫對應測試

Claude: [analyze_url] Found 1 form (email_password_form_0) + 3 API endpoints. 5 candidate TCs. [generate_test with the form module] Wrote tests/test_login.py — runnable with concrete selectors, no # TODO stubs.

You: 跑

Claude: [run_tests] 23 passed, 0 failed in 31s. Screenshots + step traces captured for every test.

You: 下一步該做什麼?

Claude: [opens report://optimization] Top: tests/test_login.py::test_invalid_credentials is flaky (flake_score=0.4, outcomes=PFPFP). Suggestion: add wait_for_response('/api/login') before asserting the error message.

The three optimizer lenses (suite quality / MCP usability / AI generation effectiveness) make every "下一步" answer data-driven, not gut feel.

Walkthrough — testing a mobile app (Maestro)

Same shape, different runner. Requires Maestro CLI installed + simulator/emulator booted + your app launched (or pass launch_app=true with app_id):

You: 分析 your mobile app 首頁的條碼按鈕、寫對應測試

Claude: [analyze_screen(app_id="com.example.app", launch_app=true)] Found 15 interactive elements; matched barcode_button (text="條碼", resource_id="barcodeButton") + candidate TCs. [generate_test with the cta module] Wrote maestro-flows/test_barcode.yamltapOn: { id: barcodeButton } + waitForAnimationToEnd + takeScreenshot, ready to maestro test.

You: 跑

Claude: [run_tests] 5 flows pass, retry didn't fire. Screenshots embedded in HTML report.

You: 上面這個按鈕有時候會 fail、為什麼?

Claude: [get_optimization_plan] barcode_button::barcode_button flagged flaky (flake_score=0.4, outcomes=PFPFP, rerun_count=1). Suggestion: 加 waitForAnimationToEndextendedWaitUntil 等動畫穩定後再 tap。

Mobile-specific notes:

  • The same qa-knowledge.md (built-in methodology + your domain) feeds both web and mobile runs — write your business rules once.
  • analyze_screen filters out iOS status bar (signal / wifi / battery) and asset-name labels (bg_*, *_filled); the result is signal-heavy.
  • Maestro's takeScreenshot: <name> directive controls which screens show up as inline images in the HTML report.

Prompting cookbook

Each row shows a phrase you can paste into a Claude / Cursor session and the underlying MCP tool call it should trigger. Use as a reference for "how do I get the AI to do X without naming the tool myself."

One-time setup

You say Claude calls
"Initialize the QA knowledge file." init_qa_knowledge → writes qa-knowledge.md to your project root
"Show me the current QA knowledge." get_qa_context → methodology + your domain sections
"Open the ISTQB principles section." get_qa_context(section="ISTQB")

Day-to-day testing

You say Claude calls
"Run all tests." run_tests
"Run only login-related tests." run_tests(filter="login")
"Re-run just the failures." run_failed
"Show me the summary." get_test_report
"Which ones failed? Give me screenshots and trace." get_failure_details
"Generate the HTML report." generate_html_report

Building tests from a URL (web)

You say Claude calls
"Auto-generate tests for https://shop.example/." auto_generate_tests(url=...) — one-shot
"Analyze https://shop.example/coupon first, then write one test per module." analyze_urlgenerate_test × N
"Analyze coupon page and write a regression test for our past idempotency bug." get_qa_context(section="Bug")analyze_urlgenerate_test(business_context=...)
"Just record a checkout flow as a baseline." codegen(url=...)

Building tests from a mobile screen (Maestro)

Requires QA_RUNNER=maestro, Maestro CLI, and a booted simulator/emulator/device.

You say Claude calls
"Analyze the current your mobile app screen and write a test for the barcode button." analyze_screen(app_id="com.example.app", launch_app=true)generate_test(module=<cta>)
"Test the login form on this app." analyze_screen(launch_app=true) → pick form module → generate_test
"Cover the tab bar — write one flow per tab." analyze_screen → take the tab_bar module → generate_test
"Use Maestro Studio to record a flow." codegen(url=...) returns a hint pointing at maestro studio (record + save manually)

BlueStacks / remote Android instances: set QA_ANDROID_HOST=127.0.0.1:5555 (or whatever host:port BlueStacks exposes — see Settings → Advanced → Android Debug Bridge). The Maestro runner will adb connect before each test and analyze_screen, and bumps the hierarchy timeout to 60s to absorb the slower TCP-ADB path. Genymotion / Nox / LDPlayer / WSA work the same way; any host:port that responds to adb connect is fine.

Continuous improvement

You say Claude calls
"What should I fix next?" get_optimization_plan
"Has test_login_invalid been flaky lately?" get_test_history + plan lookup
"Why did it fail? Show me the trace." get_failure_details (returns screenshot/trace/video paths)

Tips — getting Claude to pick the right tool

  • Mention QA knowledge explicitly — "reference qa knowledge when testing coupon" pushes Claude to call get_qa_context first; saying just "test coupon" may skip it.
  • State the order — "analyze first, then write" forces analyze_url before generate_test; "just write a test for X" skips analysis.
  • Batch vs precise — "auto-generate the whole page" → auto_generate_tests; "write one test per candidate_tc" → manual chain.
  • Failure debugging — Asking "why did it fail / show me the screenshot" reliably triggers get_failure_details (which now returns screenshot + trace + video paths).

Anti-patterns

  • ❌ "Run it 5 times to see if it's flaky" — the runner has auto-retry + history; just ask "is it flaky" and let get_optimization_plan answer.
  • ❌ "Generate 100 tests" — noise > signal. Use get_optimization_plan first to find what's missing.
  • ❌ "Test all edge cases" — too vague. Phrase as "test every candidate_tc for this form" — concrete, bounded, traceable.

Sample outputs

analyze_url (excerpt)

{
  "url": "https://shop.example/login",
  "page_title": "Login",
  "module_count": 3,
  "modules": [
    {
      "kind": "form",
      "name": "email_password_form_0",
      "selectors": {
        "container": "#login",
        "fields": [
          {"label": "Email", "selector": "#email", "type": "email", "required": true},
          {"label": "Password", "selector": "#password", "type": "password", "required": true}
        ],
        "submit": "button[type='submit']"
      },
      "candidate_tcs": [
        "所有必填欄位為空時送出,應顯示必填錯誤",
        "Email 欄位填入格式錯誤的字串(無 @),應顯示格式錯誤",
        "Password 欄位輸入後應預設遮蔽(type=password)",
        "全部填入合法值後送出,應觸發成功流程"
      ]
    }
  ],
  "api_endpoints": [
    {
      "method": "POST",
      "path": "/api/login",
      "status": 401,
      "candidate_tcs": [
        "POST /api/login payload 缺必填欄位應回 400 + 欄位錯誤訊息",
        "POST /api/login 合法 payload 應回 2xx",
        "POST /api/login 缺少 auth header 應回 401/403"
      ]
    }
  ]
}

generate_test output (smart, with module)

"""
Login happy path

Auto-generated from analyze_url module: email_password_form_0 (kind=form)
"""
from playwright.sync_api import Page, expect


def test_login(page: Page):
    page.goto('https://shop.example/login')
    page.locator('#email').fill('test@example.com')
    page.locator('#password').fill('TestPass123!')
    page.locator("button[type='submit']").click()
    # TC: Email 欄位填入格式錯誤的字串(無 @),應顯示格式錯誤
    # TC: Password 欄位輸入後應預設遮蔽
    # TC: 正確 Email + 正確密碼 → 導向 dashboard
    # TODO: 補上實際斷言,例如:
    # expect(page).to_have_url(...)
    # expect(page.get_by_text("成功")).to_be_visible()

optimization-plan.md (excerpt)

# Optimization Plan — 2026-05-12T14:03:40

_Based on 6 archived runs._

## Prioritized Actions

### 1. 🔴 HIGH — flaky
- **Target**: `tests/test_login.py::test_invalid_credentials`
- **Evidence**: flake_score=0.4, outcomes=PFPFP, rerun_count=1
- **Suggestion**: 加 explicit wait(wait_for_response / locator wait)

### 2. 🟡 MEDIUM — coverage_gap
- **Target**: `register_form`
- **Evidence**: 由 analyze_url 偵測但 repo 內找不到對應 test_*.py
- **Suggestion**: `call generate_test(description="...", filename="test_register_form.py")`

HTML report

Open the live rendered demo → (served via GitHub Pages — clicking the link in GitHub's UI to sample_report.html would only show source).

The demo shows the stats grid, trend sparkline, failure cards with embedded screenshots + step lists, and the collapsed Passed section.


Integrations

mk-qa-master doesn't bundle third-party SDKs — it stays a pure test-execution + analysis layer. Real QA workflows are composed by running multiple MCP servers side-by-side in the same client config; Claude orchestrates the chain across servers. There's no MCP-to-MCP RPC — each server is independent, the AI client is the conductor.

The pairings below are the ones that complete the loop most often:

Pair with Why Example chain
Atlassian MCP (JIRA + Confluence) Auto-open bug tickets from failures; sync optimization-plan.md to a team Confluence page run_testsget_failure_detailsatlassian.createJiraIssue (attaches screenshot + trace path)
Slack MCP Notify channels on failure, share the rendered HTML report, mention oncall for flaky tests generate_html_reportslack.send_message(channel="#qa-bots", attachments=...)
GitHub MCP Read PR description / linked issues for business context before generating tests; post results back as PR comments github.get_pull_requestanalyze_urlgenerate_test(business_context=PR body)github.create_issue_comment
Sentry MCP Production errors drive regression priority: top crashes → matching regression tests sentry.list_issues(sort="frequency")generate_test(business_context=stack trace)run_tests
Filesystem MCP Read a shared qa-knowledge.md or TC source files that live outside QA_PROJECT_ROOT (monorepos, multi-project setups) filesystem.read_file("~/shared/qa-knowledge.md")init_qa_knowledge

Honorable mention — Google Drive MCP: pairs with Google-Sheet-based TC management (read TCs from a sheet → generate_test → write status back).

Composing in your client config

All five run as separate processes alongside mk-qa-master:

{
  "mcpServers": {
    "mk-qa-master": { "command": "python", "args": ["-m", "mk_qa_master.server"], "env": { "QA_RUNNER": "maestro" } },
    "atlassian":       { "command": "npx", "args": ["-y", "@atlassian/mcp"] },
    "slack":           { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-slack"] },
    "github":          { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"] }
  }
}

Then a single prompt walks the chain:

"Run the checkout suite. For each failure, open a JIRA in project QA with the RIDER format and the screenshot attached. Post the HTML report to #qa-bots when done."

Why this matters: mk-qa-master stays focused on the test loop (analyze → generate → run → coach). JIRA / Slack / Sentry are entire domains with their own dedicated servers — bolting them into this one would dilute the scope, duplicate auth handling, and force every user to inherit dependencies they may not want.

本 repo 不打包任何第三方 SDK——維持「測試執行 + 分析」單一職責。實務上 QA 工作流是多個 MCP server 並存、由 Claude 編排跨 server 的 tool chain達成的。範例配套:JIRA / Slack / GitHub / Sentry / Filesystem 各自獨立 MCP server,配上 mk-qa-master 拼出完整測試管線。


Publishing (maintainer-only)

Releases ship to PyPI via Trusted Publishing — no API tokens stored in the repo. The flow:

  1. Bump version = "x.y.z" in pyproject.toml (via a normal PR — main is branch-protected).
  2. After merge, tag main and push:
    git tag -a vX.Y.Z -m "vX.Y.Z — short summary"
    git push origin vX.Y.Z
    
  3. Create a GitHub Release for that tag (gh release create vX.Y.Z ...).
  4. The release event fires .github/workflows/publish.yml → builds sdist + wheel → uploads to PyPI.

One-time PyPI setup (must be done once before the first publish works):

  • Sign in at https://pypi.org → enable 2FA.
  • Project page → Settings → Publishing → add a pending publisher with:
    • Owner: kao273183
    • Repository: mk-qa-master
    • Workflow filename: publish.yml
    • Environment name: pypi

After the first successful run, PyPI auto-promotes the pending publisher to a trusted one and subsequent releases authenticate via OIDC.

The workflow refuses to publish if the release tag doesn't match pyproject.version, which catches "tagged but forgot to bump" mistakes before they hit PyPI.


Support the project ☕

mk-qa-master is built and maintained solo on nights and weekends. If it saved you time or shaped how your team thinks about AI-driven QA, a coffee keeps the late-night Maestro debugging sessions going:

Buy Me a Coffee

Your support funds: keeping this repo free + actively maintained, more device variants for Maestro testing (real iPhones / Android tablets / BlueStacks), recorded tutorials for the QA community, and the next 2am bug hunt.

No ads, no sponsorships, no enterprise upsell — just the work.


Contributing

This repo is maintained solo. Ideas and bug reports are very welcome — please open an Issue or start a Discussion. I read every one and will implement what fits the project's direction.

External pull requests are auto-closed. Not because contributions aren't appreciated, but because keeping the codebase coherent under a single voice matters more here than the throughput a multi-contributor model would bring. If you really want a specific change, an Issue describing the problem gets you further than a PR.

本 repo 由我一人維護。歡迎透過 Issue / Discussion 提想法或回報問題,我會親自評估並實作。外部 PR 會自動關閉——不是不歡迎貢獻,而是想保持程式碼風格與走向一致。


License

MIT © 2026 Jack Kao — see LICENSE (中文翻譯參考: LICENSE.zh-TW.md; the English version is authoritative).

In plain English: you can use this for anything (personal projects, commercial work, modifications, redistribution). The only ask is that you keep the copyright + license notice in any copy you ship. There's no warranty — use at your own risk.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mk_qa_master-0.9.5.tar.gz (1.7 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mk_qa_master-0.9.5-py3-none-any.whl (187.4 kB view details)

Uploaded Python 3

File details

Details for the file mk_qa_master-0.9.5.tar.gz.

File metadata

  • Download URL: mk_qa_master-0.9.5.tar.gz
  • Upload date:
  • Size: 1.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mk_qa_master-0.9.5.tar.gz
Algorithm Hash digest
SHA256 45e3a19e4d61919a60f65f33fbe059e2b68a6924b35c376a2bdf6facf403867c
MD5 d8b3ab16cfa3b74c2395d15e1c52e27e
BLAKE2b-256 77e4899319318bea87e86f0d35f2440cbdca1b530532d03a5ce088f0b53be829

See more details on using hashes here.

Provenance

The following attestation bundles were made for mk_qa_master-0.9.5.tar.gz:

Publisher: publish.yml on kao273183/mk-qa-master

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mk_qa_master-0.9.5-py3-none-any.whl.

File metadata

  • Download URL: mk_qa_master-0.9.5-py3-none-any.whl
  • Upload date:
  • Size: 187.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mk_qa_master-0.9.5-py3-none-any.whl
Algorithm Hash digest
SHA256 b1348eaba5bbce73c200d2631e972c78019fe78834cc71de617f10ed9981adbd
MD5 f3ffefff18e810e7883a0c000966fc35
BLAKE2b-256 6c36180513a8342ad24baa1fe2bd7eff60c47d5ab3a1a5e5ef037b2339ba0873

See more details on using hashes here.

Provenance

The following attestation bundles were made for mk_qa_master-0.9.5-py3-none-any.whl:

Publisher: publish.yml on kao273183/mk-qa-master

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page