qaprobe

Agentic QA for web apps — local or deployed. Give it a URL and a plain-English user story. It drives a real browser with an LLM, records video, and has a second model independently verify the result. Every run also produces an accessibility audit for free.

Works against anything with a URL: localhost dev servers, staging environments, production. No setup on the target app — if you can open it in a browser, QAProbe can test it.

30-Second Quickstart

pip install qaprobe
qaprobe install          # downloads Chromium

export ANTHROPIC_API_KEY=sk-ant-...

qaprobe run --url http://localhost:3000 \
  --story "Add an item to the cart and verify the total updates"

That's it. You'll get a verdict (PASS / FAIL / INCONCLUSIVE), an HTML report, a video recording, a Playwright trace, and accessibility findings.
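If you want to launch the quickstart command from a Python script (for example, inside an existing test harness), a minimal sketch might look like the following. The helper name `build_run_command` is hypothetical; it only assembles flags that appear in the CLI Flags section and does not rely on any QAProbe Python API.

```python
import shlex
from typing import List, Optional


def build_run_command(url: str, story: str, auth: Optional[str] = None,
                      max_steps: Optional[int] = None) -> List[str]:
    """Build the argv list for `qaprobe run` using only documented flags."""
    cmd = ["qaprobe", "run", "--url", url, "--story", story]
    if auth is not None:
        cmd += ["--auth", auth]
    if max_steps is not None:
        cmd += ["--max-steps", str(max_steps)]
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(
        "http://localhost:3000",
        "Add an item to the cart and verify the total updates",
    )
    # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` avoids shell quoting problems with stories that contain quotes or spaces.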

Usage

Single run

# Test your local dev server
qaprobe run --url http://localhost:3000 --story "Add an item to the cart and verify the total updates"

# Test staging before a deploy
qaprobe run --url https://staging.myapp.com --story "Log in and verify the dashboard loads"

# Test any public site
qaprobe run --url https://example.com --story "Click 'More information' and verify I land on IANA's site"

Suite runner

Define a YAML suite and run all stories at once:

# probes/myapp.yml
name: my-app
base_url: http://localhost:3000
auth:
  storage_state: .auth/qa.json

macros:
  login_as: "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"

stories:
  - name: browse_catalog
    path: /
    story: "Browse to the catalog page and verify at least 3 products are listed"

  - name: add_to_cart
    path: /catalog
    story: "Add the first product to the cart and verify the cart badge shows 1"
    depends_on: browse_catalog

# Run every story in the suite
qaprobe suite probes/myapp.yml

The suite produces a single index.html dashboard with per-story status, step logs, a11y findings, and links to videos/traces.
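The `depends_on` field implies an ordering between stories. How QAProbe schedules stories internally is not documented here, so the following is only a sketch of the ordering such a field suggests, using the standard-library `graphlib` module. The function name `execution_order` is an assumption for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(stories):
    """Return story names ordered so each story runs after its dependency."""
    graph = {}
    for story in stories:
        dep = story.get("depends_on")
        # Map each story to the set of stories that must run before it.
        graph[story["name"]] = {dep} if dep else set()
    return list(TopologicalSorter(graph).static_order())


stories = [
    {"name": "add_to_cart", "depends_on": "browse_catalog"},
    {"name": "browse_catalog"},
]
print(execution_order(stories))  # ['browse_catalog', 'add_to_cart']
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is a convenient way to catch a misconfigured suite early.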

Authentication

# Save login state once
qaprobe login --url https://myapp.com/login --save .auth/state.json

# Reuse it
qaprobe run --url https://myapp.com/dashboard --story "..." --auth .auth/state.json
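Saved login state can go stale. Assuming the saved file is a Playwright storage-state JSON (a top-level `cookies` list in which `expires` is Unix seconds, or -1 for session cookies), a small check like this could flag expired cookies before a run. That the file uses this format is an assumption, and `expired_cookies` is a hypothetical helper.

```python
def expired_cookies(state: dict, now: float) -> list:
    """Return names of cookies whose expiry time has passed.

    Assumes Playwright's storage-state layout: `expires` is Unix seconds,
    with -1 marking a session cookie that has no fixed expiry.
    """
    names = []
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        if expires != -1 and expires < now:
            names.append(cookie["name"])
    return names


sample = {
    "cookies": [
        {"name": "session", "expires": -1},
        {"name": "token", "expires": 1000},
    ],
    "origins": [],
}
print(expired_cookies(sample, now=2000))  # ['token']
```

For a real file, load it with `json.load` and pass `time.time()` as `now`; if anything is reported, re-run `qaprobe login` to refresh the state.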

Standalone accessibility audit

# JSON output
qaprobe a11y --url https://example.com

# HTML report
qaprobe a11y --url https://example.com --html

Record and generate stories

# Open a browser, interact, then close to generate a story
qaprobe record --url http://localhost:3000

# Append to an existing suite
qaprobe record --url http://localhost:3000 --append-to probes/myapp.yml

# Record as a deterministic critical path (no LLM needed)
qaprobe record --url http://localhost:3000 --critical-path --name checkout_flow

Critical path replay

Replay recorded paths deterministically — no LLM cost, fast, reliable:

# Replay all paths in a file
qaprobe replay probes/critical_paths.yml

# With optional LLM verification of the final state (uses Haiku, very cheap)
qaprobe replay probes/critical_paths.yml --verify

# JSON output for CI integration
qaprobe replay probes/critical_paths.yml --json-output

Continuous monitoring

Watch critical paths on a schedule and alert on failures:

# Check every 5 minutes
qaprobe watch probes/critical_paths.yml --interval 5m

# With webhook notifications on failure
qaprobe watch probes/critical_paths.yml --interval 1h --webhook https://hooks.slack.com/...

# Stop after 10 runs
qaprobe watch probes/critical_paths.yml --interval 30s --max-runs 10
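The interval values above use a number followed by a unit suffix. As an illustration of that format only (QAProbe's own parser may accept more forms), a sketch that converts the three documented suffixes to seconds could look like this. `parse_interval` is a hypothetical name.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> int:
    """Convert an interval such as '30s', '5m', or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


print(parse_interval("30s"), parse_interval("5m"), parse_interval("1h"))  # 30 300 3600
```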

Scaffold a new project

qaprobe init
# Creates probes/example.yml and .auth/ directory

CI / GitHub Actions

- uses: qaprobe/action@v1
  with:
    suite: probes/myapp.yml
    auth-state: .auth/qa.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

On failure, trace and video artifacts are uploaded automatically.

Baseline mode

# Save current results as the baseline
qaprobe suite probes/myapp.yml --baseline

# Future runs fail only on regressions (stories that passed in the baseline but now fail)
qaprobe suite probes/myapp.yml
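The regression rule can be sketched as a comparison between two verdict maps. The data shape (story name mapped to a verdict string) and the function name `regressions` are assumptions; the baseline file format is not documented here. How an INCONCLUSIVE verdict is treated is also not stated, so this sketch counts only an explicit FAIL.

```python
def regressions(baseline: dict, current: dict) -> list:
    """Return stories that passed in the baseline but fail now."""
    return sorted(
        name
        for name, verdict in baseline.items()
        if verdict == "PASS" and current.get(name) == "FAIL"
    )


baseline = {"browse_catalog": "PASS", "add_to_cart": "FAIL"}
current = {"browse_catalog": "FAIL", "add_to_cart": "FAIL"}
print(regressions(baseline, current))  # ['browse_catalog']
```

Here `add_to_cart` was already failing when the baseline was saved, so it does not count as a regression.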

Architecture

qaprobe run --url <URL> --story "<story>"
    │
    ├─ Launch headless Chromium (Playwright)
    │   ├─ Video recording ON
    │   └─ Tracing ON
    │
    ├─ Agent loop (Claude Sonnet, up to 40 steps)
    │   ├─ Observe: CDP Accessibility.getFullAXTree → stable element refs
    │   ├─ Decide: LLM picks one tool (click, fill, select, press_key, navigate, scroll, wait, done)
    │   ├─ Act: RefResolver maps ref → Playwright locator (role+name, not CSS)
    │   ├─ SPA debouncing: waits for AX tree to stabilize after actions
    │   └─ Repeat until done or step budget exhausted
    │
    ├─ Verifier (Claude Opus, 1 call, fresh context)
    │   ├─ Sees: story + snapshot history + step log + screenshot + agent verdict
    │   └─ Returns: {goal_achieved, confidence, reasoning}
    │
    └─ Reconcile
        ├─ Both agree pass (high confidence) → PASS
        ├─ Both agree fail → FAIL
        ├─ Both agree pass but low confidence → INCONCLUSIVE
        └─ Disagree → INCONCLUSIVE (needs human review)
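
The Reconcile step above can be written as a small decision function. This is a sketch, not QAProbe's implementation: it assumes `confidence` is a number between 0 and 1 and that "high confidence" means at least 0.8. The diagram says only "high" and "low", so both the scale and the threshold are assumptions.

```python
def reconcile(agent_passed: bool, verifier_passed: bool,
              confidence: float, threshold: float = 0.8) -> str:
    """Combine agent and verifier results into a three-way verdict."""
    if agent_passed != verifier_passed:
        return "INCONCLUSIVE"  # disagreement needs human review
    if not agent_passed:
        return "FAIL"  # both agree the goal was not achieved
    # Both agree on pass; only a confident verifier yields PASS.
    return "PASS" if confidence >= threshold else "INCONCLUSIVE"


print(reconcile(True, True, 0.95))   # PASS
print(reconcile(True, True, 0.40))   # INCONCLUSIVE
print(reconcile(False, False, 0.40)) # FAIL
print(reconcile(True, False, 0.95))  # INCONCLUSIVE
```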

Output per run: runs/<timestamp>/{report.json, report.html, trace.zip, video/*.webm}

Configuration

All configuration is via environment variables:

Variable                     Default            Description
ANTHROPIC_API_KEY            (required)         Anthropic API key
OPENAI_API_KEY               (none)             OpenAI API key (when using openai provider)
QAPROBE_PROVIDER             anthropic          LLM provider: anthropic or openai
QAPROBE_AGENT_MODEL          claude-sonnet-4-5  Model for the agent loop
QAPROBE_VERIFIER_MODEL       claude-opus-4-5    Model for independent verification
QAPROBE_FAST_MODEL           claude-haiku-3-5   Fast model for simple steps (model routing)
QAPROBE_ROUTING_THRESHOLD    20                 Element count below which the fast model is used
QAPROBE_MAX_STEPS            40                 Maximum agent steps per run
QAPROBE_BROWSER_TIMEOUT_MS   30000              Playwright action timeout
QAPROBE_DEBOUNCE_POLL_MS     200                SPA debounce polling interval
QAPROBE_DEBOUNCE_STABLE_MS   500                AX tree stable time before snapshot
QAPROBE_DEBOUNCE_TIMEOUT_MS  3000               Maximum debounce wait
QAPROBE_RUNS_DIR             runs               Directory for run artifacts
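
The three QAPROBE_DEBOUNCE_* settings suggest a poll-until-stable loop: take a snapshot every poll interval, and proceed once it has stayed unchanged for the stable time or the timeout has elapsed. The sketch below shows that general technique with the defaults from the table. It is not QAProbe's code; `wait_for_stable` and its `snapshot` callable (standing in for a serialized AX tree) are hypothetical.

```python
import time
from typing import Callable


def wait_for_stable(snapshot: Callable[[], str],
                    poll_ms: int = 200, stable_ms: int = 500,
                    timeout_ms: int = 3000,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `snapshot` until its value stays unchanged for `stable_ms`.

    Returns True once stable, False if `timeout_ms` elapses first.
    """
    start = clock()
    last = snapshot()
    last_change = start
    while True:
        now = clock()
        if now - last_change >= stable_ms / 1000:
            return True
        if now - start >= timeout_ms / 1000:
            return False
        sleep(poll_ms / 1000)
        current = snapshot()
        if current != last:
            # The page is still changing, so restart the stability window.
            last, last_change = current, clock()


# A snapshot that never changes becomes stable almost immediately.
print(wait_for_stable(lambda: "tree", poll_ms=10, stable_ms=30, timeout_ms=200))  # True
```

The `clock` and `sleep` parameters exist so the loop can be tested without real delays.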

CLI Flags

qaprobe run
  --url              URL to test (required)
  --story            Plain-English story (required)
  --auth             Path to storage state JSON
  --max-steps        Max agent steps (default: 40)
  --headed           Show the browser window
  --runs-dir         Artifact directory
  --reveal-secrets   Show fill values in reports (default: masked)
  --no-routing       Disable fast/slow model routing

qaprobe suite <file>
  --auth             Override suite auth config
  --runs-dir         Artifact directory
  --headed           Show the browser window
  --baseline         Save results as baseline
  --reveal-secrets   Show fill values in reports
  --no-routing       Disable model routing

qaprobe a11y
  --url              URL to audit (required)
  --html             Output HTML instead of JSON
  --auth             Path to storage state JSON

qaprobe login
  --url              Login page URL (required)
  --save             Path to save state (default: .auth/state.json)

qaprobe record
  --url              URL to start recording from (required)
  --append-to        Append generated story to a suite YAML
  --critical-path    Record as a deterministic critical path
  --save-to          Save critical path to a specific YAML file
  --name             Name for the critical path

qaprobe replay <file>
  --auth             Path to storage state JSON
  --headed           Show the browser window
  --runs-dir         Artifact directory
  --verify           Run LLM verifier on final state (uses Haiku)
  --json-output      Output results as JSON

qaprobe watch <file>
  --interval         Interval between runs (e.g. 30s, 5m, 1h)
  --auth             Path to storage state JSON
  --verify           Run LLM verifier on final state
  --webhook          URL to POST failure notifications to
  --runs-dir         Artifact directory
  --max-runs         Stop after N runs (0 = unlimited)

qaprobe init           Scaffold probes/ directory
qaprobe install        Install Playwright Chromium
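
Model routing, which `--no-routing` disables, can be pictured as a threshold check on the number of elements in the current snapshot. The sketch below reads the documented environment variables and their documented defaults. The function name `pick_model` and the exact definition of "element count" are assumptions.

```python
import os
from typing import Mapping


def pick_model(element_count: int, routing_enabled: bool = True,
               env: Mapping[str, str] = os.environ) -> str:
    """Choose the fast model for small pages, otherwise the agent model."""
    threshold = int(env.get("QAPROBE_ROUTING_THRESHOLD", "20"))
    fast = env.get("QAPROBE_FAST_MODEL", "claude-haiku-3-5")
    agent = env.get("QAPROBE_AGENT_MODEL", "claude-sonnet-4-5")
    # "Below" the threshold is read as a strict less-than comparison.
    if routing_enabled and element_count < threshold:
        return fast
    return agent


print(pick_model(5, env={}))                         # claude-haiku-3-5
print(pick_model(50, env={}))                        # claude-sonnet-4-5
print(pick_model(5, routing_enabled=False, env={}))  # claude-sonnet-4-5
```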

Suite YAML Reference

name: my-app
base_url: http://localhost:3000

auth:
  storage_state: .auth/state.json

allowed_origins:
  - http://localhost:3000
  - https://api.myapp.com

reveal_fields:
  - inp:username@form

macros:
  login_as: "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"

stories:
  - name: story_name
    path: /page
    story: "Description of what should happen"
    depends_on: other_story_name  # optional

Why This Exists

Traditional E2E tests require selectors, test IDs, and fixture setup. They're brittle and slow to write. QAProbe uses the accessibility tree — the same structure screen readers use — so it's resilient to visual changes and produces a11y findings as a free side-effect.

Feature          QAProbe                             Traditional E2E                Other AI QA
Test language    Plain English                       Code (Cypress/Playwright DSL)  Mixed
Target           Any URL                             Requires test harness          Varies
Setup on target  None (just a URL)                   Test IDs, selectors, fixtures  Agents / SDKs
Observation      Accessibility tree (CDP)            DOM / CSS selectors            DOM / screenshots
Verification     Independent second model            Assertion code                 Self-reported
Verdict          Three-way (pass/fail/inconclusive)  Binary                         Binary
A11y audit       Free on every run                   Separate tool                  None
Artifacts        Video + trace + HTML report         Screenshots on failure         Varies
Multi-provider   Anthropic + OpenAI                  N/A                            Single vendor

Examples

See the examples/ directory for suite files you can run immediately.

Hosted Version

A managed, hosted version of QAProbe runs on Scytala, a secure AI engineering platform. It adds managed browser infrastructure, team dashboards, scheduled runs, and integration with Scytala's AI debugging agents.

License

MIT — built by Scytala.
