Agentic QA for web apps — local or deployed
qaprobe
Agentic QA for web apps — local or deployed. Give it a URL and a plain-English user story. It drives a real browser with an LLM, records video, and has a second model independently verify the result. Every run also produces an accessibility audit for free.
Works against anything with a URL: localhost dev servers, staging environments, production. No setup on the target app — if you can open it in a browser, QAProbe can test it.
30-Second Quickstart
pip install qaprobe
qaprobe install # downloads Chromium
export ANTHROPIC_API_KEY=sk-ant-...
qaprobe run --url http://localhost:3000 \
--story "Add an item to the cart and verify the total updates"
That's it. You'll get a verdict (PASS / FAIL / INCONCLUSIVE), an HTML report, video recording, Playwright trace, and accessibility findings.
Usage
Single run
# Test your local dev server
qaprobe run --url http://localhost:3000 --story "Add an item to the cart and verify the total updates"
# Test staging before a deploy
qaprobe run --url https://staging.myapp.com --story "Log in and verify the dashboard loads"
# Test any public site
qaprobe run --url https://example.com --story "Click 'More information' and verify I land on IANA's site"
Suite runner
Define a YAML suite and run all stories at once:
# probes/myapp.yml
name: my-app
base_url: http://localhost:3000
auth:
  storage_state: .auth/qa.json
macros:
  login_as: "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"
stories:
  - name: browse_catalog
    path: /
    story: "Browse to the catalog page and verify at least 3 products are listed"
  - name: add_to_cart
    path: /catalog
    story: "Add the first product to the cart and verify the cart badge shows 1"
    depends_on: browse_catalog
qaprobe suite probes/myapp.yml
The suite produces a single index.html dashboard with per-story status, step logs, a11y findings, and links to videos/traces.
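One way to picture how depends_on could constrain execution order is a topological sort over the stories. The sketch below is not QAProbe's implementation: it assumes depends_on names a single story that must run first (as in the example suite), and the order_stories helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Stories as they might look after parsing the suite YAML (listed out of order on purpose).
stories = [
    {"name": "add_to_cart", "path": "/catalog", "depends_on": "browse_catalog"},
    {"name": "browse_catalog", "path": "/"},
]


def order_stories(stories):
    """Return story names so that each story comes after its dependency.

    Hypothetical helper: assumes depends_on is a single story name or absent.
    """
    graph = {}
    for story in stories:
        dep = story.get("depends_on")
        graph[story["name"]] = {dep} if dep else set()
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


print(order_stories(stories))
```

Whether QAProbe skips a story when its dependency fails is not stated here, so the sketch only covers ordering.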
Authentication
# Save login state once
qaprobe login --url https://myapp.com/login --save .auth/state.json
# Reuse it
qaprobe run --url https://myapp.com/dashboard --story "..." --auth .auth/state.json
Standalone accessibility audit
# JSON output
qaprobe a11y --url https://example.com
# HTML report
qaprobe a11y --url https://example.com --html
Record and generate stories
# Open a browser, interact, then close to generate a story
qaprobe record --url http://localhost:3000
# Append to an existing suite
qaprobe record --url http://localhost:3000 --append-to probes/myapp.yml
# Record as a deterministic critical path (no LLM needed)
qaprobe record --url http://localhost:3000 --critical-path --name checkout_flow
Critical path replay
Replay recorded paths deterministically — no LLM cost, fast, reliable:
# Replay all paths in a file
qaprobe replay probes/critical_paths.yml
# With optional LLM verification of the final state (uses Haiku, very cheap)
qaprobe replay probes/critical_paths.yml --verify
# JSON output for CI integration
qaprobe replay probes/critical_paths.yml --json-output
Continuous monitoring
Watch critical paths on a schedule and alert on failures:
# Check every 5 minutes
qaprobe watch probes/critical_paths.yml --interval 5m
# With webhook notifications on failure
qaprobe watch probes/critical_paths.yml --interval 1h --webhook https://hooks.slack.com/...
# Stop after 10 runs
qaprobe watch probes/critical_paths.yml --interval 30s --max-runs 10
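The --interval values above (30s, 5m, 1h) follow a number-plus-unit pattern. As a rough illustration of how such strings map to seconds, here is a small parser; parse_interval is a hypothetical name, and only the three units shown in the examples are assumed to exist.

```python
import re

# Only the units that appear in the examples above; other units are not assumed.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> int:
    """Convert an interval such as '30s', '5m', or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


for example in ("30s", "5m", "1h"):
    print(example, "->", parse_interval(example), "seconds")
```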
Scaffold a new project
qaprobe init
# Creates probes/example.yml and .auth/ directory
CI / GitHub Actions
- uses: qaprobe/action@v1
  with:
    suite: probes/myapp.yml
    auth-state: .auth/qa.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
On failure, trace and video artifacts are uploaded automatically.
Baseline mode
# Save current results as the baseline
qaprobe suite probes/myapp.yml --baseline
# Future runs fail only on regressions (stories that passed in the baseline but now fail)
qaprobe suite probes/myapp.yml
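The comparison that baseline mode describes can be sketched as a diff between two verdict maps. This is an illustration under assumptions, not the tool's code: the dict layout and the find_regressions name are hypothetical, and how QAProbe treats new stories or INCONCLUSIVE verdicts is not documented here.

```python
def find_regressions(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Stories whose baseline verdict was PASS and whose current verdict is FAIL."""
    return sorted(
        name
        for name, verdict in current.items()
        if verdict == "FAIL" and baseline.get(name) == "PASS"
    )


baseline = {"browse_catalog": "PASS", "add_to_cart": "FAIL"}
current = {"browse_catalog": "FAIL", "add_to_cart": "FAIL", "checkout": "FAIL"}

# add_to_cart was already failing and checkout has no baseline entry,
# so in this sketch only browse_catalog counts as a regression.
print(find_regressions(baseline, current))
```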
Architecture
qaprobe run --url <URL> --story "<story>"
│
├─ Launch headless Chromium (Playwright)
│   ├─ Video recording ON
│   └─ Tracing ON
│
├─ Agent loop (Claude Sonnet, up to 40 steps)
│   ├─ Observe: CDP Accessibility.getFullAXTree → stable element refs
│   ├─ Decide: LLM picks one tool (click, fill, select, press_key, navigate, scroll, wait, done)
│   ├─ Act: RefResolver maps ref → Playwright locator (role+name, not CSS)
│   ├─ SPA debouncing: waits for AX tree to stabilize after actions
│   └─ Repeat until done or step budget exhausted
│
├─ Verifier (Claude Opus, 1 call, fresh context)
│   ├─ Sees: story + snapshot history + step log + screenshot + agent verdict
│   └─ Returns: {goal_achieved, confidence, reasoning}
│
└─ Reconcile
    ├─ Both agree pass (high confidence) → PASS
    ├─ Both agree fail → FAIL
    ├─ Both agree pass but low confidence → INCONCLUSIVE
    └─ Disagree → INCONCLUSIVE (needs human review)
Output per run: runs/<timestamp>/{report.json, report.html, trace.zip, video/*.webm}
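The four Reconcile rules in the diagram can be written as a small decision function. This is a sketch under stated assumptions rather than QAProbe's source: the 0.8 cut-off for "high confidence" and the 0-to-1 confidence scale are guesses, and the class and function names are hypothetical. Only the field names goal_achieved, confidence, and reasoning come from the diagram.

```python
from dataclasses import dataclass

# Assumed boundary between "high" and "low" confidence; the real value is not documented here.
HIGH_CONFIDENCE = 0.8


@dataclass
class VerifierResult:
    goal_achieved: bool
    confidence: float  # assumed to be in the range [0, 1]
    reasoning: str


def reconcile(agent_passed: bool, verifier: VerifierResult) -> str:
    """Combine the agent's own verdict with the verifier's independent judgment."""
    if agent_passed != verifier.goal_achieved:
        return "INCONCLUSIVE"  # disagreement: needs human review
    if not agent_passed:
        return "FAIL"  # both agree the goal was not achieved
    if verifier.confidence >= HIGH_CONFIDENCE:
        return "PASS"  # both agree, and the verifier is confident
    return "INCONCLUSIVE"  # both say pass, but confidence is low


print(reconcile(True, VerifierResult(True, 0.95, "cart badge shows 1")))
print(reconcile(True, VerifierResult(False, 0.90, "total did not update")))
```

Treating disagreement as INCONCLUSIVE rather than FAIL keeps a single model's mistake from deciding the outcome in either direction.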
Configuration
All configuration is via environment variables:
| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | (required) | Anthropic API key |
| OPENAI_API_KEY | | OpenAI API key (when using openai provider) |
| QAPROBE_PROVIDER | anthropic | LLM provider: anthropic or openai |
| QAPROBE_AGENT_MODEL | claude-sonnet-4-5 | Model for the agent loop |
| QAPROBE_VERIFIER_MODEL | claude-opus-4-5 | Model for independent verification |
| QAPROBE_FAST_MODEL | claude-haiku-3-5 | Fast model for simple steps (model routing) |
| QAPROBE_ROUTING_THRESHOLD | 20 | Element count below which the fast model is used |
| QAPROBE_MAX_STEPS | 40 | Maximum agent steps per run |
| QAPROBE_BROWSER_TIMEOUT_MS | 30000 | Playwright action timeout |
| QAPROBE_DEBOUNCE_POLL_MS | 200 | SPA debounce polling interval |
| QAPROBE_DEBOUNCE_STABLE_MS | 500 | AX tree stable time before snapshot |
| QAPROBE_DEBOUNCE_TIMEOUT_MS | 3000 | Maximum debounce wait |
| QAPROBE_RUNS_DIR | runs | Directory for run artifacts |
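To show how the three QAPROBE_DEBOUNCE_* settings might interact, here is a sketch of a poll-until-stable loop using the defaults from the table. It is an illustration, not the tool's implementation: wait_for_stable, FakeClock, and the snapshot callable are hypothetical names, and the snapshot callable stands in for whatever reads the accessibility tree.

```python
import time


def wait_for_stable(snapshot, poll_ms=200, stable_ms=500, timeout_ms=3000,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll snapshot() until it is unchanged for stable_ms, or timeout_ms elapses.

    Returns (last_snapshot, stabilized).
    """
    start = clock()
    last = snapshot()
    last_change = start
    while True:
        now = clock()
        if (now - last_change) * 1000 >= stable_ms:
            return last, True  # nothing changed for long enough
        if (now - start) * 1000 >= timeout_ms:
            return last, False  # gave up; caller proceeds with the latest snapshot
        sleep(poll_ms / 1000)
        current = snapshot()
        if current != last:
            last, last_change = current, clock()


class FakeClock:
    """Simulated time so the demo runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Simulated page: two "loading" snapshots, then "ready" from then on.
frames = iter(["loading", "loading", "ready"])
state = {"latest": "loading"}


def fake_snapshot():
    state["latest"] = next(frames, state["latest"])
    return state["latest"]


fake = FakeClock()
print(wait_for_stable(fake_snapshot, clock=fake.time, sleep=fake.sleep))
```

With these defaults, a page that never settles delays each step by at most about three seconds before the loop gives up.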
CLI Flags
qaprobe run
  --url             URL to test (required)
  --story           Plain-English story (required)
  --auth            Path to storage state JSON
  --max-steps       Max agent steps (default: 40)
  --headed          Show the browser window
  --runs-dir        Artifact directory
  --reveal-secrets  Show fill values in reports (default: masked)
  --no-routing      Disable fast/slow model routing

qaprobe suite <file>
  --auth            Override suite auth config
  --runs-dir        Artifact directory
  --headed          Show the browser window
  --baseline        Save results as baseline
  --reveal-secrets  Show fill values in reports
  --no-routing      Disable model routing

qaprobe a11y
  --url             URL to audit (required)
  --html            Output HTML instead of JSON
  --auth            Path to storage state JSON

qaprobe login
  --url             Login page URL (required)
  --save            Path to save state (default: .auth/state.json)

qaprobe record
  --url             URL to start recording from (required)
  --append-to       Append generated story to a suite YAML
  --critical-path   Record as a deterministic critical path
  --save-to         Save critical path to a specific YAML file
  --name            Name for the critical path

qaprobe replay <file>
  --auth            Path to storage state JSON
  --headed          Show the browser window
  --runs-dir        Artifact directory
  --verify          Run LLM verifier on final state (uses Haiku)
  --json-output     Output results as JSON

qaprobe watch <file>
  --interval        Interval between runs (e.g. 30s, 5m, 1h)
  --auth            Path to storage state JSON
  --verify          Run LLM verifier on final state
  --webhook         URL to POST failure notifications to
  --runs-dir        Artifact directory
  --max-runs        Stop after N runs (0 = unlimited)

qaprobe init        Scaffold probes/ directory
qaprobe install     Install Playwright Chromium
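The --no-routing flag turns off the fast/slow model routing described in the Configuration table, where pages with fewer elements than QAPROBE_ROUTING_THRESHOLD go to the fast model. A rough sketch of that rule follows; pick_model is a hypothetical name, and what exactly counts as an "element" is an assumption.

```python
import os


def pick_model(element_count: int, routing_enabled: bool = True, env=os.environ) -> str:
    """Choose the fast model for small pages, otherwise the regular agent model."""
    agent_model = env.get("QAPROBE_AGENT_MODEL", "claude-sonnet-4-5")
    fast_model = env.get("QAPROBE_FAST_MODEL", "claude-haiku-3-5")
    threshold = int(env.get("QAPROBE_ROUTING_THRESHOLD", "20"))
    if routing_enabled and element_count < threshold:
        return fast_model
    return agent_model


# Passing env={} makes the demo use the documented defaults regardless of the shell.
print(pick_model(12, env={}))                         # small page: fast model
print(pick_model(85, env={}))                         # large page: agent model
print(pick_model(12, routing_enabled=False, env={}))  # behaves like --no-routing
```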
Suite YAML Reference
name: my-app
base_url: http://localhost:3000
auth:
  storage_state: .auth/state.json
allowed_origins:
  - http://localhost:3000
  - https://api.myapp.com
reveal_fields:
  - inp:username@form
macros:
  login_as: "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"
stories:
  - name: story_name
    path: /page
    story: "Description of what should happen"
    depends_on: other_story_name  # optional
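The {{1}} and {{2}} placeholders in the login_as macro look like positional arguments. The sketch below shows one plausible expansion, assuming placeholders are 1-indexed and filled in order; expand_macro is a hypothetical helper, and the syntax a story uses to invoke a macro is not shown in this reference.

```python
import re


def expand_macro(template: str, *args: str) -> str:
    """Replace {{1}}, {{2}}, ... with the corresponding positional argument."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise ValueError(f"macro needs argument {index}, got {len(args)}")
        return args[index - 1]

    return re.sub(r"\{\{(\d+)\}\}", substitute, template)


login_as = "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"
print(expand_macro(login_as, "alice", "s3cret"))
```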
Why This Exists
Traditional E2E tests require selectors, test IDs, and fixture setup. They're brittle and slow to write. QAProbe uses the accessibility tree — the same structure screen readers use — so it's resilient to visual changes and produces a11y findings as a free side-effect.
| Feature | QAProbe | Traditional E2E | Other AI QA |
|---|---|---|---|
| Test language | Plain English | Code (Cypress/Playwright DSL) | Mixed |
| Target | Any URL | Requires test harness | Varies |
| Setup on target | None — just a URL | Test IDs, selectors, fixtures | Agents / SDKs |
| Observation | Accessibility tree (CDP) | DOM / CSS selectors | DOM / screenshots |
| Verification | Independent second model | Assertion code | Self-reported |
| Verdict | Three-way (pass/fail/inconclusive) | Binary | Binary |
| A11y audit | Free on every run | Separate tool | None |
| Artifacts | Video + trace + HTML report | Screenshots on failure | Varies |
| Multi-provider | Anthropic + OpenAI | N/A | Single vendor |
Examples
See the examples/ directory for suite files you can run immediately:
- example_com.yml: basic tests against example.com
- todomvc.yml: tests against a public TodoMVC React app
- hackernews.yml: tests against Hacker News
- example_com_critical_path.yml: deterministic critical path (no LLM needed)
- todomvc_critical_path.yml: critical path with verification clauses
Hosted Version
A managed, hosted version of QAProbe runs on Scytala, a secure AI engineering platform. It adds managed browser infrastructure, team dashboards, scheduled runs, and integration with Scytala's AI debugging agents.
License
MIT — built by Scytala.