qaprobe

Agentic QA for web apps — local or deployed. Give it a URL and a plain-English user story. It drives a real browser with an LLM, records video, and has a second model independently verify the result. Every run also produces an accessibility audit for free.

Works against anything with a URL: localhost dev servers, staging environments, production. No setup on the target app — if you can open it in a browser, QAProbe can test it.

30-Second Quickstart

pip install qaprobe
qaprobe install          # downloads Chromium

export ANTHROPIC_API_KEY=sk-ant-...

qaprobe run --url http://localhost:3000 \
  --story "Add an item to the cart and verify the total updates"

That's it. You'll get a verdict (PASS / FAIL / INCONCLUSIVE), an HTML report, a video recording, a Playwright trace, and accessibility findings.

Usage

Single run

# Test your local dev server
qaprobe run --url http://localhost:3000 --story "Add an item to the cart and verify the total updates"

# Test staging before a deploy
qaprobe run --url https://staging.myapp.com --story "Log in and verify the dashboard loads"

# Test any public site
qaprobe run --url https://example.com --story "Click 'More information' and verify I land on IANA's site"

Suite runner

Define a YAML suite and run all stories at once:

# probes/myapp.yml
name: my-app
base_url: http://localhost:3000
auth:
  storage_state: .auth/qa.json

macros:
  login_as: "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"

stories:
  - name: browse_catalog
    path: /
    story: "Browse to the catalog page and verify at least 3 products are listed"

  - name: add_to_cart
    path: /catalog
    story: "Add the first product to the cart and verify the cart badge shows 1"
    depends_on: browse_catalog

qaprobe suite probes/myapp.yml

The suite produces a single index.html dashboard with per-story status, step logs, a11y findings, and links to videos/traces.
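
The `{{1}}` and `{{2}}` placeholders in the `login_as` macro look like positional arguments. As a minimal sketch of how such a template could be expanded (the `expand_macro` helper is hypothetical and not part of QAProbe's API, and how a story invokes a macro is not shown here):

```python
import re


def expand_macro(template: str, args: list[str]) -> str:
    """Replace {{1}}, {{2}}, ... with 1-indexed positional arguments."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            raise ValueError(
                f"macro needs argument {index}, but only {len(args)} given"
            )
        return args[index - 1]

    # Only numeric placeholders such as {{1}} are treated as arguments.
    return re.sub(r"\{\{(\d+)\}\}", substitute, template)


LOGIN_AS = "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"
print(expand_macro(LOGIN_AS, ["alice", "s3cret"]))
```

Raising on a missing argument, rather than leaving the placeholder in the story, keeps a malformed macro call from reaching the agent as literal `{{2}}` text.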

Authentication

# Save login state once
qaprobe login --url https://myapp.com/login --save .auth/state.json

# Reuse it
qaprobe run --url https://myapp.com/dashboard --story "..." --auth .auth/state.json

Standalone accessibility audit

# JSON output
qaprobe a11y --url https://example.com

# HTML report
qaprobe a11y --url https://example.com --html

Record and generate stories

# Open a browser, interact, then close to generate a story
qaprobe record --url http://localhost:3000

# Append to an existing suite
qaprobe record --url http://localhost:3000 --append-to probes/myapp.yml

# Record as a deterministic critical path (no LLM needed)
qaprobe record --url http://localhost:3000 --critical-path --name checkout_flow

Critical path replay

Replay recorded paths deterministically — no LLM cost, fast, reliable:

# Replay all paths in a file
qaprobe replay probes/critical_paths.yml

# With optional LLM verification of the final state (uses Haiku, very cheap)
qaprobe replay probes/critical_paths.yml --verify

# JSON output for CI integration
qaprobe replay probes/critical_paths.yml --json-output

Continuous monitoring

Watch critical paths on a schedule and alert on failures:

# Check every 5 minutes
qaprobe watch probes/critical_paths.yml --interval 5m

# With webhook notifications on failure
qaprobe watch probes/critical_paths.yml --interval 1h --webhook https://hooks.slack.com/...

# Stop after 10 runs
qaprobe watch probes/critical_paths.yml --interval 30s --max-runs 10
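
The `--interval` values above combine a number with a unit suffix (`30s`, `5m`, `1h`). A small sketch of how such strings map to seconds, assuming only those three suffixes are accepted (`parse_interval` is an illustrative name, not QAProbe's own function):

```python
import re

# Seconds per supported unit suffix (assumption: only s, m, and h exist).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> int:
    """Convert an interval such as '30s', '5m', or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


for example in ("30s", "5m", "1h"):
    print(example, "=", parse_interval(example), "seconds")
```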

Scaffold a new project

qaprobe init
# Creates probes/example.yml and .auth/ directory

CI / GitHub Actions

- uses: qaprobe/action@v1
  with:
    suite: probes/myapp.yml
    auth-state: .auth/qa.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

On failure, trace and video artifacts are uploaded automatically.

Baseline mode

# Save current results as the baseline
qaprobe suite probes/myapp.yml --baseline

# Later runs fail only on regressions (stories that passed in the baseline but fail now)
qaprobe suite probes/myapp.yml
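
To make the regression rule concrete, here is a sketch that compares two sets of verdicts. The dictionary shape (story name mapped to verdict string) and the `find_regressions` helper are assumptions for illustration; the actual baseline file format is not documented here.

```python
def find_regressions(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return stories that were PASS in the baseline and are FAIL now."""
    return sorted(
        name
        for name, verdict in current.items()
        if baseline.get(name) == "PASS" and verdict == "FAIL"
    )


baseline = {"browse_catalog": "PASS", "add_to_cart": "FAIL"}
current = {"browse_catalog": "FAIL", "add_to_cart": "FAIL"}

# add_to_cart was already failing, so only browse_catalog counts.
print(find_regressions(baseline, current))
```

Whether a story that moves from PASS to INCONCLUSIVE should also count as a regression is a policy choice this sketch leaves out.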

Architecture

qaprobe run --url <URL> --story "<story>"
    │
    ├─ Launch headless Chromium (Playwright)
    │   ├─ Video recording ON
    │   └─ Tracing ON
    │
    ├─ Agent loop (Claude Sonnet, up to 40 steps)
    │   ├─ Observe: CDP Accessibility.getFullAXTree → stable element refs
    │   ├─ Decide: LLM picks one tool (click, fill, select, press_key, navigate, scroll, wait, done)
    │   ├─ Act: RefResolver maps ref → Playwright locator (role+name, not CSS)
    │   ├─ SPA debouncing: waits for AX tree to stabilize after actions
    │   └─ Repeat until done or step budget exhausted
    │
    ├─ Verifier (Claude Opus, 1 call, fresh context)
    │   ├─ Sees: story + snapshot history + step log + screenshot + agent verdict
    │   └─ Returns: {goal_achieved, confidence, reasoning}
    │
    └─ Reconcile
        ├─ Both agree pass (high confidence) → PASS
        ├─ Both agree fail → FAIL
        ├─ Both agree pass but low confidence → INCONCLUSIVE
        └─ Disagree → INCONCLUSIVE (needs human review)
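
The Reconcile branch above can be written as a small decision function. This is a sketch under stated assumptions: `confidence` is taken to be a number between 0 and 1, and the `HIGH_CONFIDENCE` cutoff of 0.8 is a placeholder, since the real threshold and field types are not documented here.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE = 0.8  # assumption: placeholder cutoff, not QAProbe's value


@dataclass
class VerifierResult:
    goal_achieved: bool
    confidence: float  # assumption: numeric score in [0, 1]
    reasoning: str


def reconcile(agent_passed: bool, verifier: VerifierResult) -> str:
    """Combine the agent's verdict with the independent verifier's."""
    if agent_passed != verifier.goal_achieved:
        return "INCONCLUSIVE"  # disagreement needs human review
    if not agent_passed:
        return "FAIL"  # both agree the story failed
    if verifier.confidence >= HIGH_CONFIDENCE:
        return "PASS"  # both agree, with high confidence
    return "INCONCLUSIVE"  # both agree pass, but confidence is low


print(reconcile(True, VerifierResult(True, 0.95, "cart badge shows 1")))
print(reconcile(True, VerifierResult(False, 0.90, "total did not update")))
```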

Output per run: runs/<timestamp>/{report.json, report.html, trace.zip, video/*.webm}
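
The SPA debouncing step in the agent loop waits for the accessibility tree to stop changing before the next snapshot. One plausible reading of the poll, stable, and timeout settings listed under Configuration is sketched below. `wait_for_stable` and its `snapshot` callback are hypothetical; the real implementation may differ.

```python
import time
from typing import Callable, Hashable


def wait_for_stable(
    snapshot: Callable[[], Hashable],
    poll_ms: int = 200,      # QAPROBE_DEBOUNCE_POLL_MS default
    stable_ms: int = 500,    # QAPROBE_DEBOUNCE_STABLE_MS default
    timeout_ms: int = 3000,  # QAPROBE_DEBOUNCE_TIMEOUT_MS default
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll snapshot() until it stays unchanged for stable_ms.

    Returns True once the snapshot is stable, or False if timeout_ms
    elapses first. clock and sleep are injectable for testing.
    """
    start = clock()
    last_seen = snapshot()
    last_change = start
    while True:
        now = clock()
        if (now - last_change) * 1000 >= stable_ms:
            return True
        if (now - start) * 1000 >= timeout_ms:
            return False
        sleep(poll_ms / 1000)
        current = snapshot()
        if current != last_seen:
            # The tree changed, so restart the stability window.
            last_seen = current
            last_change = clock()
```

In practice `snapshot` could return a hash of the serialized AX tree, so that comparing two polls stays cheap.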

Configuration

All configuration is via environment variables:

| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | (required) | Anthropic API key |
| `OPENAI_API_KEY` | | OpenAI API key (when using `openai` provider) |
| `QAPROBE_PROVIDER` | `anthropic` | LLM provider: `anthropic` or `openai` |
| `QAPROBE_AGENT_MODEL` | `claude-sonnet-4-5` | Model for the agent loop |
| `QAPROBE_VERIFIER_MODEL` | `claude-opus-4-5` | Model for independent verification |
| `QAPROBE_FAST_MODEL` | `claude-haiku-3-5` | Fast model for simple steps (model routing) |
| `QAPROBE_ROUTING_THRESHOLD` | `20` | Element count below which the fast model is used |
| `QAPROBE_MAX_STEPS` | `40` | Maximum agent steps per run |
| `QAPROBE_BROWSER_TIMEOUT_MS` | `30000` | Playwright action timeout |
| `QAPROBE_DEBOUNCE_POLL_MS` | `200` | SPA debounce polling interval |
| `QAPROBE_DEBOUNCE_STABLE_MS` | `500` | AX tree stable time before snapshot |
| `QAPROBE_DEBOUNCE_TIMEOUT_MS` | `3000` | Maximum debounce wait |
| `QAPROBE_RUNS_DIR` | `runs` | Directory for run artifacts |
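
For a wrapper script that wants to mirror these defaults, the sketch below reads a subset of the variables into a typed object. The `Settings` class and `load_settings` function are illustrative names, not QAProbe's internals. The rule that only the API key for the selected provider is required is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: each provider needs only its own API key.
_KEY_FOR_PROVIDER = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


@dataclass(frozen=True)
class Settings:
    provider: str
    agent_model: str
    verifier_model: str
    fast_model: str
    routing_threshold: int
    max_steps: int
    browser_timeout_ms: int
    runs_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read QAPROBE_* variables, falling back to the documented defaults."""
    provider = env.get("QAPROBE_PROVIDER", "anthropic")
    key_var = _KEY_FOR_PROVIDER.get(provider)
    if key_var is None:
        raise ValueError(f"unknown provider: {provider!r}")
    if not env.get(key_var):
        raise RuntimeError(f"{key_var} must be set for provider {provider!r}")
    return Settings(
        provider=provider,
        agent_model=env.get("QAPROBE_AGENT_MODEL", "claude-sonnet-4-5"),
        verifier_model=env.get("QAPROBE_VERIFIER_MODEL", "claude-opus-4-5"),
        fast_model=env.get("QAPROBE_FAST_MODEL", "claude-haiku-3-5"),
        routing_threshold=int(env.get("QAPROBE_ROUTING_THRESHOLD", "20")),
        max_steps=int(env.get("QAPROBE_MAX_STEPS", "40")),
        browser_timeout_ms=int(env.get("QAPROBE_BROWSER_TIMEOUT_MS", "30000")),
        runs_dir=env.get("QAPROBE_RUNS_DIR", "runs"),
    )


print(load_settings({"ANTHROPIC_API_KEY": "sk-ant-example"}))
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.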

CLI Flags

qaprobe run
  --url              URL to test (required)
  --story            Plain-English story (required)
  --auth             Path to storage state JSON
  --max-steps        Max agent steps (default: 40)
  --headed           Show the browser window
  --runs-dir         Artifact directory
  --reveal-secrets   Show fill values in reports (default: masked)
  --no-routing       Disable fast/slow model routing

qaprobe suite <file>
  --auth             Override suite auth config
  --runs-dir         Artifact directory
  --headed           Show the browser window
  --baseline         Save results as baseline
  --reveal-secrets   Show fill values in reports
  --no-routing       Disable model routing

qaprobe a11y
  --url              URL to audit (required)
  --html             Output HTML instead of JSON
  --auth             Path to storage state JSON

qaprobe login
  --url              Login page URL (required)
  --save             Path to save state (default: .auth/state.json)

qaprobe record
  --url              URL to start recording from (required)
  --append-to        Append generated story to a suite YAML
  --critical-path    Record as a deterministic critical path
  --save-to          Save critical path to a specific YAML file
  --name             Name for the critical path

qaprobe replay <file>
  --auth             Path to storage state JSON
  --headed           Show the browser window
  --runs-dir         Artifact directory
  --verify           Run LLM verifier on final state (uses Haiku)
  --json-output      Output results as JSON

qaprobe watch <file>
  --interval         Interval between runs (e.g. 30s, 5m, 1h)
  --auth             Path to storage state JSON
  --verify           Run LLM verifier on final state
  --webhook          URL to POST failure notifications to
  --runs-dir         Artifact directory
  --max-runs         Stop after N runs (0 = unlimited)

qaprobe init           Scaffold probes/ directory
qaprobe install        Install Playwright Chromium
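
The `--no-routing` flag and `QAPROBE_ROUTING_THRESHOLD` together suggest a simple routing rule: small pages go to the fast model, everything else to the agent model. A sketch of that rule follows. `pick_model` is a hypothetical helper, and what exactly counts as an "element" is an assumption.

```python
def pick_model(
    element_count: int,
    threshold: int = 20,                     # QAPROBE_ROUTING_THRESHOLD default
    fast_model: str = "claude-haiku-3-5",    # QAPROBE_FAST_MODEL default
    agent_model: str = "claude-sonnet-4-5",  # QAPROBE_AGENT_MODEL default
    routing_enabled: bool = True,            # False models --no-routing
) -> str:
    """Pick the fast model when the snapshot has fewer elements than threshold."""
    if routing_enabled and element_count < threshold:
        return fast_model
    return agent_model


print(pick_model(12))                         # small page
print(pick_model(85))                         # large page
print(pick_model(12, routing_enabled=False))  # routing disabled
```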

Suite YAML Reference

name: my-app
base_url: http://localhost:3000

auth:
  storage_state: .auth/state.json

allowed_origins:
  - http://localhost:3000
  - https://api.myapp.com

reveal_fields:
  - inp:username@form

macros:
  login_as: "Go to /login, fill {{1}} in username, fill {{2}} in password, click Login"

stories:
  - name: story_name
    path: /page
    story: "Description of what should happen"
    depends_on: other_story_name  # optional
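
Because `depends_on` names another story, a suite forms a dependency graph. The sketch below derives one valid execution order from already-parsed story entries using the standard library's `graphlib`. It assumes `depends_on` holds a single story name, as in the example above. How QAProbe itself schedules dependent stories is not documented here.

```python
from graphlib import TopologicalSorter


def execution_order(stories: list[dict]) -> list[str]:
    """Order story names so each story runs after the one it depends on."""
    names = {story["name"] for story in stories}
    graph: dict[str, set[str]] = {}
    for story in stories:
        dependency = story.get("depends_on")
        if dependency is not None and dependency not in names:
            raise ValueError(
                f"{story['name']} depends on unknown story {dependency!r}"
            )
        graph[story["name"]] = {dependency} if dependency else set()
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


stories = [
    {"name": "add_to_cart", "path": "/catalog", "depends_on": "browse_catalog"},
    {"name": "browse_catalog", "path": "/"},
]
print(execution_order(stories))
```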

Why This Exists

Traditional E2E tests require selectors, test IDs, and fixture setup. They're brittle and slow to write. QAProbe uses the accessibility tree — the same structure screen readers use — so it's resilient to visual changes and produces a11y findings as a free side-effect.

| Feature | QAProbe | Traditional E2E | Other AI QA |
|---|---|---|---|
| Test language | Plain English | Code (Cypress/Playwright DSL) | Mixed |
| Target | Any URL | Requires test harness | Varies |
| Setup on target | None, just a URL | Test IDs, selectors, fixtures | Agents / SDKs |
| Observation | Accessibility tree (CDP) | DOM / CSS selectors | DOM / screenshots |
| Verification | Independent second model | Assertion code | Self-reported |
| Verdict | Three-way (pass/fail/inconclusive) | Binary | Binary |
| A11y audit | Free on every run | Separate tool | None |
| Artifacts | Video + trace + HTML report | Screenshots on failure | Varies |
| Multi-provider | Anthropic + OpenAI | N/A | Single vendor |

Examples

See the examples/ directory for suite files you can run immediately:

example_com.yml — basic tests against example.com
todomvc.yml — tests against a public TodoMVC React app
hackernews.yml — tests against Hacker News
example_com_critical_path.yml — deterministic critical path (no LLM needed)
todomvc_critical_path.yml — critical path with verification clauses

Hosted Version

A managed, hosted version of QAProbe runs on Scytala, a secure AI engineering platform. It adds managed browser infrastructure, team dashboards, scheduled runs, and integration with Scytala's AI debugging agents.

License

MIT — built by Scytala.

Version 0.2.0, released May 5, 2026.
