AI-powered diagnostic for failing Playwright tests: multi-LLM, GDPR-friendly, open source. Detects the root cause in seconds.

🚀 QA Autopilot: AI Diagnostic for Playwright Test Failures

pytest plugin: real-time AI diagnosis of Playwright test failures

Multi-LLM (OpenAI · Anthropic · Ollama · DeepSeek) · GDPR by default · Open source


Installation • Quick Start • Scorecard • How it works • Configuration


🎯 The problem

A Playwright test fails. The error message says:

TimeoutError: Page.click: Timeout 5000ms exceeded.
waiting for locator("a[href='/international/']")

The selector is correct. The element exists. So why isn't it working?

Because the cookie banner is covering everything. Or the element is inside an iframe. Or the DOM was reloaded via AJAX. Or the button is disabled. Or you used click() instead of dblclick().

QA Autopilot tells you the real cause in a single command.


📊 Scorecard

Results on a suite of 7 trap tests specifically designed to fool diagnostic tools:

| Test | Trap | AI Diagnosis | Category | Confidence |
|---|---|---|---|---|
| 🫣 Cookie overlay | Element covered by banner | ✅ Cookie banner blocks the click | element_obscured | 🟢 95% |
| 🖼️ Invisible iframe | Element inside iframe, searched in main frame | ✅ Missing iframe context | iframe_context | 🟢 95% |
| 👻 Stale AJAX | Locator captured before DOM reload | ✅ Stale reference after AJAX | stale_reference | 🟢 95% |
| ↪️ Silent redirect | URL redirected 301/302 | ✅ Test PASSED (trap detected) | n/a | ✅ |
| 🚫 Disabled button | Element visible but disabled | ✅ Disabled attribute detected | element_disabled | 🟢 95% |
| 🔤 Unicode regex | Zinedine vs Zinédine | ✅ Regex accent mismatch | encoding_mismatch | 🟢 95% |
| 🫣 Double-click | Consent manager intercepts click | ✅ Overlay detected | element_obscured | 🟢 95% |

6/6 correct diagnoses at 95% confidence; the 7th test passed, so no diagnosis was needed.


โš ๏ธ Limitations

Caution: for tests over 200 lines, the context sent to the AI is intentionally truncated. An E2E test should stay short: one scenario, one responsibility, under 50 lines. Beyond that, it's a design problem, not a diagnostic problem. Refactor your tests before looking for the cause of a failure.

📦 Installation

QA Autopilot is a native Python module, available directly on PyPI:

pip install qa-autopilot

That's it. No config, no server, no account. One line.

With automatic .env loading:

pip install "qa-autopilot[dotenv]"

Or from source:

git clone https://github.com/julienmerconsulting/qa-autopilot.git
cd qa-autopilot
pip install -e .

Prerequisites

playwright install chromium

.env configuration

Create a .env file at the root of your project:

# OpenAI (default)
OPENAI_API_KEY=sk-...

# Or DeepSeek
BASE_URL=https://api.deepseek.com
API_KEY=sk-...
QA_MODEL=deepseek-chat

# Or local Ollama (zero cost)
BASE_URL=http://localhost:11434/v1
API_KEY=ollama
QA_MODEL=llama3

⚡ Quick Start

pytest mode (recommended)

Add a single flag to your pytest command:

pytest tests/ --qa-autopilot -v

That's it. Every failing test gets an automatic AI diagnosis.

With HTML report

pytest tests/ --qa-autopilot --html=qa-reports/report.html --self-contained-html -v

Standalone mode

python -m qa_autopilot tests/test_checkout.py
python -m qa_autopilot tests/test_login.py::test_auth
python -m qa_autopilot tests/ -k "checkout" --headed

Direct import mode

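from qa_autopilot import QAInterceptor

def test_checkout(page):
    # Attach the interceptor before the first page action
    interceptor = QAInterceptor(page)
    interceptor.start()

    try:
        ...  # your test steps
    except Exception as exc:
        # On failure, request a diagnosis, then re-raise so the test still fails
        diagnosis = interceptor.diagnose(str(exc), "test_file.py")
        print(diagnosis["root_cause"])
        print(diagnosis["category"])
        raise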

🔍 How it works

┌──────────────────────────────────────────────────────┐
│                YOUR PLAYWRIGHT TEST                  │
│                                                      │
│   page.goto("https://example.com")                   │
│   page.click("#submit")           ← FAIL             │
│   expect(page).to_have_url(...)                      │
└──────────────────────┬───────────────────────────────┘
                       │
         ┌─────────────▼─────────────┐
         │    QA AUTOPILOT HOOK      │
         │   (listens in parallel)   │
         └─────────────┬─────────────┘
                       │
    ┌──────────────────┼──────────────────┐
    ▼                  ▼                  ▼
┌────────┐      ┌──────────┐      ┌───────────┐
│  DOM   │      │ NETWORK  │      │ CONSOLE   │
│Listener│      │ Capture  │      │ Capture   │
│  (JS)  │      │ req/res  │      │ err/warn  │
└───┬────┘      └────┬─────┘      └─────┬─────┘
    │                │                  │
    └────────────────┼──────────────────┘
                     │
         ┌───────────▼───────────┐
         │   CONTEXT BUNDLE      │
         │  code + error + DOM   │
         │  + network + console  │
         │  + screenshot (opt)   │
         └───────────┬───────────┘
                     │
         ┌───────────▼───────────┐
         │    ONE PROMPT → AI    │
         │   (12 categories)     │
         │   diagnosis + fix     │
         └───────────┬───────────┘
                     │
    ┌────────────────┼────────────────┐
    ▼                ▼                ▼
┌────────┐    ┌───────────┐    ┌──────────┐
│Terminal│    │   JSON    │    │  Jira    │
│ Output │    │  Report   │    │ (if bug) │
└────────┘    └───────────┘    └──────────┘

5-step pipeline

  1. Transparent hook: plugs into the Playwright page via native events
  2. Parallel capture: DOM (injected JS listener), network, console, screenshots
  3. Failure detection: the pytest hook intercepts the FAILED status
  4. Bundle + Prompt: all context goes out in ONE AI call
  5. Diagnosis: root cause + category + concrete fix + JSON report
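Step 4 is the heart of the pipeline: everything captured in parallel is merged into one bundle and rendered as a single prompt. The sketch below shows one plausible shape for that bundle. The `ContextBundle` class, its field names, and the section titles are assumptions made for illustration, not the package's actual internals; the 3000-character cap on test code matches the limit documented in the data table further down.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """Everything captured for one failing test (hypothetical field names)."""
    test_code: str
    error: str
    dom_actions: list[str] = field(default_factory=list)
    network_errors: list[str] = field(default_factory=list)
    console_errors: list[str] = field(default_factory=list)

    def to_prompt(self, max_code_chars: int = 3000) -> str:
        """Render all captured context into one prompt string."""
        def section(title: str, lines: list[str]) -> str:
            body = "\n".join(f"  {i}. {line}" for i, line in enumerate(lines, 1))
            return f"{title} ({len(lines)} total)\n{body or '  (none)'}"

        return "\n\n".join([
            "TEST CODE\n" + self.test_code[:max_code_chars],  # truncated on purpose
            "ERROR\n" + self.error,
            section("CAPTURED DOM ACTIONS", self.dom_actions),
            section("FAILED NETWORK REQUESTS", self.network_errors),
            section("CONSOLE ERRORS", self.console_errors),
        ])


if __name__ == "__main__":
    bundle = ContextBundle(
        test_code='page.click("#submit")',
        error="TimeoutError: Page.click: Timeout 5000ms exceeded.",
        dom_actions=["CLICK #cookie-banner"],
    )
    print(bundle.to_prompt())
```

Sending one prompt instead of several keeps the cost predictable: one failing test, one LLM call.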

๐Ÿท๏ธ The 12 diagnosis categories

| Icon | Category | Description |
|---|---|---|
| 🎯 | wrong_selector | Broken, missing, or too-broad selector |
| ⏭️ | missing_step | Missing step (cookies, goto, dropdown) |
| ⏱️ | timing | Race condition, element not ready yet |
| 🫣 | element_obscured | Element covered by overlay/modal/banner |
| 🚫 | element_disabled | Element found but disabled |
| 🔀 | wrong_action | Wrong method (click vs dblclick, fill vs type) |
| 🖼️ | iframe_context | Element searched in the wrong frame |
| 🔤 | encoding_mismatch | Unicode/accent/regex issue |
| 👻 | stale_reference | Stale locator after DOM change |
| 📊 | test_data | Assertion with wrong expected value |
| 🐛 | app_bug | Application bug (not the test) → generates a Jira ticket |
| 🌐 | network | Failed network requests (4xx/5xx) |
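If you post-process diagnoses in your own tooling, it can help to treat the category as a closed set rather than a free-form string. The enum below lists the 12 documented values; the `Category` class and the `needs_jira_ticket` helper are illustrative names, not something the package is known to export.

```python
from enum import Enum


class Category(str, Enum):
    """The 12 documented diagnosis categories."""
    WRONG_SELECTOR = "wrong_selector"
    MISSING_STEP = "missing_step"
    TIMING = "timing"
    ELEMENT_OBSCURED = "element_obscured"
    ELEMENT_DISABLED = "element_disabled"
    WRONG_ACTION = "wrong_action"
    IFRAME_CONTEXT = "iframe_context"
    ENCODING_MISMATCH = "encoding_mismatch"
    STALE_REFERENCE = "stale_reference"
    TEST_DATA = "test_data"
    APP_BUG = "app_bug"
    NETWORK = "network"


def needs_jira_ticket(diagnosis: dict) -> bool:
    """True when the diagnosis blames the application rather than the test."""
    # Category(...) raises ValueError for any value outside the 12 known ones
    return Category(diagnosis["category"]) is Category.APP_BUG


if __name__ == "__main__":
    print(needs_jira_ticket({"category": "app_bug"}))
```

Failing loudly on an unknown category makes it obvious when a model returns something outside the documented list.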

โš™๏ธ Configuration

Environment variables

| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | (required if no API_KEY) | OpenAI API key |
| API_KEY | (optional) | Key for alternative provider (DeepSeek, Ollama…) |
| BASE_URL | None (native OpenAI) | LLM provider base URL |
| QA_MODEL | gpt-4.1-mini | AI model to use |
| QA_SCREENSHOT | 0 | 1 to include screenshots in the prompt |
| QA_REPORT_DIR | qa-reports/ | Reports directory |
| QA_REDACT_INPUTS | 1 | Auto-redaction of sensitive fields (password, credit card, tokens, IBAN…). 0 to disable (not recommended). |
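The table can be read as a small resolution routine. The sketch below shows one way these variables might be turned into settings, using the documented defaults. The `Settings` class and `load_settings` function are hypothetical, and two details are assumptions rather than documented behavior: that `API_KEY` takes precedence over `OPENAI_API_KEY` when both are set, and that any `QA_REDACT_INPUTS` value other than `0` keeps redaction on.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: Optional[str]
    model: str
    screenshot: bool
    report_dir: str
    redact_inputs: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from environment variables (hypothetical helper)."""
    # Assumption: an explicit API_KEY wins over OPENAI_API_KEY
    api_key = env.get("API_KEY") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY or API_KEY")
    return Settings(
        api_key=api_key,
        base_url=env.get("BASE_URL") or None,  # None means native OpenAI
        model=env.get("QA_MODEL", "gpt-4.1-mini"),
        screenshot=env.get("QA_SCREENSHOT", "0") == "1",
        report_dir=env.get("QA_REPORT_DIR", "qa-reports/"),
        redact_inputs=env.get("QA_REDACT_INPUTS", "1") != "0",  # on unless "0"
    )
```

For a local Ollama setup, `load_settings({"API_KEY": "ollama", "BASE_URL": "http://localhost:11434/v1", "QA_MODEL": "llama3"})` yields the same values as the `.env` example above.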

pytest arguments

pytest tests/ --qa-autopilot          # Enable AI diagnosis
pytest tests/ --qa-autopilot --headed # With visible browser
pytest tests/ --qa-autopilot -k "login" # Filter by keyword

📁 Report structure

qa-reports/
├── summary_20260223_014751.json          # Consolidated run report
├── diag_test_broken_20260223_014713.json # Individual diagnosis
├── diag_test_broken_20260223_014659.json
├── jira_test_broken_20260223_014659.md   # Jira ticket (if app_bug)
└── report.html                           # pytest HTML report
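Because the file names embed a `YYYYMMDD_HHMMSS` timestamp, the most recent consolidated report is simply the last one in alphabetical order. A minimal sketch, assuming the naming pattern shown above holds for every run (`latest_summary` is an illustrative helper, not part of the package):

```python
from pathlib import Path
from typing import Optional


def latest_summary(report_dir: str = "qa-reports") -> Optional[Path]:
    """Return the newest summary_*.json, or None if there is none."""
    # Zero-padded timestamps sort chronologically when sorted as text
    candidates = sorted(Path(report_dir).glob("summary_*.json"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    print(latest_summary())
```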

Consolidated report example

[
  {
    "test": "test_element_covered_by_overlay[chromium]",
    "category": "element_obscured",
    "confidence": 0.95,
    "root_cause": "The target element is covered by the cookie banner",
    "suggested_fix": "Close the cookie banner before clicking"
  }
]
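A report in this shape is easy to aggregate in CI. The sketch below counts diagnoses per category and flags low-confidence ones for manual review. It assumes only the keys shown in the example (`test`, `category`, `confidence`); the `summarize` helper and the 0.8 threshold are illustrative choices.

```python
import json
from collections import Counter


def summarize(report_json: str, min_confidence: float = 0.8):
    """Return (count per category, tests diagnosed below min_confidence)."""
    entries = json.loads(report_json)
    by_category = Counter(entry["category"] for entry in entries)
    low_confidence = [
        entry["test"] for entry in entries if entry["confidence"] < min_confidence
    ]
    return by_category, low_confidence


if __name__ == "__main__":
    sample = json.dumps([
        {
            "test": "test_element_covered_by_overlay[chromium]",
            "category": "element_obscured",
            "confidence": 0.95,
        }
    ])
    print(summarize(sample))
```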

🔒 Security & GDPR

QA Autopilot automatically redacts sensitive data before any LLM call. This protection is enabled by default (QA_REDACT_INPUTS=1) and operates on two fronts:

1. Browser-side redaction (DOM listener)

Values typed into sensitive fields are intercepted inside the browser, in the saveEntry() function of the JS listener, and replaced with [REDACTED] before any storage. The real value never leaves the browser.

| Detection criteria | Examples |
|---|---|
| HTML type | password, email, tel |
| name/id contains | password, passwd, pwd, secret, token, cvv, card, ssn, auth, pin, api_key, credit, iban, bic, swift, client_secret |
| placeholder / aria-label contains | same patterns |
| autocomplete | current-password, new-password, cc-* (credit card) |
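The real check runs in JavaScript inside the browser, but the criteria are easy to mirror in Python for reasoning about false positives. The sketch below is an approximation built only from the table above: `is_sensitive_field` and the attribute dictionary it takes are hypothetical, and plain substring matching is an assumption about how "contains" is implemented.

```python
import re

SENSITIVE_TYPES = {"password", "email", "tel"}
SENSITIVE_PATTERN = re.compile(
    r"password|passwd|pwd|secret|token|cvv|card|ssn|auth|pin|api_key"
    r"|credit|iban|bic|swift|client_secret",
    re.IGNORECASE,
)
SENSITIVE_AUTOCOMPLETE = ("current-password", "new-password")


def is_sensitive_field(attrs: dict) -> bool:
    """Apply the documented criteria to a dict of HTML attributes."""
    if attrs.get("type", "").lower() in SENSITIVE_TYPES:
        return True
    # name, id, placeholder and aria-label share the same pattern list
    for key in ("name", "id", "placeholder", "aria-label"):
        if SENSITIVE_PATTERN.search(attrs.get(key, "")):
            return True
    autocomplete = attrs.get("autocomplete", "").lower()
    return autocomplete in SENSITIVE_AUTOCOMPLETE or autocomplete.startswith("cc-")


if __name__ == "__main__":
    print(is_sensitive_field({"type": "text", "id": "username"}))
    print(is_sensitive_field({"name": "author_bio"}))
```

With substring matching, a field named `author_bio` is flagged because it contains `auth`. That is the kind of false positive the opt-out described below exists for.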

2. Source code redaction (before LLM call)

The test .py file is also scanned and hardcoded credentials are redacted:

  • page.fill("#password", "...") / .type() / .press_sequentially() / .input_value() on sensitive selectors
  • Python variables: password = "...", token = "...", api_key = "...", client_secret = "...", access_token = "...", etc.
  • os.environ["PASSWORD"] = "..." (direct assignment)
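The three rules above can be approximated with regular expressions. The sketch below is not the package's implementation: the patterns, the shortened `SENSITIVE` name list, and the `redact_source` helper are assumptions chosen to illustrate the idea, and they deliberately ignore harder cases such as f-strings or concatenated values.

```python
from __future__ import annotations

import re

SENSITIVE = r"(?:password|passwd|pwd|secret|token|api_key)"
Q = r"""(["'])"""   # capturing group: either quote character
NQ = r"""[^"']*"""  # any run of non-quote characters

PATTERNS = [
    # page.fill("#password", "value"), .type(...), .press_sequentially(...)
    re.compile(
        rf"(\.(?:fill|type|press_sequentially)\(\s*{Q}{NQ}{SENSITIVE}{NQ}\2\s*,\s*){Q}.*?\3",
        re.IGNORECASE,
    ),
    # password = "value", api_key = 'value', ...
    re.compile(
        rf"^(\s*\w*{SENSITIVE}\w*\s*=\s*){Q}.*?\2",
        re.IGNORECASE | re.MULTILINE,
    ),
    # os.environ["PASSWORD"] = "value"
    re.compile(
        rf"(os\.environ\[{Q}\w*{SENSITIVE}\w*\2\]\s*=\s*){Q}.*?\3",
        re.IGNORECASE,
    ),
]


def redact_source(source: str) -> tuple[str, int]:
    """Return the redacted source and the number of replacements made."""
    total = 0
    for pattern in PATTERNS:
        # Keep everything up to the value, replace only the quoted value
        source, count = pattern.subn(r'\g<1>"[REDACTED]"', source)
        total += count
    return source, total


if __name__ == "__main__":
    code = 'page.fill("#password", "hunter2")\n'
    print(redact_source(code)[0])
```

A non-zero replacement count is the natural trigger for a warning like the one shown next.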

When a redaction is applied, qa-autopilot prints a warning:

โš ๏ธ  Hardcoded credentials detected in test_login.py, redacted before LLM call.
    Best practice: use os.environ or pytest fixtures for secrets.

What does the LLM see?

CAPTURED DOM ACTIONS (3 total)
  1. INPUT ✅ #username = 'john.doe@example.com'
  2. INPUT ✅ #password = [REDACTED - sensitive field]
  3. CLICK ✅ button[type="submit"] (text: 'Login')

TEST CODE
def test_login(page):
    page.fill("#username", "john.doe@example.com")
    page.fill("#password", "[REDACTED]")
    page.click("button[type='submit']")

The LLM is explicitly informed in the prompt that [REDACTED] does not mean an empty or broken field, but a GDPR protection. The diagnosis is performed without knowing the real value.

How to verify redaction works?

# Run a test that types a password
pytest tests/test_login.py --qa-autopilot

# Check the JSON report: you should see [REDACTED] everywhere
grep -i "password\|REDACTED" qa-reports/diag_*.json

For the ultra-paranoid: intercept outgoing traffic with mitmproxy and verify that requests to api.openai.com never contain your real sensitive value.
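The grep check can also be turned into an automated guard: seed your test with a known fake secret, then fail the build if that string shows up in any report. A minimal sketch, assuming reports are UTF-8 JSON files in the report directory (`find_leaks` is an illustrative helper, not part of the package):

```python
from __future__ import annotations

from pathlib import Path


def find_leaks(report_dir: str, secrets: list[str]) -> list[tuple[str, str]]:
    """Return (file name, secret) pairs for each secret found in a report."""
    leaks = []
    for path in sorted(Path(report_dir).glob("*.json")):
        text = path.read_text(encoding="utf-8")
        leaks.extend((path.name, secret) for secret in secrets if secret in text)
    return leaks


if __name__ == "__main__":
    # Use a canary value that exists only in your test data
    print(find_leaks("qa-reports", ["canary-password-123"]))
```

An empty list means the canary never reached the reports. It says nothing about traffic on the wire, which is what the mitmproxy check covers.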

Data sent to the LLM

| Data | Sent | Redactable |
|---|---|---|
| Test source code (3000 chars max) | ✅ | ✅ by default (regex on fill/assign/env) |
| Playwright error message | ✅ | ❌ |
| Element selectors | ✅ | ❌ |
| Input values (DOM listener) | ✅ | ✅ by default (6-criteria cascade) |
| Page URL | ✅ | ❌ |
| Console errors | ✅ | ❌ |
| 4xx/5xx request bodies (truncated) | ✅ | ❌ |
| Screenshots | only if QA_SCREENSHOT=1 | n/a |

Disabling redaction (not recommended)

For the rare cases where the content of a "sensitive" field is legitimately useful to see (false positive on a field name containing auth but not actually authentication-related):

QA_REDACT_INPUTS=0 pytest tests/ --qa-autopilot

โš ๏ธ Use only with fictional test data. When in doubt, keep redaction enabled. Redaction is not an excuse to hardcode credentials: it doesn't catch every exotic case (variables with invented names, concatenated values, etc.). The golden rule remains: never hardcode secrets.


๐Ÿ—๏ธ Architecture

qa-autopilot/
├── qa_autopilot/
│   ├── __init__.py          # Public exports
│   ├── core.py              # QAInterceptor + capture
│   ├── prompt.py            # Prompt v2 (12 categories)
│   ├── diagnose.py          # AI call + retry + JSON mode
│   ├── reporter.py          # JSON + Jira markdown reports
│   ├── listener.js          # DOM listener (browser injection)
│   └── plugin.py            # pytest hooks
├── tests/
│   ├── test_traps.py        # Trap test suite
│   └── conftest.py
├── examples/
│   └── standalone.py        # Direct usage example
├── pyproject.toml
├── LICENSE
└── README.md

Note: The current version is a monolithic qa_autopilot.py file (~600 lines). The structure above is the target for v2.


🆚 Why not the alternatives?

| | QA Autopilot | Playwright MCP (23K lines) | SaaS (Testim, Mabl...) |
|---|---|---|---|
| Lines of code | ~600 | 23,000+ | Closed |
| Installation | pip install | MCP server + config | Account + license |
| Config | 1 flag | 32 MCP tools | Dashboard + integration |
| Price | Free + OpenAI key | Free | $200-500/month/user |
| Diagnosis | 12 categories, 95% | Basic | Variable |
| Vendor lock-in | None | MCP protocol | Total |

๐Ÿ› ๏ธ DOM Listener โ€” 6-tier cascade

The JavaScript listener injected into the browser uses a 6-level selector cascade, from most stable to least stable:

| Tier | Strategy | Example |
|---|---|---|
| 1 | data-testid / id / name | [data-testid="submit-btn"] |
| 2 | aria-label / placeholder / title | [aria-label="Close"] |
| 3 | href (links) | a[href="/checkout"] |
| 4 | Parent with stable attribute | [data-testid="form"] button |
| 5 | Associated label (inputs) | //label[contains(text(),"Email")]//input |
| 6 | Short CSS + nth-of-type | button.primary:nth-of-type(2) |

Every selector is validated for uniqueness in the DOM. Shadow DOM support included.
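The cascade itself lives in the injected JavaScript listener; the Python sketch below only mirrors its tier order and the uniqueness check, to make the logic concrete. The element dictionary and its keys (`parent_testid`, `label`, `class`, `nth`), as well as the `candidates` and `pick_selector` helpers, are hypothetical stand-ins for what the listener reads from the DOM.

```python
from __future__ import annotations

from typing import Callable, Optional


def candidates(el: dict) -> list[str]:
    """List candidate selectors from most stable (tier 1) to least (tier 6)."""
    tag = el.get("tag", "*")
    out = []
    # Tiers 1 and 2: stable attributes, then accessibility attributes
    for attr in ("data-testid", "id", "name", "aria-label", "placeholder", "title"):
        if el.get(attr):
            out.append(f'[{attr}="{el[attr]}"]')
    # Tier 3: href, links only
    if tag == "a" and el.get("href"):
        out.append(f'a[href="{el["href"]}"]')
    # Tier 4: parent with a stable attribute
    if el.get("parent_testid"):
        out.append(f'[data-testid="{el["parent_testid"]}"] {tag}')
    # Tier 5: associated label (inputs), expressed as XPath
    if tag == "input" and el.get("label"):
        out.append(f'//label[contains(text(),"{el["label"]}")]//input')
    # Tier 6: short CSS + nth-of-type
    if el.get("class") and el.get("nth"):
        out.append(f'{tag}.{el["class"]}:nth-of-type({el["nth"]})')
    return out


def pick_selector(el: dict, count_matches: Callable[[str], int]) -> Optional[str]:
    """Return the first candidate that matches exactly one element."""
    for selector in candidates(el):
        if count_matches(selector) == 1:
            return selector
    return None
```

In the browser, `count_matches` would correspond to something like the length of `document.querySelectorAll(selector)`; here it is injected so the tier logic can be exercised without a DOM.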


🤝 Contributors

| Contributor | Contribution |
|---|---|
| Julien Mer | Original author |
| @szwnba | Multi-provider LLM support (DeepSeek, Ollama) + CN translation |

Contributions are welcome: issues, bug reports, pull requests.


Tags: pytest plugin · playwright · playwright-python · ai testing · llm · openai · anthropic · deepseek · ollama · mistral · groq · self-healing tests · root cause analysis · test debugging · qa automation · gdpr · rgpd · data protection · multi-llm · open source qa


📄 License

MIT. Do whatever you want with it.


Created by Julien Mer, JMer Consulting

QA Architect · 20+ years experience · Katalon Top Partner Europe

Newsletter QA OPS LAB
