AI-powered diagnostic for failing Playwright tests: multi-LLM, GDPR-friendly, open source. Detects the root cause in seconds.
QA Autopilot: AI Diagnostic for Playwright Test Failures
pytest plugin: real-time AI diagnosis of Playwright test failures
Multi-LLM (OpenAI · Anthropic · Ollama · DeepSeek) · GDPR by default · Open source
Installation • Quick Start • Scorecard • How it works • Configuration
The problem
A Playwright test fails. The error message says:
TimeoutError: Page.click: Timeout 5000ms exceeded.
waiting for locator("a[href='/international/']")
The selector is correct. The element exists. So why isn't it working?
Because the cookie banner is covering everything. Or the element is inside an iframe. Or the DOM was reloaded via AJAX. Or the button is disabled. Or you used click() instead of dblclick().
QA Autopilot tells you the real cause in a single command.
Scorecard
Results on a suite of 7 trap tests specifically designed to fool diagnostic tools:
| Test | Trap | AI Diagnosis | Category | Confidence |
|---|---|---|---|---|
| Cookie overlay | Element covered by banner | Cookie banner blocks the click | element_obscured | 95% |
| Invisible iframe | Element inside iframe, searched in main frame | Missing iframe context | iframe_context | 95% |
| Stale AJAX | Locator captured before DOM reload | Stale reference after AJAX | stale_reference | 95% |
| Silent redirect | URL redirected 301/302 | Test PASSED (trap detected) | n/a | n/a |
| Disabled button | Element visible but disabled | Disabled attribute detected | element_disabled | 95% |
| Unicode regex | Zinedine vs Zinédine | Regex accent mismatch | encoding_mismatch | 95% |
| Double-click | Consent manager intercepts click | Overlay detected | element_obscured | 95% |
6/6 correct diagnoses at 95% confidence. The 7th test PASSED, so no diagnosis was needed.
Limitations
Caution: for tests over 200 lines, the context sent to the AI is intentionally truncated. An E2E test should stay short: one scenario, one responsibility, under 50 lines. Beyond that, it's a design problem, not a diagnostic problem. Refactor your tests before looking for the cause of a failure.
Installation
QA Autopilot is a native Python module, available directly on PyPI:
pip install qa-autopilot
That's it. No config, no server, no account. One line.
With automatic .env loading:
pip install qa-autopilot[dotenv]
Or from source:
git clone https://github.com/julienmerconsulting/qa-autopilot.git
cd qa-autopilot
pip install -e .
Prerequisites
playwright install chromium
.env configuration
Create a .env file at the root of your project:
# OpenAI (default)
OPENAI_API_KEY=sk-...
# Or DeepSeek
BASE_URL=https://api.deepseek.com
API_KEY=sk-...
QA_MODEL=deepseek-chat
# Or local Ollama (zero cost)
BASE_URL=http://localhost:11434/v1
API_KEY=ollama
QA_MODEL=llama3
Quick Start
pytest mode (recommended)
Add a single flag to your pytest command:
pytest tests/ --qa-autopilot -v
That's it. Every failing test gets an automatic AI diagnosis.
With HTML report
pytest tests/ --qa-autopilot --html=qa-reports/report.html --self-contained-html -v
Standalone mode
python -m qa_autopilot tests/test_checkout.py
python -m qa_autopilot tests/test_login.py::test_auth
python -m qa_autopilot tests/ -k "checkout" --headed
Direct import mode
from qa_autopilot import QAInterceptor
# Inside your test
interceptor = QAInterceptor(page)
interceptor.start()
# ... your test ...
# On failure
diagnosis = interceptor.diagnose(error_message, "test_file.py")
print(diagnosis["root_cause"])
print(diagnosis["category"])
How it works
YOUR PLAYWRIGHT TEST
    page.goto("https://example.com")
    page.click("#submit")              <- FAIL
    expect(page).to_have_url(...)
        |
        v
QA AUTOPILOT HOOK (listens in parallel)
        |
        +--> DOM listener (JS)
        +--> Network capture (req/res)
        +--> Console capture (err/warn)
        |
        v
CONTEXT BUNDLE: code + error + DOM + network + console + screenshot (opt)
        |
        v
ONE PROMPT -> AI (12 categories): diagnosis + fix
        |
        +--> Terminal output
        +--> JSON report
        +--> Jira (if bug)
5-step pipeline
- Transparent hook: plugs into the Playwright page via native events
- Parallel capture: DOM (injected JS listener), network, console, screenshots
- Failure detection: the pytest hook intercepts the FAILED status
- Bundle + prompt: all context goes out in ONE AI call
- Diagnosis: root cause + category + concrete fix + JSON report
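The "bundle + prompt" step can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual code: the `ContextBundle` class, its field names, and the section titles are hypothetical, and the 3000-character cap is borrowed from the "Data sent to the LLM" table further down.

```python
from dataclasses import dataclass, field

# Source-code cap quoted in the "Data sent to the LLM" table.
MAX_SOURCE_CHARS = 3000


@dataclass
class ContextBundle:
    """Hypothetical container for everything captured during one failing test."""

    test_source: str
    error_message: str
    dom_actions: list = field(default_factory=list)
    network_errors: list = field(default_factory=list)
    console_errors: list = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render all captured context as one string, so a single AI call suffices."""
        sections = [
            ("TEST CODE", self.test_source[:MAX_SOURCE_CHARS]),
            ("ERROR", self.error_message),
            ("CAPTURED DOM ACTIONS", "\n".join(self.dom_actions)),
            ("NETWORK ERRORS", "\n".join(self.network_errors)),
            ("CONSOLE ERRORS", "\n".join(self.console_errors)),
        ]
        # Empty sections are skipped to keep the prompt short.
        return "\n\n".join(f"{title}\n{body}" for title, body in sections if body)


bundle = ContextBundle(
    test_source='page.click("#submit")',
    error_message="TimeoutError: Page.click: Timeout 5000ms exceeded.",
    dom_actions=["CLICK -> #accept-cookies"],
)
prompt = bundle.to_prompt()
```

Keeping everything in one string is what makes the "ONE AI call" design possible: the model sees the code, the error, and the captured browser activity side by side.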
The 12 diagnosis categories
| Category | Description |
|---|---|
| wrong_selector | Broken, missing, or too-broad selector |
| missing_step | Missing step (cookies, goto, dropdown) |
| timing | Race condition, element not ready yet |
| element_obscured | Element covered by overlay/modal/banner |
| element_disabled | Element found but disabled |
| wrong_action | Wrong method (click vs dblclick, fill vs type) |
| iframe_context | Element searched in the wrong frame |
| encoding_mismatch | Unicode/accent/regex issue |
| stale_reference | Stale locator after DOM change |
| test_data | Assertion with wrong expected value |
| app_bug | Application bug (not the test); generates a Jira ticket |
| network | Failed network requests (4xx/5xx) |
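If you post-process diagnoses in your own tooling, it can help to pin the category names down in one place. The sketch below is an assumption about how a consumer might do that; the `Category` enum and both helper functions are hypothetical and not part of the qa-autopilot API. Only the twelve string values come from the table above.

```python
from enum import Enum
from typing import Optional


class Category(str, Enum):
    """The twelve category strings listed in the table above."""

    WRONG_SELECTOR = "wrong_selector"
    MISSING_STEP = "missing_step"
    TIMING = "timing"
    ELEMENT_OBSCURED = "element_obscured"
    ELEMENT_DISABLED = "element_disabled"
    WRONG_ACTION = "wrong_action"
    IFRAME_CONTEXT = "iframe_context"
    ENCODING_MISMATCH = "encoding_mismatch"
    STALE_REFERENCE = "stale_reference"
    TEST_DATA = "test_data"
    APP_BUG = "app_bug"
    NETWORK = "network"


def parse_category(raw: str) -> Optional[Category]:
    """Return the matching Category, or None if the model answered off-list."""
    try:
        return Category(raw.strip().lower())
    except ValueError:
        return None


def needs_jira_ticket(category: Category) -> bool:
    """Only application bugs produce a Jira ticket, per the table above."""
    return category is Category.APP_BUG
```

Treating an unknown string as `None` rather than raising keeps a report pipeline running even when an LLM returns a label outside the list.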
Configuration
Environment variables
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | (required if no API_KEY) | OpenAI API key |
| API_KEY | (optional) | Key for an alternative provider (DeepSeek, Ollama, ...) |
| BASE_URL | None (native OpenAI) | LLM provider base URL |
| QA_MODEL | gpt-4.1-mini | AI model to use |
| QA_SCREENSHOT | 0 | 1 to include screenshots in the prompt |
| QA_REPORT_DIR | qa-reports/ | Reports directory |
| QA_REDACT_INPUTS | 1 | Auto-redaction of sensitive fields (password, credit card, tokens, IBAN, ...). 0 to disable (not recommended). |
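To make the defaults concrete, here is a rough sketch of how these variables could be resolved into one settings dictionary. The function name `resolve_llm_config` and the returned keys are hypothetical, and the precedence of API_KEY over OPENAI_API_KEY is an assumption; only the variable names and default values come from the table.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-4.1-mini"  # default listed for QA_MODEL


def resolve_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve provider settings from environment-style variables."""
    env = os.environ if env is None else env
    # Assumption: an explicit API_KEY wins over OPENAI_API_KEY.
    api_key = env.get("API_KEY") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY or API_KEY")
    return {
        "api_key": api_key,
        "base_url": env.get("BASE_URL") or None,  # None means native OpenAI
        "model": env.get("QA_MODEL", DEFAULT_MODEL),
        "screenshot": env.get("QA_SCREENSHOT", "0") == "1",
        "report_dir": env.get("QA_REPORT_DIR", "qa-reports/"),
        "redact_inputs": env.get("QA_REDACT_INPUTS", "1") != "0",
    }


# Example: the local Ollama setup from the .env section.
config = resolve_llm_config(
    {"API_KEY": "ollama", "BASE_URL": "http://localhost:11434/v1", "QA_MODEL": "llama3"}
)
```

Passing the mapping explicitly, instead of always reading `os.environ`, makes the resolution easy to unit test.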
pytest arguments
pytest tests/ --qa-autopilot # Enable AI diagnosis
pytest tests/ --qa-autopilot --headed # With visible browser
pytest tests/ --qa-autopilot -k "login" # Filter by keyword
Report structure
qa-reports/
├── summary_20260223_014751.json # Consolidated run report
├── diag_test_broken_20260223_014713.json # Individual diagnosis
├── diag_test_broken_20260223_014659.json
├── jira_test_broken_20260223_014659.md # Jira ticket (if app_bug)
└── report.html # pytest HTML report
Consolidated report example
[
{
"test": "test_element_covered_by_overlay[chromium]",
"category": "element_obscured",
"confidence": 0.95,
"root_cause": "The target element is covered by the cookie banner",
"suggested_fix": "Close the cookie banner before clicking"
}
]
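Because the consolidated report is plain JSON, it is easy to post-process in CI. The snippet below is an illustrative sketch that assumes only the five fields shown in the example above; the `summarize` helper and its 0.8 confidence threshold are hypothetical choices, not part of the tool.

```python
import json
from collections import Counter

# Same shape as the consolidated report example above.
SUMMARY_JSON = """
[
  {
    "test": "test_element_covered_by_overlay[chromium]",
    "category": "element_obscured",
    "confidence": 0.95,
    "root_cause": "The target element is covered by the cookie banner",
    "suggested_fix": "Close the cookie banner before clicking"
  }
]
"""


def summarize(entries, min_confidence=0.8):
    """Count diagnoses per category and list tests diagnosed with low confidence."""
    counts = Counter(entry["category"] for entry in entries)
    low_confidence = [e["test"] for e in entries if e["confidence"] < min_confidence]
    return counts, low_confidence


entries = json.loads(SUMMARY_JSON)
counts, low_confidence = summarize(entries)
for category, total in counts.most_common():
    print(f"{category}: {total}")
```

In a real run you would read the `summary_*.json` file from the reports directory instead of the inline string.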
Security & GDPR
QA Autopilot automatically redacts sensitive data before any LLM call. This protection is enabled by default (QA_REDACT_INPUTS=1) and operates on two fronts:
1. Browser-side redaction (DOM listener)
Values typed into sensitive fields are intercepted inside the browser, in the saveEntry() function of the JS listener, and replaced with [REDACTED] before any storage. The real value never leaves the browser.
| Detection criteria | Examples |
|---|---|
| HTML type | password, email, tel |
| name/id contains | password, passwd, pwd, secret, token, cvv, card, ssn, auth, pin, api_key, credit, iban, bic, swift, client_secret |
| placeholder / aria-label contains | same patterns |
| autocomplete | current-password, new-password, cc-* (credit card) |
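The real check runs in JavaScript inside the browser; the Python sketch below only illustrates the decision logic implied by the table. The function names and the dictionary-based field representation are hypothetical, and plain substring matching is an assumption about how "contains" is implemented.

```python
SENSITIVE_TYPES = {"password", "email", "tel"}
SENSITIVE_PATTERNS = (
    "password", "passwd", "pwd", "secret", "token", "cvv", "card", "ssn",
    "auth", "pin", "api_key", "credit", "iban", "bic", "swift", "client_secret",
)
SENSITIVE_AUTOCOMPLETE = {"current-password", "new-password"}


def is_sensitive_field(attrs: dict) -> bool:
    """Apply the detection criteria from the table to a field's HTML attributes."""
    if attrs.get("type", "").lower() in SENSITIVE_TYPES:
        return True
    autocomplete = attrs.get("autocomplete", "").lower()
    if autocomplete in SENSITIVE_AUTOCOMPLETE or autocomplete.startswith("cc-"):
        return True
    # name, id, placeholder and aria-label all use the same substring patterns.
    for key in ("name", "id", "placeholder", "aria-label"):
        value = attrs.get(key, "").lower()
        if any(pattern in value for pattern in SENSITIVE_PATTERNS):
            return True
    return False


def recorded_value(attrs: dict, value: str) -> str:
    """Return what would be stored: the real value, or the redaction marker."""
    return "[REDACTED]" if is_sensitive_field(attrs) else value
```

Substring matching errs on the side of caution: a field with `id="author_bio"` is redacted because it contains `auth`, which is exactly the kind of false positive discussed under "Disabling redaction".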
2. Source code redaction (before LLM call)
The test .py file is also scanned and hardcoded credentials are redacted:
- page.fill("#password", "...") / .type() / .press_sequentially() / .input_value() on sensitive selectors
- Python variables: password = "...", token = "...", api_key = "...", client_secret = "...", access_token = "...", etc.
- os.environ["PASSWORD"] = "..." (direct assignment)
When a redaction is applied, qa-autopilot prints a warning:
⚠️ Hardcoded credentials detected in test_login.py, redacted before LLM call.
Best practice: use os.environ or pytest fixtures for secrets.
What does the LLM see?
CAPTURED DOM ACTIONS (3 total)
1. INPUT → #username = 'john.doe@example.com'
2. INPUT → #password = [REDACTED - sensitive field]
3. CLICK → button[type="submit"] (text: 'Login')
TEST CODE
def test_login(page):
    page.fill("#username", "john.doe@example.com")
    page.fill("#password", "[REDACTED]")
    page.click("button[type='submit']")
The LLM is explicitly informed in the prompt that [REDACTED] does not mean an empty or broken field, but a GDPR protection. The diagnosis is performed without knowing the real value.
How to verify redaction works?
# Run a test that types a password
pytest tests/test_login.py --qa-autopilot
# Check the JSON report: you should see [REDACTED] everywhere
grep -i "password\|REDACTED" qa-reports/diag_*.json
For the ultra-paranoid: intercept outgoing traffic with mitmproxy and verify that requests to api.openai.com never contain your real sensitive value.
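The grep check can also be automated, for example as a CI gate. The sketch below assumes only the `diag_*.json` naming shown in the report structure; the `find_leaks` helper is hypothetical, and the temporary directory with two fake reports exists purely to make the example runnable.

```python
import tempfile
from pathlib import Path


def find_leaks(report_dir, secret):
    """Return the names of diagnosis reports whose raw text contains `secret`."""
    leaks = []
    for path in sorted(Path(report_dir).glob("diag_*.json")):
        if secret in path.read_text(encoding="utf-8"):
            leaks.append(path.name)
    return leaks


# Demonstration with two fake reports: one clean, one leaking the test password.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "diag_test_login_1.json").write_text(
        '{"value": "[REDACTED]"}', encoding="utf-8"
    )
    Path(tmp, "diag_test_login_2.json").write_text(
        '{"value": "hunter2"}', encoding="utf-8"
    )
    leaks = find_leaks(tmp, "hunter2")

print(leaks)
```

Note that this compares raw text: a secret containing quotes, backslashes, or non-ASCII characters may be JSON-escaped in the report, so choose a plain alphanumeric test value when using this check.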
Data sent to the LLM
| Data | Sent | Redactable |
|---|---|---|
| Test source code (3000 chars max) | yes | yes, by default (regex on fill/assign/env) |
| Playwright error message | yes | no |
| Element selectors | yes | no |
| Input values (DOM listener) | yes | yes, by default (6-criteria cascade) |
| Page URL | yes | no |
| Console errors | yes | no |
| 4xx/5xx request bodies (truncated) | yes | no |
| Screenshots | only if QA_SCREENSHOT=1 | n/a |
Disabling redaction (not recommended)
For the rare cases where the content of a "sensitive" field is legitimately useful to see (false positive on a field name containing auth but not actually authentication-related):
QA_REDACT_INPUTS=0 pytest tests/ --qa-autopilot
Warning: use only with fictional test data. When in doubt, keep redaction enabled. Redaction is not an excuse to hardcode credentials: it doesn't catch every exotic case (variables with invented names, concatenated values, etc.). The golden rule remains: never hardcode secrets.
Architecture
qa-autopilot/
├── qa_autopilot/
│   ├── __init__.py # Public exports
│   ├── core.py # QAInterceptor + capture
│   ├── prompt.py # Prompt v2 (12 categories)
│   ├── diagnose.py # AI call + retry + JSON mode
│   ├── reporter.py # JSON + Jira markdown reports
│   ├── listener.js # DOM listener (browser injection)
│   └── plugin.py # pytest hooks
├── tests/
│   ├── test_traps.py # Trap test suite
│   └── conftest.py
├── examples/
│   └── standalone.py # Direct usage example
├── pyproject.toml
├── LICENSE
└── README.md
Note: The current version is a monolithic qa_autopilot.py file (~600 lines). The structure above is the target for v2.
Why not the alternatives?
| | QA Autopilot | Playwright MCP (23K lines) | SaaS (Testim, Mabl...) |
|---|---|---|---|
| Lines of code | ~600 | 23,000+ | Closed |
| Installation | pip install | MCP server + config | Account + license |
| Config | 1 flag | 32 MCP tools | Dashboard + integration |
| Price | Free + OpenAI key | Free | $200-500/month/user |
| Diagnosis | 12 categories, 95% | Basic | Variable |
| Vendor lock-in | None | MCP protocol | Total |
DOM Listener: 6-tier cascade
The JavaScript listener injected into the browser uses a 6-level selector cascade, from most stable to least stable:
| Tier | Strategy | Example |
|---|---|---|
| 1 | data-testid / id / name | [data-testid="submit-btn"] |
| 2 | aria-label / placeholder / title | [aria-label="Close"] |
| 3 | href (links) | a[href="/checkout"] |
| 4 | Parent with stable attribute | [data-testid="form"] button |
| 5 | Associated label (inputs) | //label[contains(text(),"Email")]//input |
| 6 | Short CSS + nth-of-type | button.primary:nth-of-type(2) |
Every selector is validated for uniqueness in the DOM. Shadow DOM support included.
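The cascade itself lives in the injected JavaScript listener; the Python sketch below only mirrors the tier order from the table to show how a fallback chain of this kind works. The element representation (a dictionary with `tag`, `attrs`, `parent_attrs`, `label`, `classes`, `index`) and both function names are hypothetical, and the uniqueness check and Shadow DOM handling are left out.

```python
def _attr_selector(attrs, names):
    """Return an attribute selector for the first attribute in `names` that is set."""
    for name in names:
        value = attrs.get(name)
        if value:
            return f'[{name}="{value}"]'
    return None


def pick_selector(el: dict) -> str:
    """Walk the six tiers in order and return the first selector that applies."""
    attrs = el.get("attrs", {})
    tag = el.get("tag", "*")
    # Tiers 1 and 2: stable attributes on the element itself.
    for names in (("data-testid", "id", "name"), ("aria-label", "placeholder", "title")):
        selector = _attr_selector(attrs, names)
        if selector:
            return selector
    # Tier 3: links identified by their href.
    if tag == "a" and attrs.get("href"):
        return f'a[href="{attrs["href"]}"]'
    # Tier 4: a parent carrying a stable attribute.
    parent = _attr_selector(el.get("parent_attrs", {}), ("data-testid", "id", "name"))
    if parent:
        return f"{parent} {tag}"
    # Tier 5: inputs located through their associated label text.
    if tag == "input" and el.get("label"):
        return f'//label[contains(text(),"{el["label"]}")]//input'
    # Tier 6: short CSS plus position, the least stable option.
    classes = "".join(f".{name}" for name in el.get("classes", []))
    return f'{tag}{classes}:nth-of-type({el.get("index", 1)})'


print(pick_selector({"tag": "button", "attrs": {"data-testid": "submit-btn"}}))
print(pick_selector({"tag": "button", "attrs": {}, "classes": ["primary"], "index": 2}))
```

Ordering the tiers from most to least stable means a positional selector is produced only when nothing more durable is available.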
Contributors
| Contributor | Contribution |
|---|---|
| Julien Mer | Original author |
| @szwnba | Multi-provider LLM support (DeepSeek, Ollama) + CN translation |
Contributions are welcome: issues, bug reports, pull requests.
Tags: pytest plugin · playwright · playwright-python · ai testing · llm · openai · anthropic · deepseek · ollama · mistral · groq · self-healing tests · root cause analysis · test debugging · qa automation · gdpr · rgpd · data protection · multi-llm · open source qa
License
MIT. Do whatever you want with it.
Created by Julien Mer, JMer Consulting
QA Architect · 20+ years experience · Katalon Top Partner Europe