AI-powered diagnostic for failing Playwright tests: multi-LLM, GDPR-friendly, open source. Detects root cause in seconds.

🚀 QA Autopilot — AI Diagnostic for Playwright Test Failures

pytest plugin — Real-time AI diagnosis of Playwright test failures

Multi-LLM (OpenAI · Anthropic · Ollama · DeepSeek) · GDPR by default · Open source

Installation • Quick Start • Scorecard • How it works • Configuration

🎯 The problem

A Playwright test fails. The error message says:

TimeoutError: Page.click: Timeout 5000ms exceeded.
waiting for locator("a[href='/international/']")

The selector is correct. The element exists. So why isn't it working?

Because the cookie banner is covering everything. Or the element is inside an iframe. Or the DOM was reloaded via AJAX. Or the button is disabled. Or you used click() instead of dblclick().

QA Autopilot tells you the real cause in a single command.

📊 Scorecard

Results on a suite of 7 trap tests specifically designed to fool diagnostic tools:

Test	Trap	AI Diagnosis	Category	Confidence
🫣 Cookie overlay	Element covered by banner	✅ Cookie banner blocks the click	`element_obscured`	🟢 95%
🖼️ Invisible iframe	Element inside iframe, searched in main frame	✅ Missing iframe context	`iframe_context`	🟢 95%
👻 Stale AJAX	Locator captured before DOM reload	✅ Stale reference after AJAX	`stale_reference`	🟢 95%
↪️ Silent redirect	URL redirected 301/302	✅ Test PASSED (trap detected)	—	✅
🚫 Disabled button	Element visible but disabled	✅ Disabled attribute detected	`element_disabled`	🟢 95%
🔤 Unicode regex	`Zinedine` vs `Zinédine`	✅ Regex accent mismatch	`encoding_mismatch`	🟢 95%
🫣 Double-click	Consent manager intercepts click	✅ Overlay detected	`element_obscured`	🟢 95%

6/6 correct diagnoses at 95% confidence — the 7th test PASSED (no diagnosis needed).

⚠️ Limitations

Caution: for tests over 200 lines, the context sent to the AI is intentionally truncated. An E2E test should stay short: one scenario, one responsibility, under 50 lines. Beyond that, it's a design problem, not a diagnostic problem. Refactor your tests before looking for the cause of a failure.

📦 Installation

QA Autopilot is a native Python module — available directly on PyPI:

pip install qa-autopilot

That's it. No server, no account, one line. The only setup left is an API key for your LLM provider.

With automatic .env loading:

pip install "qa-autopilot[dotenv]"

Or from source:

git clone https://github.com/julienmerconsulting/qa-autopilot.git
cd qa-autopilot
pip install -e .

Prerequisites

playwright install chromium

`.env` configuration

Create a .env file at the root of your project:

# Pick one provider block; the alternatives are commented out.

# OpenAI (default)
OPENAI_API_KEY=sk-...

# Or DeepSeek
# BASE_URL=https://api.deepseek.com
# API_KEY=sk-...
# QA_MODEL=deepseek-chat

# Or local Ollama (zero cost)
# BASE_URL=http://localhost:11434/v1
# API_KEY=ollama
# QA_MODEL=llama3

⚡ Quick Start

pytest mode (recommended)

Add a single flag to your pytest command:

pytest tests/ --qa-autopilot -v

That's it. Every failing test gets an automatic AI diagnosis.

With HTML report

pytest tests/ --qa-autopilot --html=qa-reports/report.html --self-contained-html -v

Standalone mode

python -m qa_autopilot tests/test_checkout.py
python -m qa_autopilot tests/test_login.py::test_auth
python -m qa_autopilot tests/ -k "checkout" --headed

Direct import mode

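from qa_autopilot import QAInterceptor

def test_submit(page):
    # Start capturing DOM, network and console events for this page
    interceptor = QAInterceptor(page)
    interceptor.start()

    try:
        # ... your test ...
        page.click("#submit")
    except Exception as exc:
        # On failure: pass the error text and the test file to the diagnosis
        diagnosis = interceptor.diagnose(str(exc), "test_file.py")
        print(diagnosis["root_cause"])
        print(diagnosis["category"])
        raise  # keep the test failing after the diagnosis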

🔍 How it works

┌─────────────────────────────────────────────────────┐
│                YOUR PLAYWRIGHT TEST                  │
│                                                      │
│   page.goto("https://example.com")                  │
│   page.click("#submit")           ← FAIL            │
│   expect(page).to_have_url(...)                     │
└──────────────────────┬──────────────────────────────┘
                       │
         ┌─────────────▼─────────────┐
         │    QA AUTOPILOT HOOK      │
         │   (listens in parallel)   │
         └─────────────┬─────────────┘
                       │
    ┌──────────────────┼──────────────────┐
    ▼                  ▼                  ▼
┌────────┐      ┌──────────┐      ┌───────────┐
│  DOM   │      │ NETWORK  │      │ CONSOLE   │
│Listener│      │ Capture  │      │ Capture   │
│  (JS)  │      │ req/res  │      │ err/warn  │
└───┬────┘      └────┬─────┘      └─────┬─────┘
    │                │                   │
    └────────────────┼───────────────────┘
                     │
         ┌───────────▼───────────┐
         │   CONTEXT BUNDLE      │
         │  code + error + DOM   │
         │  + network + console  │
         │  + screenshot (opt)   │
         └───────────┬───────────┘
                     │
         ┌───────────▼───────────┐
         │    ONE PROMPT → AI    │
         │   (12 categories)     │
         │   diagnosis + fix     │
         └───────────┬───────────┘
                     │
    ┌────────────────┼────────────────┐
    ▼                ▼                ▼
┌────────┐    ┌───────────┐    ┌──────────┐
│Terminal│    │   JSON    │    │  Jira    │
│ Output │    │  Report   │    │ (if bug) │
└────────┘    └───────────┘    └──────────┘

5-step pipeline

1. Transparent hook: plugs into the Playwright page via native events
2. Parallel capture: DOM (injected JS listener), network, console, screenshots
3. Failure detection: the pytest hook intercepts the FAILED status
4. Bundle + Prompt: all context goes out in ONE AI call
5. Diagnosis: root cause + category + concrete fix + JSON report
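
To make the "Bundle + Prompt" step concrete, here is a minimal sketch of how captured context could be folded into a single prompt string. The names `ContextBundle`, `_section`, and `build_prompt` are hypothetical and are not taken from the qa-autopilot source; only the section titles mirror the output shown elsewhere in this README.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """Hypothetical container for everything captured during one test."""
    test_code: str
    error: str
    dom_actions: list[str] = field(default_factory=list)
    network_failures: list[str] = field(default_factory=list)
    console_errors: list[str] = field(default_factory=list)


def _section(title: str, lines: list[str]) -> str:
    """Render one titled, numbered section; empty captures are stated explicitly."""
    body = "\n".join(f"  {i}. {line}" for i, line in enumerate(lines, 1))
    return f"{title} ({len(lines)} total)\n{body or '  (none)'}"


def build_prompt(bundle: ContextBundle) -> str:
    """Fold the whole bundle into one prompt string, so one AI call suffices."""
    return "\n\n".join([
        f"ERROR\n{bundle.error}",
        _section("CAPTURED DOM ACTIONS", bundle.dom_actions),
        _section("FAILED NETWORK REQUESTS", bundle.network_failures),
        _section("CONSOLE ERRORS", bundle.console_errors),
        f"TEST CODE\n{bundle.test_code}",
    ])
```

Keeping every signal in one string is what allows a single request per failing test; the real prompt layout may differ.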

🏷️ The 12 diagnosis categories

Icon	Category	Description
🎯	`wrong_selector`	Broken, missing, or too-broad selector
⏭️	`missing_step`	Missing step (cookies, goto, dropdown)
⏱️	`timing`	Race condition, element not ready yet
🫣	`element_obscured`	Element covered by overlay/modal/banner
🚫	`element_disabled`	Element found but disabled
🔀	`wrong_action`	Wrong method (click vs dblclick, fill vs type)
🖼️	`iframe_context`	Element searched in the wrong frame
🔤	`encoding_mismatch`	Unicode/accent/regex issue
👻	`stale_reference`	Stale locator after DOM change
📊	`test_data`	Assertion with wrong expected value
🐛	`app_bug`	Application bug (not the test) → generates a Jira ticket
🌐	`network`	Failed network requests (4xx/5xx)
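
If you consume the JSON reports from your own tooling, it can help to pin the category names in one place. The sketch below is an assumption about how a consumer might do that; `Category` and `needs_jira_ticket` are hypothetical names, and only the twelve string values come from the table above.

```python
from enum import Enum


class Category(str, Enum):
    """The 12 category identifiers listed in the table above."""
    WRONG_SELECTOR = "wrong_selector"
    MISSING_STEP = "missing_step"
    TIMING = "timing"
    ELEMENT_OBSCURED = "element_obscured"
    ELEMENT_DISABLED = "element_disabled"
    WRONG_ACTION = "wrong_action"
    IFRAME_CONTEXT = "iframe_context"
    ENCODING_MISMATCH = "encoding_mismatch"
    STALE_REFERENCE = "stale_reference"
    TEST_DATA = "test_data"
    APP_BUG = "app_bug"
    NETWORK = "network"


def needs_jira_ticket(diagnosis: dict) -> bool:
    """True only for app_bug; raises ValueError on an unknown category."""
    category = Category(diagnosis["category"])
    return category is Category.APP_BUG
```

Parsing through the enum means a misspelled or unexpected category fails loudly instead of being silently ignored.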

⚙️ Configuration

Environment variables

Variable	Default	Description
`OPENAI_API_KEY`	(required if no API_KEY)	OpenAI API key
`API_KEY`	(optional)	Key for alternative provider (DeepSeek, Ollama…)
`BASE_URL`	`None` (native OpenAI)	LLM provider base URL
`QA_MODEL`	`gpt-4.1-mini`	AI model to use
`QA_SCREENSHOT`	`0`	`1` to include screenshots in the prompt
`QA_REPORT_DIR`	`qa-reports/`	Reports directory
`QA_REDACT_INPUTS`	`1`	Auto-redaction of sensitive fields (password, credit card, tokens, IBAN…). `0` to disable (not recommended).
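
As an illustration of how the variables and defaults in this table fit together, here is a small sketch that resolves them from a mapping such as `os.environ`. `LLMSettings` and `load_settings` are hypothetical names, and the precedence of `API_KEY` over `OPENAI_API_KEY` is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: Optional[str]
    model: str
    screenshots: bool
    report_dir: str
    redact_inputs: bool


def load_settings(env: Mapping[str, str]) -> LLMSettings:
    """Resolve settings using the defaults listed in the table above."""
    # Assumption: API_KEY wins over OPENAI_API_KEY when both are set.
    api_key = env.get("API_KEY") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY or API_KEY")
    return LLMSettings(
        api_key=api_key,
        base_url=env.get("BASE_URL") or None,  # None means native OpenAI
        model=env.get("QA_MODEL", "gpt-4.1-mini"),
        screenshots=env.get("QA_SCREENSHOT", "0") == "1",
        report_dir=env.get("QA_REPORT_DIR", "qa-reports/"),
        redact_inputs=env.get("QA_REDACT_INPUTS", "1") != "0",
    )
```

Calling `load_settings(os.environ)` would give one immutable object to pass around instead of scattered environment lookups.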

pytest arguments

pytest tests/ --qa-autopilot          # Enable AI diagnosis
pytest tests/ --qa-autopilot --headed # With visible browser
pytest tests/ --qa-autopilot -k "login" # Filter by keyword

📁 Report structure

qa-reports/
├── summary_20260223_014751.json          # Consolidated run report
├── diag_test_broken_20260223_014713.json # Individual diagnosis
├── diag_test_broken_20260223_014659.json
├── jira_test_broken_20260223_014659.md   # Jira ticket (if app_bug)
└── report.html                           # pytest HTML report

Consolidated report example

[
  {
    "test": "test_element_covered_by_overlay[chromium]",
    "category": "element_obscured",
    "confidence": 0.95,
    "root_cause": "The target element is covered by the cookie banner",
    "suggested_fix": "Close the cookie banner before clicking"
  }
]
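
A report in this shape is easy to post-process, for example in CI. The sketch below assumes the summary file is a JSON array with exactly the fields shown above; `summarize` and the 0.8 threshold are illustrative choices, not part of the tool.

```python
import json
from collections import Counter


def summarize(report_json: str, min_confidence: float = 0.8):
    """Count diagnoses per category and list tests diagnosed with low confidence."""
    entries = json.loads(report_json)
    per_category = Counter(entry["category"] for entry in entries)
    uncertain = [e["test"] for e in entries if e["confidence"] < min_confidence]
    return per_category, uncertain
```

The per-category counts show which kind of failure dominates a run, and the low-confidence list points at diagnoses worth a manual look.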

🔒 Security & GDPR

QA Autopilot automatically redacts sensitive data before any LLM call. This protection is enabled by default (QA_REDACT_INPUTS=1) and operates on two fronts:

1. Browser-side redaction (DOM listener)

Values typed into sensitive fields are intercepted inside the browser, in the saveEntry() function of the JS listener, and replaced with [REDACTED] before any storage. The real value never leaves the browser.

Detection criteria	Examples
HTML `type`	`password`, `email`, `tel`
`name`/`id` contains	`password`, `passwd`, `pwd`, `secret`, `token`, `cvv`, `card`, `ssn`, `auth`, `pin`, `api_key`, `credit`, `iban`, `bic`, `swift`, `client_secret`
`placeholder` / `aria-label` contains	same patterns
`autocomplete`	`current-password`, `new-password`, `cc-*` (credit card)
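
The real check runs in the injected JavaScript listener; the Python sketch below only re-expresses the criteria from the table so the matching rules are easier to reason about. `is_sensitive` and `record_value` are hypothetical names, and plain substring matching is an assumption about how "contains" is implemented.

```python
import re

SENSITIVE_TYPES = {"password", "email", "tel"}
SENSITIVE_AUTOCOMPLETE = {"current-password", "new-password"}
SENSITIVE_PATTERN = re.compile(
    r"password|passwd|pwd|secret|token|cvv|card|ssn|auth|pin|api_key"
    r"|credit|iban|bic|swift|client_secret",
    re.IGNORECASE,
)


def is_sensitive(attrs: dict) -> bool:
    """Apply the detection criteria to a dict of HTML attribute strings."""
    if attrs.get("type", "").lower() in SENSITIVE_TYPES:
        return True
    autocomplete = attrs.get("autocomplete", "").lower()
    if autocomplete in SENSITIVE_AUTOCOMPLETE or autocomplete.startswith("cc-"):
        return True
    # name, id, placeholder and aria-label share the same substring patterns
    for key in ("name", "id", "placeholder", "aria-label"):
        if SENSITIVE_PATTERN.search(attrs.get(key, "")):
            return True
    return False


def record_value(attrs: dict, value: str) -> str:
    """Return what would be stored: the value itself or a redaction marker."""
    return "[REDACTED]" if is_sensitive(attrs) else value
```

Substring matching errs on the side of over-redaction: under this sketch a field with `id="author_name"` is treated as sensitive because it contains `auth`.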

2. Source code redaction (before LLM call)

The test .py file is also scanned and hardcoded credentials are redacted:

page.fill("#password", "...") / .type() / .press_sequentially() / .input_value() on sensitive selectors
Python variables: password = "...", token = "...", api_key = "...", client_secret = "...", access_token = "...", etc.
os.environ["PASSWORD"] = "..." (direct assignment)

When a redaction is applied, qa-autopilot prints a warning:

⚠️  Hardcoded credentials detected in test_login.py, redacted before LLM call.
    Best practice: use os.environ or pytest fixtures for secrets.
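
The following is a rough sketch of regex-based source redaction along the lines described above, not the tool's actual patterns. `redact_source`, `FILL_CALL`, `ASSIGNMENT`, and the shortened keyword list are all assumptions made for illustration; the sketch handles only simple selectors and single-line string literals.

```python
import re

# Shortened keyword list, for illustration only
_NAMES = r"(?:password|passwd|pwd|secret|token|api_key)"

# .fill("#password", "value") and similar calls on a sensitive selector
FILL_CALL = re.compile(
    r"""(\.(?:fill|type|press_sequentially)\(\s*"""  # method and open paren
    r"""(["'])[^"']*""" + _NAMES + r"""[^"']*\2"""   # quoted sensitive selector
    r"""\s*,\s*)"""                                   # comma, end of group 1
    r"""(["']).*?\3""",                               # the literal value
    re.IGNORECASE,
)

# password = "value" and os.environ["PASSWORD"] = "value"
ASSIGNMENT = re.compile(
    r"""(\b\w*""" + _NAMES + r"""\w*(?:["']\])?\s*=\s*)(["']).*?\2""",
    re.IGNORECASE,
)


def redact_source(code: str) -> tuple[str, bool]:
    """Return the redacted source and whether anything was replaced."""
    redacted = FILL_CALL.sub(r'\g<1>"[REDACTED]"', code)
    redacted = ASSIGNMENT.sub(r'\g<1>"[REDACTED]"', redacted)
    return redacted, redacted != code
```

The boolean in the return value is the kind of signal that could trigger the warning shown above. As with any regex approach, secrets built by concatenation or stored under unusual names slip through.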

What does the LLM see?

CAPTURED DOM ACTIONS (3 total)
  1. INPUT ✅ #username = 'john.doe@example.com'
  2. INPUT ✅ #password = [REDACTED — sensitive field]
  3. CLICK ✅ button[type="submit"] (text: 'Login')

TEST CODE
def test_login(page):
    page.fill("#username", "john.doe@example.com")
    page.fill("#password", "[REDACTED]")
    page.click("button[type='submit']")

The LLM is explicitly informed in the prompt that [REDACTED] does not mean an empty or broken field, but a GDPR protection. The diagnosis is performed without knowing the real value.

How to verify redaction works?

# Run a test that types a password
pytest tests/test_login.py --qa-autopilot

# Check the JSON report — you should see [REDACTED] everywhere
grep -i "password\|REDACTED" qa-reports/diag_*.json

For the ultra-paranoid: intercept outgoing traffic with mitmproxy and verify that requests to api.openai.com never contain your real sensitive value.
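
The grep check can also be automated, for instance as a CI step that fails when a known test secret shows up in a report. This is a minimal sketch: `find_leaks` is a hypothetical helper, and it assumes the `diag_*.json` naming shown in the report structure. It only finds verbatim matches, so a value that JSON-escapes differently would be missed.

```python
from pathlib import Path


def find_leaks(report_dir: str, secrets: list[str]) -> list[tuple[str, str]]:
    """Return (file name, secret) pairs for every secret found verbatim."""
    leaks = []
    for path in sorted(Path(report_dir).glob("diag_*.json")):
        text = path.read_text(encoding="utf-8")
        leaks.extend((path.name, s) for s in secrets if s in text)
    return leaks
```

An empty result for `find_leaks("qa-reports", ["<your test password>"])` is what you want to see after a run with redaction enabled.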

Data sent to the LLM

Data	Sent	Redactable
Test source code (3000 chars max)	✅	✅ by default (regex on fill/assign/env)
Playwright error message	✅	❌
Element selectors	✅	❌
Input values (DOM listener)	✅	✅ by default (6-criteria cascade)
Page URL	✅	❌
Console errors	✅	❌
4xx/5xx request bodies (truncated)	✅	❌
Screenshots	only if `QA_SCREENSHOT=1`	n/a

Disabling redaction (not recommended)

For the rare cases where the content of a "sensitive" field is legitimately useful to see (false positive on a field name containing auth but not actually authentication-related):

QA_REDACT_INPUTS=0 pytest tests/ --qa-autopilot

⚠️ Use only with fictional test data. When in doubt, keep redaction enabled. Redaction is not an excuse to hardcode credentials: it doesn't catch every exotic case (variables with invented names, concatenated values, etc.). The golden rule remains: never hardcode secrets.

🏗️ Architecture

qa-autopilot/
├── qa_autopilot/
│   ├── __init__.py          # Public exports
│   ├── core.py              # QAInterceptor + capture
│   ├── prompt.py            # Prompt v2 (12 categories)
│   ├── diagnose.py          # AI call + retry + JSON mode
│   ├── reporter.py          # JSON + Jira markdown reports
│   ├── listener.js          # DOM listener (browser injection)
│   └── plugin.py            # pytest hooks
├── tests/
│   ├── test_traps.py        # Trap test suite
│   └── conftest.py
├── examples/
│   └── standalone.py        # Direct usage example
├── pyproject.toml
├── LICENSE
└── README.md

Note: The current version is a monolithic qa_autopilot.py file (~600 lines). The structure above is the target for v2.

🆚 Why not the alternatives?

	QA Autopilot	Playwright MCP (23K lines)	SaaS (Testim, Mabl...)
Lines of code	~600	23,000+	Closed
Installation	`pip install`	MCP server + config	Account + license
Config	1 flag	32 MCP tools	Dashboard + integration
Price	Free + LLM API key (or local Ollama)	Free	$200-500/month/user
Diagnosis	12 categories, 95% reported confidence on the trap suite	Basic	Variable
Vendor lock-in	None	MCP protocol	Total

🛠️ DOM Listener — 6-tier cascade

The JavaScript listener injected into the browser uses a 6-level selector cascade, from most stable to least stable:

Tier	Strategy	Example
1	`data-testid` / `id` / `name`	`[data-testid="submit-btn"]`
2	`aria-label` / `placeholder` / `title`	`[aria-label="Close"]`
3	`href` (links)	`a[href="/checkout"]`
4	Parent with stable attribute	`[data-testid="form"] button`
5	Associated label (inputs)	`//label[contains(text(),"Email")]//input`
6	Short CSS + `nth-of-type`	`button.primary:nth-of-type(2)`

Every selector is validated for uniqueness in the DOM. Shadow DOM support included.
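
The cascade itself lives in the JavaScript listener. The Python sketch below only mirrors the tier order from the table, assuming the element is described by a plain dict and that uniqueness is checked through a caller-supplied counting function. `candidate_selectors`, `pick_selector`, and the dict keys `parent_selector`, `label_text`, `css_class`, and `nth` are hypothetical.

```python
from typing import Callable, Optional


def candidate_selectors(el: dict) -> list[str]:
    """List candidate selectors from most stable (tier 1) to least (tier 6)."""
    tag = el.get("tag", "*")
    out = []
    # Tier 1 and tier 2: direct attributes
    out += [f'[{a}="{el[a]}"]' for a in ("data-testid", "id", "name") if el.get(a)]
    out += [f'[{a}="{el[a]}"]' for a in ("aria-label", "placeholder", "title") if el.get(a)]
    # Tier 3: links by href
    if tag == "a" and el.get("href"):
        out.append(f'a[href="{el["href"]}"]')
    # Tier 4: parent with a stable attribute
    if el.get("parent_selector"):
        out.append(f'{el["parent_selector"]} {tag}')
    # Tier 5: associated label (inputs), expressed as XPath
    if el.get("label_text"):
        out.append(f'//label[contains(text(),"{el["label_text"]}")]//input')
    # Tier 6: short CSS plus position
    if el.get("nth"):
        cls = f'.{el["css_class"]}' if el.get("css_class") else ""
        out.append(f"{tag}{cls}:nth-of-type({el['nth']})")
    return out


def pick_selector(el: dict, count_matches: Callable[[str], int]) -> Optional[str]:
    """Return the first candidate that matches exactly one element, else None."""
    for selector in candidate_selectors(el):
        if count_matches(selector) == 1:
            return selector
    return None
```

Walking the tiers in order and stopping at the first unique match is what keeps recorded selectors as stable as the page allows; in a browser, `count_matches` would be backed by a DOM query.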

🤝 Contributors

Contributor	Contribution
Julien Mer	Original author
@szwnba	Multi-provider LLM support (DeepSeek, Ollama) + CN translation

Contributions are welcome — issues, bug reports, pull requests.

Tags: pytest plugin · playwright · playwright-python · ai testing · llm · openai · anthropic · deepseek · ollama · mistral · groq · self-healing tests · root cause analysis · test debugging · qa automation · gdpr · rgpd · data protection · multi-llm · open source qa

📄 License

MIT — Do whatever you want with it.

Created by Julien Mer — JMer Consulting

QA Architect · 20+ years experience · Katalon Top Partner Europe

QA OPS LAB
