Your AI QA crew — evaluates products like real users would, right from your coding session
Preflight
External-experience AI QA system. Evaluates shipped products like real users would — through the UI only, no code inspection — and produces evidence-backed findings plus repair briefs for coding agents (Claude Code, Codex, Cursor).
What It Does
- Understands your product — infers purpose, audience, and critical flows from visible surfaces + optional repo analysis
- Generates realistic test personas — dynamically creates a team of user agents tailored to your product (4-6 agents, capped for speed)
- Evaluates through the UI — runs web (Playwright) and mobile (Playwright emulation / Maestro) interactions as real users would
- Applies specialist lenses in parallel — design critique, trust assessment, auth/login flow review, mobile responsiveness check, institutional/governance review
- Deduplicates aggressively — two-pass dedup using error signatures (fast) then LLM semantic clustering
- Groups related issues — clusters findings by category and product area into issue groups
- Produces actionable reports — interactive HTML report with clickable severity cards, inline screenshot galleries, search, and score explanations
- Generates developer handoffs — prioritized tasks with fix options, file mapping, dependency graphs, and verification steps
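The signature pass and the grouping step described above can be sketched roughly as follows. This is a minimal illustration, not Preflight's actual implementation: the finding fields (`category`, `area`, `selector`, `message`) and the functions `error_signature`, `signature_pass`, and `group_by_area` are hypothetical names chosen for the example, and the LLM semantic clustering pass is left out entirely.

```python
import hashlib
import re
from collections import defaultdict


def error_signature(category: str, selector: str, message: str) -> str:
    """Hypothetical signature: mask volatile numbers, then hash the rest."""
    normalised = re.sub(r"\d+", "N", message.lower()).strip()
    raw = f"{category}|{selector}|{normalised}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


def signature_pass(findings: list[dict]) -> list[dict]:
    """Fast first pass: keep one finding per signature and count the duplicates."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        sig = error_signature(f["category"], f["selector"], f["message"])
        buckets[sig].append(f)
    return [{**group[0], "duplicates": len(group) - 1} for group in buckets.values()]


def group_by_area(findings: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Cluster findings into issue groups keyed by (category, product area)."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for f in findings:
        groups[(f["category"], f["area"])].append(f)
    return dict(groups)


findings = [
    {"category": "performance", "area": "search", "selector": "#results",
     "message": "Request timed out after 3000 ms"},
    {"category": "performance", "area": "search", "selector": "#results",
     "message": "Request timed out after 5200 ms"},
    {"category": "accessibility", "area": "login", "selector": "input[name=email]",
     "message": "Missing label"},
]
unique = signature_pass(findings)
groups = group_by_area(unique)
```

Masking digits before hashing lets the two timeout findings collapse into one even though their measured durations differ; a semantic pass would then only need to look at the survivors.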
Quick Start
# Install from PyPI
pip install preflight-qa
playwright install chromium
# Quick check (~1-2 min) — fast single-pass evaluation
preflight check https://your-product.com
# Full evaluation with repo context
preflight run https://your-product.com --repo https://github.com/user/repo
# Interactive mode (prompts for everything)
preflight
You'll need a Google API key (free from aistudio.google.com):
export GOOGLE_API_KEY=your-key-here
More Examples
# Quick check with focus
preflight check https://your-product.com --focus "login flow"
# Full run with options
preflight run https://your-product.com \
--brief "B2B SaaS dashboard for financial analytics" \
--credentials '{"email": "test@example.com", "password": "test123"}' \
--focus "onboarding,search,export" \
--output ./my-report
# Generate handoff from existing run
preflight handoff ./artifacts --format claude-code
# Compare runs for regressions
preflight compare ./baseline ./current
# Export issues to GitHub
preflight export-issues --repo user/repo --run ./artifacts
# Schedule overnight runs
preflight schedule https://your-product.com --cron "0 2 * * *"
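As a rough idea of what a run-to-run regression comparison involves, the sketch below diffs two lists of issues. It does not reflect how `preflight compare` is implemented or how its report files are structured: the `id` field and the `compare_runs` function are assumptions made for illustration.

```python
def compare_runs(baseline: list[dict], current: list[dict]) -> dict[str, list[str]]:
    """Classify issue ids as new, resolved, or persisting between two runs."""
    base_ids = {issue["id"] for issue in baseline}
    cur_ids = {issue["id"] for issue in current}
    return {
        "new": sorted(cur_ids - base_ids),          # regressions introduced since baseline
        "resolved": sorted(base_ids - cur_ids),     # issues that no longer appear
        "persisting": sorted(cur_ids & base_ids),   # issues present in both runs
    }


baseline_run = [{"id": "login-missing-label"}, {"id": "search-timeout"}]
current_run = [{"id": "search-timeout"}, {"id": "export-broken-link"}]
diff = compare_runs(baseline_run, current_run)
```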
Configuration
Create preflight.yaml or pass options via CLI:
target:
  url: https://your-product.com
  credentials:
    email: test@example.com
    password: test123
options:
  brief: "Financial analytics dashboard"
  focus_flows:
    - onboarding
    - search
    - export
  personas_hint: "enterprise finance users"
  institutional_review: auto  # auto | on | off
  design_review: true
llm:
  provider: anthropic  # anthropic | openai
  model: claude-sonnet-4-20250514
  api_key_env: ANTHROPIC_API_KEY
output:
  dir: ./reports
  formats:
    - markdown
    - json
    - html
    - repair_briefs
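Since options can come from either the file or the CLI, the two sources have to be merged somewhere. The sketch below shows one way that could work, assuming CLI values take precedence over the file (the source does not state the precedence). `merge_options` and `normalise_review_mode` are hypothetical helpers. One detail worth knowing: YAML 1.1 loaders such as PyYAML read a bare `on` or `off` as a boolean, so the sketch maps booleans back to the documented string values.

```python
from copy import deepcopy

ALLOWED_REVIEW_MODES = ("auto", "on", "off")  # values listed in the config comment


def normalise_review_mode(value) -> str:
    """Map a parsed institutional_review value back to 'auto', 'on', or 'off'."""
    # A YAML 1.1 loader turns bare on/off into True/False.
    if value is True:
        return "on"
    if value is False:
        return "off"
    if isinstance(value, str) and value in ALLOWED_REVIEW_MODES:
        return value
    raise ValueError(f"institutional_review must be one of {ALLOWED_REVIEW_MODES}, got {value!r}")


def merge_options(file_config: dict, cli_overrides: dict) -> dict:
    """Layer non-None CLI values over the file's `options` (assumed precedence)."""
    merged = deepcopy(file_config)  # leave the caller's config untouched
    options = merged.setdefault("options", {})
    for key, value in cli_overrides.items():
        if value is not None:
            options[key] = value
    options["institutional_review"] = normalise_review_mode(
        options.get("institutional_review", "auto")
    )
    return merged


file_config = {"options": {"brief": "Financial analytics dashboard",
                           "institutional_review": False}}
merged_config = merge_options(file_config, {"brief": None, "focus_flows": ["search"]})
```

Quoting the value in the file (`institutional_review: "on"`) sidesteps the boolean coercion altogether.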
Environment Variables
ANTHROPIC_API_KEY=sk-... # Required (or OPENAI_API_KEY)
GITHUB_TOKEN=ghp_... # Optional, for repo analysis and issue export
HUMANQA_OUTPUT_DIR=./reports # Optional, default: ./artifacts
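A minimal sketch of how these variables might be resolved at startup. The variable names and the `./artifacts` default come from the list above; the `resolve_settings` function, its return shape, and the choice to prefer the Anthropic key when both are set are assumptions for illustration only.

```python
import os
from collections.abc import Mapping


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Pick an LLM key and output directory from environment variables."""
    anthropic_key = env.get("ANTHROPIC_API_KEY")
    openai_key = env.get("OPENAI_API_KEY")
    if not (anthropic_key or openai_key):
        raise RuntimeError("Set ANTHROPIC_API_KEY or OPENAI_API_KEY")
    return {
        # Assumption: Anthropic wins when both keys are present.
        "provider": "anthropic" if anthropic_key else "openai",
        "api_key": anthropic_key or openai_key,
        "github_token": env.get("GITHUB_TOKEN"),  # optional
        "output_dir": env.get("HUMANQA_OUTPUT_DIR", "./artifacts"),
    }


settings = resolve_settings({"OPENAI_API_KEY": "sk-example"})
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.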
MCP Server (Claude Code / AI Tool Integration)
Preflight exposes an MCP (Model Context Protocol) server so AI coding tools like Claude Code can use it as a tool.
Setup
Add to your Claude Code MCP configuration (~/.claude/claude_desktop_config.json or project .mcp.json):
{
  "mcpServers": {
    "preflight": {
      "command": "preflight-mcp",
      "env": {
        "GEMINI_API_KEY": "your-gemini-key"
      }
    }
  }
}
Available Tools
| Tool | Description | Speed |
|---|---|---|
| preflight_quick_check | Fast single-pass evaluation | ~30s |
| preflight_evaluate | Full multi-agent QA pipeline | 2-5 min |
| preflight_get_report | Retrieve existing reports | Instant |
| preflight_compare | Compare runs for regressions | Instant |
Usage in Claude Code
"Quick check my staging site at https://staging.myapp.com"
"Run a full evaluation on https://myapp.com with repo https://github.com/org/myapp"
"Show me the Preflight report from the last run"
"Compare the current run against ./baseline for regressions"
See HUMANQA_SKILL.md for the full integration guide.
Architecture
preflight/
├── core/
│ ├── intent_modeler.py # Infers product purpose from visible surfaces
│ ├── persona_generator.py # Generates tailored user agent team
│ ├── orchestrator.py # Coordinates agents, dedup, issue grouping
│ ├── pipeline.py # End-to-end pipeline with timeouts & parallelization
│ ├── performance.py # Product-type-aware performance budgets
│ ├── file_mapper.py # Maps issues to likely source files
│ ├── repo_analyzer.py # GitHub repo analysis (visibility, tech stack, routes)
│ ├── schemas.py # All data models (issues, groups, evidence, agents)
│ ├── actions.py # Deterministic browser action engine
│ ├── quick_check.py # Fast single-pass evaluation for MCP/CI
│ ├── progress.py # Visual progress tracker
│ └── llm.py # LLM abstraction (Anthropic / OpenAI / Gemini)
├── runners/
│ ├── web_runner.py # Playwright-based web evaluation (desktop + mobile)
│ ├── mobile_runner.py # Mobile web emulation / Maestro native app testing
│ └── page_snapshot.py # Page state capture (a11y tree, screenshots, metrics)
├── lenses/
│ ├── design_lens.py # Design/UI quality review
│ ├── trust_lens.py # Trust signal detection
│ ├── auth_lens.py # Login/auth flow evaluation (no credentials needed)
│ ├── responsive_lens.py # Mobile responsiveness & layout comparison
│ └── institutional_lens.py # Governance/provenance/auditability review
├── reporting/
│ ├── report_generator.py # Markdown, JSON, interactive HTML reports
│ ├── handoff.py # Developer handoff (HANDOFF.md + handoff.json)
│ ├── comparison.py # Run-to-run regression comparison
│ ├── github_export.py # Export issues to GitHub Issues
│ ├── webhook.py # Slack/webhook notifications
│ └── templates/
│     └── report.html # Interactive HTML report template
├── mcp_server.py # MCP server (Claude Code / AI tool integration)
└── scheduling/
    └── scheduler.py # Cron-based scheduled runs
Evaluation Pipeline
Scrape landing page ──► Build intent model ──► Generate personas (max 6)
                                │                          │
                          Analyze repo ◄─── (optional)     ▼
                                                Orchestrate evaluation
                                          (8 steps/journey cap, 5min timeout)
                                                           │
                                                           ▼
                                     ┌────── Specialist lenses (parallel) ──────┐
                                     │ Design   │ Trust  │ Responsive │ Auth    │
                                     └──────────┴────────┴────────────┴─────────┘
                                                           │
                                          Institutional lens (if applicable)
                                                           │
                                                           ▼
                                              Deduplicate & group issues
                                              (signature pass → LLM pass)
                                                           │
                                             ┌─────────────┼─────────────┐
                                             ▼             ▼             ▼
                                        report.html   HANDOFF.md  repair_briefs/
                                        report.md     handoff.json
                                        report.json
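The journey cap, the overall timeout, and the parallel lens stage in the diagram can be approximated with `asyncio`. This is a sketch under stated assumptions, not the code in `pipeline.py`: `run_journey`, `run_lens`, and `evaluate` are hypothetical stand-ins, the lens bodies are empty placeholders, and running journeys concurrently is an assumption the diagram does not confirm.

```python
import asyncio

MAX_STEPS_PER_JOURNEY = 8      # "8 steps/journey cap" from the diagram
EVALUATION_TIMEOUT_S = 300     # "5min timeout" from the diagram
LENSES = ["design", "trust", "responsive", "auth"]


async def run_journey(steps: list[str]) -> list[str]:
    """Stand-in for a persona journey: execute at most the capped number of steps."""
    executed = []
    for step in steps[:MAX_STEPS_PER_JOURNEY]:
        await asyncio.sleep(0)  # placeholder for a real browser action
        executed.append(step)
    return executed


async def run_lens(name: str) -> dict:
    """Stand-in for a specialist lens; a real lens would inspect page snapshots."""
    await asyncio.sleep(0)
    return {"lens": name, "findings": []}


async def evaluate(journeys: list[list[str]]) -> dict:
    # Bound the whole orchestration stage with a single timeout.
    done = await asyncio.wait_for(
        asyncio.gather(*(run_journey(j) for j in journeys)),
        timeout=EVALUATION_TIMEOUT_S,
    )
    # Lenses are independent of each other, so they can run concurrently.
    lens_results = await asyncio.gather(*(run_lens(name) for name in LENSES))
    return {"journeys": done, "lenses": lens_results}


result = asyncio.run(evaluate([[f"step-{i}" for i in range(12)]]))
```

`asyncio.gather` returns results in the order the awaitables were passed, so lens output stays aligned with `LENSES` regardless of which lens finishes first.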
Key Features
Evidence-Backed Findings
Every issue cites specific evidence — screenshot references, element selectors from the accessibility tree, measured metrics, or documented absences. Findings without anchored evidence are rejected.
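One way to picture the rejection rule is as a gate applied before findings reach the report. The sketch below is illustrative only: the `Finding` dataclass, the `kind` and `ref` evidence fields, and the four evidence kinds (mirroring the list in the paragraph above) are assumptions, not the models in `schemas.py`.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the evidence types named in the text.
EVIDENCE_KINDS = {"screenshot", "selector", "metric", "absence"}


@dataclass
class Finding:
    title: str
    evidence: list[dict] = field(default_factory=list)


def has_anchored_evidence(finding: Finding) -> bool:
    """True if at least one evidence item has a known kind and a non-empty reference."""
    return any(
        item.get("kind") in EVIDENCE_KINDS and bool(item.get("ref"))
        for item in finding.evidence
    )


def gate_findings(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (accepted, rejected) based on anchored evidence."""
    accepted = [f for f in findings if has_anchored_evidence(f)]
    rejected = [f for f in findings if not has_anchored_evidence(f)]
    return accepted, rejected


candidates = [
    Finding("Submit button unlabeled", [{"kind": "selector", "ref": "button#submit"}]),
    Finding("Checkout feels slow"),  # no evidence at all
    Finding("Logo looks off", [{"kind": "opinion", "ref": "n/a"}]),  # unknown kind
]
accepted, rejected = gate_findings(candidates)
```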
Screenshot Evidence with Captions
Screenshots include contextual metadata: captions describing what's shown, step references linking to the journey, viewport dimensions, and timestamps.
Adaptive Performance Budgets
Performance thresholds adapt to product type:
- Marketing sites — tight LCP and load budgets (first impressions matter)
- SaaS apps — lenient LCP, strict CLS (interactivity over raw speed)
- E-commerce — strictest CLS budgets (layout shifts hurt conversion)
- Content sites — tightest LCP (fast text rendering)
- Mobile web — accounts for slower networks
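A budget table of this kind could look like the sketch below. Every number here is made up for illustration; only the relative ordering follows the list above (content sites get the tightest LCP, SaaS the most lenient LCP, e-commerce the strictest CLS). `BUDGETS`, `MOBILE_LCP_FACTOR`, `budget_for`, and `violations` are hypothetical names, not the contents of `performance.py`.

```python
# Illustrative thresholds only; the real budgets are not documented here.
BUDGETS = {
    "marketing": {"lcp_ms": 2000, "cls": 0.10},
    "saas":      {"lcp_ms": 3500, "cls": 0.05},
    "ecommerce": {"lcp_ms": 2500, "cls": 0.03},
    "content":   {"lcp_ms": 1800, "cls": 0.10},
}
MOBILE_LCP_FACTOR = 1.5  # assumed allowance for slower mobile networks


def budget_for(product_type: str, mobile: bool = False) -> dict:
    """Return the budget for a product type, relaxed for mobile if requested."""
    budget = dict(BUDGETS[product_type])  # copy so the table is never mutated
    if mobile:
        budget["lcp_ms"] = int(budget["lcp_ms"] * MOBILE_LCP_FACTOR)
    return budget


def violations(metrics: dict, budget: dict) -> list[str]:
    """Names of the metrics that exceed their budget."""
    return [name for name, limit in budget.items() if metrics.get(name, 0) > limit]


measured = {"lcp_ms": 3000, "cls": 0.08}
saas_violations = violations(measured, budget_for("saas"))
ecommerce_violations = violations(measured, budget_for("ecommerce"))
```

The same measurements pass the LCP check for a SaaS app but fail it for an e-commerce site, which is the behavior the adaptive thresholds are meant to produce.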
Login/Auth Evaluation
Evaluates login pages as product surfaces without requiring credentials — assesses form quality, error handling, trust signals, password reset availability, and accessibility.
Mobile Responsiveness
Compares desktop vs mobile findings to catch responsive design issues: layout breaks, undersized touch targets, text readability, navigation adaptation, and horizontal overflow.
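The desktop-versus-mobile comparison can be thought of as a set difference plus a few geometric checks. The sketch below assumes findings are identified by string keys and that element sizes and scroll widths are available from page snapshots; the function names and the 44 px default for touch targets are illustrative choices, not values taken from `responsive_lens.py`.

```python
def mobile_only_findings(desktop: set[str], mobile: set[str]) -> set[str]:
    """Findings seen on mobile but not on desktop are likely responsive issues."""
    return mobile - desktop


def undersized_targets(elements: list[dict], min_px: int = 44) -> list[str]:
    """Selectors of interactive elements whose smaller side is below min_px."""
    return [
        el["selector"]
        for el in elements
        if min(el["width"], el["height"]) < min_px
    ]


def has_horizontal_overflow(scroll_width: int, viewport_width: int) -> bool:
    """Content wider than the viewport forces horizontal scrolling."""
    return scroll_width > viewport_width


responsive_issues = mobile_only_findings(
    desktop={"low-contrast-footer"},
    mobile={"low-contrast-footer", "nav-menu-clipped"},
)
small_targets = undersized_targets([
    {"selector": "a.help", "width": 24, "height": 24},
    {"selector": "button.primary", "width": 120, "height": 48},
])
```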
Developer Handoff with Fix Options
Each handoff task includes multiple fix strategies with trade-offs:
- Critical issues get both quick-patch and proper-fix options
- Accessibility issues get ARIA/semantic HTML suggestions
- Performance issues get optimization pass options
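The rules above amount to a small mapping from issue attributes to fix strategies. A minimal sketch, assuming issues carry `severity` and `category` fields; the option labels, the `fix_options` function, and the single-option fallback are hypothetical and not taken from `handoff.py`.

```python
def fix_options(issue: dict) -> list[str]:
    """Suggest fix strategies for an issue based on its severity and category."""
    options: list[str] = []
    if issue.get("severity") == "critical":
        options += ["quick-patch", "proper-fix"]
    if issue.get("category") == "accessibility":
        options.append("aria-or-semantic-html")
    if issue.get("category") == "performance":
        options.append("optimization-pass")
    # Assumed fallback so every task carries at least one strategy.
    return options or ["proper-fix"]


critical_a11y = fix_options({"severity": "critical", "category": "accessibility"})
minor_perf = fix_options({"severity": "minor", "category": "performance"})
```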
Repo Visibility Detection
Automatically detects whether the GitHub repo is public or private, informing trust evaluation and handoff context.
Output
Each run produces:
- report.html — Interactive HTML report with filters, search, clickable severity cards, inline screenshot galleries, and score explanations
- report.md — Human-readable prioritized findings
- report.json — Machine-readable full export
- HANDOFF.md — Developer handoff with tasks, fix options, dependencies
- handoff.json — Machine-readable handoff for AI coding tools
- repair_briefs/ — Per-issue technical guides for coding agents
- Screenshots — Evidence images referenced in reports
Core Principle
This system never inspects source code. All evaluation happens through the observable user experience — the same surface real users see.
License
MIT