Your AI QA crew — evaluates products like real users would, right from your coding session

Preflight

Preflight is an external-experience AI QA system. It evaluates shipped products the way real users would (through the UI only, with no code inspection) and produces evidence-backed findings plus repair briefs for coding agents (Claude Code, Codex, Cursor).

What It Does

Understands your product — infers purpose, audience, and critical flows from visible surfaces + optional repo analysis
Generates realistic test personas — dynamically creates a team of user agents tailored to your product (4-6 agents, capped for speed)
Evaluates through the UI — runs web (Playwright) and mobile (Playwright emulation / Maestro) interactions as real users would
Applies specialist lenses in parallel — design critique, trust assessment, auth/login flow review, mobile responsiveness check, institutional/governance review
Deduplicates aggressively — two-pass dedup using error signatures (fast) then LLM semantic clustering
Groups related issues — clusters findings by category and product area into issue groups
Produces actionable reports — interactive HTML report with clickable severity cards, inline screenshot galleries, search, and score explanations
Generates developer handoffs — prioritized tasks with fix options, file mapping, dependency graphs, and verification steps
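The first, fast dedup pass relies on error signatures. Here is a minimal sketch of what such a pass might look like. It assumes findings carry a raw `error` string, and the normalization rules and helper names (`error_signature`, `signature_pass`) are hypothetical, not Preflight's code. The second, LLM-based clustering pass is not shown.

```python
import re
from collections import defaultdict

# Volatile tokens that differ between occurrences of the "same" error (assumed rules).
_VOLATILE = [
    (re.compile(r"https?://\S+"), "<url>"),
    (re.compile(r"\b0x[0-9a-f]+\b"), "<hex>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\d+"), "<n>"),  # runs last so URLs/UUIDs are replaced whole first
]

def error_signature(message: str) -> str:
    """Normalize an error message into a coarse, comparable signature."""
    sig = message.strip().lower()
    for pattern, token in _VOLATILE:
        sig = pattern.sub(token, sig)
    return re.sub(r"\s+", " ", sig)

def signature_pass(findings: list[dict]) -> list[list[dict]]:
    """Bucket findings whose error text normalizes to the same signature."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for finding in findings:
        buckets[error_signature(finding["error"])].append(finding)
    return list(buckets.values())
```

Cheap string normalization collapses exact repeats before any LLM call, so the slower semantic pass only has to reason about genuinely distinct issues.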

Quick Start

# Install from PyPI
pip install preflight-qa
playwright install chromium

# Quick check (~1-2 min) — fast single-pass evaluation
preflight check https://your-product.com

# Full evaluation with repo context
preflight run https://your-product.com --repo https://github.com/user/repo

# Interactive mode (prompts for everything)
preflight

You'll need a Google API key (free from aistudio.google.com):

export GOOGLE_API_KEY=your-key-here

More Examples

# Quick check with focus
preflight check https://your-product.com --focus "login flow"

# Full run with options
preflight run https://your-product.com \
  --brief "B2B SaaS dashboard for financial analytics" \
  --credentials '{"email": "test@example.com", "password": "test123"}' \
  --focus "onboarding,search,export" \
  --output ./my-report

# Generate handoff from existing run
preflight handoff ./artifacts --format claude-code

# Compare runs for regressions
preflight compare ./baseline ./current

# Export issues to GitHub
preflight export-issues --repo user/repo --run ./artifacts

# Schedule overnight runs
preflight schedule https://your-product.com --cron "0 2 * * *"
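`preflight compare ./baseline ./current` flags regressions between runs. Conceptually this is a set difference over issues. The sketch below assumes each run's `report.json` yields a list of issue dicts with a stable per-issue key; the field name `signature` and the helper `compare_runs` are hypothetical and not taken from Preflight's `comparison.py`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunDiff:
    new: frozenset[str]         # only in the current run: candidate regressions
    resolved: frozenset[str]    # only in the baseline: likely fixed
    persisting: frozenset[str]  # present in both runs

def compare_runs(baseline: list[dict], current: list[dict], key: str = "signature") -> RunDiff:
    """Diff two runs' issue lists by a stable key (hypothetical schema)."""
    before = {issue[key] for issue in baseline}
    after = {issue[key] for issue in current}
    return RunDiff(
        new=frozenset(after - before),
        resolved=frozenset(before - after),
        persisting=frozenset(before & after),
    )
```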

Configuration

Create preflight.yaml or pass options via CLI:

target:
  url: https://your-product.com
  credentials:
    email: test@example.com
    password: test123

options:
  brief: "Financial analytics dashboard"
  focus_flows:
    - onboarding
    - search
    - export
  personas_hint: "enterprise finance users"
  institutional_review: auto  # auto | on | off
  design_review: true

llm:
  provider: anthropic  # anthropic | openai
  model: claude-sonnet-4-20250514
  api_key_env: ANTHROPIC_API_KEY

output:
  dir: ./reports
  formats:
    - markdown
    - json
    - html
    - repair_briefs
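The README does not say how `preflight.yaml` and CLI flags combine. A common convention is that CLI values override the file. Here is a minimal sketch of that precedence, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that unset CLI flags arrive as `None`. `deep_merge` is a hypothetical helper.

```python
from typing import Any

def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts are merged recursively."""
    merged = dict(base)  # shallow copy so `base` is never mutated
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        elif value is not None:  # unset CLI flags (None) keep the file's value
            merged[key] = value
    return merged
```

For example, `--brief` on the command line would replace `options.brief` from the file while leaving `design_review` and the rest of `options` intact.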

Environment Variables

ANTHROPIC_API_KEY=sk-...    # Required (or OPENAI_API_KEY)
GITHUB_TOKEN=ghp_...        # Optional, for repo analysis and issue export
HUMANQA_OUTPUT_DIR=./reports # Optional, default: ./artifacts
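Here is a minimal sketch of how these variables might be resolved at startup. It covers only the variables listed above. Preferring `ANTHROPIC_API_KEY` when both LLM keys are set is an assumption, and `resolve_env` is a hypothetical helper, not part of Preflight.

```python
import os
from collections.abc import Mapping

def resolve_env(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve provider, key, and output settings from environment variables."""
    if env.get("ANTHROPIC_API_KEY"):
        provider, key = "anthropic", env["ANTHROPIC_API_KEY"]
    elif env.get("OPENAI_API_KEY"):
        provider, key = "openai", env["OPENAI_API_KEY"]
    else:
        raise RuntimeError("Set ANTHROPIC_API_KEY or OPENAI_API_KEY")
    return {
        "provider": provider,
        "api_key": key,
        "github_token": env.get("GITHUB_TOKEN"),  # optional: repo analysis, issue export
        "output_dir": env.get("HUMANQA_OUTPUT_DIR", "./artifacts"),
    }
```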

MCP Server (Claude Code / AI Tool Integration)

Preflight exposes an MCP (Model Context Protocol) server so AI coding tools like Claude Code can use it as a tool.

Setup

Add to your Claude Code MCP configuration (~/.claude/claude_desktop_config.json or project .mcp.json):

{
  "mcpServers": {
    "preflight": {
      "command": "preflight-mcp",
      "env": {
        "GEMINI_API_KEY": "your-gemini-key"
      }
    }
  }
}
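If you script your setup instead of editing the file by hand, you can merge the entry above into an existing config without disturbing other servers. Here is a minimal standard-library sketch. The function name is hypothetical, and the file you point it at (for example a project `.mcp.json`) is up to you.

```python
import json
from pathlib import Path

def add_preflight_server(config_path: Path, gemini_key: str) -> dict:
    """Add or replace the "preflight" MCP server entry, keeping any other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["preflight"] = {
        "command": "preflight-mcp",
        "env": {"GEMINI_API_KEY": gemini_key},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Avoid committing a config that contains a real key. Keep the key in a file that version control ignores.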

Available Tools

| Tool | Description | Speed |
|---|---|---|
| `preflight_quick_check` | Fast single-pass evaluation | ~30s |
| `preflight_evaluate` | Full multi-agent QA pipeline | 2-5 min |
| `preflight_get_report` | Retrieve existing reports | Instant |
| `preflight_compare` | Compare runs for regressions | Instant |
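Claude Code issues these calls for you. To drive the server from a script or inspect traffic while debugging, note that MCP tool invocations are JSON-RPC 2.0 `tools/call` requests. The sketch below only builds such a message. The `url` argument name is a guess, because the README does not document the tools' input schemas; a client can discover them with `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must be unique per session

def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request for one of the tools above."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

Over the stdio transport each message is written as a single line of JSON. A real client must also complete the `initialize` handshake before calling any tool.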

Usage in Claude Code

"Quick check my staging site at https://staging.myapp.com"
"Run a full evaluation on https://myapp.com with repo https://github.com/org/myapp"
"Show me the Preflight report from the last run"
"Compare the current run against ./baseline for regressions"

See HUMANQA_SKILL.md for the full integration guide.

Architecture

preflight/
├── core/
│   ├── intent_modeler.py    # Infers product purpose from visible surfaces
│   ├── persona_generator.py # Generates tailored user agent team
│   ├── orchestrator.py      # Coordinates agents, dedup, issue grouping
│   ├── pipeline.py          # End-to-end pipeline with timeouts & parallelization
│   ├── performance.py       # Product-type-aware performance budgets
│   ├── file_mapper.py       # Maps issues to likely source files
│   ├── repo_analyzer.py     # GitHub repo analysis (visibility, tech stack, routes)
│   ├── schemas.py           # All data models (issues, groups, evidence, agents)
│   ├── actions.py           # Deterministic browser action engine
│   ├── quick_check.py       # Fast single-pass evaluation for MCP/CI
│   ├── progress.py          # Visual progress tracker
│   └── llm.py               # LLM abstraction (Anthropic / OpenAI / Gemini)
├── runners/
│   ├── web_runner.py        # Playwright-based web evaluation (desktop + mobile)
│   ├── mobile_runner.py     # Mobile web emulation / Maestro native app testing
│   └── page_snapshot.py     # Page state capture (a11y tree, screenshots, metrics)
├── lenses/
│   ├── design_lens.py       # Design/UI quality review
│   ├── trust_lens.py        # Trust signal detection
│   ├── auth_lens.py         # Login/auth flow evaluation (no credentials needed)
│   ├── responsive_lens.py   # Mobile responsiveness & layout comparison
│   └── institutional_lens.py # Governance/provenance/auditability review
├── reporting/
│   ├── report_generator.py  # Markdown, JSON, interactive HTML reports
│   ├── handoff.py           # Developer handoff (HANDOFF.md + handoff.json)
│   ├── comparison.py        # Run-to-run regression comparison
│   ├── github_export.py     # Export issues to GitHub Issues
│   ├── webhook.py           # Slack/webhook notifications
│   └── templates/
│       └── report.html      # Interactive HTML report template
├── mcp_server.py            # MCP server (Claude Code / AI tool integration)
└── scheduling/
    └── scheduler.py         # Cron-based scheduled runs

Evaluation Pipeline

Scrape landing page ──► Build intent model ──► Generate personas (max 6)
         │                                            │
    Analyze repo ◄─── (optional)                      ▼
                                          Orchestrate evaluation
                                          (8 steps/journey cap, 5min timeout)
                                                      │
                                                      ▼
                                    ┌─────── Specialist lenses (parallel) ──────┐
                                    │  Design  │  Trust  │ Responsive │  Auth   │
                                    └──────────┴────────┴────────────┴─────────┘
                                                      │
                                        Institutional lens (if applicable)
                                                      │
                                                      ▼
                                          Deduplicate & group issues
                                          (signature pass → LLM pass)
                                                      │
                                          ┌───────────┼───────────┐
                                          ▼           ▼           ▼
                                      report.html  HANDOFF.md  repair_briefs/
                                      report.md    handoff.json
                                      report.json
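The fan-out in the middle of the diagram maps naturally onto `asyncio.gather`. The sketch below is a toy, not Preflight's `pipeline.py`. `run_lens` is a hypothetical stand-in that only tags each finding, and the overall timeout is an assumed parameter, because the diagram's 5-minute cap applies to the orchestration stage.

```python
import asyncio

CORE_LENSES = ("design", "trust", "responsive", "auth")

async def run_lens(name: str, findings: list[str]) -> list[str]:
    """Hypothetical lens stand-in; a real lens would await browser and LLM calls."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return [f"{name}: {finding}" for finding in findings]

async def apply_lenses(findings: list[str], institutional: bool, timeout_s: float = 300.0) -> list[str]:
    """Run the core lenses concurrently, then the institutional lens if applicable."""

    async def stage() -> list[str]:
        batches = await asyncio.gather(*(run_lens(n, findings) for n in CORE_LENSES))
        results = [item for batch in batches for item in batch]  # gather keeps lens order
        if institutional:  # sequential step after the parallel lenses, as in the diagram
            results += await run_lens("institutional", findings)
        return results

    # Bound the whole stage; wait_for cancels it and raises if the budget is exceeded.
    return await asyncio.wait_for(stage(), timeout=timeout_s)
```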

Key Features

Evidence-Backed Findings

Every issue cites specific evidence — screenshot references, element selectors from the accessibility tree, measured metrics, or documented absences. Findings without anchored evidence are rejected.
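The rejection rule can be pictured as a simple filter. In this sketch the dataclasses, the evidence-kind labels, and the helper names are assumptions for illustration and do not reflect Preflight's `schemas.py`.

```python
from dataclasses import dataclass, field

# Evidence kinds named in the paragraph above (labels are illustrative).
EVIDENCE_KINDS = {"screenshot", "selector", "metric", "absence"}

@dataclass
class Evidence:
    kind: str  # one of EVIDENCE_KINDS
    ref: str   # screenshot path, a11y-tree selector, "metric=value", or what was looked for

@dataclass
class Finding:
    title: str
    evidence: list[Evidence] = field(default_factory=list)

def is_anchored(finding: Finding) -> bool:
    """True if at least one evidence item has a known kind and a non-blank reference."""
    return any(e.kind in EVIDENCE_KINDS and e.ref.strip() for e in finding.evidence)

def keep_anchored(findings: list[Finding]) -> list[Finding]:
    """Drop findings that cite no usable evidence."""
    return [f for f in findings if is_anchored(f)]
```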

Screenshot Evidence with Captions

Screenshots include contextual metadata: captions describing what's shown, step references linking to the journey, viewport dimensions, and timestamps.
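Here is a sketch of a screenshot record carrying the metadata listed above, with a helper that renders a caption line. The class and field names are hypothetical. The label normalizes the timestamp to UTC, so pass timezone-aware datetimes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Screenshot:
    path: str
    caption: str               # what the image shows
    journey: str               # persona journey the step belongs to
    step: int                  # 1-based step index within the journey
    viewport: tuple[int, int]  # (width, height) in CSS pixels
    taken_at: datetime         # should be timezone-aware

    def label(self) -> str:
        """Human-readable caption line for reports."""
        width, height = self.viewport
        ts = self.taken_at.astimezone(timezone.utc)
        return f"{self.journey} step {self.step} ({width}x{height}, {ts:%Y-%m-%d %H:%M} UTC): {self.caption}"
```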

Adaptive Performance Budgets

Performance thresholds adapt to product type:

Marketing sites — tight LCP and load budgets (first impressions matter)
SaaS apps — lenient LCP, strict CLS (interactivity over raw speed)
E-commerce — strictest CLS budgets (layout shifts hurt conversion)
Content sites — tightest LCP (fast text rendering)
Mobile web — accounts for slower networks
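To make the relative ordering above concrete, here is a sketch of a per-product-type budget table. Only the Core Web Vitals "good" thresholds (LCP at or under 2.5 s, CLS at or under 0.1) are published figures. Every per-type number below is an illustrative assumption chosen to follow the list; none are Preflight's actual budgets in `performance.py`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Budget:
    lcp_s: float  # Largest Contentful Paint budget, seconds
    cls: float    # Cumulative Layout Shift budget (unitless)

# Illustrative numbers only, shifted around the Core Web Vitals "good" thresholds.
BUDGETS = {
    "content": Budget(lcp_s=1.8, cls=0.1),     # tightest LCP
    "marketing": Budget(lcp_s=2.0, cls=0.1),   # tight LCP
    "ecommerce": Budget(lcp_s=2.5, cls=0.02),  # strictest CLS
    "mobile_web": Budget(lcp_s=3.0, cls=0.1),  # allows for slower networks
    "saas": Budget(lcp_s=3.5, cls=0.05),       # lenient LCP, strict CLS
}

def check(product_type: str, lcp_s: float, cls: float) -> list[str]:
    """Return budget violations; unknown types fall back to the CWV 'good' thresholds."""
    budget = BUDGETS.get(product_type, Budget(lcp_s=2.5, cls=0.1))
    problems = []
    if lcp_s > budget.lcp_s:
        problems.append(f"LCP {lcp_s:.1f}s exceeds {budget.lcp_s:.1f}s budget")
    if cls > budget.cls:
        problems.append(f"CLS {cls:.2f} exceeds {budget.cls:.2f} budget")
    return problems
```

The same measurement (LCP 3.0 s, CLS 0.08) passes LCP but fails CLS for a SaaS app, and fails LCP but passes CLS for a marketing site.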

Login/Auth Evaluation

Evaluates login pages as product surfaces without requiring credentials — assesses form quality, error handling, trust signals, password reset availability, and accessibility.
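Preflight drives a real browser through Playwright. The standard-library sketch below only illustrates the kind of credential-free checks this involves, run against static HTML. The checks, messages, and class name are hypothetical. `autocomplete="current-password"` and `"new-password"` are standard HTML autofill tokens.

```python
from html.parser import HTMLParser

class LoginFormAudit(HTMLParser):
    """Collect a few credential-free login-page signals from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.password_inputs: list[dict] = []
        self.labelled_ids: set[str] = set()
        self.link_texts: list[str] = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "password":
            self.password_inputs.append(a)
        elif tag == "label" and a.get("for"):
            self.labelled_ids.add(a["for"])
        elif tag == "a":
            self._in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.link_texts[-1] += data  # accumulate visible link text

    def issues(self) -> list[str]:
        found = []
        for inp in self.password_inputs:
            if inp.get("autocomplete") not in ("current-password", "new-password"):
                found.append("password field lacks an autocomplete hint")
            if inp.get("id") not in self.labelled_ids and not inp.get("aria-label"):
                found.append("password field has no associated label")
        has_reset = any("forgot" in t.lower() or "reset" in t.lower() for t in self.link_texts)
        if self.password_inputs and not has_reset:
            found.append("no visible password-reset link")
        return found
```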

Mobile Responsiveness

Compares desktop vs mobile findings to catch responsive design issues: layout breaks, undersized touch targets, text readability, navigation adaptation, and horizontal overflow.
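Two of these checks reduce to simple geometry. The sketch below assumes the measurements were already collected. In Playwright's Python API, for example, `page.evaluate("document.documentElement.scrollWidth")` and `locator.bounding_box()` can supply them. The 24 px floor follows WCAG 2.2 SC 2.5.8 (Target Size, Minimum); 44 px is the stricter SC 2.5.5 level. The `Box` type and function name are hypothetical.

```python
from dataclasses import dataclass

MIN_TARGET_PX = 24  # WCAG 2.2 SC 2.5.8 (AA); use 44 for the stricter SC 2.5.5 (AAA)

@dataclass(frozen=True)
class Box:
    selector: str
    width: float   # CSS pixels
    height: float  # CSS pixels

def responsive_issues(viewport_width: int, scroll_width: int, tappables: list[Box]) -> list[str]:
    """Flag horizontal overflow and undersized touch targets at one mobile viewport."""
    issues = []
    if scroll_width > viewport_width:
        issues.append(f"horizontal overflow: content {scroll_width}px wide in {viewport_width}px viewport")
    for box in tappables:
        if box.width < MIN_TARGET_PX or box.height < MIN_TARGET_PX:
            issues.append(f"undersized touch target {box.selector} ({box.width:g}x{box.height:g}px)")
    return issues
```

SC 2.5.8 has spacing and inline-link exceptions that this simplified check ignores.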

Developer Handoff with Fix Options

Each handoff task includes multiple fix strategies with trade-offs:

Critical issues get both quick-patch and proper-fix options
Accessibility issues get ARIA/semantic HTML suggestions
Performance issues get optimization pass options
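The rules above could be expressed as a small selection function. This is a hypothetical sketch. The category and severity strings, the trade-off wording, and the names are assumptions, not Preflight's `handoff.py`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FixOption:
    name: str
    tradeoff: str

def fix_options(category: str, severity: str) -> list[FixOption]:
    """Choose fix strategies for one handoff task, mirroring the rules listed above."""
    options: list[FixOption] = []
    if severity == "critical":
        options += [
            FixOption("quick patch", "ships fast; may leave the root cause in place"),
            FixOption("proper fix", "addresses the root cause; larger change to review"),
        ]
    if category == "accessibility":
        options.append(FixOption("semantic HTML / ARIA", "prefer native elements; add ARIA only where semantics are missing"))
    if category == "performance":
        options.append(FixOption("optimization pass", "measure first, then target the largest contributors"))
    return options
```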

Repo Visibility Detection

Automatically detects whether the GitHub repo is public or private, informing trust evaluation and handoff context.
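One way to detect visibility is GitHub's REST "Get a repository" endpoint (`GET /repos/{owner}/{repo}`), whose response includes `private` and `visibility` fields. The sketch below is an assumption about approach, not Preflight's `repo_analyzer.py`. Without access, GitHub returns 404 for private repositories, which is indistinguishable from a missing one, so the sketch reports that case as unknown.

```python
import json
import re
import urllib.error
import urllib.request
from typing import Optional

def parse_repo(ref: str) -> tuple[str, str]:
    """Accept 'user/repo' or 'https://github.com/user/repo(.git)' and return (owner, repo)."""
    m = re.fullmatch(r"(?:https?://github\.com/)?([\w.-]+)/([\w.-]+?)(?:\.git)?/?", ref.strip())
    if not m:
        raise ValueError(f"not a GitHub repo reference: {ref!r}")
    return m.group(1), m.group(2)

def repo_visibility(ref: str, token: Optional[str] = None) -> str:
    """Return 'public', 'private', 'internal', 'unknown', or 'error <status>'."""
    owner, repo = parse_repo(ref)
    req = urllib.request.Request(f"https://api.github.com/repos/{owner}/{repo}")
    req.add_header("Accept", "application/vnd.github+json")
    if token:  # e.g. the GITHUB_TOKEN from the environment
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        # 404 means "private or nonexistent" when the caller lacks access.
        return "unknown" if err.code == 404 else f"error {err.code}"
    return data.get("visibility") or ("private" if data.get("private") else "public")
```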

Output

Each run produces:

report.html — Interactive HTML report with filters, search, clickable severity cards, inline screenshot galleries, and score explanations
report.md — Human-readable prioritized findings
report.json — Machine-readable full export
HANDOFF.md — Developer handoff with tasks, fix options, dependencies
handoff.json — Machine-readable handoff for AI coding tools
repair_briefs/ — Per-issue technical guides for coding agents
Screenshots — Evidence images referenced in reports

Core Principle

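This system never inspects source code to find issues. All evaluation happens through the observable user experience, the same surface real users see; the optional repo analysis only adds context such as visibility, tech stack, and routes.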

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history

This version: 0.1.0 (Mar 15, 2026)

Download files

Source Distribution

preflight_qa-0.1.0.tar.gz (129.1 kB)

Uploaded Mar 15, 2026 Source

Built Distribution

preflight_qa-0.1.0-py3-none-any.whl (149.9 kB)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file preflight_qa-0.1.0.tar.gz.

File metadata

Download URL: preflight_qa-0.1.0.tar.gz
Upload date: Mar 15, 2026
Size: 129.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for preflight_qa-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4d8cb07a6104cf8af95aa3c3824e312ec842b1f02cce4993a9e6b448d35654d5`
MD5	`1607853e0242cbb6171b7dd8858ba33a`
BLAKE2b-256	`94f9ab1be3e589e341ca1b395206bc39204f2c7de74d66c42bf110cfb1d9ebea`

File details

Details for the file preflight_qa-0.1.0-py3-none-any.whl.

File metadata

Download URL: preflight_qa-0.1.0-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 149.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for preflight_qa-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ec22ca86542772d593e04719c6f6a8dec2c7cf1ace3b8ae84f6a152ccb9210e9`
MD5	`25fbff9be1879be125ab70e9221d435a`
BLAKE2b-256	`b32428864db6cda2ea24453d0fc70ec7aa344e32b7b5a237196ce72622cfa410`
