# pytest-flakehunter 🎯

Re-run tests N times, visualize failure heatmaps, and get AI-powered root cause hypotheses.
Flaky tests are expensive. They erode trust in CI, waste engineer time, and mask real bugs.
pytest-flakehunter helps you stop guessing and start diagnosing.
## What it does

Run any test suite with `--fh` and get a self-contained HTML report with:
- Flake rate badges per test — instantly see your worst offenders
- Duration scatter plots — spot timeout-related flakes vs. race conditions at a glance
- Failure heatmaps — see exactly which line of code fails, and in which run attempts
- Stack trace clustering — groups failures by fingerprint so you know if it's one bug or three
- AI root cause hypotheses — Claude analyzes the clusters and tells you why it's probably flaking
## Installation

```shell
pip install pytest-flakehunter
```
## Usage
```shell
# Basic: run each test 10 times, generate report
pytest tests/ --fh

# Custom run count
pytest tests/ --fh --fh-runs=20

# With AI analysis (requires ANTHROPIC_API_KEY env var)
pytest tests/ --fh --fh-runs=15 --fh-ai

# Custom report path
pytest tests/ --fh --fh-report=reports/flake_$(date +%Y%m%d).html

# Full fixture isolation between attempts (slower but catches session-state flakes)
pytest tests/ --fh --fh-isolate
```
## CLI options

| Option | Default | Description |
|---|---|---|
| `--fh` | off | Enable flake hunter mode |
| `--fh-runs N` | 10 | Re-run count per test |
| `--fh-report PATH` | `flakehunter_report.html` | Output report path |
| `--fh-ai` | off | AI root cause analysis via Claude |
| `--fh-history-dir PATH` | `.flakehunter/history` | Directory for persistent run history |
| `--fh-no-history` | off | Skip writing to history this run |
| `--fh-isolate` | off | Full fixture teardown between every attempt (see below) |
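If you want flake hunting on by default in a project, pytest's standard `addopts` setting should work, since the `--fh` flags are ordinary command-line options. A sketch (keep `ANTHROPIC_API_KEY` out of this file, as noted in the AI analysis section):

```ini
# pytest.ini — enable flakehunter defaults for every run
# (assumed to compose via pytest's standard addopts mechanism)
[pytest]
addopts = --fh --fh-runs=20 --fh-report=reports/flakes.html
```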
## Reading the report

### Duration scatter

Each dot is one run attempt. Red = failed, green = passed.
- Failures clustering at a consistent high duration → likely a timeout
- Failures scattered at random durations → likely a race condition or state issue
- Single outlier failure at low duration → likely environment or setup flake
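The heuristics above can be sketched as a small classifier. This is an illustrative helper, not part of the plugin's API, and the thresholds are arbitrary:

```python
from statistics import mean, pstdev

def classify_failures(attempts):
    """attempts: list of (duration_seconds, passed) tuples for one test.

    Hypothetical helper mirroring the reading guide above.
    """
    fails = [d for d, ok in attempts if not ok]
    if not fails:
        return "stable"
    durations = [d for d, _ in attempts]
    mu, sigma = mean(durations), pstdev(durations)
    # Single outlier failure at low duration -> environment/setup flake
    if len(fails) == 1 and fails[0] < mu - sigma:
        return "environment-suspect"
    # Failures clustering at consistently high durations -> timeout
    if all(d > mu + sigma for d in fails):
        return "timeout-suspect"
    # Failures scattered at random durations -> race condition / state
    return "race-or-state-suspect"
```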
### Failure commonality heatmap

A multi-dimensional view of what your failures have in common.

**Current run** — columns are failed attempts, rows are:
- Duration — was this attempt slow, medium, or fast relative to the run? Helps spot timeout-related flakes.
- Traceback — which frames appeared in the stack trace, and in what % of failures.
**Historical** (when history is enabled) — columns are past runs, rows are:
- Environment — which host had failures, and how often.
- Branch — which git branch failures occurred on.
- Date — which week failures clustered in.
Cell brightness = how common that value is across failures. Bright = strong pattern.
### Stack trace clusters
Failures are grouped by a fingerprint of their innermost stack frames. Multiple clusters = multiple distinct failure modes (e.g., two different bugs). One cluster = one root cause, just intermittently triggered.
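The fingerprinting idea can be sketched like this — a simplified illustration; the plugin's actual fingerprint function may differ:

```python
import hashlib
from collections import defaultdict

def fingerprint(frames, depth=3):
    """Hash the innermost `depth` stack frames of a failure.

    frames: list of "module:function" strings, outermost first.
    """
    inner = frames[-depth:]
    return hashlib.sha1("|".join(inner).encode()).hexdigest()[:12]

def cluster_failures(tracebacks):
    """Group failures (each a list of frames) by fingerprint."""
    groups = defaultdict(list)
    for frames in tracebacks:
        groups[fingerprint(frames)].append(frames)
    return groups
```

Two failures that diverge only in their outer frames (e.g., different test entry points) land in the same cluster; a different innermost path produces a new one.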
## Historical tracking

By default, flakehunter appends each run's results to per-test CSV files in `.flakehunter/history/`. These accumulate across runs, giving the report progressively richer data:
- Flake rate trend — see if a test is getting more or less flaky over time
- Environment breakdown — which CI host/branch has the most failures
- Argument correlation — for parametrized tests, which parameter combos fail most
To opt out for a single run:

```shell
pytest tests/ --fh --fh-no-history
```

To store history in a non-default location (e.g., a shared network path):

```shell
pytest tests/ --fh --fh-history-dir=.cache/flakehunter
```
The CSV files are safe to commit — they're small, append-only, and give you a free audit trail.
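Assuming a simple per-run schema of `run_id,attempts,failures` (an illustrative assumption — the columns flakehunter actually writes may differ), the flake rate trend is a one-pass computation:

```python
import csv
import io

def flake_rate_trend(csv_text):
    """Return (run_id, flake_rate) per run from append-only history."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["run_id"], int(row["failures"]) / int(row["attempts"]))
        for row in reader
    ]
```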
## Fixture isolation

By default, flakehunter preserves session- and module-scoped fixtures across re-runs. This means a Playwright browser instance, database connection pool, or asyncio event loop is created only once per test — making repeated runs fast.

When to use `--fh-isolate`:

```shell
pytest tests/ --fh --fh-isolate
```
This triggers a full fixture teardown after every attempt, so every re-run starts from a completely clean fixture state. It is slower, but it surfaces a class of flakiness that the default mode can miss — things like:
- Connection pools that degrade after a certain number of operations
- Session-scoped caches or queues that accumulate state
- Event loop pollution from uncancelled async tasks
If your tests are fast and stateless, the default is fine. If you're seeing flakiness that only appears after many runs in a long session, try `--fh-isolate`.
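To see why isolation matters, here is a toy simulation (not plugin code) of a session-scoped pool that degrades after a fixed number of operations:

```python
class LeakyPool:
    """Toy connection pool that fails after `healthy_ops` operations --
    the kind of accumulated session state --fh-isolate is meant to expose."""

    def __init__(self, healthy_ops=25):
        self.ops = 0
        self.healthy_ops = healthy_ops

    def query(self):
        self.ops += 1
        if self.ops > self.healthy_ops:
            raise RuntimeError("pool exhausted")

def run_attempts(n, isolate):
    """Simulate n re-run attempts; each test issues 3 queries."""
    pool = LeakyPool()
    results = []
    for _ in range(n):
        if isolate:
            pool = LeakyPool()  # fresh fixture state per attempt
        try:
            for _ in range(3):
                pool.query()
            results.append(True)
        except RuntimeError:
            results.append(False)
    return results
```

With a shared pool the late attempts start failing even though the test itself never changed; with per-attempt isolation every run passes.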
## AI analysis

Enable with `--fh-ai`. Requires an `ANTHROPIC_API_KEY` — set it using whichever method fits your workflow:
**Shell / CI environment variable** (recommended for CI):

```shell
# Linux / macOS
export ANTHROPIC_API_KEY=sk-ant-...
pytest tests/ --fh --fh-ai

# Windows CMD
set ANTHROPIC_API_KEY=sk-ant-...
pytest tests/ --fh --fh-ai

# Windows PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
pytest tests/ --fh --fh-ai
```
**`.env` file** (recommended for local development — never commit this file):

```shell
# .env
ANTHROPIC_API_KEY=sk-ant-...
```

If you use uv, pass the file explicitly:

```shell
uv run --env-file=.env pytest --fh --fh-ai
```

Note: Do not put your API key in `pytest.ini` or `pyproject.toml` — those files are typically committed to version control.
Example hypothesis from the report:

> **⚡ AI Root Cause Hypothesis**
>
> This test fails most often in `setup_db_connection()` after ~2.3s, which is
> consistent with a connection pool exhaustion pattern under parallel test
> execution. Consider increasing the test pool size or adding explicit
> teardown that returns connections to the pool between retry attempts.
The AI receives the clustered stack traces, failure rates, and timing data — not raw code — so it works on any codebase without needing source access.
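A sketch of what such an input summary could look like — the plugin's real prompt format is internal, and these field names are assumptions:

```python
import json

def build_analysis_input(test_name, clusters, flake_rate, durations):
    """Assemble clustered traces, failure rate, and timing -- no source code.

    clusters: {fingerprint: [list-of-frames, ...]} as produced by clustering.
    """
    return json.dumps({
        "test": test_name,
        "flake_rate": flake_rate,
        "duration_seconds": {"min": min(durations), "max": max(durations)},
        "clusters": [
            {"fingerprint": fp, "count": len(frames_list),
             "example_frames": frames_list[0]}
            for fp, frames_list in clusters.items()
        ],
    }, indent=2)
```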
## Architecture

```
pytest_flakehunter/
├── plugin.py       # pytest hooks: --fh option, re-run protocol, xdist support
├── reporter.py     # Self-contained HTML report with inline SVG charts
├── history.py      # Append-only CSV history per test, cross-run trend analysis
└── ai_analysis.py  # Anthropic API integration for root cause hypotheses
```
No external dependencies beyond pytest itself. The HTML report is fully self-contained (no CDN calls, no JS frameworks) — safe to commit to repos or attach to Jira tickets.
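The inline-SVG approach can be illustrated with a minimal scatter renderer — a sketch, not the plugin's actual `reporter.py` code:

```python
def svg_scatter(attempts, width=300, height=80):
    """Render (duration_seconds, passed) attempts as a dependency-free SVG.

    Green dots are passes, red dots are failures; assumes a non-empty list.
    """
    max_d = max(d for d, _ in attempts) or 1.0  # avoid division by zero
    dots = []
    for i, (d, ok) in enumerate(attempts):
        x = 10 + i * (width - 20) / max(len(attempts) - 1, 1)
        y = height - 10 - (d / max_d) * (height - 20)
        color = "#2e7d32" if ok else "#c62828"  # green pass, red fail
        dots.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="3" fill="{color}"/>')
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{height}">{"".join(dots)}</svg>')
```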
## Contributing

```shell
git clone https://github.com/jkasser/pytest-flakehunter
cd pytest-flakehunter
pip install -e ".[dev]"
pytest demo/sample_tests.py --fh --fh-runs=15
```
## License
MIT