Flaky Test Autopsy

Detect flaky tests. Classify why. Get a fix.


The problem

Flaky tests erode CI trust — teams learn to re-run failures without reading them, and real bugs hide behind habitual retries. Most tools just retry the test; they never tell you why it failed or how often it will keep failing.

What Autopsy does differently

  • Detects which tests are genuinely flaky using Wilson score confidence intervals, not raw pass rates
  • Classifies each flaky test by root cause: ordering dependency, timing race, randomness, or network
  • Suggests fixes — template code snippets plus optional AI-powered analysis via Claude
  • Tracks trends across sessions so you know whether a flaky test is getting worse, improving, or newly introduced

Install

pip install flaky-test-autopsy

Quick start

# Run your suite 10 times with randomised order; score and classify results
autopsy run ./tests --runs 10

# Show scored results from a saved DB
autopsy score ./autopsy_results.db --explain

# Get fix suggestions for all flaky tests
autopsy fix ./autopsy_results.db

# Track trends across multiple sessions
autopsy trend ./autopsy_results.db

Commands

autopsy run <path>

Runs your pytest suite the number of times given by --runs, using a new random seed for each run (via pytest-randomly). Results are written to autopsy_results.db in the current directory.

autopsy run ./tests --runs 20 --label "post-refactor"
autopsy run ./tests --runs 5 --fresh        # wipe old data first

After running, prints a scored summary table:

 Test                                     Runs  Pass rate  Flakiness  Severity  Root cause
 tests/test_flaky.py::test_sometimes        20      50.0%      26.4%    MEDIUM  randomness
 tests/test_order.py::test_depends_on_a     20      45.0%      22.3%    MEDIUM  ordering
 tests/test_stable.py::test_always_passes   20     100.0%       0.0%      NONE  —
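
As a rough illustration of what repeated, reshuffled runs involve, the sketch below invokes pytest several times with a different pytest-randomly seed on each pass. It is not Autopsy's implementation: the helper names build_pytest_command and run_repeated are hypothetical, and it assumes pytest and pytest-randomly are installed in the current environment.

```python
import random
import subprocess
import sys


def build_pytest_command(test_path: str, seed: int) -> list[str]:
    """Build a pytest invocation that shuffles test order with a fixed seed."""
    # --randomly-seed is the pytest-randomly option that pins the shuffle order.
    return [sys.executable, "-m", "pytest", test_path, "-q", f"--randomly-seed={seed}"]


def run_repeated(test_path: str, runs: int) -> list[tuple[int, int]]:
    """Run the suite `runs` times; return (seed, pytest exit code) pairs."""
    outcomes = []
    for _ in range(runs):
        seed = random.randrange(2**32)  # fresh seed for every pass
        completed = subprocess.run(build_pytest_command(test_path, seed), check=False)
        outcomes.append((seed, completed.returncode))
    return outcomes
```

Recording the seed next to each result keeps a failing order reproducible: rerunning pytest with the same --randomly-seed value replays that order.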

autopsy score <db_path>

Re-score results from an existing DB without re-running tests.

autopsy score ./autopsy_results.db
autopsy score ./autopsy_results.db --explain        # show evidence bullets
autopsy score ./autopsy_results.db --all            # include stable tests
autopsy score ./autopsy_results.db --threshold 0.1  # raise the flakiness cutoff to 0.1
autopsy score ./autopsy_results.db --json           # machine-readable output

autopsy fix <db_path>

Generate fix suggestions for every flaky test.

autopsy fix ./autopsy_results.db
autopsy fix ./autopsy_results.db --ai              # Claude-powered analysis
autopsy fix ./autopsy_results.db --output fixes.md # write Markdown report
autopsy fix ./autopsy_results.db --ai --model claude-sonnet-4-6

The AI model is also configurable via the AUTOPSY_AI_MODEL environment variable (default: claude-opus-4-7).

autopsy trend <db_path>

Compare flakiness across sessions and detect regressions.

autopsy trend ./autopsy_results.db
autopsy trend ./autopsy_results.db --regressions-only
autopsy trend ./autopsy_results.db --output trend_report.md
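
To make the idea of a trend concrete, the sketch below labels each test by comparing its flakiness score in two sessions. It is an illustration only: the function name classify_trend, the label strings, and the tolerance value are assumptions, not Autopsy's actual logic or output format.

```python
def classify_trend(previous: dict[str, float],
                   current: dict[str, float],
                   tolerance: float = 0.02) -> dict[str, str]:
    """Label each test in `current` relative to its score in `previous`.

    Both arguments map a test id to a flakiness score between 0 and 1.
    `tolerance` (an assumed value) ignores changes too small to matter.
    """
    labels = {}
    for test_id, score in current.items():
        if test_id not in previous:
            labels[test_id] = "new"          # not seen in the earlier session
        elif score > previous[test_id] + tolerance:
            labels[test_id] = "worse"        # a regression
        elif score < previous[test_id] - tolerance:
            labels[test_id] = "improving"
        else:
            labels[test_id] = "stable"
    return labels
```

A regressions-only view would then keep just the entries labelled "worse" or "new".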

autopsy ci <path>

Composite command designed for CI pipelines. Runs the suite, scores results, compares against a baseline, and exits 0 (clean), 1 (regressions detected), or 2 (error).

autopsy ci ./tests --runs 5 --baseline ./baseline/autopsy_results.db --output report.md
autopsy ci ./tests --runs 5 --json-output report.json   # for downstream tooling
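
When autopsy ci is driven from a script rather than directly from a workflow step, the exit codes above can be mapped to outcomes as in the following sketch. The helper names (build_ci_command, describe_exit, run_ci) are hypothetical; the flags and exit codes are the ones documented in this section.

```python
import subprocess

# Exit codes as documented for `autopsy ci`.
EXIT_MEANINGS = {0: "clean", 1: "regressions detected", 2: "error"}


def build_ci_command(tests_path, runs, baseline=None, output=None):
    """Assemble the `autopsy ci` command line from the documented flags."""
    cmd = ["autopsy", "ci", tests_path, "--runs", str(runs)]
    if baseline:
        cmd += ["--baseline", baseline]
    if output:
        cmd += ["--output", output]
    return cmd


def describe_exit(code):
    """Translate an exit code into a human-readable outcome."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_ci(tests_path, runs, baseline=None, output=None):
    """Run `autopsy ci` and return a description of its outcome."""
    result = subprocess.run(build_ci_command(tests_path, runs, baseline, output), check=False)
    return describe_exit(result.returncode)
```

Passing check=False matters here: exit code 1 is a meaningful result (regressions), not a crash, so the caller decides whether it should fail the build.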

autopsy dashboard <db_path>

Serve a local web dashboard with summary cards, a Chart.js flakiness trend chart, and a sortable, filterable test table.

autopsy dashboard ./autopsy_results.db
autopsy dashboard ./autopsy_results.db --port 9000 --no-browser

autopsy init-ci

Generate a .github/workflows/flaky-tests.yml workflow that runs on push, pull request, and a nightly cron schedule.

autopsy init-ci --runs 10 --schedule "0 3 * * *"

CI Integration

Use autopsy ci on every PR to catch regressions before merge. The workflow below saves the baseline DB as an artifact on main pushes, then downloads and compares on PRs.

name: Flaky Test Detection

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 2 * * *'

jobs:
  flaky-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install flaky-test-autopsy
          pip install -r requirements.txt

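      # Note: by default download-artifact@v4 only sees artifacts uploaded in
      # the current workflow run. Fetching a baseline from an earlier run also
      # requires the run-id and github-token inputs.
      - name: Download baseline DB (if exists)
        uses: actions/download-artifact@v4
        with:
          name: autopsy-baseline
          path: ./baseline
        continue-on-error: true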

      - name: Run flaky test detection
        run: |
          autopsy ci . --runs 10 \
            --baseline ./baseline/autopsy_results.db \
            --output autopsy_ci_report.md

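      # Only runs on main refresh the baseline, so PR runs cannot overwrite it.
      - name: Upload results as artifact
        if: always() && github.ref == 'refs/heads/main'
        uses: actions/upload-artifact@v4
        with:
          name: autopsy-baseline
          path: autopsy_results.db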

      - name: Upload CI report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: autopsy-report
          path: autopsy_ci_report.md

Root cause categories

 Category    What it means
 ordering    Test relies on execution order: passes alone, fails when another test runs first
 timing      Race condition or brittle sleep/timeout that fails under load or slow CI
 randomness  Unseeded random, uuid, or hash seed causing non-deterministic behavior
 network     Test hits a real endpoint or DNS; fails when the network is slow or unavailable
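
For a concrete picture of two of these categories, the sketch below pairs a flaky pattern with a deterministic alternative for randomness and for ordering. All names are hypothetical and the examples are illustrative, not the fix templates Autopsy generates.

```python
import random

# --- randomness: unseeded vs. seeded ---

def pick_flaky(items):
    """Uses the global, unseeded RNG, so the result differs between runs."""
    return random.choice(items)


def pick_seeded(items, seed=1234):
    """Uses a private RNG with a fixed seed, so the result is repeatable."""
    rng = random.Random(seed)
    return rng.choice(items)


# --- ordering: shared module state vs. explicit reset ---

_registry = []


def register(name):
    """Append to shared state and return the new registry size."""
    _registry.append(name)
    return len(_registry)


def test_first_registration_order_dependent():
    # Passes alone, fails if any earlier test already called register().
    assert register("a") == 1


def test_first_registration_isolated():
    _registry.clear()  # reset shared state so execution order no longer matters
    assert register("a") == 1
```

In a pytest suite the reset would typically live in a fixture so that every test starts from clean state.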

How the scoring works

Autopsy scores each test with the Wilson score lower bound at 95% confidence rather than the raw failure rate. A test that failed once in 5 runs might just be bad luck; the Wilson bound accounts for sample size and returns a conservative lower bound on the true failure rate.

flakiness_score is this lower bound. A test is considered flaky when its score exceeds 0.05 (5%). Severity bands for flaky tests: low up to 10%, medium up to 30%, high up to 60%, critical above 60%.

This approach reduces false positives from small sample sizes and gives you a number you can track over time.
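
The scoring idea can be sketched with the textbook Wilson score interval. This is a minimal illustration, assuming z = 1.96 for 95% confidence and no continuity correction; the function names are hypothetical, and Autopsy's own implementation may differ in detail, so its printed scores need not match these values exactly.

```python
import math


def wilson_lower_bound(failures: int, runs: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for the true failure rate."""
    if runs <= 0:
        return 0.0
    p = failures / runs                      # observed failure rate
    z2 = z * z
    centre = p + z2 / (2 * runs)
    margin = z * math.sqrt(p * (1 - p) / runs + z2 / (4 * runs * runs))
    return max(0.0, (centre - margin) / (1 + z2 / runs))


def severity(score: float, threshold: float = 0.05) -> str:
    """Map a flakiness score to the severity bands described above."""
    if score <= threshold:
        return "none"                        # not considered flaky
    if score <= 0.10:
        return "low"
    if score <= 0.30:
        return "medium"
    if score <= 0.60:
        return "high"
    return "critical"
```

Under these assumptions, 1 failure in 5 runs scores about 0.036, which stays below the 0.05 cutoff, while 10 failures in 20 runs scores about 0.30 and lands in the medium band.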


Contributing

See CONTRIBUTING.md for setup instructions, how to add a new root cause classifier, and PR requirements.


License

MIT — see LICENSE.


Releasing a new version

  1. Bump version in pyproject.toml
  2. Add entry to CHANGELOG.md
  3. git tag vX.Y.Z && git push --tags (use the version from step 1)
  4. Create a GitHub Release — PyPI publish triggers automatically

Roadmap

  • VS Code extension
  • GitLab CI native integration
  • JavaScript/Jest support
  • Flakiness heatmap by file/module
  • Slack/Discord notifications for regressions
  • autopsy watch — continuous monitoring mode
