Detect flaky tests, classify root causes, get fix suggestions

These details have not been verified by PyPI

Project description

Flaky Test Autopsy

Detect flaky tests. Classify why. Get a fix.


The problem

Flaky tests erode CI trust — teams learn to re-run failures without reading them, and real bugs hide behind habitual retries. Most tools just retry the test; they never tell you why it failed or how often it will keep failing.

What Autopsy does differently

- Detects which tests are genuinely flaky using Wilson score confidence intervals, not raw pass rates
- Classifies each flaky test by root cause: ordering dependency, timing race, randomness, or network
- Suggests fixes: template code snippets plus optional AI-powered analysis via Claude
- Tracks trends across sessions so you know whether a flaky test is getting worse, improving, or newly introduced

Install

pip install flaky-test-autopsy

Quick start

# Run your suite 10 times with randomised order; score and classify results
autopsy run ./tests --runs 10

# Show scored results from a saved DB
autopsy score ./autopsy_results.db --explain

# Get fix suggestions for all flaky tests
autopsy fix ./autopsy_results.db

# Track trends across multiple sessions
autopsy trend ./autopsy_results.db

Commands

`autopsy run <path>`

Runs your pytest suite the number of times given by --runs, using a new random seed for each run (via pytest-randomly). Results are written to autopsy_results.db in the current directory.

autopsy run ./tests --runs 20 --label "post-refactor"
autopsy run ./tests --runs 5 --fresh        # wipe old data first

After running, prints a scored summary table:

 Test                                     Runs  Pass rate  Flakiness  Severity  Root cause
 tests/test_flaky.py::test_sometimes        20      50.0%      26.4%    MEDIUM  randomness
 tests/test_order.py::test_depends_on_a     20      45.0%      22.3%    MEDIUM  ordering
 tests/test_stable.py::test_always_passes   20     100.0%       0.0%      NONE  —
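
The repeated-run loop described above can be approximated with plain pytest and pytest-randomly. This is a minimal sketch, not Autopsy's actual implementation: it assumes pytest-randomly is installed (which provides the `--randomly-seed` option), and the helper names `build_pytest_command` and `run_repeatedly` are hypothetical.

```python
import random
import subprocess
import sys


def build_pytest_command(test_path: str, seed: int) -> list[str]:
    """Build a pytest invocation whose test order is shuffled with a known seed."""
    return [sys.executable, "-m", "pytest", test_path, "-q", f"--randomly-seed={seed}"]


def run_repeatedly(test_path: str, runs: int) -> list[tuple[int, int]]:
    """Run the suite `runs` times and return (seed, pytest exit code) for each run."""
    results = []
    for _ in range(runs):
        # A fresh 32-bit seed per run, so every run uses a different test order.
        seed = random.randrange(2**32)
        completed = subprocess.run(build_pytest_command(test_path, seed))
        results.append((seed, completed.returncode))
    return results
```

Recording the seed next to each result is what makes a failing order reproducible: re-running with the same `--randomly-seed` value should replay the same shuffle.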

`autopsy score <db_path>`

Re-score results from an existing DB without re-running tests.

autopsy score ./autopsy_results.db
autopsy score ./autopsy_results.db --explain        # show evidence bullets
autopsy score ./autopsy_results.db --all            # include stable tests
autopsy score ./autopsy_results.db --threshold 0.1  # flag only tests scoring above 0.1
autopsy score ./autopsy_results.db --json           # machine-readable output

`autopsy fix <db_path>`

Generate fix suggestions for every flaky test.

autopsy fix ./autopsy_results.db
autopsy fix ./autopsy_results.db --ai              # Claude-powered analysis
autopsy fix ./autopsy_results.db --output fixes.md # write Markdown report
autopsy fix ./autopsy_results.db --ai --model claude-sonnet-4-6

The AI model is also configurable via the AUTOPSY_AI_MODEL environment variable (default: claude-opus-4-7).

`autopsy trend <db_path>`

Compare flakiness across sessions and detect regressions.

autopsy trend ./autopsy_results.db
autopsy trend ./autopsy_results.db --regressions-only
autopsy trend ./autopsy_results.db --output trend_report.md
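
To make the idea of a cross-session comparison concrete, here is a minimal sketch that labels each test by how its flakiness score moved between two sessions. The function name `classify_trend`, the label strings, and the 0.02 tolerance are all assumptions for illustration; the source does not describe how Autopsy computes trends internally.

```python
def classify_trend(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, str]:
    """Label each test in `current` relative to its score in `previous`.

    Scores are flakiness scores in [0, 1]; `tolerance` (an assumed value)
    absorbs small run-to-run noise so tiny changes count as "stable".
    """
    trends = {}
    for test_id, score in current.items():
        if test_id not in previous:
            trends[test_id] = "new"
        elif score > previous[test_id] + tolerance:
            trends[test_id] = "worse"
        elif score < previous[test_id] - tolerance:
            trends[test_id] = "improving"
        else:
            trends[test_id] = "stable"
    return trends


baseline = {"test_a": 0.10, "test_b": 0.20, "test_c": 0.10}
latest = {"test_a": 0.30, "test_b": 0.05, "test_c": 0.10, "test_d": 0.15}
print(classify_trend(baseline, latest))
```

A regressions-only view would then be the subset of tests labelled "worse" or "new".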

`autopsy ci <path>`

Composite command designed for CI pipelines. Runs the suite, scores results, compares against a baseline, and exits 0 (clean), 1 (regressions detected), or 2 (error).

autopsy ci ./tests --runs 5 --baseline ./baseline/autopsy_results.db --output report.md
autopsy ci ./tests --runs 5 --json-output report.json   # for downstream tooling
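
The exit codes above are easy to consume from a wrapper script. A minimal sketch, assuming `autopsy` is on the PATH; the helper names `describe_exit_code` and `run_autopsy_ci` are hypothetical, and only the flags documented above are used.

```python
import subprocess

# Exit codes as documented for `autopsy ci`.
EXIT_MEANINGS = {0: "clean", 1: "regressions detected", 2: "error"}


def describe_exit_code(code: int) -> str:
    """Translate an `autopsy ci` exit code into a short human-readable status."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_autopsy_ci(test_path: str, runs: int = 5) -> str:
    """Run `autopsy ci` on `test_path` and return a description of the outcome."""
    completed = subprocess.run(["autopsy", "ci", test_path, "--runs", str(runs)])
    return describe_exit_code(completed.returncode)
```

Keeping 1 (regressions) distinct from 2 (error) lets a pipeline fail the build on real regressions while treating tool errors differently, for example by retrying or alerting.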

`autopsy dashboard <db_path>`

Serve a local web dashboard with summary cards, a Chart.js flakiness trend chart, and a sortable, filterable test table.

autopsy dashboard ./autopsy_results.db
autopsy dashboard ./autopsy_results.db --port 9000 --no-browser


`autopsy init-ci`

Generate a .github/workflows/flaky-tests.yml workflow that runs on push, pull request, and a nightly cron schedule.

autopsy init-ci --runs 10 --schedule "0 3 * * *"

CI Integration

Use autopsy ci on every PR to catch regressions before merge. The workflow below saves the baseline DB as an artifact on main pushes, then downloads and compares on PRs.

name: Flaky Test Detection

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 2 * * *'

jobs:
  flaky-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install flaky-test-autopsy
          pip install -r requirements.txt

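      # Note: download-artifact@v4 only sees artifacts from the current run
      # unless run-id and github-token are set. Without them this step fails
      # and is ignored via continue-on-error, so no baseline is available.
      - name: Download baseline DB (if exists)
        uses: actions/download-artifact@v4
        with:
          name: autopsy-baseline
          path: ./baseline
        continue-on-error: true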

      - name: Run flaky test detection
        run: |
          autopsy ci . --runs 10 \
            --baseline ./baseline/autopsy_results.db \
            --output autopsy_ci_report.md

      - name: Upload results as artifact
        # Publish the baseline DB from main-branch runs only
        if: always() && github.ref == 'refs/heads/main'
        uses: actions/upload-artifact@v4
        with:
          name: autopsy-baseline
          path: autopsy_results.db

      - name: Upload CI report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: autopsy-report
          path: autopsy_ci_report.md

Root cause categories

Category	What it means
`ordering`	Test relies on execution order — passes alone, fails when another test runs first
`timing`	Race condition or brittle sleep/timeout that fails under load or slow CI
`randomness`	Unseeded `random`, `uuid`, or hash seed causing non-deterministic behavior
`network`	Test hits a real endpoint or DNS; fails when the network is slow or unavailable
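
Two of these categories have well-known generic remedies, sketched below. This is an illustration of the patterns, not output from `autopsy fix`; the names `pick_discount` and `wait_until` are hypothetical. For `randomness`, the fix is to inject a seeded generator instead of using the global one; for `timing`, the fix is to poll for a condition with a deadline instead of sleeping for a fixed time.

```python
import random
import time


def pick_discount(rng: random.Random) -> int:
    """Code under test: takes its random source as a parameter."""
    return rng.choice([5, 10, 15])


def test_discount_is_reproducible():
    # randomness fix: a locally seeded generator gives the same result every run.
    first = pick_discount(random.Random(1234))
    second = pick_discount(random.Random(1234))
    assert first == second


def wait_until(predicate, timeout: float = 2.0, interval: float = 0.01) -> bool:
    """timing fix: poll `predicate` until it is true or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition that became true at the deadline still counts.
    return predicate()
```

A test would then write `assert wait_until(lambda: job.done)` rather than `time.sleep(1); assert job.done`, where `job` stands for whatever asynchronous work the test is waiting on.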

How the scoring works

Autopsy uses the Wilson score lower bound (95% confidence) rather than the raw failure rate. A test that failed once in 5 runs might just be bad luck; the Wilson score accounts for sample size and returns a conservative lower bound on the true failure rate. flakiness_score is this lower bound, and a test is considered flaky when it exceeds 0.05 (5%). Severity bands: low ≤ 10%, medium ≤ 30%, high ≤ 60%, critical > 60%. This approach reduces false positives from small sample sizes and gives you a number you can track over time.
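
The scoring rule can be sketched with the standard Wilson score interval formula. This is an illustration under stated assumptions rather than Autopsy's own code: the function names are hypothetical, z = 1.96 is the usual value for 95% confidence, and treating scores at or below the 0.05 threshold as "none" is an assumption. Autopsy's reported values may differ if it uses a different variant or correction.

```python
import math


def wilson_lower_bound(failures: int, runs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the true failure rate."""
    if runs <= 0 or failures <= 0:
        return 0.0
    p = failures / runs  # observed failure rate
    denom = 1 + z * z / runs
    centre = p + z * z / (2 * runs)
    margin = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return max(0.0, (centre - margin) / denom)


def severity(score: float, threshold: float = 0.05) -> str:
    """Map a flakiness score to the severity bands described above."""
    if score <= threshold:
        return "none"  # assumed label for tests that are not flagged
    if score <= 0.10:
        return "low"
    if score <= 0.30:
        return "medium"
    if score <= 0.60:
        return "high"
    return "critical"


for failures, runs in [(1, 5), (10, 20)]:
    score = wilson_lower_bound(failures, runs)
    print(f"{failures}/{runs} failures: score={score:.3f}, severity={severity(score)}")
```

Under these assumptions, 1 failure in 5 runs yields a lower bound of about 0.036, which stays below the 0.05 threshold, while 10 failures in 20 runs yields about 0.30. The same observed rate with more runs produces a higher lower bound, which is why running more iterations sharpens the verdict.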

Contributing

See CONTRIBUTING.md for setup instructions, how to add a new root cause classifier, and PR requirements.

License

MIT — see LICENSE.

Releasing a new version

- Bump version in pyproject.toml
- Add entry to CHANGELOG.md
- git tag v0.1.0 && git push --tags
- Create a GitHub Release; the PyPI publish triggers automatically

Roadmap

- VS Code extension
- GitLab CI native integration
- JavaScript/Jest support
- Flakiness heatmap by file/module
- Slack/Discord notifications for regressions
- autopsy watch: continuous monitoring mode

Project details


Release history

- 0.3.0 (this version): May 2, 2026
- 0.2.0: May 1, 2026
- 0.1.0: Apr 30, 2026

Download files

Source Distribution

flaky_test_autopsy-0.3.0.tar.gz (949.1 kB), uploaded May 2, 2026, Source

Built Distribution

flaky_test_autopsy-0.3.0-py3-none-any.whl (40.3 kB), uploaded May 2, 2026, Python 3

File details

Details for the file flaky_test_autopsy-0.3.0.tar.gz.

File metadata

Download URL: flaky_test_autopsy-0.3.0.tar.gz
Upload date: May 2, 2026
Size: 949.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flaky_test_autopsy-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`fd270e143af7202df467936e9ebf83f193fbc7e49915fb93a9485cab35d9b406`
MD5	`da7a5cefac0bd44f5823a2136ebcab31`
BLAKE2b-256	`474ab6d11edee8f93d49d88bafc60bf48f01fe8b1008ca3c9ab2326022d72af6`


Provenance

The following attestation bundles were made for flaky_test_autopsy-0.3.0.tar.gz:

Publisher: publish.yml on PranavOaR/flaky

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flaky_test_autopsy-0.3.0.tar.gz
- Subject digest: fd270e143af7202df467936e9ebf83f193fbc7e49915fb93a9485cab35d9b406
- Sigstore transparency entry: 1428875038
- Sigstore integration time: May 2, 2026
Source repository:
- Permalink: PranavOaR/flaky@f265aa1a0373442fddd1e243811f3842dec77364
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/PranavOaR
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f265aa1a0373442fddd1e243811f3842dec77364
- Trigger Event: release

File details

Details for the file flaky_test_autopsy-0.3.0-py3-none-any.whl.

File metadata

Download URL: flaky_test_autopsy-0.3.0-py3-none-any.whl
Upload date: May 2, 2026
Size: 40.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flaky_test_autopsy-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6266bcd705719b7eafc7db4543289a5c674354b251f3a1352bc377f5f43c4a4d`
MD5	`54caf30c6fea21de5418611d1ef781cf`
BLAKE2b-256	`e7f649dc3cb13735e51faf93d41096b56948211082aefa8d34b2e45426599f4e`


Provenance

The following attestation bundles were made for flaky_test_autopsy-0.3.0-py3-none-any.whl:

Publisher: publish.yml on PranavOaR/flaky

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: flaky_test_autopsy-0.3.0-py3-none-any.whl
- Subject digest: 6266bcd705719b7eafc7db4543289a5c674354b251f3a1352bc377f5f43c4a4d
- Sigstore transparency entry: 1428875040
- Sigstore integration time: May 2, 2026
Source repository:
- Permalink: PranavOaR/flaky@f265aa1a0373442fddd1e243811f3842dec77364
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/PranavOaR
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f265aa1a0373442fddd1e243811f3842dec77364
- Trigger Event: release
