Pytest plugin for running non-deterministic LLM tests with automatic retry and beautiful reports

llm-flaky

Pytest plugin for running non-deterministic LLM tests.

LLM tests are inherently non-deterministic due to the probabilistic nature of language models. This plugin handles flakiness by automatically retrying tests and requiring an 80% pass rate (4/5 by default).
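To get a feel for what the 4-of-5 rule means in practice, the sketch below computes the chance that a test passes overall, given the probability that a single run succeeds. It assumes runs are independent with the same success probability, which real LLM calls only approximate. `pass_probability` is a hypothetical helper for illustration, not part of the plugin.

```python
from math import comb


def pass_probability(p: float, max_runs: int = 5, min_passes: int = 4) -> float:
    """Probability of at least `min_passes` successes in `max_runs` independent runs.

    Assumption: every run succeeds with the same probability `p`.
    """
    return sum(
        comb(max_runs, k) * p**k * (1 - p) ** (max_runs - k)
        for k in range(min_passes, max_runs + 1)
    )


if __name__ == "__main__":
    for p in (0.7, 0.9, 0.99):
        print(f"per-run success {p:.2f}: overall pass probability {pass_probability(p):.3f}")
```

Under these assumptions, a test whose single runs succeed 90% of the time passes the 4-of-5 rule about 92% of the time, while one that succeeds only 70% of the time passes roughly half the time.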

Features

Auto-marking: Automatically applies @pytest.mark.flaky to tests with @pytest.mark.llm
80% accuracy default: Tests pass if 4 out of 5 runs succeed (configurable)
Beautiful reports: Replaces standard flaky output with a formatted table
Environment variable support: Use FLAKY_MAX_RUNS to control retries
pytest-xdist compatible: Works correctly with parallel test execution

Installation

pip install llm-flaky

Usage

Mark your LLM tests with @pytest.mark.llm:

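import pytest

# call_llm stands in for your own async client function; the plugin does not provide it.
# Running an async test also requires an async pytest plugin such as pytest-asyncio.
@pytest.mark.llm
async def test_llm_response():
    response = await call_llm("What is 2+2?")
    assert "4" in response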

The plugin automatically applies flaky retry logic. No additional code needed!

Example output

══════════════════════════════════════════════════════════════════════════════
 LLM TESTS SUMMARY
══════════════════════════════════════════════════════════════════════════════

 Test                                                     Passed       Result
 ────────────────────────────────────────────────────────────────────────────
 test_llm_response_quality                                 4 / 4     ✓ PASSED
 test_llm_context_handling[short]                          4 / 4     ✓ PASSED
 test_llm_context_handling[long]                           3 / 4     ✓ PASSED

 ✗ FAILED TESTS:
 ────────────────────────────────────────────────────────────────────────────
 test_llm_edge_case                                        2 / 4     ✗ FAILED
 ────────────────────────────────────────────────────────────────────────────
 ⚠ Total                                                   3 / 4       75.0%
══════════════════════════════════════════════════════════════════════════════
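As a reading aid, the sketch below recomputes the verdicts and the Total row from the pass counts shown in the table. It is not the plugin's reporting code. The threshold of 3 passes out of 4 runs is an assumption inferred from the sample (a test with 3 / 4 is shown as PASSED and one with 2 / 4 as FAILED), and `summarize` is a hypothetical helper.

```python
# Pass counts copied from the example table; each test was run 4 times.
results = {
    "test_llm_response_quality": 4,
    "test_llm_context_handling[short]": 4,
    "test_llm_context_handling[long]": 3,
    "test_llm_edge_case": 2,
}

MIN_PASSES = 3  # assumption: inferred from the sample, i.e. max_runs - 1 with max_runs = 4


def summarize(results: dict, min_passes: int) -> tuple:
    """Return (tests passed, tests total, percentage of tests passed)."""
    passed = [name for name, count in results.items() if count >= min_passes]
    rate = 100 * len(passed) / len(results)
    return len(passed), len(results), rate


if __name__ == "__main__":
    passed, total, rate = summarize(results, MIN_PASSES)
    print(f"Total {passed} / {total} {rate:.1f}%")
```

Note that the Total row counts tests, not individual runs: three of the four tests met the threshold.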

Configuration

Environment variables

FLAKY_MAX_RUNS=3 pytest  # Run each test up to 3 times (min_passes=2)

Command line options

pytest --llm-flaky-max-runs=5           # Max runs for LLM tests (default: 5)
pytest --llm-flaky-min-passes=4         # Min passes required (default: max_runs - 1)
pytest --llm-flaky-exclude-marker=skip  # Marker that excludes a test from flaky retries
pytest --llm-flaky-title="My Title"     # Custom report title
pytest --no-llm-flaky-report            # Disable beautiful report

pytest.ini options

[pytest]
llm_flaky_max_runs = 5
llm_flaky_min_passes = 4
llm_flaky_exclude_marker = langsmith_dataset
llm_flaky_title = LLM TESTS SUMMARY

Priority

Configuration is read in this order (highest priority first):

1. FLAKY_MAX_RUNS environment variable
2. Command line options (--llm-flaky-*)
3. pytest.ini options (llm_flaky_*)
4. Defaults (max_runs=5, min_passes=4)
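The precedence above can be sketched as a small resolver. This is an illustration only, not the plugin's code: `resolve_settings` and its parameters are hypothetical names. The source does not say how an explicitly configured min_passes interacts with FLAKY_MAX_RUNS, so the sketch only derives min_passes (as max_runs - 1) when none is given.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_MAX_RUNS = 5


def resolve_settings(
    env: Mapping[str, str],
    cli_max_runs: Optional[int] = None,
    ini_max_runs: Optional[int] = None,
    min_passes: Optional[int] = None,
) -> Tuple[int, int]:
    """Pick max_runs from the highest-priority source that is set."""
    env_value = env.get("FLAKY_MAX_RUNS")
    if env_value is not None:
        max_runs = int(env_value)      # 1. environment variable
    elif cli_max_runs is not None:
        max_runs = cli_max_runs        # 2. command line option
    elif ini_max_runs is not None:
        max_runs = ini_max_runs        # 3. pytest.ini option
    else:
        max_runs = DEFAULT_MAX_RUNS    # 4. built-in default
    if min_passes is None:
        min_passes = max_runs - 1      # documented default: max_runs - 1
    return max_runs, min_passes


if __name__ == "__main__":
    print(resolve_settings(os.environ))
```

With `FLAKY_MAX_RUNS=3` set, the sketch yields max_runs=3 and min_passes=2, matching the environment variable example in this section.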

How it works

1. Collection phase: The plugin finds all tests marked with @pytest.mark.llm
2. Auto-marking: It applies @pytest.mark.flaky(max_runs=5, min_passes=4) to each of them
3. Execution: The flaky plugin (see Requirements) handles the retry logic
4. Reporting: A summary table replaces the standard flaky output
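The retry rule from the execution step can be illustrated with a small stand-alone loop. This is a simplified model, not the code of this plugin or of flaky: `run_flaky` is a hypothetical helper, and stopping early once the verdict is decided is a choice made for this sketch (it does not change the verdict).

```python
from typing import Callable, Tuple


def run_flaky(
    test_fn: Callable[[], None], max_runs: int = 5, min_passes: int = 4
) -> Tuple[int, int, bool]:
    """Run `test_fn` up to `max_runs` times; return (passes, runs, passed)."""
    passes = 0
    runs = 0
    while runs < max_runs:
        runs += 1
        try:
            test_fn()
            passes += 1
        except AssertionError:
            pass  # a failed run counts against the test but does not abort it
        failures = runs - passes
        # Stop once the outcome is decided: enough passes, or too many failures.
        if passes >= min_passes or failures > max_runs - min_passes:
            break
    return passes, runs, passes >= min_passes


if __name__ == "__main__":
    # Scripted outcomes stand in for a non-deterministic LLM call.
    scripted = iter([True, False, True, True, True])

    def fake_llm_test() -> None:
        assert next(scripted)

    print(run_flaky(fake_llm_test))
```

In the scripted demo, one failure out of five runs still leaves four passes, so the test counts as passed under the default settings.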

Excluding tests

Tests with the @langsmith_dataset marker are excluded by default, because they use LangSmith's built-in evaluation:

@pytest.mark.llm
@langsmith_dataset("my_dataset.yaml")
async def test_with_langsmith():
    # This test is not retried by flaky; LangSmith handles the evaluation
    pass
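The exclusion rule can be pictured as a simple filter over the marker names attached to each collected test. The sketch below is an assumption about the decision logic, not the plugin's source: `should_apply_flaky` is a hypothetical helper, and in a real pytest hook the marker names would come from the collected test items rather than a hand-written dictionary.

```python
from typing import Iterable


def should_apply_flaky(
    marker_names: Iterable[str],
    llm_marker: str = "llm",
    exclude_marker: str = "langsmith_dataset",
) -> bool:
    """Retry a test only if it carries the LLM marker and not the exclude marker."""
    names = set(marker_names)
    return llm_marker in names and exclude_marker not in names


# Hypothetical collected tests mapped to the markers they carry.
collected = {
    "test_llm_response": ["llm"],
    "test_with_langsmith": ["llm", "langsmith_dataset"],
    "test_plain_unit": [],
}

retried = [name for name, markers in collected.items() if should_apply_flaky(markers)]

if __name__ == "__main__":
    print(retried)
```

Passing a different `exclude_marker` mirrors what the --llm-flaky-exclude-marker option and the llm_flaky_exclude_marker setting are described as doing.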

Requirements

Python >= 3.9
pytest >= 7.0.0
flaky >= 3.7.0

License

MIT

This version: 0.1.0 (Dec 13, 2025)

Download files

Source Distribution

llm_flaky-0.1.0.tar.gz (7.6 kB)

Uploaded Dec 13, 2025 Source

Built Distribution

llm_flaky-0.1.0-py3-none-any.whl (8.5 kB)

Uploaded Dec 13, 2025 Python 3

File details

Details for the file llm_flaky-0.1.0.tar.gz.

File metadata

Download URL: llm_flaky-0.1.0.tar.gz
Upload date: Dec 13, 2025
Size: 7.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for llm_flaky-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9269a4a269fd83976a645ad9a6c350979473313faace583a9c6edac15f7233eb`
MD5	`6c50e47ce23e8dfbcdae809b8f6c5485`
BLAKE2b-256	`7e9d18a938805e90fa3de06b5a03108cfd919cdff4a80c0f01b6bf36d927e824`

File details

Details for the file llm_flaky-0.1.0-py3-none-any.whl.

File metadata

Download URL: llm_flaky-0.1.0-py3-none-any.whl
Upload date: Dec 13, 2025
Size: 8.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for llm_flaky-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`70e612d1405a44af48f6d4c8d9b9b3dc2f2ecb88c0cd4d9bfa0ca13b6d9b468f`
MD5	`cc6f98c644a72190f34874aaaa30da73`
BLAKE2b-256	`aa21e84654cbd9aa5656ee7d030c4ffb40b34b5c642a588c6b23045951bb6c5d`
