Pytest plugin for running non-deterministic LLM tests with automatic retry and beautiful reports
llm-flaky
Pytest plugin for running non-deterministic LLM tests.
LLM tests are inherently non-deterministic due to the probabilistic nature of language models. This plugin handles flakiness by automatically retrying tests and requiring an 80% pass rate (4/5 by default).
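To get a feel for what the 4-of-5 threshold means in practice, the sketch below computes the chance that a test clears it for a given per-run success probability. It assumes runs are independent, which real LLM calls may not be, and the function name `pass_probability` is illustrative rather than part of the plugin.

```python
from math import comb


def pass_probability(p: float, max_runs: int = 5, min_passes: int = 4) -> float:
    """Chance of at least `min_passes` successes in `max_runs` independent runs."""
    return sum(
        comb(max_runs, k) * p**k * (1 - p) ** (max_runs - k)
        for k in range(min_passes, max_runs + 1)
    )


# Compare a few per-run success rates against the default 4-of-5 threshold.
for p in (0.7, 0.8, 0.9, 0.95):
    print(f"per-run success {p:.2f} -> clears threshold {pass_probability(p):.3f}")
```

Under these assumptions, a test that succeeds on 90% of individual runs clears the default threshold roughly 92% of the time.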
Features
- Auto-marking: Automatically applies @pytest.mark.flaky to tests with @pytest.mark.llm
- 80% accuracy default: Tests pass if 4 out of 5 runs succeed (configurable)
- Beautiful reports: Replaces standard flaky output with a formatted table
- Environment variable support: Use FLAKY_MAX_RUNS to control retries
- pytest-xdist compatible: Works correctly with parallel test execution
Installation
pip install llm-flaky
Usage
Mark your LLM tests with @pytest.mark.llm:
import pytest

@pytest.mark.llm
async def test_llm_response():
    response = await call_llm("What is 2+2?")
    assert "4" in response
The plugin automatically applies flaky retry logic. No additional code needed!
Example output
══════════════════════════════════════════════════════════════════════════════
LLM TESTS SUMMARY
══════════════════════════════════════════════════════════════════════════════
Test                                 Passed    Result
────────────────────────────────────────────────────────────────────────────
test_llm_response_quality            4 / 4     ✓ PASSED
test_llm_context_handling[short]     4 / 4     ✓ PASSED
test_llm_context_handling[long]      3 / 4     ✓ PASSED
✗ FAILED TESTS:
────────────────────────────────────────────────────────────────────────────
test_llm_edge_case                   2 / 4     ✗ FAILED
────────────────────────────────────────────────────────────────────────────
⚠ Total                              3 / 4     75.0%
══════════════════════════════════════════════════════════════════════════════
Configuration
Environment variables
FLAKY_MAX_RUNS=3 pytest # Run each test up to 3 times (min_passes=2)
Command line options
pytest --llm-flaky-max-runs=5 # Max runs for LLM tests (default: 5)
pytest --llm-flaky-min-passes=4 # Min passes required (default: max_runs - 1)
pytest --llm-flaky-exclude-marker=skip # Marker to exclude from flaky
pytest --llm-flaky-title="My Title" # Custom report title
pytest --no-llm-flaky-report # Disable beautiful report
pytest.ini options
[pytest]
llm_flaky_max_runs = 5
llm_flaky_min_passes = 4
llm_flaky_exclude_marker = langsmith_dataset
llm_flaky_title = LLM TESTS SUMMARY
Priority
Configuration is read in this order (highest priority first):
- FLAKY_MAX_RUNS environment variable
- Command line options (--llm-flaky-*)
- pytest.ini options (llm_flaky_*)
- Defaults (max_runs=5, min_passes=4)
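The priority order for max_runs can be sketched as a simple "first value that is set wins" lookup. This is an illustration only, not the plugin's code: `first_set`, `resolve_max_runs`, and `default_min_passes` are hypothetical names, and treating an empty FLAKY_MAX_RUNS as unset is an assumption.

```python
from typing import Mapping, Optional

DEFAULT_MAX_RUNS = 5  # documented default


def first_set(*candidates):
    """Return the first candidate that is not None."""
    for value in candidates:
        if value is not None:
            return value
    return None


def resolve_max_runs(
    env: Mapping[str, str],
    cli_value: Optional[int] = None,
    ini_value: Optional[int] = None,
) -> int:
    """Pick max_runs: environment first, then CLI, then pytest.ini, then default."""
    # Assumption: an empty environment variable counts as "not set".
    env_value = env.get("FLAKY_MAX_RUNS") or None
    return int(first_set(env_value, cli_value, ini_value, DEFAULT_MAX_RUNS))


def default_min_passes(max_runs: int) -> int:
    """Documented default for min_passes: max_runs - 1."""
    return max_runs - 1


max_runs = resolve_max_runs({"FLAKY_MAX_RUNS": "3"}, cli_value=7, ini_value=6)
print(max_runs, default_min_passes(max_runs))
```

In real use, `os.environ` would be passed as `env`. How the plugin reconciles an explicitly configured min_passes with a smaller max_runs from the environment is not stated here, so the sketch leaves that case out.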
How it works
- Collection phase: Plugin finds all tests with @pytest.mark.llm
- Auto-marking: Applies @pytest.mark.flaky(max_runs=5, min_passes=4)
- Execution: The flaky plugin handles retry logic
- Reporting: Beautiful summary table replaces standard output
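The retry semantics behind max_runs and min_passes can be illustrated with a small standalone loop. This is a simplified sketch, not the flaky package's implementation: `run_with_retries` and `scripted` are hypothetical helpers, and the loop stops only when min_passes is reached or max_runs is used up.

```python
from typing import Callable, Iterable, Tuple


def run_with_retries(
    test: Callable[[], None], max_runs: int = 5, min_passes: int = 4
) -> Tuple[int, int, bool]:
    """Run `test` until it has passed `min_passes` times or `max_runs` is reached.

    Returns (passes, runs, overall_result).
    """
    passes = runs = 0
    while runs < max_runs and passes < min_passes:
        runs += 1
        try:
            test()
        except AssertionError:
            continue  # a failed run still counts toward max_runs
        passes += 1
    return passes, runs, passes >= min_passes


def scripted(outcomes: Iterable[bool]) -> Callable[[], None]:
    """Build a fake test that passes or fails according to `outcomes`."""
    it = iter(outcomes)

    def test() -> None:
        assert next(it)

    return test


# One failure out of five runs still meets the 4-of-5 threshold.
print(run_with_retries(scripted([True, False, True, True, True])))
```

A test that passes its first four runs finishes early, since a fifth run could not change the result.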
Excluding tests
Tests with the @langsmith_dataset marker are excluded by default (they use LangSmith's built-in evaluation):
@pytest.mark.llm
@langsmith_dataset("my_dataset.yaml")
async def test_with_langsmith():
    # This test won't get flaky retry - LangSmith handles evaluation
    pass
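The exclusion rule amounts to a small predicate over a test's marker names. The sketch below is an assumption about the logic, not the plugin's source: `should_auto_mark` is a hypothetical name, and it takes plain strings so it can run without pytest.

```python
from typing import Iterable


def should_auto_mark(
    marker_names: Iterable[str], exclude_marker: str = "langsmith_dataset"
) -> bool:
    """Return True when a test should receive flaky retry logic.

    A test qualifies if it carries the `llm` marker and does not carry
    the exclusion marker.
    """
    names = set(marker_names)
    return "llm" in names and exclude_marker not in names


print(should_auto_mark(["llm"]))                       # plain LLM test
print(should_auto_mark(["llm", "langsmith_dataset"]))  # excluded test
print(should_auto_mark(["slow"]))                      # not an LLM test
```

Inside a pytest hook, the names could be gathered with `[m.name for m in item.iter_markers()]`. Changing `exclude_marker` mirrors what the --llm-flaky-exclude-marker option and the llm_flaky_exclude_marker setting configure.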
Requirements
- Python >= 3.9
- pytest >= 7.0.0
- flaky >= 3.7.0
License
MIT