# sik-stochastic-tests

A pytest plugin for testing non-deterministic systems such as LLMs. It runs tests multiple times with configurable sample sizes and success thresholds to establish reliable results despite occasional random failures.
## Overview

When testing non-deterministic systems such as large language models, traditional pass/fail testing is problematic because of sporadic errors or response inconsistencies. This plugin allows you to run tests multiple times and determine success based on a threshold, ensuring your tests are reliable even with occasional random failures.
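The pass/fail decision behind this idea is simple statistics: compare the observed success rate against a threshold. A minimal sketch of that logic (plain Python illustrating the concept, not the plugin's internals):

```python
def passes(successes: int, samples: int, threshold: float = 1.0) -> bool:
    """Return True when the observed success rate meets the threshold."""
    return (successes / samples) >= threshold

# 8 of 10 runs succeeded; a 0.8 threshold accepts this outcome.
print(passes(8, 10, threshold=0.8))  # True
print(passes(7, 10, threshold=0.8))  # False
```

The default threshold of 1.0 here reproduces classic pytest behavior: every run must pass.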
## Features
- Run tests multiple times with a single decorator
- Set success thresholds to allow for occasional failures
- Batch test execution for performance optimization
- Retry capability for flaky tests
- Timeout control for long-running tests
- Detailed reporting of success rates and failure patterns
- Full support for both synchronous and asynchronous tests
## Installation

```shell
pip install sik-stochastic-tests
```

Or with uv:

```shell
uv add sik-stochastic-tests
```
## Usage

### Basic Usage

Mark any test with the `stochastic` decorator to run it multiple times:

```python
import pytest

@pytest.mark.stochastic(samples=5)  # run 5 times
def test_llm_response():
    response = my_llm.generate("What is the capital of France?")
    assert 'paris' in response.lower()
```
### Async Tests

For asynchronous tests, you must use both the `@pytest.mark.asyncio` and `@pytest.mark.stochastic` decorators:

```python
import pytest

@pytest.mark.asyncio  # required for async tests
@pytest.mark.stochastic(samples=5)
async def test_async_llm_response():
    response = await my_async_llm.generate("What is the capital of France?")
    assert 'paris' in response.lower()
```

**Important:** The `@pytest.mark.asyncio` decorator is required for async tests. Make sure you have `pytest-asyncio` installed.
### Setting a Success Threshold

You can specify what fraction of runs must succeed for the test to pass:

```python
@pytest.mark.stochastic(samples=10, threshold=0.8)  # 80% must pass
def test_with_threshold():
    # This test will pass if at least 8 out of 10 runs succeed
    result = random_function()
    assert result > 0
```
### Retrying Flaky Tests

Specify which exceptions should trigger retries:

```python
@pytest.mark.stochastic(
    samples=5,
    retry_on=[ConnectionError, TimeoutError],
    max_retries=3,
)
def test_with_retry():
    response = api_call()  # might occasionally fail with connection issues
    assert response.status_code == 200
```
### Handling Timeouts

Set a timeout for long-running tests:

```python
# For synchronous tests
@pytest.mark.stochastic(samples=3, timeout=5.0)  # 5-second timeout
def test_with_timeout():
    result = long_running_operation()
    assert result.is_valid

# For asynchronous tests
@pytest.mark.asyncio
@pytest.mark.stochastic(samples=3, timeout=5.0)  # 5-second timeout
async def test_async_with_timeout():
    result = await async_long_running_operation()
    assert result.is_valid
```
### Batch Processing

Control concurrency with batch processing:

```python
# Works for both sync and async tests
@pytest.mark.stochastic(samples=20, batch_size=5)  # run 5 at a time
def test_with_batching():
    result = my_operation()
    assert result.success

@pytest.mark.asyncio
@pytest.mark.stochastic(samples=20, batch_size=5)  # run 5 at a time
async def test_async_with_batching():
    result = await async_operation()
    assert result.success
```
### Disabling Stochastic Mode

Temporarily disable stochastic behavior with a command-line flag:

```shell
pytest --disable-stochastic
```

This will run each test only once, ignoring the stochastic parameters for both sync and async tests.
## Advanced Examples

### Testing LLM Outputs

```python
@pytest.mark.stochastic(samples=10, threshold=0.7)
def test_llm_instruction_following():
    prompt = "Write a haiku about programming"
    response = llm.generate(prompt)

    # The test passes if at least 70% of responses satisfy all these criteria.
    assert len(response.split("\n")) == 3, "Should have 3 lines"
    assert "program" in response.lower(), "Should mention programming"

    # Count syllables (simplified)
    lines = response.split("\n")
    syllable_counts = [count_syllables(line) for line in lines]
    assert syllable_counts == [5, 7, 5], f"Should follow 5-7-5 pattern, got {syllable_counts}"
```
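The example above relies on a `count_syllables` helper that isn't shown. A naive vowel-group heuristic (an illustrative assumption, not part of the plugin) could look like:

```python
import re

def count_syllables(line: str) -> int:
    """Rough syllable estimate: one syllable per vowel group, per word."""
    total = 0
    for word in re.findall(r"[a-z]+", line.lower()):
        vowel_groups = re.findall(r"[aeiouy]+", word)
        total += max(1, len(vowel_groups))  # every word counts at least once
    return total
```

Accurate syllable counting requires a pronunciation dictionary; this heuristic miscounts silent vowels, so a lenient threshold is exactly what makes such a check usable.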
### Testing with External APIs (Async)

```python
@pytest.mark.asyncio  # required for async tests
@pytest.mark.stochastic(
    samples=5,
    threshold=0.8,
    retry_on=[requests.exceptions.RequestException],
    max_retries=3,
    timeout=10.0,
)
async def test_weather_api():
    response = await fetch_weather("New York")

    # Basic schema validation
    assert "temperature" in response
    assert "humidity" in response
    assert "wind_speed" in response

    # Reasonable-values check
    assert -50 <= response["temperature"] <= 50  # Celsius
    assert 0 <= response["humidity"] <= 100  # percentage
```
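`fetch_weather` above stands in for a real API client. A hypothetical stub (the name and response shape are assumptions for illustration) lets you dry-run the test shape without network access:

```python
import asyncio

async def fetch_weather(city: str) -> dict:
    """Hypothetical stand-in for a real weather client."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"temperature": 21.5, "humidity": 40, "wind_speed": 12.0}

result = asyncio.run(fetch_weather("New York"))
print(sorted(result))  # ['humidity', 'temperature', 'wind_speed']
```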
### Combining Multiple Features

```python
# Synchronous example with multiple features
@pytest.mark.stochastic(
    samples=20,
    threshold=0.9,
    batch_size=5,
    retry_on=[ConnectionError, TimeoutError],
    max_retries=2,
    timeout=3.0,
)
def test_complex_scenario():
    # This test will:
    # - run 20 times total
    # - run 5 at a time (batched)
    # - pass if at least 18 runs succeed (90% threshold)
    # - retry on connection or timeout errors (up to 2 retries)
    # - time out after 3 seconds for each run
    result = complex_operation()
    assert result.is_successful

# Asynchronous equivalent
@pytest.mark.asyncio
@pytest.mark.stochastic(
    samples=20,
    threshold=0.9,
    batch_size=5,
    retry_on=[ConnectionError, TimeoutError],
    max_retries=2,
    timeout=3.0,
)
async def test_async_complex_scenario():
    result = await async_complex_operation()
    assert result.is_successful
```
## Test Result Output

The plugin provides detailed test results in the console for both synchronous and asynchronous tests:

```text
=========================== Stochastic Test Results ===========================
test_llm.py::test_llm_instruction_following:
  Success rate: 0.80
  Runs: 10, Successes: 8, Failures: 2
  Failure samples:
    1. AssertionError: Should follow 5-7-5 pattern, got [4, 6, 5]
    2. AssertionError: Should mention programming

test_async.py::test_async_weather_api:
  Success rate: 0.60
  Runs: 5, Successes: 3, Failures: 2
  Failure samples:
    1. TimeoutError: Test timed out after 10.0 seconds
    2. AssertionError: 'temperature' not in response
```
## Requirements for Async Tests

1. `pytest-asyncio` must be installed: `pip install pytest-asyncio`
2. Each async test must have both decorators:

   ```python
   @pytest.mark.asyncio
   @pytest.mark.stochastic(...)
   async def test_something():
       ...
   ```

3. For pytest.ini configuration, you might want to set:

   ```ini
   [pytest]
   asyncio_mode = auto
   ```
## Configuration

You can configure default behavior in your pyproject.toml or pytest.ini file:

```toml
[tool.pytest.ini_options]
# Exclude stochastic behavior in certain environments
addopts = "--disable-stochastic"
```
## Compatibility
- Python 3.11+
- pytest 8.0+
- pytest-asyncio 0.21+ (for async tests)
## Troubleshooting

### Async Tests Not Running Multiple Times

If your async tests are only running once instead of the specified number of samples:

- Check that you have both the `@pytest.mark.asyncio` and `@pytest.mark.stochastic` decorators
- Ensure `pytest-asyncio` is installed
- Check that the `--disable-stochastic` flag is not present

### Timeout Not Working for Async Tests

Timeouts for async tests are implemented using `asyncio.wait_for()`. If your timeouts aren't working:

- Ensure you're using a recent version of this plugin
- Make sure your test is actually running asynchronously
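As a quick illustration of the `asyncio.wait_for()` mechanism itself (plain asyncio, independent of the plugin), a coroutine that overruns its deadline is cancelled and a timeout error is raised:

```python
import asyncio

async def slow() -> str:
    await asyncio.sleep(10)  # deliberately exceeds the deadline
    return "done"

async def main() -> str:
    try:
        # wait_for cancels the coroutine once the timeout elapses
        return await asyncio.wait_for(slow(), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(main()))  # prints "timed out"
```

If a test body never yields to the event loop (e.g. it blocks in synchronous code), `wait_for` never gets a chance to cancel it, which is why the test must actually run asynchronously.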
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.