A pytest plugin for testing stochastic systems like LLMs, providing statistical confidence through multiple test runs.

Project description

sik-stochastic-tests

A pytest plugin for testing non-deterministic systems such as LLMs, running tests multiple times with configurable sample sizes and success thresholds to establish reliable results despite occasional random failures.

Overview

When testing non-deterministic systems such as large language models, traditional pass/fail testing is problematic: a single run can fail because of sporadic errors or inconsistent responses even when the system usually behaves correctly. This plugin runs each test multiple times and decides success against a threshold, so occasional random failures do not make your test suite flaky.
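The core idea can be sketched in plain Python without the plugin. This is a minimal illustration under stated assumptions, not the plugin's implementation: `evaluate_stochastic` is a hypothetical name, and it assumes that a sample "succeeds" when it raises no exception and that the test passes when the success rate is at least the threshold.

```python
from typing import Callable


def evaluate_stochastic(
    check: Callable[[], None], samples: int, threshold: float
) -> tuple[bool, float]:
    """Run `check` `samples` times and compare its success rate to `threshold`.

    A sample succeeds if `check` returns without raising; any exception
    (including a failed `assert`) counts as a failure.
    """
    successes = 0
    for _ in range(samples):
        try:
            check()
        except Exception:
            continue  # failed sample: move on to the next run
        successes += 1
    rate = successes / samples
    return rate >= threshold, rate


if __name__ == "__main__":
    import random

    rng = random.Random(42)  # seeded so the demo is repeatable

    def flaky_check() -> None:
        # Simulates a system that answers correctly about 90% of the time
        assert rng.random() < 0.9

    passed, rate = evaluate_stochastic(flaky_check, samples=10, threshold=0.8)
    print(f"passed={passed} success_rate={rate:.2f}")
```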

Features

Run tests multiple times with a single decorator
Set success thresholds to allow for occasional failures
Batch test execution for performance optimization
Retry capability for flaky tests
Timeout control for long-running tests
Detailed reporting of success rates and failure patterns
Support for both synchronous and asynchronous tests

Installation

pip install sik-stochastic-tests

Or with uv:

uv add sik-stochastic-tests

Usage

Basic Usage

Mark any test with the stochastic decorator to run it multiple times:

import pytest

@pytest.mark.stochastic(samples=5)  # Run 5 times
def test_llm_response():
    response = my_llm.generate("What is the capital of France?")
    assert 'paris' in response.lower()

Setting a Success Threshold

You can specify what percentage of runs must succeed for the test to pass:

@pytest.mark.stochastic(samples=10, threshold=0.8)  # 80% must pass
def test_with_threshold():
    # This test will pass if at least 8 out of 10 runs succeed
    result = random_function()
    assert result > 0
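To see what a threshold buys you statistically, the sketch below (hypothetical helpers, not part of the plugin) computes the minimum number of passing runs for a given `samples`/`threshold` pair and the probability that the whole test passes if each run independently succeeds with probability `p`. It assumes the pass rule is "success rate >= threshold" and that runs are independent, which real LLM calls may only approximate.

```python
import math


def min_successes(samples: int, threshold: float) -> int:
    """Smallest number of passing runs whose success rate reaches `threshold`."""
    # Compare k / samples to the threshold directly rather than computing
    # math.ceil(samples * threshold), which floating-point rounding can skew.
    return next(k for k in range(samples + 1) if k / samples >= threshold)


def pass_probability(samples: int, threshold: float, p: float) -> float:
    """Chance the test passes if each run independently succeeds with probability p.

    This is the binomial tail P(X >= k_min) for X ~ Binomial(samples, p).
    """
    k_min = min_successes(samples, threshold)
    return sum(
        math.comb(samples, k) * p**k * (1 - p) ** (samples - k)
        for k in range(k_min, samples + 1)
    )
```

With `samples=10, threshold=0.8`, a system that is right 90% of the time passes roughly 93% of the time, while one that is right only 70% of the time still passes roughly 38% of the time. Raising `samples` narrows that gap at the cost of more calls.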

Retrying Flaky Tests

Specify which exceptions should trigger retries:

@pytest.mark.stochastic(
    samples=5,
    retry_on=[ConnectionError, TimeoutError],
    max_retries=3
)
def test_with_retry():
    response = api_call()  # Might occasionally fail with connection issues
    assert response.status_code == 200
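Retrying only selected exception types can be sketched as follows. `call_with_retry` is a hypothetical helper, and two behaviors are assumptions rather than documented plugin semantics: `max_retries` counts extra attempts after the first call, and any exception not listed in `retry_on` (such as a failed `assert`) propagates immediately instead of being retried.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Sequence[type[BaseException]],
    max_retries: int,
) -> T:
    """Call fn, retrying only when it raises one of the retry_on types.

    Assumption: max_retries counts extra attempts after the first call,
    so fn runs at most max_retries + 1 times.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    retryable = tuple(retry_on)  # `except` needs a tuple, not a list
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise  # retries exhausted: report the last error
    raise AssertionError("unreachable")
```

Keeping assertion failures out of `retry_on` matters: retrying them would hide genuine wrong answers, whereas a transient `ConnectionError` says nothing about the system under test.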

Handling Timeouts

Set a timeout for long-running tests:

@pytest.mark.stochastic(samples=3, timeout=5.0)  # 5 second timeout
async def test_with_timeout():
    result = await long_running_operation()
    assert result.is_valid
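A per-sample timeout for async code can be sketched with `asyncio.wait_for`. The helper name is hypothetical, and whether the plugin applies `timeout` to each sample or to the whole test is an assumption here; this sketch treats a timed-out sample as a failed run.

```python
import asyncio
from typing import Awaitable, Callable


async def run_sample_with_timeout(
    make_coro: Callable[[], Awaitable[object]], timeout: float
) -> bool:
    """Run one async sample; return True on success, False on failure or timeout."""
    try:
        await asyncio.wait_for(make_coro(), timeout=timeout)
    except asyncio.TimeoutError:
        return False  # sample exceeded its time budget and was cancelled
    except Exception:
        return False  # assertion or other error inside the sample
    return True
```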

Batch Processing

Control concurrency with batch processing:

@pytest.mark.stochastic(samples=20, batch_size=5)  # Run 5 at a time
async def test_with_batching():
    result = await async_operation()
    assert result.success
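Batched execution can be sketched with `asyncio.gather`: at most `batch_size` samples run concurrently, and each batch finishes before the next starts. This is a hypothetical helper illustrating the idea, not the plugin's scheduler.

```python
import asyncio
from typing import Awaitable, Callable


async def run_in_batches(
    make_coro: Callable[[], Awaitable[bool]], samples: int, batch_size: int
) -> list[bool]:
    """Run `samples` coroutines, at most `batch_size` at a time, batch by batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    results: list[bool] = []
    for start in range(0, samples, batch_size):
        size = min(batch_size, samples - start)  # last batch may be smaller
        # gather waits for the whole batch before the next one starts
        results.extend(await asyncio.gather(*(make_coro() for _ in range(size))))
    return results
```

Fixed batches are simple but leave slots idle while the slowest sample in a batch finishes; an `asyncio.Semaphore` limiting concurrency to `batch_size` would keep slots busy instead. Either caps the load on a rate-limited API.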

Disabling Stochastic Mode

Temporarily disable stochastic behavior with a command-line flag:

pytest --disable-stochastic

Advanced Examples

Testing LLM Outputs

@pytest.mark.stochastic(samples=10, threshold=0.7)
def test_llm_instruction_following():
    prompt = "Write a haiku about programming"
    response = llm.generate(prompt)

    # Test passes if at least 70% of responses meet all of these criteria.
    # Strip surrounding whitespace so a trailing newline doesn't add an empty "line".
    lines = response.strip().splitlines()
    assert len(lines) == 3, "Should have 3 lines"
    assert "program" in response.lower(), "Should mention programming"

    # Count syllables (simplified; count_syllables is a helper you provide)
    syllable_counts = [count_syllables(line) for line in lines]
    assert syllable_counts == [5, 7, 5], f"Should follow 5-7-5 pattern, got {syllable_counts}"
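The example above relies on a `count_syllables` helper that the project does not define. Below is one hypothetical, deliberately rough heuristic (count vowel groups per word, with a silent-"e" adjustment); it miscounts many English words, which is one more reason to test with a threshold rather than expect every run to pass.

```python
import re


def count_syllables(line: str) -> int:
    """Rough English syllable estimate: vowel groups per word, adjusted for silent 'e'."""
    total = 0
    for word in re.findall(r"[a-z]+", line.lower()):
        groups = len(re.findall(r"[aeiouy]+", word))
        # Drop a trailing silent 'e' ("code"), but keep "-le" endings ("table")
        if word.endswith("e") and not word.endswith("le") and groups > 1:
            groups -= 1
        total += max(groups, 1)  # every word has at least one syllable
    return total
```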

Testing with External APIs

import pytest
import requests

@pytest.mark.stochastic(
    samples=5,
    threshold=0.8,
    # Retry on the exception types your HTTP client actually raises
    retry_on=[requests.exceptions.RequestException],
    max_retries=3,
    timeout=10.0
)
async def test_weather_api():
    response = await fetch_weather("New York")  # fetch_weather: your async API wrapper

    # Basic schema validation
    assert "temperature" in response
    assert "humidity" in response
    assert "wind_speed" in response

    # Reasonable values check
    assert -50 <= response["temperature"] <= 50  # Celsius
    assert 0 <= response["humidity"] <= 100      # Percentage
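Because the plugin reports sample failure messages, it can help to collect every schema or range problem into a single assertion message rather than stopping at the first failed `assert`. `weather_problems` is a hypothetical helper that mirrors the checks above; inside the test you would write `problems = weather_problems(response)` followed by `assert not problems, "; ".join(problems)`.

```python
def weather_problems(response: dict) -> list[str]:
    """Collect every schema or range problem instead of stopping at the first one."""
    problems = [
        f"missing {key}"
        for key in ("temperature", "humidity", "wind_speed")
        if key not in response
    ]
    temp = response.get("temperature")
    if temp is not None and not -50 <= temp <= 50:  # Celsius
        problems.append(f"temperature out of range: {temp}")
    humidity = response.get("humidity")
    if humidity is not None and not 0 <= humidity <= 100:  # percentage
        problems.append(f"humidity out of range: {humidity}")
    return problems
```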

Test Result Output

The plugin provides detailed test results in the console:

=========================== Stochastic Test Results ===========================

test_llm.py::test_llm_instruction_following:
  Success rate: 0.80
  Runs: 10, Successes: 8, Failures: 2
  Failure samples:
    1. AssertionError: Should follow 5-7-5 pattern, got [4, 6, 5]
    2. AssertionError: Should mention programming
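The numbers in that report follow directly from the per-run outcomes: 8 successes out of 10 runs gives a success rate of 0.80. The hypothetical function below rebuilds a summary in the same style from a list of outcomes; the cap on how many failure samples are shown (`max_samples`) is an assumption, not a documented plugin setting.

```python
from typing import Optional


def summarize(nodeid: str, errors: list[Optional[str]], max_samples: int = 3) -> str:
    """Format a report; each entry is None for a pass or an error message for a failure."""
    failures = [e for e in errors if e is not None]
    runs = len(errors)
    successes = runs - len(failures)
    lines = [
        f"{nodeid}:",
        f"  Success rate: {successes / runs:.2f}",
        f"  Runs: {runs}, Successes: {successes}, Failures: {len(failures)}",
    ]
    if failures:
        lines.append("  Failure samples:")
        lines += [f"    {i}. {msg}" for i, msg in enumerate(failures[:max_samples], start=1)]
    return "\n".join(lines)
```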

Configuration

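You can configure default behavior in pyproject.toml under [tool.pytest.ini_options] (as shown below) or in pytest.ini under [pytest]:

[tool.pytest.ini_options]
# Disable stochastic behavior for every pytest run in this project (e.g., where repeated runs are too slow or costly)
addopts = "--disable-stochastic"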

Compatibility

Python 3.11+
pytest 8.0+

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

0.1.3 (Mar 18, 2025)
0.1.2 (Mar 18, 2025)
0.1.1 (Mar 18, 2025)
0.1.0 (Mar 17, 2025)
0.0.30 (Mar 17, 2025)
0.0.27 (Mar 17, 2025)
0.0.26 (Mar 17, 2025)
0.0.25 (Mar 17, 2025)
0.0.24 (Mar 17, 2025)
0.0.23 (Mar 17, 2025)
0.0.20 (Mar 16, 2025)
0.0.19 (Mar 16, 2025)
0.0.18 (Mar 16, 2025)
0.0.17 (Mar 16, 2025)
0.0.16 (Mar 16, 2025)
0.0.15 (Mar 16, 2025)
0.0.14 (Mar 16, 2025)
0.0.13 (Mar 16, 2025)
0.0.12 (Mar 16, 2025)
0.0.11 (Mar 16, 2025)
0.0.10 (Mar 16, 2025)
0.0.9 (Mar 16, 2025)
0.0.8 (Mar 16, 2025)
0.0.7 (Mar 16, 2025)
0.0.6 (Mar 16, 2025)
0.0.5 (Mar 15, 2025)
0.0.4 (Mar 15, 2025)
0.0.3 (Mar 15, 2025)
0.0.2 (Mar 15, 2025), this version
0.0.1 (Mar 15, 2025)

Download files

Download the file for your platform.

Source Distribution

sik_stochastic_tests-0.0.2.tar.gz (19.3 kB)

Uploaded Mar 15, 2025 Source

Built Distribution

sik_stochastic_tests-0.0.2-py3-none-any.whl (10.8 kB)

Uploaded Mar 15, 2025 Python 3

File details

Details for the file sik_stochastic_tests-0.0.2.tar.gz.

File metadata

Download URL: sik_stochastic_tests-0.0.2.tar.gz
Upload date: Mar 15, 2025
Size: 19.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.24

File hashes

Hashes for sik_stochastic_tests-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`d12e7ab0b09710a6cc2a41f3784051fdb1dc10fab8fff3958390c1e854aacd88`
MD5	`575e5630c38bacb024ea8b8e395a3a86`
BLAKE2b-256	`9bd2168b653ed12dad376d9ec84d3a740ed42a192956f8c9ee06da42ea78451c`

File details

Details for the file sik_stochastic_tests-0.0.2-py3-none-any.whl.

File metadata

Download URL: sik_stochastic_tests-0.0.2-py3-none-any.whl
Upload date: Mar 15, 2025
Size: 10.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.24

File hashes

Hashes for sik_stochastic_tests-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3dff1ca2a6843d9451e08b3535252581cd9accd0d32fa0730eb8433592ea3ba2`
MD5	`0ad1d595521e93e578059328376b7786`
BLAKE2b-256	`77976f87bd3d1674fc04817ac9f724e4e8ca9bd82f73120e540c64b0a0d91e0a`
