Pytest plugin for Holm-Bonferroni correction of randomized tests

Project description

pytest-familywise

A pytest plugin for running multiple randomized tests while controlling the family-wise error rate (FWER) via the Holm-Bonferroni step-down procedure.

Motivation

A test suite that contains several independent statistical tests will, under the null hypothesis, produce at least one false positive with probability greater than the nominal level $\alpha$. For $m$ independent tests, each at level $\alpha$, the FWER is $1 - (1-\alpha)^m$. Holm-Bonferroni corrects for this without being as conservative as a plain Bonferroni adjustment.
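To make the inflation concrete, the formula above can be evaluated directly. This is only an illustrative sketch; the function name `fwer` is a placeholder and is not part of the plugin.

```python
def fwer(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    # Uncorrected family-wise error rate for a few suite sizes.
    for m in (1, 3, 10, 20):
        print(f"m={m:2d}  FWER={fwer(0.05, m):.3f}")
```

With three tests at $\alpha = 0.05$ the uncorrected FWER is already about 0.143, nearly three times the nominal level.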

The complication is that Holm-Bonferroni must process p-values from smallest to largest: the threshold at rank k depends on the total count m, and whether rank k is reached at all depends on every smaller p-value. This plugin therefore defers pass/fail decisions: every test runs to completion first, p-values are collected, and then the procedure is applied once to the full set.

Installation and loading

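Add the package as a dev dependency, for example with uv:

uv add --dev pytest-familywise

or install it into the current environment with pip:

pip install pytest-familywise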

That is all that is needed. The package declares a pytest11 entry point:

# pyproject.toml
[project.entry-points."pytest11"]
random = "pytest_familywise"

pytest scans installed pytest11 entry points at startup and loads matching modules automatically. The fixtures (assertNotReject, ztest_sample_size, etc.) are defined at module level in pytest_familywise, so they become available in every test file without any import or conftest.py change.

Quick example

import numpy as np
import scipy.stats

def test_uniform_marginals(ks_sample_size, assertNotReject):
    """Each output coordinate of our RNG should be marginally uniform."""
    n = ks_sample_size(effect_size=0.05)   # detect CDF deviation >= 5 pp
    samples = np.random.rand(n)
    result = scipy.stats.kstest(samples, "uniform")
    assertNotReject(result.pvalue)


def test_normal_mean_zero(ztest_sample_size, assertNotReject):
    """Standardised output should have mean zero."""
    n = ztest_sample_size(effect_size=0.3)   # Cohen's d = 0.3
    samples = np.random.randn(n)
    _, p = scipy.stats.ttest_1samp(samples, 0.0)
    assertNotReject(p)


def test_discrete_distribution(chisquare_sample_size, assertNotReject):
    """A categorical sampler should match its target probabilities."""
    n = chisquare_sample_size(effect_size=0.2, df=4)   # Cohen's w = 0.2
    observed = np.random.multinomial(n, [0.2] * 5)
    _, p = scipy.stats.chisquare(observed)
    assertNotReject(p)

Run with:

pytest --holm-alpha=0.05 --power=0.8

After all three tests complete, the plugin applies Holm-Bonferroni and appends a summary to the terminal output:

============ Holm-Bonferroni correction  α=0.05  n=3 =============
  PASSED  p=0.312541  threshold=0.016667  test_rng.py::test_uniform_marginals
  PASSED  p=0.487302  threshold=0.025000  test_rng.py::test_normal_mean_zero
  PASSED  p=0.621088  threshold=0.050000  test_rng.py::test_discrete_distribution

  3 passed, 0 failed after Holm-Bonferroni correction

The exit code is non-zero if any test fails the corrected threshold.

How the step-down procedure works

Given $m$ tests with p-values sorted ascending as $p_1 \le p_2 \le \cdots \le p_m$:

At rank $k$, the threshold is $\alpha / (m - k + 1)$.
Starting from $k = 1$, reject $H_0$ while $p_k \le \alpha / (m - k + 1)$.
As soon as a p-value exceeds its threshold, stop: $H_0$ is not rejected for that test or for any test with a larger p-value, so those tests pass. Every test whose $H_0$ was rejected before the stop fails.

This is more powerful than Bonferroni ($\alpha/m$ for all tests) because later ranks receive a relaxed threshold once earlier hypotheses have been rejected.
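The steps above can be sketched in a few lines of Python. This is a minimal illustration of the step-down rule, not the plugin's actual implementation; the function name `holm_bonferroni` and the dict-based interface are assumptions made for the example.

```python
def holm_bonferroni(pvalues: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Return {test_id: passed}, where passed means H0 was NOT rejected."""
    m = len(pvalues)
    # Rank the tests by ascending p-value.
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    passed: dict[str, bool] = {}
    rejecting = True
    for k, (test_id, p) in enumerate(ranked, start=1):
        threshold = alpha / (m - k + 1)
        if rejecting and p > threshold:
            # First p-value above its threshold: stop rejecting for good.
            rejecting = False
        passed[test_id] = not rejecting
    return passed


if __name__ == "__main__":
    # Thresholds for m=3 are 0.0167, 0.025 and 0.05.
    print(holm_bonferroni({"a": 0.001, "b": 0.03, "c": 0.04}))
```

In this example `a` is rejected (0.001 is below 0.05/3), `b` stops the procedure (0.03 exceeds 0.025), and `c` therefore passes even though 0.04 is below its own threshold of 0.05.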

CLI options

Option	Default	Description
`--holm-alpha`	`0.05`	Family-wise error rate for the Holm-Bonferroni procedure
`--power`	`0.8`	Per-test power used by the sample-size fixtures

--power is per-test, not family-wise.

The sample-size fixtures use Holm-Bonferroni corrected significance levels rather than the raw alpha. At collection time, the plugin counts the number of assertNotReject tests (m), then assigns alpha / (m - k + 1) to the k-th test that requests a sample size, in execution order. For a given test type and effect size, the first test receives the most stringent threshold (alpha / m) and therefore the largest sample size; later tests receive progressively relaxed thresholds and smaller samples.

Because of this, it is worth ordering your test suite so that more computationally expensive tests run later, where the required sample sizes are smaller.
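The significance level handed to each sample-size request can be tabulated from the rule alpha / (m - k + 1). The helper below is a hypothetical illustration of that schedule (the name `per_test_alphas` does not come from the plugin).

```python
def per_test_alphas(alpha: float, m: int) -> list[float]:
    """Significance level for the k-th sample-size request, k = 1..m,
    following the rule alpha / (m - k + 1)."""
    return [alpha / (m - k + 1) for k in range(1, m + 1)]


if __name__ == "__main__":
    # Schedule for the three-test quick example at alpha = 0.05.
    for k, a in enumerate(per_test_alphas(0.05, 3), start=1):
        print(f"request {k}: alpha = {a:.6f}")
```

For three tests at alpha = 0.05 the schedule is 0.05/3, 0.05/2 and 0.05, so only the last request is sized at the uncorrected level.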

Fixtures

`assertNotReject`

def test_something(assertNotReject):
    p = run_statistical_test()
    assertNotReject(p)   # registers the p-value; plugin decides pass/fail

The test passes if the null hypothesis is not rejected after Holm-Bonferroni correction (i.e. the p-value is large enough). It fails if H0 is rejected.

Calling assertNotReject(p) with a value outside [0, 1] raises ValueError. If a test raises an exception before assertNotReject is called, it fails normally and is excluded from the Holm-Bonferroni set.
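The register-then-decide behaviour can be pictured with a small store that validates and records p-values without judging them. This is a hypothetical stand-in written for illustration; the class name `PValueRegistry` and its `record` method are assumptions, not the plugin's internals.

```python
class PValueRegistry:
    """Hypothetical stand-in for a deferred p-value store."""

    def __init__(self) -> None:
        self.pvalues: dict[str, float] = {}

    def record(self, test_id: str, p: float) -> None:
        # The chained comparison is False for NaN, so NaN is rejected too.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value must lie in [0, 1], got {p!r}")
        # No pass/fail decision here: the value is only stored for later.
        self.pvalues[test_id] = float(p)


if __name__ == "__main__":
    registry = PValueRegistry()
    registry.record("test_rng.py::test_uniform_marginals", 0.312541)
    print(registry.pvalues)
```

A test that raises before recording never appears in the store, which matches the rule that such tests are excluded from the Holm-Bonferroni set.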

`ztest_sample_size`

n = ztest_sample_size(effect_size=0.5)               # two-sided (default)
n = ztest_sample_size(effect_size=0.5, two_sided=False)

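effect_size is Cohen's d. Uses the standard closed form:

$$n = \left\lceil \left(\frac{z_\alpha + z_\beta}{d}\right)^2 \right\rceil$$

where $z_\alpha$ is the standard normal quantile at $1 - \alpha/2$ (two-sided) or $1 - \alpha$ (one-sided), and $z_\beta$ is the quantile at the requested power $1 - \beta$.

As written, this is the sample size for a one-sample test, which is how the quick example uses it. A two-sample design with equal groups would need $2\left((z_\alpha + z_\beta)/d\right)^2$ per group.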
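The closed form above can be evaluated with the standard library alone. The sketch below is an assumption about conventions rather than the plugin's code: it reads $z_\alpha$ as the normal quantile at $1 - \alpha/2$ for two-sided tests and at $1 - \alpha$ for one-sided tests, takes alpha directly instead of the Holm-corrected level, and uses the made-up name `ztest_n`.

```python
from math import ceil
from statistics import NormalDist


def ztest_n(effect_size: float, alpha: float = 0.05, power: float = 0.8,
            two_sided: bool = True) -> int:
    """Sample size from n = ceil(((z_alpha + z_beta) / d) ** 2)."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)   # critical value of the test
    z_beta = NormalDist().inv_cdf(power)       # quantile at the target power
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    print(ztest_n(0.5))                   # two-sided
    print(ztest_n(0.5, two_sided=False))  # one-sided
```

At alpha = 0.05 and power 0.8 this gives 32 for d = 0.5 two-sided and 25 one-sided. Under the corrected levels the plugin assigns, the numbers would be larger.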

`chisquare_sample_size`

n = chisquare_sample_size(effect_size=0.3, df=4)

effect_size is Cohen's $w = \sqrt{\sum (p_i - p_{0i})^2 / p_{0i}}$; df is the degrees of freedom (number of categories − 1 for goodness-of-fit). Solves numerically via the non-central χ² survival function.
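A numerical solution of this kind can be sketched with SciPy, which the quick example already depends on. The code assumes the usual noncentrality parameter $\lambda = n w^2$ and searches for the smallest integer n whose power reaches the target; the function name `chisquare_n` and the search strategy are choices made for this example, not a description of the plugin's solver.

```python
from scipy.stats import chi2, ncx2


def chisquare_n(effect_size: float, df: int, alpha: float = 0.05,
                power: float = 0.8) -> int:
    """Smallest n with P(reject | noncentrality n * w**2) >= power."""
    crit = chi2.ppf(1 - alpha, df)  # rejection cutoff under H0

    def achieved_power(n: int) -> float:
        # Survival function of the non-central chi-square at the cutoff.
        return ncx2.sf(crit, df, n * effect_size ** 2)

    # Double an upper bound until it has enough power.
    hi = 1
    while achieved_power(hi) < power:
        hi *= 2
    lo = hi // 2  # either 0 or a value known to have too little power
    # Integer bisection; power is increasing in n.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if achieved_power(mid) >= power:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print(chisquare_n(effect_size=0.3, df=4))
```

Because power increases with n, bisection on integers is enough here and avoids rounding a real-valued root.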

`ks_sample_size`

n = ks_sample_size(effect_size=0.1)                 # one-sample
n = ks_sample_size(effect_size=0.1, two_sample=True) # per-group

effect_size is the maximum absolute CDF difference $\Delta = \|F - G\|_\infty \in (0, 1]$. Uses the DKW-inequality bound:

$$n \ge \frac{\left(\sqrt{\ln(2/\alpha)} + \sqrt{\ln(2/\beta)}\right)^2}{2\Delta^2}$$

where $\beta = 1 - \text{power}$. For two_sample=True the effective sample size for the two-sample KS statistic is $n_1 n_2/(n_1+n_2) = n_\text{each}/2$ (equal groups), so the returned per-group count is double the formula above.
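The bound is straightforward to evaluate. The sketch below is illustrative only: `ks_n` is a made-up name, alpha is passed in directly rather than taken from the Holm schedule, and doubling the bound before rounding up is one reasonable reading of "double the formula above".

```python
from math import ceil, log, sqrt


def ks_n(effect_size: float, alpha: float = 0.05, power: float = 0.8,
         two_sample: bool = False) -> int:
    """Sample size from the DKW-based bound; per-group if two_sample."""
    if not 0.0 < effect_size <= 1.0:
        raise ValueError("effect_size must lie in (0, 1]")
    beta = 1.0 - power
    bound = (sqrt(log(2 / alpha)) + sqrt(log(2 / beta))) ** 2 / (2 * effect_size ** 2)
    if two_sample:
        # Effective size is n_each / 2 for equal groups, so double the bound.
        bound *= 2
    return ceil(bound)


if __name__ == "__main__":
    print(ks_n(0.1))                   # one-sample
    print(ks_n(0.1, two_sample=True))  # per group
```

Since DKW is a worst-case inequality, sample sizes from this bound are conservative: they guarantee at least the requested power rather than hitting it exactly.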

Project details

Release history

0.1.1 (this version), Jun 12, 2026

0.1.0, Jun 10, 2026

