zedstat

Statistics tools for ML models and deployment


Author: ZeD@UChicago <zed.uchicago.edu>
Description: Tools for ML statistics
Documentation: https://zeroknowledgediscovery.github.io/zedstat/
Example: https://github.com/zeroknowledgediscovery/zedstat/blob/master/examples/example1.ipynb

Additional usage examples

1. Export operating characteristics at a chosen prevalence

import pandas as pd
from zedstat import zedstat

roc_df = pd.read_csv("roc.csv")

zt = zedstat.processRoc(
    df=roc_df,
    order=3,
    total_samples=100000,
    positive_samples=100,
    alpha=0.01,
    prevalence=0.002,
)

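zt.smooth(STEP=0.001)             # regularize the empirical ROC curve
zt.allmeasures(interpolate=True)  # PPV, NPV, accuracy, likelihood ratios
zt.usample(precision=3)           # resample on a uniform false positive rate grid
zt.getBounds()                    # pointwise confidence bounds

# Join the point estimates with the upper ("U") and lower ("L") bounds.
out = (
    zt.get()
    .join(zt.df_lim["U"], rsuffix="_upper")
    .join(zt.df_lim["L"], rsuffix="_lower")
)
out.to_csv("roc_operating_characteristics.csv")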

2. Retrieve threshold-level PPV from a score

This uses the ROC-derived operating characteristics and prevalence to estimate the positive predictive value associated with using a score as a decision threshold.

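from IPython.display import display  # already available inside Jupyter notebooks

# zt is the processRoc object built in example 1.
example_scores = [0.10, 0.20, 0.30]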

ppv_at_threshold = zt.score_to_threshold_ppv(
    example_scores,
    regen=True,
    STEP=0.001,
    precision=3,
    interpolate=True,
    convexify=False,
)

threshold_ppv_df = pd.DataFrame({
    "score": example_scores,
    "threshold_ppv": ppv_at_threshold,
})

display(threshold_ppv_df)

3. Held-out calibration with isotonic regression

The calibration module is separate from the ROC processing utilities and is imported as:

from zedstat import calibration

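# df: a pandas DataFrame with one row per subject, holding the score and
# label columns named below.
res = calibration.heldout_isotonic_calibration_with_bootstrap(
    df,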
    score_col="predicted_risk",
    label_col="target",
    test_size=0.25,
    random_state=4,
    lower_score_is_risk=False,
    target_prevalence=None,
    n_bins=100,
    n_boot=1000,
    calibration_df_path="calibration_df_SISA.csv",
    plot="calibration_SISA.pdf",
)

print(res["summary"])
display(res["calibration_table"])

4. Convert raw scores to calibrated probabilities

The held-out calibration routine returns the fitted isotonic regression model in res["iso_model"]. This can be applied to any score vector.

import numpy as np
import pandas as pd

example_scores = [0.10, 0.20, 0.30]
example_scores_arr = np.asarray(example_scores, dtype=float)

calibrated_probs = res["iso_model"].predict(example_scores_arr)

calibrated_df = pd.DataFrame({
    "score": example_scores,
    "calibrated_probability": np.asarray(calibrated_probs, dtype=float),
})

display(calibrated_df)

5. Format calibration summary for a manuscript table

def format_calibration_summary_df(summary):
    import numpy as np
    import pandas as pd

    summary = pd.Series(summary)

    ci_map = {
        "auc_raw_test": ("auc_raw_ci_low", "auc_raw_ci_high"),
        "auc_calibrated_test": ("auc_calibrated_ci_low", "auc_calibrated_ci_high"),
        "brier_raw_test": ("brier_raw_ci_low", "brier_raw_ci_high"),
        "brier_calibrated_test": ("brier_calibrated_ci_low", "brier_calibrated_ci_high"),
        "calibration_intercept_test": ("calibration_intercept_ci_low", "calibration_intercept_ci_high"),
        "calibration_slope_test": ("calibration_slope_ci_low", "calibration_slope_ci_high"),
    }

    # CI bounds are folded into their parent rows, so they are not listed on their own.
    skip_keys = {k for pair in ci_map.values() for k in pair}

    rows = []
    for key, val in summary.items():
        if key in skip_keys:
            continue

        value_str = "" if pd.isna(val) else f"{float(val):.3f}"

        if key in ci_map:
            lo_key, hi_key = ci_map[key]
            lo = summary.get(lo_key, np.nan)
            hi = summary.get(hi_key, np.nan)
            if pd.notna(lo) and pd.notna(hi):
                value_str = f"{float(val):.3f} ({float(lo):.3f}, {float(hi):.3f})"

        rows.append({"variable": str(key), "value": value_str})

    return pd.DataFrame(rows, columns=["variable", "value"])

summary_df = format_calibration_summary_df(res["summary"])
display(summary_df)

Feature explanations

processRoc

processRoc is the main ROC post-processing class. It takes an empirical ROC curve, given as pairs of false positive rate and true positive rate, augments it with operating metrics, and supports interpolation, confidence bounds, and interpretation at chosen operating points.

smooth(STEP, interpolate, convexify)

This function regularizes the empirical ROC curve. If convexify=True, the upper ROC hull is computed so that dominated operating points are removed. If interpolate=True, the curve is resampled on a uniform false positive rate grid.

Let the ROC curve be represented as points

\begin{equation*} \{(f_i, t_i)\}_{i=1}^m \end{equation*}

where \(f_i\) is the false positive rate and \(t_i\) is the true positive rate. After optional convexification and interpolation, these are resampled onto a uniform grid in \(f\).
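The two steps can be sketched in plain Python. This is an illustration of the idea, not zedstat's implementation: the function names `upper_hull` and `resample` are hypothetical, the endpoints (0, 0) and (1, 1) are added as an assumption, and interpolation is taken to be linear.

```python
def upper_hull(points):
    """Upper convex hull of ROC points (fpr, tpr), scanning left to right."""
    # Assumption: the trivial endpoints always belong to the curve.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the chord
        # from the point before it to the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def resample(hull, step):
    """Linearly interpolate tpr on a uniform fpr grid with spacing `step`."""
    n = round(1 / step)
    out, j = [], 0
    for i in range(n + 1):
        f = i / n
        # Advance to the hull segment that contains f.
        while j + 1 < len(hull) - 1 and hull[j + 1][0] <= f:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        t = y2 if x2 == x1 else y1 + (y2 - y1) * (f - x1) / (x2 - x1)
        out.append((f, t))
    return out


# The point (0.5, 0.2) is dominated and disappears from the hull.
hull = upper_hull([(0.5, 0.2), (0.2, 0.6), (0.6, 0.9)])
curve = resample(hull, step=0.1)
print(hull)
```

Convexifying first and interpolating second means the resampled curve never dips below a chord between two achievable operating points.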

allmeasures(prevalence)

This computes threshold-level operating measures from sensitivity, specificity, and prevalence.

Let

\begin{equation*} \mathrm{TPR}(t) = P(\hat Y_t = 1 \mid Y=1), \qquad \mathrm{FPR}(t) = P(\hat Y_t = 1 \mid Y=0), \end{equation*}

and let prevalence be

\begin{equation*} \pi = P(Y=1). \end{equation*}

Then:

\begin{equation*} \mathrm{Specificity}(t) = 1 - \mathrm{FPR}(t) \end{equation*}

\begin{equation*} \mathrm{PPV}(t) = \frac{\mathrm{TPR}(t)\pi} {\mathrm{TPR}(t)\pi + \mathrm{FPR}(t)(1-\pi)} \end{equation*}

\begin{equation*} \mathrm{NPV}(t) = \frac{(1-\mathrm{FPR}(t))(1-\pi)} {(1-\mathrm{FPR}(t))(1-\pi) + (1-\mathrm{TPR}(t))\pi} \end{equation*}

\begin{equation*} \mathrm{Accuracy}(t) = \pi \mathrm{TPR}(t) + (1-\pi)(1-\mathrm{FPR}(t)) \end{equation*}

\begin{equation*} \mathrm{LR}^+(t) = \frac{\mathrm{TPR}(t)}{\mathrm{FPR}(t)} \end{equation*}

\begin{equation*} \mathrm{LR}^-(t) = \frac{1-\mathrm{TPR}(t)}{1-\mathrm{FPR}(t)} \end{equation*}

These are threshold-level decision quantities. They describe the performance of classifying everyone whose score crosses the threshold.
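As a worked check of the formulas above, the following sketch evaluates them at a single ROC point. The function name `operating_measures` is hypothetical, and returning NaN for undefined ratios is a choice made for this sketch rather than documented zedstat behavior.

```python
def operating_measures(tpr, fpr, prevalence):
    """Threshold-level measures from one ROC point and a prevalence."""
    spec = 1.0 - fpr
    flagged = tpr * prevalence + fpr * (1.0 - prevalence)           # P(test positive)
    cleared = spec * (1.0 - prevalence) + (1.0 - tpr) * prevalence  # P(test negative)
    nan = float("nan")
    return {
        "specificity": spec,
        "ppv": tpr * prevalence / flagged if flagged > 0 else nan,
        "npv": spec * (1.0 - prevalence) / cleared if cleared > 0 else nan,
        "accuracy": prevalence * tpr + (1.0 - prevalence) * spec,
        "lr_plus": tpr / fpr if fpr > 0 else nan,
        "lr_minus": (1.0 - tpr) / spec if spec > 0 else nan,
    }


# Same prevalence as the usage example: a rare outcome.
m = operating_measures(tpr=0.8, fpr=0.1, prevalence=0.002)
print(m)
```

At low prevalence the PPV stays small even for a good ROC point, because the false positives drawn from the large negative class dominate the flagged group.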

usample(precision)

This resamples the metric tables on a uniform false positive rate grid, typically for downstream lookup and plotting. The grid spacing is controlled by the decimal precision.

getBounds(total_samples, positive_samples, alpha)

This computes pointwise confidence bounds for the operating measures using Wilson intervals for sensitivity and specificity, then propagates these to PPV, NPV, accuracy, and likelihood ratios.

If \(n_1\) is the number of positive cases and \(n_0\) is the number of negative cases, then Wilson intervals are first computed for

\begin{equation*} \mathrm{TPR}(t) \quad \text{using } n_1 \end{equation*}

and for

\begin{equation*} \mathrm{Specificity}(t) = 1-\mathrm{FPR}(t) \quad \text{using } n_0. \end{equation*}

The derived measures are then bounded by substituting lower and upper values of sensitivity and specificity into the formulas above.
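A minimal sketch of this procedure for PPV is given below. The Wilson formula is standard; the helper names are hypothetical, and pairing the lower (upper) sensitivity bound with the lower (upper) specificity bound is an assumption about how the substitution is done, justified by PPV being increasing in both.

```python
import math
from statistics import NormalDist


def wilson(p_hat, n, alpha):
    """Wilson score interval for a proportion p_hat observed on n cases."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def ppv(tpr, spec, prevalence):
    """PPV from sensitivity, specificity, and prevalence."""
    return tpr * prevalence / (tpr * prevalence + (1.0 - spec) * (1.0 - prevalence))


def ppv_bounds(tpr, fpr, n_pos, n_neg, prevalence, alpha):
    """Substitute Wilson bounds on sensitivity and specificity into PPV."""
    tpr_lo, tpr_hi = wilson(tpr, n_pos, alpha)
    spec_lo, spec_hi = wilson(1.0 - fpr, n_neg, alpha)
    # PPV increases with both sensitivity and specificity.
    return ppv(tpr_lo, spec_lo, prevalence), ppv(tpr_hi, spec_hi, prevalence)


# Sample sizes taken from the usage example: 100 positives out of 100000.
lo, hi = ppv_bounds(tpr=0.8, fpr=0.1, n_pos=100, n_neg=99900,
                    prevalence=0.002, alpha=0.01)
print(lo, hi)
```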

auc()

The area under the ROC curve is

\begin{equation*} \mathrm{AUC} = \int_0^1 \mathrm{TPR}(f)\, df \end{equation*}

where \(f\) is false positive rate. Numerically this is computed by trapezoidal integration over the processed ROC curve. Confidence intervals can be estimated either analytically or by bootstrap when raw scores and labels are available.
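The trapezoidal rule on a list of ROC points can be written in a few lines. The function name `trapezoid_auc` is hypothetical, and the points are assumed to already include the endpoints (0, 0) and (1, 1).

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve given as (fpr, tpr) pairs."""
    pts = sorted(points)
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    )


print(trapezoid_auc([(0.0, 0.0), (1.0, 1.0)]))  # chance-level diagonal
print(trapezoid_auc([(0.0, 0.0), (0.2, 0.6), (0.6, 0.9), (1.0, 1.0)]))
```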

operating_zone(LRplus, LRminus)

This identifies practically useful threshold regions subject to likelihood-ratio constraints, for example high precision or high sensitivity operating points. Internally, the method searches the set of thresholds satisfying

\begin{equation*} \mathrm{LR}^+(t) > c_1, \qquad \mathrm{LR}^-(t) < c_2 \end{equation*}

for user-specified constants \(c_1\) and \(c_2\).
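The constraint set can be illustrated by filtering a list of ROC points. This sketch only shows the filtering step; the function name `lr_feasible` is hypothetical, and picking the largest LR+ and the largest TPR as the two representative points is an assumption, not a description of how zedstat selects them.

```python
def lr_feasible(points, c1, c2):
    """Return ROC points (fpr, tpr) with LR+ > c1 and LR- < c2."""
    keep = []
    for fpr, tpr in points:
        if not 0.0 < fpr < 1.0:
            continue  # both ratios need a nonzero denominator
        lr_plus = tpr / fpr
        lr_minus = (1.0 - tpr) / (1.0 - fpr)
        if lr_plus > c1 and lr_minus < c2:
            keep.append((fpr, tpr, lr_plus, lr_minus))
    return keep


roc = [(0.01, 0.30), (0.05, 0.60), (0.20, 0.85), (0.50, 0.95)]
zone = lr_feasible(roc, c1=4.0, c2=0.65)

high_precision = max(zone, key=lambda r: r[2])    # largest LR+
high_sensitivity = max(zone, key=lambda r: r[1])  # largest TPR
print(high_precision, high_sensitivity)
```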

interpret(fpr, number_of_positives, five_yr_survival, factor)

This converts operating characteristics into expected counts for an interpretable hypothetical population.

If \(P\) is the chosen number of positive cases in the target population and prevalence is \(\pi\), then the implied number of negatives is

\begin{equation*} N = P \frac{1-\pi}{\pi}. \end{equation*}

Given sensitivity and PPV at the selected operating point, the method estimates true positives, false positives, false negatives, total flags, and number needed to screen.
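The count arithmetic can be sketched as follows. The function name `expected_counts` and its dictionary keys are hypothetical, and the sketch works from sensitivity and false positive rate directly. It does not attempt the number-needed-to-screen figure, whose exact definition in zedstat is not given here.

```python
def expected_counts(tpr, fpr, prevalence, number_of_positives):
    """Expected confusion counts for a population with a fixed number of positives."""
    negatives = number_of_positives * (1.0 - prevalence) / prevalence
    tp = tpr * number_of_positives
    fn = number_of_positives - tp
    fp = fpr * negatives
    flags = tp + fp
    return {
        "negatives": negatives,
        "true_positives": tp,
        "false_negatives": fn,
        "false_positives": fp,
        "total_flags": flags,
        "ppv": tp / flags if flags > 0 else float("nan"),
    }


counts = expected_counts(tpr=0.6, fpr=0.05, prevalence=0.002, number_of_positives=10)
print(counts)
```

Fixing the number of positives rather than the population size keeps the output easy to read aloud: "of 10 people with the condition, about 6 are flagged, alongside roughly 250 false alarms".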

Calibration

The calibration utilities are provided in zedstat.calibration.

Held-out isotonic calibration

Given a score \(S\) and binary label \(Y \in \{0,1\}\), the calibration routine splits the data into train and test subsets. On the training subset, isotonic regression fits a monotone mapping

\begin{equation*} g(s) \approx P(Y=1 \mid S=s). \end{equation*}

The fitted function \(g\) is then applied to the held-out test scores to obtain calibrated probabilities.
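To make the fitting step concrete, here is a small pool-adjacent-violators sketch in plain Python. It is not the zedstat routine: the names `fit_isotonic` and `predict_isotonic` are hypothetical, the toy data are made up, and the fitted map is a step function, whereas libraries such as scikit-learn interpolate linearly between fitted points.

```python
import random


def fit_isotonic(scores, labels):
    """Monotone nondecreasing fit of labels on scores (pool adjacent violators)."""
    # Aggregate tied scores so each distinct score is one weighted point.
    totals = {}
    for s, y in zip(scores, labels):
        sy, c = totals.get(s, (0.0, 0))
        totals[s] = (sy + y, c + 1)
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s in sorted(totals):
        sy, c = totals[s]
        blocks.append([sy, c, s])
        # Merge backwards while the previous block mean is not below the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]:
            sy, c, top = blocks.pop()
            blocks[-1][0] += sy
            blocks[-1][1] += c
            blocks[-1][2] = top
    return [(b[2], b[0] / b[1]) for b in blocks]


def predict_isotonic(model, s):
    """Value of the first block whose upper score is >= s (clamped at the top)."""
    for top, value in model:
        if s <= top:
            return value
    return model[-1][1]


# Toy data, split 75/25 into train and held-out test subsets.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
idx = list(range(len(scores)))
random.Random(4).shuffle(idx)
cut = int(0.75 * len(idx))
train, test = idx[:cut], idx[cut:]

model = fit_isotonic([scores[i] for i in train], [labels[i] for i in train])
calibrated = [predict_isotonic(model, scores[i]) for i in test]
print(model, calibrated)
```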

Calibrated probability

The calibrated probability at score \(s\) is the local conditional event probability

\begin{equation*} m(s) = P(Y=1 \mid S=s). \end{equation*}

For continuous scores this can be interpreted as

\begin{equation*} m(s) = \frac{f_{S,Y=1}(s)}{f_S(s)} = \frac{\pi f_{S \mid Y=1}(s)}{f_S(s)}. \end{equation*}

This is a local score-level quantity.

Threshold PPV versus calibrated probability

These are different objects.

Threshold PPV at threshold \(t\) is

\begin{equation*} \mathrm{PPV}(t) = P(Y=1 \mid S \ge t) \end{equation*}

when higher scores indicate higher risk. This is a tail-average quantity:

\begin{equation*} \mathrm{PPV}(t) = \frac{\int_t^\infty m(s) f_S(s)\, ds} {\int_t^\infty f_S(s)\, ds} = E[m(S)\mid S\ge t]. \end{equation*}

Thus, calibrated probability is local, while threshold PPV is cumulative over all subjects beyond the threshold.
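A small numerical illustration may help. The discrete score distribution below is entirely hypothetical; the sums replace the integrals in the tail-average formula.

```python
# Hypothetical distribution: (score, P(S = score), m(score) = P(Y=1 | S = score))
dist = [
    (0.1, 0.70, 0.01),
    (0.2, 0.20, 0.05),
    (0.3, 0.08, 0.20),
    (0.4, 0.02, 0.60),
]


def calibrated_probability(dist, s):
    """Local quantity m(s) at one score value."""
    return next(m for score, _, m in dist if score == s)


def threshold_ppv(dist, t):
    """Tail average E[m(S) | S >= t]."""
    tail = [(w, m) for score, w, m in dist if score >= t]
    mass = sum(w for w, _ in tail)
    return sum(w * m for w, m in tail) / mass


print(calibrated_probability(dist, 0.3))  # local value at s = 0.3
print(threshold_ppv(dist, 0.3))           # average over everyone with S >= 0.3
```

Because \(m\) is nondecreasing here, the tail average at a threshold is at least as large as the local value at that score: subjects with higher scores pull the average up.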

Brier score

The Brier score evaluates probability accuracy:

\begin{equation*} \mathrm{Brier} = \frac{1}{n}\sum_{i=1}^n (p_i - y_i)^2 \end{equation*}

where \(p_i\) is the predicted probability and \(y_i\) is the observed binary outcome. Lower is better, with 0 indicating perfect probabilistic prediction.
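The definition translates directly into code. The function name `brier_score` is hypothetical and the inputs are made-up values.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
```

A constant prediction of 0.5 scores 0.25 regardless of the outcomes, which gives a rough yardstick for an uninformative model on balanced data.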

Calibration intercept and slope

A well-calibrated model should satisfy

\begin{equation*} P(Y=1 \mid \hat p = p) = p. \end{equation*}

A practical assessment regresses the observed outcome on the logit of the predicted probability:

\begin{equation*} \operatorname{logit} P(Y=1 \mid \hat p) = a + b \operatorname{logit}(\hat p). \end{equation*}

Here:

\(a\) is the calibration intercept. Ideal value is 0.
\(b\) is the calibration slope. Ideal value is 1.

An intercept above 0 indicates underprediction on average. An intercept below 0 indicates overprediction on average. A slope below 1 indicates overly extreme predictions; a slope above 1 indicates predictions compressed toward the center.
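The regression can be sketched with a two-parameter Newton iteration in plain Python. This is an illustrative fit, not zedstat's estimator: the name `calibration_line`, the optional `weights` argument, and the fixed iteration count are assumptions, predictions must lie strictly between 0 and 1, and perfectly separated data are not handled.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibration_line(probs, labels, weights=None, iters=25):
    """Fit logit P(Y=1) = a + b * logit(p_hat) by Newton's method; returns (a, b)."""
    x = [logit(p) for p in probs]
    w = weights or [1.0] * len(x)
    a, b = 0.0, 1.0  # start at the ideal calibration line
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, labels, w):
            mu = expit(a + b * xi)
            r = wi * (yi - mu)         # score (gradient) contribution
            v = wi * mu * (1.0 - mu)   # information (Hessian) contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
print(calibration_line(probs, labels))
```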

Calibration table and reliability diagram

Predicted probabilities are grouped into bins, and for each bin the observed event rate is estimated. If a bin contains \(k\) events among \(n\) subjects, the observed rate is

\begin{equation*} \hat p_{\mathrm{obs}} = \frac{k}{n}. \end{equation*}

Wilson confidence intervals are computed for each bin and displayed as vertical error bars in the reliability diagram.
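A calibration table of this kind can be sketched as below. Equal-width bins on [0, 1] are an assumption, since the binning scheme is not specified here, and the function name `reliability_table` and its row keys are hypothetical.

```python
import math
from statistics import NormalDist


def reliability_table(probs, labels, n_bins=10, alpha=0.05):
    """Per-bin observed event rate with a Wilson interval (equal-width bins)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bins = [[0, 0, 0.0] for _ in range(n_bins)]  # events, subjects, sum of predictions
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[b][0] += y
        bins[b][1] += 1
        bins[b][2] += p
    rows = []
    for k, n, total in bins:
        if n == 0:
            continue  # empty bins are omitted
        rate = k / n
        denom = 1.0 + z * z / n
        center = (rate + z * z / (2.0 * n)) / denom
        half = z * math.sqrt(rate * (1.0 - rate) / n + z * z / (4.0 * n * n)) / denom
        rows.append({
            "mean_predicted": total / n,
            "observed": rate,
            "n": n,
            "ci_low": max(0.0, center - half),
            "ci_high": min(1.0, center + half),
        })
    return rows


table = reliability_table([0.05, 0.08, 0.55, 0.52, 0.58, 0.95], [0, 0, 1, 0, 1, 1])
for row in table:
    print(row)
```

Plotting `observed` against `mean_predicted`, with `ci_low` and `ci_high` as error bars, gives the reliability diagram.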

Sample size planning

The sample size utilities in zedstat use AUC-based approximations. Let the target AUC be \(A\). Define

\begin{equation*} Q_1 = \frac{A}{2-A}, \qquad Q_2 = \frac{2A^2}{1+A}. \end{equation*}

In the balanced-design approximation, the required sample size per class for resolving an AUC tolerance \(\delta\) at confidence level \(1-\alpha\) is

\begin{equation*} n \approx \frac{z_{1-\alpha/2}^2 \, c}{\delta^2}, \end{equation*}

where

\begin{equation*} c = A(1-A) - A^2 + Q_1 + Q_2. \end{equation*}

When prevalence is specified, the code uses a prevalence-aware total sample size formula derived from the Hanley-McNeil variance approximation.
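The balanced-design approximation can be evaluated directly. The sketch transcribes the expressions for \(Q_1\), \(Q_2\), and \(c\) exactly as written in this section; the function name `samples_per_class` is hypothetical, rounding up to an integer is an added convention, and the prevalence-aware formula is not reproduced.

```python
import math
from statistics import NormalDist


def samples_per_class(target_auc, delta, alpha):
    """Balanced-design sample size per class for AUC tolerance delta."""
    A = target_auc
    q1 = A / (2.0 - A)
    q2 = 2.0 * A * A / (1.0 + A)
    c = A * (1.0 - A) - A * A + q1 + q2  # as defined in the text above
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil(z * z * c / delta ** 2)


print(samples_per_class(target_auc=0.8, delta=0.05, alpha=0.05))
```

Halving the tolerance \(\delta\) roughly quadruples the required sample size, since \(n\) scales as \(1/\delta^2\).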

Remarks

zedstat separates two different but complementary notions of risk:

threshold-level decision utility, such as PPV, NPV, and likelihood ratios, derived from the ROC curve and prevalence;
score-level probability interpretation, obtained through calibration.

The first is useful for screening policy and operating-point selection. The second is useful when the score must be interpreted as an individual event probability.


This version: 0.0.153 (May 30, 2026)
