Confidence Intervals in Python
The long-missing Python library for confidence intervals
pip install confidenceinterval
This package computes common machine learning metrics, such as F1, and returns their confidence intervals.
⭐ Very easy to use, with the standard scikit-learn naming convention and interface:
e.g. roc_auc_score(y_true, y_pred).
⭐ Support for many metrics, with modern confidence interval methods.
⭐ Support for both analytical computation of the confidence intervals, and bootstrapping.
⭐ Easy-to-use interface to compute bootstrap confidence intervals for new metrics that don't appear here.
Getting started
from confidenceinterval import roc_auc_score
# Analytical CI (the default)
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95)
# Bootstrap CI (bias-corrected and accelerated) instead
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca')
By default, all the methods return an analytical computation of the confidence interval (CI). For a bootstrap computation of the CI, any of the methods below also accept method='bootstrap_bca', method='bootstrap_percentile', or method='bootstrap_basic'.
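For example, with y_true and y_pred as above, the other bootstrap variants are selected the same way:

# Percentile bootstrap
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_percentile')
# Basic (reverse percentile) bootstrap
auc, ci = roc_auc_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_basic')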
Supported methods
Get a confidence interval for any external metric
With the bootstrap_ci method, you can get the CI for any external metric. As an example, let's get the CI for the balanced accuracy metric. It isn't implemented in this package yet, but we can easily get its CI:
import numpy as np
import sklearn.metrics
from confidenceinterval.bootstrap import bootstrap_ci

# You can specify a random generator for reproducibility, or pass None
random_generator = np.random.default_rng()
bootstrap_ci(y_true=y_true,
             y_pred=y_pred,
             metric=sklearn.metrics.balanced_accuracy_score,
             confidence_level=0.95,
             n_resamples=9999,
             method='bootstrap_bca',
             random_state=random_generator)
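The same pattern works for any callable with a (y_true, y_pred) signature. As a sketch, here it is with scikit-learn's Matthews correlation coefficient, assuming bootstrap_ci returns the metric value and CI pair like the built-in metrics do:

# Assumption: bootstrap_ci returns (value, ci), matching the built-in metrics
mcc, ci = bootstrap_ci(y_true=y_true,
                       y_pred=y_pred,
                       metric=sklearn.metrics.matthews_corrcoef,
                       confidence_level=0.95,
                       n_resamples=9999,
                       method='bootstrap_bca',
                       random_state=random_generator)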
F1, Precision, Recall (with Macro and Micro averaging)
from confidenceinterval import precision_score, recall_score, f1_score
These methods also accept average='micro' or average='macro'.
The analytical computation here follows the (amazing) 2022 paper by Takahashi et al. (reference below).
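A minimal sketch, assuming these scores mirror the roc_auc_score interface shown above:

from confidenceinterval import f1_score

# Macro-averaged F1 with the analytical 95% CI (Takahashi et al.)
f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='macro')
# Micro-averaged F1 with a bootstrap CI instead
f1, ci = f1_score(y_true, y_pred, confidence_level=0.95,
                  average='micro', method='bootstrap_bca')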
ROC AUC
from confidenceinterval import roc_auc_score
The analytical computation here is a fast implementation of the DeLong method.
Binary metrics
from confidenceinterval import (accuracy_score, ppv_score, npv_score,
                                tpr_score, fpr_score, tnr_score)
For these methods, the confidence interval is estimated by treating the ratio as a binomial proportion; see the Wikipedia page on binomial proportion confidence intervals.
By default method='wilson', the Wilson interval, which behaves better for small samples.
method can be one of ['wilson', 'normal', 'agresti_coull', 'beta', 'jeffreys', 'binom_test'], or one of the bootstrap methods.
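A short sketch of the interface, assuming the same return convention as the metrics above:

from confidenceinterval import ppv_score, tpr_score

# Wilson interval (the default) for positive predictive value
ppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95)
# Any of the other binomial methods can be requested by name, e.g. Jeffreys
tpr, ci = tpr_score(y_true, y_pred, confidence_level=0.95, method='jeffreys')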
References
The binomial confidence interval computation uses the statsmodels package: https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportion_confint.html
Yandex Data School implementation of the fast DeLong method: https://github.com/yandexdataschool/roc_comparison
X. Sun and W. Xu, "Fast Implementation of DeLong's Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves," IEEE Signal Processing Letters, vol. 21, no. 11, pp. 1389-1393, Nov. 2014, doi: 10.1109/LSP.2014.2337313. https://ieeexplore.ieee.org/document/6851192
Kanae Takahashi, Kouji Yamamoto, Aya Kuchiba, and Tatsuki Koyama, "Confidence interval for micro-averaged F1 and macro-averaged F1 scores". https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8936911/#APP2
B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC, Boca Raton, FL, USA (1993)
Nathaniel E. Helwig, “Bootstrap Confidence Intervals”, http://users.stat.umn.edu/~helwig/notes/bootci-Notes.pdf
Bootstrapping (statistics), Wikipedia, https://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29