MLstatkit

MLstatkit is a Python library that integrates established statistical methods into machine learning projects. It includes tools such as DeLong's test for comparing AUCs and bootstrapping for calculating confidence intervals. Its modular design gives researchers and data scientists a flexible toolkit for statistical testing during model analysis and evaluation.

Installation

Install MLstatkit directly from PyPI using pip:

pip install MLstatkit

Usage

DeLong's Test

The Delong_test function statistically compares the AUCs of two models evaluated on the same samples, indicating whether their performance differs.

Parameters:

true : array-like of shape (n_samples,)
True binary labels in range {0, 1}.
prob_A : array-like of shape (n_samples,)
Predicted probabilities by the first model.
prob_B : array-like of shape (n_samples,)
Predicted probabilities by the second model.

Returns:

z_score : float
The z score from comparing the AUCs of two models.
p_value : float
The p value from comparing the AUCs of two models.

Example:

import numpy as np
from MLstatkit.stats import Delong_test

# Example data
true = np.array([0, 1, 0, 1])
prob_A = np.array([0.1, 0.4, 0.35, 0.8])
prob_B = np.array([0.2, 0.3, 0.4, 0.7])

# Perform DeLong's test
z_score, p_value = Delong_test(true, prob_A, prob_B)

print(f"Z-Score: {z_score}, P-Value: {p_value}")

This demonstrates the usage of Delong_test to statistically compare the AUCs of two models based on their predictions and the ground truth labels. The returned z-score and p-value help in understanding if the difference in model performances is statistically significant.
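
To make the statistic concrete, the following is a minimal, standard-library sketch of the classic O(m*n) DeLong computation. It is not MLstatkit's code: the library cites the faster Sun and Xu algorithm, and the name delong_sketch, the sign convention of z (first model minus second), and the two-sided p-value are assumptions made for illustration.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 on ties, 0 otherwise."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _covariance(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_sketch(true, prob_a, prob_b):
    """Hypothetical helper: returns (z, two-sided p) for AUC_A - AUC_B."""
    pos = [i for i, t in enumerate(true) if t == 1]
    neg = [i for i, t in enumerate(true) if t == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per class")

    aucs, v10s, v01s = [], [], []
    for prob in (prob_a, prob_b):
        # Structural components: per-positive and per-negative win rates.
        v10 = [sum(_psi(prob[i], prob[j]) for j in neg) / n for i in pos]
        v01 = [sum(_psi(prob[i], prob[j]) for i in pos) / m for j in neg]
        aucs.append(sum(v10) / m)
        v10s.append(v10)
        v01s.append(v01)

    def diff_variance(v):
        # Var(A) + Var(B) - 2 Cov(A, B) for one set of components.
        return (_covariance(v[0], v[0]) + _covariance(v[1], v[1])
                - 2.0 * _covariance(v[0], v[1]))

    variance = diff_variance(v10s) / m + diff_variance(v01s) / n
    if variance <= 0:
        raise ValueError("zero variance: the two AUCs cannot be compared")

    z = (aucs[0] - aucs[1]) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Same toy data as the example above, as plain lists.
z, p = delong_sketch([0, 1, 0, 1], [0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.4, 0.7])
print(f"Z-Score: {z:.3f}, P-Value: {p:.3f}")
```

On this four-sample toy data the sketch gives a z-score of about 0.707 and a p-value of about 0.480, so the AUC difference (1.00 versus 0.75) is far from significant. Delong_test may use a different sign convention or numerical details.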

Bootstrapping for Confidence Intervals

The Bootstrapping function calculates a confidence interval for a chosen performance metric by bootstrapping, giving a measure of how reliable the estimate is. It supports the AUROC, AUPRC, and F1 score metrics.

Parameters:

true : array-like of shape (n_samples,)
True binary labels in range {0, 1}.
prob : array-like of shape (n_samples,)
Predicted probabilities or binary predictions depending on the score function.
score_func_str : str
Scoring function identifier: 'auroc', 'auprc', or 'f1'.
n_bootstraps : int, optional
Number of bootstrapping samples to use (default is 1000).
confidence_level : float, optional
The confidence interval level (e.g., 0.95 for 95% confidence interval, default is 0.95).
threshold : float, optional
Threshold to convert probabilities to binary labels for 'f1' scoring function (default is 0.5).
average : str, optional
Averaging method, required for multiclass/multilabel targets (default is 'macro'). If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

Returns:

original_score : float
The original score calculated without bootstrapping.
confidence_lower : float
The lower bound of the confidence interval.
confidence_upper : float
The upper bound of the confidence interval.
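
As an illustration of the threshold parameter, the sketch below turns probabilities into binary predictions and computes a binary F1 score with the standard library only. The helper name f1_at_threshold and the use of >= at the cut-off are assumptions. The library's exact comparison rule is not documented here.

```python
def f1_at_threshold(true, prob, threshold=0.5):
    """Hypothetical helper: binary F1 after thresholding probabilities."""
    # Assumption: a probability equal to the threshold counts as positive.
    pred = [1 if p >= threshold else 0 for p in prob]
    tp = sum(1 for t, y in zip(true, pred) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(true, pred) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(true, pred) if t == 1 and y == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


y_true = [0, 1, 0, 0, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.3, 0.4, 0.7, 0.05]
f1 = f1_at_threshold(y_true, y_prob, threshold=0.5)
print(f"F1 at threshold 0.5: {f1:.3f}")
```

With a threshold of 0.5, only two samples are predicted positive (one correctly), so the F1 score is 1/3. Lowering the threshold trades precision for recall.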

Examples:

import numpy as np
from MLstatkit.stats import Bootstrapping

# Example data
y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.3, 0.4, 0.7, 0.05])

# Calculate confidence intervals for AUROC
original_score, confidence_lower, confidence_upper = Bootstrapping(y_true, y_prob, 'auroc')
print(f"AUROC: {original_score:.3f}, Confidence interval: [{confidence_lower:.3f} - {confidence_upper:.3f}]")

# Calculate confidence intervals for AUPRC
original_score, confidence_lower, confidence_upper = Bootstrapping(y_true, y_prob, 'auprc')
print(f"AUPRC: {original_score:.3f}, Confidence interval: [{confidence_lower:.3f} - {confidence_upper:.3f}]")

# Calculate confidence intervals for F1 score with a custom threshold
original_score, confidence_lower, confidence_upper = Bootstrapping(y_true, y_prob, 'f1', threshold=0.5)
print(f"F1 Score: {original_score:.3f}, Confidence interval: [{confidence_lower:.3f} - {confidence_upper:.3f}]")

# Calculate confidence intervals for AUROC, AUPRC, F1 score
for score in ['auroc', 'auprc', 'f1']:
    original_score, conf_lower, conf_upper = Bootstrapping(y_true, y_prob, score, threshold=0.5)
    print(f"{score.upper()} original score: {original_score:.3f}, confidence interval: [{conf_lower:.3f} - {conf_upper:.3f}]")
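
For readers who want to see what the resampling involves, here is a minimal percentile-bootstrap sketch using only the standard library. It is an illustration under stated assumptions, not the library's implementation: the names auroc and bootstrap_ci, the seed argument, the percentile method, and the choice to skip resamples containing a single class are all hypothetical.

```python
import random


def auroc(true, prob):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [p for t, p in zip(true, prob) if t == 1]
    neg = [p for t, p in zip(true, prob) if t == 0]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in pos for y in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(true, prob, score_func, n_bootstraps=1000,
                 confidence_level=0.95, seed=0):
    """Hypothetical helper: percentile bootstrap interval for score_func."""
    if len(set(true)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(true)
    scores = []
    while len(scores) < n_bootstraps:
        # Resample indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_true = [true[i] for i in idx]
        if len(set(sample_true)) < 2:
            continue  # assumption: skip resamples with only one class
        scores.append(score_func(sample_true, [prob[i] for i in idx]))
    scores.sort()
    alpha = 1.0 - confidence_level
    lower = scores[int((alpha / 2) * n_bootstraps)]
    upper = scores[min(n_bootstraps - 1, int((1 - alpha / 2) * n_bootstraps))]
    return score_func(true, prob), lower, upper


y_true = [0, 1, 0, 0, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.3, 0.4, 0.7, 0.05]
original, lower, upper = bootstrap_ci(y_true, y_prob, auroc)
print(f"AUROC: {original:.3f}, Confidence interval: [{lower:.3f} - {upper:.3f}]")
```

The point estimate on this data is 0.575. With only nine samples the interval is wide, which is exactly the uncertainty the bootstrap is meant to expose.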

References

DeLong's Test

The implementation of Delong_test in MLstatkit is based on the following publication:

Xu Sun and Weichao Xu, "Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves," in IEEE Signal Processing Letters, vol. 21, no. 11, pp. 1389-1393, 2014, IEEE.

Bootstrapping

The Bootstrapping method for calculating confidence intervals does not directly reference a single publication but is a widely accepted statistical technique for estimating the distribution of a metric by resampling with replacement. For a comprehensive overview of bootstrapping methods, see:

B. Efron and R. Tibshirani, "An Introduction to the Bootstrap," Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994.

These references provide the foundational methodologies behind the statistical tests and techniques implemented in MLstatkit, offering users insights into their theoretical underpinnings.

Contributing

We welcome contributions to MLstatkit! Please see our contribution guidelines for more details.

License

MLstatkit is distributed under the MIT License. For more information, see the LICENSE file in the GitHub repository.

Update log

0.1.2 Add Bootstrapping operation process progress display.
0.1.1 Update README.md, setup.py. Add CONTRIBUTING.md.
0.1.0 First edition
