MLstatkit

MLstatkit is a Python library that integrates established statistical methods into machine learning projects. It provides tools such as DeLong's test for comparing AUCs and bootstrapping for estimating confidence intervals. With its modular design, MLstatkit offers researchers and data scientists a flexible toolkit for augmenting their analyses and model evaluations across a broad range of statistical testing needs in machine learning.

Installation

Install MLstatkit directly from PyPI using pip:

pip install MLstatkit

Usage

DeLong's Test

The Delong_test function statistically compares the AUCs of two models evaluated on the same samples, indicating whether the observed difference in performance is significant.

Parameters:

  • true : array-like of shape (n_samples,)
    True binary labels in range {0, 1}.

  • prob_A : array-like of shape (n_samples,)
    Predicted probabilities by the first model.

  • prob_B : array-like of shape (n_samples,)
    Predicted probabilities by the second model.

Returns:

  • z_score : float
    The z score from comparing the AUCs of two models.

  • p_value : float
    The p value from comparing the AUCs of two models.

Example:

import numpy as np

from MLstatkit.stats import Delong_test

# Example data
true = np.array([0, 1, 0, 1])
prob_A = np.array([0.1, 0.4, 0.35, 0.8])
prob_B = np.array([0.2, 0.3, 0.4, 0.7])

# Perform DeLong's test
z_score, p_value = Delong_test(true, prob_A, prob_B)

print(f"Z-Score: {z_score}, P-Value: {p_value}")

This example uses Delong_test to compare the AUCs of two models given their predicted probabilities and the ground-truth labels. The returned z-score and p-value indicate whether the difference in model performance is statistically significant.
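
For intuition, DeLong's test works as follows: each AUC is the Mann-Whitney estimate over all positive-negative pairs, the covariance of the two estimates is obtained from DeLong's structural components, and the difference is normalized into a z-score whose two-sided normal tail probability is the p-value. The snippet below is a conceptual sketch of that calculation (the function and variable names are illustrative, not MLstatkit internals); MLstatkit itself implements the fast algorithm of Sun and Xu (2014).

import numpy as np
from scipy.stats import norm

def delong_z_sketch(y_true, prob_a, prob_b):
    """Conceptual O(m*n) DeLong z-score for two correlated AUCs (illustrative only)."""
    y_true = np.asarray(y_true)
    prob_a, prob_b = np.asarray(prob_a), np.asarray(prob_b)
    pos_a, neg_a = prob_a[y_true == 1], prob_a[y_true == 0]
    pos_b, neg_b = prob_b[y_true == 1], prob_b[y_true == 0]
    m, n = len(pos_a), len(neg_a)

    def psi(pos, neg):
        # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties
        diff = pos[:, None] - neg[None, :]
        return (diff > 0) + 0.5 * (diff == 0)

    k_a, k_b = psi(pos_a, neg_a), psi(pos_b, neg_b)
    auc_a, auc_b = k_a.mean(), k_b.mean()

    # Structural components: per-positive and per-negative averages of the kernel
    v10 = np.vstack([k_a.mean(axis=1), k_b.mean(axis=1)])  # shape (2, m)
    v01 = np.vstack([k_a.mean(axis=0), k_b.mean(axis=0)])  # shape (2, n)
    s10, s01 = np.cov(v10), np.cov(v01)

    # Variance of AUC_A - AUC_B, then two-sided normal p-value
    var = (s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m \
        + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n
    z = (auc_a - auc_b) / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))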

Bootstrapping for Confidence Intervals

The Bootstrapping function calculates confidence intervals for a chosen performance metric using bootstrap resampling, providing a measure of the estimate's reliability. It supports the AUROC, AUPRC, and F1 score metrics.

Parameters:

  • true : array-like of shape (n_samples,)
    True binary labels in range {0, 1}.
  • prob : array-like of shape (n_samples,)
    Predicted probabilities or binary predictions depending on the score function.
  • score_func_str : str
    Scoring function identifier: 'auroc', 'auprc', or 'f1'.
  • n_bootstraps : int, optional
    Number of bootstrapping samples to use (default is 1000).
  • confidence_level : float, optional
    The confidence interval level (e.g., 0.95 for 95% confidence interval, default is 0.95).
  • threshold : float, optional
    Threshold to convert probabilities to binary labels for 'f1' scoring function (default is 0.5).
  • average : str, optional
    Averaging strategy; required for multiclass/multilabel targets (default is 'macro'). If None, the scores for each class are returned; otherwise, this determines the type of averaging performed on the data.

Returns:

  • original_score : float
    The original score calculated without bootstrapping.
  • confidence_lower : float
    The lower bound of the confidence interval.
  • confidence_upper : float
    The upper bound of the confidence interval.

Examples:

import numpy as np

from MLstatkit.stats import Bootstrapping

# Example data
y_true = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.3, 0.4, 0.7, 0.05])

# Calculate confidence intervals for AUROC
original_score, confidence_lower, confidence_upper = Bootstrapping(y_true, y_prob, 'auroc')
print(f"AUROC: {original_score:.3f}, Confidence interval: [{confidence_lower:.3f} - {confidence_upper:.3f}]")

# Calculate confidence intervals for AUPRC
original_score, confidence_lower, confidence_upper = Bootstrapping(y_true, y_prob, 'auprc')
print(f"AUPRC: {original_score:.3f}, Confidence interval: [{confidence_lower:.3f} - {confidence_upper:.3f}]")

# Calculate confidence intervals for F1 score with a custom threshold
original_score, confidence_lower, confidence_upper = Bootstrapping(y_true, y_prob, 'f1', threshold=0.5)
print(f"F1 Score: {original_score:.3f}, Confidence interval: [{confidence_lower:.3f} - {confidence_upper:.3f}]")

# Calculate confidence intervals for AUROC, AUPRC, F1 score
for score in ['auroc', 'auprc', 'f1']:
    original_score, conf_lower, conf_upper = Bootstrapping(y_true, y_prob, score, threshold=0.5)
    print(f"{score.upper()} original score: {original_score:.3f}, confidence interval: [{conf_lower:.3f} - {conf_upper:.3f}]")
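
Conceptually, a percentile bootstrap of this kind resamples the evaluation set with replacement, recomputes the metric on each resample, and takes the lower and upper percentiles of the resulting score distribution as the confidence bounds. The sketch below illustrates the idea for AUROC; the function and parameter names are illustrative assumptions, not MLstatkit's internals.

import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc_sketch(y_true, y_prob, n_bootstraps=1000,
                           confidence_level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for AUROC (illustrative only)."""
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    original = roc_auc_score(y_true, y_prob)

    scores = []
    for _ in range(n_bootstraps):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUROC is undefined when a resample contains only one class
        scores.append(roc_auc_score(y_true[idx], y_prob[idx]))

    alpha = 1.0 - confidence_level
    lower = np.percentile(scores, 100 * (alpha / 2))
    upper = np.percentile(scores, 100 * (1 - alpha / 2))
    return original, lower, upper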

References

DeLong's Test

The implementation of Delong_test in MLstatkit is based on the following publication:

  • Xu Sun and Weichao Xu, "Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves," in IEEE Signal Processing Letters, vol. 21, no. 11, pp. 1389-1393, 2014, IEEE.

Bootstrapping

The bootstrapping method for calculating confidence intervals is not tied to a single publication; it is a widely used statistical technique that estimates the distribution of a metric by resampling with replacement. For a comprehensive overview of bootstrapping methods, see:

  • B. Efron and R. Tibshirani, "An Introduction to the Bootstrap," Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994.

These references provide the foundational methodologies behind the statistical tests and techniques implemented in MLstatkit, offering users insights into their theoretical underpinnings.

Contributing

We welcome contributions to MLstatkit! Please see our contribution guidelines for more details.

License

MLstatkit is distributed under the MIT License. For more information, see the LICENSE file in the GitHub repository.

Update log

  • 0.1.2 Added a progress display for the Bootstrapping operation.
  • 0.1.1 Updated README.md and setup.py; added CONTRIBUTING.md.
  • 0.1.0 Initial release.

