Library for statistical testing and comparison of algorithm results

Project description

Hi, StaTDS is a library for statistical testing and comparison of algorithm results 👋 (https://github.com/kdis-lab/StaTDS)

Statistical Tests for Data Science (StaTDS)

StaTDS is a library for mathematicians, scientists, and engineers. It includes various tools to facilitate statistical analysis given a set of data samples. Within this library, you will find a wide range of statistical tests to streamline the process when conducting comparative or sample studies.

Available statistical tests

Normality

Name                Function
Shapiro-Wilk        normality.shapiro_wilk_normality
D'Agostino-Pearson  normality.d_agostino_pearson
Kolmogorov-Smirnov  normality.kolmogorov_smirnov

Homoscedasticity

Name      Function
Levene    homoscedasticity.levene
Bartlett  homoscedasticity.bartlett

Parametric

Name                  Function                       Type of comparison
T Test paired         parametrics.t_test_paired      Paired
T Test unpaired       parametrics.t_test_unpaired    Paired
ANOVA between cases   parametrics.anova_cases        Multiple
ANOVA within cases    parametrics.anova_within_cases Multiple

Non-parametric

Name                       Function                              Type of comparison
Wilcoxon                   no_parametrics.wilcoxon               Paired
Binomial Sign              no_parametrics.binomial               Paired
Mann-Whitney U             no_parametrics.mannwhitneyu           Paired
Friedman                   no_parametrics.friedman               Multiple
Friedman + Iman-Davenport  no_parametrics.iman_davenport         Multiple
Friedman Aligned Ranks     no_parametrics.friedman_aligned_ranks Multiple
Quade                      no_parametrics.quade                  Multiple
Kruskal-Wallis             no_parametrics.kruskal_wallis         Multiple

Post-hoc

Name        Function
Nemenyi     no_parametrics.nemenyi
Bonferroni  no_parametrics.bonferroni
Li          no_parametrics.li
Holm        no_parametrics.holm
Holland     no_parametrics.holland
Finner      no_parametrics.finner
Hochberg    no_parametrics.hochberg
Hommel      no_parametrics.hommel
Rom         no_parametrics.rom
Shaffer     no_parametrics.shaffer

Developed in:

Python

Documentation

You can find the full documentation in the Documentation folder, in the Web Docs, or on the YouTube channel.

Installation

StaTDS can be installed in several ways: with pip, from the Git repository, or as a Docker container.

Using Git repository

Instructions for installing Git on each supported operating system are given in [1]. Git is available for Windows, Mac, and Linux, and it comes pre-installed on most Mac and Linux machines. Once Git is installed, clone the repository and build the package:

    $ git clone https://github.com/kdis-lab/StaTDS
    $ cd StaTDS
    $ python -m pip install --upgrade pip # To update pip
    $ python -m pip install --upgrade build # To update build
    $ python -m build
    $ pip install dist/statds-*-py3-none-any.whl # The version in the file name depends on the release you built

Using pip

Make sure that Python and pip are installed before proceeding. Then install the library with the command that matches the features you need. The quotes around the extras keep shells such as zsh from interpreting the square brackets:

  • If you only want to use the statistical tests:
    $ pip install statds

  • If you also want to generate PDFs:
    $ pip install "statds[pdf]"

  • If you want all the features:
    $ pip install "statds[full-app]"
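
After installing, it can be useful to confirm which version pip actually resolved. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of StaTDS.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("statds")
print(f"statds {found}" if found else "statds is not installed")
```

The same check works for any distribution name, so it can also be used to confirm that optional dependencies pulled in by the extras are present.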
    

Quick start

  • If you have questions, please ask them in GitHub Discussions.
  • If you want to report a bug, please open an issue on the GitHub repository.
  • If you want to see StaTDS in action, open the notebooks/ folder of the GitHub repository (https://github.com/kdis-lab/StaTDS), which contains a collection of interactive Jupyter notebooks.

Using StaTDS Library - API

Normality tests: Shapiro Test

import pandas as pd
from statds.normality import shapiro_wilk_normality

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
columns = list(dataset.columns)
results = []

# Skip the first column and test every remaining column for normality.
for column in columns[1:]:
    results.append(shapiro_wilk_normality(dataset[column].to_numpy(), alpha))

# Regroup the per-column result tuples into one sequence per field.
statistic_list, p_value_list, cv_value_list, hypothesis_list = zip(*results)

results_test = pd.DataFrame({"Algorithm": columns[1:], "Statistic": statistic_list, "p-value": p_value_list, "Results": hypothesis_list})
print(results_test)
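
The examples in this section all read a file named dataset.csv. Judging from the way the Shapiro-Wilk example skips `columns[0]`, the first column appears to identify each case and the remaining columns hold one result per algorithm; that layout is an assumption here, not a documented requirement. Under that assumption, a toy file for trying the examples can be sketched with the standard library alone (the function name, algorithm names, and value range are all hypothetical):

```python
import csv
import random


def write_toy_dataset(path, algorithms, num_rows=10, seed=0):
    """Write a CSV whose first column names the case and whose other columns hold scores."""
    rng = random.Random(seed)  # fixed seed keeps the toy data reproducible
    rows = [
        [f"dataset_{i}"] + [round(rng.uniform(0.6, 0.95), 4) for _ in algorithms]
        for i in range(1, num_rows + 1)
    ]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Dataset"] + list(algorithms))  # header row
        writer.writerows(rows)
    return rows


rows = write_toy_dataset("dataset.csv", ["AlgA", "AlgB", "AlgC"])
print(f"wrote {len(rows)} rows")
```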

Homoscedasticity tests: Levene

import pandas as pd
from statds.homoscedasticity import levene_test

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
# center='mean' measures each deviation from the group mean.
statistic, p_value, rejected_value, hypothesis = levene_test(dataset, alpha, center='mean')
print(hypothesis)
print(f"Statistic {statistic}, Rejected Value {rejected_value}")
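
To make the reported statistic easier to interpret, here is a minimal sketch of the textbook Levene statistic with deviations taken from the group mean, matching `center='mean'` above. It is a standalone illustration in plain Python, not StaTDS's implementation; `levene_statistic` and the sample groups are hypothetical.

```python
from statistics import mean


def levene_statistic(groups):
    """Levene's W with absolute deviations taken from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    group_means = [mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    between = sum(len(z) * (m - grand_mean) ** 2 for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2 for z, m in zip(deviations, group_means) for v in z)
    return (n_total - k) / (k - 1) * between / within


w = levene_statistic([[1, 2, 3], [2, 4, 6]])
print(round(w, 4))  # 0.8
```

W is compared against an F distribution with k - 1 and N - k degrees of freedom; a large W suggests the group variances differ.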

Parametric tests: T-test

import pandas as pd
from statds.parametrics import t_test_paired

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
columns = list(dataset.columns)
# Compare the two columns that follow the first one.
selected_columns = [columns[1], columns[2]]
statistic, rejected_value, p_value, hypothesis = t_test_paired(dataset[selected_columns], alpha)
print(hypothesis)
print(f"Statistic {statistic}, Rejected Value {rejected_value}, p-value {p_value}")
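
For reference, the paired t statistic is the mean of the pairwise differences divided by its standard error. The sketch below computes it with the standard library on made-up integer scores; it illustrates the textbook formula and makes no claim about how `t_test_paired` is implemented internally.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """t = mean(d) / (sd(d) / sqrt(n)) for the pairwise differences d."""
    differences = [a - b for a, b in zip(first, second)]
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean(differences) / standard_error


# Hypothetical scores of two algorithms on the same five cases.
t = paired_t_statistic([90, 85, 95, 90, 70], [80, 80, 70, 85, 75])
print(round(t, 4))  # 1.633, with n - 1 = 4 degrees of freedom
```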

Parametric tests: ANOVA

import pandas as pd
from statds.parametrics import anova_test

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
statistic, p_value, rejected_value, hypothesis = anova_test(dataset, alpha)
print(hypothesis)
print(f"Statistic {statistic}, Rejected Value {rejected_value}, p-value {p_value}")
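
As a point of comparison, a one-way (between-groups) ANOVA F statistic can be sketched in a few lines. This assumes each group is an independent sample, which may not match how `anova_test` treats the columns of the dataset; the function name and data are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


f_value = one_way_f([[1, 2, 3], [2, 4, 6]])
print(round(f_value, 4))  # 2.4
```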

Non-parametric tests: Wilcoxon

import pandas as pd
from statds.no_parametrics import wilcoxon

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
columns = list(dataset.columns)
# Compare the two columns that follow the first one.
selected_columns = [columns[1], columns[2]]
statistic, p_value, rejected_value, hypothesis = wilcoxon(dataset[selected_columns], alpha)
print(hypothesis)
print(f"Statistic {statistic}, Rejected Value {rejected_value}, p-value {p_value}")
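
The Wilcoxon signed-rank statistic ranks the absolute pairwise differences and sums the ranks of the positive and negative differences separately. The sketch below follows the common convention of dropping zero differences and averaging tied ranks; StaTDS may handle zeros or ties differently, and all names and data here are hypothetical.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    ordered = sorted(values)
    # First 0-based position plus half of (tie count + 1) gives the mean 1-based rank.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def wilcoxon_statistic(first, second):
    """Return (min(W+, W-), W+, W-) for paired samples."""
    differences = [a - b for a, b in zip(first, second) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in differences])
    positive = sum(r for r, d in zip(ranks, differences) if d > 0)
    negative = sum(r for r, d in zip(ranks, differences) if d < 0)
    return min(positive, negative), positive, negative


result = wilcoxon_statistic([90, 85, 95, 90, 70], [80, 80, 70, 85, 75])
print(result)  # (2.0, 13.0, 2.0)
```

The two rank sums always add up to n(n + 1)/2, here 15, which is a quick sanity check on any hand computation.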

Non-parametric tests: Friedman Test

import pandas as pd
from statds.no_parametrics import friedman

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
# The fourth returned value is named rejected_value so that the print below can use it.
rankings, statistic, p_value, rejected_value, hypothesis = friedman(dataset, alpha, minimize=False)
print(hypothesis)
print(f"Statistic {statistic}, Rejected Value {rejected_value}, p-value {p_value}")
print(rankings)
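
To show where the rankings and the statistic come from, here is a minimal sketch of the textbook Friedman test without tie correction. It assumes that `minimize=False` means higher scores are better, so rank 1 goes to the highest value in each row; that reading, the function names, and the scores are assumptions rather than documented StaTDS behavior.

```python
def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def friedman_statistic(scores, minimize=False):
    """Return (mean rank per column, chi-square statistic) for a rows-by-columns table."""
    n, k = len(scores), len(scores[0])
    # Negating the row makes the largest value receive rank 1 when maximizing.
    per_row = [average_ranks(row if minimize else [-v for v in row]) for row in scores]
    mean_ranks = [sum(r[j] for r in per_row) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2


# Hypothetical accuracies of three algorithms (columns) on four cases (rows).
scores = [[0.90, 0.80, 0.70], [0.85, 0.80, 0.60], [0.95, 0.70, 0.75], [0.90, 0.85, 0.80]]
mean_ranks, chi2 = friedman_statistic(scores)
print(mean_ranks, chi2)  # [1.0, 2.25, 2.75] 6.5
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom, and the mean ranks are the input that the post-hoc procedures work from.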

Post-hoc tests: Bonferroni

import pandas as pd
from statds.no_parametrics import friedman, bonferroni

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
# The fourth returned value is named rejected_value so that the print below can use it.
rankings, statistic, p_value, rejected_value, hypothesis = friedman(dataset, alpha, minimize=False)
print(hypothesis)
print(f"Statistic {statistic}, Rejected Value {rejected_value}, p-value {p_value}")
print(rankings)
# The post-hoc test needs the Friedman rankings and the number of cases (rows).
num_cases = dataset.shape[0]
results, figure = bonferroni(rankings, num_cases, alpha, control = None, type_rank = "Friedman")
print(results)
figure.show()
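
A Bonferroni-style post-hoc on Friedman mean ranks can be sketched as follows: each pair of algorithms gets a z score from the rank difference, a two-sided normal p-value, and that p-value is multiplied by the number of comparisons. The sketch assumes that `control = None` corresponds to comparing all pairs, which is an interpretation, not something the example states; names and ranks are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt


def bonferroni_pairwise(mean_ranks, num_cases):
    """Map each pair of names to its Bonferroni-adjusted p-value."""
    k = len(mean_ranks)
    std_error = sqrt(k * (k + 1) / (6 * num_cases))
    num_comparisons = k * (k - 1) // 2
    adjusted = {}
    for (name_a, rank_a), (name_b, rank_b) in combinations(mean_ranks.items(), 2):
        z = abs(rank_a - rank_b) / std_error
        p_value = erfc(z / sqrt(2))  # two-sided p-value under a standard normal
        adjusted[(name_a, name_b)] = min(1.0, p_value * num_comparisons)
    return adjusted


adjusted = bonferroni_pairwise({"A": 1.0, "B": 2.25, "C": 2.75}, num_cases=4)
for pair, p in adjusted.items():
    print(pair, round(p, 4))
```

A pair is declared different when its adjusted p-value falls below alpha; the other procedures listed in the post-hoc table differ mainly in how they adjust the raw p-values.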

Post-hoc tests: Nemenyi

import pandas as pd
from statds.no_parametrics import friedman, nemenyi

dataset = pd.read_csv("dataset.csv")
alpha = 0.05
# The fourth returned value is named rejected_value so that the print below can use it.
rankings, statistic, p_value, rejected_value, hypothesis = friedman(dataset, alpha, minimize=False)
print(hypothesis)
print(f"Statistic {statistic}, Rejected Value {rejected_value}, p-value {p_value}")
print(rankings)
# The post-hoc test needs the Friedman rankings and the number of cases (rows).
num_cases = dataset.shape[0]
ranks_values, critical_distance_nemenyi, figure = nemenyi(rankings, num_cases, alpha)
print(ranks_values)
print(critical_distance_nemenyi)
figure.show()
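
The critical distance returned by a Nemenyi test is usually computed as CD = q_alpha * sqrt(k(k + 1) / (6N)), where k is the number of algorithms and N the number of cases. The sketch below uses the commonly tabulated q values for alpha = 0.05 and k up to 10; it is an illustration of that formula, and whether StaTDS uses the same table is an assumption.

```python
from math import sqrt

# Critical values q_alpha for alpha = 0.05, indexed by the number of algorithms k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi_critical_distance(num_algorithms, num_cases):
    """Two mean ranks differ significantly when their gap exceeds this distance."""
    k = num_algorithms
    return Q_ALPHA_005[k] * sqrt(k * (k + 1) / (6 * num_cases))


cd = nemenyi_critical_distance(num_algorithms=3, num_cases=4)
mean_ranks = {"A": 1.0, "B": 2.25, "C": 2.75}  # hypothetical Friedman mean ranks
significant = [(a, b) for a in mean_ranks for b in mean_ranks
               if a < b and abs(mean_ranks[a] - mean_ranks[b]) > cd]
print(round(cd, 4), significant)
```

With these toy ranks only the gap between A and C (1.75) exceeds the critical distance of roughly 1.66.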

Using StaTDS Web Client

Local with Python

You only need to create a Python script with the following code:

from statds import app

app.start_app(port=8050)

You can now open the interface in your web browser at the following URL: http://localhost:8050

Local Using Docker

First, make sure that Docker is installed on your computer [2], then download the repository from GitHub to obtain the Dockerfile. With Docker ready to use, build the application's image by running the following command from the repository folder:

docker build -t name-lib ./

After the image has been successfully created, the next step is to instantiate a container using that image.

docker run -p 8050:8050 --name container name-lib

You can now open the interface in your web browser at the following URL: http://localhost:8050

References

[1] 1.5 Getting Started - Installing Git. Git. (n.d.). https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
[2] Get Docker. Docker Docs. Docker Inc., 2023. https://docs.docker.com/get-docker
