A Python package for comparing groups and measuring associations using robust statistics.

Hypothesize

This package is a port of Rand R. Wilcox's R library WRS. The functions in this repository, as well as the issues of robustness, are described in his book "Introduction to Robust Estimation and Hypothesis Testing".

Please visit the Hypothesize documentation site.

Warning: This repository is still in the early stages of development.

Installation

The Hypothesize package is available on PyPI. To install it, simply type:

pip install hypothesize
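After installing, one quick way to confirm that the package is visible to the current interpreter is to query its distribution metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Hypothesize.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("hypothesize")
if version is None:
    print("hypothesize is not installed in this environment")
else:
    print("hypothesize version:", version)
```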

The following topics show examples

How to compare two groups

Load data from a CSV or create some example data

from hypothesize.utilities import create_example_data

df=create_example_data(design_values=2)

df.head()
   cell_1     cell_2
0  0.0446518  0.90675
1  0.763458   0.291555
2  0.71039    0.59828
3  0.175208   0.268073
4  0.957819   0.222688

Import the desired function and pass in the data for each group

  • This example uses the bootstrap-t method with 20% trimmed means
  • The output is a dictionary containing the results (95% confidence interval, p_value, test statistic, etc.)

from hypothesize.compare_groups_with_single_factor import yuenbt

results=yuenbt(df.cell_1, df.cell_2)

print(results['ci'])

[-0.3115715617702292, 0.10636703554225341]
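To make the idea behind this comparison concrete, the following is a minimal pure-Python sketch of a symmetric bootstrap-t interval for the difference between two 20% trimmed means, built on Yuen's standard error. It is not the package's implementation: the function names, the symmetric variant, the seed, and the quantile rule are all assumptions made for illustration, and `yuenbt` may differ in these details. The data are the five rows printed by `df.head()` above, so the interval will not match the output shown for the full example data.

```python
import math
import random


def trimmed_mean(values, tr=0.2):
    """Mean after dropping the lowest and highest floor(tr * n) observations."""
    xs = sorted(values)
    g = math.floor(tr * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_variance(values, tr=0.2):
    """Sample variance after pulling the g most extreme values on each side inward."""
    xs = sorted(values)
    n = len(xs)
    g = math.floor(tr * n)
    low, high = xs[g], xs[n - g - 1]
    wins = [min(max(v, low), high) for v in xs]
    mean = sum(wins) / n
    return sum((v - mean) ** 2 for v in wins) / (n - 1)


def yuen_parts(x, y, tr=0.2):
    """Return (difference of trimmed means, Yuen's estimate of its standard error)."""
    def squared_se(values):
        n = len(values)
        h = n - 2 * math.floor(tr * n)  # observations left after trimming
        return (n - 1) * winsorized_variance(values, tr) / (h * (h - 1))

    diff = trimmed_mean(x, tr) - trimmed_mean(y, tr)
    return diff, math.sqrt(squared_se(x) + squared_se(y))


def bootstrap_t_ci(x, y, tr=0.2, alpha=0.05, nboot=599, seed=0):
    """Symmetric bootstrap-t confidence interval (illustrative assumptions only)."""
    rng = random.Random(seed)
    diff, se = yuen_parts(x, y, tr)
    # Center each group so the null hypothesis holds in the bootstrap world.
    cx = [v - trimmed_mean(x, tr) for v in x]
    cy = [v - trimmed_mean(y, tr) for v in y]
    t_stars = []
    for _ in range(nboot):
        bx = rng.choices(cx, k=len(cx))
        by = rng.choices(cy, k=len(cy))
        b_diff, b_se = yuen_parts(bx, by, tr)
        if b_se > 0:  # skip degenerate resamples with no spread
            t_stars.append(abs(b_diff / b_se))
    t_stars.sort()
    # Critical value: the (1 - alpha) quantile of |T*|.
    idx = min(len(t_stars) - 1, math.ceil((1 - alpha) * len(t_stars)) - 1)
    crit = t_stars[idx]
    return [diff - crit * se, diff + crit * se]


# First five rows of the example data shown above.
cell_1 = [0.0446518, 0.763458, 0.71039, 0.175208, 0.957819]
cell_2 = [0.90675, 0.291555, 0.59828, 0.268073, 0.222688]
print(bootstrap_t_ci(cell_1, cell_2))
```

Trimming guards the location estimate against outliers, while the bootstrap replaces the theoretical t distribution with one estimated from the data.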

How to compare groups in a factorial design

Load data from a CSV or create some example data

from hypothesize.utilities import create_example_data

df=create_example_data(design_values=[2,3])

df.head()
   cell_1_1   cell_1_2  cell_1_3  cell_2_1   cell_2_2  cell_2_3
0  0.0446518  0.90675   0.795696  0.519486   0.333636  0.232153
1  0.763458   0.291555  0.84158   0.0339891  0.511235  0.732503
2  0.71039    0.59828   0.110407  0.898072   0.769496  0.0484005
3  0.175208   0.268073  0.888728  0.287442   0.100153  0.210394
4  0.957819   0.222688  0.834161  0.599158   0.655308  0.203486

Import the desired function and pass in the data

  • This example uses a 2-by-3 design
  • One approach is to use a set of linear contrasts that test all main effects and interactions
  • Each contrast can then be tested with the bootstrap-t method using 20% trimmed means
  • The results are a dictionary of DataFrames that contain various statistics for each factor and for the interaction

from hypothesize.compare_groups_with_two_factors import bwmcp

results=bwmcp(J=2, K=3, x=df)

results['factor_A']

   con_num  psihat     se        test      crit_value  p_value
0  0        0.0393584  0.169849  0.231726  3.35959     0.941569

results['factor_B']

   con_num  psihat      se        test       crit_value  p_value
0  0        -0.104506   0.126135  -0.828529  2.4329      0.452421
1  1        -0.0931364  0.151841  -0.613382  2.4329      0.552588
2  2        0.01137     0.135392  0.0839783  2.4329      0.923205

results['factor_AB']

   con_num  psihat     se        test       crit_value  p_value
0  0        -0.100698  0.126135  -0.798336  2.3771      0.410684
1  1        -0.037972  0.151841  -0.250078  2.3771      0.804674
2  2        0.0627261  0.135392  0.463291   2.3771      0.659432
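
The linear contrasts for a two-way design can be sketched in a few lines. The code below assumes pairwise-difference contrasts combined through Kronecker products, with cells ordered as in the DataFrame columns (cell_1_1, cell_1_2, ..., cell_2_3). That assumption is consistent with the 1, 3, and 3 rows in the output above, but it is an illustration rather than the package's own code, and all function names here are hypothetical.

```python
from itertools import combinations


def pairwise_contrasts(levels):
    """One row per pair (i, j) with i < j: +1 at level i, -1 at level j."""
    rows = []
    for i, j in combinations(range(levels), 2):
        row = [0] * levels
        row[i], row[j] = 1, -1
        rows.append(row)
    return rows


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def two_way_contrasts(J, K):
    """Contrast rows for the main effects and the interaction of a J-by-K design."""
    ones_J, ones_K = [[1] * J], [[1] * K]
    return {
        "factor_A": kron(pairwise_contrasts(J), ones_K),
        "factor_B": kron(ones_J, pairwise_contrasts(K)),
        "factor_AB": kron(pairwise_contrasts(J), pairwise_contrasts(K)),
    }


def psihat(contrast, cell_estimates):
    """Linear contrast: weighted sum of per-cell estimates (e.g. trimmed means)."""
    return sum(c * m for c, m in zip(contrast, cell_estimates))


contrasts = two_way_contrasts(J=2, K=3)
for name, rows in contrasts.items():
    print(name)
    for row in rows:
        print("  ", row)
```

Each row holds the weights applied to the six cell estimates. The single factor A contrast, for example, compares the sum of the three cells at the first level of A with the sum of the three cells at the second level.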

How to compute a robust correlation

Load data from a CSV or create some example data

from hypothesize.utilities import create_example_data

df=create_example_data(design_values=2)

df.head()
   cell_1     cell_2
0  0.0446518  0.90675
1  0.763458   0.291555
2  0.71039    0.59828
3  0.175208   0.268073
4  0.957819   0.222688

Import the desired function and pass in the data for each variable

  • One approach is to winsorize the x and y data
  • A heteroscedastic method for testing zero correlation is also provided in this package but not shown here
  • See the function corb, which uses the percentile bootstrap to compute a 1-alpha confidence interval and p_value for any correlation
  • The output is a dictionary containing various statistics (the winsorized correlation, winsorized covariance, etc.)

from hypothesize.measuring_associations import wincor

results=wincor(df.cell_1, df.cell_2)

print(results['wcor'])

-0.05690314435050796
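
As an illustration of what winsorizing does, here is a minimal pure-Python sketch of a winsorized correlation: each variable has its most extreme 20% of values on each side pulled in to the nearest retained value, and Pearson's correlation is computed on the result. The function names and the toy data are assumptions for illustration; `wincor` itself returns additional statistics and may differ in detail.

```python
import math


def winsorize(values, tr=0.2):
    """Clamp the floor(tr * n) smallest and largest values to the nearest kept value."""
    n = len(values)
    g = math.floor(tr * n)
    xs = sorted(values)
    low, high = xs[g], xs[n - g - 1]
    return [min(max(v, low), high) for v in values]  # original order is preserved


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def winsorized_correlation(x, y, tr=0.2):
    """Pearson correlation of the separately winsorized x and y values."""
    return pearson(winsorize(x, tr), winsorize(y, tr))


# Toy data with one wild pair (100, -50) that dominates the ordinary correlation.
x = [1, 2, 3, 4, 100]
y = [2, 4, 6, 8, -50]
print(pearson(x, y))                 # strongly negative, driven by the outlier
print(winsorized_correlation(x, y))  # 0.25
```

The single outlying pair pushes the ordinary correlation close to -1, whereas the winsorized version reflects the bulk of the data.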
