Hypothesize
A Python package for comparing groups and measuring associations using robust statistics.
This package is a port of Rand R. Wilcox's R library WRS. The functions in this repository, as well as the issues of robustness, are described in his book "Introduction to Robust Estimation and Hypothesis Testing".
Please visit the Hypothesize documentation site.
Warning: This repository is still in the early stages of development.
Installation
The Hypothesize package is available on PyPI. To install it, simply type:
pip install hypothesize
The following topics show usage examples.
How to compare two groups
Load data from a CSV or create some example data
from hypothesize.utilities import create_example_data
df=create_example_data(design_values=2)
df.head()
| | cell_1 | cell_2 |
|---|---|---|
| 0 | 0.0446518 | 0.90675 |
| 1 | 0.763458 | 0.291555 |
| 2 | 0.71039 | 0.59828 |
| 3 | 0.175208 | 0.268073 |
| 4 | 0.957819 | 0.222688 |
Import the desired function and pass in the data for each group
- This example uses the bootstrap-t method with 20% trimmed means
- The output is a dictionary containing the results (95% confidence interval, p_value, test statistics, etc.)
from hypothesize.compare_groups_with_single_factor import yuenbt
results=yuenbt(df.cell_1, df.cell_2)
print(results['ci'])
[-0.3115715617702292, 0.10636703554225341]
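To make the "20% trimmed mean" used above concrete, here is a minimal standard-library sketch. The function name `trimmed_mean` and the sample values are illustrative assumptions, not part of the Hypothesize API, and this is not the package's internal implementation.

```python
import math

def trimmed_mean(values, proportion=0.2):
    """Mean of the values left after cutting `proportion` from each tail.

    Hypothetical helper for illustration; not a Hypothesize function.
    """
    ordered = sorted(values)
    n = len(ordered)
    g = math.floor(proportion * n)  # observations dropped from each tail
    kept = ordered[g:n - g]
    return sum(kept) / len(kept)

# Made-up sample with one extreme value
sample = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 500.0]

ordinary_mean = sum(sample) / len(sample)
robust_mean = trimmed_mean(sample)

print(ordinary_mean)  # pulled far upward by the single outlier
print(robust_mean)    # based only on the middle 60% of the data
```

With ten observations, two are dropped from each tail, so the extreme value no longer influences the estimate.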
How to compare groups in a factorial design
Load data from a CSV or create some example data
from hypothesize.utilities import create_example_data
df=create_example_data(design_values=[2,3])
df.head()
| | cell_1_1 | cell_1_2 | cell_1_3 | cell_2_1 | cell_2_2 | cell_2_3 |
|---|---|---|---|---|---|---|
| 0 | 0.0446518 | 0.90675 | 0.795696 | 0.519486 | 0.333636 | 0.232153 |
| 1 | 0.763458 | 0.291555 | 0.84158 | 0.0339891 | 0.511235 | 0.732503 |
| 2 | 0.71039 | 0.59828 | 0.110407 | 0.898072 | 0.769496 | 0.0484005 |
| 3 | 0.175208 | 0.268073 | 0.888728 | 0.287442 | 0.100153 | 0.210394 |
| 4 | 0.957819 | 0.222688 | 0.834161 | 0.599158 | 0.655308 | 0.203486 |
Import the desired function and pass in the data
- This example uses a 2-by-3 design
- One approach is to use a set of linear contrasts that will test all main effects and interactions
- Then, the bootstrap-t method and the 20% trimmed mean can be used
- The results are a dictionary of DataFrames that contain various statistics for each factor and the interactions
from hypothesize.compare_groups_with_two_factors import bwmcp
results=bwmcp(J=2, K=3, x=df)
results['factor_A']
| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.0393584 | 0.169849 | 0.231726 | 3.35959 | 0.941569 |
results['factor_B']
| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0 | -0.104506 | 0.126135 | -0.828529 | 2.4329 | 0.452421 |
| 1 | 1 | -0.0931364 | 0.151841 | -0.613382 | 2.4329 | 0.552588 |
| 2 | 2 | 0.01137 | 0.135392 | 0.0839783 | 2.4329 | 0.923205 |
results['factor_AB']
| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0 | -0.100698 | 0.126135 | -0.798336 | 2.3771 | 0.410684 |
| 1 | 1 | -0.037972 | 0.151841 | -0.250078 | 2.3771 | 0.804674 |
| 2 | 2 | 0.0627261 | 0.135392 | 0.463291 | 2.3771 | 0.659432 |
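The linear contrasts mentioned above can be illustrated for a 2-by-3 design with a small standard-library sketch. It assumes the common construction from pairwise contrasts and Kronecker products, with cells ordered as in the DataFrame columns (cell_1_1, cell_1_2, ..., cell_2_3). The helper names `kron` and `pairwise_contrasts` are hypothetical, and the package may build its contrasts differently.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def pairwise_contrasts(levels):
    """One row per pair of levels: +1 and -1 mark the two levels compared."""
    rows = []
    for i in range(levels):
        for j in range(i + 1, levels):
            row = [0] * levels
            row[i], row[j] = 1, -1
            rows.append(row)
    return rows

J, K = 2, 3  # levels of factor A and factor B

# Each row is one contrast over the J*K cells.
con_a = kron(pairwise_contrasts(J), [[1] * K])    # main effect of A
con_b = kron([[1] * J], pairwise_contrasts(K))    # main effect of B
con_ab = kron(pairwise_contrasts(J), pairwise_contrasts(K))  # interaction

for name, con in (("A", con_a), ("B", con_b), ("AB", con_ab)):
    print(name)
    for row in con:
        print(row)
```

Under this construction there is one contrast for factor A and three each for factor B and the interaction, which matches the number of rows in the result tables above.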
How to compute a robust correlation
Load data from a CSV or create some example data
from hypothesize.utilities import create_example_data
df=create_example_data(design_values=2)
df.head()
| | cell_1 | cell_2 |
|---|---|---|
| 0 | 0.0446518 | 0.90675 |
| 1 | 0.763458 | 0.291555 |
| 2 | 0.71039 | 0.59828 |
| 3 | 0.175208 | 0.268073 |
| 4 | 0.957819 | 0.222688 |
Import the desired function and pass in the data for each group
- One approach is to winsorize the x and y data
- A heteroscedastic method for testing zero correlation is also provided in this package but not shown here
- Please see the function corb, which uses the percentile bootstrap to compute a 1-alpha CI and p_value for any correlation
- The output is a dictionary containing various statistics (the winsorized correlation, winsorized covariance, etc.)
from hypothesize.measuring_associations import wincor
results=wincor(df.cell_1, df.cell_2)
print(results['wcor'])
-0.05690314435050796
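As a rough illustration of what winsorizing the x and y data means, the sketch below pulls the most extreme 20% of values in each tail in to the nearest retained value, then computes an ordinary Pearson correlation on the result. The helper names and the sample data are assumptions made for this example. It is not the package's `wincor` implementation, and the 20% default is assumed here to mirror the trimming level used elsewhere in this README.

```python
import math

def winsorize(values, proportion=0.2):
    """Replace the lowest and highest `proportion` of values with the
    nearest remaining value. Hypothetical helper for illustration."""
    ordered = sorted(values)
    n = len(ordered)
    g = math.floor(proportion * n)  # observations adjusted in each tail
    low, high = ordered[g], ordered[n - g - 1]
    return [min(max(v, low), high) for v in values]

def pearson(x, y):
    """Ordinary Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def winsorized_correlation(x, y, proportion=0.2):
    """Pearson correlation of the separately winsorized x and y."""
    return pearson(winsorize(x, proportion), winsorize(y, proportion))

# Made-up data: a nearly linear relationship with one wild y value
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 9.0, -40.0]

print(pearson(x, y))                  # negative: dominated by the outlier
print(winsorized_correlation(x, y))   # positive: outlier pulled in
```

In this made-up sample a single outlier flips the sign of the ordinary correlation, while the winsorized version still reflects the increasing trend in the bulk of the data.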