korr
A collection of utility functions for correlation analysis.
Usage
Check the examples folder for notebooks.
Compute correlation matrix and its p-values
pearson – Pearson/Sample correlation (interval- and ratio-scale data)
kendall – Kendall’s tau rank correlation (ordinal data)
spearman – Spearman’s rho rank correlation (ordinal data)
mcc – Matthews correlation coefficient between binary variables
EDA: dig deeper into results
flatten – A table (pandas) with one row per correlation pair, listing the variable indices, the correlation, and the p-value. For example, find “good” cutoffs with corr_vs_pval first, then look up the variable indices with flatten.
slice_yx – Slice the correlation and p-value matrices of a (y, X) dataset into (y, x_i) vectors and (x_j, x_k) matrices
corr_vs_pval – Histogram to find p-value cutoffs (alpha) for a) highly correlated pairs, b) unrelated pairs, c) the mixed results.
bracket_pval – Histogram with more fine-grained p-value brackets.
corrgram – Correlogram, heatmap of correlations with p-values in brackets
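The bookkeeping behind flatten, slice_yx and the p-value brackets can be sketched in plain Python. This is an illustration only, not korr's code: korr's flatten returns a pandas table, while this hypothetical version returns a list of dicts, and the `ydim` parameter and the bracket edges are assumptions chosen for the example.

```python
import bisect


def flatten(r, p):
    """Hypothetical stand-in: one row per variable pair (i < j) with corr and p-value."""
    k = len(r)
    return [
        {"i": i, "j": j, "corr": r[i][j], "pval": p[i][j]}
        for i in range(k)
        for j in range(i + 1, k)
    ]


def slice_yx(r, p, ydim=1):
    """Hypothetical stand-in: assumes the first `ydim` variables are the targets y."""
    y_r = [row[ydim:] for row in r[:ydim]]  # (y, x_i) correlations
    y_p = [row[ydim:] for row in p[:ydim]]  # (y, x_i) p-values
    x_r = [row[ydim:] for row in r[ydim:]]  # (x_j, x_k) correlations
    x_p = [row[ydim:] for row in p[ydim:]]  # (x_j, x_k) p-values
    return y_r, y_p, x_r, x_p


def pval_brackets(rows, edges=(0.001, 0.01, 0.05, 0.1)):
    """Count pairs per bracket: p < 0.001, [0.001, 0.01), [0.01, 0.05), [0.05, 0.1), p >= 0.1."""
    counts = [0] * (len(edges) + 1)
    for row in rows:
        counts[bisect.bisect_right(edges, row["pval"])] += 1
    return counts
```

Once the bracket counts suggest a cutoff, the flattened rows can be filtered directly, e.g. `[(row["i"], row["j"]) for row in flatten(r, p) if row["pval"] < 0.01]`.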
Utility functions
confusion – Confusion matrix. Required for the Matthews correlation (mcc) and a bit faster than sklearn’s
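The link between the confusion matrix and the Matthews correlation coefficient fits in a few lines. This sketch is not korr's implementation; the `[[TN, FP], [FN, TP]]` layout (rows are true labels, columns are predictions, as in sklearn) and the convention of returning 0 when a marginal count is zero are assumptions.

```python
import math


def confusion(y_true, y_pred):
    """2x2 confusion matrix [[TN, FP], [FN, TP]] for binary 0/1 labels."""
    m = [[0, 0], [0, 0]]
    for t, pr in zip(y_true, y_pred):
        m[int(t)][int(pr)] += 1  # row = true label, column = predicted label
    return m


def mcc(y_true, y_pred):
    """Matthews correlation coefficient of two binary (0/1) vectors."""
    (tn, fp), (fn, tp) = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is undefined when a marginal is zero, so return 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For two 0/1 vectors the MCC equals Pearson's r of those vectors (the phi coefficient), which is why it sits alongside pearson, kendall and spearman.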
Parameter Stability
bootcorr – Estimate multiple correlation matrices based on bootstrapped samples. From there you can assess how stable the correlation estimates are, i.e., how sensitive they are to in-sample variation. Stable estimates are good candidates for modeling, whereas unstable correlation pairs are prone to p-hacking and non-reproducible results.
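The bootstrap idea can be sketched for a single variable pair: resample the rows with replacement, re-estimate Pearson's r each time, and look at the spread of the estimates. This is a simplified illustration, not korr's bootcorr (which works on whole correlation matrices); `bootstrap_corr`, `summarize`, `n_boot` and `seed` are hypothetical names.

```python
import math
import random
import statistics


def _pearson_r(x, y):
    """Pearson r, or None if either sample is constant (correlation undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr(x, y, n_boot=1000, seed=0):
    """Hypothetical helper: r re-estimated on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample row indices
        r = _pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples
            estimates.append(r)
    return estimates


def summarize(estimates):
    """Mean, standard deviation and a rough 95% percentile interval of the estimates."""
    s = sorted(estimates)
    lo = s[int(0.025 * (len(s) - 1))]
    hi = s[int(0.975 * (len(s) - 1))]
    return statistics.mean(s), statistics.stdev(s), (lo, hi)
```

A small standard deviation and a narrow interval indicate a stable estimate; a wide interval, especially one that straddles zero, flags a pair whose correlation may not reproduce.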
Variable Selection, Search Functions
mincorr – From all estimated correlation pairs, pick a given number n (e.g., 3, 5, ...) of variables with low and insignificant correlations among each other. (See the binsel package for an application.)
find_best – Find the N “best”, i.e. high and most significant, correlations
find_worst – Find the N “worst”, i.e. insignificant/random and low, correlations
find_unrelated – Return the variable indices of unrelated pairs (in terms of insignificant p-values)
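The search functions amount to ranking and filtering variable pairs. The sketch below is a hypothetical illustration, not korr's code: the ranking keys (|corr| first, p-value as tie-breaker) are an assumed reading of "high and most significant", and `mincorr_bruteforce` swaps in a plain exhaustive search that minimizes the largest pairwise |corr|. Because that search grows combinatorially, it only suits a small number of variables, and it is not necessarily the algorithm mincorr uses.

```python
import itertools


def _pairs(k):
    """All index pairs (i, j) with i < j."""
    return list(itertools.combinations(range(k), 2))


def find_best(r, p, n=3):
    """n pairs with the largest |corr|; ties broken by the smaller p-value."""
    return sorted(_pairs(len(r)), key=lambda ij: (-abs(r[ij[0]][ij[1]]), p[ij[0]][ij[1]]))[:n]


def find_worst(r, p, n=3):
    """n pairs with the smallest |corr|; ties broken by the larger p-value."""
    return sorted(_pairs(len(r)), key=lambda ij: (abs(r[ij[0]][ij[1]]), -p[ij[0]][ij[1]]))[:n]


def find_unrelated(p, alpha=0.05):
    """Pairs whose p-value is not significant at level alpha."""
    return [(i, j) for i, j in _pairs(len(p)) if p[i][j] >= alpha]


def mincorr_bruteforce(r, n=3):
    """Subset of n >= 2 variables minimizing the largest pairwise |corr| (exhaustive)."""
    return min(
        itertools.combinations(range(len(r)), n),
        key=lambda s: max(abs(r[i][j]) for i, j in itertools.combinations(s, 2)),
    )
```

With a 4x4 matrix in which variables 0 and 1 are strongly correlated, `mincorr_bruteforce(r, n=3)` avoids picking both of them, which is the behavior the mincorr description asks for.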
Appendix
Installation
The korr git repo is available as a PyPI package
pip install korr
Install a virtual environment
python3.7 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If the path to your git repo contains spaces, don’t create the virtual environment in the subfolder .venv; use an absolute path without spaces instead.)
Commands
Check syntax: flake8 --ignore=F401
Run Unit Tests: pytest
Remove .pyc files: find . -type f -name "*.pyc" | xargs rm
Remove __pycache__ folders: find . -type d -name "__pycache__" | xargs rm -rf
Publish
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
Support
Please open an issue for support.
Contributing
Please contribute using GitHub Flow: create a branch, add commits, and open a pull request.