Compute correlation coefficients with uncertainties

# pymccorrelation

A tool to calculate correlation coefficients for data, using bootstrapping and/or perturbation to estimate the uncertainties on the correlation coefficient. This began as a Python implementation of the Curran (2014) method for calculating uncertainties on Spearman's Rank Correlation Coefficient, but has since been expanded. Curran's original C implementation is `MCSpearman` (ASCL entry).

Currently the following correlation coefficients can be calculated (with bootstrapping and/or perturbation):

• Pearson's r
• Spearman's rho
• Kendall's tau

Kendall's tau can also be calculated when some of the data are left/right censored, following the method described by Isobe+1986.

## Requirements

• python3
• scipy
• numpy

## Installation

`pymccorrelation` is available via PyPI and can be installed with:

```
pip install pymccorrelation
```

## Usage

`pymccorrelation` exports a single function to the user (also called `pymccorrelation`).

```
from pymccorrelation import pymccorrelation

# [... load your data ...]
```

The correlation coefficient can be one of `pearsonr`, `spearmanr`, or `kendallt`.

For example, to compute Pearson's r for a sample, using 1000 bootstrapping iterations to estimate the uncertainties:

```
res = pymccorrelation(data['x'], data['y'],
                      coeff='pearsonr',
                      Nboot=1000)
```

The output, `res`, is a tuple of length two, with elements:

• numpy array with the correlation coefficient (Pearson's r, in this case) percentiles (by default 16%, 50%, and 84%)
• numpy array with the p-value percentiles (by default 16%, 50%, and 84%)

The percentile ranges can be adjusted using the `percentiles` keyword argument.
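To illustrate what the bootstrapped percentiles represent, here is a minimal sketch using only numpy. It is not the package's implementation: the function name `bootstrap_pearson`, its arguments, and the synthetic data are all assumptions made for this example. The idea is to resample the (x, y) pairs with replacement, recompute Pearson's r for each resample, and summarize the resulting values by their percentiles.

```python
import numpy as np

def bootstrap_pearson(x, y, n_boot=1000, percentiles=(16, 50, 84), seed=None):
    """Percentiles of Pearson's r over bootstrap resamples of (x, y) pairs.

    Hypothetical helper for illustration; not part of pymccorrelation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    r_boot = np.empty(n_boot)
    for i in range(n_boot):
        # Draw n indices with replacement so each (x, y) pair stays together.
        idx = rng.integers(0, n, size=n)
        r_boot[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.percentile(r_boot, percentiles), r_boot

# Synthetic, positively correlated data (illustrative only).
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

r_pct, r_boot = bootstrap_pearson(x, y, n_boot=1000, seed=0)
print(r_pct)  # three values: the 16th, 50th, and 84th percentiles of r
```

The 16th and 84th percentiles bracket roughly the central 68% of the resampled values, which is why they are a common default for quoting a one-sigma-like range around the median.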

Additionally, if the full posterior distribution is desired, that can be obtained by setting the `return_dist` keyword argument to `True`. In that case, `res` becomes a tuple of length four:

• numpy array with the correlation coefficient (Pearson's r, in this case) percentiles (by default 16%, 50%, and 84%)
• numpy array with the p-value percentiles (by default 16%, 50%, and 84%)
• numpy array with the full set of correlation coefficient values from the bootstrapping
• numpy array with the full set of p-values computed from the bootstrapping
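With the full distributions in hand, other summaries can be computed directly with numpy. The sketch below uses synthetic stand-in arrays (`r_dist` and `p_dist` are hypothetical names, filled with random numbers rather than real output) in place of the third and fourth elements of `res`. The helper `summarize` is also an assumption of this example.

```python
import numpy as np

# Stand-ins for the distribution arrays returned with return_dist=True.
# These are random numbers for illustration, not real results.
rng = np.random.default_rng(1)
r_dist = np.clip(rng.normal(loc=0.6, scale=0.05, size=1000), -1.0, 1.0)
p_dist = rng.uniform(0.0, 0.1, size=1000)

def summarize(dist, percentiles=(16, 50, 84)):
    """Return the median and the lower/upper offsets to the outer percentiles."""
    lo, med, hi = np.percentile(dist, percentiles)
    return med, med - lo, hi - med

r_med, r_minus, r_plus = summarize(r_dist)
print(f"r = {r_med:.3f} -{r_minus:.3f} +{r_plus:.3f}")

# A wider central interval than the default percentiles.
r_lo95, r_hi95 = np.percentile(r_dist, [2.5, 97.5])

# Fraction of iterations whose p-value falls below a chosen threshold.
frac_below = np.mean(p_dist < 0.05)
print(r_lo95, r_hi95, frac_below)
```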

Please see the docstring for the full set of arguments, including how to supply measurement uncertainties (necessary for point perturbation) and how to mark censored data.
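As a rough picture of why point perturbation needs measurement uncertainties, the sketch below redraws every point from a Gaussian centred on its measured value, with the quoted uncertainty as the standard deviation, and recomputes Spearman's rho each time. This assumes independent Gaussian errors and is not the package's code: the names `perturbed_spearman`, `dx`, `dy`, and `n_perturb` are chosen for this example only. Ranks are computed with a simple tie-free helper to keep the example numpy-only; `scipy.stats.spearmanr` is the usual choice in practice.

```python
import numpy as np

def rank(values):
    """Ranks 0..n-1. Assumes no ties, which holds for continuous perturbed data."""
    order = np.argsort(values)
    ranks = np.empty(len(values))
    ranks[order] = np.arange(len(values))
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r of the ranks (valid when there are no ties)."""
    return np.corrcoef(rank(x), rank(y))[0, 1]

def perturbed_spearman(x, y, dx, dy, n_perturb=1000, seed=None):
    """Distribution of rho after Gaussian perturbation of each point.

    Hypothetical helper for illustration; assumes independent Gaussian errors.
    """
    x, y, dx, dy = (np.asarray(a, dtype=float) for a in (x, y, dx, dy))
    rng = np.random.default_rng(seed)
    rho = np.empty(n_perturb)
    for i in range(n_perturb):
        # Shift every point by a random offset scaled by its own uncertainty.
        x_new = rng.normal(x, dx)
        y_new = rng.normal(y, dy)
        rho[i] = spearman_rho(x_new, y_new)
    return rho

# Synthetic data with per-point uncertainties (illustrative only).
rng = np.random.default_rng(7)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(scale=1.0, size=x.size)
dx = np.full(x.size, 0.5)
dy = np.full(x.size, 1.0)

rho_dist = perturbed_spearman(x, y, dx, dy, n_perturb=500, seed=0)
print(np.percentile(rho_dist, [16, 50, 84]))
```

Perturbation and bootstrapping can be combined by perturbing the points of each bootstrap resample, which folds both the sampling scatter and the measurement errors into one distribution.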

## Citing

If you use this script as part of your research, I encourage you to cite the following papers:

• Curran 2014: Describes the technique and application to Spearman's rank correlation coefficient
• Privon+ 2020: First use of this software, as `pymcspearman`.

Please also cite scipy and numpy.

If your work uses Kendall's tau with censored data please also cite:

• Isobe+ 1986: Censoring of data when computing Kendall's rank correlation coefficient.
