
A package for Bayesian inference of correlations


pybaycor

It's all you knead

Pybaycor ("Pie Baker") is a package for estimating Bayesian correlation coefficients with Python. It reimplements the "Bayesian First Aid" robust and non-robust Bayesian correlation coefficients in Python using PyMC3. It should work natively for datasets with more than 2 features, although neither the fitting nor the plotting routines have been tested in that case yet. It also provides hierarchical inference of correlations in the presence of measurement uncertainty, based on Matzke et al. (2017), who in turn based their approach on Behseta et al. (2009). This package extends their approaches with multivariate Student's t distributions to provide robust alternatives to the methods they lay out.

Installation:

Pybaycor can be installed with pip:

pip install pybaycor

Dependencies:

Pybaycor depends on a small number of packages:

  • Numpy
  • Matplotlib
  • Seaborn
  • PyMC3
  • Arviz
  • xarray

Usage:

Pybaycor implements a number of classes for different kinds of inference. The most basic of these is the BayesianCorrelation class. This class can be used quite straightforwardly to infer correlations in a multi-dimensional dataset with no measurement uncertainty:

import pybaycor as pbc
baycor = pbc.BayesianCorrelation(data=data)  # data is an (n_points, n_dimensions) array or array_like
baycor.fit(steps=1000, tune=1000)  # run MCMC to infer the correlations
baycor.summarise()  # print a summary of the posteriors from the MCMC
baycor.plot_trace(show=True)  # plot the trace and marginal distributions
baycor.plot_data(show=True)  # plot the original data with the 2-sigma ellipse superimposed
baycor.plot_corner(show=True)  # plot the 1D and 2D marginal distributions of the multivariate distribution
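
The data argument above is described as an (n_points, n_dimensions) array or array_like. As a minimal sketch, assuming your measurements start out as one sequence per feature and that a nested list of rows counts as array_like (as it does for NumPy), the columns can be combined as follows. The variable names and values are purely illustrative.

```python
# Hypothetical measurements: one sequence per feature, all of equal length.
height = [1.62, 1.75, 1.80, 1.68, 1.91]
weight = [58.0, 72.5, 80.1, 65.3, 95.2]

# zip pairs up the i-th entry of each feature, giving one row per data point.
data = [list(row) for row in zip(height, weight)]

n_points = len(data)
n_dimensions = len(data[0])
print(n_points, n_dimensions)
```

The resulting nested list has one row per data point and one column per feature, which is the layout the comment in the usage example asks for.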

The summary table will contain rows labelled chol_corr, which give the summary statistics for the correlation coefficients. These include the posterior mean and 2-sigma credible interval, as well as Rhat for the chains. The chol_corr[i,i] rows should all give means of 1 and standard deviations of 0, while the chol_corr[i,j] rows are the rows of interest. Remember that the correlation matrix is symmetric, so chol_corr[0,1] == chol_corr[1,0] and you only need to read off one of those rows.

The (recommended) robust interface is available through the RobustBayesianCorrelation class. This is invoked identically to the basic class, and uses a multivariate t distribution to reduce the influence of outliers. As a result, there is an additional hyperparameter nu, the number of degrees of freedom. Like all hyperparameters in pybaycor, this is given a weakly-informative prior. The other methods (fit, summarise, plot_trace and plot_data) work identically and transparently in the robust case as well. However, at present the plot_data() method only works for 2-dimensional correlations.
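
To see why reducing the influence of outliers matters, the following sketch computes the ordinary Pearson correlation coefficient for a small made-up dataset, with and without one extreme point. It uses only the standard library and does not call pybaycor; the numbers and the pearson_r helper are illustrative assumptions, not part of the package.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Ten made-up points with essentially no linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 3, 6, 4, 5, 4, 6, 3, 5, 4]

r_clean = pearson_r(x, y)
# Append a single extreme point far from the bulk of the data.
r_outlier = pearson_r(x + [30], y + [30])

print(round(r_clean, 2), round(r_outlier, 2))
```

A single point is enough to move the estimate from roughly zero to a strong positive correlation. Heavy-tailed likelihoods such as the multivariate t distribution are less sensitive to this kind of point.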

If your data have measurement uncertainties, however, these classes are not appropriate. For that purpose, pybaycor implements hierarchical equivalents that perform joint inference on the data and the correlation to determine the distribution of true correlations given the diluted, observed correlation. Once again, both robust and non-robust interfaces are available, and I recommend the robust interface even though the runtime for fit() is roughly five times longer. This can be invoked as:

baycor = pbc.HierarchicalRobustBayesianCorrelation(data=data, sigma=sigma) #where data and sigma are (n_points, n_dimensions) arrays or array_like
baycor.fit(steps=1000, tune=1000)
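
The "diluted, observed correlation" mentioned above can be illustrated with a small simulation. This is a sketch using only the standard library, not pybaycor's model: it draws correlated "true" values, adds independent Gaussian noise to each, and compares the sample correlations. The true correlation (0.8), noise level (1.0), sample size and seed are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
rho = 0.8                # correlation of the underlying true values
n_points = 5000

true_x, true_y = [], []
for _ in range(n_points):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    true_x.append(z1)
    # Mixing z1 and z2 this way gives unit variance and correlation rho.
    true_y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)

# Add independent measurement noise to every value.
noise_sigma = 1.0
obs_x = [v + rng.gauss(0.0, noise_sigma) for v in true_x]
obs_y = [v + rng.gauss(0.0, noise_sigma) for v in true_y]

r_true = pearson_r(true_x, true_y)
r_obs = pearson_r(obs_x, obs_y)
print(round(r_true, 2), round(r_obs, 2))
```

With unit-variance true values and unit-variance noise on both features, the expected observed correlation is rho / 2, so the noisy data appear much less correlated than the underlying quantities. The hierarchical classes aim to recover the underlying correlation from data like obs_x and obs_y together with their uncertainties.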

Because this approach introduces n_dimensions parameters per data point, it can be difficult to read the summary. I'm working on improving the default formatting to make this useful, but you can always

summary = baycor.summarise()

to access the dataframe directly and extract useful parameters. You can also call the plotting routines in exactly the same way as above. If you use plot_data() for uncertain data, it will show you the original data, the inferred data based on the dilution of any correlation by the uncertainty, and the ellipse representing the 2-sigma region.
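
As a sketch of how the useful rows might be picked out, the snippet below filters a list of row labels down to the unique off-diagonal correlation entries. It assumes the labels follow the chol_corr[i,j] pattern described earlier; the list itself, including the mu and nu labels, is a hypothetical stand-in for the real row labels of the summary, and unique_correlation_rows is an illustrative helper rather than part of pybaycor.

```python
import re

# Hypothetical row labels standing in for the summary's real index.
labels = [
    "mu[0]", "mu[1]", "nu",
    "chol_corr[0,0]", "chol_corr[0,1]",
    "chol_corr[1,0]", "chol_corr[1,1]",
]

# Matches labels such as chol_corr[0,1] and captures the two indices.
pattern = re.compile(r"^chol_corr\[(\d+),\s*(\d+)\]$")

def unique_correlation_rows(row_labels):
    """Keep only chol_corr[i,j] labels with i < j (one per feature pair)."""
    keep = []
    for label in row_labels:
        match = pattern.match(label)
        if match and int(match.group(1)) < int(match.group(2)):
            keep.append(label)
    return keep

rows = unique_correlation_rows(labels)
print(rows)
```

If the summary is a pandas DataFrame indexed by these labels, which is an assumption here, the filtered labels could then be used to select just those rows.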

Future work

  • Implement inference of correlations when only some features are uncertain.
  • Implement inference of correlations with censored data, if this is possible.
  • Improve plotting and output

Community input is most welcome!

Relevant citations:

@inbook{inbook,
author = {Gelman, Andrew and Hill, Jennifer},
year = {2006},
month = {11},
pages = {},
title = {Data Analysis Using Regression And Multilevel/Hierarchical Models},
volume = {3},
isbn = {0521867061},
journal = {Cambridge University Press},
doi = {10.1017/CBO9780511790942}
}

@article{doi:10.1152/jn.90727.2008,
author = {Behseta, Sam and Berdyyeva, Tamara and Olson, Carl R. and Kass, Robert E.},
title = {Bayesian Correction for Attenuation of Correlation in Multi-Trial Spike Count Data},
journal = {Journal of Neurophysiology},
volume = {101},
number = {4},
pages = {2186-2193},
year = {2009},
doi = {10.1152/jn.90727.2008},
note = {PMID: 19129297},
url = {https://doi.org/10.1152/jn.90727.2008},
eprint = {https://doi.org/10.1152/jn.90727.2008}
}

@article{10.1525/collabra.78,
    author = {Matzke, Dora and Ly, Alexander and Selker, Ravi and Weeda, Wouter D. and Scheibehenne, Benjamin and Lee, Michael D. and Wagenmakers, Eric-Jan},
    title = "{Bayesian Inference for Correlations in the Presence of Measurement Error and Estimation Uncertainty}",
    journal = {Collabra: Psychology},
    volume = {3},
    number = {1},
    year = {2017},
    month = {10},
    issn = {2474-7394},
    doi = {10.1525/collabra.78},
    url = {https://doi.org/10.1525/collabra.78},
    note = {25},
    eprint = {https://online.ucpress.edu/collabra/article-pdf/3/1/25/436268/78-1314-1-pb.pdf},
}
