CorALS is an open-source software package for the construction and analysis of large-scale correlation networks for high-dimensional data.
If you use CorALS for a scientific publication, please cite:
Becker, M., Nassar, H., Espinosa, C. et al.
Large-scale correlation network construction for unraveling the coordination of complex biological systems.
Nat Comput Sci (2023).
https://doi.org/10.1038/s43588-023-00429-y
Install
pip install corals
Quick start
The following quick start examples can also be found in an executable notebook.
Note: If any of the following examples does not work, check the executable notebook mentioned above. It is tested automatically, whereas this README may be out of date.
Prepare parallelization
Before running anything, we make sure that numpy will not oversubscribe CPUs and slow things down.
Note that this has to be executed before importing numpy.
- For full correlation matrix calculation, setting n_threads > 1 can be used to parallelize the calculation.
- For the top-k approaches, setting n_threads=1 makes the most sense, since parallelization is specified separately.
from corals.threads import set_threads_for_external_libraries
set_threads_for_external_libraries(n_threads=1)
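Why the call has to come before the numpy import: numerical backends such as OpenBLAS, MKL, and OpenMP typically read their thread limits from environment variables when they are first loaded. The sketch below shows that general technique only. It is not taken from the CorALS source, the helper name limit_backend_threads is hypothetical, and the list of variables is an assumption.

```python
import os

# Environment variables read by common numerical backends at load time.
# Assumption for illustration: CorALS may set a different list.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def limit_backend_threads(n_threads=1):
    """Hypothetical helper: cap backend thread counts via the environment."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


limit_backend_threads(1)

# Import numpy only after the limits are in place; setting the variables
# after the backend has been loaded usually has no effect.
import numpy as np
```

If numpy has already been imported elsewhere in the process (for example by an earlier notebook cell), restarting the interpreter is the safest way to make sure the limits apply.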
Load data
Create some data (alternatively load your own):
import numpy as np
# create random data
n_features = 20000
n_samples = 50
X = np.random.random((n_samples, n_features))
Full correlation matrix computation
# runtime: ~2 sec
from corals.correlation.full.base import cor_full
cor_values = cor_full(X)
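For orientation, a full correlation matrix can be written as a single matrix product over standardized columns. The sketch below is a plain numpy illustration on a smaller input, not the CorALS implementation; pearson_full is a hypothetical name, and Pearson correlation is assumed. Note that a dense 20000 x 20000 float64 result occupies about 3.2 GB, which is why the top-k variants in the following sections are useful for larger feature counts.

```python
import numpy as np


def pearson_full(X):
    """Pearson correlation between all columns of X (samples x features)."""
    # Center each feature, then scale it to unit Euclidean norm.
    Xc = X - X.mean(axis=0)
    Xn = Xc / np.linalg.norm(Xc, axis=0)
    # The dot product of two normalized columns is their correlation.
    return Xn.T @ Xn


rng = np.random.default_rng(0)
X_small = rng.random((50, 200))   # 50 samples, 200 features
cor_small = pearson_full(X_small)  # shape (200, 200)
```

The result should agree with np.corrcoef(X_small, rowvar=False) up to floating-point error.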
Top-k correlation matrix computation using Spearman correlation
# runtime: ~5 sec with `n_jobs=8`
from corals.correlation.topk.base import cor_topk
cor_topk_result = cor_topk(X, k=0.001, correlation_type="spearman", n_jobs=8)
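To make the expected output concrete, here is a naive reference for what a top-k Spearman result contains: Spearman correlation is Pearson correlation on ranks, and the top-k entries are the feature pairs with the largest absolute values. This sketch builds the full matrix first, so it has none of the memory or runtime benefits of cor_topk and is not how CorALS computes the result. The name spearman_topk is hypothetical, k is taken as an integer count (how CorALS interprets k=0.001 is not stated in this README), and the ranking step assumes there are no ties.

```python
import numpy as np


def spearman_topk(X, k):
    """Return the k feature pairs with the largest absolute Spearman correlation."""
    # Rank each feature across samples. The double argsort is only a
    # valid ranking when a column contains no tied values.
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    ranks -= ranks.mean(axis=0)
    ranks /= np.linalg.norm(ranks, axis=0)
    cor = ranks.T @ ranks

    # Consider each unordered pair once (strict upper triangle).
    rows, cols = np.triu_indices(cor.shape[0], k=1)
    values = cor[rows, cols]

    # argpartition finds the k largest |values| without a full sort;
    # only those k entries are then sorted in descending order.
    top = np.argpartition(-np.abs(values), k - 1)[:k]
    top = top[np.argsort(-np.abs(values[top]))]
    return values[top], (rows[top], cols[top])


rng = np.random.default_rng(0)
X_small = rng.random((50, 100))
top_values, (top_rows, top_cols) = spearman_topk(X_small, k=10)
```

Each entry of top_values is the correlation between features top_rows[i] and top_cols[i].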
Top-k differential correlation matrix computation using Spearman correlation
# generate some more data
X1 = X
X2 = np.random.random((n_samples, n_features))
# runtime: ~5 sec with `n_jobs=8`
from corals.correlation.topkdiff.base import cor_topkdiff
cor_topkdiff_result = cor_topkdiff(X1, X2, k=0.001, correlation_type="spearman", n_jobs=8)
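A differential correlation compares the same feature pair across two datasets. As a rough mental model, the sketch below subtracts two full correlation matrices and keeps the pairs with the largest absolute difference. It uses Pearson correlation for brevity, takes k as an integer count, and assumes that pairs are ranked by absolute difference; none of this is confirmed by the README, and topk_diff is a hypothetical name. The example data plants one pair that flips from perfectly correlated to perfectly anti-correlated.

```python
import numpy as np


def pearson(X):
    """Pearson correlation between all columns of X (samples x features)."""
    Xc = X - X.mean(axis=0)
    Xn = Xc / np.linalg.norm(Xc, axis=0)
    return Xn.T @ Xn


def topk_diff(X1, X2, k):
    """Return the k feature pairs whose correlation differs most between X1 and X2."""
    diff = pearson(X1) - pearson(X2)
    rows, cols = np.triu_indices(diff.shape[0], k=1)
    values = diff[rows, cols]
    # Sort by absolute difference, largest first.
    top = np.argsort(-np.abs(values))[:k]
    return values[top], (rows[top], cols[top])


rng = np.random.default_rng(0)
X1_small = rng.random((50, 30))
X2_small = rng.random((50, 30))
# Plant a strong differential signal between features 0 and 1.
X1_small[:, 1] = X1_small[:, 0]    # correlation +1 in the first dataset
X2_small[:, 1] = -X2_small[:, 0]   # correlation -1 in the second dataset

diff_values, (diff_rows, diff_cols) = topk_diff(X1_small, X2_small, k=5)
```

The planted pair (0, 1) comes out first, with a difference of about 2.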
Calculating p-values
# reusing correlation from the top-k example
# runtime: ~20 sec with `n_jobs=8`
from corals.correlation.topk.base import cor_topk
cor_topk_values, cor_topk_coo = cor_topk(X, correlation_type="spearman", k=0.001, n_jobs=8)
from corals.correlation.utils import derive_pvalues, multiple_test_correction
n_samples = X.shape[0]
n_features = X.shape[1]
# calculate p-values
pvalues = derive_pvalues(cor_topk_values, n_samples)
# multiple hypothesis correction
pvalues_corrected = multiple_test_correction(pvalues, n_features, method="fdr_bh")
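The two steps above can be approximated with standard formulas. The sketch below is an assumption about the statistics involved, not the CorALS code: p-values come from the usual t-distribution test for a correlation coefficient with n_samples - 2 degrees of freedom, and the Benjamini-Hochberg adjustment takes the total number of tests explicitly, because a top-k result only contains the smallest p-values. Both function names are hypothetical, and one test per unordered feature pair is assumed for n_tests.

```python
import numpy as np
from scipy import stats


def pvalues_from_correlation(r, n_samples):
    """Two-sided p-values for correlation coefficients via the t-distribution."""
    r = np.clip(np.asarray(r, dtype=float), -1.0, 1.0)
    df = n_samples - 2
    # |r| == 1 gives an infinite t statistic and therefore p == 0.
    with np.errstate(divide="ignore"):
        t = r * np.sqrt(df / (1.0 - r**2))
    return 2.0 * stats.t.sf(np.abs(t), df)


def benjamini_hochberg(pvalues, n_tests):
    """BH-adjusted p-values when only the smallest of n_tests p-values are given."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n_tests / i.
    scaled = p[order] * n_tests / np.arange(1, p.size + 1)
    # Enforce monotonicity with a running minimum from the largest p-value down.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty_like(p)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Illustrative values only, not output of the examples above.
r_values = np.array([0.9, -0.8, 0.5])
n_features_demo = 200
n_tests = n_features_demo * (n_features_demo - 1) // 2  # assumed: one test per pair
p_raw = pvalues_from_correlation(r_values, n_samples=50)
p_adj = benjamini_hochberg(p_raw, n_tests)
```

Because the running minimum only sees the retained p-values, the adjusted values from a truncated list can be no smaller than those computed from the complete list, so the truncation errs on the conservative side.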
Detailed examples
For detailed examples and recommendations, see the corresponding notebook.
The docs/notebooks folder may contain additional examples and tutorials in the form of Jupyter Notebooks.
Quick setup for Jupyter notebooks:
export ENV_NAME=corals
conda create -n ${ENV_NAME} python=3.10
conda activate ${ENV_NAME}
pip install corals
conda install -c conda-forge jupyterlab # optional if Jupyter Lab is already installed
conda install -c conda-forge ipykernel
python -m ipykernel install --user --name ${ENV_NAME}
Development
TODO: add documentation for contributing new code / methods
Setup
git clone git@github.com:mgbckr/corals-lib-python.git
cd corals-lib-python
pip install -e .
Release
git tag -a x.x.x -m "Release x.x.x"