The Clustermatch Correlation Coefficient (CCC) is a highly-efficient, next-generation not-only-linear correlation coefficient that can work on numerical and categorical data types.
Project description
Clustermatch on gene expression data (code)
Overview
TODO: update description and links to manuscripts
This repository contains the source code to reproduce the analyses of Clustermatch on gene expression data. If you want to use Clustermatch as a standalone tool to perform your own analyses, please go to the official repository and follow the installation instructions.
For more details, check out our manuscript in COMPLETE or our Manubot web version.
Setup
To prepare the environment to run the analyses, follow the steps in environment. This will create a conda environment and download the necessary data. Alternatively, you can use our Docker image (see below).
Running code
From command-line
First, activate your conda environment and export your settings as environment variables so that non-Python scripts can access them:
conda activate clustermatch_gene_expr
eval `python libs/conf.py`
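The eval line presumably works because the script prints shell export statements, which eval then executes in the current shell. The sketch below imitates that mechanism with a hypothetical shell function standing in for `python libs/conf.py`. The variables that libs/conf.py actually prints are not listed in this document, so the name CM_N_JOBS and its value are used here only for illustration.

```shell
# Hypothetical stand-in for `python libs/conf.py`: it only prints
# `export` lines and does not set anything itself.
print_settings() {
    printf 'export CM_N_JOBS=%s\n' "2"
}

# eval executes the printed lines in the current shell, so the variable
# becomes visible here and in any child process started afterwards.
eval "$(print_settings)"

echo "CM_N_JOBS is ${CM_N_JOBS}"
```

Because the variables are exported, they must be set again in every new terminal session before running the scripts.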
The code to preprocess data and generate results is in the nbs/ folder.
All notebooks are organized by directories, such as 01_preprocessing, with file names that indicate the order in which they should be run (notebooks that share the same prefix can be run in parallel).
For example, to run all notebooks for the preprocessing step, you can use this command (requires GNU Parallel):
cd nbs/
parallel -k --lb --halt 2 -j1 'bash run_nbs.sh {}' ::: 01_preprocessing/*.ipynb
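If GNU Parallel is not installed, a plain loop is a rough substitute for the -j1 and --halt 2 options used above: it runs one notebook at a time, in the alphabetical order produced by the glob, and stops at the first failure. This is a sketch and not part of the repository; run_in_order is a hypothetical helper name, and its first argument is whatever command should receive each notebook.

```shell
# Hypothetical helper: run "<runner> <file>" for each file, one at a time,
# and stop at the first non-zero exit status.
run_in_order() {
    runner="$1"
    shift
    for nb in "$@"; do
        echo "Running ${nb}"
        # $runner is left unquoted on purpose so that a value such as
        # "bash run_nbs.sh" is split into a command and its argument.
        $runner "$nb" || return 1
    done
}

# Intended usage from the nbs/ folder (commented out because it needs
# the repository checkout):
# run_in_order "bash run_nbs.sh" 01_preprocessing/*.ipynb
```

Unlike the parallel command, this loop cannot be scaled up by raising the job count, so it is only useful for strictly sequential runs.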
From your browser
Alternatively, you can start your JupyterLab server by running:
bash scripts/run_nbs_server.sh
Then, go to http://localhost:8892, browse the nbs folder, and run the notebooks in the specified order.
Using Docker
You can also run all the steps above using a Docker image instead of a local installation. First, pull the image:
docker pull miltondp/clustermatch_gene_expr
The image only contains the conda environment with the code in this repository so, after pulling the image, you need to download the data as well:
mkdir -p /tmp/clustermatch_gene_expr_data
docker run --rm \
-v "/tmp/clustermatch_gene_expr_data:/opt/data" \
--user "$(id -u):$(id -g)" \
miltondp/clustermatch_gene_expr \
/bin/bash -c "python environment/scripts/setup_data.py"
The -v parameter specifies a local directory (/tmp/clustermatch_gene_expr_data) where the data will be downloaded.
If you want to generate the figures and tables for the manuscript, you need to clone the manuscript repo and pass it with -v [PATH_TO_MANUSCRIPT_REPO]:/opt/manuscript.
If you want to change any other setting, you can set environment variables when running the container; for example, to change the number of cores used to 2: -e CM_N_JOBS=2.
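As an illustration of how these options combine, the sketch below assembles a docker run command step by step and prints it instead of executing it. MANUSCRIPT_DIR is a hypothetical variable standing in for [PATH_TO_MANUSCRIPT_REPO]; the manuscript volume is added only when that variable is set. The other values are the ones shown in this section.

```shell
# Assumed locations; adjust to your machine. MANUSCRIPT_DIR is a
# hypothetical variable standing in for [PATH_TO_MANUSCRIPT_REPO].
DATA_DIR="/tmp/clustermatch_gene_expr_data"
MANUSCRIPT_DIR="${MANUSCRIPT_DIR:-}"

# Build the argument list in the positional parameters so that optional
# flags are easy to toggle and paths with spaces stay intact.
set -- run --rm -v "${DATA_DIR}:/opt/data" -e CM_N_JOBS=2
if [ -n "$MANUSCRIPT_DIR" ]; then
    set -- "$@" -v "${MANUSCRIPT_DIR}:/opt/manuscript"
fi
set -- "$@" --user "$(id -u):$(id -g)" miltondp/clustermatch_gene_expr

# Dry run: show the command rather than running it.
preview="docker $*"
echo "$preview"
```

To execute the command for real, replace the last two lines with `docker "$@"` followed by the command you want to run inside the container.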
You can run notebooks from the command line, for example:
docker run --rm \
-v "/tmp/clustermatch_gene_expr_data:/opt/data" \
--user "$(id -u):$(id -g)" \
miltondp/clustermatch_gene_expr \
/bin/bash -c "parallel -k --lb --halt 2 -j1 'bash nbs/run_nbs.sh {}' ::: nbs/05_preprocessing/*.ipynb"
or start a Jupyter Notebook server with:
docker run --rm \
-p 8888:8893 \
-v "/tmp/clustermatch_gene_expr_data:/opt/data" \
--user "$(id -u):$(id -g)" \
miltondp/clustermatch_gene_expr
and access the interface by going to http://localhost:8888.
Download files
Source Distribution
File details
Details for the file ccc-coef-0.1.3.tar.gz.
File metadata
- Download URL: ccc-coef-0.1.3.tar.gz
- Upload date:
- Size: 19.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 789fc5b93ba0cbfdc0ec2ce1dd16bbe98e44bacf6bdc21f6f88d045d986f0c4c
MD5 | b5eba27ad63fc6fb956cb3656add7375
BLAKE2b-256 | 3e2046c0e3ce205dea835fec12b3c2b922fc149a23f4da406b4276593d44cbdc