The Clustermatch Correlation Coefficient (CCC) is a highly efficient, next-generation not-only-linear correlation coefficient that works on both numerical and categorical data.

Clustermatch on gene expression data (code)

Overview

TODO: update description and links to manuscripts

This repository contains the source code to reproduce the analyses of Clustermatch on gene expression data. If you want to use Clustermatch as a standalone tool to perform your own analyses, please go to the official repository and follow the installation instructions.

For more details, check out our manuscript in COMPLETE or our Manubot web version.

Setup

To prepare the environment to run the analyses, follow the steps in the environment folder. This will create a conda environment and download the necessary data. Alternatively, you can use our Docker image (see below).

Running code

From command-line

First, activate your conda environment and export your settings to environment variables so that non-Python scripts can access them:

conda activate clustermatch_gene_expr
eval `python libs/conf.py`
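
The eval line works because the settings script prints shell statements, which the current shell then executes. The snippet below is a minimal sketch of that mechanism, not the actual contents of libs/conf.py: print_settings is a hypothetical stand-in, the `export NAME=value` output format is an assumption, and CM_DATA_DIR is an illustrative variable name (only CM_N_JOBS is mentioned in this README).

```shell
# Hypothetical stand-in for `python libs/conf.py`: it prints one
# `export NAME=value` line per setting (assumed output format).
print_settings() {
  printf 'export CM_N_JOBS=%s\n' "2"
  # CM_DATA_DIR is an illustrative name, not a confirmed setting.
  printf 'export CM_DATA_DIR=%s\n' "/tmp/clustermatch_gene_expr_data"
}

# eval runs the printed lines in the current shell, so the variables
# are exported to any non-Python script started afterwards.
eval "$(print_settings)"

echo "CM_N_JOBS=${CM_N_JOBS}"
```

Running the script without eval would only print the statements; the variables would not be set in your shell.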

The code to preprocess data and generate results is in the nbs/ folder. All notebooks are organized by directories, such as 01_preprocessing, with file names that indicate the order in which they should be run (if they share the prefix, then it means they can be run in parallel). For example, to run all notebooks for the preprocessing step, you can use this command (requires GNU Parallel):

cd nbs/
parallel -k --lb --halt 2 -j1 'bash run_nbs.sh {}' ::: 01_preprocessing/*.ipynb
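
In the command above, -j1 runs one notebook at a time, -k keeps the output in the same order as the input, --lb buffers output line by line, and --halt 2 stops the whole run as soon as one notebook fails. Since notebooks that share a prefix can run in parallel, one possible refinement is to process one prefix group at a time with more than one job. The sketch below assumes notebook names of the form NN-name.ipynb (the exact naming scheme is not stated here) and that it is run from the nbs/ folder; list_prefixes, run_stage and DRY_RUN are hypothetical helpers of this sketch, not part of the repository.

```shell
# Print the distinct prefixes (text before the first '-') of the notebooks
# in directory $1, sorted. Assumes names like 05-name.ipynb.
list_prefixes() {
  for nb in "$1"/*.ipynb; do
    [ -e "$nb" ] || continue   # skip the unexpanded pattern if no notebooks exist
    base=$(basename "$nb")
    printf '%s\n' "${base%%-*}"
  done | sort -u
}

# Run the notebooks in directory $1 one prefix group at a time; notebooks
# within a group run with up to $2 parallel jobs.
# If DRY_RUN is set, the commands are printed instead of executed.
run_stage() {
  for prefix in $(list_prefixes "$1"); do
    ${DRY_RUN:+echo} parallel -k --lb --halt 2 -j"$2" 'bash run_nbs.sh {}' ::: "$1/$prefix"-*.ipynb
  done
}

# Example (dry run, two jobs per group):
#   DRY_RUN=1 run_stage 01_preprocessing 2
```

Groups still run in prefix order, so a notebook never starts before the lower-numbered ones have finished.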

From your browser

Alternatively, you can start your JupyterLab server by running:

bash scripts/run_nbs_server.sh

Then, go to http://localhost:8892, browse the nbs folder, and run the notebooks in the specified order.

Using Docker

You can also run all the steps above using a Docker image instead of a local installation.

docker pull miltondp/clustermatch_gene_expr

The image only contains the conda environment with the code in this repository so, after pulling the image, you need to download the data as well:

mkdir -p /tmp/clustermatch_gene_expr_data

docker run --rm \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr \
  /bin/bash -c "python environment/scripts/setup_data.py"

The -v parameter mounts a local directory (/tmp/clustermatch_gene_expr_data) into the container at /opt/data, which is where the data will be downloaded. If you want to generate the figures and tables for the manuscript, you need to clone the manuscript repo and pass it with -v [PATH_TO_MANUSCRIPT_REPO]:/opt/manuscript. To change any other setting, set environment variables when running the container; for example, to change the number of cores used to 2: -e CM_N_JOBS=2.
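
Putting these options together, a run that mounts both directories and limits the number of cores could look like the sketch below. The path /path/to/manuscript is a placeholder for your own clone of the manuscript repo, cm_docker is a hypothetical wrapper, and DRY_RUN is a convenience of this sketch that prints the command instead of executing it.

```shell
DATA_DIR="/tmp/clustermatch_gene_expr_data"
MANUSCRIPT_DIR="/path/to/manuscript"   # placeholder: your clone of the manuscript repo

# Hypothetical wrapper: run command $1 inside the container with the data
# and manuscript directories mounted and the number of cores set to 2.
# If DRY_RUN is set, the docker command is printed instead of executed.
cm_docker() {
  ${DRY_RUN:+echo} docker run --rm \
    -v "$DATA_DIR:/opt/data" \
    -v "$MANUSCRIPT_DIR:/opt/manuscript" \
    -e CM_N_JOBS=2 \
    --user "$(id -u):$(id -g)" \
    miltondp/clustermatch_gene_expr \
    /bin/bash -c "$1"
}

# Print the command that would be run (drop DRY_RUN=1 to execute it).
DRY_RUN=1 cm_docker "python environment/scripts/setup_data.py"
```

All docker options must come before the image name; anything after it is treated as the command to run inside the container.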

You can run notebooks from the command line, for example:

docker run --rm \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr \
  /bin/bash -c "parallel -k --lb --halt 2 -j1 'bash nbs/run_nbs.sh {}' ::: nbs/01_preprocessing/*.ipynb"

or start a Jupyter Notebook server with:

docker run --rm \
  -p 8888:8892 \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr

and access the interface by going to http://localhost:8888.
