
The Clustermatch Correlation Coefficient (CCC) is a highly efficient, next-generation, not-only-linear correlation coefficient that works on both numerical and categorical data.


Clustermatch on gene expression data (code)


Overview

TODO: update description and links to manuscripts

This repository contains the source code to reproduce the analyses of Clustermatch on gene expression data. If you want to use Clustermatch as a standalone tool to perform your own analyses, please go to the official repository and follow the installation instructions.

For more details, check out our manuscript in COMPLETE or our Manubot web version.

Setup

To prepare the environment to run the analyses, follow the steps in environment. This will create a conda environment and download the necessary data. Alternatively, you can use our Docker image (see below).

Running code

From command-line

First, activate your conda environment and export your settings as environment variables so non-Python scripts can access them:

conda activate clustermatch_gene_expr
eval `python libs/conf.py`
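The backtick-`eval` pattern works because `libs/conf.py` prints shell `export` statements. A minimal, self-contained sketch of the same pattern (the `print_settings` function and the `CM_ROOT_DIR` value are illustrative stand-ins, not the project's actual settings; `CM_N_JOBS` is one of the real variables mentioned in the Docker section):

```shell
# Stand-in for `python libs/conf.py`: a command that prints shell
# `export` statements (hypothetical names and values for illustration).
print_settings() {
    echo 'export CM_ROOT_DIR="/tmp/clustermatch_demo"'
    echo 'export CM_N_JOBS="4"'
}

# `eval` runs the printed lines in the current shell, so the settings
# become environment variables visible to any child process.
eval "$(print_settings)"

echo "$CM_ROOT_DIR"
```

Because `eval` runs in the current shell (not a subshell), the exported variables persist for the rest of the session, which is what lets the non-Python notebook scripts read them.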

The code to preprocess data and generate results is in the nbs/ folder. Notebooks are organized into directories such as 01_preprocessing, with file names indicating the order in which they should be run (notebooks that share the same numeric prefix can be run in parallel). For example, to run all notebooks for the preprocessing step, you can use this command (requires GNU Parallel):

cd nbs/
parallel -k --lb --halt 2 -j1 'bash run_nbs.sh {}' ::: 01_preprocessing/*.ipynb
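The numbering convention can be illustrated with hypothetical file names: a plain lexicographic sort recovers the intended execution order, and names sharing a numeric prefix form one batch that may run in parallel:

```shell
# Hypothetical notebook names following the convention described above
# (these are not the repository's actual notebooks).
mkdir -p /tmp/nbs_demo/01_preprocessing
cd /tmp/nbs_demo/01_preprocessing
touch 01-download.ipynb 05-clean_a.ipynb 05-clean_b.ipynb 10-merge.ipynb

# Lexicographic order matches execution order; the two `05-` notebooks
# share a prefix, so they could be run in parallel.
ls | sort
```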

From your browser

Alternatively, you can start your JupyterLab server by running:

bash scripts/run_nbs_server.sh

Then, go to http://localhost:8892, browse the nbs folder, and run the notebooks in the specified order.

Using Docker

You can also run all the steps below using a Docker image instead of a local installation.

docker pull miltondp/clustermatch_gene_expr

The image contains only the conda environment and the code in this repository, so after pulling it you also need to download the data:

mkdir -p /tmp/clustermatch_gene_expr_data

docker run --rm \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr \
  /bin/bash -c "python environment/scripts/setup_data.py"

The -v parameter specifies a local directory (/tmp/clustermatch_gene_expr_data) where the data will be downloaded. If you want to generate the figures and tables for the manuscript, you need to clone the manuscript repo and mount it with -v [PATH_TO_MANUSCRIPT_REPO]:/opt/manuscript. To change any other setting, set environment variables when running the container; for example, to limit the run to 2 cores, add -e CM_N_JOBS=2.
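Putting those options together, a full invocation that mounts both the data and manuscript directories and limits the run to 2 cores might look like the sketch below ([PATH_TO_MANUSCRIPT_REPO] is a placeholder you must replace; the notebook command mirrors the one shown further down):

```shell
docker run --rm \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  -v "[PATH_TO_MANUSCRIPT_REPO]:/opt/manuscript" \
  -e CM_N_JOBS=2 \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr \
  /bin/bash -c "parallel -k --lb --halt 2 -j1 'bash nbs/run_nbs.sh {}' ::: nbs/05_preprocessing/*.ipynb"
```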

You can run notebooks from the command line, for example:

docker run --rm \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr \
  /bin/bash -c "parallel -k --lb --halt 2 -j1 'bash nbs/run_nbs.sh {}' ::: nbs/05_preprocessing/*.ipynb"

or start a Jupyter Notebook server with:

docker run --rm \
  -p 8888:8893 \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr

and access the interface by going to http://localhost:8888.


