The Clustermatch Correlation Coefficient (CCC) is a highly efficient, next-generation not-only-linear correlation coefficient that works on numerical and categorical data types.

Development Status
- 5 - Production/Stable
Environment
- Console
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Clustermatch on gene expression data (code)

Overview

TODO: update description and links to manuscripts

This repository contains the source code to reproduce the analyses of Clustermatch on gene expression data. If you want to use Clustermatch as a standalone tool to perform your own analyses, please go to the official repository and follow the installation instructions.

For more details, check out our manuscript in COMPLETE or our Manubot web version.

Setup

To prepare the environment to run the analyses, follow the steps in the environment/ folder. This will create a conda environment and download the necessary data. Alternatively, you can use our Docker image (see below).

Running code

From command-line

First, activate your conda environment and export your settings as environment variables so that non-Python scripts can access them:

conda activate clustermatch_gene_expr
eval `python libs/conf.py`
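
The eval line above works because the script prints shell export statements, which eval then executes in the current shell. The following is a minimal sketch of that pattern, not the repository's actual output: the print_settings function is a hypothetical stand-in for python libs/conf.py, and the variable name CM_DEMO_SETTING and the CM_ prefix are assumptions made for illustration.

```shell
# Hypothetical stand-in for `python libs/conf.py`: it prints shell
# `export` statements. The variable name and value are made up.
print_settings() {
  echo 'export CM_DEMO_SETTING="some value"'
}

# eval runs the printed statements in the current shell, so the
# variables become visible to any child process started afterwards.
eval "$(print_settings)"

# List what was exported (assumes the settings share the CM_ prefix).
env | grep '^CM_'
```

Running env | grep after the real eval command is a quick way to confirm that the settings were exported before starting any notebook.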

The code to preprocess data and generate results is in the nbs/ folder. Notebooks are organized by directories, such as 01_preprocessing, and their file names indicate the order in which they should be run (notebooks that share the same prefix can be run in parallel). For example, to run all notebooks for the preprocessing step, you can use this command (requires GNU Parallel):

cd nbs/
parallel -k --lb --halt 2 -j1 'bash run_nbs.sh {}' ::: 01_preprocessing/*.ipynb
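
The command above runs the notebooks one at a time (-j1) and stops at the first failure (--halt 2). If GNU Parallel is not available, a plain loop can approximate the same behavior. This is only a sketch: the run_step function name is hypothetical, and it assumes it is called from inside nbs/, where run_nbs.sh lives.

```shell
# Run every notebook in a step directory sequentially, in file-name
# order, and stop at the first failure. `run_step` is a hypothetical
# helper, not part of the repository.
run_step() {
  step_dir="$1"
  for nb in "$step_dir"/*.ipynb; do
    # Skip the literal pattern if the glob matched no files.
    [ -e "$nb" ] || continue
    echo "Running $nb"
    bash run_nbs.sh "$nb" || return 1
  done
}

# Usage (from inside nbs/):
#   run_step 01_preprocessing
```

Unlike the parallel invocation, this loop cannot run notebooks with a shared prefix concurrently; it trades that for having no extra dependency.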

From your browser

Alternatively, you can start your JupyterLab server by running:

bash scripts/run_nbs_server.sh

Then, go to http://localhost:8892, browse the nbs folder, and run the notebooks in the specified order.

Using Docker

You can also run all the steps below using a Docker image instead of a local installation.

docker pull miltondp/clustermatch_gene_expr

The image only contains the conda environment with the code in this repository so, after pulling the image, you need to download the data as well:

mkdir -p /tmp/clustermatch_gene_expr_data

docker run --rm \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr \
  /bin/bash -c "python environment/scripts/setup_data.py"

The -v parameter specifies a local directory (/tmp/clustermatch_gene_expr_data) where the data will be downloaded. If you want to generate the figures and tables for the manuscript, you need to clone the manuscript repo and pass it with -v [PATH_TO_MANUSCRIPT_REPO]:/opt/manuscript. To change any other setting, set environment variables when running the container; for example, to change the number of cores used to 2, add -e CM_N_JOBS=2.
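
The options described above can be combined in a single invocation. The wrapper below is a sketch under stated assumptions: the function name cm_docker and the variables DATA_DIR, MANUSCRIPT_DIR and N_JOBS are hypothetical, and the manuscript path is a placeholder that you would replace with the location of your own clone.

```shell
# Hypothetical defaults; override them in your shell before calling.
DATA_DIR="${DATA_DIR:-/tmp/clustermatch_gene_expr_data}"
MANUSCRIPT_DIR="${MANUSCRIPT_DIR:-/tmp/clustermatch_manuscript}"  # placeholder path
N_JOBS="${N_JOBS:-2}"

# Run a command inside the container with the data directory, the
# manuscript repo and the number of cores all configured.
cm_docker() {
  docker run --rm \
    -v "$DATA_DIR:/opt/data" \
    -v "$MANUSCRIPT_DIR:/opt/manuscript" \
    -e CM_N_JOBS="$N_JOBS" \
    --user "$(id -u):$(id -g)" \
    miltondp/clustermatch_gene_expr \
    /bin/bash -c "$1"
}

# Usage:
#   cm_docker "python environment/scripts/setup_data.py"
```

Keeping the mounts and settings in one function avoids repeating them for every notebook step.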

You can run notebooks from the command line, for example:

docker run --rm \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr \
  /bin/bash -c "parallel -k --lb --halt 2 -j1 'bash nbs/run_nbs.sh {}' ::: nbs/05_preprocessing/*.ipynb"

or start a Jupyter Notebook server with:

docker run --rm \
  -p 8888:8893 \
  -v "/tmp/clustermatch_gene_expr_data:/opt/data" \
  --user "$(id -u):$(id -g)" \
  miltondp/clustermatch_gene_expr

and access the interface by going to http://localhost:8888.

Release history

- 0.1.7 (Sep 6, 2023)
- 0.1.6 (Jun 24, 2022)
- 0.1.5 (Jun 22, 2022), this version
- 0.1.4 (Jun 21, 2022)
- 0.1.3 (Jun 21, 2022)
- 0.1.2 (Jun 21, 2022)
- 0.1.1 (Jun 21, 2022)
- 0.1.0 (Jun 21, 2022)

Download files

Source Distribution

ccc-coef-0.1.5.tar.gz (20.4 kB), uploaded Jun 22, 2022

Hashes for ccc-coef-0.1.5.tar.gz

Algorithm	Hash digest
SHA256	`c247b191868a0447066fec5af54929f608c1ad8f7f83a1089c71ee2e64e800c6`
MD5	`3df7d6161968746c7e3bfa3c01c659de`
BLAKE2b-256	`cdb9d417a0d6a57d7259c5c547f7087c2ddd4d3162c7301b18953680e4d989fd`