Deep Learning for Single-cell Analysis


DANCE is a Python toolkit to support deep learning models for analyzing single-cell gene expression at scale. It includes three modules at present:

  1. Single-modality analysis
  2. Single-cell multimodal omics
  3. Spatially resolved transcriptomics

Our goal is to build a deep learning community for single-cell analysis and to provide GNN-based architectures that users can build on for further development in single-cell analysis.

Usage

Overview

In release 1.0, the main use of PyDANCE is to provide ready-to-run experiment reproductions (see the reproduced performance reported below). Users can easily reproduce selected experiments from the original papers of the computational single-cell methods implemented in PyDANCE; the corresponding scripts are under examples/.

Motivation

Computational methods for single-cell analysis are emerging quickly, and they are revolutionizing how single-cell data is used to gain biological insights. A key challenge in continually developing computational single-cell methods that achieve new state-of-the-art performance is reproducing previous benchmarks. Specifically, different studies prepare their datasets and perform evaluation differently, not to mention compatibility issues across methods, which may be written in different languages or depend on incompatible library versions.

PyDANCE addresses these challenges by providing a unified Python package that implements many popular computational single-cell methods (see Implemented Algorithms) and makes experiments easy to reproduce through unified tools for

  • Data downloading
  • Data (pre-)processing and transformation (e.g. graph construction)
  • Model training and evaluation
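As a rough illustration of the graph-construction step listed above, the sketch below builds a k-nearest-neighbour graph over cells from a small expression matrix in plain Python. The helper name `knn_graph` is hypothetical and is not part of PyDANCE's API; real pipelines usually build such graphs on a reduced representation (for example PCA components) with optimized libraries.

```python
import math
from typing import List, Sequence, Set, Tuple


def knn_graph(expr: Sequence[Sequence[float]], k: int) -> Set[Tuple[int, int]]:
    """Build an undirected k-nearest-neighbour graph over cells.

    expr: one row per cell, one column per (normalized) gene.
    Returns edges as (i, j) pairs with i < j.
    """
    n = len(expr)
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        # Euclidean distance from cell i to every other cell, nearest first
        neighbours: List[Tuple[float, int]] = sorted(
            (math.dist(expr[i], expr[j]), j) for j in range(n) if j != i
        )
        for _, j in neighbours[:k]:
            # Symmetrize: store each edge once with the smaller index first
            edges.add((min(i, j), max(i, j)))
    return edges
```

For example, `knn_graph([[0, 0], [0, 1], [10, 10], [10, 11]], k=1)` returns `{(0, 1), (2, 3)}`: the two pairs of nearby cells are linked and the distant groups stay disconnected.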

Example: running cell-type annotation benchmark using scDeepSort

  • Step0. Install PyDANCE (see Installation)
  • Step1. Navigate to the folder containing the corresponding example script. In this case, it is examples/single_modality/cell_type_annotation.
  • Step2. Find the command line interface (CLI) options for the experiment you want to reproduce at the end of the script. For example, the CLI options for reproducing the Mouse Brain experiment are
    python scdeepsort.py --data_type scdeepsort --tissue Brain --test_data 2695
    
  • Step3. Wait for the experiment to finish and check the results.
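The steps above can also be driven from Python, which is handy when looping over several tissues or test sets. This is a minimal sketch: only the folder, the script name `scdeepsort.py`, and the three flags come from the example above; the helpers `scdeepsort_command` and `run_example` are hypothetical, and the path assumes you start from the root of the dance checkout.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Example folder from Step1 above, relative to the root of the dance checkout
EXAMPLE_DIR = Path("examples/single_modality/cell_type_annotation")


def scdeepsort_command(tissue: str, test_data: int) -> List[str]:
    """Build the argv for one scDeepSort reproduction run (flags from Step2)."""
    return [
        sys.executable, "scdeepsort.py",
        "--data_type", "scdeepsort",
        "--tissue", tissue,
        "--test_data", str(test_data),
    ]


def run_example(cmd: List[str], cwd: Path = EXAMPLE_DIR) -> None:
    """Run the command inside the example folder; raise if it exits non-zero."""
    subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_example(scdeepsort_command("Brain", 2695))` runs the Mouse Brain command shown above. Using `sys.executable` rather than a bare `python` keeps the run inside the currently active environment (for example the pydance conda environment).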

Installation

Quick install

The full installation process can be tedious and may involve some debugging when using CUDA-enabled packages. Thus, we provide an install.sh script that simplifies the installation process, assuming the user has conda set up. The installation script creates a conda environment named pydance and installs the PyDANCE package along with all its dependencies for a specific CUDA version. Currently, three options are accepted: cpu, cu102, and cu113. For example, to install the DANCE package using CUDA 10.2, simply run:

git clone git@github.com:OmicsML/dance.git
cd dance

source install.sh cu102

Custom install


Step1. Set up environment

First create a conda environment for pydance (optional)

conda create -n pydance python=3.8 -y && conda activate pydance

Then, install CUDA-enabled packages (PyTorch, PyG, DGL) with CUDA 10.2:

conda install pytorch=1.12.1 torchvision cudatoolkit=10.2 -c pytorch -y
conda install dgl-cu102 -c dglteam -y
pip install torch-geometric==2.1.0 torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.12.1+cu102.html

Alternatively, install these dependencies for CPU only:

conda install pytorch=1.12.1 torchvision cpuonly -c pytorch -y
conda install dgl -c dglteam
pip install torch-geometric==2.1.0 torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.12.1+cpu.html

Note: If you installed PyG using conda and encountered an issue with GLIBC_2.27 when importing torch_geometric.nn, then you may need to uninstall torch-spline-conv (see https://github.com/pyg-team/pytorch_geometric/issues/3593)

pip uninstall torch-spline-conv

For more information about installation or other CUDA version options, check out the installation pages of the corresponding packages.
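Before moving on to Step2, it can help to confirm that the Step1 packages are visible in the active environment. The sketch below uses only the standard library; the module list holds the import names of the packages installed above, and the helper name `check_imports` is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the packages installed in Step1
MODULES = [
    "torch", "torchvision", "dgl",
    "torch_geometric", "torch_scatter", "torch_sparse", "torch_cluster",
]


def check_imports(names: Iterable[str] = MODULES) -> Dict[str, bool]:
    """Report which top-level modules can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in check_imports().items():
    print(f"{name:16s} {'ok' if ok else 'MISSING'}")
```

`find_spec` only checks that a package is on the import path. Problems in compiled extensions, such as the GLIBC_2.27 issue mentioned above, only surface on an actual import, so a follow-up `python -c "import torch_geometric.nn"` (and `torch.cuda.is_available()` for CUDA builds) is still worth running.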

Step2. Install PyDANCE

Install from PyPI

pip install pydance

Install the latest dev version from source

git clone https://github.com/OmicsML/dance.git
cd dance
pip install -e .

Implemented Algorithms

Note: P1 in the CheckIn column marks methods not covered in the first release.

Single Modality Module

1) Imputation

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | GraphSCI | Imputing Single-cell RNA-seq data by combining Graph Convolution and Autoencoder Neural Networks | 2021 | |
| GNN | scGNN (2020) | SCGNN: scRNA-seq Dropout Imputation via Induced Hierarchical Cell Similarity Graph | 2020 | P1 |
| GNN | scGNN (2021) | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | |
| GNN | GNNImpute | An efficient scRNA-seq dropout imputation method using graph attention network | 2021 | P1 |
| Graph Diffusion | MAGIC | MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data | 2018 | P1 |
| Probabilistic Model | scImpute | An accurate and robust imputation method scImpute for single-cell RNA-seq data | 2018 | P1 |
| GAN | scGAIN | scGAIN: Single Cell RNA-seq Data Imputation using Generative Adversarial Networks | 2019 | P1 |
| NN | DeepImpute | DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data | 2019 | |
| NN + TF | Saver-X | Transfer learning in single-cell transcriptomics improves data denoising and pattern discovery | 2019 | P1 |

| Model | Evaluation Metric | Mouse Brain (current/reported) | Mouse Embryo (current/reported) |
|---|---|---|---|
| DeepImpute | MSE | 0.12 / N/A | 0.12 / N/A |
| ScGNN | MSE | 0.47 / N/A | 1.10 / N/A |
| GraphSCI | MSE | 0.42 / N/A | 0.87 / N/A |

Note: DeepImpute uses a different data split scheme from ScGNN and GraphSCI, so their results are not directly comparable.
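Imputation benchmarks are often scored by holding out a subset of observed (cell, gene) entries and measuring the error on just those entries. The sketch below computes MSE in that way; the masking scheme and the helper name `masked_mse` are assumptions for illustration and do not necessarily match the splits used for DeepImpute, ScGNN, or GraphSCI above (which, as the note says, differ from each other).

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def masked_mse(true: Matrix, imputed: Matrix, mask: Sequence[Tuple[int, int]]) -> float:
    """Mean squared error over the held-out (cell, gene) positions in `mask`."""
    if not mask:
        raise ValueError("mask must contain at least one position")
    sq_err = [(true[i][j] - imputed[i][j]) ** 2 for i, j in mask]
    return sum(sq_err) / len(sq_err)
```

For example, with `true = [[1, 0], [2, 3]]`, `imputed = [[1, 1], [0, 3]]` and `mask = [(0, 1), (1, 0)]`, the squared errors are 1 and 4, so `masked_mse` returns 2.5. Entries outside the mask do not affect the score.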

2) Cell Type Annotation

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScDeepsort | Single-cell transcriptomics with weighted GNN | 2021 | |
| Logistic Regression | Celltypist | Automated cell type annotation for scRNA-seq datasets | 2021 | |
| Random Forest | singleCellNet | SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species | 2019 | |
| Neural Network | ACTINN | ACTINN: automated identification of cell types in single cell RNA sequencing. | 2020 | |
| Hierarchical Clustering | SingleR | Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. | 2019 | P1 |
| SVM | SVM | A comparison of automatic cell identification methods for single-cell RNA sequencing data. | 2018 | |

| Model | Evaluation Metric | Mouse Brain 2695 (current/reported) | Mouse Spleen 1759 (current/reported) | Mouse Kidney 203 (current/reported) |
|---|---|---|---|---|
| scDeepsort | ACC | 0.363 / 0.363 | 0.965 / 0.965 | 0.901 / 0.911 |
| Celltypist* | ACC | 0.680 / 0.666 | 0.966 / 0.848 | 0.879 / 0.832 |
| singleCellNet | ACC | 0.693 / 0.803 | 0.975 / 0.975 | 0.795 / 0.842 |
| ACTINN | ACC | 0.860 / 0.778 | 0.516 / 0.236 | 0.829 / 0.798 |
| SVM | ACC | 0.683 / 0.683 | 0.056 / 0.049 | 0.704 / 0.695 |

Note: * Benchmark datasets were renormalized before running the original implementation of Celltypist to match its input format requirements.
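Assuming ACC in the table above is plain classification accuracy (the fraction of test cells whose predicted cell type matches the reference annotation), it can be sketched as follows. The helper name `accuracy` is hypothetical and not taken from PyDANCE.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of cells whose predicted cell type equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, `accuracy(["B", "T", "T", "NK"], ["B", "T", "B", "NK"])` returns 0.75, since three of the four cells are labelled correctly.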

3) Clustering

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | graph-sc | GNN-based embedding for clustering scRNA-seq data | 2022 | |
| GNN | scTAG | ZINB-based Graph Embedding Autoencoder for Single-cell RNA-seq Interpretations | 2022 | |
| GNN | scDSC | Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network | 2022 | |
| GNN | scGAC | scGAC: a graph attentional architecture for clustering single-cell RNA-seq data | 2022 | P1 |
| AutoEncoder | scDeepCluster | Clustering single-cell RNA-seq data with a model-based deep learning approach | 2019 | |
| AutoEncoder | scDCC | Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data | 2021 | |
| AutoEncoder | scziDesk | Deep soft K-means clustering with self-training for single-cell RNA sequence data | 2020 | P1 |

| Model | Evaluation Metric | 10x PBMC (current/reported) | Mouse ES (current/reported) | Worm Neuron (current/reported) | Mouse Bladder (current/reported) |
|---|---|---|---|---|---|
| graph-sc | ARI | 0.72 / 0.70 | 0.82 / 0.78 | 0.57 / 0.46 | 0.68 / 0.63 |
| scDCC | ARI | 0.82 / 0.81 | 0.98 / N/A | 0.51 / 0.58 | 0.60 / 0.66 |
| scDeepCluster | ARI | 0.81 / 0.78 | 0.98 / 0.97 | 0.51 / 0.52 | 0.56 / 0.58 |
| scDSC | ARI | 0.72 / 0.78 | 0.84 / N/A | 0.46 / 0.65 | 0.65 / 0.72 |
| scTAG | ARI | 0.75 / N/A | 0.96 / N/A | 0.53 / N/A | 0.60 / N/A |
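ARI (Adjusted Rand Index) compares a predicted clustering with reference labels by counting pairs of cells that are grouped together, corrected for chance: 1 means identical partitions (up to relabelling) and values near 0 mean chance-level agreement. The same metric is used in the joint embedding and spatial domain tables in this document. Below is a standard-library sketch of the Hubert and Arabie formula; the helper name `adjusted_rand_index` is hypothetical, and scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two partitions of the same cells."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    if n < 2:
        raise ValueError("need at least two cells")
    # Pairs of cells placed together in both partitions (contingency-table entries)
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each partition separately (row and column sums)
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions are a single cluster): perfect match
        return 1.0
    return (index - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 (same grouping, different label names), while `adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])` is -0.5, which is worse than chance.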

Multimodality Module

1) Modality Prediction

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
| GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | |
| Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | |
| Auto-encoder | BABEL | BABEL enables cross-modality translation between multiomic profiles at single-cell resolution | 2021 | |

| Model | Evaluation Metric | GEX2ADT (current/reported) | ADT2GEX (current/reported) | GEX2ATAC (current/reported) | ATAC2GEX (current/reported) |
|---|---|---|---|---|---|
| ScMoGCN | RMSE | 0.3885 / 0.3885 | 0.3242 / 0.3242 | 0.1778 / 0.1778 | 0.2315 / 0.2315 |
| SCMM | RMSE | 0.6264 / N/A | 0.4458 / N/A | 0.2163 / N/A | 0.3730 / N/A |
| Cross-modal autoencoders | RMSE | 0.5725 / N/A | 0.3585 / N/A | 0.1917 / N/A | 0.2551 / N/A |
| BABEL | RMSE | 0.4335 / N/A | 0.3673 / N/A | 0.1816 / N/A | 0.2394 / N/A |
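Modality prediction maps one modality to another (for example GEX to ADT), and the table above scores it with RMSE between the predicted and measured target matrices. A minimal sketch, assuming the RMSE is taken over every (cell, feature) entry of the matrix; whether DANCE averages over the whole matrix or per cell is not stated here, and the helper name `rmse` is hypothetical.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def rmse(target: Matrix, pred: Matrix) -> float:
    """Root mean squared error over every (cell, feature) entry."""
    if len(target) != len(pred):
        raise ValueError("row counts differ")
    total, count = 0.0, 0
    for t_row, p_row in zip(target, pred):
        if len(t_row) != len(p_row):
            raise ValueError("column counts differ")
        for t, p in zip(t_row, p_row):
            total += (t - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return math.sqrt(total / count)
```

For example, `rmse([[1, 2], [3, 4]], [[1, 2], [3, 6]])` returns 1.0: a single squared error of 4 averaged over four entries gives 1, whose square root is 1.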

2) Modality Matching

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | |
| Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | |

| Model | Evaluation Metric | GEX2ADT (current/reported) | GEX2ATAC (current/reported) |
|---|---|---|---|
| ScMoGCN | Accuracy | 0.0827 / 0.0810 | 0.0600 / 0.0630 |
| SCMM | Accuracy | 0.005 / N/A | 5e-5 / N/A |
| Cross-modal autoencoders | Accuracy | 0.0002 / N/A | 0.0002 / N/A |
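The matching accuracy values above are small because matching asks each cell in one modality to find its partner among all cells of the other modality. One common way to score this, assumed here and not confirmed by this document, is to row-normalize a non-negative matching-weight matrix and average the weight placed on each cell's true partner. Under that definition a uniform guess scores 1/n. The helper name `matching_score` and the convention that cell i pairs with cell i are assumptions.

```python
from typing import Sequence


def matching_score(weights: Sequence[Sequence[float]]) -> float:
    """Average probability mass assigned to the true partner.

    weights[i][j] >= 0 scores how likely cell i of modality 1 pairs with
    cell j of modality 2; the true partner of cell i is assumed to be cell i.
    Each row is normalized to sum to 1 before scoring.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("empty weight matrix")
    total = 0.0
    for i, row in enumerate(weights):
        row_sum = sum(row)
        if row_sum > 0:  # an all-zero row contributes nothing
            total += row[i] / row_sum
    return total / n
```

For example, `matching_score([[1, 1], [0, 2]])` returns 0.75: the first cell splits its weight evenly (0.5 on the true partner) and the second puts all of it on the true partner (1.0).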

3) Joint Embedding

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| Auto-encoder | scMVAE | Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data | 2020 | |
| Auto-encoder | scDEC | Simultaneous deep generative modelling and clustering of single-cell genomic data | 2021 | |
| GNN/Auto-encoder | GLUE | Multi-omics single-cell data integration and regulatory inference with graph-linked embedding | 2021 | P1 |
| Auto-encoder | DCCA | Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data | 2021 | |

| Model | Evaluation Metric | GEX2ADT (current/reported) | GEX2ATAC (current/reported) |
|---|---|---|---|
| ScMoGCN | ARI | 0.706 / N/A | 0.702 / N/A |
| ScMoGCNv2 | ARI | 0.734 / N/A | N/A / N/A |
| scMVAE | ARI | 0.499 / N/A | 0.577 / N/A |
| scDEC(JAE) | ARI | 0.705 / N/A | 0.735 / N/A |
| DCCA | ARI | 0.35 / N/A | 0.381 / N/A |

4) Multimodal Imputation

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
| GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |

5) Multimodal Integration

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | P1 |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses (GCN on Nearest Neighbor graph) | 2021 | P1 |
| Nearest Neighbor | WNN | Integrated analysis of multimodal single-cell data | 2021 | P1 |
| GAN | MAGAN | MAGAN: Aligning Biological Manifolds | 2018 | P1 |
| Auto-encoder | SCIM | SCIM: universal single-cell matching with unpaired feature sets | 2020 | P1 |
| Auto-encoder | MultiMAP | MultiMAP: Dimensionality Reduction and Integration of Multimodal Data | 2021 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | P1 |

Spatial Module

1) Spatial Domain

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | SpaGCN | SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network | 2021 | |
| GNN | STAGATE | Deciphering spatial domains from spatially resolved transcriptomics with adaptive graph attention auto-encoder | 2021 | |
| Bayesian | BayesSpace | Spatial transcriptomics at subspot resolution with BayesSpace | 2021 | P1 |
| Pseudo-space-time (PST) Distance | stLearn | stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues | 2020 | |
| Heuristic | Louvain | Fast unfolding of community hierarchies in large networks | 2008 | |

| Model | Evaluation Metric | 151673 (current/reported) | 151676 (current/reported) | 151507 (current/reported) |
|---|---|---|---|---|
| SpaGCN | ARI | 0.51 / 0.522 | 0.41 / N/A | 0.45 / N/A |
| STAGATE | ARI | 0.59 / N/A | 0.60 / 0.60 | 0.608 / N/A |
| stLearn | ARI | 0.30 / 0.36 | 0.29 / N/A | 0.31 / N/A |
| Louvain | ARI | 0.31 / 0.33 | 0.2528 / N/A | 0.28 / N/A |

2) Cell Type Deconvolution

| BackBone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | DSTG | DSTG: deconvoluting spatial transcriptomics data through graph-based artificial intelligence | 2021 | |
| logNormReg | SpatialDecon | Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data | 2022 | |
| NNMFreg | SPOTlight | SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes | 2021 | |
| NN Linear + CAR assumption | CARD | Spatially informed cell-type deconvolution for spatial transcriptomics | 2022 | |

| Model | Evaluation Metric | GSE174746 (current/reported) | CARD Synthetic (current/reported) | SPOTlight Synthetic (current/reported) |
|---|---|---|---|---|
| DSTG | MSE | 0.172 / N/A | 0.0247 / N/A | 0.042 / N/A |
| SpatialDecon | MSE | 0.0014 / 0.009 | 0.0077 / N/A | 0.0055 / N/A |
| SPOTlight | MSE | 0.0098 / N/A | 0.0246 / 0.118 | 0.0109 / 0.16 |
| CARD | MSE | 0.0012 / N/A | 0.0078 / 0.0062 | 0.0076 / N/A |
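Deconvolution methods estimate, for each spatial spot, the proportion of each cell type in that spot. Assuming the MSE in the table above compares estimated and ground-truth proportion matrices (spots by cell types) entry by entry, a sketch looks like this. Both helper names, `to_proportions` and `proportion_mse`, are hypothetical, and real methods may normalize their raw outputs differently.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def to_proportions(scores: Matrix) -> List[List[float]]:
    """Rescale each spot's non-negative cell-type scores so they sum to 1."""
    out = []
    for row in scores:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else [0.0] * len(row))
    return out


def proportion_mse(true_prop: Matrix, est_prop: Matrix) -> float:
    """Entrywise MSE between spot-by-cell-type proportion matrices of equal shape."""
    if len(true_prop) != len(est_prop) or any(
        len(t) != len(e) for t, e in zip(true_prop, est_prop)
    ):
        raise ValueError("matrices must have the same shape")
    errs = [
        (t - e) ** 2
        for t_row, e_row in zip(true_prop, est_prop)
        for t, e in zip(t_row, e_row)
    ]
    return sum(errs) / len(errs)
```

For example, `to_proportions([[2, 2], [1, 3]])` gives `[[0.5, 0.5], [0.25, 0.75]]`, and comparing the spot `[0.5, 0.5]` with the estimate `[0.25, 0.75]` gives an MSE of 0.0625.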

Dev notes

Dev installation

Install PyDANCE with extra dependencies for dev

pip install -e ."[dev]"

Make sure dependencies are installed at their pinned versions

pip install -r requirements.txt

Install pre-commit hooks for code quality checks

pre-commit install

Run tests

Run all tests in the current environment using pytest

pytest -v

Run the full test suite from scratch, including environment setup, using tox on CPU

tox -e python3.8-cpu


Download files


Source Distribution

pydance-1.0.0rc0.tar.gz (223.1 kB)

Uploaded Source

Built Distribution

pydance-1.0.0rc0-py3-none-any.whl (252.0 kB)

Uploaded Python 3

File details

Details for the file pydance-1.0.0rc0.tar.gz.

File metadata

  • Download URL: pydance-1.0.0rc0.tar.gz
  • Size: 223.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for pydance-1.0.0rc0.tar.gz
Algorithm Hash digest
SHA256 43fce5ac5a51766da1eb3f66536dff479c22ec9ea985066580530a15cf90c50f
MD5 16550b58e3747d660dc0ae360838ec9d
BLAKE2b-256 d038a0a7d36cc860e39ca2bef0454269314f1fad0c60284ce09c4edef9e54ab2


File details

Details for the file pydance-1.0.0rc0-py3-none-any.whl.

File metadata

  • Download URL: pydance-1.0.0rc0-py3-none-any.whl
  • Size: 252.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for pydance-1.0.0rc0-py3-none-any.whl
Algorithm Hash digest
SHA256 c00fd8976ad8a0de971e1235a0ed2a617283b77f579d7e5d85071c4fbb3cac19
MD5 846c0c9fad94327b73d0bb8aa38e34aa
BLAKE2b-256 728424a3699248c717bbfce0866655333790ff86469d4d139d9667d56aae6bf5

