Deep Learning for Single-cell Analysis
Project description
DANCE is a Python toolkit to support deep learning models for analyzing single-cell gene expression at scale. It includes three modules at present:
- Single-modality analysis
- Single-cell multimodal omics
- Spatially resolved transcriptomics
Our goal is to build up a deep learning community for single cell analysis and provide GNN based architecture for users for further development in single cell analysis.
Usage
Overview
In release 1.0, the main usage of the PyDANCE is to provide readily available experiment reproduction
(see detail information about the reproduced performance below).
Users can easily reproduce selected experiments presented in the original papers for the computational single-cell methods implemented in PyDANCE, which can be found under examples/
.
Motivation
Computational methods for single-cell analysis are quickly emerging, and the field is revolutionizing the usage of single-cell data to gain biological insights. A key challenge to continually developing computational single-cell methods that achieve new state-of-the-art performance is reproducing previous benchmarks. More specifically, different studies prepare their datasets and perform evaluation differently, and not to mention the compatibility of different methods, as they could be written in different languages or using incompatible library versions.
PyDANCE addresses these challenges by providing a unified Python packge implementing many popular computational single-cell methods (see Implemented Algorithms), as well as easily reproducible experiments by providing unified tools for
- Data downloading
- Data (pre-)processing and transformation (e.g. graph construction)
- Model training and evaluation
Example: runing cell-type annotation benchmark using scDeepSort
- Step0. Install PyDANCE (see Installation)
- Step1. Navigate to the folder containing the corresponding example scrtip.
In this case, it is
examples/single_modality/cell_type_annotation
. - Step2. Obtain command line interface (CLI) options for a particular experiment to reproduce at the end of the
script.
For example, the CLI options for reproducing the
Mouse Brain
experiment ispython scdeepsort.py --data_type scdeepsort --tissue Brain --test_data 2695
- Step3. Wait for the experiment to finsh and check results.
Installation
Quick install
The full installation process might be a bit tedious and could involve some debugging when using CUDA enabled packages.
Thus, we provide an install.sh
script that simplifies the installation process, assuming the user have conda set up on their machines.
The installation script creates a conda environment pydance
and install the PyDANCE package along with all its dependencies with a apseicifc CUDA version.
Currently, three options are accepted: cpu
, cu102
, and cu113
.
For example, to install the DANCE package using CUDA10.2, simply run:
git clone git@github.com:OmicsML/dance.git
cd dance
source install.sh cu102
Custom install
Step1. Setup environment
First create a conda environment for pydance (optional)
conda create -n pydance python=3.8 -y && conda activate dance-dev
Then, install CUDA enabled packages (PyTorch, PyG, DGL) with CUDA 10.2:
conda install pytorch=1.12.1 torchvision cudatoolkit=10.2 -c pytorch -y
conda install dgl-cu102 -c dglteam -y
pip install torch-geometric==2.1.0 torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.12.1+cu102.html
Alternatively, install these dependencies for CPU only:
conda install pytorch=1.12.1 torchvision cpuonly -c pytorch -y
conda install dgl -c dglteam
pip install torch-geometric==2.1.0 torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.12.1+cpu.html
Note: If you installed PyG using conda and encountered an issue with GLIBC_2.27
when importing torch_geometric.nn
,
then you may need to uninstall torch-spline-conv
(see https://github.com/pyg-team/pytorch_geometric/issues/3593)
pip uninstall torch-spline-conv
For more information about installation or other CUDA version options, check out the installation pages for the corresponding packges
Step2. Install PyDANCE
Install from PyPI
pip install pydance
Install the latest dev version from source
git clone https://github.com/OmicsML/dance.git
cd dance
pip install -e .
Implemented Algorithms
P1 not covered in the first release
Single Modality Module
1)Imputation
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | GraphSCI | Imputing Single-cell RNA-seq data by combining Graph Convolution and Autoencoder Neural Networks | 2021 | ✅ |
GNN | scGNN (2020) | SCGNN: scRNA-seq Dropout Imputation via Induced Hierarchical Cell Similarity Graph | 2020 | P1 |
GNN | scGNN (2021) | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | ✅ |
GNN | GNNImpute | An efficient scRNA-seq dropout imputation method using graph attention network | 2021 | P1 |
Graph Diffusion | MAGIC | MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data | 2018 | P1 |
Probabilistic Model | scImpute | An accurate and robust imputation method scImpute for single-cell RNA-seq data | 2018 | P1 |
GAN | scGAIN | scGAIN: Single Cell RNA-seq Data Imputation using Generative Adversarial Networks | 2019 | P1 |
NN | DeepImpute | DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data | 2019 | ✅ |
NN + TF | Saver-X | Transfer learning in single-cell transcriptomics improves data denoising and pattern discovery | 2019 | P1 |
Model | Evaluation Metric | Mouse Brain (current/reported) | Mouse Embryo (current/reported) |
---|---|---|---|
DeepImpute | MSE | 0.12 / N/A | 0.12 / N/A |
ScGNN | MSE | 0.47 / N/A | 1.10 / N/A |
GraphSCI | MSE | 0.42 / N/A | 0.87 / N/A |
Note: the data split modality of DeepImpute is different from ScGNN and GraphSCI, so the results are not comparable.
2)Cell Type Annotation
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | ScDeepsort | Single-cell transcriptomics with weighted GNN | 2021 | ✅ |
Logistic Regression | Celltypist | Automated cell type annotation for scRNA-seq datasets | 2021 | ✅ |
Random Forest | singleCellNet | SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species | 2019 | ✅ |
Neural Network | ACTINN | ACTINN: automated identification of cell types in single cell RNA sequencing. | 2020 | ✅ |
Hierarchical Clustering | SingleR | Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. | 2019 | P1 |
SVM | SVM | A comparison of automatic cell identification methods for single-cell RNA sequencing data. | 2018 | ✅ |
Model | Evaluation Metric | Mouse Brain 2695 (current/reported) | Mouse Spleen 1759 (current/reported) | Mouse Kidney 203 (current/reported) |
---|---|---|---|---|
scDeepsort | ACC | 0.363/0.363 | 0.965 /0.965 | 0.901/0.911 |
Celltypist* | ACC | 0.680/0.666 | 0.966/0.848 | 0.879/0.832 |
singleCellNet | ACC | 0.693/0.803 | 0.975/0.975 | 0.795/0.842 |
ACTINN | ACC | 0.860/0.778 | 0.516/0.236 | 0.829/0.798 |
SVM | ACC | 0.683/0.683 | 0.056/0.049 | 0.704/0.695 |
Note: * Benchmark datasets were renormalied before running the original implementation of Celltypist to match its form requirements.
3)Clustering
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | graph-sc | GNN-based embedding for clustering scRNA-seq data | 2022 | ✅ |
GNN | scTAG | ZINB-based Graph Embedding Autoencoder for Single-cell RNA-seq Interpretations | 2022 | ✅ |
GNN | scDSC | Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network | 2022 | ✅ |
GNN | scGAC | scGAC: a graph attentional architecture for clustering single-cell RNA-seq data | 2022 | P1 |
AutoEncoder | scDeepCluster | Clustering single-cell RNA-seq data with a model-based deep learning approach | 2019 | ✅ |
AutoEncoder | scDCC | Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data | 2021 | ✅ |
AutoEncoder | scziDesk | Deep soft K-means clustering with self-training for single-cell RNA sequence data | 2020 | P1 |
Model | Evaluation Metric | 10x PBMC (current/reported) | Mouse ES (current/reported) | Worm Neuron (current/reported) | Mouse Bladder (current/reported) |
---|---|---|---|---|---|
graph-sc | ARI | 0.72 / 0.70 | 0.82 / 0.78 | 0.57 / 0.46 | 0.68 / 0.63 |
scDCC | ARI | 0.82 / 0.81 | 0.98 / N/A | 0.51 / 0.58 | 0.60 / 0.66 |
scDeepCluster | ARI | 0.81 / 0.78 | 0.98 / 0.97 | 0.51 / 0.52 | 0.56 / 0.58 |
scDSC | ARI | 0.72 / 0.78 | 0.84 / N/A | 0.46 / 0.65 | 0.65 / 0.72 |
scTAG | ARI | 0.75 / N/A | 0.96 / N/A | 0.53 / N/A | 0.60 / N/A |
Multimodality Module
1)Modality Prediction
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | ✅ |
GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |
Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | ✅ |
Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | ✅ |
Auto-encoder | BABEL | BABEL enables cross-modality translation between multiomic profiles at single-cell resolution | 2021 | ✅ |
Model | Evaluation Metric | GEX2ADT (current/reported) | ADT2GEX (current/reported) | GEX2ATAC (current/reported) | ATAC2GEX (current/reported) |
---|---|---|---|---|---|
ScMoGCN | RMSE | 0.3885 / 0.3885 | 0.3242 / 0.3242 | 0.1778 / 0.1778 | 0.2315 / 0.2315 |
SCMM | RMSE | 0.6264 / N/A | 0.4458 / N/A | 0.2163 / N/A | 0.3730 / N/A |
Cross-modal autoencoders | RMSE | 0.5725 / N/A | 0.3585 / N/A | 0.1917 / N/A | 0.2551 / N/A |
BABEL | RMSE | 0.4335 / N/A | 0.3673 / N/A | 0.1816 / N/A | 0.2394 / N/A |
2) Modality Matching
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | ✅ |
GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | ✅ |
Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | ✅ |
Model | Evaluation Metric | GEX2ADT (current/reported) | GEX2ATAC (current/reported) |
---|---|---|---|
ScMoGCN | Accuracy | 0.0827 / 0.0810 | 0.0600 / 0.0630 |
SCMM | Accuracy | 0.005 / N/A | 5e-5 / N/A |
Cross-modal autoencoders | Accuracy | 0.0002 / N/A | 0.0002 / N/A |
3) Joint Embedding
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | ✅ |
Auto-encoder | scMVAE | Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data | 2020 | ✅ |
Auto-encoder | scDEC | Simultaneous deep generative modelling and clustering of single-cell genomic data | 2021 | ✅ |
GNN/Auto-ecnoder | GLUE | Multi-omics single-cell data integration and regulatory inference with graph-linked embedding | 2021 | P1 |
Auto-encoder | DCCA | Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data | 2021 | ✅ |
Model | Evaluation Metric | GEX2ADT (current/reported) | GEX2ATAC (current/reported) |
---|---|---|---|
ScMoGCN | ARI | 0.706 / N/A | 0.702 / N/A |
ScMoGCNv2 | ARI | 0.734 / N/A | N/A / N/A |
scMVAE | ARI | 0.499 / N/A | 0.577 / N/A |
scDEC(JAE) | ARI | 0.705 / N/A | 0.735 / N/A |
DCCA | ARI | 0.35 / N/A | 0.381 / N/A |
4) Multimodal Imputation
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |
5) Multimodal Integration
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | P1 |
GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses (GCN on Nearest Neighbor graph) | 2021 | P1 |
Nearest Neighbor | WNN | Integrated analysis of multimodal single-cell data | 2021 | P1 |
GAN | MAGAN | MAGAN: Aligning Biological Manifolds | 2018 | P1 |
Auto-encoder | SCIM | SCIM: universal single-cell matching with unpaired feature sets | 2020 | P1 |
Auto-encoder | MultiMAP | MultiMAP: Dimensionality Reduction and Integration of Multimodal Data | 2021 | P1 |
Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | P1 |
Spatial Module
1)Spatial Domain
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | SpaGCN | SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network | 2021 | ✅ |
GNN | STAGATE | Deciphering spatial domains from spatially resolved transcriptomics with adaptive graph attention auto-encoder | 2021 | ✅ |
Bayesian | BayesSpace | Spatial transcriptomics at subspot resolution with BayesSpace | 2021 | P1 |
Pseudo-space-time (PST) Distance | stLearn | stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues | 2020 | ✅ |
Heuristic | Louvain | Fast unfolding of community hierarchies in large networks | 2008 | ✅ |
Model | Evaluation Metric | 151673 (current/reported) | 151676 (current/reported) | 151507 (current/reported) |
---|---|---|---|---|
SpaGCN | ARI | 0.51 / 0.522 | 0.41 / N/A | 0.45 / N/A |
STAGATE | ARI | 0.59 / N/A | 0.60 / 0.60 | 0.608 / N/A |
stLearn | ARI | 0.30 / 0.36 | 0.29 / N/A | 0.31 / N/A |
Louvain | ARI | 0.31 / 0.33 | 0.2528 / N/A | 0.28 / N/A |
2)Cell Type Deconvolution
BackBone | Model | Algorithm | Year | CheckIn |
---|---|---|---|---|
GNN | DSTG | DSTG: deconvoluting spatial transcriptomics data through graph-based artificial intelligence | 2021 | ✅ |
logNormReg | SpatialDecon | Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data | 2022 | ✅ |
NNMFreg | SPOTlight | SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes | 2021 | ✅ |
NN Linear + CAR assumption | CARD | Spatially informed cell-type deconvolution for spatial transcriptomics | 2022 | ✅ |
Model | Evaluation Metric | GSE174746 (current/reported) | CARD Synthetic (current/reported) | SPOTlight Synthetic (current/reported) |
---|---|---|---|---|
DSTG | MSE | .172 / N/A | .0247 / N/A | .042 / N/A |
SpatialDecon | MSE | .0014 / .009 | .0077 / N/A | .0055 / N/A |
SPOTlight | MSE | .0098 / N/A | .0246 / 0.118 | .0109 / .16 |
CARD | MSE | .0012 / N/A | .0078 / 0.0062 | .0076 / N/A |
Dev notes
Dev installation
Install PyDANCE with extra dependencies for dev
pip install -e ."[dev]"
Make sure dependencies have specific pinned versions
pip install -r requirements.txt
Install pre-commit hooks for code quality checks
pre-commit install
Run tests
Run all tests on current environment using pytest
pytest -v
Run full test from the ground up including environment set up using tox on CPU
tox -e python3.8-cpu
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pydance-1.0.0rc0.tar.gz
.
File metadata
- Download URL: pydance-1.0.0rc0.tar.gz
- Upload date:
- Size: 223.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 43fce5ac5a51766da1eb3f66536dff479c22ec9ea985066580530a15cf90c50f |
|
MD5 | 16550b58e3747d660dc0ae360838ec9d |
|
BLAKE2b-256 | d038a0a7d36cc860e39ca2bef0454269314f1fad0c60284ce09c4edef9e54ab2 |
File details
Details for the file pydance-1.0.0rc0-py3-none-any.whl
.
File metadata
- Download URL: pydance-1.0.0rc0-py3-none-any.whl
- Upload date:
- Size: 252.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c00fd8976ad8a0de971e1235a0ed2a617283b77f579d7e5d85071c4fbb3cac19 |
|
MD5 | 846c0c9fad94327b73d0bb8aa38e34aa |
|
BLAKE2b-256 | 728424a3699248c717bbfce0866655333790ff86469d4d139d9667d56aae6bf5 |