Portal
Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets
An efficient, accurate and flexible method for single-cell data integration.
Check out our manuscript in Nature Computational Science.
Reproducibility
We provide source codes for reproducing the experiments of the paper "Adversarial domain translation networks for fast and accurate integration of large-scale atlas-level single-cell datasets".
- Integration of mouse spleen datasets (we reproduce the performance metric results in this notebook as an example). Benchmarking.
- Integration of mouse marrow datasets.
- Integration of mouse bladder datasets.
- Integration of mouse brain cerebellum datasets.
- Integration of mouse brain hippocampus datasets.
- Integration of mouse brain thalamus datasets.
- Integration of human PBMC datasets (sensitivity analysis).
- Integration of entire mouse cell atlases from the Tabula Muris project.
- Integration of mouse brain scRNA-seq and snRNA-seq datasets.
- Integration of human PBMC scRNA-seq and human brain snRNA-seq datasets.
- Integration of scRNA-seq and scATAC-seq datasets.
- Integration of developmental trajectories.
- Integration of the spermatogenesis differentiation process across multiple species. Gene lists from Ensembl BioMart (we only use genes assigned the type "ortholog_one2one" in the lists): orthologues (human vs mouse), orthologues (human vs macaque).
Installation
To run Portal, please follow the installation instructions:
git clone https://github.com/YangLabHKUST/Portal.git
cd Portal
conda env update -f environment.yml
conda activate portal
Normally the installation time is less than 5 minutes.
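Alternatively, assuming the PyPI release listed under "Download files" below (`portal_sc`, a pure-Python wheel) matches the GitHub version, Portal can be installed into an existing environment with pip:

```shell
# install the PyPI build of Portal (package name as listed in the file details below)
pip install portal-sc
```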
Quick Start
Basic Usage
Starting with raw count matrices formatted as AnnData objects, Portal uses a standard pipeline adopted by Seurat and Scanpy to preprocess the data, followed by PCA for dimensionality reduction. After preprocessing, Portal can be trained via `model.train()`.
import portal
import scanpy as sc
# read AnnData
adata_1 = sc.read_h5ad("adata_1.h5ad")
adata_2 = sc.read_h5ad("adata_2.h5ad")
model = portal.model.Model()
model.preprocess(adata_1, adata_2) # perform preprocess and PCA
model.train() # train the model
model.eval() # get integrated latent representation of cells
The evaluation step `model.eval()` saves the integrated latent representation of cells in `model.latent`, which can be used for downstream integrative analysis.
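As one illustration of downstream use, an integrated latent space supports simple cross-dataset operations such as nearest-neighbor label transfer. The sketch below is a generic, minimal example in pure Python (it is not part of Portal's API); in practice the rows of `model.latent` would supply the coordinates:

```python
def transfer_labels(latent_ref, labels_ref, latent_query):
    """Assign each query cell the label of its nearest reference cell
    in the integrated latent space (squared Euclidean distance)."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        labels_ref[min(range(len(latent_ref)), key=lambda i: dist2(latent_ref[i], q))]
        for q in latent_query
    ]

# toy example: two annotated reference cells, one unannotated query cell
reference = [[0.0, 0.0], [10.0, 10.0]]
labels = ["B cell", "T cell"]
print(transfer_labels(reference, labels, [[9.0, 9.5]]))  # ['T cell']
```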
Parameters in `portal.model.Model()`:
- `lambdacos`: Coefficient of the regularizer for preserving cosine similarity across domains. Default: `20.0`.
- `training_steps`: Number of steps for training. Default: `2000`. Use `training_steps=1000` for datasets with sample size < 20,000.
- `npcs`: Dimensionality of the embeddings in each domain (number of PCs). Default: `30`.
- `n_latent`: Dimensionality of the shared latent space. Default: `20`.
- `batch_size`: Batch size for training. Default: `500`.
- `seed`: Random seed. Default: `1234`.
The default setting of the parameter `lambdacos` works well in general. We also enable tuning of this parameter to achieve better performance; see Tuning `lambdacos` (optional). For integration tasks where cosine similarity is not a reliable cross-domain correspondence (such as cross-species integration), we recommend using a lower value such as `lambdacos=10.0`.
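To give some intuition for what `lambdacos` weights, here is a minimal sketch of a cosine-similarity-preservation penalty between each cell's embedding and its translated counterpart. This is our reading of the parameter description, not Portal's actual loss implementation:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_penalty(embeddings, translated, lambdacos=20.0):
    """Average (1 - cosine similarity) between each embedding and its
    translation, scaled by lambdacos; near 0 when directions are preserved."""
    loss = sum(1.0 - cosine_similarity(u, v) for u, v in zip(embeddings, translated))
    return lambdacos * loss / len(embeddings)

# a translation that only rescales each cell preserves its direction,
# so the penalty is ~0 (up to floating-point error)
cells = [[1.0, 2.0], [3.0, 1.0]]
rescaled = [[2.0, 4.0], [6.0, 2.0]]
print(cosine_penalty(cells, rescaled))
```

A larger `lambdacos` forces the translated embeddings to track the original geometry more closely, which is why a looser value is preferable when cross-domain correspondence is weaker.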
Memory-efficient Version
To deal with large single-cell datasets, we also developed a memory-efficient version by reading mini-batches from the disk:
model = portal.model.Model()
model.preprocess_memory_efficient(adata_A_path="adata_1.h5ad", adata_B_path="adata_2.h5ad")
model.train_memory_efficient()
model.eval_memory_efficient()
Integrating Multiple Datasets
Portal integrates multiple datasets incrementally. Given a list of AnnData objects `adata_list = [adata_1, ..., adata_n]`, they can be integrated by running the following commands:
lowdim_list = portal.utils.preprocess_datasets(adata_list)
integrated_data = portal.utils.integrate_datasets(lowdim_list)
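The incremental strategy can be pictured as folding datasets into a growing reference one at a time. The sketch below is purely illustrative; the `merge_pair` stand-in is hypothetical and is not Portal's alignment step:

```python
def integrate_incrementally(lowdim_list, merge_pair):
    """Fold n datasets into one by repeatedly integrating the next
    dataset into the running reference (n - 1 pairwise merges)."""
    reference = lowdim_list[0]
    for nxt in lowdim_list[1:]:
        reference = merge_pair(reference, nxt)
    return reference

# toy stand-in: "integration" is just concatenating cell lists
datasets = [["a1", "a2"], ["b1"], ["c1", "c2"]]
print(integrate_incrementally(datasets, lambda ref, new: ref + new))
# ['a1', 'a2', 'b1', 'c1', 'c2']
```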
Tuning `lambdacos` (optional)
Optionally, the parameter `lambdacos` can be tuned in the range [15.0, 50.0]. Users can run the following command to search for the value that yields the best integration result in terms of the mixing metric:
lowdim_list = portal.utils.preprocess_datasets(adata_list)
integrated_data = portal.utils.integrate_datasets(lowdim_list, search_cos=True)
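Conceptually, `search_cos=True` performs a search of the kind sketched below: train with each candidate value and keep the one whose result scores best on the mixing metric. The helper names are hypothetical, and we assume here that a lower score is better:

```python
def search_best_lambdacos(candidates, train_and_score):
    """Return the candidate value whose trained model scores best
    (assumption: lower mixing-metric score is better)."""
    best, best_score = None, float("inf")
    for lam in candidates:
        score = train_and_score(lam)
        if score < best_score:
            best, best_score = lam, score
    return best

# toy scoring function standing in for "train Portal, compute mixing metric"
print(search_best_lambdacos([15.0, 20.0, 25.0, 50.0], lambda lam: abs(lam - 25.0)))
# 25.0
```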
Recovering expression matrices
Portal can provide harmonized expression matrices (at the scaled or log-normalized level):
lowdim_list, hvg, mean, std, pca = portal.utils.preprocess_recover_expression(adata_list)
expression_scaled, expression_log_normalized = portal.utils.integrate_recover_expression(lowdim_list, mean, std, pca)
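The returned `mean`, `std`, and `pca` objects carry what is needed to map results back to expression space. As a conceptual sketch (not Portal's code), once scaled expression is recovered, inverting the per-gene standardization yields log-normalized values:

```python
def unscale(expression_scaled, mean, std):
    """Invert per-gene standardization: scaled value * std + mean
    recovers the log-normalized value for each gene."""
    return [
        [value * s + m for value, m, s in zip(row, mean, std)]
        for row in expression_scaled
    ]

# toy example: two cells x two genes, gene means (1, 0) and stds (2, 1)
scaled = [[1.0, -1.0], [0.0, 2.0]]
print(unscale(scaled, mean=[1.0, 0.0], std=[2.0, 1.0]))
# [[3.0, -1.0], [1.0, 2.0]]
```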
Demos
We provide demos for users to get a quick start: Demo 1, Demo 2.
Development
This package is developed by Jia Zhao (jzhaoaz@connect.ust.hk) and Gefei Wang (gwangas@connect.ust.hk).
Citation
Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, The Tabula Microcebus Consortium, Angela Ruohao Wu, Can Yang. Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets. Nature Computational Science 2, 317–330 (2022).
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file `portal_sc-1.0.4.tar.gz`.
File metadata
- Download URL: portal_sc-1.0.4.tar.gz
- Upload date:
- Size: 18.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9d9911ef107e8c64d591a7b88dfa57b73e4c16a717d0f7b15ec547c0cfade456
MD5 | f242b0eb82b2099348f7c859f37af16c
BLAKE2b-256 | 8d61ab3c8ee0c61fdbcd93ebcd7bf311daf6718c25fd06be0fd2d8b3bd3d43d8
File details
Details for the file `portal_sc-1.0.4-py3-none-any.whl`.
File metadata
- Download URL: portal_sc-1.0.4-py3-none-any.whl
- Upload date:
- Size: 16.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 130fd01353bc06f12e69ec5fe92b94688a4a3486705c2023f62445cedd90a4de
MD5 | 38c16ef69fb999d42f1562baa043e37b
BLAKE2b-256 | 45984b1e66101ac3e7f49a9e22c733bab8ddf432dd8e332b765e96059c674312