Project description

Portal

Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets

An efficient, accurate and flexible method for single-cell data integration.

Check out our manuscript in Nature Computational Science (see the Citation section below).

Reproducibility

We provide source code for reproducing the experiments in the paper "Adversarial domain translation networks for fast and accurate integration of large-scale atlas-level single-cell datasets".

Installation

To run Portal, please follow the installation instructions below:

git clone https://github.com/YangLabHKUST/Portal.git
cd Portal
conda env update -f environment.yml
conda activate portal

Installation normally takes less than 5 minutes.
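
Alternatively, since the package is published on PyPI as portal_sc (see the distribution files listed below), it should also be installable via pip:

pip install portal-sc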

Quick Start

Basic Usage

Starting from raw count matrices formatted as AnnData objects, Portal preprocesses the data using the standard pipeline adopted by Seurat and Scanpy, followed by PCA for dimensionality reduction. After preprocessing, Portal can be trained via model.train().
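
Concretely, this Seurat/Scanpy-style preprocessing corresponds roughly to the following Scanpy calls (a sketch only; the exact steps and parameter values Portal applies internally are assumptions here):

import scanpy as sc

adata = sc.read_h5ad("adata_1.h5ad")
sc.pp.normalize_total(adata, target_sum=1e4)          # library-size normalization
sc.pp.log1p(adata)                                    # log transformation
sc.pp.highly_variable_genes(adata, n_top_genes=2000)  # select highly variable genes
sc.pp.scale(adata)                                    # z-score scaling
sc.tl.pca(adata, n_comps=30)                          # PCA to npcs dimensions

The basic Portal workflow itself is: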

import portal
import scanpy as sc

# read AnnData
adata_1 = sc.read_h5ad("adata_1.h5ad")
adata_2 = sc.read_h5ad("adata_2.h5ad")

model = portal.model.Model()
model.preprocess(adata_1, adata_2) # perform preprocessing and PCA
model.train() # train the model
model.eval() # get integrated latent representation of cells

The evaluation step model.eval() saves the integrated latent representations of cells in model.latent, which can be used for downstream integrative analysis.
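
For instance, the integrated embedding can be attached to a concatenated AnnData object for a standard Scanpy workflow. This is a sketch only; it assumes that the rows of model.latent follow the cell order of adata_1 followed by adata_2:

import anndata
import scanpy as sc

# Assumption: rows of model.latent are ordered as the cells of adata_1, then adata_2.
adata_all = anndata.concat([adata_1, adata_2], label="dataset")
adata_all.obsm["X_portal"] = model.latent

# Neighbors, UMAP and plotting on the integrated embedding.
sc.pp.neighbors(adata_all, use_rep="X_portal")
sc.tl.umap(adata_all)
sc.pl.umap(adata_all, color="dataset")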

Parameters in portal.model.Model():

  • lambdacos: Coefficient of the regularizer for preserving cosine similarity across domains. Default: 20.0.
  • training_steps: Number of steps for training. Default: 2000. Use training_steps=1000 for datasets with sample size < 20,000.
  • npcs: Dimensionality of the embeddings in each domain (number of PCs). Default: 30.
  • n_latent: Dimensionality of the shared latent space. Default: 20.
  • batch_size: Batch size for training. Default: 500.
  • seed: Random seed. Default: 1234.

The default setting of lambdacos generally works well. This parameter can also be tuned to achieve better performance; see Tuning lambdacos (optional). For integration tasks where cosine similarity is not a reliable cross-domain correspondence (such as cross-species integration), we recommend using a lower value such as lambdacos=10.0.
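
As an illustration, the parameters above can be combined for a smaller cross-species task (a sketch; passing them as keyword arguments of portal.model.Model() is assumed here):

# Sketch: a smaller dataset (< 20,000 cells) integrated across species.
model = portal.model.Model(
    lambdacos=10.0,       # weaker cosine-similarity regularizer for cross-species data
    training_steps=1000,  # fewer steps for sample size < 20,000
)
model.preprocess(adata_1, adata_2)
model.train()
model.eval()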

Memory-efficient Version

To deal with large single-cell datasets, we also developed a memory-efficient version that reads mini-batches from disk:

model = portal.model.Model()
model.preprocess_memory_efficient(adata_A_path="adata_1.h5ad", adata_B_path="adata_2.h5ad")
model.train_memory_efficient()
model.eval_memory_efficient()

Integrating Multiple Datasets

Portal integrates multiple datasets incrementally. Given a list of AnnData objects adata_list = [adata_1, ..., adata_n], they can be integrated by running the following commands:

lowdim_list = portal.utils.preprocess_datasets(adata_list)
integrated_data = portal.utils.integrate_datasets(lowdim_list)

Tuning lambdacos (optional)

Optionally, the parameter lambdacos can be tuned in the range [15.0, 50.0]. Users can run the following command to search for the value that yields the best integration result in terms of the mixing metric:

lowdim_list = portal.utils.preprocess_datasets(adata_list)
integrated_data = portal.utils.integrate_datasets(lowdim_list, search_cos=True)

Recovering expression matrices

Portal can provide harmonized expression matrices (at the scaled level or the log-normalized level):

lowdim_list, hvg, mean, std, pca = portal.utils.preprocess_recover_expression(adata_list)
expression_scaled, expression_log_normalized = portal.utils.integrate_recover_expression(lowdim_list, mean, std, pca)
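
As a sketch, the recovered matrix can be wrapped back into an AnnData object for downstream use. It is assumed here (not documented above) that rows follow the concatenated cell order of adata_list and that columns correspond to the highly variable genes returned in hvg:

import anndata

# Assumptions: row order matches the cells of adata_list concatenated in order;
# columns match the highly variable genes in `hvg`.
adata_all = anndata.concat(adata_list, label="dataset")
adata_recovered = anndata.AnnData(X=expression_log_normalized, obs=adata_all.obs.copy())
adata_recovered.var_names = list(hvg)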

Demos

We provide demos for users to get a quick start: Demo 1, Demo 2.

Development

This package is developed by Jia Zhao (jzhaoaz@connect.ust.hk) and Gefei Wang (gwangas@connect.ust.hk).

Citation

Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, The Tabula Microcebus Consortium, Angela Ruohao Wu, Can Yang. Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets. Nature Computational Science 2, 317–330 (2022).


Download files

Download the file for your platform.

Source Distribution

portal_sc-1.0.4.tar.gz (18.1 kB)

Built Distribution

portal_sc-1.0.4-py3-none-any.whl (16.1 kB)

File details

Details for the file portal_sc-1.0.4.tar.gz.

File metadata

  • Download URL: portal_sc-1.0.4.tar.gz
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.9

File hashes

Hashes for portal_sc-1.0.4.tar.gz

  • SHA256: 9d9911ef107e8c64d591a7b88dfa57b73e4c16a717d0f7b15ec547c0cfade456
  • MD5: f242b0eb82b2099348f7c859f37af16c
  • BLAKE2b-256: 8d61ab3c8ee0c61fdbcd93ebcd7bf311daf6718c25fd06be0fd2d8b3bd3d43d8


File details

Details for the file portal_sc-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: portal_sc-1.0.4-py3-none-any.whl
  • Size: 16.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.9

File hashes

Hashes for portal_sc-1.0.4-py3-none-any.whl

  • SHA256: 130fd01353bc06f12e69ec5fe92b94688a4a3486705c2023f62445cedd90a4de
  • MD5: 38c16ef69fb999d42f1562baa043e37b
  • BLAKE2b-256: 45984b1e66101ac3e7f49a9e22c733bab8ddf432dd8e332b765e96059c674312

