
integration


Scbean

scbean integrates a range of models for single-cell data analysis, including dimensionality reduction, removing batch effects, and transferring well-annotated cell type labels from scRNA-seq to scATAC-seq. It is efficient and scales to large datasets.


scbean.DAVAE

The domain-adversarial and variational approximation framework, DAVAE, integrates multiple single-cell datasets across samples, technologies and modalities without any post hoc data processing. DAVAE fits normalized gene expression to a non-linear model that maps a lower-dimensional latent variable into expression space through a non-linear function, trained with a KL regularizer and a domain-adversarial regularizer.
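
To make the KL regularizer concrete, here is a minimal sketch of the closed-form KL divergence used in a conventional variational autoencoder. It assumes a diagonal Gaussian posterior and a standard normal prior, which the description above does not spell out; the function name kl_to_standard_normal is hypothetical and is not part of the scbean API.

```python
import math


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ) for one latent vector.

    Assumption: diagonal Gaussian posterior and standard normal prior,
    as in a textbook VAE. This is not taken from the DAVAE source code.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)
    )


# A posterior identical to the prior incurs no penalty.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Moving one mean away from zero increases the penalty.
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])

print(kl_zero, kl_shifted)
```

In a model of this kind, the term is added to the reconstruction loss so that latent codes stay close to the prior, while a separate domain-adversarial term discourages the codes from revealing which batch a cell came from.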

We will also provide more fundamental analyses for multi-modal data and spatially resolved transcriptomics in the future. The output can be used directly for downstream analyses such as clustering, identification of cell subpopulations, differential gene expression, and visualization with either Seurat or Scanpy.

Installation

  • Create conda environment

    $ conda create -n scbean python=3.8
    $ conda activate scbean
    
  • Install scbean from pypi

    $ pip install scbean
    
  • Alternatively, install the development version of scbean from the GitHub source code

    $ git clone https://github.com/jhu99/scbean.git
    $ cd ./scbean/
    $ python -m pip install .
    

Note: Please make sure your Python version is >= 3.7, and install tensorflow-gpu if a GPU is available on your machine.
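
The version requirement in the note can be checked before installing. The following is a small sketch that uses only the standard library; the helper name check_environment is hypothetical and is not provided by scbean.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7)):
    """Report whether the interpreter meets the minimum Python version
    and whether a TensorFlow package can be found (without importing it)."""
    python_ok = sys.version_info[:2] >= min_version
    tensorflow_found = importlib.util.find_spec("tensorflow") is not None
    return {"python_ok": python_ok, "tensorflow_found": tensorflow_found}


report = check_environment()
print(report)
```

If TensorFlow is installed, tf.config.list_physical_devices('GPU') lists the GPUs it can see, which helps confirm that the GPU build is actually in use.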

Usage

For a detailed guide to using scbean, see the tutorial and documentation provided here.

Quick start with DAVAE

Download the data used by the following test code.

import scanpy as sc  # provides the sc.pp / sc.tl / sc.pl calls used for visualization below
import scbean.model.davae as davae
import scbean.tools.utils as tl

# Please choose an appropriate matplotlib backend.
import matplotlib
matplotlib.use('TkAgg')

# Read single-cell data, one AnnData object per batch.
adata_b1 = tl.read_sc_data("./data/mixed_cell_lines/293t.h5ad", batch_name="293t")
adata_b2 = tl.read_sc_data("./data/mixed_cell_lines/jurkat.h5ad", batch_name="jurkat")
adata_b3 = tl.read_sc_data("./data/mixed_cell_lines/mixed.h5ad", batch_name="mixed")

# tl.preprocessing includes filtering, log-TPM normalization, and selection of highly variable genes.
adata_all = tl.preprocessing([adata_b1, adata_b2, adata_b3])

# Train DAVAE and integrate the datasets. DAVAE's output includes the cell representation in
# the reduced-dimensional space and the recovered gene expression.
adata_integrate = davae.fit_integration(
    adata_all,
    batch_num=3,
    split_by='batch_label',
    domain_lambda=2.0,
    epochs=25,
    sparse=True,
    hidden_layers=[64, 32, 6]
)
# Visualization
sc.pp.neighbors(adata_integrate, use_rep='X_davae')
sc.tl.umap(adata_integrate)
sc.pl.umap(adata_integrate, color='batch')
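
The three steps named in the preprocessing comment above (filtering, log normalization, selection of highly variable genes) can be illustrated on a toy count matrix. This is a dependency-free sketch and not the implementation behind tl.preprocessing: the function name preprocess and its parameters are hypothetical, a counts-per-million-style scaling stands in for log-TPM, and genes are ranked by plain variance rather than by the dispersion-based criteria that real pipelines typically use.

```python
import math
from statistics import pvariance


def preprocess(counts, min_counts=1, scale=1e6, n_top_genes=2):
    """counts: list of per-cell lists of raw gene counts (cells x genes)."""
    # Filtering: drop cells whose total count is below min_counts.
    kept = [cell for cell in counts if sum(cell) >= min_counts]
    # Log normalization: rescale each cell to `scale` total counts, then log1p.
    normed = [
        [math.log1p(c * scale / sum(cell)) for c in cell] for cell in kept
    ]
    # Highly variable genes: rank genes by variance across the kept cells.
    n_genes = len(normed[0])
    variances = [
        pvariance([cell[g] for cell in normed]) for g in range(n_genes)
    ]
    top = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    top = sorted(top[:n_top_genes])
    return [[cell[g] for g in top] for cell in normed], top


toy_counts = [
    [10, 0, 5, 5],
    [0, 0, 0, 0],  # empty cell, removed by the filter
    [2, 8, 5, 5],
    [9, 1, 5, 5],
]
matrix, selected_genes = preprocess(toy_counts)
print(selected_genes)
```

Here the last two genes have identical counts in every kept cell, so only the first two genes are retained as variable.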

Download files

Source distribution: scbean-0.4.1.tar.gz (4.2 MB)

Built distribution: scbean-0.4.1-py3-none-any.whl (22.7 kB, Python 3)
