scAce: an adaptive embedding and clustering method for single-cell gene expression data

Overview

scAce consists of three major steps: a pre-training step based on a variational autoencoder (VAE), a cluster initialization step that produces initial cluster labels, and an adaptive cluster merging step that iteratively updates cluster labels and cell embeddings. In the pre-training step, scAce takes the single-cell gene expression matrix as input and trains a VAE network. For each gene, the VAE learns and outputs the three parameters of a zero-inflated negative binomial (ZINB) distribution: mean, dispersion, and proportion of zeros. In the cluster initialization step, scAce offers two options. With de novo initialization, Leiden is used to obtain initial cluster labels; with clustering enhancement, initial cluster labels are obtained by applying a cluster splitting approach to a set of existing clustering results. In the adaptive cluster merging step, given the pre-trained VAE network and the initial cluster labels, scAce iteratively updates the network parameters, cell embeddings, cluster labels, and centroids by alternating between network updates and cluster merging. When the iteration stops, scAce outputs the final cell embeddings and cluster labels.
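
To make the three ZINB parameters concrete, here is a minimal sketch of the ZINB log-probability for a single count. It assumes the common parameterization by mean `mu`, inverse dispersion `theta`, and zero-inflation proportion `pi`; the function names are hypothetical, and scAce's own parameterization and loss code may differ.

```python
import math


def nb_log_prob(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and inverse dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_prob(x, mu, theta, pi):
    """Log-probability of count x under a ZINB distribution.

    With probability pi (0 <= pi < 1) the count is a structural zero;
    otherwise it is drawn from the negative binomial above.
    """
    nb = nb_log_prob(x, mu, theta)
    if x == 0:
        # Zeros come from either the zero-inflation component or the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # Non-zero counts can only come from the NB component.
    return math.log1p(-pi) + nb
```

Setting `pi = 0` recovers the plain negative binomial, and the mean of the mixture is `(1 - pi) * mu`, so `mu` here is the mean of the negative binomial component rather than of the observed counts.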

Installation

Install scAce from PyPI with:

pip install scace

Alternatively, clone this repository and run

pip install -e .

in the root of this repository.

Quick start

Load the data to be analyzed:

import scanpy as sc

# `data` is the expression count matrix, with cells as rows and genes as columns
adata = sc.AnnData(data)

Perform data pre-processing:

# Basic filtering
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.filter_cells(adata, min_genes=200)

adata.raw = adata.copy()

# Total-count normalize, logarithmize, and scale the data
sc.pp.normalize_per_cell(adata)
# Per-cell scale factor: each cell's total count relative to the median total count
adata.obs['scale_factor'] = adata.obs.n_counts / adata.obs.n_counts.median()

sc.pp.log1p(adata)
sc.pp.scale(adata)
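
The `scale_factor` column above is a per-cell size factor. As an illustration only, the same arithmetic can be written in plain Python on a toy matrix. This assumes `n_counts` holds each cell's total count before normalization; the helper name `size_factors` and the toy data are hypothetical.

```python
from statistics import median


def size_factors(counts):
    """Return each cell's total count divided by the median total across cells.

    `counts` is a list of rows, one row of gene counts per cell.
    """
    totals = [sum(row) for row in counts]
    med = median(totals)
    return [total / med for total in totals]


# Toy matrix: 3 cells x 4 genes
counts = [
    [1, 0, 3, 0],  # total 4
    [2, 2, 2, 2],  # total 8
    [0, 5, 5, 6],  # total 16
]
factors = size_factors(counts)
print(factors)  # [0.5, 1.0, 2.0]
```

A cell with the median sequencing depth gets a factor of 1, while deeper or shallower cells get proportionally larger or smaller factors.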

Run the scAce method:

from scace import run_scace
adata = run_scace(adata)

The returned adata contains the cluster labels in adata.obs['scace_cluster'] and the cell embeddings in adata.obsm['scace_emb']. The embeddings can be used as input to other downstream analyses.
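
As one possible downstream analysis, the embeddings can be passed to scanpy's neighbor graph and UMAP functions. The sketch below is not part of scAce: `umap_from_scace` is a hypothetical helper, and it assumes `adata` is the object returned by `run_scace` with the two keys described above.

```python
def umap_from_scace(adata, emb_key="scace_emb", cluster_key="scace_cluster"):
    """Plot a UMAP computed from the scAce embeddings, colored by scAce cluster.

    Modifies `adata` in place (neighbor graph, UMAP coordinates, categorical labels).
    """
    # Imported inside the function so that defining it does not require scanpy.
    import scanpy as sc

    # Store labels as categories so they are drawn with a discrete palette.
    adata.obs[cluster_key] = adata.obs[cluster_key].astype(str).astype("category")

    # Build the neighbor graph on the scAce embeddings instead of adata.X.
    sc.pp.neighbors(adata, use_rep=emb_key)
    sc.tl.umap(adata)
    sc.pl.umap(adata, color=cluster_key)
    return adata
```

Calling `umap_from_scace(adata)` after `run_scace(adata)` would then show the clusters in two dimensions.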

Please refer to tutorial.ipynb for a detailed description of scAce's usage.
