Cell state space analysis of variance for signal recovery with batch correction

Project description

CellANOVA: Cell State Space Analysis of Variance

Introduction

CellANOVA is a Python package for batch correction with signal recovery. It constructs a pool of control samples to estimate a latent linear space that captures cell- and gene-specific unwanted variation, which is then used to remove batch effects from cells across all samples. Because only samples in the control pool are used to estimate the batch variation space, CellANOVA preserves any biological differences in the non-control samples that lie outside this space. Importantly, CellANOVA produces a batch-corrected gene expression matrix that can be used for gene- and pathway-level downstream analyses, and it is fast and scalable to datasets containing millions of cells.
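
To make the idea of a control-derived batch space concrete, here is a deliberately simplified NumPy sketch. It is not CellANOVA's actual model or API: the function names `estimate_batch_basis` and `remove_batch_subspace`, the use of batch-mean shifts, and the toy data are all assumptions made for illustration. It estimates a linear batch subspace from control cells only, then projects that subspace out of every cell.

```python
import numpy as np


def estimate_batch_basis(X, batch, is_control, n_components=1):
    """Hypothetical helper: orthonormal basis for batch variation,
    estimated from control cells only (rows of X are cells)."""
    batch = np.asarray(batch)
    is_control = np.asarray(is_control, dtype=bool)
    X_ctrl = X[is_control]
    b_ctrl = batch[is_control]
    grand_mean = X_ctrl.mean(axis=0)
    # One row per control batch: how far that batch's mean sits from the grand mean.
    shifts = np.vstack(
        [X_ctrl[b_ctrl == b].mean(axis=0) - grand_mean for b in np.unique(b_ctrl)]
    )
    # Leading right singular vectors span the dominant batch-shift directions.
    _, _, vt = np.linalg.svd(shifts, full_matrices=False)
    return vt[:n_components].T  # shape: genes x n_components


def remove_batch_subspace(X, basis):
    """Hypothetical helper: remove the component of each centered cell
    that lies inside the estimated batch subspace."""
    center = X.mean(axis=0)
    X_centered = X - center
    return center + X_centered - X_centered @ basis @ basis.T


# Toy data (assumed): 4 groups of 50 cells x 5 genes.
rng = np.random.default_rng(0)
n_cells, n_genes = 50, 5
batch = np.repeat(["A", "B", "A", "B"], n_cells)
is_control = np.repeat([True, True, False, False], n_cells)
X = rng.normal(scale=0.1, size=(4 * n_cells, n_genes))
X[batch == "A", 0] += 2.0   # batch effect on gene 0
X[batch == "B", 0] -= 2.0
X[~is_control, 1] += 3.0    # biological effect on gene 1, non-control cells only

basis = estimate_batch_basis(X, batch, is_control)
corrected = remove_batch_subspace(X, basis)
```

In this toy setup the batch shift lives on gene 0 and the condition effect on gene 1, so the condition effect lies outside the estimated subspace and survives the projection. Real data rarely separate this cleanly, which is why the package fits a fuller model.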

CellANOVA can be applied to multiple settings:

  • Case-control design
  • Longitudinal design
  • Irregular block design

For more model details, validation results and real dataset analysis, please check out our paper (to add link). If you use our method, please use the following citation:

Installation

Dependencies

Our Python package has been tested on python=3.7, 3.8, 3.9. It depends on numpy>=1.20.3, scipy>=1.7.1, pandas>=1.3.2, scikit-learn>=1.0.2, anndata>=0.7.6, scanpy>=1.8.1, harmonypy>=0.0.6.
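
If you are not using the conda setup, a quick check of the installed versions against the minimums listed above may help. The sketch below is an assumption-laden convenience, not part of CellANOVA: `parse_version` and `check_dependencies` are hypothetical helpers, the comparison only looks at leading numeric components (so pre-release suffixes such as `rc1` are ignored), and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata  # standard library from Python 3.8

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "numpy": "1.20.3",
    "scipy": "1.7.1",
    "pandas": "1.3.2",
    "scikit-learn": "1.0.2",
    "anndata": "0.7.6",
    "scanpy": "1.8.1",
    "harmonypy": "0.0.6",
}


def parse_version(text):
    """Return the leading numeric release components, e.g. '1.20.3' -> (1, 20, 3)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS, get_version=metadata.version):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < parse_version(minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    for package, problem in check_dependencies().items():
        print(f"{package}: {problem}")
```

An empty result means every listed dependency meets its minimum version.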

If you use a conda environment, the following command provides an easy setup. It builds a separate cellanova environment with all dependencies installed.

conda env create -f environment.yml

Download

Tutorials

Quick Start

The following is a quick example showing the CellANOVA integration pipeline.

## load required packages
import anndata as ad
import numpy as np
import scanpy as sc
from cellanova import *

## load and preprocess data
integrate_key = 'dataidx'  # obs column that identifies each sample/batch
adata = sc.read_h5ad('raw_data.h5ad')
adata_prep = preprocess_data(adata, integrate_key=integrate_key)

## construct control pool: all sample IDs whose condition is 'control'
control_dict = {
    'pool1': list(set(adata_prep[adata_prep.obs['condition']=='control',].obs[integrate_key])),
}

## model fitting
adata_prep = calc_ME(adata_prep, integrate_key=integrate_key)
adata_prep = calc_BE(adata_prep, integrate_key, control_dict)
adata_prep = calc_TE(adata_prep, integrate_key)

## create an independent anndata object for cellanova-integrated data
integrated = ad.AnnData(adata_prep.layers['denoised'], dtype=np.float32)
integrated.obs = adata_prep.obs.copy()
integrated.var_names = adata_prep.var_names
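
The control pool step in the example above can be tried without any real data. The following sketch builds the same kind of `control_dict` from a small table standing in for `adata.obs`. The helper `build_control_pool`, its parameters, and the toy sample IDs are hypothetical and not part of the CellANOVA API; the column names `dataidx` and `condition` follow the quick start.

```python
import pandas as pd

# Toy stand-in for adata.obs (assumed values): one row per cell.
obs = pd.DataFrame({
    "dataidx": ["s1", "s1", "s2", "s3", "s3", "s4"],
    "condition": ["control", "control", "control", "case", "case", "control"],
})


def build_control_pool(obs, integrate_key="dataidx", condition_key="condition",
                       control_label="control", pool_name="pool1"):
    """Hypothetical helper: collect the sample IDs of all control cells."""
    is_control = obs[condition_key] == control_label
    # Sorting makes the pool order reproducible, unlike list(set(...)).
    samples = sorted(obs.loc[is_control, integrate_key].unique())
    if not samples:
        raise ValueError(f"no cells with {condition_key} == {control_label!r}")
    return {pool_name: samples}


control_dict = build_control_pool(obs)
```

Failing early when no control cells match guards against a misspelled condition label, which would otherwise produce an empty pool.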

Example Notebooks

For more detailed examples, please refer to the following jupyter notebooks:

  • CellANOVA integration workflow (link)
  • Evaluation of batch removal performance (link)
  • Evaluation of distortion (link)
  • Evaluation of signal preservation (link)

License

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cellanova-0.1.0.tar.gz (18.7 kB view details)

Uploaded Source

Built Distribution

cellanova-0.1.0-py3-none-any.whl (19.4 kB view details)

Uploaded Python 3

File details

Details for the file cellanova-0.1.0.tar.gz.

File metadata

  • Download URL: cellanova-0.1.0.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for cellanova-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       848721029afa8d211ccf1c27611648722af6255dc38a7eb3cd680490861203dd
MD5          69d2aea9dc53a71ba0f92aeca874daa0
BLAKE2b-256  5de809a903c8a35d118baf8bba6e577407645c5ac6abd3287f49ee6b61309f80

See more details on using hashes here.
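
As one way to use these digests, the sketch below computes the SHA256 of a downloaded file with Python's standard `hashlib` module and compares it with the value listed above for cellanova-0.1.0.tar.gz. The helper names `sha256_of_file` and `verify_download` are illustrative, and the local file path in the usage comment is an assumption about where you saved the download.

```python
import hashlib

# SHA256 digest listed above for cellanova-0.1.0.tar.gz.
EXPECTED_SHA256 = "848721029afa8d211ccf1c27611648722af6255dc38a7eb3cd680490861203dd"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage (path is an assumption):
# verify_download("cellanova-0.1.0.tar.gz")
```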

File details

Details for the file cellanova-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cellanova-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for cellanova-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       58faed1475bb47395134efa00422b7ffd18a7d1dc30320b2c437a91a0ef1cf54
MD5          49100bd525cffd309bb758e3165bce3f
BLAKE2b-256  2e45160ce0f4d4ebf58ca788a86beb3b813966822990d2373135aa1996a448e7

See more details on using hashes here.
