Cell state space analysis of variance for signal recovery with batch correction
Project description
CellANOVA: Cell State Space Analysis of Variance
Introduction
CellANOVA is a Python package, for batch correction with signal recovery. It contructs a pool of control samples to estimate a latent linear space that captures cell- and gene-specific unwanted variations, which can then be used to remove batch effects from cells across all samples. By using only samples in the control pool in the estimation of the batch variation space, CellANOVA preserves any biological differences in the non-control samples that lie outside this space. Importantly, CellANOVA produces a batch corrected gene expression matrix which can be used for gene- and pathway-level downstream analyses, and is fast and scalable to data sets containing millions of cells.
CellANOVA can be applied to multiple settings:
- Case-control design
- Longitudinal design
- Irregular block design
For more model details, validation results and real dataset analysis, please check out our paper (to add link). If you use our method, please use the following citation:
Installation
Dependencies
Our Python package has been tested on python=3.7, 3.8, 3.9. It depends on numpy>=1.20.3, scipy>=1.7.1, pandas>=1.3.2, scikit-learn>=1.0.2, anndata>=0.7.6, scanpy>=1.8.1, harmonypy>=0.0.6.
If you use conda environment, you can use the following command for an easy setup. It will build a seperate cellanova environment, and have all dependencies installed.
conda env create -f environment.yml
Download
Tutorials
Quick Start
The following is an quick example showing CellANOVA integration pipeline.
## load required package
import anndata as ad
import scanpy as sc
from cellanova import *
## load and preprocess data
adata = sc.read_h5ad('raw_data.h5ad')
adata_prep = preprocess_data(adata, integrate_key='dataidx')
## construct control pool
control_dict = {
'pool1': list(set(adata_prep[adata_prep.obs['condition']=='control',].obs['dataidx'])),
}
## model fitting
adata_prep= calc_ME(adata_prep, integrate_key='dataidx')
adata_prep = calc_BE(adata_prep, integrate_key, control_dict)
adata_prep = calc_TE(adata_prep, integrate_key)
## create an independent anndata object for cellanova-integrated data
integrated = ad.AnnData(adata_prep.layers['denoised'], dtype=np.float32)
integrated.obs = adata_prep.obs.copy()
integrated.var_names = adata_prep.var_names
Example Notebooks
For more detailed examples, please refer to the following jupyter notebooks:
- CellANOVA integration workfolow (link)
- Evaluation of batch removal performance (link)
- Evaluation of distortion (link)
- Evaluation of signal preservation (link)
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for cellanova-0.1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 58faed1475bb47395134efa00422b7ffd18a7d1dc30320b2c437a91a0ef1cf54 |
|
MD5 | 49100bd525cffd309bb758e3165bce3f |
|
BLAKE2b-256 | 2e45160ce0f4d4ebf58ca788a86beb3b813966822990d2373135aa1996a448e7 |