
CellDrift: A Python Package to Infer Temporal Patterns of Perturbation Effects in Single-Cell Data


CellDrift

CellDrift: temporal perturbation effects for single cell data

Perturbation effects on gene programs are commonly investigated in single-cell experiments. Existing models measure perturbation responses independently at each time point of a time series, disregarding the temporal consistency of specific gene programs. We introduce CellDrift, a functional data analysis approach based on generalized linear models, to investigate temporal gene patterns in response to perturbations.

Reference

CellDrift: Inferring Perturbation Responses in Temporally-Sampled Single Cell Data. BioRxiv. Apr 2022 (https://www.biorxiv.org/content/10.1101/2022.04.13.488194v1)

Prerequisite

# Create a new conda environment (Python 3.7 is recommended)
conda create -n celldrift_py python=3.7
# Activate the CellDrift environment
conda activate celldrift_py
# Install the prerequisite package scikit-fda (development version)
pip install git+https://github.com/GAA-UAM/scikit-fda.git

Installation

git clone https://github.com/KANG-BIOINFO/CellDrift.git
cd CellDrift
pip install .

Tutorial

Quick Start

import numpy as np
import pandas as pd
import scanpy as sc
import CellDrift as ct

1. Load and prepare data

adata = sc.read("example.h5ad")
adata.obs['size_factor'] = np.asarray(adata.X.sum(axis = 1)).ravel() # total counts per cell; flattening also handles sparse matrices

2. Set up CellDrift object

adata = ct.setup_celldrift(
    adata, 
    cell_type_key = 'cell_type',
    perturb_key = 'perturb', 
    time_key = 'time', # name of the time covariate; its values must be numeric
    control_name = 'Control', 
    perturb_name = None, 
    size_factor_key = 'size_factor', 
    batch_key = 'batch', 
    n_reps = 3,
    n_cells_perBlock = 100,
    use_pseudotime = False,
    min_cells_perGene = 0
)
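Because the setup step refers to several annotation columns by name and requires a numeric time covariate, it can help to check the annotations before calling it. The following is a minimal sketch, not part of CellDrift: `check_obs` is a hypothetical helper, and the toy table stands in for `adata.obs` under the assumption that all keys above name columns there.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def check_obs(obs, required_keys, time_key, perturb_key, control_name):
    """Return a list of human-readable problems found in an obs-like table."""
    # Every key passed to the setup step should exist as a column.
    problems = [f"missing column: {k}" for k in required_keys if k not in obs.columns]
    # The time covariate must be numeric.
    if time_key in obs.columns and not is_numeric_dtype(obs[time_key]):
        problems.append(f"'{time_key}' is not numeric")
    # The control label should occur among the perturbation labels.
    if perturb_key in obs.columns and control_name not in set(obs[perturb_key]):
        problems.append(f"'{control_name}' not found in '{perturb_key}'")
    return problems

# Toy stand-in for adata.obs (hypothetical values).
obs = pd.DataFrame({
    "cell_type": ["Type_0", "Type_0", "Type_1"],
    "perturb": ["Control", "Perturb_0", "Perturb_0"],
    "time": ["0h", "6h", "12h"],  # strings on purpose, to trigger the check
    "size_factor": [100.0, 120.0, 90.0],
})
keys = ["cell_type", "perturb", "time", "size_factor", "batch"]
problems = check_obs(obs, keys, "time", "perturb", "Control")
print(problems)
```

An empty list means the table passed these basic checks; it does not guarantee that the setup step itself will succeed.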

3. Run the GLM

adata = ct.model_timescale(
    adata, 
    n_processes = 16, # number of processes for multiprocessing
    chunksize = 100, # number of genes in each chunk
    pairwise_contrast_only = True, 
    adjust_batch = False
)
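To get a feel for how `chunksize` relates to the number of genes, the sketch below splits a gene list into consecutive chunks. It only illustrates the idea of "number of genes in each chunk"; `chunk_genes` is a hypothetical helper, and how CellDrift distributes chunks across processes internally is not shown here.

```python
def chunk_genes(genes, chunksize):
    """Split a list of gene names into consecutive chunks of at most `chunksize`."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    return [genes[i:i + chunksize] for i in range(0, len(genes), chunksize)]

# 250 hypothetical genes with chunksize = 100 give three chunks.
genes = [f"gene_{i}" for i in range(250)]
chunks = chunk_genes(genes, chunksize=100)
print([len(c) for c in chunks])  # [100, 100, 50]
```

With far fewer chunks than processes, some processes would have nothing to do, so a smaller `chunksize` may be worth trying on small gene sets.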

4. Set up FDA object

df_zscore = pd.read_csv('Temporal_CellDrift/Contrast_Coefficients_combined_zscores_.txt', sep = '\t', header = 0, index_col = 0) # CellDrift z scores
df_meta = pd.read_csv('Temporal_CellDrift/Contrast_Coefficients_combined_metadata_.txt', sep = '\t', header = 0, index_col = 0) # metadata of contrast comparisons

fda = ct.FDA(df_zscore, df_meta)
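The sketch below shows one way the two tables could fit together, using toy data. It assumes, without confirmation from this page, that the z-score table has genes as rows and contrasts as columns, and that the metadata table is indexed by contrast name with `cell_type`, `perturbation`, and `time` columns. `curves_for` is a hypothetical helper, not a CellDrift function.

```python
import pandas as pd

# Toy z-score table: rows are genes, columns are contrast names (assumed layout).
df_zscore = pd.DataFrame(
    {"c_t2": [0.5, -1.0], "c_t0": [0.1, -0.2], "c_t1": [0.3, -0.6], "other": [9.0, 9.0]},
    index=["GeneA", "GeneB"],
)
# Toy metadata: one row per contrast (assumed column names).
df_meta = pd.DataFrame(
    {
        "cell_type": ["Type_0", "Type_0", "Type_0", "Type_1"],
        "perturbation": ["Perturb_0"] * 4,
        "time": [2, 0, 1, 0],
    },
    index=["c_t2", "c_t0", "c_t1", "other"],
)

def curves_for(df_zscore, df_meta, cell_type, perturbation):
    """Return a genes x time table of z-scores for one cell type and perturbation."""
    mask = (df_meta["cell_type"] == cell_type) & (df_meta["perturbation"] == perturbation)
    keep = df_meta[mask].sort_values("time")       # order contrasts by time
    curves = df_zscore.loc[:, keep.index].copy()   # select the matching columns
    curves.columns = keep["time"].to_numpy()       # label columns by time point
    return curves

curves = curves_for(df_zscore, df_meta, "Type_0", "Perturb_0")
print(curves)
```

Each row of the result is one gene's z-score trajectory over time, which is the kind of curve the following steps cluster and plot.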

5. Temporal clustering

fd, genes = fda.create_fd_genes(genes = df_zscore.index.values, cell_type = 'Type_0', perturbation = 'Perturb_0')
df_cluster = ct.fda_cluster(fd, genes, n_clusters = 3)
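To illustrate what grouping genes by temporal pattern means, here is a small NumPy sketch that clusters toy z-score curves with plain hard k-means. This is a stand-in chosen for brevity, not the method behind `ct.fda_cluster`. The function `kmeans_curves` and the toy curves are hypothetical.

```python
import numpy as np

def kmeans_curves(curves, n_clusters, n_iter=50):
    """Hard k-means on the rows of a genes x time points array."""
    curves = np.asarray(curves, dtype=float)
    # Deterministic initialisation: evenly spaced rows serve as starting centres.
    start = np.linspace(0, len(curves) - 1, n_clusters).astype(int)
    centers = curves[start].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every curve to every centre.
        dist = ((curves[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = curves[labels == k]
            if len(members):  # keep the old centre if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Six toy genes measured at four time points: rising, falling, and flat pairs.
curves = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [0.1, 1.1, 2.1, 2.9],
    [3.0, 2.0, 1.0, 0.0],
    [2.9, 2.1, 0.9, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0, 1.0],
])
labels, centers = kmeans_curves(curves, n_clusters=3)
print(labels)  # [0 0 1 1 2 2]
```

Genes with similar trajectories receive the same label. A fuzzy method would instead give each gene a degree of membership in every cluster.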

6. Visualization for each temporal cluster

ct.draw_smoothing_clusters(
    fd, 
    df_cluster, 
    n_neighbors = 2, 
    bandwidth = 1,
    cluster_key = 'clusters_fuzzy', 
    output_folder = 'Temporal_CellDrift/cluster_fuzzy/'
)


