CellDrift: A Python Package to Infer Temporal Patterns of Perturbation Effects in Single Cell Data
CellDrift
CellDrift: temporal perturbation effects for single cell data
Single-cell experiments are commonly used to investigate perturbation effects on gene programs. Existing models estimate perturbation responses independently at each time point, disregarding the temporal consistency of specific gene programs. We introduce CellDrift, a functional data analysis approach based on generalized linear models, to investigate temporal gene patterns in response to perturbations.
Reference
CellDrift: Inferring Perturbation Responses in Temporally-Sampled Single Cell Data. bioRxiv, April 2022 (https://www.biorxiv.org/content/10.1101/2022.04.13.488194v1)
Prerequisites
# Creating a new conda environment is recommended (Python 3.7 is recommended)
conda create -n celldrift_py python=3.7
conda activate celldrift_py # activate the celldrift environment
# Install the prerequisite package scikit-fda (development version)
pip install git+https://github.com/GAA-UAM/scikit-fda.git
Installation
git clone https://github.com/KANG-BIOINFO/CellDrift.git
cd CellDrift
pip install .
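After installation, it can help to confirm that the packages are visible to the active environment before running anything heavier. The following is a minimal sketch, not part of CellDrift: `check_modules` is a hypothetical helper, and the module list assumes the import names `CellDrift` and `scanpy` used in the Quick Start plus `skfda`, the import name of scikit-fda.

```python
import importlib.util
import sys


def check_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be located.

    Hypothetical helper for illustration; it only locates modules and does not import them.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report the interpreter version (Python 3.7 is the recommended version).
    print("Python", sys.version.split()[0])
    for name, found in check_modules(["CellDrift", "skfda", "scanpy"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any module reported as MISSING suggests that the corresponding install step ran in a different environment than the one currently activated.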
Tutorial
- Example on HIV Infection Study
- Example on Pseudo-time Data of Brain Organoid Development
- See the complete documentation
Quick Start
import numpy as np
import pandas as pd
import scanpy as sc
import CellDrift as ct
1. Load and prepare data
adata = sc.read("example.h5ad")
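adata.obs['size_factor'] = np.asarray(adata.X.sum(axis = 1)).ravel() # total counts per cell; flattening keeps this 1-D for both dense and sparse adata.X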
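The size factor above is the total count per cell. A minimal sketch of that calculation on a toy matrix follows; `total_counts_per_cell` is a hypothetical helper and the numbers are made up. It also flags cells with zero total counts, which may be worth checking before modeling.

```python
import numpy as np


def total_counts_per_cell(X):
    """Return a 1-D float array holding the summed counts of each row (cell) of X."""
    # X.sum(axis=1) is already 1-D for a dense ndarray, but it is an (n, 1)
    # matrix for np.matrix and scipy.sparse matrices; asarray + ravel flattens both.
    return np.asarray(X.sum(axis=1), dtype=float).ravel()


# Toy count matrix: 3 cells x 4 genes (illustrative values only).
counts = np.array([[1, 0, 3, 2],
                   [0, 0, 0, 0],
                   [5, 1, 0, 4]])

size_factor = total_counts_per_cell(counts)
empty_cells = np.flatnonzero(size_factor == 0)  # indices of cells with no counts
```

With an AnnData object, the same helper could be applied as `adata.obs['size_factor'] = total_counts_per_cell(adata.X)`.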
2. Set up CellDrift object
adata = ct.setup_celldrift(
    adata,
    cell_type_key = 'cell_type',
    perturb_key = 'perturb',
    time_key = 'time', # name of the time covariate; must be numeric
    control_name = 'Control',
    perturb_name = None,
    size_factor_key = 'size_factor',
    batch_key = 'batch',
    n_reps = 3,
    n_cells_perBlock = 100,
    use_pseudotime = False,
    min_cells_perGene = 0
)
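Because the time covariate must be numeric, sampling times stored as text labels need to be converted before setup. The following is a minimal sketch under stated assumptions: each label contains a single number with an optional hour or day unit, a bare number is taken to mean hours, and both `time_label_to_hours` and the unit table are hypothetical, not part of CellDrift.

```python
import re

# Hypothetical unit table: multipliers that convert each unit to hours.
_UNIT_TO_HOURS = {
    "h": 1.0, "hr": 1.0, "hour": 1.0, "hours": 1.0,
    "d": 24.0, "day": 24.0, "days": 24.0,
}


def time_label_to_hours(label):
    """Convert labels such as '6h', '2 d' or 'day 3' to a float number of hours."""
    text = str(label).strip().lower()
    number = re.search(r"\d+(?:\.\d+)?", text)
    if number is None:
        raise ValueError(f"no numeric value found in time label {label!r}")
    # Whatever remains after dropping digits, dots and whitespace is the unit.
    unit = re.sub(r"[\d.\s]", "", text)
    if unit and unit not in _UNIT_TO_HOURS:
        raise ValueError(f"unknown time unit in label {label!r}")
    # Assumption: a bare number without a unit is already in hours.
    return float(number.group()) * _UNIT_TO_HOURS.get(unit, 1.0)
```

If the labels were stored in a column named `time_label` (a hypothetical name), the numeric column could be filled with `adata.obs['time'] = adata.obs['time_label'].map(time_label_to_hours).astype(float)`.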
3. Run the GLM model
adata = ct.model_timescale(
    adata,
    n_processes = 16, # number of processes for multiprocessing
    chunksize = 100, # number of genes in each chunk
    pairwise_contrast_only = True,
    adjust_batch = False
)
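The value `n_processes = 16` may exceed the number of cores on a given machine. A minimal sketch of capping the worker count follows; `pick_n_processes` is a hypothetical helper, not part of CellDrift, and keeping one core free is only an assumption about what is reasonable.

```python
import os


def pick_n_processes(requested=16, reserve=1):
    """Cap the requested worker count at the available CPUs minus a small reserve."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    # Always return at least one process, never more than requested.
    return max(1, min(requested, available - reserve))
```

The result could then be passed as `n_processes = pick_n_processes(16)` in the call above.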
4. Set up FDA object
df_zscore = pd.read_csv('Temporal_CellDrift/Contrast_Coefficients_combined_zscores_.txt', sep = '\t', header = 0, index_col = 0) # CellDrift z scores
df_meta = pd.read_csv('Temporal_CellDrift/Contrast_Coefficients_combined_metadata_.txt', sep = '\t', header = 0, index_col = 0) # metadata of contrast comparisons
fda = ct.FDA(df_zscore, df_meta)
5. Temporal clustering
fd, genes = fda.create_fd_genes(genes = df_zscore.index.values, cell_type = 'Type_0', perturbation = 'Perturb_0')
df_cluster = ct.fda_cluster(fd, genes, n_clusters = 3)
6. Visualize each temporal cluster
ct.draw_smoothing_clusters(
    fd,
    df_cluster,
    n_neighbors = 2,
    bandwidth = 1,
    cluster_key = 'clusters_fuzzy',
    output_folder = 'Temporal_CellDrift/cluster_fuzzy/'
)