Single Cell Pathway Activity Factor Analysis
scPAFA
scPAFA is a Python library designed for large-scale single-cell datasets. It enables rapid computation of pathway activity scores (PAS) and uncovers biologically interpretable, disease-related multicellular pathway modules: low-dimensional representations of disease-related PAS variance across multiple cell types.
Workflow
Installation
We recommend using scPAFA in a virtual environment.
conda create -n scPAFA_env python=3.10
Install from PyPI
conda activate scPAFA_env
pip install scPAFA
Install from GitHub
In your working directory:
conda activate scPAFA_env
git clone https://github.com/ZhuoliHuang/scPAFA
cd ./scPAFA
pip install .
Tutorial
Pathway input: Download pathway information and generate a pathway dictionary
The pathway input of scPAFA is a Python dictionary in which each key is a pathway name and each value is the list of genes in that pathway.
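For illustration, a dictionary of the expected shape is shown below. The pathway names and the deliberately truncated gene lists are placeholders, and `filter_pathways` is a hypothetical helper (not part of scPAFA) that keeps only genes measured in your dataset and drops pathways left with too few genes. Whether scPAFA applies such filtering internally is not covered here.

```python
# Placeholder pathway dictionary: keys are pathway names, values are gene lists.
# Gene lists are truncated for illustration only.
pathway_dict = {
    "HALLMARK_INTERFERON_GAMMA_RESPONSE": ["STAT1", "IRF1", "CXCL10", "GBP1"],
    "CUSTOM_T_CELL_ACTIVATION": ["CD69", "IL2RA", "CD40LG"],
}


def filter_pathways(pathways, available_genes, min_genes=5):
    """Hypothetical helper: restrict each pathway to measured genes and drop small ones."""
    available = set(available_genes)
    filtered = {}
    for name, genes in pathways.items():
        # dict.fromkeys removes duplicate genes while preserving their order
        kept = [g for g in dict.fromkeys(genes) if g in available]
        if len(kept) >= min_genes:
            filtered[name] = kept
    return filtered


# With AnnData, available_genes could be adata.var_names
print(filter_pathways(pathway_dict, ["STAT1", "IRF1", "CXCL10", "CD69"], min_genes=2))
# → {'HALLMARK_INTERFERON_GAMMA_RESPONSE': ['STAT1', 'IRF1', 'CXCL10']}
```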
(1) Download pathway collection
Pathway collections can be downloaded from MSigDB (the 'JSON bundle' format is recommended) or NCATS BioPlanet. Users can also supply a custom pathway collection.
(2) Generate pathway dictionary
We provide examples of constructing a pathway dictionary from the MSigDB and NCATS BioPlanet databases.
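The sketch below shows one way to build the dictionary from two common layouts. The first is an MSigDB JSON bundle, assumed to map each gene-set name to an object with a `geneSymbols` list. The second is a long table with one (pathway, gene) pair per row, such as a BioPlanet CSV. The column names `PATHWAY_NAME` and `GENE_SYMBOL`, the file names, and both function names are assumptions for illustration, not scPAFA functions; adjust them to the files you download.

```python
import json

import pandas as pd


def msigdb_bundle_to_dict(bundle):
    """Map each gene-set name to its gene list.

    Assumes the MSigDB JSON bundle layout: {set_name: {"geneSymbols": [...], ...}, ...}.
    """
    return {name: list(entry["geneSymbols"]) for name, entry in bundle.items()}


def long_table_to_dict(df, pathway_col="PATHWAY_NAME", gene_col="GENE_SYMBOL"):
    """Group a long (pathway, gene) table into {pathway: [genes]}.

    Column names are assumptions; missing values and duplicate pairs are dropped.
    """
    pairs = df[[pathway_col, gene_col]].dropna().drop_duplicates()
    return pairs.groupby(pathway_col)[gene_col].apply(list).to_dict()


# Usage (file names are placeholders):
# with open("msigdb_bundle.json") as fh:
#     pathways = msigdb_bundle_to_dict(json.load(fh))
# pathways = long_table_to_dict(pd.read_csv("bioplanet_pathways.csv"))
```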
Step 1: Calculate pathway activity scores
In step 1, the single-cell gene expression matrix and the pathway collection are used to compute PAS with ‘fast_Ucell’ or ‘fast_score_genes’. These functions are computationally more efficient implementations of UCell and AddModuleScore (known as ‘score_genes’ in Scanpy), respectively.
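To make the UCell idea concrete, the sketch below computes a UCell-style score with plain NumPy. Genes are ranked within each cell by decreasing expression, ranks above `max_rank` are set to `max_rank + 1`, and for a signature of n genes the score is 1 - U / (n * max_rank), where U = (sum of signature-gene ranks) - n(n + 1) / 2. This follows the published UCell formulation and is not scPAFA's ‘fast_Ucell’ code, whose internals and parameters are not described here. `ucell_like_scores` is a hypothetical name, the default of 1500 is chosen to match UCell's default `maxRank`, and ties are broken by column order rather than averaged (a simplification).

```python
import numpy as np


def ucell_like_scores(expr, gene_names, signature, max_rank=1500):
    """UCell-style signature score for each cell (illustrative, dense input only).

    expr: array of shape (n_cells, n_genes); gene_names: column labels of expr.
    Simplification: ties are broken by column order instead of averaged.
    """
    expr = np.asarray(expr, dtype=float)
    position = {g: i for i, g in enumerate(gene_names)}
    # Keep signature genes present in the data, without duplicates
    idx = [position[g] for g in dict.fromkeys(signature) if g in position]
    n = len(idx)
    if n == 0:
        raise ValueError("none of the signature genes are in gene_names")
    # Rank genes within each cell: highest expression gets rank 1
    order = np.argsort(-expr, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(expr.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, expr.shape[1] + 1)
    # Ranks beyond max_rank are capped at max_rank + 1
    r = ranks[:, idx].astype(float)
    r[r > max_rank] = max_rank + 1
    # Mann-Whitney U of the signature genes, normalized so scores lie in [0, 1]
    u = r.sum(axis=1) - n * (n + 1) / 2
    return 1.0 - u / (n * max_rank)


expr = [[5, 4, 3, 2, 1], [1, 2, 3, 4, 5]]
scores = ucell_like_scores(expr, ["A", "B", "C", "D", "E"], ["A", "B"], max_rank=5)
print(scores)  # first cell scores 1.0, second cell 0.4
```

A signature whose genes are the most highly expressed in a cell scores 1, and the score falls toward 0 as those genes drop down the within-cell ranking.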
Steps 2 and 3: Pseudobulk processing and MOFA model training
In step 2, the single-cell PAS matrix is aggregated into pseudobulk profiles and, together with cell-level metadata (sample/donor, cell type and technical batch), reformatted into a long-format pandas DataFrame suitable as input for Multi-Omics Factor Analysis (MOFA). In step 3, a MOFA model is trained to capture PAS variance across samples. MOFA provides a general (single-group) framework and a multi-group framework; the multi-group framework aims to identify which sources of variability are shared across groups. When batch effects are clearly known, we recommend the multi-group MOFA+ framework to correct for them. We provide examples of steps 2 and 3.
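A minimal pandas sketch of the step-2 reshaping follows. It assumes that the pseudobulk PAS is the mean score per sample and cell type, that each cell type becomes a MOFA view, that an optional sample-level column (e.g. technical batch) defines MOFA groups, and that the output uses the long-format columns accepted by mofapy2 (`sample`, `group`, `feature`, `view`, `value`). The function name `pas_to_mofa_long` and the cell-type prefix on feature names are illustrative choices, not scPAFA's API.

```python
import pandas as pd


def pas_to_mofa_long(pas, obs, sample_col="sample", celltype_col="cell_type", group_col=None):
    """Aggregate per-cell PAS into a MOFA-style long table (illustrative sketch).

    pas: DataFrame, rows = cells, columns = pathways.
    obs: DataFrame indexed like pas, holding sample, cell-type and optional group columns.
    """
    keys = [sample_col, celltype_col] + ([group_col] if group_col else [])
    merged = pas.join(obs[keys])
    # Pseudobulk: mean PAS per (sample, cell type[, group])
    pseudobulk = merged.groupby(keys, observed=True).mean()
    long = pseudobulk.reset_index().melt(id_vars=keys, var_name="pathway", value_name="value")
    long = long.rename(columns={sample_col: "sample", celltype_col: "view"})
    # Prefix features with the cell type so names stay unique across views
    long["feature"] = long["view"].astype(str) + "_" + long["pathway"].astype(str)
    long["group"] = long[group_col] if group_col else "group1"
    return long[["sample", "group", "feature", "view", "value"]]
```

Sample/cell-type combinations with no cells simply produce no rows, which MOFA treats as missing values.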
Step 4: Downstream analysis of the MOFA model
In step 4, disease-related multicellular pathway modules (a latent factor together with its pathway weights across cell types) are identified by statistical analysis that combines the trained model with sample-level clinical metadata. Downstream analyses include characterizing and interpreting multicellular pathway modules, stratifying samples/donors, and visualizing high-weight pathways.
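As one example of the statistical analysis in step 4, the sketch below tests each factor for a difference between case and control samples with a two-sided Mann-Whitney U test, applies Benjamini-Hochberg correction, and lists the features with the largest absolute weights for a chosen factor. The choice of test, the function names, and the input layout (a samples-by-factors DataFrame and a features-by-factors weight DataFrame, e.g. exported from the trained model) are assumptions; scPAFA's own downstream analysis may use other methods.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0, 1)
    return adjusted


def associate_factors(factors, is_case):
    """Test each factor (column) for case/control differences across samples (rows).

    is_case: boolean sequence in the same order as factors.index.
    """
    mask = np.asarray(is_case, dtype=bool)
    rows = []
    for f in factors.columns:
        case, ctrl = factors.loc[mask, f], factors.loc[~mask, f]
        res = mannwhitneyu(case, ctrl, alternative="two-sided")
        rows.append((f, res.pvalue, case.median() - ctrl.median()))
    out = pd.DataFrame(rows, columns=["factor", "pvalue", "median_diff"]).set_index("factor")
    out["padj"] = bh_adjust(out["pvalue"].to_numpy())
    return out.sort_values("pvalue")


def top_weighted_features(weights, factor, k=10):
    """Return the k features (e.g. cell type + pathway) with the largest |weight|."""
    w = weights[factor]
    return w.loc[w.abs().sort_values(ascending=False).index[:k]]
```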
Supported Systems
Including but not limited to:
Ubuntu 20.04
Windows 10/11
Citation
scPAFA is described in a preprint on bioRxiv:
Huang, Z., Zheng, Y., Wang, W., Zhou, W., Wei, C., Zhang, X., ... & Yin, J. (2024). Uncovering disease-related multicellular pathway modules on large-scale single-cell transcriptomes with scPAFA. bioRxiv, 2024-03.
Source Distribution
File details
Details for the file scPAFA-0.1.3.tar.gz
File metadata
- Download URL: scPAFA-0.1.3.tar.gz
- Upload date:
- Size: 125.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 62410a2b6e695393166225132747288490c76b7fede6908430c49553ad96a203
MD5 | 737aa88622eff53155d982f3291bdf85
BLAKE2b-256 | 2dba2672479991785491b73510ccd861af9ce5c895c2f40dd2d9d24c6783f8c1