Flexible analysis of high-content CRISPR screening
ScreenPro2
The complete docs are available at screenpro2.rtfd.io.
Introduction
The functional genomics field is evolving rapidly, and many new CRISPR screen platforms are being developed. It is therefore important to have a standardized workflow for analyzing data from these screens. ScreenPro2 enables researchers to easily process and analyze data from CRISPR screens. Currently, you need a basic background in programming (especially Python) to use ScreenPro2.
ScreenPro2 is conceptually similar to the ScreenProcessing pipeline but ScreenPro2 is designed to be more modular, flexible, and extensible. Common CRISPR screen methods that we have implemented here are illustrated in a recent review paper:
Fig. 1: Common types of CRISPR screening modalities indicating advances in CRISPR methods.
Installation
ScreenPro2 is available on PyPI and can be installed with pip:
pip install ScreenPro2
For the latest version (development version) install from GitHub:
pip install git+https://github.com/ArcInstitute/ScreenPro2.git
Usage
Data analysis for CRISPR screens with NGS readouts can be broken down into three main steps:
Step 1: FASTQ to counts
Since version 0.2.7, ScreenPro2 has a built-in method to process FASTQ files and generate counts. This method is implemented in the ngs module and relevant submodules. A minor novelty of this implementation is that it can process single, dual, or multiple sgRNA CRISPR screens. It can also retain recombination events, which can occur in dual or higher-order sgRNA CRISPR screens.
There is no example code for this step yet, but a command line interface (CLI) will be available soon.
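To make the idea of retaining recombination events concrete, here is a conceptual sketch in plain Python. It is not the ScreenPro2 ngs module or its API: the toy library, the function name `count_dual_guides`, and the assumption that protospacers have already been trimmed out of the reads are all hypothetical.

```python
from collections import Counter

# Hypothetical toy library: each construct pairs a position-A and a position-B protospacer.
LIBRARY = {
    ("AAAA", "CCCC"): "geneX_pair",
    ("GGGG", "TTTT"): "geneY_pair",
}
GUIDES_A = {a for a, _ in LIBRARY}
GUIDES_B = {b for _, b in LIBRARY}


def count_dual_guides(read_pairs):
    """Count designed pairs and recombinant pairs separately.

    read_pairs: iterable of (protospacer_A, protospacer_B) strings, assumed to be
    already extracted from the paired reads.
    """
    designed, recombinant, unmapped = Counter(), Counter(), 0
    for a, b in read_pairs:
        if (a, b) in LIBRARY:
            designed[LIBRARY[(a, b)]] += 1
        elif a in GUIDES_A and b in GUIDES_B:
            # Both guides exist in the library, but not as a designed pair.
            recombinant[(a, b)] += 1
        else:
            unmapped += 1
    return designed, recombinant, unmapped


reads = [("AAAA", "CCCC"), ("AAAA", "CCCC"), ("GGGG", "TTTT"),
         ("AAAA", "TTTT"), ("NNNN", "CCCC")]
designed, recombinant, unmapped = count_dual_guides(reads)
print(designed)
print(recombinant)
print(unmapped)
```

Keeping the recombinant pairs in their own counter, rather than discarding them with the unmapped reads, lets them be inspected or analyzed later.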
Step 2: Phenotype calculation
Once you have the counts, you can use the ScreenPro2 phenoScore and phenoStats modules to calculate the phenotype scores and statistics between screen arms.
Load Data
First, load your data into an AnnData object (see anndata for more information).
The AnnData object should have the following structure:
- adata.X should be a pandas dataframe of counts (samples x oligos)
- adata.obs should be a pandas dataframe of sample metadata including "condition" and "replicate" columns
- adata.var should be a pandas dataframe of oligo metadata including "target" and "targetType" columns
  - "target" column should be the gene name or other identifier for the reference oligo
  - "targetType" column should be the type of reference oligo. Currently, negative control oligos should have "targetType" == "negCtrl"
Then you need to create a ScreenPro object. Here is example code that creates a ScreenPro object from an AnnData object:
import pandas as pd
import anndata as ad
import screenpro as scp
adata = ad.AnnData(
    X = counts_df, # pandas dataframe of counts (samples x oligos)
    obs = meta_df, # pandas dataframe of sample metadata including "condition" and "replicate" columns
    var = target_df # pandas dataframe of oligo metadata including "target" and "targetType" columns
)
screen = scp.ScreenPro(adata)
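The snippet above assumes that counts_df, meta_df, and target_df already exist. As a minimal sketch of what such tables might look like, the following builds toy versions with pandas only and checks that they line up. The sample names, oligo names, the "gene" label for non-control oligos, and the helper `check_screen_tables` are hypothetical and not part of ScreenPro2.

```python
import pandas as pd

# Hypothetical toy data: 4 samples x 3 oligos.
samples = ["T0_rep1", "T0_rep2", "Drug_rep1", "Drug_rep2"]
oligos = ["sgGENE1_1", "sgGENE1_2", "sgNTC_1"]

counts_df = pd.DataFrame(
    [[120, 95, 100], [110, 90, 105], [30, 20, 98], [25, 22, 102]],
    index=samples, columns=oligos,
)
meta_df = pd.DataFrame(
    {"condition": ["T0", "T0", "Drug", "Drug"], "replicate": [1, 2, 1, 2]},
    index=samples,
)
target_df = pd.DataFrame(
    {"target": ["GENE1", "GENE1", "negCtrl"],
     "targetType": ["gene", "gene", "negCtrl"]},  # "gene" is an illustrative label
    index=oligos,
)


def check_screen_tables(counts, meta, targets):
    """Return a list of problems; an empty list means the tables line up."""
    problems = []
    if not counts.index.equals(meta.index):
        problems.append("counts rows and sample metadata rows differ")
    if not counts.columns.equals(targets.index):
        problems.append("counts columns and oligo metadata rows differ")
    for col in ("condition", "replicate"):
        if col not in meta.columns:
            problems.append(f"sample metadata is missing '{col}'")
    for col in ("target", "targetType"):
        if col not in targets.columns:
            problems.append(f"oligo metadata is missing '{col}'")
    if "targetType" in targets.columns and not (targets["targetType"] == "negCtrl").any():
        problems.append("no negative control oligos (targetType == 'negCtrl')")
    return problems


print(check_screen_tables(counts_df, meta_df, target_df))
```

Running a check like this before building the AnnData object catches misaligned sample or oligo labels early.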
Perform Screen Processing Analysis
Once the ScreenPro
object is created, you can use several available workflows to calculate the enrichment of each oligo
between screen arms.
Drug Screen Workflow: calculate gamma, rho, and tau scores
The .calculateDrugScreen method can be used to calculate the enrichment of each gene between screen arms for a drug screen experiment. This method calculates gamma, rho, and tau scores for each gene and adds them to the .phenotypes attribute of the ScreenPro object.
Here is an example for running the workflow on a CRISPRi-dual-sgRNA-screens dataset:
# Run the ScreenPro2 workflow for CRISPRi-dual-sgRNA-screens
screen.calculateDrugScreen(
    t0='T0',
    untreated='DMSO', # replace with the label for untreated condition
    treated='Drug', # replace with the label for treated condition
    db_untreated=1, # replace with doubling rate of untreated condition
    db_treated=1, # replace with doubling rate of treated condition
    score_level='compare_reps'
)
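As a rough illustration of what the three scores compare, the sketch below follows the convention commonly used in the CRISPRi/a screening literature: gamma compares untreated to T0, tau compares treated to T0, and rho compares treated to untreated. It is not the ScreenPro2 implementation. The function `log2_enrichment`, the pseudocount, and the `growth_factor` parameter (a hypothetical stand-in for a doubling-based normalization) are assumptions, and the exact normalization in ScreenPro2 may differ.

```python
import numpy as np


def log2_enrichment(counts_from, counts_to, is_neg_ctrl, growth_factor=1.0, pseudocount=1.0):
    """Per-oligo log2(to / from), centered on negative controls."""
    a = np.asarray(counts_from, dtype=float) + pseudocount
    b = np.asarray(counts_to, dtype=float) + pseudocount
    # Convert to read fractions so sequencing depth does not drive the ratio.
    lfc = np.log2(b / b.sum()) - np.log2(a / a.sum())
    # Center on the negative controls, which are expected to show no phenotype.
    lfc = lfc - np.median(lfc[np.asarray(is_neg_ctrl, dtype=bool)])
    # Optional scaling, e.g. by a number of doublings (an assumption in this sketch).
    return lfc / growth_factor


# Hypothetical counts for 4 oligos; the last two are negative controls.
t0 = [100, 100, 100, 100]
untreated = [25, 100, 100, 100]
treated = [100, 100, 100, 100]
neg = [False, False, True, True]

gamma = log2_enrichment(t0, untreated, neg)     # untreated vs. T0
tau = log2_enrichment(t0, treated, neg)         # treated vs. T0
rho = log2_enrichment(untreated, treated, neg)  # treated vs. untreated
print(gamma, tau, rho)
```

In this toy data the first oligo drops out in the untreated arm but not in the treated arm, so its gamma is negative and its rho is positive.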
For example, in a Decitabine CRISPRi drug screen (see Figure 1B-C in this bioRxiv paper), each phenotype score represents a comparison between different arms of the screen, and the rho scores show the main drug phenotype.
Flow cytometry based screen workflow: calculate phenotype score to compare high and low bins
The .calculateFlowBasedScreen method can be used to calculate the enrichment of each target between the high bin and the low bin of a flow cytometry-based screen experiment. This method calculates a PhenoScore for each target and adds it to the .phenotypes attribute of the ScreenPro object.
# Run the ScreenPro2 workflow for a flow cytometry-based screen
screen.calculateFlowBasedScreen(
    low_bin='low_bin', high_bin='high_bin', # replace with the labels for the low and high bins
    score_level='compare_reps'
)
Step 3: Explore results and QC reports
Once the phenotypes are calculated, you can extract and explore the results using the .phenotypes attribute of the ScreenPro object. Built-in functionality for visualizing the results is currently very limited, but we are working on adding more features. In the meantime, you can extract the results and use other libraries like seaborn and matplotlib in Python or ggplot2 in R to visualize them.
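As one possible way to prepare extracted results for plotting, the sketch below labels hits in a gene-level table with pandas. The table, its column names ("target", "score", "pvalue"), the cutoffs, and the helper `label_hits` are all hypothetical; the actual layout of the .phenotypes attribute is not shown here and may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical gene-level results table; the column names are illustrative only.
results = pd.DataFrame({
    "target": ["GENE1", "GENE2", "GENE3", "GENE4"],
    "score": [-1.8, 0.1, 1.2, 0.9],
    "pvalue": [0.0005, 0.60, 0.004, 0.20],
})


def label_hits(df, score_cutoff=1.0, pvalue_cutoff=0.01):
    """Add a -log10(p) column and a hit label based on simple cutoffs."""
    out = df.copy()
    out["neg_log10_p"] = -np.log10(out["pvalue"])
    significant = out["pvalue"] < pvalue_cutoff
    out["label"] = "not_hit"
    out.loc[significant & (out["score"] >= score_cutoff), "label"] = "positive_hit"
    out.loc[significant & (out["score"] <= -score_cutoff), "label"] = "negative_hit"
    return out


labeled = label_hits(results)
print(labeled[["target", "score", "neg_log10_p", "label"]])
```

A table in this shape can be passed directly to a scatter plot of score against -log10(p), colored by label, in seaborn, matplotlib, or ggplot2.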
Supported CRISPR Screen Platforms
One of the main goals of ScreenPro2 is to make it easy to process data from commonly used CRISPR screen platforms. Also, it is designed to be modular to enable easy extension to custom CRISPR screen platforms or other commonly used platforms in addition to the ones currently implemented.
Currently, ScreenPro2 has easy-to-use workflows for the following CRISPR screen platforms:
dCas9 CRISPRa/i single-sgRNA screens
Horlbeck et al. developed a CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) screening platform that uses a single sgRNA per plasmid, with up to 10 sgRNAs per gene. The multiple sgRNAs per gene can be used to perform statistical comparisons between screen arms at the guide level or the gene level. ScreenProcessing was developed to process data from this type of screen. We reimplemented the same workflow in ScreenPro2, which has all the necessary tools to process data from this type of screen. An automated workflow / pipeline will be available soon.
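To illustrate how multiple sgRNAs per gene can be summarized at the gene level, here is a minimal sketch of one commonly used heuristic: averaging the few guides with the largest absolute score. The data, column names, and the helper `top_n_mean` are hypothetical, and this is not necessarily how ScreenPro2 aggregates guides.

```python
import pandas as pd

# Hypothetical guide-level scores (e.g. log2 enrichment between two screen arms).
guide_scores = pd.DataFrame({
    "target": ["GENE1"] * 4 + ["GENE2"] * 4,
    "score": [-2.0, -1.5, -0.1, 0.2, 0.1, -0.2, 0.3, 0.0],
})


def top_n_mean(scores, n=3):
    """Mean of the n guides with the largest absolute score."""
    strongest = scores.abs().nlargest(n).index
    return scores.loc[strongest].mean()


# One score per gene, computed from that gene's guides only.
gene_scores = guide_scores.groupby("target")["score"].apply(top_n_mean)
print(gene_scores)
```

Averaging only the strongest guides reduces the influence of inactive sgRNAs, at the cost of some bias toward extreme values; a statistical test against negative controls is typically used alongside it.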
dCas9 CRISPRa/i dual-sgRNA screens
Replogle et al. developed a CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) screening platform that uses two sgRNAs per gene within a single plasmid, and it has been used to perform genome-scale CRISPRi screens. ScreenPro2 has all the necessary tools to process data from this type of screen. An automated workflow / pipeline will be available soon.
License
ScreenPro2 is licensed under the terms of the MIT license (see LICENSE for more information) and developed by Abolfazl (Abe) Arab (@abearab), a Research Associate in the Gilbert lab at UCSF and Arc Institute.
Citation
If you use ScreenPro2 in your research, please cite the following paper.
Coming soon...