Modelling CRISPR dropout data
A module with utility functions to process CRISPR-based screens and a method to correct gene-independent copy-number effects.
Description
Crispy uses the scikit-learn implementation of Gaussian Process Regression, fitting each sample independently.
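To illustrate the general idea, here is a minimal sketch of fitting a Gaussian Process to one sample with scikit-learn. It is not Crispy's internal code: the kernel choice, the helper name `fit_copy_number_trend`, the synthetic data, and the step of subtracting the fitted trend are all assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel


def fit_copy_number_trend(copy_ratio, fold_change, random_state=0):
    """Fit a GP of fold-change on copy-number ratio for ONE sample.

    Hypothetical helper: the kernel below is an assumed, generic choice.
    """
    x = np.asarray(copy_ratio, dtype=float).reshape(-1, 1)
    y = np.asarray(fold_change, dtype=float)
    # Smooth trend (constant * RBF) plus a noise term (WhiteKernel).
    kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(
        kernel=kernel, normalize_y=True, random_state=random_state
    )
    gpr.fit(x, y)
    return gpr


# Synthetic single-sample data: fold-change decreases as copy-number ratio increases.
rng = np.random.default_rng(0)
ratio = rng.uniform(0.5, 4.0, size=60)
fold_change = -0.4 * ratio + rng.normal(0.0, 0.2, size=60)

gpr = fit_copy_number_trend(ratio, fold_change)
trend = gpr.predict(ratio.reshape(-1, 1))

# Assumed correction step: remove the fitted copy-number trend.
corrected = fold_change - trend
```

Fitting each sample independently would then amount to calling such a helper once per sample, so that each sample gets its own fitted trend.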
Install
Install pybedtools and then install Crispy:
conda install -c bioconda pybedtools
pip install cy
Examples
Support for library imports:
from crispy.CRISPRData import Library

# Master Library, standardised assembly of KosukeYusa V1.1, Avana, Brunello and TKOv3
# CRISPR-Cas9 libraries.
master_lib = Library.load_library("MasterLib_v1.csv.gz")

# Genome-wide minimal CRISPR-Cas9 library.
minimal_lib = Library.load_library("MinLibCas9.csv.gz")

# Some of the most broadly adopted CRISPR-Cas9 libraries:
# 'Avana_v1.csv.gz', 'Brunello_v1.csv.gz', 'GeCKO_v2.csv.gz', 'Manjunath_Wu_v1.csv.gz',
# 'TKOv3.csv.gz', 'Yusa_v1.1.csv.gz'
brunello_lib = Library.load_library("Brunello_v1.csv.gz")
Select sgRNAs (across multiple CRISPR-Cas9 libraries) for a given gene:
from crispy.GuideSelection import GuideSelection

# sgRNA selection class
gselection = GuideSelection()

# Select 5 optimal sgRNAs for MCL1 across multiple libraries
gene_guides = gselection.select_sgrnas(
    "MCL1", n_guides=5, offtarget=[1, 0], jacks_thres=1, ruleset2_thres=.4
)

# Perform different rounds of sgRNA selection with increasingly relaxed efficiency thresholds
gene_guides = gselection.selection_rounds("TRIM49", n_guides=5, do_amber_round=True, do_red_round=True)
Copy-number correction:
import crispy as cy
import matplotlib.pyplot as plt

from crispy.CRISPRData import ReadCounts, Library

"""
Import sample data
"""
rawcounts, copynumber = cy.Utils.get_example_data()

"""
Import CRISPR-Cas9 library

Important:
    Library has to have the following columns: "Chr", "Start", "End", "Approved_Symbol"
    Library and segments have to have consistent "Chr" formatting: "Chr1" or "chr1" or "1"
    Guarantee that "Start" and "End" columns are int
"""
lib = Library.load_library("Yusa_v1.1.csv.gz")

lib = lib.rename(
    columns=dict(start="Start", end="End", chr="Chr", Gene="Approved_Symbol")
).dropna(subset=["Chr", "Start", "End"])

lib["Chr"] = "chr" + lib["Chr"]
lib["Start"] = lib["Start"].astype(int)
lib["End"] = lib["End"].astype(int)

"""
Calculate fold-change
"""
plasmids = ["ERS717283"]
rawcounts = ReadCounts(rawcounts).remove_low_counts(plasmids)
sgrna_fc = rawcounts.norm_rpm().foldchange(plasmids)

"""
Correct CRISPR-Cas9 sgRNA fold changes
"""
crispy = cy.Crispy(
    sgrna_fc=sgrna_fc.mean(1), copy_number=copynumber, library=lib.loc[sgrna_fc.index]
)

# Fold-changes and correction integrated function.
# Output is a modified/expanded BED-formatted data-frame with sgRNA and segments information.
# n_sgrna: represents the minimum number of sgRNAs required per segment to consider in the fit.
# Recommended default values range between 4-10.
bed_df = crispy.correct(n_sgrna=10)
print(bed_df.head())

# Gaussian Process Regression is stored
crispy.gpr.plot(x_feature="ratio", y_feature="fold_change")
plt.show()
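The fold-change step above chains a reads-per-million normalisation with a comparison against the plasmid control. The sketch below shows that general calculation in plain Python. It is an illustration under stated assumptions, not the code behind `norm_rpm()` or `foldchange()`: the function names `rpm_normalise` and `log2_fold_change`, the pseudocount of 1, and the toy counts are all hypothetical.

```python
import math


def rpm_normalise(counts, pseudocount=1.0):
    """Scale raw sgRNA counts to reads per million, plus an assumed pseudocount."""
    total = sum(counts.values())
    return {sgrna: c / total * 1e6 + pseudocount for sgrna, c in counts.items()}


def log2_fold_change(sample_rpm, control_rpm):
    """log2 ratio of sample over control (e.g. plasmid) for each sgRNA."""
    return {
        sgrna: math.log2(sample_rpm[sgrna] / control_rpm[sgrna])
        for sgrna in sample_rpm
    }


# Toy counts (hypothetical): sg1 is depleted, sg2 is enriched, sg3 is unchanged.
plasmid_counts = {"sg1": 100, "sg2": 100, "sg3": 200}
sample_counts = {"sg1": 50, "sg2": 150, "sg3": 200}

fc = log2_fold_change(rpm_normalise(sample_counts), rpm_normalise(plasmid_counts))
for sgrna, value in fc.items():
    print(f"{sgrna}: {value:.3f}")
```

Normalising to a common library size first means that differences in sequencing depth between the sample and the plasmid do not show up as spurious fold changes.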
Credits and License
Developed at the Wellcome Sanger Institute (2017-2020).
For citation please refer to: