
Modelling CRISPR dropout data


A module with utility functions to process CRISPR-based screens and a method to correct gene-independent copy-number effects.

Description

Crispy uses the scikit-learn implementation of Gaussian Process Regression, fitting each sample independently.
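To make "fitting each sample independently" concrete, here is a minimal sketch that fits one scikit-learn Gaussian Process Regression per sample on synthetic data. The kernel choice, the `normalize_y` setting, the sample names and the data are all assumptions made for illustration; this is not Crispy's internal code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): a copy-number ratio per segment and the
# mean sgRNA fold change observed for it, for two unrelated samples.
ratio = np.linspace(0.5, 4.0, 30)
samples = {
    "sample_A": -0.3 * ratio + rng.normal(0, 0.05, ratio.size),
    "sample_B": -0.1 * ratio + rng.normal(0, 0.05, ratio.size),
}


def fit_sample(x, y):
    """Fit a GPR for a single sample (the kernel is an assumption)."""
    kernel = ConstantKernel() * RBF() + WhiteKernel()
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    return gpr.fit(x.reshape(-1, 1), y)


# One model per sample; data are never pooled across samples.
models = {name: fit_sample(ratio, y) for name, y in samples.items()}

# Predicted effect and its uncertainty at two copy-number ratios.
grid = np.array([[1.0], [3.0]])
for name, gpr in models.items():
    mean, std = gpr.predict(grid, return_std=True)
    print(name, mean.round(2), std.round(2))
```

Keeping a separate model per sample means each fit learns its own hyperparameters, so a sample with a strong copy-number effect does not influence the correction applied to another.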

Install

Install pybedtools, then install Crispy:

conda install -c bioconda pybedtools

pip install cy

Examples

Support for library imports:

from crispy.CRISPRData import Library

# Master Library, standardised assembly of KosukeYusa V1.1, Avana, Brunello and TKOv3 
# CRISPR-Cas9 libraries.
master_lib = Library.load_library("MasterLib_v1.csv.gz")


# Genome-wide minimal CRISPR-Cas9 library. 
minimal_lib = Library.load_library("MinLibCas9.csv.gz")

# Some of the most broadly adopted CRISPR-Cas9 libraries:
# 'Avana_v1.csv.gz', 'Brunello_v1.csv.gz', 'GeCKO_v2.csv.gz', 'Manjunath_Wu_v1.csv.gz', 
# 'TKOv3.csv.gz', 'Yusa_v1.1.csv.gz'
brunello_lib = Library.load_library("Brunello_v1.csv.gz")

Select sgRNAs (across multiple CRISPR-Cas9 libraries) for a given gene:

from crispy.GuideSelection import GuideSelection

# sgRNA selection class
gselection = GuideSelection()

# Select 5 optimal sgRNAs for MCL1 across multiple libraries 
gene_guides = gselection.select_sgrnas(
    "MCL1", n_guides=5, offtarget=[1, 0], jacks_thres=1, ruleset2_thres=.4
)

# Perform different rounds of sgRNA selection with increasingly relaxed efficiency thresholds 
gene_guides = gselection.selection_rounds("TRIM49", n_guides=5, do_amber_round=True, do_red_round=True)
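The idea of selection rounds with increasingly relaxed thresholds can be sketched in plain Python. Everything below is hypothetical: the guide records, the single efficiency score, the threshold values and the helper name are invented for illustration and do not reproduce the logic of `GuideSelection`. Only the round names echo the `do_amber_round` and `do_red_round` flags.

```python
from typing import Dict, List, Tuple

# Hypothetical sgRNA records: (sgRNA id, efficiency score in [0, 1]).
GUIDES: List[Tuple[str, float]] = [
    ("sg1", 0.72), ("sg2", 0.55), ("sg3", 0.41),
    ("sg4", 0.33), ("sg5", 0.18), ("sg6", 0.05),
]

# Hypothetical thresholds, strictest first; the values are made up.
ROUNDS: List[Tuple[str, float]] = [("green", 0.4), ("amber", 0.2), ("red", 0.0)]


def selection_rounds(guides, n_guides, rounds=ROUNDS) -> Dict[str, str]:
    """Pick up to n_guides, relaxing the threshold only while guides are missing."""
    selected: Dict[str, str] = {}  # sgRNA id -> round in which it was picked
    # Best-scoring guides first, so each round fills with its top candidates.
    ranked = sorted(guides, key=lambda g: g[1], reverse=True)
    for round_name, threshold in rounds:
        for sgrna, score in ranked:
            if len(selected) == n_guides:
                return selected
            if sgrna not in selected and score >= threshold:
                selected[sgrna] = round_name
    return selected


picked = selection_rounds(GUIDES, n_guides=5)
for sgrna, round_name in picked.items():
    print(sgrna, round_name)
```

Recording the round in which each guide was picked makes it easy to see afterwards which guides only passed under relaxed criteria.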

Copy-number correction:

import crispy as cy
import matplotlib.pyplot as plt
from crispy.CRISPRData import ReadCounts, Library

"""
Import sample data
"""
rawcounts, copynumber = cy.Utils.get_example_data()

"""
Import CRISPR-Cas9 library

Important:
      The library must have the following columns: "Chr", "Start", "End", "Approved_Symbol"
      The library and segments must use a consistent "Chr" format: "Chr1" or "chr1" or "1"
      Guarantee that the "Start" and "End" columns are int
"""
lib = Library.load_library("Yusa_v1.1.csv.gz")

lib = lib.rename(
    columns=dict(start="Start", end="End", chr="Chr", Gene="Approved_Symbol")
).dropna(subset=["Chr", "Start", "End"])

lib["Chr"] = "chr" + lib["Chr"]

lib["Start"] = lib["Start"].astype(int)
lib["End"] = lib["End"].astype(int)

"""
Calculate fold-change
"""
plasmids = ["ERS717283"]
rawcounts = ReadCounts(rawcounts).remove_low_counts(plasmids)
sgrna_fc = rawcounts.norm_rpm().foldchange(plasmids)

"""
Correct CRISPR-Cas9 sgRNA fold changes
"""
crispy = cy.Crispy(
    sgrna_fc=sgrna_fc.mean(1), copy_number=copynumber, library=lib.loc[sgrna_fc.index]
)

# Integrated function for fold-changes and correction.
# Output is a modified/expanded BED-formatted data-frame with sgRNA and segment information
#   n_sgrna: the minimum number of sgRNAs required per segment for it to be considered in the fit.
#            Recommended values range from 4 to 10.
bed_df = crispy.correct(n_sgrna=10)
print(bed_df.head())

# The fitted Gaussian Process Regression is stored in crispy.gpr
crispy.gpr.plot(x_feature="ratio", y_feature="fold_change")
plt.show()

Figure: GPR fit of fold_change against ratio.
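The fold-change step in the example above (`norm_rpm()` followed by `foldchange(plasmids)`) can be illustrated with a small, dependency-free sketch. It follows a common convention: scale each sample to reads per million, then take the log2 ratio against the plasmid control. The pseudocount, the scale factor, the sample names and the counts are assumptions, and Crispy's own implementation may differ in detail.

```python
import math

# Hypothetical raw read counts: sgRNA -> {sample: count}.
# "plasmid" is the control library; "rep1" is a screen endpoint.
counts = {
    "sgA": {"plasmid": 500, "rep1": 250},
    "sgB": {"plasmid": 300, "rep1": 300},
    "sgC": {"plasmid": 200, "rep1": 450},
}


def norm_rpm(counts, scale=1e6):
    """Scale every sample so that its counts sum to `scale` (reads per million)."""
    samples = list(next(iter(counts.values())))
    totals = {s: sum(row[s] for row in counts.values()) for s in samples}
    return {
        g: {s: row[s] / totals[s] * scale for s in samples}
        for g, row in counts.items()
    }


def log2_fold_change(rpm, control, pseudocount=1.0):
    """log2 of each sample over the control; the pseudocount avoids log(0)."""
    return {
        g: {
            s: math.log2((v + pseudocount) / (row[control] + pseudocount))
            for s, v in row.items()
            if s != control
        }
        for g, row in rpm.items()
    }


fc = log2_fold_change(norm_rpm(counts), control="plasmid")
for sgrna, values in fc.items():
    print(sgrna, {s: round(v, 2) for s, v in values.items()})
```

Negative values mark sgRNAs depleted relative to the plasmid and positive values mark enriched ones. Normalising first keeps differences in sequencing depth between samples from showing up as fold changes.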

Credits and License

Developed at the Wellcome Sanger Institute (2017-2020).

For citation please refer to:

Gonçalves E, Behan FM, Louzada S, Arnol D, Stronach EA, Yang F, Yusa K, Stegle O, Iorio F, Garnett MJ (2019) Structural rearrangements generate cell-specific, gene-independent CRISPR-Cas9 loss of fitness effects. Genome Biol 20: 27

Gonçalves E, Thomas M, Behan FM, Picco G, Pacini C, Allen F, Parry-Smith D, Iorio F, Parts L, Yusa K, Garnett MJ (2019) Minimal genome-wide human CRISPR-Cas9 library. bioRxiv

