Call HS regions from CRISPR tiling screen data and predict HS region from common protein features

Copyright (C) 2019, MD Anderson Cancer Center (whe3@mdanderson.org)

ProTiler

ProTiler is a novel computational method for fine-mapping of protein regions that are hyper-sensitive to CRISPR/Cas9 mediated gene knockouts from high-throughput tiling-sgRNA functional screens.

ProTiler can also predict hyper-sensitive (HS) regions for the protein encoded by any given gene from common protein features, including conservation, domain annotation, secondary structure and PTM distribution.

If you use ProTiler, please cite the following paper, published in Nature Communications:

He et al. De novo identification of essential protein domains from CRISPR-Cas9 tiling-sgRNA knockout screens. Nat Commun 10, 4547 (2019).

Installation

ProTiler is written in Python and R. Python >= 2.7 and R >= 3.5.0 are required.

Dependencies

Python Packages:

  • matplotlib
  • pandas
  • numpy
  • seaborn

R packages:

  • breakfast
  • stringr

Step1: Install Anaconda (highly recommended)

wget https://repo.continuum.io/archive/Anaconda2-2018.12-Linux-x86_64.sh 
bash Anaconda2-2018.12-Linux-x86_64.sh 

Step2: Install required packages

Install Python Packages with pip:

pip install matplotlib==2.2.3 pandas scikit-learn numpy seaborn
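
To confirm that the Python dependencies can be imported before running ProTiler, a small check along the following lines may help. This is only a sketch for Python 3 and is not part of ProTiler: the helper name missing_packages is hypothetical, and the module list assumes the import names matplotlib, pandas, numpy, seaborn and sklearn (the import name used by scikit-learn).

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed from the pip command above (hypothetical list).
REQUIRED = ["matplotlib", "pandas", "numpy", "seaborn", "sklearn"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required Python packages were found.")
```

The check only tests whether a module can be located; it does not verify versions such as the pinned matplotlib release.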

Install R packages in R IDE:

install.packages('breakfast')
install.packages('stringr')

Step3: Install ProTiler

Through git clone

git clone https://github.com/MDhewei/ProTiler-1.0.0.git
cd ProTiler-1.0.0
python setup.py install

Through pip

pip install protiler

Usage

ProTiler has two major functions:

1. Call: Call and visualize HS regions from CRISPR tiling screen data.

ProTiler call takes a table file (.csv or .txt) recording CRISPR tiling screen data as input.

Three columns are required:

  • Symbol: the symbol of the target gene, for example: 'CREBBP', 'ACTL6A'

  • AA: the amino acid (residue) position in the protein that corresponds to the cut site of the sgRNA

  • CRISPR score: the signal for each sgRNA. In the example file, z-scores from three different cell lines are used. At least one score column must be selected.
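
As an illustration of the layout described above, the following Python sketch serializes a small table with the three kinds of columns. The score column names (zscore_cellA, zscore_cellB), all numeric values and the helper name format_tiling_table are invented placeholders, and the tab delimiter is an assumption; a real input file may contain additional annotation columns.

```python
import csv
import io

# Hypothetical rows: gene symbol, amino acid position, and one z-score
# per cell line. All values are made up for illustration only.
ROWS = [
    {"Symbol": "CREBBP", "AA": 12, "zscore_cellA": -0.4, "zscore_cellB": -0.1},
    {"Symbol": "CREBBP", "AA": 57, "zscore_cellA": -1.9, "zscore_cellB": -2.3},
    {"Symbol": "ACTL6A", "AA": 33, "zscore_cellA": -0.7, "zscore_cellB": -0.2},
]


def format_tiling_table(rows):
    """Serialize rows (a list of dicts) as tab-delimited text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",          # assumed delimiter
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(format_tiling_table(ROWS))
```

In this toy table the score columns are the third and fourth columns; whether the -s option counts columns from 0 or from 1 should be checked against the example file shipped with ProTiler.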

Arguments of the program:

Required arguments:

  • -i/--inputfile:

    the path to the input table recording tiling CRISPR sgRNA annotations and signals. The .csv, .txt and .xlsx formats are supported

  • -g/--gene_id:

    the official symbol of target gene, for example: 'CREBBP','ACTL6A'

  • -s/--score_columns:

    the column number(s) of the input table that record CRISPR knockout scores

Optional arguments:

  • -o/--outputdir:

    the directory name created in the current working directory to save output files, default='ProTilerOutput'

  • -f/--half_size:

    the number of neighboring signals on each side used to filter inefficient sgRNAs, default='5'

  • -t1/--threshold:

    threshold to suppress outliers among the signals, default='2'

  • -t2/--threshold2:

    threshold to detect change points using the TGUH method, default='1.5'

Example to run protiler call

protiler call -i sample.txt -g CREBBP -s 9,10,11 -o ProtilerOutput
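
When ProTiler is run for many genes, it can be convenient to assemble the command line from Python. The sketch below uses only the flags documented above; the helper name build_call_command is hypothetical and is not part of ProTiler.

```python
import shlex


def build_call_command(inputfile, gene_id, score_columns,
                       outputdir=None, half_size=None,
                       threshold=None, threshold2=None):
    """Assemble a `protiler call` argument list from the documented flags."""
    command = [
        "protiler", "call",
        "-i", inputfile,
        "-g", gene_id,
        "-s", ",".join(str(column) for column in score_columns),
    ]
    # Optional flags are added only when a value is given, so that
    # ProTiler falls back to its own defaults otherwise.
    optional = [("-o", outputdir), ("-f", half_size),
                ("-t1", threshold), ("-t2", threshold2)]
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    return command


if __name__ == "__main__":
    cmd = build_call_command("sample.txt", "CREBBP", [9, 10, 11],
                             outputdir="ProtilerOutput")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once ProTiler is installed and on the PATH.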

Output

1. SegmentFile: A table recording all the HS regions called by ProTiler for the target gene.

  • AA.start: the start residue position of the segments called with TGUH
  • AA.end: the end residue position of the segments called with TGUH
  • n: the number of sgRNAs targeting the region
  • m: the mean score of sgRNAs targeting the region
  • is.HS.site: whether the segment is a hyper-sensitive region
  • length: the length of the segment
  • Gene: the symbol of the target gene
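
The columns listed above can be post-processed with a few lines of Python, for example to keep only the segments flagged as HS regions. The sketch below is illustrative only: the sample rows are made up, the comma delimiter and the TRUE/FALSE encoding of is.HS.site are assumptions, and the helper name hs_regions is hypothetical.

```python
import csv
import io

# Made-up sample in the column layout listed above; a real SegmentFile
# may use a different delimiter or encode is.HS.site differently.
SAMPLE = (
    "AA.start,AA.end,n,m,is.HS.site,length,Gene\n"
    "1,340,25,-0.21,FALSE,340,CREBBP\n"
    "341,420,9,-1.85,TRUE,80,CREBBP\n"
    "421,900,31,-0.33,FALSE,480,CREBBP\n"
)


def hs_regions(handle, delimiter=","):
    """Return (start, end, mean score) tuples for segments flagged as HS."""
    regions = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row["is.HS.site"].strip().upper() == "TRUE":
            regions.append(
                (int(row["AA.start"]), int(row["AA.end"]), float(row["m"]))
            )
    return regions


if __name__ == "__main__":
    for start, end, mean in hs_regions(io.StringIO(SAMPLE)):
        print("HS region: residues %d-%d (mean score %.2f)" % (start, end, mean))
```

For a file on disk, open it and pass the file handle in place of the io.StringIO object.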

2. Figure4Visualization: A figure presenting signals, HS regions and other protein annotations.

2. Predict: Predict and visualize HS regions from common protein features.

Arguments of the program:

Required arguments:

  • -l/--gene_list:

    A comma-separated list of candidate genes for which you want to predict HS regions, e.g. CREBBP,FAM122A,AURKB

Optional arguments:

  • -b1/--bandwidth1:

    Bandwidth for PTMs kernel density estimation training

  • -b2/--bandwidth2:

    Bandwidth for SIFT score kernel density estimation training

  • -o/--outputdir:

    the directory name created to save output files

  • -m/--gamma:

    The gamma parameter for the SVM model, default='10'

  • -c/--penalty:

    The penalty parameter for the SVM model, default='0.01'
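
The two bandwidth options control how strongly the kernel density estimates are smoothed. As a rough illustration of that effect, the sketch below implements a one-dimensional Gaussian kernel density estimate in plain Python. It assumes a Gaussian kernel, which the description above does not confirm, and ProTiler's own implementation may differ; the PTM positions and the helper name gaussian_kde are hypothetical.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function estimating the density at x from 1-D sample points."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each sample point.
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


# Hypothetical PTM residue positions along a protein.
PTM_POSITIONS = [100, 105, 110, 400]

if __name__ == "__main__":
    for bandwidth in (5.0, 50.0):
        density = gaussian_kde(PTM_POSITIONS, bandwidth)
        print("bandwidth=%g: density at 105 = %.5f, at 250 = %.5f"
              % (bandwidth, density(105), density(250)))
```

A small bandwidth keeps the estimate concentrated around the clustered positions, while a large bandwidth spreads it across the sequence and blurs nearby clusters together.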

Example to run protiler predict

protiler predict -l CREBBP,FAM122A,SMARCB1,AURKB -o ProtilerOutput

Output

1. PredictionTable: A table recording all the features of the target protein and the SVM score/class at each residue position.

2. Figure4Visualization: A figure presenting predicted HS regions and other protein annotations.
