Tracking-seq data analysis
OFF-TRACKER
OFF-TRACKER is an end-to-end pipeline for analyzing Tracking-seq data to detect off-target sites of any genome editing tool that generates double-strand breaks (DSBs) or single-strand breaks (SSBs).
System requirements
- Linux/Unix
- Python >= 3.6
Dependency
# We recommend creating a new environment using mamba/conda to avoid compatibility problems
# If you don't use mamba, replace "mamba" with "conda" in the command below
mamba create -n offtracker -c bioconda blast snakemake pybedtools
Installation
# Activate the environment
conda activate offtracker
# Direct installation with pip
pip install offtracker
# (Alternative) Clone offtracker from GitHub and install it from source
git clone https://github.com/Lan-lab/offtracker.git
cd offtracker
pip install .
Before analyzing samples
# Build the BLAST index (only needed once per genome)
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
-in /Your_Path_To_Reference/hg38_genome.fa \
-out /Your_Path_To_Reference/hg38_genome.blastdb \
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
# Build the chromap index (only needed once per genome)
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
-o /Your_Path_To_Reference/hg38_genome.chromap.index
# Generate candidate regions from the sgRNA sequence (needed once for each combination of genome and sgRNA)
# --name: the name of the sgRNA, which is reused in the downstream analysis
offtracker_candidates.py -t 8 -g hg38 \
-r /Your_Path_To_Reference/hg38_genome.fa \
-b /Your_Path_To_Reference/hg38_genome.blastdb \
--name 'VEGFA2' --sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG' \
-o /Your_Path_To_Candidates
Strand-specific mapping of Tracking-seq data
# Generate the snakemake config file
# --subfolder: if different samples are in separate folders, set this to 1
# If -o is not set, the output is written to the same folder as the fastq files
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
-r /Your_Path_To_Reference/hg38_genome.fa \
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
-f /Your_Path_To_Fastq \
-o /Your_Path_To_Output \
--subfolder 0
# Run the snakemake program
cd /Your_Path_To_Fastq
snakemake -np # dry run
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
## About cores
# --cores of snakemake must be larger than -t of offtracker_config.py
# number of parallel tasks = cores / t
## About output
# This part generates "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
# "*.fw.bed" and "*.rv.bed" are used in the next part.
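The relation between --cores and -t noted above can be checked before launching a long run. The following is a minimal sketch, not part of offtracker: `parallel_jobs` is a hypothetical helper name, and rounding down to a whole number of tasks is an assumption about how the division should be read.

```python
def parallel_jobs(cores: int, threads: int) -> int:
    """Return the number of mapping tasks that can run at once (cores / t).

    Hypothetical helper for illustration only; it is not an offtracker function.
    """
    if cores <= 0 or threads <= 0:
        raise ValueError("cores and threads must be positive integers")
    # Follow the comment above: --cores must be larger than -t.
    if cores <= threads:
        raise ValueError("--cores of snakemake must be larger than -t")
    # Assumption: a task needs all of its threads, so partial tasks are dropped.
    return cores // threads


if __name__ == "__main__":
    # Settings used in this README: -t 8 and --cores 16
    print(parallel_jobs(16, 8))
```

With -t 8 and --cores 16 this reports 2 parallel tasks, matching the settings used for the example data.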
Analyzing the genome-wide off-target sites
# In this part, multiple samples from the same condition can be analyzed in a single run by matching patterns against the sample names
offtracker_analysis.py -g hg38 --name "VEGFA2" \
--exp 'Cas9_VEGFA2' \
--control 'WT' \
--outname 'Cas9_VEGFA_293' \
-f /Your_Path_To_Output \
--seqfolder /Your_Path_To_Candidates
# --name: the same sgRNA name you set when running offtracker_candidates.py
# --exp/--control: one or more file name patterns, written as regular expressions
# If multiple samples match a pattern, their signals are averaged. Thus, only samples from the same condition should be included in a single analysis.
# This step generates Offtracker_result_{outname}.csv
# The default FDR cutoff is 0.05 and can be changed with --fdr. Empirically, this cutoff corresponds to a Track score threshold of about 2.
# Sites with Track score >= 2, which is an empirical threshold, are output regardless of FDR.
# Intermediate files are saved in the ./temp folder and can be deleted.
# Keeping the intermediate files speeds up later analyses that involve previously analyzed samples (e.g., using the same control samples for different analyses)
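Because --exp and --control take regular expressions, it can help to preview which files a pattern would pick up before running the analysis. The sketch below is an illustration only: `select_samples` is a hypothetical helper, and matching with `re.search` anywhere in the file name is an assumption about how the patterns are applied, not a description of offtracker's internals.

```python
import re
from typing import Iterable, List


def select_samples(file_names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the file names matched by at least one regular expression.

    Hypothetical helper; assumes a pattern may match anywhere in the name.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        name
        for name in file_names
        if any(regex.search(name) for regex in compiled)
    ]


if __name__ == "__main__":
    # File names modeled on the example data described in this README
    files = [
        "Cas9_VEGFA2_chr6.fw.bed",
        "Cas9_VEGFA2_chr6.rv.bed",
        "WT_chr6.fw.bed",
        "WT_chr6.rv.bed",
    ]
    print("exp:", select_samples(files, ["Cas9_VEGFA2"]))
    print("control:", select_samples(files, ["WT"]))
```

Under this assumption a short pattern such as 'WT' would also match any other file name containing "WT", so a preview like this is a cheap way to confirm that only samples from one condition end up averaged together.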
Off-target sequences visualization
# After obtaining Offtracker_result_{outname}.csv, you can visualize the off-target sites together with their genomic sequences using the following command:
offtracker_plot.py --result Your_Offtracker_Result_CSV \
--sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG'
# The default output is a PDF file named Offtracker_result_{outname}.pdf
# To get another format, change the suffix of the output file name (e.g., .png)
# The orange dashed line marks the empirical threshold of Track score = 2
# Empirically, sites with Track score < 2 are less likely to be real off-target sites.
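The empirical Track score threshold of 2 can also be applied to the result table directly. This is a minimal sketch under stated assumptions: `filter_by_score` is a hypothetical helper, and the column name `track_score` is a placeholder, since the actual header of Offtracker_result_{outname}.csv is not given here. Check the header of your own CSV and pass the real column name.

```python
import csv
from typing import Dict, Iterable, List


def filter_by_score(
    rows: Iterable[Dict[str, str]],
    score_column: str,
    threshold: float = 2.0,
) -> List[Dict[str, str]]:
    """Keep rows whose score is at or above the threshold.

    Hypothetical helper; score_column must be the real column name in your CSV.
    """
    return [row for row in rows if float(row[score_column]) >= threshold]


def load_rows(csv_path: str) -> List[Dict[str, str]]:
    """Read a CSV file into a list of dictionaries keyed by the header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


# Example usage (the file name and the "track_score" column are assumptions):
# rows = load_rows("Offtracker_result_Cas9_VEGFA_293.csv")
# confident = filter_by_score(rows, score_column="track_score")
```

Filtering this way only narrows the list. It does not replace inspecting the signal tracks of each remaining site.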
Note1
The default setting only includes chr1-chr22, chrX, chrY, and chrM. Please make sure the chromosome names in the reference genome start with "chr".
Currently, this software is ready to use only for mm10 and hg38. For any other genome, e.g., hg19, please add a genome size file named "hg19.chrom.sizes" to ./offtracker/mapping and install manually. In addition, pass "--blacklist none" or "--blacklist Your_Blacklist" (e.g., the ENCODE blacklist) when running offtracker_config.py, because we only provide blacklists for mm10 and hg38.
If you need support for species other than human and mouse, please post an issue.
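Both requirements in this note (the "chr" prefix and a genome size file) can be checked or prepared from the reference FASTA itself. The sketch below is not part of offtracker: the function names are hypothetical, the path in the usage comment is a placeholder, and it assumes the size file uses the common two-column, tab-separated layout (sequence name, length), which you should confirm against the files shipped for mm10 and hg38.

```python
from typing import Dict, Iterable, List, Optional


def chrom_sizes(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Compute the length of every sequence in a FASTA file, read line by line."""
    sizes: Dict[str, int] = {}
    name: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first word after ">"
            fields = line[1:].split()
            name = fields[0] if fields else ""
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def missing_chr_prefix(sizes: Dict[str, int]) -> List[str]:
    """List the sequence names that do not start with "chr"."""
    return [name for name in sizes if not name.startswith("chr")]


def write_chrom_sizes(sizes: Dict[str, int], path: str) -> None:
    """Write "name<TAB>length" lines (assumed layout of a .chrom.sizes file)."""
    with open(path, "w") as handle:
        for name, size in sizes.items():
            handle.write(f"{name}\t{size}\n")


# Example usage (the FASTA path is a placeholder):
# with open("/Your_Path_To_Reference/hg19_genome.fa") as fasta:
#     sizes = chrom_sizes(fasta)
# print(missing_chr_prefix(sizes))   # names that lack the "chr" prefix
# write_chrom_sizes(sizes, "hg19.chrom.sizes")
```

Reading the FASTA line by line keeps memory use low even for a full genome, at the cost of a single pass over the file.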
Note2
The FDRs in the Tracking-seq result do not reflect the real off-target probability. We strongly recommend loading the "fw.scaled.bw" and "rv.scaled.bw" files in a genome browser such as IGV and visually inspecting each target location reported in the Tracking-seq result.
Example Data
The example data below contain chr6 reads from HEK293T cells edited with Cas9 + sgRNA VEGFA2 and from wild-type cells:
https://figshare.com/articles/dataset/WT_HEK239T_chr6/25956034
Mapping the example data (offtracker_config.py and snakemake) takes about 5-10 minutes with -t 8 and --cores 16 (2 parallel tasks).
Signal visualization
After mapping, there will be 4 .bw files in the output folder:
Cas9_VEGFA2_chr6.fw.scaled.bw
Cas9_VEGFA2_chr6.rv.scaled.bw
WT_chr6.fw.scaled.bw
WT_chr6.rv.scaled.bw
These files can be visualized in a genome browser such as IGV.
Whole genome off-target analysis
Analyzing the signals (offtracker_analysis.py) takes about 3-5 minutes and outputs a file named "Offtracker_result_{outname}.csv".
After that, you can visualize the off-target sites together with their genomic sequences using offtracker_plot.py, which produces a plot of these sites.
Citation