Tracking-seq data analysis
Project description
OFF-TRACKER
OFF-TRACKER is an end to end pipeline of Tracking-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
System requirements
- Linux/Unix
- Python >= 3.6
Dependency
# We recommend creating a new enviroment using mamba/conda to avoid compatibility problems
# If you don't use mamba, just replace the code with conda
mamba create -n offtracker -c bioconda blast snakemake pybedtools
Installation
# Activate the environment
conda activate offtracker
# Direct installation with pip
pip install offtracker
# (Alternative) Download the offtracker from github
git clone https://github.com/Lan-lab/offtracker.git
cd offtracker
pip install .
Before analyzing samples
# Build blast index (only need once for each genome)
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
-in /Your_Path_To_Reference/hg38_genome.fa \
-out /Your_Path_To_Reference/hg38_genome.blastdb \
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
# Build chromap index (only need once for each genome)
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
-o /Your_Path_To_Reference/hg38_genome.chromap.index
# Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
# --name: the name of the sgRNA, which will be used in the following analysis
offtracker_candidates.py -t 8 -g hg38 \
-r /Your_Path_To_Reference/hg38_genome.fa \
-b /Your_Path_To_Reference/hg38_genome.blastdb \
--name 'VEGFA2' --sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG' \
-o /Your_Path_To_Candidates
Strand-specific mapping of Tracking-seq data
# Generate snakemake config file
# --subfolder: If different samples are in seperate folders, set this to 1
# if -o is not set, the output will be in the same folder as the fastq files
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
-r /Your_Path_To_Reference/hg38_genome.fa \
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
-f /Your_Path_To_Fastq \
-o /Your_Path_To_Output \
--subfolder 0
# Run the snakemake program
cd /Your_Path_To_Fastq
snakemake -np # dry run
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
## about cores
# --cores of snakemake must be larger than -t of offtracker_config.py
# parallel number = cores/t
## about output
# This part will generate "*.fw.scaled.bw" and ".rv.scaled.bw" for IGV visualization
# "*.fw.bed" and "*.rv.bed" are used in the next part.
Analyzing the genome-wide off-target sites
# In this part, multiple samples in the same condition can be analyzed in a single run by pattern recogonization of sample names
offtracker_analysis.py -g hg38 --name "VEGFA2" \
--exp 'Cas9_VEGFA2' \
--control 'WT' \
--outname 'Cas9_VEGFA_293' \
-f /Your_Path_To_Output \
--seqfolder /Your_Path_To_Candidates
# --name: the same gRNA name you set when running offtracker_candidates.py
# --exp/--control: add one or multiple patterns of file name in regular expressions
# If multiple samples meet the pattern, their signals will be averaged. Thus, only samples with the same condition should be included in a single analysis.
# This step will generate Offtracker_result_{outname}.csv
# Default FDR is 0.05, which can be changed by --fdr. This will empirically make the threshold of Track score around 2.
# Sites with Track score >=2, which is a empirical threshold, are output regardless of FDR.
# Intermediate files are saved in ./temp folder, which can be deleted.
# Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
Off-target sequences visualization
# After get the Offtracker_result_{outname}.csv, you can visualize the off-target sites with their genomic sequence with the following command:
offtracker_plot.py --result Your_Offtracker_Result_CSV \
--sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG'
# The default output is a pdf file with Offtracker_result_{outname}.pdf
# Change the suffix of the output file to change the format (e.g.: .png)
# The orange dash line indicates the empirical threshold of Track score = 2
# Empirically, the off-target sites with Track score < 2 are less likely to be real off-target sites.
Note1
The default setting only includes chr1-chr22, chrX, chrY, and chrM. Please make sure the reference genome contains "chr" at the beginning.
Currently, this software is only ready-to-use for mm10 and hg38. For any other genome, e.g., hg19, please add genome size file named "hg19.chrom.sizes" to .\offtracker\mapping and instal manually. Besides, add "--blacklist none" or "--blacklist Your_Blacklist" (e.g., ENCODE blacklist) when running offtracker_config.py, because we only provide blacklists for mm10 and hg38.
If you have a requirement for species other than human/mouse, please post an issue.
Note2
The FDRs in the Tracking-seq result do not reflect the real off-target probability. It is strongly recommended to observe the "fw.scaled.bw" and "rv.scaled.bw" using genome browser like IGV to visually inspect each target location from the Tracking-seq result.
Example Data
Here are example data that contains reads of chr6 from HEK293T cells edited with Cas9 + sgRNA VEGFA2 and wild type cells:
https://figshare.com/articles/dataset/WT_HEK239T_chr6/25956034
It takes about 5-10 minutes to run the mapping (offtracker_config.py & snakemake) of example data with -t 8 and --cores 16 (2 parallel tasks)
Signal visualization
After mapping, there will be 4 .bw files in the output folder:
Cas9_VEGFA2_chr6.fw.scaled.bw
Cas9_VEGFA2_chr6.rv.scaled.bw
WT_chr6.fw.scaled.bw
WT_chr6.rv.scaled.bw
These files can be visualized in genome browser like IGV:
Whole genome off-target analysis
For analyzing the signals (offtracker_analysis.py), it takes about 3-5 minutes and outputs a file named "Offtracker_result_{outname}.csv"
After that, you can visualize the off-target sites with their genomic sequence (offtracker_plot.py) and get an image like this:
Citation
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file offtracker-2.7.10.zip
.
File metadata
- Download URL: offtracker-2.7.10.zip
- Upload date:
- Size: 3.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d124c0671383e0cd3fc97ab6aec7b0ca557f5ee7577a00881e10ce0ee53e738e |
|
MD5 | 0fdc8b1b0642aea8c5e7aaf5515e398e |
|
BLAKE2b-256 | b64c27d4b46322f7f0a576d7894e60746796355ec3ad4e8d82ef2332fdb6a154 |