A package for identifying the translated ORFs using ribosome-profiling data
RiboCode is a simple but high-quality computational algorithm for identifying translated ORFs genome-wide from ribosome-profiling data.
Dependencies:
pysam
pyfasta
h5py
Biopython
Numpy
Scipy
matplotlib
setuptools
Installation
RiboCode can be installed like any other Python package. Here are some common ways:
Install from PyPI:
pip install RiboCode
Install from a local archive:
pip install RiboCode-*.tar.gz
If you do not have administrator permission, install RiboCode into your own directory by adding the --user option to the installation command. Then add ~/.local/bin/ to the PATH variable and ~/.local/lib/ to the PYTHONPATH variable. For example, if you use the bash shell, add the following lines to your ~/.bashrc file (replace python2.7 with the directory of your Python version):
export PATH=$PATH:$HOME/.local/bin/
export PYTHONPATH=$HOME/.local/lib/python2.7
Then reload your ~/.bashrc file with this command:
source ~/.bashrc
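As a quick check of the directories mentioned above, Python's standard site module can report where a --user installation places its files. This is a minimal sketch and not part of RiboCode; the helper name user_install_dirs is illustrative, and the bin subdirectory assumes a POSIX layout.

```python
import os
import site


def user_install_dirs():
    """Return (scripts_dir, packages_dir) used by `pip install --user`."""
    user_base = site.getuserbase()                 # typically ~/.local on Linux
    scripts_dir = os.path.join(user_base, "bin")   # assumption: POSIX layout
    packages_dir = site.getusersitepackages()      # user site-packages directory
    return scripts_dir, packages_dir


if __name__ == "__main__":
    scripts_dir, packages_dir = user_install_dirs()
    print("Directory to add to PATH:", scripts_dir)
    print("User site-packages:", packages_dir)
    # Report whether the scripts directory is already listed in PATH.
    path_entries = [
        p.rstrip(os.sep) for p in os.environ.get("PATH", "").split(os.pathsep)
    ]
    print("Already on PATH:", scripts_dir.rstrip(os.sep) in path_entries)
```

If the last line prints False, the RiboCode commands will not be found until PATH is updated as described above.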
Tutorial to analyze ribosome-profiling data and run RiboCode
Here, we use the HEK293 dataset as an example to illustrate the use of RiboCode. Please make sure the file paths are correct.
Required files
The genome FASTA file and the GTF annotation file can be downloaded from:
or from:
http://asia.ensembl.org/info/data/ftp/index.html
http://useast.ensembl.org/info/data/ftp/index.html
For example, the required files in this tutorial can be downloaded from the following URLs:
GTF: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
FASTA: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz
The raw Ribo-seq FASTQ file can be downloaded using the fastq-dump tool from the SRA Toolkit:
fastq-dump -A <SRR1630831>
Trimming adapter sequence for ribo-seq data
Using the cutadapt program: https://cutadapt.readthedocs.io/en/stable/installation.html
Example:
cutadapt -m 20 --match-read-wildcards -a (Adapter sequence) -o <Trimmed fastq file> <Input fastq file>
For this dataset, the adapter sequences have already been trimmed off, so this step can be skipped.
Removing ribosomal RNA (rRNA) derived reads
Align the trimmed reads to rRNA sequences using Bowtie, then select unaligned reads for the next step.
Bowtie program: http://bowtie-bio.sourceforge.net/index.shtml
rRNA sequences: an rRNA.fa file is provided in the data folder of this package.
Example:
bowtie-build <rRNA.fa> rRNA
bowtie -p 8 --norc --un un_aligned.fastq rRNA -q <SRR1630831.fastq> <HEK293_rRNA.align>
Aligning the clean reads to reference genome
Using STAR program: https://github.com/alexdobin/STAR
Example:
(1). Build index
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir <hg19_STARindex> --genomeFastaFiles <hg19_genome.fa> --sjdbGTFfile <gencode.v19.annotation.gtf>
(2). Alignment:
STAR --outFilterType BySJout --runThreadN 8 --outFilterMismatchNmax 2 --genomeDir <hg19_STARindex> --readFilesIn <un_aligned.fastq> --outFileNamePrefix (HEK293) --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM GeneCounts --outFilterMultimapNmax 1 --outFilterMatchNmin 16
Running RiboCode to identify translated ORFs
(1). Preparing the transcripts annotation files:
prepare_transcripts -g <gencode.v19.annotation.gtf> -f <hg19_genome.fa> -o <RiboCode_annot>
(2). Selecting the length range of the RPF reads and identifying the P-site locations:
metaplots -a <RiboCode_annot> -r <HEK293Aligned.toTranscriptome.out.bam>
This step will generate a PDF file, which plots the aggregate profiles of the distance between the 5’-end of reads and the annotated start codons or stop codons.
Users can select the read lengths which show strong 3-nt periodicity and identify the P-site locations for each length.
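To make the selection criterion concrete, the sketch below shows one way to quantify 3-nt periodicity and guess a P-site offset for a single read length. It is an illustration only and not RiboCode's internal procedure: the input format (a dictionary mapping the distance of the read 5' end from the annotated start codon to a read count), the function names, the 0.5 threshold, and the toy counts are all assumptions made for this example. The choice is normally made by eye from the PDF produced by metaplots.

```python
# Toy profile for one read length: distance of read 5' ends from the
# annotated start codon (negative = upstream) -> read count. Invented values.
PROFILE_29NT = {
    -12: 100, -11: 10, -10: 8,
    -9: 60, -8: 12, -7: 9,
    -6: 55, -5: 11, -4: 10,
}


def frame_fractions(profile):
    """Return the fraction of 5'-end counts falling in each of the 3 frames."""
    totals = [0, 0, 0]
    for distance, count in profile.items():
        totals[distance % 3] += count   # Python's % is non-negative here
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0, 0.0, 0.0]
    return [float(t) / grand_total for t in totals]


def is_periodic(profile, min_fraction=0.5):
    """True if one frame holds at least min_fraction of the counts.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    return max(frame_fractions(profile)) >= min_fraction


def psite_offset(profile, window=(-20, 0)):
    """Guess the P-site offset from the strongest peak upstream of the start.

    Assumes that peak comes from ribosomes whose P-site sits on the start
    codon, so the offset is the distance from the 5' end to the start codon.
    """
    upstream = {d: c for d, c in profile.items() if window[0] <= d <= window[1]}
    if not upstream:
        return None
    peak = max(upstream, key=upstream.get)
    return -peak


if __name__ == "__main__":
    print("Frame fractions:", frame_fractions(PROFILE_29NT))
    print("Periodic:", is_periodic(PROFILE_29NT))
    print("P-site offset:", psite_offset(PROFILE_29NT))
```

Read lengths that fail such a check would be left out, and the offsets of the remaining lengths are the kind of values recorded as P-site parameters.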
(3). Detecting translated ORFs using the ribosome-profiling data:
RiboCode -a <RiboCode_annot> -c <config.txt> -l no -o <RiboCode_ORFs_result>
Specify the BAM file information and the P-site parameters in config.txt; please refer to the example file in the data folder.
Explanation of final result files
RiboCode generates two text files:
- "(output file name).txt" contains the information of predicted ORFs in each transcript.
- "(output file name)_collapsed.txt" combines the ORFs with the same stop codon in different transcript isoforms; the one harboring the most upstream in-frame ATG is chosen.
Some column names of the result file:
- ORF_ID: the identifier of the predicted ORF.
- ORF_type: the type of ORF. The following ORF categories are reported:
  "annotated" (overlapping the annotated CDS, with the same stop codon as the annotated CDS)
  "uORF" (upstream of the annotated CDS, not overlapping the annotated CDS)
  "dORF" (downstream of the annotated CDS, not overlapping the annotated CDS)
  "Overlap_uORF" (upstream of the annotated CDS, overlapping the annotated CDS)
  "Overlap_dORF" (downstream of the annotated CDS, overlapping the annotated CDS)
  "Internal" (inside the annotated CDS, but in a different frame relative to the annotated CDS)
  "novel" (in non-coding genes or non-coding transcripts of coding genes)
- ORF_tstart, ORF_tstop: the beginning and end of the ORF in the RNA transcript (1-based coordinate)
- ORF_gstart, ORF_gstop: the beginning and end of the ORF in the genome (1-based coordinate)
- pval_frame0_vs_frame1: significance level of the P-site density of frame0 being greater than that of frame1
- pval_frame0_vs_frame2: significance level of the P-site density of frame0 being greater than that of frame2
- pval_combined: integrated P-value
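The columns described above can be used to filter the predictions after a run. The sketch below assumes the result file is tab-delimited with a header row, which is not stated here and should be checked against your own output. The sample rows are invented for illustration, the 0.05 cutoff is arbitrary, and significant_orfs is a hypothetical helper, not part of RiboCode.

```python
import csv
import io

# Invented toy rows; a real result file contains more columns and rows.
SAMPLE = (
    "ORF_ID\tORF_type\tORF_gstart\tORF_gstop\tpval_combined\n"
    "orf_a\tannotated\t1000\t1900\t1e-12\n"
    "orf_b\tuORF\t400\t520\t0.003\n"
    "orf_c\tnovel\t7000\t7150\t0.2\n"
)


def significant_orfs(handle, max_pval=0.05, orf_types=None):
    """Yield rows with pval_combined <= max_pval, optionally limited by type.

    handle    -- open text file (assumed tab-delimited with a header row)
    orf_types -- optional set of ORF_type values to keep, e.g. {"uORF"}
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if orf_types is not None and row["ORF_type"] not in orf_types:
            continue
        if float(row["pval_combined"]) <= max_pval:
            yield row


if __name__ == "__main__":
    for row in significant_orfs(io.StringIO(SAMPLE)):
        print(row["ORF_ID"], row["ORF_type"], row["pval_combined"])
```

With a real file, replace io.StringIO(SAMPLE) with open("<RiboCode_ORFs_result>.txt"), using your own output file name.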
(4). (Optional) Plotting the P-site densities of predicted ORFs:
Users can plot the P-site density of predicted ORFs using the "plot_orf_density" command, as in the example below:
plot_orf_density -a <RiboCode_annot> -c <config.txt> -t (transcript_id) -s (ORF_gstart) -e (ORF_gstop)
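When many ORFs need plotting, the command above can be assembled from a result row. This is a sketch under assumptions: the flags are those shown in the command above, the row is a hypothetical dictionary, the presence of a transcript_id column in the result file is assumed rather than documented here, and plot_command is an illustrative helper name.

```python
import shlex


def plot_command(row, annot_dir, config_path):
    """Build the plot_orf_density argument list for one result row."""
    return [
        "plot_orf_density",
        "-a", annot_dir,
        "-c", config_path,
        "-t", row["transcript_id"],   # assumed column name
        "-s", str(row["ORF_gstart"]),
        "-e", str(row["ORF_gstop"]),
    ]


if __name__ == "__main__":
    # Hypothetical row; identifier and coordinates are placeholders.
    row = {"transcript_id": "TX_EXAMPLE", "ORF_gstart": 1000, "ORF_gstop": 1900}
    args = plot_command(row, "RiboCode_annot", "config.txt")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(a) for a in args))
    # To execute it instead: subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.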
For any questions, please contact:
Zhengtao Xiao (xzt13@mails.tsinghua.edu.cn)
Rongyao Huang (THUhry12@163.com)
Xudong Xing (xudonxing_bioinf@sina.com)
File details
Details for the file RiboCode-1.2.2.tar.gz.
File metadata
- Download URL: RiboCode-1.2.2.tar.gz
- Size: 38.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5f83b6f51062df9f4e98b9c12b533bba5f43d110b9b9468d8ce82319deee60c5
MD5 | 7357963ca3b0c3817831eafdca5bd26d
BLAKE2b-256 | 2513df4a2fe23df14382003d0b0a12f8a988339589a5b57664fe69e92e4bfddd
File details
Details for the file RiboCode-1.2.2-py2.py3-none-any.whl.
File metadata
- Download URL: RiboCode-1.2.2-py2.py3-none-any.whl
- Size: 28.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | db2a1c083aea689c3985443ef59f3e6f16c5eaf9c6bbe67b745468f6d5c3b1e8
MD5 | 37e6bc9e766f84ff8d919636c55bab8a
BLAKE2b-256 | fb6455b8fc775f2361423ca75eb48a30255cc7585c7a701b8d0f71121c43a06b