Python package to detect translating ORF from Ribo-seq data

Accurate detection of short and long active ORFs using Ribo-seq data

Installation

We highly recommend that you install ribotricer via conda:

conda install -c bioconda ribotricer

To install from source, either download the source code from a release or clone the latest version using git clone. Then change into the directory containing the source code and run

python setup.py install

NOTE: The above will also install ribotricer's dependencies. If some of these are already present, they might be replaced by the designated versions. We therefore strongly recommend creating a separate environment (using venv or conda) before installing ribotricer.

Workflow of ribotricer

To run ribotricer, you need the following three files:

  • genome annotation file in GTF format, supporting both GENCODE and Ensembl annotation
  • reference genome file in FASTA format
  • alignment file in BAM format

Preparing candidate ORFs

The first step of ribotricer is to take the GTF file and the FASTA file to find all candidate ORFs. In order to generate all candidate ORFs, please run

ribotricer prepare-orfs --gtf {GTF} --fasta {FASTA} --prefix {RIBOTRICER_INDEX_PREFIX}

By default, the command above only includes ORFs longer than 60 nts and only uses 'ATG' as the start codon. You can change these settings with the --min_orf_length and --start_codons options.

Output: {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv
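To make the two defaults concrete, the following is a minimal sketch of scanning a single transcript sequence for candidate ORFs. It is not ribotricer's implementation: the function name find_candidate_orfs is hypothetical, only the forward strand of one sequence is scanned, the stop codons are those of the standard genetic code, only the most upstream start codon before each stop is kept, and the length (counted without the stop codon) is compared inclusively against the minimum. All of these are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_candidate_orfs(seq, start_codons=("ATG",), min_orf_length=60):
    """Return sorted (start, end) half-open intervals of ORFs found in seq.

    The interval runs from the first base of the start codon up to, but not
    including, the stop codon. Illustrative only; not ribotricer's code.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the open start codon in this frame, if any
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon in start_codons:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it reaches the minimum length.
                if pos - start >= min_orf_length:
                    orfs.append((start, pos))
                start = None
    return sorted(orfs)
```

Passing a different tuple as start_codons, or a different min_orf_length, mirrors what the --start_codons and --min_orf_length options control on the command line.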

Detecting translating ORFs

The second step of ribotricer is to take the index file generated by prepare-orfs and the BAM file to detect the actively translating ORFs by assessing the periodicity of all candidate ORFs:

ribotricer detect-orfs --bam {BAM} --ribotricer_index {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv --prefix {OUTPUT_PREFIX}

NOTE: The above command uses a phase-score cutoff of 0.428 by default. Our species-specific recommended cutoffs are as follows:

Species          Cutoff
Arabidopsis      0.330
C. elegans       0.239
Baker's Yeast    0.318
Drosophila       0.181
Human/Mouse      0.428
Rat              0.453
Zebrafish        0.249
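As a convenience, the table above can be kept in a small lookup so that the matching cutoff is always passed explicitly. The sketch below only builds the command string and does not run it. The dictionary keys and the helper name detect_orfs_command are hypothetical; the flags are the ones documented in this README, including --phase_score_cutoff.

```python
import shlex

# Recommended phase-score cutoffs, copied from the table above.
# The key spellings are illustrative, not something ribotricer defines.
RECOMMENDED_CUTOFFS = {
    "arabidopsis": 0.330,
    "c_elegans": 0.239,
    "bakers_yeast": 0.318,
    "drosophila": 0.181,
    "human": 0.428,
    "mouse": 0.428,
    "rat": 0.453,
    "zebrafish": 0.249,
}


def detect_orfs_command(bam, index, prefix, species):
    """Build (but do not run) a detect-orfs command line for one species."""
    cutoff = RECOMMENDED_CUTOFFS[species]
    args = [
        "ribotricer", "detect-orfs",
        "--bam", bam,
        "--ribotricer_index", index,
        "--prefix", prefix,
        "--phase_score_cutoff", str(cutoff),
    ]
    # Quote each argument so paths with spaces survive a shell.
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, detect_orfs_command("sample.bam", "idx_candidate_orfs.tsv", "out", "rat") yields a command ending in --phase_score_cutoff 0.453.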

To assign translating or non-translating status, ribotricer uses a default phase-score cutoff of 0.428. ORFs with a phase score above the cutoff are marked as translating as long as they have at least five codons with a non-zero read count. Ribotricer does not take coverage into account when predicting whether an ORF is translating. Apart from these two criteria, there is no other requirement for an ORF to be called translating. Both the cutoff and the minimum number of valid codons can be set by the user with the --phase_score_cutoff and --min_valid_codons options, respectively. If the number of valid codons is less than --min_valid_codons, the ORF is assigned non-translating status irrespective of its phase score.
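The decision rule described above can be sketched in a few lines. This is an illustration of the two criteria only, not ribotricer's code: the function name orf_status and the status strings are hypothetical, the phase score is taken as a precomputed input, and a score exactly equal to the cutoff is treated as non-translating (an assumption based on the word "above").

```python
def orf_status(phase_score, codon_read_counts,
               phase_score_cutoff=0.428, min_valid_codons=5):
    """Classify one ORF from its phase score and per-codon read counts."""
    # A codon is "valid" when it has a non-zero read count.
    valid_codons = sum(1 for count in codon_read_counts if count > 0)
    # Too few valid codons: non-translating regardless of the phase score.
    if valid_codons < min_valid_codons:
        return "non-translating"
    if phase_score > phase_score_cutoff:
        return "translating"
    return "non-translating"
```

Note that total coverage never enters the rule: an ORF with five valid codons and a high phase score is called translating even if each codon carries a single read.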

The ORF detection step consists of several small steps including:

  1. Infer the experimental protocol (strandedness of the reads)
    You can directly assign the strandedness using the option --stranded, which can be 'yes', 'no', or 'reverse'. If this option is not provided, ribotricer will automatically infer the experimental protocol by comparing the strand of the reads to the reference.
    Output: {OUTPUT_PREFIX}_protocol.txt

  2. Split the BAM file by strand and read length
    In this step, all mapped reads are filtered to include only uniquely mapped reads. Reads are then split by strand and read length according to the strandedness provided or inferred in the previous step. If you only want to include certain read lengths, they can be specified with the option --read_lengths.
    Output: {OUTPUT_PREFIX}_bam_summary.txt

  3. Plot read length distribution
    In this step, the read length distribution is plotted to serve as quality control.
    Output: {OUTPUT_PREFIX}_read_length_dist.pdf

  4. Calculate metagene profiles
    In this step, the metagene profile of all CDS transcripts is calculated for each read length by aligning at the start codon or the stop codon.
    Output: {OUTPUT_PREFIX}_metagene_profiles_5p.tsv is the metagene profile aligned at the start codon and {OUTPUT_PREFIX}_metagene_profiles_3p.tsv is the metagene profile aligned at the stop codon.

  5. Plot metagene profiles
    In this step, metagene plots are made to serve as quality control.
    Output: {OUTPUT_PREFIX}_metagene_plots.pdf

  6. Align metagene profiles
    If the P-site offsets are not provided, this step uses cross-correlation to find the relative offsets between different read lengths.
    Output: {OUTPUT_PREFIX}_psite_offsets.txt

  7. Merge reads from different read lengths based on P-site offsets
    This step integrates reads of different read lengths by shifting them by their P-site offsets.

  8. Export WIG files
    WIG files are exported in this step for visualization in a genome browser.
    Output: {OUTPUT_PREFIX}_pos.wig for the positive strand and {OUTPUT_PREFIX}_neg.wig for the negative strand.

  9. Export actively translating ORFs
    The periodicity of all ORF profiles is assessed and the translating ones are output. You can output all ORFs regardless of their translation status with the option --report_all.
    Output: {OUTPUT_PREFIX}_translating_ORFs.tsv
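The "Align metagene profiles" and merge steps in the list above can be illustrated with a small, pure-Python sketch. It assumes that each profile is a list of 5'-end read counts per position for one read length, that one read length serves as the reference with a known P-site offset, and that the P-site position is the 5'-end position plus the offset. The function names, the max_lag window, and these conventions are assumptions for illustration; ribotricer's actual procedure may differ in detail.

```python
def best_lag(reference, profile, max_lag=5):
    """Return the lag that maximizes the cross-correlation of two profiles.

    A lag of k means profile[i] lines up with reference[i - k], i.e. the
    profile looks like the reference shifted right by k positions.
    """
    def score(lag):
        total = 0
        for i, value in enumerate(profile):
            j = i - lag
            if 0 <= j < len(reference):
                total += value * reference[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


def relative_offsets(profiles, reference_length, reference_offset):
    """Derive a P-site offset for every read length from the reference one."""
    reference = profiles[reference_length]
    return {
        read_length: reference_offset - best_lag(reference, profile)
        for read_length, profile in profiles.items()
    }


def merge_profiles(profiles, offsets):
    """Sum all profiles after moving each count to its P-site position."""
    length = len(profiles[next(iter(profiles))])
    merged = [0] * length
    for read_length, profile in profiles.items():
        for pos, count in enumerate(profile):
            psite = pos + offsets[read_length]
            if 0 <= psite < length:  # drop counts shifted past the ends
                merged[psite] += count
    return merged
```

With two toy profiles where the second is the first shifted one position to the left, relative_offsets assigns the second read length an offset one larger than the reference, and merge_profiles then stacks both 3-nt periodic signals on the same positions.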

Definition of ORF types

Ribotricer reports eight different ORF types as defined below:

  • annotated: CDS annotated in the provided GTF file
  • super_uORF: upstream ORF of the annotated CDS, not overlapping with any CDS of the same gene
  • super_dORF: downstream ORF of the annotated CDS, not overlapping with any CDS of the same gene
  • uORF: upstream ORF of the annotated CDS, not overlapping with the main CDS
  • dORF: downstream ORF of the annotated CDS, not overlapping with the main CDS
  • overlap_uORF: upstream ORF of the annotated CDS, overlapping with the main CDS
  • overlap_dORF: downstream ORF of the annotated CDS, overlapping with the main CDS
  • novel: ORF in non-coding genes or in non-coding transcripts of coding genes
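One way to read the definitions above is as a small decision procedure. The sketch below is an interpretation, not ribotricer's code: it assumes forward-strand, half-open (start, end) intervals in a single shared coordinate system, ignores introns, treats an ORF as upstream when it starts before the main CDS and as downstream when it ends after it, and returns None for cases the list does not cover (for example an ORF nested inside the CDS). The names classify_orf, main_cds, and gene_cds are hypothetical.

```python
def overlaps(a, b):
    """True when two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def classify_orf(orf, main_cds, gene_cds=()):
    """Assign one of the eight ORF types to a candidate ORF.

    orf, main_cds: (start, end) intervals; main_cds is None when the
    transcript has no annotated CDS. gene_cds: CDS intervals of the other
    transcripts of the same gene.
    """
    if main_cds is None:
        return "novel"
    if orf == main_cds:
        return "annotated"
    if orf[0] < main_cds[0]:
        kind = "uORF"
    elif orf[1] > main_cds[1]:
        kind = "dORF"
    else:
        return None  # not covered by the definitions above
    if overlaps(orf, main_cds):
        return "overlap_" + kind
    # The "super_" types additionally avoid every other CDS of the gene.
    if not any(overlaps(orf, cds) for cds in gene_cds):
        return "super_" + kind
    return kind
```

Under this reading, the "super_" types are the strictest: the same upstream ORF is a super_uORF when it avoids every CDS of the gene, and a plain uORF when it avoids only the main CDS.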

Contacts and bug reports

Andrew D. Smith

Wenzheng Li

Saket Choudhary

We are dedicated to making the best ORF detector for Ribo-seq data analysis. If you find a bug or mistake in this project, we would like to know about it. Before you send us a bug report, though, please check the following:

  1. Are you using the latest version? The bug you found may already have been fixed.
  2. Check that your input is in the correct format and you have selected the correct options.
  3. Please reduce your input to the smallest possible size that still produces the bug; we will need your input data to reproduce the problem, and the smaller you can make it, the easier it will be.


Ribotricer for detecting actively translating ORFs from Ribo-seq data Copyright (C) 2018 Andrew D Smith, Wenzheng Li, Saket Choudhary and the University of Southern California

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see

