Python package to detect translating ORFs from Ribo-seq data

Maintainers

saketkc

ribotricer: Accurate detection of short and long active ORFs using Ribo-seq data

Publication | PDF | Supplementary File | Benchmarking scripts

Installation

We highly recommend that you install ribotricer via conda in a clean environment:

conda create -n ribotricer_env -c bioconda ribotricer
conda activate ribotricer_env
ribotricer --help

To install locally, you can either download the source code from a release or clone the latest version using git clone. Once you have a copy of the source code, change into the source directory and run:

make install

NOTE: ribotricer will install the following dependencies (if some of these are already present, they might be replaced by the designated version):

pyfaidx>=0.5.0
pysam>=0.11.2.2
numpy>=1.11.0
pandas>=0.20.3
scipy>=0.19.1
matplotlib>=2.1.0
click>=6.0
click-help-colors>=0.3
quicksect>=0.2.0
tqdm>=4.23.4
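
To see in advance which of these packages an existing environment already satisfies (and which might therefore be replaced), a check along the following lines may help. This is a minimal sketch using only the standard library; the helper names `version_tuple`, `is_at_least`, and `check_installed` are illustrative and not part of ribotricer, and the comparison is a simplified numeric one that ignores pre-release tags.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
REQUIREMENTS = (
    "pyfaidx>=0.5.0", "pysam>=0.11.2.2", "numpy>=1.11.0", "pandas>=0.20.3",
    "scipy>=0.19.1", "matplotlib>=2.1.0", "click>=6.0",
    "click-help-colors>=0.3", "quicksect>=0.2.0", "tqdm>=4.23.4",
)


def version_tuple(version):
    """Turn '1.2.3' into (1, 2, 3), keeping only the leading digits of each part."""
    parts = []
    for part in version.strip().split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two version tuples after padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)


def check_installed(requirements=REQUIREMENTS):
    """Return {name: 'ok' | 'too old' | 'missing'} for each 'name>=version' entry."""
    report = {}
    for requirement in requirements:
        name, minimum = requirement.split(">=")
        try:
            installed = version_tuple(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = is_at_least(installed, version_tuple(minimum))
        report[name] = "ok" if ok else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_installed().items():
        print(f"{name}: {status}")
```

Packages reported as "missing" or "too old" are the ones an installation would be expected to add or upgrade.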

Workflow of ribotricer

To run ribotricer, you need to prepare the following three files:

genome annotation file in GTF format: our implementation handles all variations of GTFs, in addition to the commonly used GENCODE and Ensembl hosted ones
reference genome file in FASTA format
alignment file in BAM format

Preparing candidate ORFs

The first step of ribotricer takes the GTF file and the FASTA file and finds all candidate ORFs. To generate all candidate ORFs, run:

ribotricer prepare-orfs --gtf {GTF} --fasta {FASTA} --prefix {RIBOTRICER_INDEX_PREFIX}

By default, the command above only includes ORFs longer than 60 nts and only uses 'ATG' as the start codon. You can change these settings with the options --min_orf_length and --start_codons.

Output: {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv
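
To make the two defaults concrete, here is a minimal sketch of a candidate-ORF scan on a single forward-strand sequence. It is not ribotricer's implementation: the function name `find_candidate_orfs` is hypothetical, it ignores splicing, strands, and the GTF entirely, and it assumes that ORF length excludes the stop codon and that the length bound is a strict "longer than".

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_orfs(sequence, start_codons=("ATG",), min_orf_length=60):
    """Return (start, end) pairs, 0-based and half-open, excluding the stop codon."""
    sequence = sequence.upper()
    orfs = []
    for start in range(len(sequence) - 2):
        if sequence[start:start + 3] not in start_codons:
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(sequence) - 2, 3):
            if sequence[pos:pos + 3] in STOP_CODONS:
                # Assumption: the length filter is strict and excludes the stop codon.
                if pos - start > min_orf_length:
                    orfs.append((start, pos))
                break
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 20 + "TAA"
    print(find_candidate_orfs(toy))
    # Allowing a non-canonical start codon, as --start_codons would.
    print(find_candidate_orfs("CTG" + "GCA" * 20 + "TAA", start_codons=("ATG", "CTG")))
```

Lowering `min_orf_length` or widening `start_codons` in this sketch mirrors what the two command-line options control, which is why the candidate list can grow quickly when either default is relaxed.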

Detecting translating ORFs

The second step of ribotricer is to take the index file generated by prepare-orfs and the BAM file to detect the actively translating ORFs by assessing the periodicity of all candidate ORFs:

ribotricer detect-orfs \
             --bam {BAM} \
             --ribotricer_index {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv \
             --prefix {OUTPUT_PREFIX}

NOTE: By default, the above command uses a phase-score cutoff of 0.428. Our species-specific recommended cutoffs are as follows:

Species	Cutoff
Arabidopsis	0.330
C. elegans	0.239
Baker's Yeast	0.318
Drosophila	0.181
Human	0.440
Mouse	0.418
Rat	0.453
Zebrafish	0.249
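
When scripting several runs, the table above can be kept as a lookup and turned into a command line. The sketch below only builds the argument list and does not run anything. It assumes that detect-orfs accepts the --phase_score_cutoff option named later in this README; please confirm with ribotricer detect-orfs --help. The function name `detect_orfs_command` is illustrative.

```python
import shlex

# Recommended cutoffs copied from the table above.
SPECIES_CUTOFFS = {
    "Arabidopsis": 0.330,
    "C. elegans": 0.239,
    "Baker's Yeast": 0.318,
    "Drosophila": 0.181,
    "Human": 0.440,
    "Mouse": 0.418,
    "Rat": 0.453,
    "Zebrafish": 0.249,
}
DEFAULT_CUTOFF = 0.428


def detect_orfs_command(bam, index, prefix, species=None):
    """Build the detect-orfs argument list, falling back to the default cutoff."""
    cutoff = SPECIES_CUTOFFS.get(species, DEFAULT_CUTOFF)
    return [
        "ribotricer", "detect-orfs",
        "--bam", bam,
        "--ribotricer_index", index,
        "--prefix", prefix,
        # Assumption: detect-orfs accepts --phase_score_cutoff.
        "--phase_score_cutoff", str(cutoff),
    ]


if __name__ == "__main__":
    command = detect_orfs_command("sample.bam", "index_candidate_orfs.tsv", "out", "Mouse")
    print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` once the option name has been confirmed.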

To assign non-translating or translating status, ribotricer by default uses a cutoff threshold of 0.428. ORFs with a phase score above 0.428 are marked as translating as long as they have at least five codons with a non-zero read count. By default, ribotricer does not take coverage into account when predicting whether an ORF is translating or non-translating. However, this behavior can be changed with the following filters:

--min_valid_codons (default=5): Minimum number of codons with non-zero reads for determining active translation
--min_valid_codons_ratio (default=0): Minimum ratio of codons with non-zero reads to total codons for determining active translation
--min_reads_per_codon (default=0): Minimum number of reads per codon for determining active translation
--min_read_density (default=0.0): Minimum read density (total_reads/length) over an ORF's total codons for determining active translation

An ORF failing any of the above filters is marked as non-translating.

For example, to ensure that each ORF has at least 3/4 of its codons non-empty, we can specify --min_valid_codons_ratio to be 0.75:

ribotricer detect-orfs \
             --bam {BAM} \
             --ribotricer_index {RIBOTRICER_INDEX_PREFIX}_candidate_orfs.tsv \
             --prefix {OUTPUT_PREFIX} \
             --min_valid_codons_ratio 0.75
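
The way the cutoff and the four filters combine can be sketched as a single predicate. This is an illustration of the rules described above, not ribotricer's code: the function name `translation_status` is hypothetical, the input is assumed to be a list of per-codon read counts, "reads per codon" is interpreted as the mean over all codons, and "length" in the read density is assumed to be in nucleotides.

```python
def translation_status(phase_score, codon_reads, phase_score_cutoff=0.428,
                       min_valid_codons=5, min_valid_codons_ratio=0.0,
                       min_reads_per_codon=0.0, min_read_density=0.0):
    """Return 'translating' only if the phase score and every filter pass."""
    n_codons = len(codon_reads)
    if n_codons == 0:
        return "non-translating"
    total_reads = sum(codon_reads)
    valid_codons = sum(1 for reads in codon_reads if reads > 0)
    checks = [
        phase_score > phase_score_cutoff,
        valid_codons >= min_valid_codons,
        valid_codons / n_codons >= min_valid_codons_ratio,
        # Assumption: mean reads per codon over the whole ORF.
        total_reads / n_codons >= min_reads_per_codon,
        # Assumption: ORF length measured in nucleotides (3 per codon).
        total_reads / (3 * n_codons) >= min_read_density,
    ]
    return "translating" if all(checks) else "non-translating"


if __name__ == "__main__":
    reads = [3, 0, 2, 1, 0, 4, 1, 2]  # 6 of 8 codons are non-empty
    print(translation_status(0.5, reads))
    print(translation_status(0.5, reads, min_valid_codons_ratio=0.8))
```

With the defaults, only the phase score and the five-valid-codon requirement can fail, which matches the statement that coverage is not considered unless a filter is raised.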

The ORF detection step consists of several small steps including:

Infer the experimental protocol (strandedness of the reads)
You can assign the strandedness directly with the option --stranded, which can be 'yes', 'no', or 'reverse'. If this option is not provided, ribotricer will automatically infer the experimental protocol by comparing the strand of the reads to the reference.
Output: {OUTPUT_PREFIX}_protocol.txt

Split the BAM file by strand and read length
In this step, all mapped reads are filtered to include only uniquely mapped reads. Reads are split by strand and read length with respect to the strandedness provided or inferred in the previous step. If you only want to include certain read lengths, they can be specified with the option --read_lengths.
Output: {OUTPUT_PREFIX}_bam_summary.txt

Plot read length distribution
In this step, the read length distribution is plotted to serve as quality control.
Output: {OUTPUT_PREFIX}_read_length_dist.pdf

Calculate metagene profiles
In this step, the metagene profile of all CDS transcripts is calculated for each read length by aligning with the start codon or the stop codon.
Output: {OUTPUT_PREFIX}_metagene_profiles_5p.tsv is the metagene profile aligned with the start codon and {OUTPUT_PREFIX}_metagene_profiles_3p.tsv is the metagene profile aligned with the stop codon

Plot metagene profiles
In this step, metagene plots are made to serve as quality control.
Output: {OUTPUT_PREFIX}_metagene_plots.pdf

Align metagene profiles
If the P-site offsets are not provided, this step uses cross-correlation to find the relative offsets between different read lengths.
Output: {OUTPUT_PREFIX}_psite_offsets.txt

Merge reads from different read lengths based on P-site offsets
This step integrates reads of different read lengths by shifting them by the P-site offsets.

Export WIG file
A WIG file is exported in this step for visualization in a genome browser.
Output: {OUTPUT_PREFIX}_pos.wig for the positive strand and {OUTPUT_PREFIX}_neg.wig for the negative strand.

Export actively translating ORFs
The periodicity of all ORF profiles is assessed and the translating ones are output. You can output all ORFs regardless of translation status with the option --report_all.
Output: {OUTPUT_PREFIX}_translating_ORFs.tsv
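
The "Align metagene profiles" step above mentions cross-correlation. As a rough illustration of the idea, and not of ribotricer's actual implementation, the sketch below finds the shift that best aligns one metagene profile to a reference profile by maximizing their overlap-based dot product. The function name `relative_offset`, the `max_shift` parameter, and the toy profiles are assumptions made for this example.

```python
import numpy as np


def relative_offset(reference, profile, max_shift=10):
    """Return the shift (in nt) that best aligns `profile` to `reference`.

    A positive value means `profile` has to be moved downstream.
    """
    a = np.asarray(reference, dtype=float)
    b = np.asarray(profile, dtype=float)
    if a.shape != b.shape:
        raise ValueError("profiles must have the same length")
    n = len(a)
    # Remove the mean so that flat background does not dominate the score.
    a = a - a.mean()
    b = b - b.mean()
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            score = float(np.dot(a[shift:], b[:n - shift]))
        else:
            score = float(np.dot(a[:n + shift], b[-shift:]))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


if __name__ == "__main__":
    # Toy reference: a start-codon peak followed by 3-nt periodicity.
    reference = np.zeros(60)
    reference[20] = 50
    reference[23::3] = 10
    # Toy profile of another read length, displaced 2 nt upstream.
    profile = np.zeros(60)
    profile[:58] = reference[2:]
    print(relative_offset(reference, profile))
```

Once such relative shifts are known for each read length, the merge step amounts to moving each read length's profile by its shift before summing the profiles.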

Definition of ORF types

Ribotricer reports eight different ORF types as defined below:

annotated: CDS annotated in the provided GTF file
super_uORF: upstream ORF of the annotated CDS, not overlapping with any CDS of the same gene (first or most upstream uORF)
super_dORF: downstream ORF of the annotated CDS, not overlapping with any CDS of the same gene (last or most downstream dORF)
uORF: upstream ORF of the annotated CDS, not overlapping with the main CDS
dORF: downstream ORF of the annotated CDS, not overlapping with the main CDS
overlap_uORF: upstream ORF of the annotated CDS, overlapping with the main CDS
overlap_dORF: downstream ORF of the annotated CDS, overlapping with the main CDS
novel: ORF in non-coding genes or in non-coding transcripts of coding genes

Learning cutoff empirically from data

Ribotricer can also learn the cutoff empirically from the data. Given at least one Ribo-seq and one RNA-seq BAM file, ribotricer runs one iteration of the algorithm on the provided files with a prespecified cutoff (--phase_score_cutoff, default: 0.428). It then uses the generated output to find the median difference between the Ribo-seq and RNA-seq phase scores, considering only candidate ORFs with transcript_type set to protein_coding (--filter_by_tx_annotation).

ribotricer learn-cutoff --ribo_bams ribo_bam1.bam,ribo_bam2.bam \
--rna_bams rna_1.bam \
--prefix ribo_rna_prefix \
--ribotricer_index {RIBOTRICER_ANNOTATION}
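
The "median difference" can be pictured with a small calculation. The sketch below is one possible reading and not ribotricer's implementation: it assumes phase scores are available per ORF for one Ribo-seq and one RNA-seq sample, pairs them by ORF identifier, and takes the median of the paired differences. How multiple BAM files are combined is not described here, so it is left out. The function name `median_phase_score_difference` and the ORF identifiers are hypothetical.

```python
from statistics import median


def median_phase_score_difference(ribo_scores, rna_scores):
    """Median of (Ribo-seq - RNA-seq) phase scores over ORFs present in both."""
    shared = sorted(set(ribo_scores) & set(rna_scores))
    if not shared:
        raise ValueError("no ORFs in common between the two samples")
    return median(ribo_scores[orf] - rna_scores[orf] for orf in shared)


if __name__ == "__main__":
    # Toy phase scores for protein_coding candidate ORFs (made-up values).
    ribo = {"orf_a": 0.75, "orf_b": 0.5, "orf_c": 0.875}
    rna = {"orf_a": 0.25, "orf_b": 0.25, "orf_c": 0.125, "orf_d": 0.3}
    print(median_phase_score_difference(ribo, rna))
```

ORFs that appear in only one sample are ignored in this sketch, so the result depends on how many protein_coding ORFs are scored in both.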

Visualizing ribotricer output

Ribotricer generates a de-noised profile of read counts for each ORF. We can visualize the read distribution for any ORF. For an example, see this notebook.

Contacts and bug reports

https://github.com/smithlabcode/ribotricer/issues

If you found a bug or mistake in this project, we would like to know about it. Before you send us the bug report though, please check the following:

Are you using the latest version? The bug you found may already have been fixed.
Check that your input is in the correct format and you have selected the correct options.
Please reduce your input to the smallest possible size that still produces the bug; we will need your input data to reproduce the problem, and the smaller you can make it, the easier it will be.

LICENSE

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Project details

Release history

1.5.0 (this version): Feb 14, 2026
1.4.0: Apr 13, 2024
1.3.3: Feb 9, 2023
1.3.2: May 4, 2020
1.3.1: Dec 13, 2019
1.3.0: Nov 1, 2019
1.2.0: Oct 30, 2019
1.1.1: Oct 25, 2019
1.1.0: Sep 27, 2019
1.0.3: Jun 13, 2019
1.0.2: May 6, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ribotricer-1.5.0.tar.gz (56.4 kB view details)

Uploaded Feb 14, 2026 Source

Built Distribution

ribotricer-1.5.0-py3-none-any.whl (62.5 kB view details)

Uploaded Feb 14, 2026 Python 3

File details

Details for the file ribotricer-1.5.0.tar.gz.

File metadata

Download URL: ribotricer-1.5.0.tar.gz
Upload date: Feb 14, 2026
Size: 56.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ribotricer-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`b5ea72c623257eaeca021b4b69fe0cd763b37b907dc7594936d54e79ad1db915`
MD5	`b5f5041fff6a803f1acd467ef72bd07e`
BLAKE2b-256	`570c8296767bc4ba64d7ae4b9abe7f9a486f2a1845d3f92c52c815e33a1aaf5f`

See more details on using hashes here.
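
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. The sketch below assumes the file has been saved locally as ribotricer-1.5.0.tar.gz; the function names `sha256_of` and `verify` are illustrative.

```python
import hashlib

# SHA256 digest for ribotricer-1.5.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "b5ea72c623257eaeca021b4b69fe0cd763b37b907dc7594936d54e79ad1db915"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Assumption: the local path is passed as the first argument.
    if len(sys.argv) > 1:
        print("match" if verify(sys.argv[1]) else "MISMATCH")
```

The same function works for the wheel if the expected digest is replaced with the one listed for ribotricer-1.5.0-py3-none-any.whl.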

Provenance

The following attestation bundles were made for ribotricer-1.5.0.tar.gz:

Publisher: publish.yml on smithlabcode/ribotricer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ribotricer-1.5.0.tar.gz
- Subject digest: b5ea72c623257eaeca021b4b69fe0cd763b37b907dc7594936d54e79ad1db915
- Sigstore transparency entry: 952359022
- Sigstore integration time: Feb 14, 2026
Source repository:
- Permalink: smithlabcode/ribotricer@7a0f3c33eb4ec5eff7258fe25808cb50b5a3b481
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/smithlabcode
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7a0f3c33eb4ec5eff7258fe25808cb50b5a3b481
- Trigger Event: release

File details

Details for the file ribotricer-1.5.0-py3-none-any.whl.

File metadata

Download URL: ribotricer-1.5.0-py3-none-any.whl
Upload date: Feb 14, 2026
Size: 62.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ribotricer-1.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a6b525082ac852e0f926094b83a89e9ede73359c8b465130c4d38fae2339b14d`
MD5	`fb12ab0871488c66f1e84c0778e91310`
BLAKE2b-256	`6d73b2815f4514fecfc5680690935a804bcbbbe280b48d0358ceb658642b7061`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ribotricer-1.5.0-py3-none-any.whl:

Publisher: publish.yml on smithlabcode/ribotricer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ribotricer-1.5.0-py3-none-any.whl
- Subject digest: a6b525082ac852e0f926094b83a89e9ede73359c8b465130c4d38fae2339b14d
- Sigstore transparency entry: 952359025
- Sigstore integration time: Feb 14, 2026
Source repository:
- Permalink: smithlabcode/ribotricer@7a0f3c33eb4ec5eff7258fe25808cb50b5a3b481
- Branch / Tag: refs/tags/v1.5.0
- Owner: https://github.com/smithlabcode
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7a0f3c33eb4ec5eff7258fe25808cb50b5a3b481
- Trigger Event: release
