
A package for identifying the translated ORFs using ribosome-profiling data

Project description

RiboCode is a simple yet high-quality computational algorithm for identifying translated ORFs genome-wide from ribosome-profiling data.

Dependencies:

  • pysam

  • pyfasta

  • h5py

  • Biopython

  • NumPy

  • SciPy

  • matplotlib

  • setuptools

Installation

RiboCode can be installed like any other Python package. Here are some common ways:

  • Install from PyPI:

pip install RiboCode
  • Install from a local source archive:

pip install RiboCode-*.tar.gz

If you do not have administrator permission, you can install *RiboCode* in your own directory by adding the
option ``--user`` to the installation commands. Then add ``~/.local/bin/`` to the ``PATH`` variable,
and the user site-packages directory under ``~/.local/lib/`` to the ``PYTHONPATH`` variable. For example, if you use the bash shell,
add the following lines to your ``~/.bashrc`` file (replace ``python2.7`` with your Python version):
export PATH=$PATH:$HOME/.local/bin/
export PYTHONPATH=$HOME/.local/lib/python2.7/site-packages:$PYTHONPATH

Then reload your ~/.bashrc file with:

source ~/.bashrc
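
To confirm that the installation worked, a small check like the one below can be run with the same Python 3 interpreter. It is a sketch, not part of RiboCode: the import names (for example ``Bio`` for Biopython) are our assumption based on the dependency list above, and `user_bin_on_path` only tests whether the ``PATH`` entry added to ``~/.bashrc`` is active.

```python
import importlib.util
import os

# Import names assumed for the dependencies listed above (Biopython is imported as "Bio").
DEPENDENCIES = ["pysam", "pyfasta", "h5py", "Bio", "numpy", "scipy", "matplotlib", "setuptools"]


def missing_dependencies(names=DEPENDENCIES):
    """Return the import names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def user_bin_on_path(path_var=None):
    """Return True if ~/.local/bin (where `pip install --user` puts scripts) is listed in PATH."""
    if path_var is None:
        path_var = os.environ.get("PATH", "")
    user_bin = os.path.normpath(os.path.expanduser("~/.local/bin"))
    entries = [os.path.normpath(p) for p in path_var.split(os.pathsep) if p]
    return user_bin in entries


if __name__ == "__main__":
    print("Missing dependencies:", missing_dependencies() or "none")
    print("~/.local/bin on PATH:", user_bin_on_path())
```

An empty "missing" list and ``True`` for the PATH check suggest the ``--user`` setup above is in effect.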

Tutorial to analyze ribosome-profiling data and run RiboCode

Here, we use the HEK293 dataset as an example to illustrate how to use RiboCode. Please make sure all file paths are correct.

  1. Required files

    The genome FASTA file and the GTF annotation file can be downloaded from:

    http://www.gencodegenes.org

    or from:

    http://asia.ensembl.org/info/data/ftp/index.html

    http://useast.ensembl.org/info/data/ftp/index.html

    For example, the files required in this tutorial can be downloaded from the following URLs:

    GTF: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

    FASTA: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz

    The raw Ribo-seq FASTQ file can be downloaded using the fastq-dump tool from the SRA Toolkit:

    fastq-dump -A <SRR1630831>
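
    The GENCODE files above are gzip-compressed, while the example commands in the later steps use uncompressed names (``<hg19_genome.fa>``, ``<gencode.v19.annotation.gtf>``). The command-line ``gunzip`` tool works; from Python, a streaming decompression could look like the following sketch (the function name `gunzip` is ours).

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None, chunk_size=1 << 20):
    """Stream-decompress a .gz file without loading it fully into memory.

    If dest is omitted, the ".gz" suffix is dropped, e.g.
    gencode.v19.annotation.gtf.gz -> gencode.v19.annotation.gtf
    """
    src = Path(src)
    if dest is None:
        if src.suffix != ".gz":
            raise ValueError(f"expected a .gz file, got {src}")
        dest = src.with_suffix("")
    dest = Path(dest)
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dest


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print("wrote", gunzip(path))
```

    For example, gunzip("gencode.v19.annotation.gtf.gz") writes gencode.v19.annotation.gtf next to the input file.
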
  2. Trimming adapter sequences from Ribo-seq data

    Use the cutadapt program: https://cutadapt.readthedocs.io/en/stable/installation.html

    Example:

    cutadapt -m 20 --match-read-wildcards -a (Adapter sequence) -o <Trimmed fastq file> <Input fastq file>

    Here, the adapter sequences in this dataset have already been trimmed, so this step can be skipped.
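
    For datasets that still carry adapters, the idea behind the cutadapt command above (remove a 3' adapter, then discard reads shorter than ``-m 20``) can be illustrated with the simplified sketch below. Unlike cutadapt, it accepts only exact matches (no mismatches or wildcards), so it is for illustration rather than a replacement; the function names and the adapter sequence used in the test are placeholders.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter using exact matching only (a simplification of cutadapt).

    1. If the full adapter occurs in the read, cut at its first occurrence.
    2. Otherwise, if the read ends with the first k bases of the adapter
       (k >= min_overlap), cut those k bases.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be >= 1")
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Try the longest partial adapter first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_and_filter(reads, adapter, min_length=20):
    """Trim each read and keep only those with at least min_length bases (like cutadapt -m)."""
    kept = []
    for read in reads:
        trimmed = trim_3prime_adapter(read, adapter)
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept
```

    The min_overlap threshold is a trade-off: a read that happens to end with the first few adapter bases will lose them, while a larger value leaves short adapter remnants untrimmed.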

  3. Removing ribosomal RNA (rRNA)-derived reads

    Align the trimmed reads to rRNA sequences using Bowtie, then select unaligned reads for the next step.

    Bowtie program: http://bowtie-bio.sourceforge.net/index.shtml

    rRNA sequences: we provide an rRNA.fa file in the data folder of this package.

    Example:

    bowtie-build <rRNA.fa> rRNA
    bowtie -p 8 --norc --un un_aligned.fastq rRNA -q <SRR1630831.fastq> <HEK293_rRNA.align>
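
    Bowtie's ``--un`` option writes the reads that failed to align. If you already have a Bowtie alignment file but no ``--un`` output, the same selection can be sketched in Python: collect read names from column 1 of Bowtie's default tab-delimited output and keep the FASTQ records whose names are absent. The function names are ours, and the sketch assumes plain 4-line FASTQ records.

```python
def aligned_read_ids(bowtie_output):
    """Read names from Bowtie's default output (column 1), trimmed at the first whitespace."""
    ids = set()
    for line in bowtie_output:
        if line.strip():
            ids.add(line.split("\t", 1)[0].split()[0])
    return ids


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, plus, qual


def write_unaligned(fastq_in, aligned_ids, fastq_out):
    """Copy records whose read name is not in aligned_ids; return how many were kept."""
    kept = 0
    for header, seq, plus, qual in read_fastq(fastq_in):
        if header[1:].split()[0] not in aligned_ids:
            fastq_out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept
```

    In practice, passing ``--un`` directly, as in the command above, is simpler and avoids a second pass over the FASTQ file.
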
  4. Aligning the clean reads to the reference genome

    Use the STAR aligner: https://github.com/alexdobin/STAR

    Example:

    (1). Build index

    STAR --runThreadN 8 --runMode genomeGenerate --genomeDir <hg19_STARindex> \
    --genomeFastaFiles <hg19_genome.fa> --sjdbGTFfile <gencode.v19.annotation.gtf>

    (2). Alignment:

    STAR --outFilterType BySJout --runThreadN 8 --outFilterMismatchNmax 2 --genomeDir <hg19_STARindex> \
    --readFilesIn <un_aligned.fastq> --outFileNamePrefix <HEK293> --outSAMtype BAM SortedByCoordinate \
    --quantMode TranscriptomeSAM GeneCounts --outFilterMultimapNmax 1 --outFilterMatchNmin 16
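
    With ``--quantMode GeneCounts``, STAR also writes per-gene read counts to ``<prefix>ReadsPerGene.out.tab`` (here ``HEK293ReadsPerGene.out.tab``). Its first four rows are summary counters (``N_unmapped``, ``N_multimapping``, ``N_noFeature``, ``N_ambiguous``); each remaining row holds a gene ID followed by unstranded counts and counts for the first and second read strand. A minimal parser is sketched below (the function name is ours). Defaulting to the first-read-strand column assumes a library whose reads map to the sense strand, so check your protocol.

```python
SUMMARY_KEYS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_gene_counts(handle, column=2):
    """Parse STAR's ReadsPerGene.out.tab.

    column selects the count field (0-based after splitting on tabs):
      1 = unstranded, 2 = reads on the 1st read strand, 3 = reads on the 2nd read strand.
    Returns (summary, counts) dictionaries.
    """
    if column not in (1, 2, 3):
        raise ValueError("column must be 1, 2 or 3")
    summary, counts = {}, {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        target = summary if fields[0] in SUMMARY_KEYS else counts
        target[fields[0]] = int(fields[column])
    return summary, counts
```
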
  5. Running RiboCode to identify translated ORFs

    (1). Preparing the transcript annotation files:

    prepare_transcripts -g <gencode.v19.annotation.gtf> -f <hg19_genome.fa> -o <RiboCode_annot>
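
    prepare_transcripts builds RiboCode's annotation from the GTF (``-g``) and genome FASTA (``-f``) files. Before running it, it can help to inspect the GTF yourself, for example to see which ``transcript_type`` values are present. The sketch below is a minimal, hypothetical GTF reader, not RiboCode's own parser. It keeps only quoted attribute values, so unquoted ones such as GENCODE's ``level`` are ignored, and for repeated keys like ``tag`` only the last value is kept.

```python
import re
from collections import Counter

# key "value" pairs in column 9; unquoted values (e.g. `level 2;`) are not matched
ATTRIBUTE_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf(handle, feature=None):
    """Yield one dict per GTF record, optionally restricted to one feature type."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or (feature and fields[2] != feature):
            continue
        record = {
            "seqname": fields[0],
            "feature": fields[2],
            "start": int(fields[3]),  # GTF coordinates are 1-based, inclusive
            "end": int(fields[4]),
            "strand": fields[6],
        }
        record.update(ATTRIBUTE_RE.findall(fields[8]))
        yield record


def count_transcript_types(handle):
    """Count transcripts per transcript_type (a GENCODE attribute)."""
    return Counter(r.get("transcript_type", "NA") for r in parse_gtf(handle, "transcript"))
```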

    (2). Selecting the length range of the RPF reads and identifying the P-site locations:

    metaplots -a <RiboCode_annot> -r <HEK293Aligned.toTranscriptome.out.bam>

    This step generates a PDF file and a predefined P-site parameters file. The PDF file plots the aggregate profiles of the distance between the 5’ end of reads and the annotated start or stop codons. The P-site parameters file lists the read lengths that show strong 3-nt periodicity and the P-site location for each length; users can modify this file according to the plots in the PDF file.
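
    The idea behind these P-site parameters can be sketched as follows: for each read length, measure how strongly the 5’ ends fall into one frame (3-nt periodicity), and take the most common distance between the 5’ end and the annotated start codon as the P-site offset. This is a simplified illustration, not the procedure metaplots uses; the input format (pairs of read length and distance), the distance window and the 0.5 frame-fraction cutoff are all assumptions.

```python
from collections import Counter, defaultdict


def estimate_psite_offsets(reads, window=range(6, 19), min_frame_fraction=0.5):
    """Return {read_length: (offset, frame_fraction)} for lengths with clear periodicity.

    reads: iterable of (read_length, distance) pairs, where distance is the
    annotated start-codon position minus the read's 5' end position
    (transcript coordinates), so a P-site at the start codon means distance == offset.
    """
    by_length = defaultdict(list)
    for length, distance in reads:
        by_length[length].append(distance)
    offsets = {}
    for length, distances in sorted(by_length.items()):
        # 3-nt periodicity: fraction of 5' ends falling in the dominant frame
        frame_counts = Counter(d % 3 for d in distances)
        _, top = frame_counts.most_common(1)[0]
        fraction = top / len(distances)
        if fraction < min_frame_fraction:
            continue  # weak periodicity: leave this read length out
        # offset: most frequent distance inside the window upstream of the start codon
        in_window = [d for d in distances if d in window]
        if not in_window:
            continue
        offset = Counter(in_window).most_common(1)[0][0]
        offsets[length] = (offset, round(fraction, 3))
    return offsets
```

    In this toy model, a read length whose 5’ ends are spread evenly over the three frames is dropped, mirroring the advice to keep only lengths that show strong periodicity in the PDF plots.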

    (3). Detecting translated ORFs using the ribosome-profiling data:

    RiboCode -a <RiboCode_annot> -c <config.txt> -l no -o <RiboCode_ORFs_result>

    Users can use the config file generated in the previous step as is, or modify it to specify the BAM file information and P-site parameters; please refer to the example file in the data folder.

    Explanation of final result files

    RiboCode generates two text files: “(output file name).txt” contains the information on the predicted ORFs in each transcript; “(output file name)_collapsed.txt” combines the ORFs that share the same stop codon in different transcript isoforms, keeping the one with the most upstream in-frame ATG. Some of the column names in the result files:

    - ORF_ID: the identifier of the predicted ORF.
    - ORF_type: the type of ORF. The following ORF categories are reported:
    
     "annotated" (overlapping the annotated CDS and sharing its stop codon)
    
     "uORF" (upstream of the annotated CDS, not overlapping it)
    
     "dORF" (downstream of the annotated CDS, not overlapping it)
    
     "Overlap_uORF" (upstream of the annotated CDS, overlapping it)
    
     "Overlap_dORF" (downstream of the annotated CDS, overlapping it)
    
     "Internal" (within the annotated CDS, but in a different reading frame)
    
     "novel" (in non-coding genes or in non-coding transcripts of coding genes)
    
    - ORF_tstart, ORF_tstop: the start and end of the ORF in the transcript (1-based coordinates)
    - ORF_gstart, ORF_gstop: the start and end of the ORF in the genome (1-based coordinates)
    - pval_frame0_vs_frame1: significance level (P-value) for P-site densities in frame 0 being greater than in frame 1
    - pval_frame0_vs_frame2: significance level (P-value) for P-site densities in frame 0 being greater than in frame 2
    - pval_combined: the combined P-value of the two tests above
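
    To make the two frame comparisons and the combined P-value concrete, here is a simplified sketch. It is not RiboCode's statistical procedure: it swaps in a one-sided sign test for each frame comparison and combines the two P-values with Stouffer's method, which assumes independent tests (the two comparisons share frame 0, so this is only approximate). The input is assumed to be a list of P-site counts per nucleotide, starting at the first base of the start codon; it requires Python 3.8+ (``math.comb``, ``statistics.NormalDist``).

```python
from math import comb, sqrt
from statistics import NormalDist


def sign_test_greater(x, y):
    """One-sided sign test of H1: x tends to exceed y (ties are dropped)."""
    wins = sum(1 for a, b in zip(x, y) if a > b)
    untied = sum(1 for a, b in zip(x, y) if a != b)
    if untied == 0:
        return 1.0
    # P(X >= wins) for X ~ Binomial(untied, 0.5)
    return sum(comb(untied, i) for i in range(wins, untied + 1)) / 2 ** untied


def stouffer(pvalues):
    """Combine one-sided P-values with Stouffer's Z method (assumes independence)."""
    nd = NormalDist()
    # Clamp to (0, 1) so inv_cdf stays defined.
    zs = [-nd.inv_cdf(min(max(p, 1e-300), 1 - 1e-12)) for p in pvalues]
    return nd.cdf(-sum(zs) / sqrt(len(zs)))


def frame_pvalues(psite_counts):
    """Return (p_frame0_vs_frame1, p_frame0_vs_frame2, p_combined) for one ORF."""
    frame0 = psite_counts[0::3]
    frame1 = psite_counts[1::3]
    frame2 = psite_counts[2::3]
    p01 = sign_test_greater(frame0, frame1)
    p02 = sign_test_greater(frame0, frame2)
    return p01, p02, stouffer([p01, p02])
```

    For ten codons that all have more P-sites in frame 0 than in frames 1 and 2, each sign test gives 1/1024, and the combined value is smaller still.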

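    The collapsing rule used for the “(output file name)_collapsed.txt” file (among ORFs that share a stop codon across isoforms, keep the one with the most upstream start) can be sketched as follows. The dictionary layout is hypothetical: the keys only mirror the ORF_gstart/ORF_gstop columns, and the sketch assumes gstart is the genomic position of the start codon, so that on the minus strand a larger gstart lies further upstream.

```python
def collapse_by_stop(orfs):
    """Keep one ORF per (chrom, strand, gstop) group: the one with the most upstream start.

    Each ORF is a dict with hypothetical keys 'chrom', 'strand', 'gstart' and 'gstop'.
    """
    best = {}
    for orf in orfs:
        key = (orf["chrom"], orf["strand"], orf["gstop"])
        current = best.get(key)
        if current is None:
            best[key] = orf
            continue
        if orf["strand"] == "+":
            more_upstream = orf["gstart"] < current["gstart"]
        else:
            more_upstream = orf["gstart"] > current["gstart"]
        if more_upstream:
            best[key] = orf
    return list(best.values())
```
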
    (4). (Optional) Plotting the P-site densities of predicted ORFs

    Users can plot the density of predicted ORFs using the “parsing_plot_orf_density” command, for example:

    parsing_plot_orf_density -a <RiboCode_annot> -c <config.txt> -t (transcript_id) \
    -s (ORF_gstart) -e (ORF_gstop)
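
    parsing_plot_orf_density draws this figure for you. If you want a similar plot from your own P-site counts, a minimal matplotlib sketch could look like this; the input (P-site counts per nucleotide along the ORF), the function name and the frame colors are assumptions, and the result will not match RiboCode's figure exactly.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

FRAME_COLORS = ("tab:red", "tab:green", "tab:blue")


def plot_psite_density(psite_counts, out_path, title="P-site density"):
    """Bar plot of P-site counts along an ORF, colored by reading frame (0, 1, 2)."""
    positions = list(range(len(psite_counts)))
    colors = [FRAME_COLORS[pos % 3] for pos in positions]
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.bar(positions, psite_counts, width=1.0, color=colors)
    ax.set_xlabel("Position in ORF (nt)")
    ax.set_ylabel("P-site count")
    ax.set_title(title)
    ax.legend(handles=[Patch(color=c, label=f"frame {i}") for i, c in enumerate(FRAME_COLORS)])
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

    A translated ORF typically shows the frame-0 bars standing above the other two frames along most of its length.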

For any questions, please contact:

Zhengtao Xiao (xzt13@mails.tsinghua.edu.cn)

Rongyao Huang (THUhry12@163.com)

Xudong Xing (xudonxing_bioinf@sina.com)



Download files

Source Distribution

RiboCode-1.2.3.tar.gz (39.6 kB)

Built Distribution

RiboCode-1.2.3-py2.py3-none-any.whl (29.3 kB), for Python 2 and Python 3

File details

Details for the file RiboCode-1.2.3.tar.gz.

File metadata

  • Download URL: RiboCode-1.2.3.tar.gz
  • Size: 39.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for RiboCode-1.2.3.tar.gz
Algorithm Hash digest
SHA256 7f6822d51d4d1c8555387db483f4671a1c307bb9f05ec606e6b3814f6f50ef8c
MD5 7d61d49fdb782b23d78bf93beec4e893
BLAKE2b-256 a31adce81b0e092ef0697d4c542556e3c83a7c263fcda001236e652f5095c829


File details

Details for the file RiboCode-1.2.3-py2.py3-none-any.whl.


File hashes

Hashes for RiboCode-1.2.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 fb3ea3438054646f1d96232950217bb5e3009f763206df09a224992938bfcfa1
MD5 d254135b17fc084b9a06ccb31f9eb094
BLAKE2b-256 3294226b96da864fdff401c7d3db1fc66e147683476b43e89977b1ae7354581f
