RiboCode

A package for identifying the translated ORFs using ribosome-profiling data

Development Status
- 2 - Pre-Alpha
Environment
- Console
Intended Audience
- Science/Research
Programming Language
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

RiboCode is a simple but high-quality computational algorithm for identifying translated ORFs genome-wide from ribosome-profiling data.

Dependencies:

pysam
pyfasta
h5py
Biopython
Numpy
Scipy
matplotlib
HTSeq

Installation

RiboCode can be installed like any other Python package. Here are some common ways:

Install via PyPI:
```
pip install ribocode
```
Install via conda:
```
conda install -c bioconda ribocode
```
Install from source:
```
git clone https://www.github.com/xzt41/RiboCode
cd RiboCode
python setup.py install
```
Install from a local archive:
```
pip install RiboCode-*.tar.gz
```
If you do not have administrator permission, you need to install RiboCode locally in your own directory by adding the option --user to the installation commands. Then, you need to add ~/.local/bin/ to the PATH variable, and ~/.local/lib/ to the PYTHONPATH variable. For example, if you are using the bash shell, you would do this by adding the following lines to your ~/.bashrc file:
```
export PATH=$PATH:$HOME/.local/bin/
export PYTHONPATH=$HOME/.local/lib/python2.7
```
You then need to source your ~/.bashrc file with this command:
```
source ~/.bashrc
```
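
After installation, one way to confirm that the command-line tools used later in this tutorial are reachable through PATH is sketched below. The helper name `missing_tools` is hypothetical and not part of RiboCode; the tool names are simply the commands that appear in this README.

```python
import shutil

# Command names as they appear in this tutorial.
RIBOCODE_TOOLS = ["prepare_transcripts", "metaplots", "RiboCode", "ORFcount"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(RIBOCODE_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All RiboCode commands were found on PATH.")
```

If a command is reported as missing after a --user installation, the PATH setting above is the first thing to check.
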

Tutorial to analyze ribosome-profiling data and run RiboCode

Here, we use the HEK293 dataset as an example to illustrate the use of RiboCode and demonstrate typical workflows. Please make sure that the file paths in each command are correct for your system.

Required files

The genome FASTA file and the GTF annotation file can be downloaded from:

http://www.gencodegenes.org

or from:

http://asia.ensembl.org/info/data/ftp/index.html

http://useast.ensembl.org/info/data/ftp/index.html

For example, the required files in this tutorial can be downloaded from the following URLs:

GTF: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

FASTA: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz

The raw Ribo-seq FASTQ file can be downloaded using the fastq-dump tool from the SRA Toolkit:
```
fastq-dump -A <SRR1630831>
```
Trimming adapter sequence for ribo-seq data

Using cutadapt program https://cutadapt.readthedocs.io/en/stable/installation.html

Example:
```
cutadapt -m 20 --match-read-wildcards -a (Adapter sequence) -o <Trimmed fastq file> <Input fastq file>
```
Here, the adapter sequences for this dataset have already been trimmed off, so we can skip this step.

Removing ribosomal RNA (rRNA) derived reads

Align the trimmed reads to rRNA sequences using Bowtie, then keep the unaligned reads for the next step.

Bowtie program: http://bowtie-bio.sourceforge.net/index.shtml

rRNA sequences: we provide an rRNA.fa file in the data folder of this package.

Example:
```
bowtie-build <rRNA.fa> rRNA
bowtie -p 8 --norc --un <un_aligned.fastq> -q <SRR1630831.fastq> rRNA <HEK293_rRNA.align>
```

Aligning the clean reads to reference genome

Using STAR program: https://github.com/alexdobin/STAR

Example:

(1). Build index:
```
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir <hg19_STARindex> \
     --genomeFastaFiles <hg19_genome.fa> --sjdbGTFfile <gencode.v19.annotation.gtf>
```
(2). Alignment:
```
STAR --outFilterType BySJout --runThreadN 8 --outFilterMismatchNmax 2 --genomeDir <hg19_STARindex> \
     --readFilesIn <un_aligned.fastq> --outFileNamePrefix (HEK293) --outSAMtype BAM \
     SortedByCoordinate --quantMode TranscriptomeSAM GeneCounts --outFilterMultimapNmax 1 \
     --outFilterMatchNmin 16 --alignEndsType EndToEnd
```
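
When the alignment has to be repeated for several samples, it can help to assemble the command programmatically. The following is a minimal sketch that builds the same STAR alignment call as an argument list; the function name `star_align_command` and the example paths are assumptions made for illustration, while the flags are the ones shown above.

```python
import shlex


def star_align_command(index_dir, fastq, prefix, threads=8):
    """Assemble the STAR alignment command from step (2) as an argument list."""
    return [
        "STAR",
        "--outFilterType", "BySJout",
        "--runThreadN", str(threads),
        "--outFilterMismatchNmax", "2",
        "--genomeDir", index_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "TranscriptomeSAM", "GeneCounts",
        "--outFilterMultimapNmax", "1",
        "--outFilterMatchNmin", "16",
        "--alignEndsType", "EndToEnd",
    ]


# Example paths are placeholders; replace them with your own files.
cmd = star_align_command("hg19_STARindex", "un_aligned.fastq", "HEK293")
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run the alignment:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

With the prefix HEK293, the transcriptome alignment used in the next section is written to HEK293Aligned.toTranscriptome.out.bam.
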

Running RiboCode to identify translated ORFs

(1). Preparing the transcripts annotation files:
```
prepare_transcripts -g <gencode.v19.annotation.gtf> -f <hg19_genome.fa> -o <RiboCode_annot>
```
(2). Selecting the length range of the RPF reads and identifying the P-site locations:
```
metaplots -a <RiboCode_annot> -r <HEK293Aligned.toTranscriptome.out.bam>
```
This step generates a PDF file and a predefined P-site parameters file. The PDF file plots the aggregate profiles of the distance between the 5’-end of reads and the annotated start codons or stop codons. The P-site parameters file defines the read lengths that show strong 3-nt periodicity and the P-site location for each length. Users can modify this file according to the plots in the PDF file.
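
To make the idea of 3-nt periodicity concrete, the sketch below assigns each read a P-site position (its 5’-end plus an offset that depends on the read length) and measures the fraction of P-sites that fall in each reading frame relative to the annotated start codon. This is only an illustration and not the code used by metaplots; the function names, the offsets, and the toy reads are all made up.

```python
from collections import Counter


def psite_positions(reads, offsets):
    """Map reads to P-site positions.

    reads   : iterable of (five_prime_pos, read_length); positions are
              relative to the first nucleotide of the start codon (0-based).
    offsets : dict mapping read_length -> distance from the 5' end to the P-site.
    Reads whose length has no offset are skipped.
    """
    return [pos + offsets[length] for pos, length in reads if length in offsets]


def frame_fractions(psites):
    """Return the fraction of P-sites in frame 0, 1 and 2."""
    counts = Counter(p % 3 for p in psites)
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]


# Hypothetical offsets and toy reads, for illustration only.
offsets = {28: 12, 29: 12}
reads = [(-12, 28), (-9, 28), (-6, 29), (-5, 28), (0, 30)]
print(frame_fractions(psite_positions(reads, offsets)))
```

A read length with good periodicity concentrates most of its P-sites in frame 0; lengths without a clear preference are candidates for removal from the parameters file.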

(3). Detecting translated ORFs using the ribosome-profiling data:
```
RiboCode -a <RiboCode_annot> -c <config.txt> -l no -o <RiboCode_ORFs_result>
```
Users can use or modify the config file generated by the previous step to specify the BAM file and the P-site parameters. Please refer to the example file config.txt in the data folder.

Explanation of final result files

RiboCode generates two text files: “(output file name).txt” contains the information on the predicted ORFs in each transcript; “(output file name)_collapsed.txt” combines ORFs that share the same stop codon in different transcript isoforms, keeping the one that harbors the most upstream in-frame ATG. Some column names of the result file:
```
- ORF_ID: The identifier of the predicted ORF.
- ORF_type: The type of ORF. The following ORF categories are reported:

 "annotated" (overlapping the annotated CDS, with the same stop as the annotated CDS)

 "uORF" (upstream of the annotated CDS, not overlapping the annotated CDS)

 "dORF" (downstream of the annotated CDS, not overlapping the annotated CDS)

 "Overlap_uORF" (upstream of the annotated CDS, overlapping the annotated CDS)

 "Overlap_dORF" (downstream of the annotated CDS, overlapping the annotated CDS)

 "Internal" (inside the annotated CDS, but in a different frame relative to the annotated CDS)

 "novel" (in non-coding genes or non-coding transcripts of coding genes).

- ORF_tstart, ORF_tstop: the start and end of the ORF in the RNA transcript (1-based coordinates)
- ORF_gstart, ORF_gstop: the start and end of the ORF in the genome (1-based coordinates)
- pval_frame0_vs_frame1: significance level of the P-site density of frame 0 being greater than that of frame 1
- pval_frame0_vs_frame2: significance level of the P-site density of frame 0 being greater than that of frame 2
- pval_combined: integrated P-value
```
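
The ORF_type categories listed above can be read as a small decision procedure on transcript coordinates. The sketch below is one possible reading of those definitions, not RiboCode's own implementation; the function `classify_orf` and its arguments are hypothetical, and coordinates are assumed to be 1-based and inclusive, as in the ORF_tstart and ORF_tstop columns.

```python
def classify_orf(tstart, tstop, cds=None):
    """Assign an ORF_type label from transcript coordinates (1-based, inclusive).

    cds is (cds_start, cds_stop) for the annotated CDS of the transcript,
    or None when the transcript has no annotated CDS.
    """
    if cds is None:
        return "novel"            # non-coding gene or non-coding transcript
    cds_start, cds_stop = cds
    if tstop == cds_stop:
        return "annotated"        # shares the stop of the annotated CDS
    if tstop < cds_start:
        return "uORF"             # entirely upstream of the CDS
    if tstart > cds_stop:
        return "dORF"             # entirely downstream of the CDS
    if tstart < cds_start:
        return "Overlap_uORF"     # starts upstream, runs into the CDS
    if tstop > cds_stop:
        return "Overlap_dORF"     # starts inside the CDS, ends downstream
    return "Internal"             # inside the CDS with a different stop


# Toy transcript with an annotated CDS at positions 100-400.
for orf in [(10, 60), (50, 130), (131, 190), (350, 450), (420, 480), (130, 400)]:
    print(orf, classify_orf(*orf, cds=(100, 400)))
```

An ORF that lies inside the CDS but ends at a different stop must be in a different frame, which is why the last branch does not need an explicit frame check.
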
(4). (optional) Plotting the densities of P-sites for predicted ORFs

Users can plot the density of P-sites for an ORF using the “parsing_plot_orf_density” command, as in the example below:
```
parsing_plot_orf_density -a <RiboCode_annot> -c <config.txt> -t (transcript_id) \
    -s (ORF_gstart) -e (ORF_gstop)
```
The generated PDF plots can be edited in Adobe Illustrator.

(5). (optional) Counting the number of RPF reads aligned to ORFs

The number of reads mapping to each ORF can be obtained with the “ORFcount” command, which relies on the HTSeq-count package. The first few and last few codons of ORFs longer than a given length can be excluded by adjusting the corresponding parameters. Only reads of a given length are counted. For example, reads between 26 and 34 nt long that align to predicted ORFs can be counted with the command below:
```
ORFcount -g <RiboCode_ORFs_result.gtf> -r <ribo-seq genomic mapping file> -f 15 -l 5 -e 100 -m 26 -M 34 -o <ORF.counts>
```
Reads aligned to the first 15 codons and the last 5 codons of ORFs longer than 100 nt will be excluded.
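
The effect of these parameters can be illustrated with a little arithmetic. The sketch below computes which part of an ORF remains countable and whether a read length passes the filter; it is not ORFcount's implementation, and the function names and the 0-based, end-exclusive offsets are assumptions chosen for the example. The default values mirror -f 15 -l 5 -e 100 -m 26 -M 34.

```python
def counted_region(orf_length, first_codons=15, last_codons=5, min_length=100):
    """Return (start, end) nucleotide offsets of the ORF region that is counted.

    Offsets are 0-based and end-exclusive, relative to the ORF start.
    Codons are trimmed only when the ORF is longer than min_length nt.
    """
    if orf_length > min_length:
        return first_codons * 3, orf_length - last_codons * 3
    return 0, orf_length


def read_length_ok(length, shortest=26, longest=34):
    """Return True when the read length lies within the accepted range."""
    return shortest <= length <= longest


print(counted_region(300))   # long ORF: both ends are trimmed
print(counted_region(90))    # short ORF: counted in full
print(read_length_ok(29), read_length_ok(36))
```

For a 300-nt ORF, 45 nt are skipped at the start and 15 nt at the end, so 240 nt remain countable.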

Recipes (FAQ):

I have a BAM/SAM file aligned to the genome. How do I convert it to a transcriptome-based mapping file?

You can use the STAR aligner to generate the transcriptome-based alignment file by specifying the “--quantMode TranscriptomeSAM” parameter, or use the “sam-xlate” command from UNC Bioinformatics Utilities.

How to use multiple BAM/SAM files to identify ORFs?

You can select the read lengths that show strong 3-nt periodicity and the corresponding P-site locations for each BAM/SAM file, then list each file and its information in the config.txt file. RiboCode will combine the P-site densities at each nucleotide across these BAM/SAM files to predict ORFs.
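
As a rough picture of what combining densities means, the sketch below adds up per-nucleotide P-site counts from several samples for one transcript. It assumes that combining is a simple position-wise sum, which this README does not state explicitly; the function name `combine_psite_densities` and the toy counts are hypothetical.

```python
def combine_psite_densities(per_sample_counts):
    """Sum per-nucleotide P-site counts across samples for one transcript.

    per_sample_counts : list of equal-length lists, one list per BAM/SAM file,
                        each holding the P-site count at every nucleotide.
    """
    if not per_sample_counts:
        return []
    length = len(per_sample_counts[0])
    if any(len(counts) != length for counts in per_sample_counts):
        raise ValueError("all samples must cover the same transcript length")
    return [sum(column) for column in zip(*per_sample_counts)]


# Toy counts for two samples over a 6-nt stretch.
sample_a = [4, 0, 1, 5, 0, 0]
sample_b = [2, 1, 0, 3, 0, 1]
print(combine_psite_densities([sample_a, sample_b]))
```
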

Generating figures with matplotlib when DISPLAY variable is undefined or invalid

When running the “metaplots” or “plot_orf_density” command, some users received errors similar to the following:

raise RuntimeError('Invalid DISPLAY variable')

_tkinter.TclError: no display name and no $DISPLAY environment variable

The main problem is that the default matplotlib backend is unavailable. The solution is to switch to a different backend. A very simple way is to set the MPLBACKEND environment variable to a non-interactive backend, either for your current shell or for a single script:
```
# Agg can be replaced with any available non-interactive backend
export MPLBACKEND="Agg"
```
The following non-interactive backends are capable of writing to a file:

Agg PS PDF SVG Cairo GDK
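
If the plotting commands are launched from a Python script rather than from the shell, the same setting can be passed through the environment of the child process. The following is a minimal sketch; the helper `env_with_backend` is hypothetical, and the commented-out call reuses the example file names from this tutorial.

```python
import os


def env_with_backend(backend="Agg", base_env=None):
    """Return a copy of the environment with MPLBACKEND set to the given backend."""
    env = dict(os.environ if base_env is None else base_env)
    env["MPLBACKEND"] = backend
    return env


# Example (not executed here): run metaplots on a machine without a display.
# import subprocess
# subprocess.run(
#     ["metaplots", "-a", "RiboCode_annot", "-r", "HEK293Aligned.toTranscriptome.out.bam"],
#     env=env_with_backend("Agg"),
#     check=True,
# )
print(env_with_backend("PDF", {"PATH": "/usr/bin"}))
```

Because the function works on a copy, the environment of the calling script is left unchanged.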

See also:

http://matplotlib.org/faq/usage_faq.html#what-is-a-backend

http://matplotlib.org/users/customizing.html#the-matplotlibrc-file

http://stackoverflow.com/questions/2801882/generating-a-png-with-matplotlib-when-display-is-undefined

For any questions, please contact:

Zhengtao Xiao (xzt13[at]mails.tsinghua.edu.cn)

Rongyao Huang (THUhry12[at]163.com)

Xudong Xing (xudonxing_bioinf[at]sina.com)

Release history

1.2.15

Jul 29, 2022

1.2.14

Mar 6, 2022

1.2.13

Dec 28, 2021

1.2.12

Oct 28, 2021

1.2.11

Dec 20, 2018

1.2.10

Apr 1, 2018

1.2.9

Mar 5, 2018

1.2.8

Feb 28, 2018

1.2.7

Dec 13, 2017

This version

1.2.6

May 19, 2017

1.2.6.dev0 pre-release

May 19, 2017

1.2.5.1

May 12, 2017

1.2.5

May 12, 2017

1.2.5dev pre-release

May 12, 2017

1.2.4

May 4, 2017

1.2.4.dev2 pre-release

May 11, 2017

1.2.4.dev1 pre-release

May 11, 2017

1.2.4.dev0 pre-release

May 11, 2017

1.2.3

Apr 12, 2017

1.2.2

Mar 8, 2017
