rnasamba

A tool for computing the coding potential of RNA transcript sequences using deep learning.

Overview
Citation
Documentation
Installation
Download the pre-trained models
Usage
- rnasamba train
- rnasamba classify
Examples
Using the Docker image

Overview

RNAsamba is a tool for computing the coding potential of RNA sequences using a neural network classification model. A description of the algorithm and benchmarks comparing RNAsamba to other tools can be found in our article.

Citation

If you use RNAsamba in your work, please cite our paper:

Camargo, A. P., Sourkov, V., Pereira, G. A. G. & Carazzolle, M. F. "RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences." NAR Genomics and Bioinformatics 2, lqz024 (2020).

Web version

RNAsamba can be used through a minimal web interface that is freely available online at https://rnasamba.lge.ibi.unicamp.br/. The source code of the web app can be found at https://github.com/apcamargo/rnasamba-webapp/.

Documentation

Complete documentation for RNAsamba is available at https://apcamargo.github.io/RNAsamba/.

Installation

There are two ways to install RNAsamba:

Using pip:

pip install rnasamba

Using conda:

conda install -c conda-forge -c bioconda rnasamba

Download the pre-trained models

We provide two HDF5 files containing the weights of classification models trained on human transcript sequences. The first model (full_length_weights.hdf5) was trained exclusively on full-length transcripts and is suited to datasets composed mostly or exclusively of complete transcript sequences. The second model (partial_length_weights.hdf5) was trained on both complete and truncated transcripts and is preferred when a significant fraction of the sequences are partial-length, as in transcriptomes assembled with de novo approaches.

Both models achieve high classification performance on transcripts from a variety of species (see reference).

You can download the files by executing the following commands:

curl -O https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data/full_length_weights.hdf5
curl -O https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data/partial_length_weights.hdf5
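
The same download can be scripted from Python. The following is a minimal sketch that uses only the standard library; the helper names `weights_url` and `download_weights` are hypothetical and are not part of RNAsamba, and the URLs are the ones used in the curl commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names taken from the curl commands above.
BASE_URL = "https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data"
WEIGHT_FILES = ("full_length_weights.hdf5", "partial_length_weights.hdf5")


def weights_url(filename: str) -> str:
    """Return the download URL for one of the two pre-trained weight files."""
    if filename not in WEIGHT_FILES:
        raise ValueError(f"unknown weights file: {filename}")
    return f"{BASE_URL}/{filename}"


def download_weights(filename: str, dest_dir: str = ".") -> Path:
    """Download a weight file into dest_dir, skipping files that already exist."""
    target = Path(dest_dir) / filename
    if not target.exists():
        urlretrieve(weights_url(filename), str(target))
    return target
```

Calling `download_weights("partial_length_weights.hdf5")` would place the file in the current directory, mirroring `curl -O`.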

To train your own model, follow the steps shown in the Examples section.

Usage

RNAsamba provides two commands: rnasamba train and rnasamba classify.

`rnasamba train`

rnasamba train is the command for training a new classification model from a training dataset and saving the network weights into an HDF5 file. The user can specify the batch size (--batch_size) and the number of training epochs (--epochs). The user can also activate early stopping (--early_stopping), which reduces training time and can help avoid overfitting.

usage: rnasamba train [-h] [-s EARLY_STOPPING] [-b BATCH_SIZE] [-e EPOCHS]
                      [-v {0,1,2,3}]
                      output_file coding_file noncoding_file

Train a new classification model.

positional arguments:
  output_file           output HDF5 file containing weights of the newly
                        trained RNAsamba network.
  coding_file           input FASTA file containing sequences of protein-
                        coding transcripts.
  noncoding_file        input FASTA file containing sequences of noncoding
                        transcripts.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -s EARLY_STOPPING, --early_stopping EARLY_STOPPING
                        number of epochs after lowest validation loss before
                        stopping training (a fraction of 0.1 of the training
                        set is set apart for validation and the model with the
                        lowest validation loss will be saved). (default: 0)
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        number of samples per gradient update. (default: 128)
  -e EPOCHS, --epochs EPOCHS
                        number of epochs to train the model. (default: 40)
  -v {0,1,2,3}, --verbose {0,1,2,3}
                        print the progress of the training. 0 = silent, 1 =
                        current step, 2 = progress bar, 3 = one line per
                        epoch. (default: 0)
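
To make the --early_stopping description concrete, here is a small simulation of patience-based early stopping over a list of validation losses. It is an illustrative sketch, not RNAsamba's training code: the function name `early_stopping_epochs` is hypothetical, and treating a value of 0 as "early stopping disabled" is an assumption based on the default shown above.

```python
from typing import List, Tuple


def early_stopping_epochs(val_losses: List[float], patience: int) -> Tuple[int, int]:
    """Simulate patience-based early stopping.

    Returns (best_epoch, last_epoch_run), both 1-based. best_epoch is the
    epoch with the lowest validation loss seen before training stopped.
    A patience of 0 is assumed to disable early stopping.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_epoch = 1
    best_loss = val_losses[0]
    for epoch, loss in enumerate(val_losses[1:], start=2):
        if loss < best_loss:
            # New lowest validation loss: remember this epoch.
            best_loss = loss
            best_epoch = epoch
        elif patience > 0 and epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    return best_epoch, len(val_losses)
```

With losses `[0.9, 0.7, 0.6, 0.65, 0.66, 0.64, 0.5]` and a patience of 2, the simulation stops at epoch 5 and reports epoch 3 as the best one; the later improvement at epoch 7 is never reached, which is the usual trade-off of a short patience.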

`rnasamba classify`

rnasamba classify is the command for computing the coding potential of the transcripts in an input FASTA file and classifying them as coding or noncoding. Optionally, the user can specify an output FASTA file (--protein_fasta) to which RNAsamba will write the translated sequences of the predicted coding ORFs. If multiple weight files are provided, RNAsamba will ensemble their predictions into a single output.

usage: rnasamba classify [-h] [-p PROTEIN_FASTA] [-v {0,1}]
                         output_file fasta_file weights [weights ...]

Classify sequences from a input FASTA file.

positional arguments:
  output_file           output TSV file containing the results of the
                        classification.
  fasta_file            input FASTA file containing transcript sequences.
  weights               input HDF5 file(s) containing weights of a trained
                        RNAsamba network (if more than a file is provided, an
                        ensembling of the models will be performed).

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -p PROTEIN_FASTA, --protein_fasta PROTEIN_FASTA
                        output FASTA file containing translated sequences for
                        the predicted coding ORFs. (default: None)
  -v {0,1}, --verbose {0,1}
                        print the progress of the classification. 0 = silent,
                        1 = current step. (default: 0)
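
When rnasamba classify is driven from a script, the argument order above (options, then output file, input FASTA, and one or more weight files) is easy to get wrong. The sketch below assembles the argument list in Python; `build_classify_command` is a hypothetical helper, and it uses only the flags listed in the help text above.

```python
from typing import List, Optional, Sequence


def build_classify_command(
    output_tsv: str,
    fasta_file: str,
    weights: Sequence[str],
    protein_fasta: Optional[str] = None,
    verbose: int = 0,
) -> List[str]:
    """Build the argv list for `rnasamba classify`.

    Passing more than one weights file requests an ensemble of the models.
    """
    if not weights:
        raise ValueError("at least one weights file is required")
    if verbose not in (0, 1):
        raise ValueError("verbose must be 0 or 1")
    cmd = ["rnasamba", "classify"]
    if protein_fasta is not None:
        cmd += ["-p", protein_fasta]
    cmd += ["-v", str(verbose)]
    # Positional arguments: output TSV, input FASTA, then the weight files.
    cmd += [output_tsv, fasta_file, *weights]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, provided rnasamba is installed and on the PATH.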

Examples

Training a new classification model using Mus musculus data downloaded from GENCODE:

rnasamba train -v 2 mouse_model.hdf5 gencode.vM21.pc_transcripts.fa gencode.vM21.lncRNA_transcripts.fa

Classifying sequences using our pre-trained model (partial_length_weights.hdf5) and saving the predicted proteins into a FASTA file:

rnasamba classify -p predicted_proteins.fa classification.tsv input.fa partial_length_weights.hdf5
head classification.tsv

sequence_name	coding_score	classification
ENSMUST00000054910	0.99022	coding
ENSMUST00000059648	0.84718	coding
ENSMUST00000055537	0.99713	coding
ENSMUST00000030975	0.85189	coding
ENSMUST00000050754	0.02638	noncoding
ENSMUST00000008011	0.14949	noncoding
ENSMUST00000061643	0.03456	noncoding
ENSMUST00000059704	0.89232	coding
ENSMUST00000036304	0.03782	noncoding
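
The TSV output is straightforward to post-process. The sketch below reads it with the standard csv module, counts the two classes, and lists the coding transcripts above a score cutoff. The function names and the 0.9 cutoff are illustrative assumptions, and the sample rows are copied from the output above.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List

# Three rows copied from the example output above.
SAMPLE_TSV = (
    "sequence_name\tcoding_score\tclassification\n"
    "ENSMUST00000054910\t0.99022\tcoding\n"
    "ENSMUST00000059648\t0.84718\tcoding\n"
    "ENSMUST00000050754\t0.02638\tnoncoding\n"
)


def read_classification(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a classification TSV into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def summarize(rows: List[Dict[str, str]]) -> Counter:
    """Count how many sequences fall into each class."""
    return Counter(row["classification"] for row in rows)


def high_confidence_coding(rows: List[Dict[str, str]], min_score: float = 0.9) -> List[str]:
    """Return names of coding sequences whose score is at least min_score."""
    return [
        row["sequence_name"]
        for row in rows
        if row["classification"] == "coding" and float(row["coding_score"]) >= min_score
    ]


rows = read_classification(io.StringIO(SAMPLE_TSV))
print(summarize(rows))
print(high_confidence_coding(rows))
```

For a real run, replace the `io.StringIO` sample with `open("classification.tsv", newline="")`.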

Using the Docker image

docker pull antoniopcamargo/rnasamba

# Training example:
docker run -ti --rm -u $(id -u) -v "$(pwd):/app" antoniopcamargo/rnasamba train -v 2 mouse_model.hdf5 gencode.vM21.pc_transcripts.fa gencode.vM21.lncRNA_transcripts.fa

# Classification example:
docker run -ti --rm -u $(id -u) -v "$(pwd):/app" antoniopcamargo/rnasamba classify -p predicted_proteins.fa classification.tsv input.fa full_length_weights.hdf5

Release history

Version	Release date
0.2.5 (this version)	Apr 6, 2021
0.2.4	Jan 23, 2020
0.2.3	Nov 22, 2019
0.2.2	Nov 21, 2019
0.2.1	Nov 21, 2019
0.2.0	Oct 21, 2019
0.1.6	Oct 2, 2019
0.1.5	Sep 22, 2019
0.1.4	Aug 29, 2019
0.1.3	Aug 29, 2019
0.1.2	Jun 25, 2019
0.1.1	Jun 25, 2019
0.1.0	May 15, 2019

Download files

Source Distribution

File	Size	Uploaded	Tags
rnasamba-0.2.5.tar.gz	28.7 kB	Apr 6, 2021	Source

Built Distributions

File	Size	Uploaded	Tags
rnasamba-0.2.5-cp39-cp39-manylinux2010_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.9, manylinux: glibc 2.12+ x86-64
rnasamba-0.2.5-cp39-cp39-manylinux1_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.9
rnasamba-0.2.5-cp39-cp39-macosx_10_9_x86_64.whl	1.3 MB	Apr 6, 2021	CPython 3.9, macOS 10.9+ x86-64
rnasamba-0.2.5-cp38-cp38-manylinux2010_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.8, manylinux: glibc 2.12+ x86-64
rnasamba-0.2.5-cp38-cp38-manylinux1_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.8
rnasamba-0.2.5-cp38-cp38-macosx_10_9_x86_64.whl	1.3 MB	Apr 6, 2021	CPython 3.8, macOS 10.9+ x86-64
rnasamba-0.2.5-cp37-cp37m-manylinux2010_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.7m, manylinux: glibc 2.12+ x86-64
rnasamba-0.2.5-cp37-cp37m-manylinux1_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.7m
rnasamba-0.2.5-cp37-cp37m-macosx_10_9_x86_64.whl	1.3 MB	Apr 6, 2021	CPython 3.7m, macOS 10.9+ x86-64
rnasamba-0.2.5-cp36-cp36m-manylinux2010_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.6m, manylinux: glibc 2.12+ x86-64
rnasamba-0.2.5-cp36-cp36m-manylinux1_x86_64.whl	2.8 MB	Apr 6, 2021	CPython 3.6m
rnasamba-0.2.5-cp36-cp36m-macosx_10_9_x86_64.whl	1.3 MB	Apr 6, 2021	CPython 3.6m, macOS 10.9+ x86-64

File details

Details for the file rnasamba-0.2.5.tar.gz.

File metadata

Download URL: rnasamba-0.2.5.tar.gz
Upload date: Apr 6, 2021
Size: 28.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`23bf656e21eb3e1a1052928e19dc59b80bae65190a11d6f3d3f0e7f19e488011`
MD5	`aa086f07520392d38e7afc00d8ecf8b8`
BLAKE2b-256	`afeaf1e28ba1f90681233fc97d58b523751725c8fcd0820a44d6159109d20a22`
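
A downloaded file can be checked against the digests listed here. The sketch below computes a SHA256 digest with Python's hashlib and compares it with the value given above for rnasamba-0.2.5.tar.gz; the function names are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed above for rnasamba-0.2.5.tar.gz.
EXPECTED_SHA256 = "23bf656e21eb3e1a1052928e19dc59b80bae65190a11d6f3d3f0e7f19e488011"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Check whether the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("rnasamba-0.2.5.tar.gz")` should return True for an intact copy of the source distribution.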


File details

Details for the file rnasamba-0.2.5-cp39-cp39-manylinux2010_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp39-cp39-manylinux2010_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.9, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp39-cp39-manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`239cf81cc469b2821c435b0bbb20e495df5c8efc93f5e4cd18cc6d44511c23ce`
MD5	`08c63fd532ce4e33554e338d1e53347d`
BLAKE2b-256	`73edca6ed1263126e4fb27b8498fde52a954851a042779dc41ba2a938b3e70cb`


File details

Details for the file rnasamba-0.2.5-cp39-cp39-manylinux1_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp39-cp39-manylinux1_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.9
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp39-cp39-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`349010063276c766b18ac1a282f7f4978bfa6e37030b48a06c54a7b8d26bee42`
MD5	`372975af8d3ce674fa14d61fb99b63df`
BLAKE2b-256	`3e45136f3aab418207883c39f6c222e68761e47c707030f0a755165688b92ff6`


File details

Details for the file rnasamba-0.2.5-cp39-cp39-macosx_10_9_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp39-cp39-macosx_10_9_x86_64.whl
Upload date: Apr 6, 2021
Size: 1.3 MB
Tags: CPython 3.9, macOS 10.9+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`5b2dbb9e27660bcae0a996ab255f6a50c662c47cd112c6706a6964937541e2bc`
MD5	`cc4cb3ddbb523ab1bb0f9e2b6d129b2c`
BLAKE2b-256	`04e80be4637013fc62abf97bad6d0368a29977f21ed9ff40997f3b873ae4091c`


File details

Details for the file rnasamba-0.2.5-cp38-cp38-manylinux2010_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp38-cp38-manylinux2010_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp38-cp38-manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`3f56a1279053986f720d13cf8b2836b03807a09eb1438bf0ae59342fac4d232a`
MD5	`bf260e63735dbd641819631fa9e5846c`
BLAKE2b-256	`e49c01e8b7acb12026ae87448da9748c300023ad71f1f4a4903add01b2d794bb`


File details

Details for the file rnasamba-0.2.5-cp38-cp38-manylinux1_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp38-cp38-manylinux1_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.8
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp38-cp38-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`03eb273ebf5c4bdcf40457d4b043d38f82e0c513182b668fde8d749d1e5e0a27`
MD5	`c50f4113a952ce4a6c9165447ac0f1b8`
BLAKE2b-256	`c19a491d401f7b3f373dd1528c3f2b94b49d21044fbc7a9f5e8a58fc20b43a05`


File details

Details for the file rnasamba-0.2.5-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp38-cp38-macosx_10_9_x86_64.whl
Upload date: Apr 6, 2021
Size: 1.3 MB
Tags: CPython 3.8, macOS 10.9+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`12b5fc7f5a0c7ba0be5844a24df6fa99d7c1a1a9c60bd468c8f4e6338e2dec9f`
MD5	`d9b5f71c279cb940eb6422f3a2a8eb0b`
BLAKE2b-256	`6d9b8339fbde75e667dd67bb98712e1f428dd9bda0a8df824ee8abb3307cd13f`


File details

Details for the file rnasamba-0.2.5-cp37-cp37m-manylinux2010_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp37-cp37m-manylinux2010_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.7m, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`65143099255c602aa2756889e243e5bdeb6e551515336303f0c07511857ed1e5`
MD5	`0b2d1ce0b9929fad69c46e566c72206a`
BLAKE2b-256	`1c1693d56e1acfd20663762a779090f41aae167d02d110bf55e70335729546ac`


File details

Details for the file rnasamba-0.2.5-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp37-cp37m-manylinux1_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.7m
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp37-cp37m-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`729f190997c840e5d47a867980a42382daffd8042d5f67e57e0a5f9cd83ef551`
MD5	`e132ae4ff2638be54c3efcc53ef1cdce`
BLAKE2b-256	`0985acc58929167809abfcce16fc229c57b9add41a71b22b4a46e67fe8e0ca97`


File details

Details for the file rnasamba-0.2.5-cp37-cp37m-macosx_10_9_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp37-cp37m-macosx_10_9_x86_64.whl
Upload date: Apr 6, 2021
Size: 1.3 MB
Tags: CPython 3.7m, macOS 10.9+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`63ac3464ed024d823867d62792fbd2e6137b461d326aac0b2a914510d4d5fd40`
MD5	`324bc82646840909a96428c7e8d4ad80`
BLAKE2b-256	`830828d61ea0ce25a10d94ef61547c77d456aababa583284ce6c1c2d98b07376`


File details

Details for the file rnasamba-0.2.5-cp36-cp36m-manylinux2010_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp36-cp36m-manylinux2010_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.6m, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`859ca9a76417ee2d219884e2b7357edf76b49398d065da45f8cf69a7191e916f`
MD5	`b9a4d22180e3e80fa4d07bbd46daa374`
BLAKE2b-256	`359d396f4469046baeab9290cac5bfb8c7ad00c7b6c277ed0446fbb57add204e`


File details

Details for the file rnasamba-0.2.5-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp36-cp36m-manylinux1_x86_64.whl
Upload date: Apr 6, 2021
Size: 2.8 MB
Tags: CPython 3.6m
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp36-cp36m-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`6e9116022629996bef68fb6d5457201f7d96cbd511edab1b28cf24ecbc6d677f`
MD5	`a1799e5ce30d5769e7a132c408d31e73`
BLAKE2b-256	`bdddb381c857709e7ec6063a8c1cc392fb1b1e2f4dde8c08437f7bcb7a0c8e12`


File details

Details for the file rnasamba-0.2.5-cp36-cp36m-macosx_10_9_x86_64.whl.

File metadata

Download URL: rnasamba-0.2.5-cp36-cp36m-macosx_10_9_x86_64.whl
Upload date: Apr 6, 2021
Size: 1.3 MB
Tags: CPython 3.6m, macOS 10.9+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.1.post20200322 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for rnasamba-0.2.5-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm	Hash digest
SHA256	`e301d0a2e8278637becc07b62b36f83d1f44506dc59611d34f75089c1ad0353a`
MD5	`726dd7802345fae8474d235331453f81`
BLAKE2b-256	`c61934baa32dc3c33ecb263f940d30f7259371ddf8f0104855e5e903dc4e3f63`

