A tool for computing the coding potential of RNA transcript sequences using deep learning.

Overview

RNAsamba is a tool for computing the coding potential of RNA sequences using a neural network classification model. A description of the algorithm and benchmarks comparing RNAsamba to other tools can be found in our article.

Citation

If you use RNAsamba in your work, please cite our paper:

Camargo, A. P., Sourkov, V., Pereira, G. A. G. & Carazzolle, M. F. "RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences." NAR Genomics and Bioinformatics 2, lqz024 (2020).

Web version

RNAsamba can be used through a minimal web interface that is freely available online at https://rnasamba.lge.ibi.unicamp.br/. The source code of the web app can be found at https://github.com/apcamargo/rnasamba-webapp/.

Documentation

Complete documentation for RNAsamba is available at https://apcamargo.github.io/RNAsamba/.

Installation

There are two ways to install RNAsamba:

  • Using pip:
pip install rnasamba
  • Using conda:
conda install -c conda-forge -c bioconda rnasamba

Download the pre-trained models

We provide two HDF5 files containing the weights of classification models trained with human transcript sequences. The first model (full_length_weights.hdf5) was trained exclusively with full-length transcripts and is suited to datasets composed mostly or exclusively of complete transcript sequences. The second model (partial_length_weights.hdf5) was trained with both complete and truncated transcripts and is preferred when a significant fraction of the sequences are partial-length, as in transcriptomes assembled using de novo approaches.

Both models achieve high classification performance on transcripts from a variety of species (see reference).

You can download the files by executing the following commands:

curl -O https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data/full_length_weights.hdf5
curl -O https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data/partial_length_weights.hdf5
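
If you prefer to fetch the weights from a Python script, the same two downloads can be sketched with the standard library. This is only an illustration, not part of RNAsamba: the helper names weight_url and download_weights are hypothetical, and the URLs are the ones used in the curl commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the curl commands above.
BASE_URL = "https://raw.githubusercontent.com/apcamargo/RNAsamba/master/data"
WEIGHT_FILES = ("full_length_weights.hdf5", "partial_length_weights.hdf5")


def weight_url(filename):
    """Return the download URL for one of the two pre-trained weight files."""
    if filename not in WEIGHT_FILES:
        raise ValueError(f"unknown weight file: {filename}")
    return f"{BASE_URL}/{filename}"


def download_weights(dest_dir="."):
    """Download both weight files into dest_dir, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for filename in WEIGHT_FILES:
        target = dest / filename
        if not target.exists():
            # Network access happens only here.
            urlretrieve(weight_url(filename), str(target))
        paths.append(target)
    return paths
```

Calling download_weights() with no arguments places both files in the current directory, which matches what the curl commands do.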

To train your own model, follow the steps shown in the Examples section.

Usage

RNAsamba provides two commands: rnasamba train and rnasamba classify.

rnasamba train

rnasamba train is the command for training a new classification model from a training dataset and saving the network weights into an HDF5 file. The user can specify the batch size (--batch_size) and the number of training epochs (--epochs). The user can also activate early stopping (--early_stopping), which reduces training time and can help avoid overfitting.

usage: rnasamba train [-h] [-s EARLY_STOPPING] [-b BATCH_SIZE] [-e EPOCHS]
                      [-v {0,1,2,3}]
                      output_file coding_file noncoding_file

Train a new classification model.

positional arguments:
  output_file           output HDF5 file containing weights of the newly
                        trained RNAsamba network.
  coding_file           input FASTA file containing sequences of protein-
                        coding transcripts.
  noncoding_file        input FASTA file containing sequences of noncoding
                        transcripts.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -s EARLY_STOPPING, --early_stopping EARLY_STOPPING
                        number of epochs after lowest validation loss before
                        stopping training (a fraction of 0.1 of the training
                        set is set apart for validation and the model with the
                        lowest validation loss will be saved). (default: 0)
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        number of samples per gradient update. (default: 128)
  -e EPOCHS, --epochs EPOCHS
                        number of epochs to train the model. (default: 40)
  -v {0,1,2,3}, --verbose {0,1,2,3}
                        print the progress of the training. 0 = silent, 1 =
                        current step, 2 = progress bar, 3 = one line per
                        epoch. (default: 0)
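
To make the --early_stopping description more concrete, here is a small sketch of patience-based early stopping in plain Python. It illustrates the general technique only and is not RNAsamba's implementation: the function name simulate_early_stopping is hypothetical, the validation losses are made up for the example, and treating a patience of 0 as "early stopping disabled" is an assumption about what the default means.

```python
def simulate_early_stopping(val_losses, patience):
    """Replay a list of per-epoch validation losses with patience-based stopping.

    Returns (best_epoch, last_epoch), both 1-based. best_epoch is the epoch
    with the lowest validation loss seen before training ended, i.e. the
    model that would be kept. A patience of 0 is treated as "disabled"
    (an assumption made for this sketch).
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        last_epoch = epoch
        if loss < best_loss:
            # New lowest validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if patience > 0 and epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch, last_epoch


# Made-up validation losses, one per epoch.
losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
print(simulate_early_stopping(losses, patience=3))
```

With a patience of 3, training in this example ends at epoch 5 and the epoch 2 model is kept, so the later improvement at epoch 6 is never reached. That is the trade-off behind the option: shorter training in exchange for possibly missing a late improvement.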

rnasamba classify

rnasamba classify is the command for computing the coding potential of transcripts contained in an input FASTA file and classifying them as coding or noncoding. Optionally, the user can specify an output FASTA file (--protein_fasta) in which RNAsamba will write the translated sequences of the predicted coding ORFs. If multiple weight files are provided, RNAsamba will combine their predictions into a single ensemble output.

usage: rnasamba classify [-h] [-p PROTEIN_FASTA] [-v {0,1}]
                         output_file fasta_file weights [weights ...]

Classify sequences from a input FASTA file.

positional arguments:
  output_file           output TSV file containing the results of the
                        classification.
  fasta_file            input FASTA file containing transcript sequences.
  weights               input HDF5 file(s) containing weights of a trained
                        RNAsamba network (if more than a file is provided, an
                        ensembling of the models will be performed).

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -p PROTEIN_FASTA, --protein_fasta PROTEIN_FASTA
                        output FASTA file containing translated sequences for
                        the predicted coding ORFs. (default: None)
  -v {0,1}, --verbose {0,1}
                        print the progress of the classification. 0 = silent,
                        1 = current step. (default: 0)
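
When rnasamba classify is called from a pipeline script, it can help to assemble the argument list programmatically. The sketch below follows the usage text above (options first, then output_file, fasta_file and one or more weights files). The function build_classify_command is a hypothetical helper written for this example and is not provided by RNAsamba.

```python
def build_classify_command(output_tsv, fasta_file, weights,
                           protein_fasta=None, verbose=0):
    """Build the argument list for `rnasamba classify` from the documented usage."""
    weights = list(weights)
    if not weights:
        raise ValueError("at least one weights file is required")
    if verbose not in (0, 1):
        raise ValueError("verbose must be 0 or 1")
    cmd = ["rnasamba", "classify"]
    if protein_fasta is not None:
        cmd += ["-p", protein_fasta]
    if verbose:
        cmd += ["-v", str(verbose)]
    # Positional arguments, in the order shown in the usage text.
    cmd += [output_tsv, fasta_file]
    # Passing more than one weights file requests an ensemble of the models.
    cmd += weights
    return cmd


cmd = build_classify_command(
    "classification.tsv",
    "input.fa",
    ["full_length_weights.hdf5", "partial_length_weights.hdf5"],
    protein_fasta="predicted_proteins.fa",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where RNAsamba is installed.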

Examples

  • Training a new classification model using Mus musculus data downloaded from GENCODE:
rnasamba train -v 2 mouse_model.hdf5 gencode.vM21.pc_transcripts.fa gencode.vM21.lncRNA_transcripts.fa
  • Classifying sequences using our pre-trained model (partial_length_weights.hdf5) and saving the predicted proteins into a FASTA file:
rnasamba classify -p predicted_proteins.fa classification.tsv input.fa partial_length_weights.hdf5
head classification.tsv

sequence_name	coding_score	classification
ENSMUST00000054910	0.99022	coding
ENSMUST00000059648	0.84718	coding
ENSMUST00000055537	0.99713	coding
ENSMUST00000030975	0.85189	coding
ENSMUST00000050754	0.02638	noncoding
ENSMUST00000008011	0.14949	noncoding
ENSMUST00000061643	0.03456	noncoding
ENSMUST00000059704	0.89232	coding
ENSMUST00000036304	0.03782	noncoding
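
The TSV output is easy to post-process. As a rough sketch, the following Python snippet parses the three columns shown above and keeps the sequences classified as coding. The function names read_classification and select_coding and the optional min_score filter are hypothetical and chosen for this example. The sample rows are the first five from the output above.

```python
import csv
import io

# First rows of the example output above, tab-separated.
SAMPLE_TSV = """sequence_name\tcoding_score\tclassification
ENSMUST00000054910\t0.99022\tcoding
ENSMUST00000059648\t0.84718\tcoding
ENSMUST00000055537\t0.99713\tcoding
ENSMUST00000030975\t0.85189\tcoding
ENSMUST00000050754\t0.02638\tnoncoding
"""


def read_classification(handle):
    """Parse an RNAsamba classification TSV into a list of dictionaries."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "sequence_name": row["sequence_name"],
            "coding_score": float(row["coding_score"]),
            "classification": row["classification"],
        }
        for row in reader
    ]


def select_coding(records, min_score=0.0):
    """Keep records labelled 'coding' whose score is at least min_score.

    min_score is an extra, stricter filter added for this sketch; it is not
    an RNAsamba option.
    """
    return [
        r for r in records
        if r["classification"] == "coding" and r["coding_score"] >= min_score
    ]


records = read_classification(io.StringIO(SAMPLE_TSV))
for record in select_coding(records, min_score=0.9):
    print(record["sequence_name"], record["coding_score"])
```

To read a real result file, replace io.StringIO(SAMPLE_TSV) with open("classification.tsv", newline="").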

Using the Docker image

docker pull antoniopcamargo/rnasamba

# Training example:
docker run -ti --rm -u $(id -u) -v "$(pwd):/app" antoniopcamargo/rnasamba train -v 2 mouse_model.hdf5 gencode.vM21.pc_transcripts.fa gencode.vM21.lncRNA_transcripts.fa

# Classification example:
docker run -ti --rm -u $(id -u) -v "$(pwd):/app" antoniopcamargo/rnasamba classify -p predicted_proteins.fa classification.tsv input.fa full_length_weights.hdf5


