
An end-to-end deep neural network based on Transformers for pre-miRNA prediction

Project description

miRe2e

This package contains the original methods proposed in:

[1] J. Raad, L. A. Bugnon, D. H. Milone and G. Stegmayer, "miRe2e: a full end-to-end deep model based on Transformers for prediction of pre-miRNAs from raw genome-wide data", 2021.

miRe2e is a novel deep learning model, based on Transformers, that finds pre-miRNA sequences in raw genome-wide data. It is a full end-to-end neural architecture that uses only the raw sequences as input, so no other libraries are needed for preprocessing RNA sequences.

The model has 3 stages, as depicted in the figure:

  1. Structure prediction model: predicts RNA secondary structure using only the input sequence.
  2. MFE estimation model: estimates the minimum free energy (MFE) of folding the secondary structure.
  3. Pre-miRNA classifier: uses the input RNA sequence and the outputs of the two previous stages to assign a score indicating whether the input sequence is a pre-miRNA candidate.

Abstract

This repository provides a miRe2e model pre-trained with known pre-miRNAs from H. sapiens. It is open source and free to use. If you use this model or code, please cite [1].

An easy-to-try online demo is available at https://sinc.unl.edu.ar/web-demo/miRe2e/. This demo runs a pre-trained model on small RNA sequences. To use larger datasets or to train your own model, see the following instructions.

Installation

You need a Python>=3.7 distribution to use this package. You can install the package from PyPI:

pip install miRe2e

You may need to use pip3 instead of pip, depending on your system configuration. You can also clone this repository and install it with:

git clone git@github.com:sinc-lab/miRe2e.git
cd miRe2e
pip install .

Using the trained models

When using miRe2e, pre-trained weights are downloaded automatically. The model receives a FASTA file with a raw RNA sequence. The sequence is analyzed with a sliding window, and a pre-miRNA score is assigned to each window.
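For reference, a prediction call might look like the following minimal sketch. The predict method name, its argument and its return value are assumptions here, not the documented API; see the usage notebook below for the actual interface.

from miRe2e import MiRe2e

# Load the pre-trained model (weights are downloaded automatically).
model = MiRe2e(device="cpu")

# Hypothetical call: scan a raw RNA FASTA file with a sliding window and
# collect a pre-miRNA score per window. Method name and output are
# assumptions; check miRe2e_usage.ipynb for the exact interface.
scores = model.predict("genome_fragment.fa")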

You can find a complete demonstration of usage in the miRe2e usage notebook.

The notebook is also in this repository: miRe2e_usage.ipynb.

Training the models

Training the models may take several hours and requires GPU processing capabilities beyond the ones freely provided by Google Colab. The following sections give instructions for training each stage of the model.

Each of the following steps trains one stage of the model, replacing the corresponding pre-trained weights for the rest of the session. New models are saved as pickle files (*.pkl). These files can be loaded with:

from miRe2e import MiRe2e
new_model = MiRe2e(mfe_model_file="trained_mfe_predictor.pkl",
                   structure_model_file="trained_structure_predictor.pkl",
                   predictor_model_file="trained_predictor.pkl")

Structure prediction model

To train the Structure prediction model, run:

from miRe2e import MiRe2e
model = MiRe2e(device="cuda")
model.fit_structure("hairpin_examples.fa")

The FASTA file should contain hairpin sequences and their secondary structures.
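The exact record layout expected by fit_structure is defined by the example data in the repository. As an illustration only, hairpin secondary structures could be pre-computed with the ViennaRNA Python bindings; both the ViennaRNA dependency and the output layout below are assumptions, not part of miRe2e.

# Sketch: annotate hairpin sequences with dot-bracket structures using the
# ViennaRNA Python bindings (an assumed helper for this example only).
# The output layout is also an assumption; check the repository's example
# data for the format expected by fit_structure().
import RNA

hairpins = {"hairpin_1": "GGGAAAUCCUUGGAUUUCCC"}  # toy example

with open("hairpin_examples.fa", "w") as fa:
    for name, seq in hairpins.items():
        structure, mfe = RNA.fold(seq)  # dot-bracket string and MFE (kcal/mol)
        fa.write(f">{name}\n{seq}\n{structure}\n")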

MFE estimation model

To train the MFE estimation model, run:

from miRe2e import MiRe2e
model = MiRe2e(device="cuda")
model.fit_mfe("mfe_examples.fa")

The FASTA file should contain sequences of pre-miRNAs, hairpins and flats, with the target MFE of each sequence.

Pre-miRNA classifier model

To train the pre-miRNA classifier model, you need at least one set of positive samples (known pre-miRNA sequences) and a set of negative samples. Each sample must be trimmed to 100 nt in length to use the current model configuration. Each set should be stored in a single FASTA file, one sample per record. Furthermore, since pre-miRNAs have an average length of less than 100 nt, it is necessary to randomly trim the negative training sequences to match the length distribution of the positive ones. This prevents training from being biased by sequence length.
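
As an illustration, the random trimming of negative sequences could be done as in the sketch below. Biopython and the file names are assumptions used only for this example; miRe2e itself does not require them.

# Sketch: randomly trim negative sequences so that their length distribution
# matches that of the positive (pre-miRNA) set before training the classifier.
import random
from Bio import SeqIO

positives = list(SeqIO.parse("positive_examples.fa", "fasta"))
negatives = list(SeqIO.parse("negative_raw.fa", "fasta"))

# Lengths observed in the positive set define the target distribution.
pos_lengths = [len(rec.seq) for rec in positives]

trimmed = []
for rec in negatives:
    target_len = random.choice(pos_lengths)  # sample a positive-like length
    if len(rec.seq) > target_len:
        start = random.randint(0, len(rec.seq) - target_len)
        rec = rec[start:start + target_len]  # slicing returns a new SeqRecord
    trimmed.append(rec)

SeqIO.write(trimmed, "negative_examples.fa", "fasta")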

To train this stage, run:

from miRe2e import MiRe2e
model = MiRe2e(device="cuda")
model.fit(pos_fname="positive_examples.fa", 
          neg_fname="negative_examples.fa")

Project details



This version: 0.17

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

miRe2e-0.17.tar.gz (30.3 kB)

Uploaded Source

Built Distribution

miRe2e-0.17-py3-none-any.whl (30.9 kB)

Uploaded Python 3

File details

Details for the file miRe2e-0.17.tar.gz.

File metadata

  • Download URL: miRe2e-0.17.tar.gz
  • Upload date:
  • Size: 30.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for miRe2e-0.17.tar.gz

  • SHA256: 47b44daaee628108dce780a4f7b3e2038efe4a9e1d509bc3ce23ad1c283f5e01
  • MD5: 8b0ca0340f292aafe3303ca8a4d35710
  • BLAKE2b-256: bad9281be7a02acfd7286ddf6ca341d7a83a3d3da6ed9cf70ee1d3cc959322ac

See more details on using hashes here.
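
As an illustration, the SHA256 digest of the downloaded archive can be checked with Python's standard hashlib module; the local file name below is assumed to match the distribution name.

# Sketch: verify the SHA256 digest of the downloaded source distribution.
import hashlib

expected = "47b44daaee628108dce780a4f7b3e2038efe4a9e1d509bc3ce23ad1c283f5e01"

with open("miRe2e-0.17.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "Hash mismatch!")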

File details

Details for the file miRe2e-0.17-py3-none-any.whl.

File metadata

  • Download URL: miRe2e-0.17-py3-none-any.whl
  • Upload date:
  • Size: 30.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for miRe2e-0.17-py3-none-any.whl

  • SHA256: d94350441191e19a76448b1b2733fe8cecb2a42899f19b81d690a93c906d43c1
  • MD5: e1869a0789e12e8c3e6e8dee053b7384
  • BLAKE2b-256: bc04c643e4ee9c9c682b22a0c57a5f3ce08baa51d69a237f2cb7904a11d52f18

See more details on using hashes here.
