An end-to-end deep neural network based on Transformers for pre-miRNA prediction
miRe2e
This package contains the original methods proposed in:
[1] J. Raad, L. A. Bugnon, D. H. Milone and G. Stegmayer, "miRe2e: a full end-to-end deep model based on Transformers for prediction of pre-miRNAs from raw genome-wide data", 2021.
miRe2e is a novel deep learning model based on Transformers for finding pre-miRNA sequences in raw genome-wide data. It is a fully end-to-end neural architecture that takes only raw sequences as input, so no external libraries are needed to preprocess the RNA sequences.
The model has 3 stages, as depicted in the figure:
- Structure prediction model: predicts RNA secondary structure using only the input sequence.
- MFE estimation model: estimates the minimum free energy (MFE) of the folded secondary structure.
- Pre-miRNA classifier: uses the input RNA sequence and the outputs of the two previous models to score the input sequence and determine whether it is a pre-miRNA candidate.
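The data flow between the three stages can be sketched as follows. This is a minimal illustration, not the miRe2e implementation: `run_pipeline`, `StageOutputs` and the stage signatures are hypothetical, and the sketch assumes the MFE stage receives both the sequence and the predicted structure.

```python
from typing import Callable, NamedTuple

# Hypothetical stage signatures; the real miRe2e stages are neural networks
# whose internal interfaces are not described here.
StructureModel = Callable[[str], str]            # sequence -> dot-bracket structure
MFEModel = Callable[[str, str], float]           # (sequence, structure) -> MFE
Classifier = Callable[[str, str, float], float]  # (sequence, structure, MFE) -> score


class StageOutputs(NamedTuple):
    structure: str
    mfe: float
    score: float


def run_pipeline(sequence: str,
                 structure_model: StructureModel,
                 mfe_model: MFEModel,
                 classifier: Classifier) -> StageOutputs:
    """Chain the three stages for a single input sequence."""
    # Stage 1: predict the secondary structure from the raw sequence only.
    structure = structure_model(sequence)
    # Stage 2: estimate the MFE (assumed here to use sequence and structure).
    mfe = mfe_model(sequence, structure)
    # Stage 3: score the sequence from the raw input plus both previous outputs.
    score = classifier(sequence, structure, mfe)
    return StageOutputs(structure, mfe, score)
```

Plugging in toy lambdas (for example `lambda s: "(((...)))"` as the structure model) only shows the shape of the data passed between stages; such stubs carry no biological meaning.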
This repository provides a miRe2e model pre-trained with known pre-miRNAs from H. sapiens. It is open source and free to use. If you use any of these resources, please cite [1].
An easy-to-try online demo is available at https://sinc.unl.edu.ar/web-demo/miRe2e/. This demo runs a pre-trained model on small RNA sequences. To use larger datasets or to train your own model, follow the instructions below.
Installation
You need a Python>=3.7 distribution to use this package. You can install the package from PyPI:
pip install miRe2e
The exact command may vary depending on your system configuration. You can also clone this repository and install it with:
git clone git@github.com:sinc-lab/miRe2e.git
cd miRe2e
pip install .
Using the trained models
When you use miRe2e, the pre-trained weights are downloaded automatically. The model receives a FASTA file with a raw RNA sequence. The sequence is analyzed with a sliding window, and a pre-miRNA score is assigned to each window.
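A minimal sketch of sliding-window scoring, assuming a window width of 100 nt (the classifier's input length) and a hypothetical step of 50 nt. `window_starts`, `score_windows` and the placeholder `score_fn` are illustrative names, not part of the miRe2e API, and the package's actual window step may differ.

```python
from typing import Callable, List, Tuple


def window_starts(length: int, width: int = 100, step: int = 50) -> List[int]:
    """Start positions of windows that cover a sequence of the given length."""
    if length <= width:
        # A short sequence fits in a single window.
        return [0]
    starts = list(range(0, length - width + 1, step))
    # Add a final window aligned to the end so the tail is not skipped.
    if starts[-1] != length - width:
        starts.append(length - width)
    return starts


def score_windows(sequence: str,
                  score_fn: Callable[[str], float],
                  width: int = 100,
                  step: int = 50) -> List[Tuple[int, float]]:
    """Return (start, score) for every window; score_fn stands in for the model."""
    return [(start, score_fn(sequence[start:start + width]))
            for start in window_starts(len(sequence), width, step)]
```

With overlapping windows, each region of a long sequence is scored more than once; the end-aligned final window is one simple way to avoid dropping the last nucleotides.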
You can find a complete usage demonstration in the miRe2e usage notebook.
The notebook is also in this repository: miRe2e_usage.ipynb.
Training the models
Training the models may take several hours and requires more GPU processing capability than Google Colab provides for free. The following sections explain how to train each stage of the model.
Each of the following steps trains one stage of the model and replaces that stage's current model for the rest of the program. New models are saved as pickle files (*.pkl), which can be loaded with:
from miRe2e import MiRe2e
new_model = MiRe2e(mfe_model_file="trained_mfe_predictor.pkl",
                   structure_model_file="trained_structure_predictor.pkl",
                   predictor_model_file="trained_predictor.pkl")
Structure prediction model
To train the Structure prediction model, run:
from miRe2e import MiRe2e
model = MiRe2e(device="cuda")
model.fit_structure("hairpin_examples.fa")
The FASTA file should contain hairpin sequences along with their secondary structures.
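Before training, it can help to validate the hairpin file. The sketch below assumes a hypothetical layout of three lines per record (`>name`, sequence, dot-bracket structure); check the layout that `fit_structure` actually expects before relying on it. `read_hairpin_records` and `is_valid_dot_bracket` are illustrative helpers, not part of miRe2e.

```python
from typing import List, Tuple


def is_valid_dot_bracket(structure: str) -> bool:
    """True if the string uses only '(', ')' and '.' and its brackets balance."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket without a matching opener
                return False
        elif ch != ".":
            return False
    return depth == 0


def read_hairpin_records(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Parse (name, sequence, structure) triples, assuming 3 lines per record."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected 3 non-empty lines per record")
    records = []
    for i in range(0, len(rows), 3):
        header, sequence, structure = rows[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"record {i // 3 + 1}: expected a FASTA header")
        if len(sequence) != len(structure):
            raise ValueError(f"{header}: sequence and structure lengths differ")
        if not is_valid_dot_bracket(structure):
            raise ValueError(f"{header}: invalid dot-bracket structure")
        records.append((header[1:], sequence, structure))
    return records
```

For example, `read_hairpin_records(open("hairpin_examples.fa").readlines())` would raise on the first malformed record under this assumed layout.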
MFE estimation model
To train the MFE estimation model, run:
from miRe2e import MiRe2e
model = MiRe2e(device="cuda")
model.fit_mfe("mfe_examples.fa")
The FASTA file should contain pre-miRNA, hairpin and flat sequences, each with its target MFE.
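How the target MFE is stored in the file is not specified here. Purely as an illustration, the sketch below assumes each record is a header of the form `>name mfe` followed by one sequence line; `read_mfe_records` is a hypothetical helper for checking such a file, not part of miRe2e.

```python
from typing import List, Tuple


def read_mfe_records(lines: List[str]) -> List[Tuple[str, str, float]]:
    """Parse (name, sequence, mfe) from '>name mfe' headers plus a sequence line.

    The header layout is an assumption made for this example only.
    """
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected a header and a sequence line per record")
    records = []
    for header, sequence in zip(rows[0::2], rows[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a FASTA header, got {header!r}")
        # The last whitespace-separated field is taken as the target MFE.
        name, mfe_text = header[1:].rsplit(maxsplit=1)
        records.append((name, sequence, float(mfe_text)))
    return records
```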
Pre-miRNA classifier model
To train the pre-miRNA classifier model, you need at least one set of positive samples (known pre-miRNA sequences) and one set of negative samples. Each sample must be trimmed to 100 nt in length to use the current model configuration. Each set should be stored in a single FASTA file, one sample per record. Furthermore, since pre-miRNAs have an average length of less than 100 nt, negative training sequences must be randomly trimmed to match the length distribution of the positive samples. This prevents the training from being biased by sequence length.
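One way to do this random trimming is to draw each negative's target length from the lengths of the positive samples and keep a fragment of that length at a random position. This is a sketch of that idea, not necessarily the procedure used in [1]; `trim_to_positive_lengths` is a hypothetical helper.

```python
import random
from typing import List, Sequence


def trim_to_positive_lengths(negatives: Sequence[str],
                             positive_lengths: Sequence[int],
                             rng: random.Random) -> List[str]:
    """Randomly trim each negative to a length drawn from the positive lengths."""
    trimmed = []
    for seq in negatives:
        # Draw a target length from the empirical positive length distribution.
        target = min(rng.choice(positive_lengths), len(seq))
        # Choose a random start so the kept fragment lies inside the sequence.
        start = rng.randint(0, len(seq) - target)
        trimmed.append(seq[start:start + target])
    return trimmed
```

Passing a seeded `random.Random` makes the trimmed negative set reproducible across runs.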
To train this stage, run:
from miRe2e import MiRe2e
model = MiRe2e(device="cuda")
model.fit(pos_fname="positive_examples.fa",
neg_fname="negative_examples.fa")
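The two input files can be prepared with a small writer like the one below. It is a sketch: `format_fasta` and `write_fasta` are hypothetical helpers, and reading the 100 nt requirement as an upper bound on sample length is an assumption.

```python
from typing import Iterable, Tuple


def format_fasta(records: Iterable[Tuple[str, str]], max_len: int = 100) -> str:
    """Render (name, sequence) pairs as FASTA text, one record per sample."""
    chunks = []
    for name, sequence in records:
        # Assumed constraint: no sample may exceed max_len nucleotides.
        if len(sequence) > max_len:
            raise ValueError(f"{name}: {len(sequence)} nt exceeds {max_len} nt")
        chunks.append(f">{name}\n{sequence}\n")
    return "".join(chunks)


def write_fasta(path: str, records: Iterable[Tuple[str, str]]) -> None:
    """Write the records to a FASTA file such as positive_examples.fa."""
    with open(path, "w") as handle:
        handle.write(format_fasta(records))
```

Writing the positive set to `positive_examples.fa` and the trimmed negative set to `negative_examples.fa` produces the file names used in the `fit` call.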
Download files
File details
Details for the file miRe2e-0.17.tar.gz.
File metadata
- Download URL: miRe2e-0.17.tar.gz
- Upload date:
- Size: 30.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 47b44daaee628108dce780a4f7b3e2038efe4a9e1d509bc3ce23ad1c283f5e01
MD5 | 8b0ca0340f292aafe3303ca8a4d35710
BLAKE2b-256 | bad9281be7a02acfd7286ddf6ca341d7a83a3d3da6ed9cf70ee1d3cc959322ac
File details
Details for the file miRe2e-0.17-py3-none-any.whl.
File metadata
- Download URL: miRe2e-0.17-py3-none-any.whl
- Upload date:
- Size: 30.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | d94350441191e19a76448b1b2733fe8cecb2a42899f19b81d690a93c906d43c1
MD5 | e1869a0789e12e8c3e6e8dee053b7384
BLAKE2b-256 | bc04c643e4ee9c9c682b22a0c57a5f3ce08baa51d69a237f2cb7904a11d52f18