RNAformer
RNAformer is a simple yet effective deep learning model for RNA secondary structure prediction. We describe RNAformer in the preprint RNAformer: A Simple Yet Effective Deep Learning Model for RNA Secondary Structure Prediction and the preceding workshop paper Scalable Deep Learning for RNA Secondary Structure Prediction presented at the 2023 ICML Workshop on Computational Biology.
Abstract
Ribonucleic acid (RNA) is a major biopolymer with key roles as a regulatory molecule in cellular differentiation, gene expression, and various diseases. The prediction of the secondary structure of RNA is a challenging research problem crucial for understanding its functionality and developing RNA-based treatments. Despite the recent success of deep learning in structural biology, applying deep learning to RNA secondary structure prediction remains contentious. A primary concern is the control of homology between training and test data. Moreover, deep learning approaches often incorporate complex multi-model systems, ensemble strategies, or require external data. Here, we present the RNAformer, an attention-based deep learning model designed to predict the secondary structure from a single RNA sequence. Our deep learning model, in combination with a novel data curation pipeline, addresses previously reported caveats and can effectively learn a biophysical model across RNA families. The RNAformer achieves state-of-the-art performance on experimentally derived secondary structures while considering data homologies by training on a family-based split.
Reproduce results
You may install RNAformer via pip:
pip install RNAformer
or directly from the source below.
Clone the repository
git clone https://github.com/automl/RNAformer.git
cd RNAformer
Install virtual environment
The Flash Attention package currently requires an Ampere, Ada, or Hopper GPU (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install flash-attn==2.3.4
pip install -e .
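If you are unsure whether your GPU meets this requirement, the CUDA compute capability is a quick proxy: Ampere, Ada, and Hopper cards report a major version of 8 or 9, while Turing cards report 7.5. The following is a small sketch (not part of the repository) that uses PyTorch to check this before relying on the Flash Attention build:

```python
# check_gpu.py -- hypothetical helper, not shipped with RNAformer.
# Checks whether the local GPU is new enough for flash-attn 2.x.
import torch

if not torch.cuda.is_available():
    print("No CUDA GPU detected; use the CPU/inference-only install below.")
else:
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    # Ampere, Ada, and Hopper GPUs report compute capability 8.0 or higher.
    if major >= 8:
        print(f"{name} (compute capability {major}.{minor}) should support flash-attn.")
    else:
        print(f"{name} (compute capability {major}.{minor}) is likely too old for flash-attn 2.x.")
```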
Alternatively, you may install RNAformer for inference without Flash Attention or a GPU:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .
Download datasets
bash download_all_datasets.sh
Download pretrained models
bash download_all_models.sh
Reproduce results from the paper
bash run_evaluation.sh
Infer RNAformer for an RNA sequence:
Example inference call; the script outputs the position indices in the adjacency matrix that are predicted to be paired:
python infer_RNAformer.py -c 6 -s GCCCGCAUGGUGAAAUCGGUAAACACAUCGCACUAAUGCGCCGCCUCUGGCUUGCCGGUUCAAGUCCGGCUGCGGGCACCA --state_dict models/RNAformer_32M_state_dict_intra_family_finetuned.pth --config models/RNAformer_32M_config_intra_family_finetuned.yml
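The pairing information comes out of infer_RNAformer.py as index positions in the predicted adjacency matrix. The sketch below is only an illustration of how such pairs could be turned into dot-bracket notation; it assumes the pairs are available as 0-based (i, j) tuples, which is an assumption about the output rather than documented behaviour, and it drops pseudoknotted pairs that plain dot-bracket notation cannot represent.

```python
# pairs_to_dotbracket.py -- hypothetical post-processing sketch, not part of the repo.
def pairs_to_dotbracket(length, pairs):
    """Convert a list of 0-based (i, j) base pairs into a dot-bracket string."""
    structure = ["."] * length
    accepted = []
    for i, j in sorted(pairs):
        i, j = min(i, j), max(i, j)
        # Keep a pair only if both positions are still unpaired.
        if structure[i] != "." or structure[j] != ".":
            continue
        # Skip pairs that cross an already accepted pair (pseudoknots).
        if any(a < i < b < j or i < a < j < b for a, b in accepted):
            continue
        structure[i], structure[j] = "(", ")"
        accepted.append((i, j))
    return "".join(structure)

if __name__ == "__main__":
    # Toy example: a 10-nt hairpin with three stacked pairs -> (((....)))
    print(pairs_to_dotbracket(10, [(0, 9), (1, 8), (2, 7)]))
```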
Model Checkpoints
Please find here the state dictionaries and configs for the models used in the paper:
RNAformer 32M from the biophysical model experiment:
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_biophysical.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_biophysical.yml
RNAformer 32M from the bpRNA model experiment:
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_bprna.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_bprna.yml
RNAformer 32M from the intra-family fine-tuning experiment:
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_intra_family_finetuned.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_intra_family_finetuned.yml
RNAformer 32M from the inter-family fine-tuning experiment:
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_inter_family_finetuned.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_inter_family_finetuned.yml
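For quick inspection outside of the provided scripts, a checkpoint and its config can be fetched and loaded directly from Python. The sketch below assumes the .pth files are plain PyTorch state dictionaries and the .yml files standard YAML (a reasonable guess from the file extensions, not something documented here), and that PyYAML is installed; for actual predictions, use infer_RNAformer.py with --state_dict and --config as shown above.

```python
# inspect_checkpoint.py -- illustrative sketch only, not part of the repository.
import urllib.request

import torch
import yaml  # PyYAML; install with `pip install pyyaml` if missing

BASE = "https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models"
STATE = "RNAformer_32M_state_dict_biophysical.pth"
CONFIG = "RNAformer_32M_config_biophysical.yml"

# Download the state dict and config for the biophysical model.
for filename in (STATE, CONFIG):
    urllib.request.urlretrieve(f"{BASE}/{filename}", filename)

# Load onto CPU so no GPU is required for inspection.
state_dict = torch.load(STATE, map_location="cpu")
with open(CONFIG) as f:
    config = yaml.safe_load(f)

print(f"{len(state_dict)} parameter tensors; config keys: {sorted(config)}")
```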
Download files
Source Distribution: rnaformer-0.0.1.tar.gz
Built Distribution: RNAformer-0.0.1-py3-none-any.whl
File details
Details for the file rnaformer-0.0.1.tar.gz.
File metadata
- Download URL: rnaformer-0.0.1.tar.gz
- Upload date:
- Size: 29.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 36374aa476ffd0be681e8ad29fec64cb074b6e341db1dc8179fcbd43cdf5da55
MD5 | dfecb7f3c7b14bebfeb9794a3782e767
BLAKE2b-256 | 57191fc85e4b301f39994f6f7e14657e856a6e2efd200f484eeed0e0e5e2e2b0
File details
Details for the file RNAformer-0.0.1-py3-none-any.whl.
File metadata
- Download URL: RNAformer-0.0.1-py3-none-any.whl
- Upload date:
- Size: 33.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | ec5ad745ec8de67e500cad65c2649b0a9193c5d832bb7cad0f3cfb8df774896c
MD5 | 0a55f472ef3a207ba9c93008b36c900d
BLAKE2b-256 | c53d22ef46ca07c5b688435f8f6055f349b9a7938cc6c3150f09241172da3626