RNAformer

RNAformer is a simple yet effective deep learning model for RNA secondary structure prediction. We describe RNAformer in the preprint RNAformer: A Simple Yet Effective Deep Learning Model for RNA Secondary Structure Prediction and the preceding workshop paper Scalable Deep Learning for RNA Secondary Structure Prediction presented at the 2023 ICML Workshop on Computational Biology.

Abstract

Ribonucleic acid (RNA) is a major biopolymer with key roles as a regulatory molecule in cellular differentiation, gene expression, and various diseases. The prediction of the secondary structure of RNA is a challenging research problem crucial for understanding its functionality and developing RNA-based treatments. Despite the recent success of deep learning in structural biology, applying deep learning to RNA secondary structure prediction remains contentious. A primary concern is the control of homology between training and test data. Moreover, deep learning approaches often incorporate complex multi-model systems, ensemble strategies, or require external data. Here, we present the RNAformer, an attention-based deep learning model designed to predict the secondary structure from a single RNA sequence. Our deep learning model, in combination with a novel data curation pipeline, addresses previously reported caveats and can effectively learn a biophysical model across RNA families. The RNAformer achieves state-of-the-art performance on experimentally derived secondary structures while considering data homologies by training on a family-based split.
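The abstract describes predicting a secondary structure, i.e. which bases pair with which, which can be represented as a symmetric binary adjacency matrix over sequence positions. A minimal sketch of that representation, assuming the common pseudoknot-free dot-bracket encoding (this helper is illustrative and not part of the RNAformer package):

```python
def dot_bracket_to_matrix(structure):
    """Build a symmetric binary pairing (adjacency) matrix from a
    dot-bracket string: mat[i][j] == 1 iff positions i and j are paired."""
    n = len(structure)
    mat = [[0] * n for _ in range(n)]
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)          # opening bracket: remember position
        elif ch == ")":
            j = stack.pop()          # closing bracket: pair with last open
            mat[i][j] = mat[j][i] = 1
    return mat

# A toy hairpin: positions 0..2 pair with 8..6.
m = dot_bracket_to_matrix("(((...)))")
print(m[0][8], m[2][6])  # 1 1
```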

Reproduce results

You may install RNAformer via pip

pip install RNAformer

or directly from the source below.

Clone the repository

git clone https://github.com/automl/RNAformer.git
cd RNAformer

Install virtual environment

The Flash Attention package currently requires an Ampere, Ada, or Hopper GPU (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon.

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt
pip install flash-attn==2.3.4
pip install -e .

Alternatively, you may install RNAformer for inference without Flash Attention or a GPU:

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt
pip install -e .

Download datasets

bash download_all_datasets.sh

Download pretrained models

bash download_all_models.sh

Reproduce results from the paper

bash run_evaluation.sh

Run RNAformer inference on an RNA sequence:

An inference example; the script outputs the pairs of position indices in the adjacency matrix that are predicted to be paired.

python infer_RNAformer.py -c 6 -s GCCCGCAUGGUGAAAUCGGUAAACACAUCGCACUAAUGCGCCGCCUCUGGCUUGCCGGUUCAAGUCCGGCUGCGGGCACCA --state_dict models/RNAformer_32M_state_dict_intra_family_finetuned.pth --config models/RNAformer_32M_config_intra_family_finetuned.yml
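The predicted pair list can be rendered in the familiar dot-bracket notation. A minimal sketch, assuming the output is a list of 0-based (i, j) index tuples and the structure is pseudoknot-free (crossing pairs would need additional bracket alphabets); `pairs_to_dot_bracket` is a hypothetical helper, not part of the package:

```python
def pairs_to_dot_bracket(pairs, length):
    """Render a list of (i, j) base-pair index tuples as a dot-bracket
    string of the given sequence length. Assumes 0-based indices and a
    nested (pseudoknot-free) structure."""
    structure = ["."] * length
    for i, j in pairs:
        a, b = min(i, j), max(i, j)   # orient each pair as (opening, closing)
        structure[a] = "("
        structure[b] = ")"
    return "".join(structure)

# Three nested pairs form a small hairpin:
print(pairs_to_dot_bracket([(0, 8), (1, 7), (2, 6)], 9))  # (((...)))
```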

Model Checkpoints

The state dictionaries and configs for the models used in the paper are available here:

RNAformer 32M from the biophysical model experiment:

https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_biophysical.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_biophysical.yml

RNAformer 32M from the bpRNA experiment:

https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_bprna.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_bprna.yml

RNAformer 32M from the intra family finetuning experiment:

https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_intra_family_finetuned.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_intra_family_finetuned.yml

RNAformer 32M from the inter family finetuning experiment:

https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_state_dict_inter_family_finetuned.pth
https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models/RNAformer_32M_config_inter_family_finetuned.yml
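The checkpoint URLs above follow a common pattern, so individual experiments can be fetched programmatically. A small sketch (the `checkpoint_urls` helper and the experiment-name strings are inferred from the URLs listed above, not an official API):

```python
from urllib.request import urlretrieve

BASE = "https://ml.informatik.uni-freiburg.de/research-artifacts/RNAformer/models"
EXPERIMENTS = (
    "biophysical",
    "bprna",
    "intra_family_finetuned",
    "inter_family_finetuned",
)

def checkpoint_urls(experiment, size="32M"):
    """Return the (state_dict_url, config_url) pair for one experiment."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment}")
    return (
        f"{BASE}/RNAformer_{size}_state_dict_{experiment}.pth",
        f"{BASE}/RNAformer_{size}_config_{experiment}.yml",
    )

# Download one checkpoint pair into the models/ directory, e.g.:
# for url in checkpoint_urls("bprna"):
#     urlretrieve(url, "models/" + url.rsplit("/", 1)[-1])
```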
