A library to build our DMS signal and RNA structure prediction models.

eFold

This repo contains the PyTorch code for our paper “Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction”.

[BioRXiv] [Data]

Install

pip install efold

Inference mode

Using the command line

From a sequence:

efold AAACAUGAGGAUUACCCAUGU -o seq.txt
cat seq.txt

AAACAUGAGGAUUACCCAUGU
..(((((.((....)))))))

or a fasta file:

efold --fasta example.fasta
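
If you do not have a FASTA file yet, the sketch below shows one way to build one from Python. It only relies on the standard FASTA layout (a `>` header line followed by the sequence). The helper names `format_fasta` and `write_fasta` and the record names are hypothetical and not part of efold, and whether efold accepts several records per file is an assumption to check against `efold -h`.

```python
from pathlib import Path
from typing import Dict


def format_fasta(records: Dict[str, str]) -> str:
    """Render {name: sequence} as FASTA text, one sequence per line."""
    lines = []
    for name, seq in records.items():
        if not name or not seq:
            raise ValueError("each record needs a non-empty name and sequence")
        lines.append(f">{name}")  # header line
        lines.append(seq)         # sequence line
    return "\n".join(lines) + "\n"


def write_fasta(records: Dict[str, str], path: str) -> None:
    """Write the records to `path` in FASTA format."""
    Path(path).write_text(format_fasta(records))


# Hypothetical usage (record name is illustrative only):
# write_fasta({"hairpin_1": "AAACAUGAGGAUUACCCAUGU"}, "example.fasta")
```

The resulting file can then be passed to `efold --fasta example.fasta`.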

Using different formats:

efold AAACAUGAGGAUUACCCAUGU -bp # base pairs
efold AAACAUGAGGAUUACCCAUGU -db # dotbracket (default)
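
The two formats describe the same structure. As an illustration, here is a minimal sketch of how a dot-bracket string maps to a list of base pairs, using a stack to match parentheses. This is not efold's own code: the function name `dotbracket_to_pairs` is hypothetical, the 0-based indexing is an assumption (efold's `-bp` output may use a different convention), and pseudoknot brackets such as `[` and `]` are not handled.

```python
from typing import List, Tuple


def dotbracket_to_pairs(structure: str) -> List[Tuple[int, int]]:
    """Convert a dot-bracket string to sorted, 0-based (i, j) base pairs."""
    stack: List[int] = []             # positions of unmatched '('
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the most recent '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(dotbracket_to_pairs("..(((((.((....)))))))"))
# [(2, 20), (3, 19), (4, 18), (5, 17), (6, 16), (8, 15), (9, 14)]
```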

Output can be .json, .csv or .txt

efold AAACAUGAGGAUUACCCAUGU -o output.csv

Run help:

efold -h

Using python

>>> from efold import inference
>>> inference('AAACAUGAGGAUUACCCAUGU', fmt='dotbracket')
..(((((.((....)))))))
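
To run several sequences from Python, one option is a small wrapper around the call above. The sketch below is an illustration, not part of efold: `normalize_sequence` and `predict_many` are hypothetical helpers, converting `T` to `U` is our own assumption about input handling, and the predictor is passed in as a callable that is assumed to return one dot-bracket string per sequence.

```python
from typing import Callable, Iterable, List, Tuple

RNA_ALPHABET = set("ACGU")


def normalize_sequence(seq: str) -> str:
    """Strip whitespace, uppercase, and convert DNA 'T' to RNA 'U'."""
    cleaned = seq.strip().upper().replace("T", "U")
    unknown = set(cleaned) - RNA_ALPHABET
    if not cleaned or unknown:
        raise ValueError(f"invalid RNA sequence {seq!r}")
    return cleaned


def predict_many(
    sequences: Iterable[str],
    predict: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (sequence, structure) tuples, checking that lengths match."""
    results = []
    for raw in sequences:
        seq = normalize_sequence(raw)
        structure = predict(seq)
        if len(structure) != len(seq):
            raise ValueError(f"structure length does not match sequence {seq!r}")
        results.append((seq, structure))
    return results


# Hypothetical usage with efold installed:
# from efold import inference
# predict_many(["AAACAUGAGGAUUACCCAUGU"], lambda s: inference(s, fmt="dotbracket"))
```

Passing the predictor as an argument keeps the wrapper testable without loading the model.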

File structure

efold/
    api/    # for inference calls
    core/   # backend 
    models/ # where we define eFold and other models
    resources/
        efold_weights.py # our best model weights
scripts/
    efold_training.py # our training script
    [...]
LICENSE
requirements.txt
pyproject.toml

Data

List of the datasets we used

A breakdown of the data we used is summarized here. All the data is stored on Hugging Face.

Get the data

You can download our datasets using rouskinHF:

pip install rouskinhf

And in your code, write:

>>> import rouskinhf
>>> data = rouskinhf.get_dataset('ribo500-blast') # look at the dataset names on huggingface

Reproducing our results

Run the training script:

git clone https://github.com/rouskinlab/eFold
python eFold/scripts/efold_training.py

Citation

Plain text:

Albéric A. de Lajarte, Yves J. Martin des Taillades, Colin Kalicki, Federico Fuchs Wightman, Justin Aruda, Dragui Salazar, Matthew F. Allan, Casper L’Esperance-Kerckhoff, Alex Kashi, Fabrice Jossinet, Silvi Rouskin. “Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction”. bioRxiv 2024.01.24.577093; doi: https://doi.org/10.1101/2024.01.24.577093. 2024

BibTeX:

@article {Lajarte_Martin_2024,
	title = {Diverse Database and Machine Learning Model to narrow the generalization gap in RNA structure prediction},
	author = {Alb{\'e}ric A. de Lajarte and Yves J. Martin des Taillades and Colin Kalicki and Federico Fuchs Wightman and Justin Aruda and Dragui Salazar and Matthew F. Allan and Casper L{\textquoteright}Esperance-Kerckhoff and Alex Kashi and Fabrice Jossinet and Silvi Rouskin},
	year = {2024},
	doi = {10.1101/2024.01.24.577093},
	URL = {https://www.biorxiv.org/content/early/2024/01/25/2024.01.24.577093},
	journal = {bioRxiv}
}
