Skip to main content

A a tool to predict RNA secondary structure and minimum free energy

Project description

This codebase replaces the now deprecated version: https://github.com/abentu0101/LinearFold. This version fixes many bugs and design problems in the old version.

LinearFold: Linear-Time Prediction for RNA Secondary Structures

This repository contains the C++ source code for the LinearFold project, the first linear-time prediction algorithm/software for RNA secondary structures.

LinearFold: Linear-Time Approximate RNA Folding by 5’-to-3’ Dynamic Programming and Beam Search. Bioinformatics, Volume 35, Issue 14, July 2019, Pages i295–i304. ISMB 2019

Liang Huang*, He Zhang**, Dezhong Deng**, Kai Zhao, Kaibo Liu, David Hendrix, David Mathews

* corresponding author

** contributed equally

Web server: http://linearfold.org

Dependencies

GCC 4.8.5 or above; python2.7

To Compile

make

To Run

The LinearFold parser can be run with:

echo SEQUENCE | ./linearfold [OPTIONS]

OR

cat SEQ_OR_FASTA_FILE | ./linearfold [OPTIONS]

Both FASTA format and pure-sequence format are supported for input.

Run as a python module available through pypi/pip

To install:

pip install LinearFold

To import and use in other python code:

import LinearFold as lf
print(lf.fold('UGAGUUCUCGAUCUCUAAAAUCG'))

And the output is a tuple containing the predicted structure and mfe:

['.(((........)))........', -1.8]

OPTIONS:

-b BEAM_SIZE

The beam size (default 100). Use 0 for infinite beam.

-V

Switches LinearFold-C (by default) to LinearFold-V.

--verbose

Prints out energy of each loop in the structure. (default False)

--sharpturn

Enable sharpturn in prediction. (default False)

--eval

Enable eval mode, which can calculate free energy for a given structure of a sequence. (default False)

--constraints

Enable adding specific constraints in prediction (default False). The constraint sequence should have the same length as the RNA sequence. "? . ( )" indicates a position for which the proper matching is unknown, unpaired, left or right parenthesis respectively. The parentheses must be well-banlanced and non-crossing.

--zuker

output Zuker suboptimal structures, (DEFAULT=FALSE)

--delta

compute Zuker suboptimal structures with scores or energies(-V, kcal/mol) in a centain range of the optimum, (DEFAULT=5.0)

--shape <filename>

use SHAPE reactivity data to guide structure predictions
Please refer to this link for the SHAPE data format: https://rna.urmc.rochester.edu/Text/File_Formats.html#SHAPE

To Visualize

LinearFold is able to visualize the structure using a circular plot.

To draw a circular plot, run command:

cat TARGET_FILE | ./draw_circular_plot 

TARGET_FILE contains one sequence and its structure; see "ecoli_tRNA" file as an example.

Example Run Predict

cat testseq | ./linearfold
UGAGUUCUCGAUCUCUAAAAUCG
....................... (-0.22)
AAAACGGUCCUUAUCAGGACCAAACA
.....((((((....))))))..... (4.91)
AUUCUUGCUUCAACAGUGUUUGAACGGAAU
.............................. (-0.29)
UCGGCCACAAACACACAAUCUACUGUUGGUCGA
(((((((...................))))))) (0.99)
GUUUUUAUCUUACACACGCUUGUGUAAGAUAGUUA
.....(((((((((((....))))))))))).... (6.66)

echo GGGCUCGUAGAUCAGCGGUAGAUCGCUUCCUUCGCAAGGAAGCCCUGGGUUCAAAUCCCAGCGAGUCCACCA | ./linearfold -V -b 20
GGGCUCGUAGAUCAGCGGUAGAUCGCUUCCUUCGCAAGGAAGCCCUGGGUUCAAAUCCCAGCGAGUCCACCA
(((((((..((((.......))))((((((((...)))))))).(((((.......)))))))))))).... (-31.50)

Example Run Predict with constraints

cat testcons | ./linearfold --constraints
AACUCCGCCAGGCCUGGAAGGGAGCAACGGUAGUGACACUCUCUGUGUGCGUAGGUUGCCUAGCUACCAUUU
??(???(??????)?(????????)???(??????(???????)?)???????????)??.???????????
..(.(((......)((........))(((......(.......).))).....))..).............. (-27.33)
GCCUGGUGACCAUAGCGAGUCGGUACCACCCCUUCCCAUCCCGAACAGGACCGUGAAACGACUCCGCGCCGAUGAUAGUGCGGAUUCCCGUGUGAAAGUAGGUCAUCGCCAGGC
??(??(???(??????)???????????????(????)???(???????????(??(????.)??????????(??????)?)??????)????????????)????????)??
(((((((..(......).........(((...(....)...((....(((...(..(.....)..........(......).)..))))).))).......))))......))) (-44.00)

echo -e "GAACCCCGUCAGGUCCGGAAGGAAGCAGCGGUAAGU\n??????????????????(????????????????)" | ./linearfold --constraints
GAACCCCGUCAGGUCCGGAAGGAAGCAGCGGUAAGU
??????????????????(????????????????)
..................(................) (-8.85)

echo -e "GAACCCCGUCAGGUCCGGAAGGAAGCAGCGGUAAGU\n??????????????????(????????????????)" | ./linearfold --constraints -V
GAACCCCGUCAGGUCCGGAAGGAAGCAGCGGUAAGU
??????????????????(????????????????)
.....(((.......)))(.....((....))...) (3.70)

Example Run Predict and output suboptimal structures

echo GCCUGGUGACCAUAGCGAGUCGGUACCACCCCUUCCCAUCCCGAACAGGACCGUGAAACGACUCCGCGCCGAUGAUAGUGCGGAUUCCCGUGUGAAAGUAGGUCAUCGCCAGGC | ./linearfold -V --zuker --delta 2.0
(((((((((.....((((((((....(((.(((((.......))..)))...)))...)))))).))(((.........((((....)))).........)))..))))))))) (-35.50)
Zuker suboptimal structures...
(((((((((.....((((((((....(((.(((((.......))..)))...)))...)))))).))(((.........((((....)))).........)))..))))))))) (-35.50)
(((((((((.....((((((((....(((.(((((.......))..)))...)))...)))))).))(((.((......((((....))))......)).)))..))))))))) (-35.40)
(((((((((.....((((((((....(((....(((...........)))..)))...)))))).))(((.........((((....)))).........)))..))))))))) (-34.90)
(((((((((.....((((((((....(((.(((((.......))..)))...)))...)))))).))(((.((..(..(((((....)))))..)..)).)))..))))))))) (-34.70)
(((((((((((.....((((((....(((.(((((.......))..)))...)))...))))))(((((........)))))..................))))...))))))) (-34.50)
(((((((((((.....((((((....(((.(((((.......))..)))...)))...))))))(((((........)))))..(((......)))....))))...))))))) (-34.40)
(((((((((.......((((((....(((.(((((.......))..)))...)))...))))))(((((........)))))((((..(........)..)))).))))))))) (-34.20)
(((((((((((...((((((((....(((.(((((.......))..)))...)))...)))))((((((........)))))).....))).........))))...))))))) (-34.00)
(((((((((((...((((((((....(((.(((((.......))..)))...)))...))))))(((((........)))))...............)).))))...))))))) (-33.90)
(((((((((.....((((((((.(((..((..(((.......)))..))...)))...)))))).))(((.........((((....)))).........)))..))))))))) (-33.70)
(((((((((.....((((((((....(((.........(((......)))..)))...)))))).))(((.........((((....)))).........)))..))))))))) (-33.70)
(((((((((.......((((((....(((.(((((.......))..)))...)))...))))))(((((........))))).....((...........))...))))))))) (-33.60)

Example Run SHAPE-guided structure prediction

echo GCCUGGUGACCAUAGCGAGUCGGUACCACCCCUUCCCAUCCCGAACAGGACCGUGAAACGACUCCGCGCCGAUGAUAGUGCGGAUUCCCGUGUGAAAGUAGGUCAUCGCCAGGC | ./linearfold -V --shape example.shape
GCCUGGUGACCAUAGCGAGUCGGUACCACCCCUUCCCAUCCCGAACAGGACCGUGAAACGACUCCGCGCCGAUGAUAGUGCGGAUUCCCGUGUGAAAGUAGGUCAUCGCCAGGC
((((((........((((((((.(((............(((......)))..)))...))))).)))..(((((((..(((...(((......))).))).))))))))))))) (-66.30)

Example Run Eval

cat testeval | ./linearfold --eval
UGAGUUCUCGAUCUCUAAAAUCG
.(((........)))........ (-1.80)
AAAACGGUCCUUAUCAGGACCAAACA
.....((((((....))))))..... (-9.30)
AUUCUUGCUUCAACAGUGUUUGAACGGAAU
(((((...(((((......))))).))))) (-6.80)
UCGGCCACAAACACACAAUCUACUGUUGGUCGA
(((((((((..............))).)))))) (-7.80)
GUUUUUAUCUUACACACGCUUGUGUAAGAUAGUUA
....((((((((((((....))))))))))))... (-13.00)

echo -e "GGGCUCGUAGAUCAGCGGUAGAUCGCUUCCUUCGCAAGGAAGCCCUGGGUUCAAAUCCCAGCGAGUCCACCA\n(((((((..((((.......))))((((((((...)))))))).(((((.......))))))))))))....\n" | ./linearfold --eval --verbose
Hairpin loop ( 13, 21) CG : 4.50
Interior loop ( 12, 22) UA; ( 13, 21) CG : -2.40
Interior loop ( 11, 23) AU; ( 12, 22) UA : -1.10
Interior loop ( 10, 24) GC; ( 11, 23) AU : -2.40
Hairpin loop ( 32, 36) UA : 5.90
Interior loop ( 31, 37) UA; ( 32, 36) UA : -0.90
Interior loop ( 30, 38) CG; ( 31, 37) UA : -2.10
Interior loop ( 29, 39) CG; ( 30, 38) CG : -3.30
Interior loop ( 28, 40) UA; ( 29, 39) CG : -2.40
Interior loop ( 27, 41) UA; ( 28, 40) UA : -0.90
Interior loop ( 26, 42) CG; ( 27, 41) UA : -2.10
Interior loop ( 25, 43) GC; ( 26, 42) CG : -3.40
Hairpin loop ( 49, 57) GC : 4.40
Interior loop ( 48, 58) GC; ( 49, 57) GC : -3.30
Interior loop ( 47, 59) GC; ( 48, 58) GC : -3.30
Interior loop ( 46, 60) UA; ( 47, 59) GC : -2.10
Interior loop ( 45, 61) CG; ( 46, 60) UA : -2.10
Multi loop ( 7, 62) GC : 1.40
Interior loop ( 6, 63) CG; ( 7, 62) GC : -2.40
Interior loop ( 5, 64) UA; ( 6, 63) CG : -2.40
Interior loop ( 4, 65) CG; ( 5, 64) UA : -2.10
Interior loop ( 3, 66) GU; ( 4, 65) CG : -2.50
Interior loop ( 2, 67) GC; ( 3, 66) GU : -1.50
Interior loop ( 1, 68) GC; ( 2, 67) GC : -3.30
External loop : -1.70
GGGCUCGUAGAUCAGCGGUAGAUCGCUUCCUUCGCAAGGAAGCCCUGGGUUCAAAUCCCAGCGAGUCCACCA
(((((((..((((.......))))((((((((...)))))))).(((((.......)))))))))))).... (-31.50)

Example: Draw Circular Plot

cat ecoli_tRNA | ./draw_circular_plot

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

LinearFold-1.1.tar.gz (54.5 kB view details)

Uploaded Source

Built Distribution

LinearFold-1.1-cp39-cp39-macosx_10_9_universal2.whl (187.9 kB view details)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file LinearFold-1.1.tar.gz.

File metadata

  • Download URL: LinearFold-1.1.tar.gz
  • Upload date:
  • Size: 54.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for LinearFold-1.1.tar.gz
Algorithm Hash digest
SHA256 023b5b03a7a7d40c89dac809086ad84b53f16007bbe653d1c7dc80c93c8bc52c
MD5 f0428482177447afd9e58d73b4eff3ba
BLAKE2b-256 60c1ddd3b6d86f5253f49cfc34af1afdf1c818b166b6d7895c9d54cb023771be

See more details on using hashes here.

File details

Details for the file LinearFold-1.1-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for LinearFold-1.1-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 03c7d09e9525d4e8da1d42c82c6ac9d0b9e6b9e44b07d1951ccd8401ae266b5c
MD5 2f46ca3ca9987aac26bc0f5ec07a6924
BLAKE2b-256 079001f88547ba2851e4a1b9c921e06ba42455b6707c70e0b6fcbef57970b02b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page