
Predict the minimum free energy structure of nucleic acids

Project description



seqfold is an implementation of the Zuker, 1981 dynamic programming algorithm, the basis for UNAFold/mfold, with energy functions from SantaLucia, 2004 (DNA) and Turner, 2009 (RNA).


pip install seqfold



from seqfold import dg, dg_cache, fold, Struct

seq = "GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC"

# `dg` just returns the minimum free energy (kcal/mol); temperature defaults to 37 °C
mfe = dg(seq)

# `fold` returns a list of `seqfold.Struct` from the minimum free energy structure
structs = fold(seq)
print(sum(s.e for s in structs))  # -12.94; same as dg(seq)
for struct in structs:
    print(struct)  # prints the i, j, dg, and description of each structure

# `dg_cache` returns a 2D array where each (i,j) combination returns the MFE from i to j inclusive
cache = dg_cache(seq)


usage: seqfold [-h] [-t FLOAT] [-v] [-l] [--version] SEQ

Predict the minimum free energy (kcal/mol) of a nucleic acid sequence

positional arguments:
  SEQ            nucleic acid sequence to fold

optional arguments:
  -h, --help     show this help message and exit
  -t FLOAT       temperature in Celsius
  -v, --verbose  log a dot-bracket of the MFE structure
  -l, --log      log each substructure in the MFE folding
  --version      show program's version number and exit


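Example output with -l, which logs each substructure of the MFE structure. i and j are the 0-based indices of the base pair closing each substructure, and dg is its free energy in kcal/mol:

   i    j     dg  description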
   0   48   -2.2  STACK:GG/CC
   1   47   -2.2  STACK:GG/CC
   2   46   -1.4  STACK:GA/CT
   3   45   -1.4  STACK:AG/TC
   4   44   -2.2  STACK:GG/CC
   5   43   -1.6  STACK:GT/CA
   6   42   -1.4  STACK:TC/AG
   7   41   -0.5  BIFURCATION:4n/3h
   9   22   -1.1  STACK:TT/AA
  10   21   -1.0  STACK:TA/AT
  11   20   -1.6  STACK:AC/TG
  12   19    3.0  HAIRPIN:CA/GG
  25   39   -2.2  STACK:CC/GG
  26   38   -2.3  STACK:CG/GC
  27   37   -2.2  STACK:GG/CC
  28   36    3.2  HAIRPIN:GT/CT
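The rows above are easy to post-process. As a sketch (parse_log and dot_bracket are hypothetical helpers, not part of seqfold), the code below parses this layout, sums the per-row energies (the Python example notes that these sum to the MFE), and rebuilds a dot-bracket string. It assumes each row's i and j form a base pair and that the sequence has no unpaired bases after the last pair; the CLI's -v flag already logs a dot-bracket, so this only illustrates how the two views relate.

```python
from typing import List, Tuple

# The -l rows shown above, pasted verbatim (header included).
LOG = """\
   i    j     dg  description
   0   48   -2.2  STACK:GG/CC
   1   47   -2.2  STACK:GG/CC
   2   46   -1.4  STACK:GA/CT
   3   45   -1.4  STACK:AG/TC
   4   44   -2.2  STACK:GG/CC
   5   43   -1.6  STACK:GT/CA
   6   42   -1.4  STACK:TC/AG
   7   41   -0.5  BIFURCATION:4n/3h
   9   22   -1.1  STACK:TT/AA
  10   21   -1.0  STACK:TA/AT
  11   20   -1.6  STACK:AC/TG
  12   19    3.0  HAIRPIN:CA/GG
  25   39   -2.2  STACK:CC/GG
  26   38   -2.3  STACK:CG/GC
  27   37   -2.2  STACK:GG/CC
  28   36    3.2  HAIRPIN:GT/CT
"""

Row = Tuple[int, int, float, str]


def parse_log(text: str) -> List[Row]:
    """Parse whitespace-separated `i j dg description` rows, skipping the header."""
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "i":
            continue  # skip blank lines and the column header
        rows.append((int(fields[0]), int(fields[1]), float(fields[2]), fields[3]))
    return rows


def dot_bracket(rows: List[Row], length: int) -> str:
    """Assumes every row's (i, j) is a base pair; all other positions stay unpaired."""
    chars = ["."] * length
    for i, j, _, _ in rows:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


rows = parse_log(LOG)
total = sum(e for _, _, e, _ in rows)
length = max(j for _, j, _, _ in rows) + 1  # assumes no unpaired 3' tail
structure = dot_bracket(rows, length)
print(round(total, 2))  # -17.1
print(structure)
```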


  • The type of nucleic acid, DNA or RNA, is inferred from the input sequence.
  • Input sequences are case-insensitive.
  • The default temperature is 37 degrees Celsius for both the Python and CLI interfaces.
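The first two notes describe behavior rather than mechanism. One way they could be implemented is sketched below; infer_type is a hypothetical helper and its rule is an assumption, not seqfold's code: it upper-cases the input, calls a sequence RNA if it contains U and DNA otherwise, and rejects unknown characters or mixed T/U input.

```python
def infer_type(seq: str) -> str:
    """Hypothetical heuristic: "RNA" if the sequence contains U, otherwise "DNA"."""
    s = seq.upper()  # case-insensitive: "acgu" and "ACGU" are treated the same
    unknown = set(s) - set("ACGTU")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    if "T" in s and "U" in s:
        raise ValueError("sequence contains both T and U")
    return "RNA" if "U" in s else "DNA"


print(infer_type("ggagcuu"))  # RNA
print(infer_type("GGAGCTT"))  # DNA
```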


Secondary structure prediction is used for selecting primers for PCR, designing oligos for MAGE, and tuning RBS expression rates.

While UNAFold and mfold are the most widely used applications for nucleic acid secondary structure prediction, their format and license are restrictive. seqfold is meant to be an open-source, minimalist alternative for predicting minimum free energy secondary structure.

Feature       seqfold                 mfold                      UNAFold
License       MIT                     Academic, non-commercial   $200-36,000
OS            Linux, macOS, Windows   Linux, macOS               Linux, macOS, Windows
Format        Python, CLI (Python)    CLI binary                 CLI binary
Dependencies  none                    (mfold_util)               Perl, (gnuplot, glut/OpenGL)
Graphical     no                      yes (output)               yes (output)
Heterodimers  no                      yes                        yes
Constraints   no                      yes                        yes


The papers used to develop this library are listed below, each with a note on how it relates to seqfold.

Nussinov, 1980

Nussinov, Ruth, and Ann B. Jacobson. "Fast algorithm for predicting the secondary structure of single-stranded RNA." Proceedings of the National Academy of Sciences 77.11 (1980): 6309-6313.

Framework for the dynamic programming approach. Its conceptually helpful "Maximal Matching" example demonstrates the approach on a simple model in which each base is only either paired or unpaired.
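A minimal sketch of Nussinov-style maximal matching, assuming Watson-Crick plus G-U pairs and a minimum hairpin loop of 3 unpaired bases (illustrative choices, not seqfold's rules). N[i][j] holds the maximum number of nested base pairs in seq[i..j]; each cell takes the best of i unpaired, j unpaired, i paired with j, or a split into two independent halves. The traceback re-derives which case filled each cell.

```python
from typing import List, Tuple

# Allowed pairs: Watson-Crick plus G-U wobble (an illustrative choice)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def nussinov(seq: str) -> List[List[int]]:
    """N[i][j] is the maximum number of nested base pairs in seq[i..j]."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):  # span = j - i, shortest first
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent halves
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N


def traceback(N: List[List[int]], seq: str, i: int, j: int,
              pairs: List[Tuple[int, int]]) -> None:
    """Recover one optimal set of pairs by re-deriving which case filled N[i][j]."""
    if j - i <= MIN_LOOP:
        return  # too short to contain a pair
    if N[i][j] == N[i + 1][j]:
        traceback(N, seq, i + 1, j, pairs)
    elif N[i][j] == N[i][j - 1]:
        traceback(N, seq, i, j - 1, pairs)
    elif (seq[i], seq[j]) in PAIRS and N[i][j] == N[i + 1][j - 1] + 1:
        pairs.append((i, j))
        traceback(N, seq, i + 1, j - 1, pairs)
    else:
        for k in range(i + 1, j):
            if N[i][j] == N[i][k] + N[k + 1][j]:
                traceback(N, seq, i, k, pairs)
                traceback(N, seq, k + 1, j, pairs)
                break


seq = "GGGAAAUCCC"
N = nussinov(seq)
pairs: List[Tuple[int, int]] = []
traceback(N, seq, 0, len(seq) - 1, pairs)
print(N[0][len(seq) - 1])  # 3
print(sorted(pairs))
```

The split loop inside the double loop makes this O(n³) time and O(n²) memory.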

Zuker, 1981

Zuker, Michael, and Patrick Stiegler. "Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information." Nucleic Acids Research 9.1 (1981): 133-148.

A foundational and widely cited paper in this space. It extends Nussinov, 1980 with a nearest-neighbor energy model that scores stacks, bulges, internal loops, and hairpins separately. Its data structures and traceback method are both more intuitive than those of Nussinov, 1980.
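In simplified form, Zuker-style folding fills two tables over subsequences i..j: V(i, j), the lowest free energy of i..j given that i pairs with j, and W(i, j), the lowest free energy of i..j with no such constraint. The recursions below are a summary, not a transcription of the paper or of seqfold; E_H, E_S, and E_L are placeholder names for hairpin, stack, and bulge/internal loop energies, and the multibranch penalty (added in Jaeger, 1989) is simplified to a constant a.

```latex
\begin{aligned}
V(i,j) &= \min \begin{cases}
  E_H(i,j) \\
  E_S(i,j) + V(i+1,\, j-1) \\
  \min_{i < p < q < j,\ (p,q) \neq (i+1,\, j-1)} \bigl[ E_L(i,j,p,q) + V(p,q) \bigr] \\
  \min_{i+1 \le k < j-1} \bigl[ W(i+1,\, k) + W(k+1,\, j-1) \bigr] + a
\end{cases} \\[4pt]
W(i,j) &= \min \begin{cases}
  W(i+1,\, j) \\
  W(i,\, j-1) \\
  V(i,j) \\
  \min_{i \le k < j} \bigl[ W(i,k) + W(k+1,\, j) \bigr]
\end{cases}
\end{aligned}
```

For an n-nt sequence the MFE is W(0, n-1), and tracing back which case produced each minimum yields rows like the STACK and HAIRPIN entries in the CLI output above. In practice the internal loop size (p - i) + (j - q) is typically capped so that the whole recursion stays O(n³).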

Jaeger, 1989

Jaeger, John A., Douglas H. Turner, and Michael Zuker. "Improved predictions of secondary structures for RNA." Proceedings of the National Academy of Sciences 86.20 (1989): 7706-7710.

Zuker and colleagues expand on the 1981 paper to incorporate penalties for multibranched loops and dangling ends.

SantaLucia, 2004

SantaLucia Jr, John, and Donald Hicks. "The thermodynamics of DNA structural motifs." Annual Review of Biophysics and Biomolecular Structure 33 (2004): 415-440.

The source of almost every DNA energy function in seqfold (multibranch loops are the exception). It provides nearest-neighbor enthalpies and entropies for stacks, mismatched stacks, terminal stacks, and dangling stacks, and likewise for bulges, internal loops, and hairpins.
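Nearest-neighbor tables like these list enthalpy (dH, kcal/mol) and entropy (dS, cal/(mol·K)) rather than free energy, so the free energy at temperature T is dG = dH - T·dS, with T in kelvin and dS converted to kcal. A minimal sketch under those unit assumptions; delta_g is a hypothetical helper, and the example parameters are placeholders of a realistic magnitude, not values taken from SantaLucia, 2004 or from seqfold's data files.

```python
KELVIN_OFFSET = 273.15  # 0 °C in kelvin


def delta_g(dh_kcal: float, ds_cal: float, temp_c: float = 37.0) -> float:
    """Free energy in kcal/mol: dG = dH - T * dS.

    dh_kcal: enthalpy in kcal/mol; ds_cal: entropy in cal/(mol*K);
    temp_c: temperature in Celsius, converted to kelvin here.
    """
    temp_k = temp_c + KELVIN_OFFSET
    return dh_kcal - temp_k * ds_cal / 1000.0  # cal -> kcal


# Hypothetical stack parameters (placeholders, not from SantaLucia, 2004)
dh, ds = -8.0, -22.0
print(round(delta_g(dh, ds), 2))        # -1.18 at 37 °C
print(round(delta_g(dh, ds, 60.0), 2))  # -0.67 at 60 °C: less stable when warmer
```

Recomputing each parameter this way is presumably how a temperature option such as -t changes the predicted structure and energy.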

Turner, 2009

Turner, Douglas H., and David H. Mathews. "NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure." Nucleic Acids Research 38.suppl_1 (2009): D280-D282.

Source of the RNA nearest-neighbor enthalpy and entropy parameters, which are stored in /data.

Ward, 2017

Ward, M., A. Datta, M. Wise, and D. H. Mathews. "Advanced multi-loop algorithms for RNA secondary structure prediction reveal that the simplest model is best." Nucleic Acids Research 45.14 (2017): 8541-8550.

An investigation of energy functions for multibranch loops. It validates the simple linear model used by Jaeger, 1989, which keeps the runtime at O(n³).
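Under the linear model, a multibranch loop costs a fixed initiation penalty plus one term per unpaired base and one per helix. The BIFURCATION:4n/3h row in the CLI output above appears to record exactly these two counts (4 unpaired bases, 3 helices). Because the cost is a plain sum of per-element terms, a dynamic program can add each term as it extends a subproblem, which is why the model fits the cubic recursion. A minimal sketch; the helper names and coefficients are hypothetical placeholders, not the values used by Jaeger, 1989 or seqfold.

```python
from typing import NamedTuple


class MultibranchParams(NamedTuple):
    """Linear multibranch coefficients in kcal/mol (hypothetical placeholder values)."""
    a: float = 3.0  # loop initiation
    b: float = 0.1  # per unpaired base inside the loop
    c: float = 0.4  # per helix bordering the loop, closing helix included


def multibranch_energy(unpaired: int, helices: int,
                       p: MultibranchParams = MultibranchParams()) -> float:
    """Linear model: a + b * unpaired + c * helices."""
    if helices < 3:
        raise ValueError("a multibranch loop has at least 3 helices")
    return p.a + p.b * unpaired + p.c * helices


def multibranch_energy_incremental(elements: str,
                                   p: MultibranchParams = MultibranchParams()) -> float:
    """Same cost accumulated one element at a time ('u' = unpaired base, 'h' = helix).

    Additivity is what lets a DP charge b or c as it extends a subproblem,
    instead of enumerating whole loop shapes.
    """
    total = p.a
    for element in elements:
        total += p.b if element == "u" else p.c
    return total


# Loop closed by 7-41, as read from the CLI rows above:
# helix, unpaired (8), helix (9-22), unpaired (23, 24), helix (25-39), unpaired (40)
print(round(multibranch_energy(4, 3), 2))                   # 4.6
print(round(multibranch_energy_incremental("huhuuhu"), 2))  # 4.6
```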


Download files


Files for seqfold, version 0.7.4

Filename                          Size     Type    Python version
seqfold-0.7.4-py3-none-any.whl    29.5 kB  Wheel   py3
seqfold-0.7.4.tar.gz              37.7 kB  Source  -
