seqfold

Predict the minimum free energy structure of nucleic acids.

seqfold is an implementation of the Zuker, 1981 dynamic programming algorithm, the basis for UNAFold/mfold, plus energy functions from SantaLucia, 2004.

Installation

pip install seqfold

Usage

Python

from seqfold import calc_dg

# a bifurcated DNA structure
calc_dg("GGGAGGTCGTTACATCTGGGTAACACCGGTACTGATCCGGTGACCTCCC")  # -12.94

CLI

$ seqfold TAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGT -t 32
-6.58

Motivation

Knowing the secondary structure of a nucleic acid sequence is essential in synthetic biology for selecting PCR primers, designing oligos for MAGE, and tuning RBS expression rates.

While UNAFold and mfold are the most widely used applications for nucleic acid secondary structure prediction, their format and license are restrictive. seqfold is meant to be an open-source, minimal alternative for predicting minimum free energy secondary structures.

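|              | seqfold               | mfold                   | UNAFold                      |
| ------------ | --------------------- | ----------------------- | ---------------------------- |
| License      | MIT                   | Academic Non-commercial | $200-36,000                  |
| OS           | Linux, MacOS, Windows | Linux, MacOS            | Linux, MacOS, Windows        |
| Format       | python, CLI python    | CLI binary              | CLI binary                   |
| Dependencies | none                  | (mfold_util)            | Perl, (gnuplot, glut/OpenGL) |
| Graphical    | no                    | yes (output)            | yes (output)                 |
| Heterodimers | no                    | yes                     | yes                          |
| Constraints  | no                    | yes                     | yes                          |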

Citations

The papers used to develop this library are listed below, each with a note on how it relates to seqfold.

Nussinov, 1980

Nussinov, Ruth, and Ann B. Jacobson. "Fast algorithm for predicting the secondary structure of single-stranded RNA." Proceedings of the National Academy of Sciences 77.11 (1980): 6309-6313.

Framework for the dynamic programming approach. It has a conceptually helpful "Maximal Matching" example that demonstrates the approach on a simple sequence with only matched or unmatched bp.
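The "Maximal Matching" idea can be sketched in a few lines of Python. This is a minimal illustration of the dynamic programming recurrence, not code taken from seqfold: the function name `max_matching`, the `min_loop` parameter, and the restriction to Watson-Crick pairs are all assumptions made for the example.

```python
# Hypothetical helper, not part of seqfold: counts the maximum number of
# non-crossing base pairs, ignoring energies entirely.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_matching(seq: str, min_loop: int = 0) -> int:
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases required between
    the two bases of a pair (0 reproduces the simplest matching problem).
    """
    n = len(seq)
    if n == 0:
        return 0

    # best[i][j] holds the best pair count for the subsequence seq[i..j]
    best = [[0] * n for _ in range(n)]

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, splitting the problem
            # into the region left of k and the region enclosed by (k, j)
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score

    return best[0][n - 1]


print(max_matching("GGGAAACCC"))
```

The triple loop is where the O(n³) runtime of this family of algorithms comes from. Energy-based methods keep the same table-filling shape but replace the pair count with a free energy that is minimized.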

Zuker, 1981

Zuker, Michael, and Patrick Stiegler. "Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information." Nucleic acids research 9.1 (1981): 133-148.

The most cited paper in this space. It extends Nussinov, 1980 with a nearest-neighbor approach to energies and separate treatment of stacks, bulges, internal loops, and hairpins. Its data structure and traceback method are both more intuitive than those of Nussinov, 1980.

Jaeger, 1989

Jaeger, John A., Douglas H. Turner, and Michael Zuker. "Improved predictions of secondary structures for RNA." Proceedings of the National Academy of Sciences 86.20 (1989): 7706-7710.

Zuker and colleagues expand on the 1981 paper to incorporate penalties for multibranched loops and dangling ends.

SantaLucia, 2004

SantaLucia Jr, John, and Donald Hicks. "The thermodynamics of DNA structural motifs." Annu. Rev. Biophys. Biomol. Struct. 33 (2004): 415-440.

The source of almost every DNA energy function in seqfold (with the exception of multibranch loops). Provides nearest-neighbor entropies and enthalpies for stacks, mismatching stacks, terminal stacks, and dangling stacks. Ditto for bulges, internal loops, and hairpins.
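Because the parameters are given as enthalpy and entropy changes, a free energy at any temperature follows from ΔG = ΔH − TΔS. The sketch below shows that arithmetic only; `delta_g` is a hypothetical helper rather than seqfold's API, and the AA/TT values are quoted for illustration and should be checked against the paper before use.

```python
def delta_g(dh_kcal: float, ds_cal: float, temp_c: float = 37.0) -> float:
    """Free energy change in kcal/mol from enthalpy and entropy changes.

    dh_kcal: enthalpy change in kcal/mol
    ds_cal:  entropy change in cal/(mol*K), hence the division by 1000
    temp_c:  temperature in degrees Celsius
    """
    temp_k = temp_c + 273.15
    return dh_kcal - temp_k * ds_cal / 1000.0


# Illustrative values for an AA/TT stack (assumed; verify against the paper)
AA_TT_DH = -7.6   # kcal/mol
AA_TT_DS = -21.3  # cal/(mol*K)

print(round(delta_g(AA_TT_DH, AA_TT_DS, 37.0), 2))  # about -0.99
print(round(delta_g(AA_TT_DH, AA_TT_DS, 32.0), 2))  # about -1.10
```

Lowering the temperature makes the stack more stable in this model, which is why a temperature setting such as the CLI's `-t` option changes the reported energy.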

Turner, 2009

Turner, Douglas H., and David H. Mathews. "NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure." Nucleic acids research 38.suppl_1 (2009): D280-D282.

Source of the RNA nearest-neighbor enthalpy and entropy change parameters, stored in /data.

Ward, 2017

Ward, M., Datta, A., Wise, M., & Mathews, D. H. (2017). Advanced multi-loop algorithms for RNA secondary structure prediction reveal that the simplest model is best. Nucleic acids research, 45(14), 8541-8550.

An investigation of energy functions for multibranch loops that validates the simple linear approach employed by Jaeger, 1989 that keeps runtime at O(n³).
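A linear multibranch penalty of the kind described here is commonly written as a constant plus per-unpaired-base and per-branch terms. The sketch below illustrates that form only; the function name and the coefficients are placeholders invented for the example, not values from Jaeger, 1989, Ward, 2017, or seqfold.

```python
def multibranch_penalty(
    unpaired: int,
    branches: int,
    a: float,
    b: float,
    c: float,
) -> float:
    """Linear multibranch loop penalty: a + b * unpaired + c * branches.

    a: fixed cost of opening a multibranch loop
    b: cost per unpaired base inside the loop
    c: cost per helix branching off the loop
    """
    return a + b * unpaired + c * branches


# Placeholder coefficients chosen only to make the arithmetic easy to follow
print(multibranch_penalty(unpaired=4, branches=3, a=3.0, b=0.5, c=0.25))  # 5.75
```

Because each term depends only on a count, the penalty can be accumulated while the dynamic programming table is filled, which is what keeps the overall runtime at O(n³).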
