A Multi-Objective algorithm for DNA Design and Assembly

MOODA: Multi-Objective Optimization for DNA design and assembly

Current version: 0.11.0

MOODA is a multi-objective optimisation algorithm for DNA sequence design and assembly.

It takes as input an annotated sequence in GenBank format and optimizes it with respect to user-defined objectives.

Currently, some of the most common operations in synthetic biology are built in, including:

  • The GCOptimizationOperator introduces silent mutations in coding regions to obtain DNA constructs with a user-defined GC content.

  • The CodonUsageOperator recodes coding regions by probabilistically selecting the most frequent codon for each amino acid in a host organism.

  • The BlockJoin and BlockSplit operators allow the division of a sequence into fragments (or blocks). After the optimisation, each block is adapted to the assembly method selected by the user. Currently, only Gibson assembly is supported.
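
To make the idea of a silent mutation concrete, the following is a minimal Python sketch of GC-driven synonymous recoding. It is not MOODA's implementation: the names `SYNONYMS`, `gc_content`, `recode_towards_gc` and `translate` are hypothetical, the codon table covers only four amino acids, and the greedy strategy is an assumption chosen for brevity.

```python
# Synonymous codons for four amino acids (subset of the standard genetic code).
SYNONYMS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "K": ["AAA", "AAG"],
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def translate(cds: str) -> str:
    """Translate a coding sequence using the small table above."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode_towards_gc(cds: str, target: float) -> str:
    """Greedily swap synonymous codons so the GC fraction approaches target."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for i, codon in enumerate(codons):
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            continue  # codon not in the small table: leave it untouched

        def distance(candidate: str) -> float:
            # GC distance to the target if this codon were used at position i.
            trial = codons[:i] + [candidate] + codons[i + 1:]
            return abs(gc_content("".join(trial)) - target)

        codons[i] = min(SYNONYMS[aa], key=distance)
    return "".join(codons)


original = "GCTAAATTAGGT"  # Ala-Lys-Leu-Gly
recoded = recode_towards_gc(original, target=0.75)
print(gc_content(original), gc_content(recoded))
print(translate(original) == translate(recoded))  # protein is unchanged
```

The recoded sequence encodes the same protein while its GC fraction moves towards the target, which is the property a silent mutation must preserve.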

New operators, objective functions or assembly methods can be added by extending the Operator, ObjectiveFunction and Assembly classes.

Installation

The easiest and fastest way to use mooda is with Docker:

    docker pull ghcr.io/stracquadaniolab/mooda

You can also install mooda using conda:

    $ conda install -c stracquadaniolab -c bioconda -c conda-forge mooda

or using pip:

    $ pip install mooda-dna

Please note that pip will not install non-Python requirements.

Getting started

A typical mooda analysis consists of 3 steps:

  1. Select a DNA sequence in GenBank format.

  2. Write a MOODA configuration file: a .yaml file defining the operators, objective functions, assembly strategy and their parameters.

  3. Run MOODA.

Example: optimizing GC content, E. coli codon usage, number of fragments and the variance of their length

Create a test directory as follows:

    $ mkdir example-run

Move to your test directory as follows:

    $ cd example-run

Download test data from GitHub as follows:

    $ curl -LO https://github.com/stracquadaniolab/mooda/raw/master/examples/mooda-example1.tar.gz

Extract test data as follows:

    $ tar xvzf mooda-example1.tar.gz

Run mooda as follows:

    $ docker run -it --rm -v $PWD:$PWD -w $PWD ghcr.io/stracquadaniolab/mooda -i seq_5_5.gb  -c config.yaml -p 10 -it 20 -a 100 -mns 200 -mxs 2000 -bss 50 -js 40 -dir example-opt -gf True

Results will be available in the example-opt directory, where you will find:

  • GenBank files of the Pareto-optimal sequences.
  • FASTA files with the fragments for Gibson assembly for each Pareto optimal sequence.
  • _logfile.yaml file with information about the analysis.
  • _mooda_result.csv file with objective function value information for each sequence.
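
For readers unfamiliar with the term, a sequence is Pareto optimal when no other candidate is at least as good in every objective and strictly better in one. The sketch below illustrates that filter in Python. It assumes every objective is minimised, uses made-up objective values, and does not read _mooda_result.csv, whose column layout is not described here; `dominates` and `pareto_front` are hypothetical names, not part of MOODA.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative pairs only: (distance from target GC, number of fragments).
candidates = [(0.02, 5), (0.05, 3), (0.04, 6), (0.01, 8), (0.05, 4)]
front = pareto_front(candidates)
print(front)
```

Here (0.04, 6) is dropped because (0.02, 5) beats it on both objectives, and (0.05, 4) is dropped because (0.05, 3) ties on the first objective and wins on the second.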

Command line options

  • -i: Input DNA sequence to process.

  • -c: Configuration file to set operators, objective functions and their parameters.

  • -p: Pool size. Number of candidate solutions sampled at each iteration. The pool size should increase with the length and complexity of the input sequence.

  • -it: Number of iterations. The number of iterations should increase with the length and complexity of the input sequence, although it will take longer to run.

  • -a: Archive size. The number of non-dominated solutions to store at each iteration, which allows smaller pools to be used for improved efficiency.

  • -mns: Block minimum size.

  • -mxs: Block maximum size.

  • -bss: Sequence block step size; defines the minimum variation between block sizes. Default: 50.

  • -js: Sequence block assembly overlap size; defines the amount of overlap between blocks. Default: 40.

  • -dir: Output directory for MOODA results.

  • -gf: Allow the writing of FASTA and GenBank files. Default: False.
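
As an illustration of how a block size and an overlap such as the one set by -js interact, here is a minimal Python sketch that cuts a sequence into overlapping fragments. It is an assumption-laden simplification rather than MOODA's algorithm: it uses a single fixed block size, whereas the options above allow block sizes between -mns and -mxs, and `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(seq: str, block_size: int, overlap: int) -> List[str]:
    """Cut seq into blocks so that consecutive blocks share `overlap` bases."""
    if not 0 <= overlap < block_size:
        raise ValueError("overlap must be non-negative and smaller than block_size")
    step = block_size - overlap  # distance between consecutive block starts
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + block_size])
        if start + block_size >= len(seq):
            break  # this block reaches the end of the sequence
        start += step
    return fragments


sequence = "ACGT" * 25  # 100 bases of toy sequence
fragments = split_with_overlap(sequence, block_size=40, overlap=10)

# Joining the blocks while skipping each overlap recovers the input.
rebuilt = fragments[0] + "".join(f[10:] for f in fragments[1:])
print(len(fragments), rebuilt == sequence)
```

The shared ends are what an overlap-based method such as Gibson assembly relies on to join neighbouring fragments, so a larger overlap means more bases are duplicated across blocks.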

Authors

Citation

Design and assembly of DNA molecules using multi-objective optimization. A Gaeta, V Zulkower, G Stracquadanio - Synthetic Biology, 2021

@article{10.1093/synbio/ysab026,
    author = {Gaeta, Angelo and Zulkower, Valentin and Stracquadanio, Giovanni},
    title = "{Design and assembly of DNA molecules using multi-objective optimization}",
    journal = {Synthetic Biology},
    volume = {6},
    number = {1},
    year = {2021},
    month = {10},
    issn = {2397-7000},
    doi = {10.1093/synbio/ysab026},
    url = {https://doi.org/10.1093/synbio/ysab026},
    note = {ysab026},
    eprint = {https://academic.oup.com/synbio/article-pdf/6/1/ysab026/40977182/ysab026.pdf},
}

Issues

Please post an issue to report a bug or request new features.
