A Multi-Objective algorithm for DNA Design and Assembly

MOODA: Multi-Objective Optimization for DNA sequence Design and Assembly

Current version: 0.7.2-dev

MOODA is a multi-objective optimisation algorithm for sequence Design and Assembly.

It takes as input an annotated sequence in GenBank format and optimises it with respect to user-specified objectives.

Currently, some of the most common operations in synthetic biology are implemented:

  • The GC content operator reduces the difference between the GC content of a sequence and the target GC content. It introduces silent mutations inside CDSs to increase or decrease the GC content.

  • The Codon usage operator recodes CDSs according to the specified codon distribution. At each iteration, a specified number of codons is replaced by synonymous codons.

  • The Block join and Block split operators divide the sequence into blocks, given a minimum and maximum block size. After the optimisation, each block is adapted to the selected assembly method. Currently, only Gibson assembly is supported.
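The idea behind the GC content operator can be sketched in a few lines of Python. This is a minimal illustration, not MOODA's implementation: the `SYNONYMS` table is a small hand-written subset of the standard genetic code, and the names `gc_content`, `translate` and `nudge_gc` are hypothetical. The sketch also picks the first suitable codon deterministically, which is an assumption made for readability.

```python
# Illustrative sketch only; this is not MOODA's implementation.
# Small subset of the standard genetic code: amino acid -> synonymous codons.
SYNONYMS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "K": ["AAA", "AAG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def gc_content(seq):
    """Return the GC content of seq as a percentage."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def translate(cds):
    """Translate a CDS using only the codons listed in SYNONYMS."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def nudge_gc(cds, target_gc):
    """Apply one silent mutation that moves the GC content toward target_gc."""
    current = gc_content(cds)
    if current == target_gc:
        return cds
    direction = 1 if current < target_gc else -1
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        for alt in SYNONYMS[CODON_TO_AA[codon]]:
            # Accept the first synonymous codon that shifts GC the right way.
            if direction * (gc_content(alt) - gc_content(codon)) > 0:
                return cds[:i] + alt + cds[i + 3:]
    return cds  # no synonymous codon moves the GC content in that direction


cds = "GCTAAATTA"            # Ala-Lys-Leu
mutated = nudge_gc(cds, 50)  # target of 50, as in the sample configuration
```

Because only synonymous codons are swapped, `translate(mutated)` equals `translate(cds)` while the GC content moves toward the target.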

New operators, objective functions, or assembly methods can be integrated into the algorithm as Python subclasses.

Installation

The easiest and fastest way to install mooda is with conda:

$ conda install -c stracquadaniolab -c bioconda -c conda-forge mooda

Alternatively, you can install mooda through pip:

$ pip install mooda

Please note that pip will not install non-Python requirements.

Getting started

A typical mooda analysis consists of 3 steps:

  1. Select a DNA sequence in GenBank format.

  2. Write a MOODA configuration file: a .yaml file defining the operators, objective functions, and assembly strategies, together with their parameters. A MOODA configuration file looks like this:

    Algorithm :

        operators :
            mooda.operator.SplitBlockOperator :
                min_block_size : 200
                max_block_size : 2000
                step_size : 50

            mooda.operator.JoinBlockOperator :
                min_block_size : 200
                max_block_size : 2000
                junction_size : 40
                step_size : 50

            mooda.operator.GCOptimizationOperator :
                codon_GC_table: "e_coli_codon_usage.yaml"
                target_gc : 50
                step_size : 0.05


            mooda.operator.CodonUsageOperator :
                step_size : 0.05
                codon_usage_table : "e_coli_codon_usage.yaml"

        objective_functions :

                mooda.objective_function.GCContentObjective :
                    target_gc : 50
                    junction_size : 40

                mooda.objective_function.BlockVarianceObjective:
                    junction_size : 40

                mooda.objective_function.BlockNumberObjective:

                mooda.objective_function.CodonUsageObjective :
                    codon_usage_table : "e_coli_codon_usage.yaml"

        assemblies :
                mooda.assembly.Gibson:
                    junction_size : 40
  3. Run MOODA.

Example

Test data are provided in test/mooda_test.zip.

You can run mooda on the test data as follows:

$ mooda -ag mo -i seq_5_5.gb  -c gc_codonusage_blockvariance_blocknumber.yaml -p 10 -it 20 -a 100 -mns 200 -mxs 2000 -bss 50 -js 40 -dir mooda_results_dir -gf True
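If you prefer to script the run, the same invocation can be assembled from Python. The sketch below only builds the argument list from the flags shown in the command above (each flag is described in the list that follows); the helper names `build_mooda_command` and `run_mooda` are hypothetical, and `run_mooda` assumes the `mooda` executable is on your PATH.

```python
import shlex
import subprocess


def build_mooda_command(input_gb, config_yaml, out_dir):
    """Assemble the argument list for the example run; values mirror the command above."""
    return [
        "mooda",
        "-ag", "mo",        # multi-objective algorithm
        "-i", input_gb,     # input GenBank file
        "-c", config_yaml,  # MOODA configuration file
        "-p", "10",         # pool size
        "-it", "20",        # number of iterations
        "-a", "100",        # archive size
        "-mns", "200",      # minimum block size
        "-mxs", "2000",     # maximum block size
        "-bss", "50",       # block step size
        "-js", "40",        # junction (overlap) size
        "-dir", out_dir,    # output directory
        "-gf", "True",      # write FASTA and GenBank files
    ]


def run_mooda(cmd):
    """Launch mooda and raise CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


cmd = build_mooda_command(
    "seq_5_5.gb",
    "gc_codonusage_blockvariance_blocknumber.yaml",
    "mooda_results_dir",
)
print(shlex.join(cmd))  # inspect the command before calling run_mooda(cmd)
```

Passing the arguments as a list avoids shell quoting issues with file names.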

-ag Algorithm to run: either mo for Multi-Objective or mc for Monte Carlo. mo is suggested for long sequences; mc for short sequences and codon usage optimisation. Default: mo.

-i Input DNA sequence to process.

-c Configuration file to set MOODA operators, objective functions and their parameters.

-p Pool size. The -p parameter should increase with the sequence size. A larger pool improves solution quality, but the computing time increases as well.

-it Number of iterations. The -it parameter should increase with the sequence size. It improves solution quality more than the -p parameter, but the computing time increases as well.

-a Archive size: the number of non-dominated solutions to store at each iteration of the algorithm. Using an archive allows smaller values for the pool size.

-mns Sequence block minimum size.

-mxs Sequence block maximum size.

-bss Sequence block step size; defines the minimum variation between block lengths. Default: 50.

-js Sequence block assembly overlap size; defines the amount of overlap between sequence blocks. Default: 40.

-dir Output directory for MOODA results.

-gf If set to True, writes the FASTA and GenBank files for the MOODA solution. Default: False.
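The archive set by -a stores non-dominated solutions. As a rough illustration of what "non-dominated" means, the sketch below filters a list of objective tuples, assuming every objective is minimised (the direction is an assumption, not something stated here). The function names and the truncation rule in `update_archive` are hypothetical and are not taken from MOODA's code.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions if other is not s)
    ]


def update_archive(archive, candidates, max_size):
    """Merge candidates into the archive and cap it at max_size entries."""
    merged = non_dominated(archive + candidates)
    return merged[:max_size]  # simple truncation, chosen here only for brevity


# Hypothetical scores: (distance from target GC, number of blocks)
scores = [(0.10, 5), (0.05, 7), (0.12, 6), (0.05, 8)]
front = non_dominated(scores)
```

Here `(0.12, 6)` is dominated by `(0.10, 5)` and `(0.05, 8)` by `(0.05, 7)`, so only two solutions remain in `front`.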

Authors

Citation

Design and assembly of DNA molecules using multi-objective optimisation. Angelo Gaeta, Valentin Zulkower and Giovanni Stracquadanio. bioRxiv XX; doi: XX

Issues

Please post an issue to report a bug or request new features.
