Optimizing Codon Usage with a Quasispecies Model
We provide a library that lets you select a set of reference genes whose codon usage serves as the optimization target. You can also supply a variable number of fitness factors, such as the translation speed of codons or tRNA abundance. Given these fitness factors, the library reports the strength of each factor that yields the best agreement between simulated and reference codon usage. In a next step, these strengths can be tuned to generate a codon usage, which can then be used to adapt a gene sequence with classic codon optimization tools such as OPTIMIZER.
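To make the idea of "simulated codon usage" concrete, here is a minimal sketch of a quasispecies equilibrium over the 64 codons. It is not the library's implementation. The names (`codon_mutation_matrix`, `quasispecies_equilibrium`) are hypothetical. The mutation model is an assumption: each site independently undergoes a transition with probability alpha and each of the two transversions with probability beta (Kimura-style). The per-codon fitness vector is assumed to be given and positive. The stationary codon distribution is the normalized leading eigenvector of W = Q diag(f), found here by power iteration.

```python
import itertools

import numpy as np

BASES = "TCAG"
PURINES = {"A", "G"}
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]


def site_mutation_matrix(alpha, beta):
    """4x4 per-site mutation probabilities (assumed Kimura-style parametrization):
    a transition with probability alpha, each of the two transversions with probability beta."""
    m = np.zeros((4, 4))
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if a == b:
                m[i, j] = 1.0 - alpha - 2.0 * beta
            elif (a in PURINES) == (b in PURINES):
                m[i, j] = alpha  # transition: A<->G or C<->T
            else:
                m[i, j] = beta  # transversion: purine <-> pyrimidine
    return m


def codon_mutation_matrix(alpha, beta):
    """64x64 matrix Q; Q[i, j] = P(codon j mutates into codon i), sites mutate independently."""
    m = site_mutation_matrix(alpha, beta)
    return np.kron(np.kron(m, m), m)  # row/column order matches CODONS


def quasispecies_equilibrium(fitness, alpha, beta, tol=1e-12, max_iter=10_000):
    """Stationary codon distribution of x' ∝ Q diag(f) x, i.e. the normalized leading
    eigenvector of W = Q diag(f), computed by power iteration."""
    w = codon_mutation_matrix(alpha, beta) * np.asarray(fitness, dtype=float)[None, :]
    x = np.full(len(CODONS), 1.0 / len(CODONS))
    for _ in range(max_iter):
        x_new = w @ x
        x_new /= x_new.sum()
        if np.abs(x_new - x).max() < tol:
            break
        x = x_new
    return x_new


fitness = np.ones(64)
fitness[CODONS.index("GCC")] = 2.0  # hypothetical: one favoured alanine codon
x = quasispecies_equilibrium(fitness, alpha=0.01, beta=0.005)
print(CODONS[int(np.argmax(x))])  # GCC
```

With uniform fitness the equilibrium is uniform, because Q is symmetric and every row sums to 1. Raising the fitness of a codon shifts probability mass towards it and, through mutation, towards its neighbours.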
In an example workflow, you first select a FASTA file containing the genes you want to use. The file can be loaded either from a local path or from a URL. In both cases, a histogram of codon usage and amino acid usage is generated.
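The counts behind such histograms can be sketched as follows. This is an illustration under stated assumptions, not the library's loader. `read_fasta` and `usage_counts` are hypothetical names. The sequences are assumed to be in-frame coding sequences, and the standard genetic code is used for translation.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def read_fasta(text):
    """Parse FASTA text into a list of (header, upper-case sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def usage_counts(sequences):
    """Count in-frame codons and the amino acids they encode; these counts feed the histograms."""
    codons, amino_acids = Counter(), Counter()
    for seq in sequences:
        seq = seq.replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in GENETIC_CODE:  # skip codons with N or other ambiguity codes
                codons[codon] += 1
                amino_acids[GENETIC_CODE[codon]] += 1
    return codons, amino_acids


records = read_fasta(">gene1\nATGGCCGCC\nTAA\n>gene2\nATGGCA\n")
codons, amino_acids = usage_counts(seq for _, seq in records)
print(codons["GCC"], amino_acids["A"])  # 2 3
```

For a URL, the text could be fetched with `urllib.request.urlopen(url).read().decode()` before parsing. Restricting the analysis to the first n genes then amounts to slicing `records[:n]`.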
You can then optionally load a list of highly expressed genes (HEGs); the format of the HEG database is supported. To visualize the codon usage bias (CUB), e.g. to check whether it looks as you expect, you can plot the results of various dimensionality reduction methods.
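As one example of such a dimensionality reduction, here is a minimal PCA sketch on per-gene codon frequencies, using NumPy's SVD. The function name `pca_2d` and the random count matrix are hypothetical, and PCA is only one of several methods the library may offer.

```python
import numpy as np


def pca_2d(codon_counts):
    """Project genes (rows of a genes x codons count matrix) onto the first two
    principal components of their codon frequencies."""
    x = np.asarray(codon_counts, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)  # codon frequencies per gene
    x = x - x.mean(axis=0)  # center every codon column
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T  # (n_genes, 2) coordinates for a scatter plot


rng = np.random.default_rng(0)
counts = rng.integers(1, 50, size=(10, 64))  # hypothetical: 10 genes x 64 codons
coords = pca_2d(counts)
print(coords.shape)  # (10, 2)
```

Highly expressed genes could then be highlighted in a scatter plot, e.g. `plt.scatter(coords[:, 0], coords[:, 1], c=is_heg)` with matplotlib, where `is_heg` is a hypothetical boolean mask.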
If you do not want to use all genes, you can enter a number n; only the first n genes will then be analysed.
Next, select a fitness matrix, which gives the probability of one amino acid being represented by another.
Additionally, you can select a number of fitness functions, each of which assigns a fitness to every codon. Note that these functions are normalized. To perform a test run, you have to enter the parameters alpha, beta, selection and t_i for every test function. alpha and beta are parameters of the <todo> model of codon substitution and are related to the transition/transversion bias. Values can be separated by commas, whitespace/tabs, or a combination of those.
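The input handling and normalization can be sketched as below. Several details are assumptions: the parameter order (alpha, beta, selection, then one t_i per function), min-max scaling as the normalization, and combining the functions as exp(selection * sum_i t_i * g_i). The document does not specify any of them, and all function names are hypothetical.

```python
import re

import numpy as np


def parse_parameters(text):
    """Split a parameter line on commas and/or any whitespace, e.g. '0.1, 0.05\t2 0.7'."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]


def split_parameters(params, n_functions):
    """Interpret the values as alpha, beta, selection, t_1..t_n (hypothetical ordering)."""
    if len(params) != 3 + n_functions:
        raise ValueError(f"expected {3 + n_functions} values, got {len(params)}")
    alpha, beta, selection, *weights = params
    return alpha, beta, selection, weights


def normalize(values):
    """Min-max scale one fitness function to [0, 1] (assumed normalization)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span


def combined_fitness(functions, weights, selection):
    """Hypothetical combination: exp(selection * sum_i t_i * normalized f_i) per codon."""
    total = sum(t * normalize(f) for t, f in zip(weights, functions))
    return np.exp(selection * total)


alpha, beta, selection, weights = split_parameters(parse_parameters("0.1, 0.05\t2 0.7,0.3"), 2)
print(weights)  # [0.7, 0.3]
```

The exponential keeps every codon fitness positive, which a quasispecies iteration needs; a plain weighted sum would be an equally plausible alternative.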
You can compare absolute codon usage and relative codon usage (normalized within each amino acid) in a comparison plot. To check the distance optimization, you can first optimize only the first gene and inspect the comparison plot again to see whether the algorithm works at all.
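The two kinds of codon usage being compared can be computed as follows. In this sketch the input is assumed to be a 64-vector of codon counts in TCAG order. The Euclidean distance in `usage_distance` is an assumed choice of metric, not necessarily the one the library minimizes, and all names are hypothetical.

```python
from itertools import product

import numpy as np

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, ...); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
SYNONYMOUS = {}  # amino acid -> indices of its codons in CODONS
for k, aa in enumerate(AMINO_ACIDS):
    SYNONYMOUS.setdefault(aa, []).append(k)


def absolute_usage(counts):
    """Codon frequencies over all 64 codons (sums to 1)."""
    c = np.asarray(counts, dtype=float)
    return c / c.sum()


def relative_usage(counts):
    """Codon frequencies normalized within each amino acid (each synonymous family sums to 1)."""
    c = np.asarray(counts, dtype=float)
    rel = np.zeros_like(c)
    for idx in SYNONYMOUS.values():
        total = c[idx].sum()
        if total > 0:
            rel[idx] = c[idx] / total
    return rel


def usage_distance(simulated, reference):
    """Euclidean distance between relative usages (an assumed choice of metric)."""
    return float(np.linalg.norm(relative_usage(simulated) - relative_usage(reference)))
```

Relative usage removes the effect of amino acid composition, so two genes with different proteins but the same synonymous preferences end up close together.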
In a final step, you can optimize over all genes you have read in. The result consists of the optimal parameters, a goodness of fit, and the RSCU (relative synonymous codon usage), which you can use for codon optimization with a tool such as OPTIMIZER.
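For reference, RSCU is the observed count of a codon divided by the count expected if all synonymous codons of its amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij), where n_i is the number of codons for amino acid i. A small sketch, assuming a 64-vector of counts in TCAG order (`rscu` is a hypothetical name):

```python
from itertools import product

import numpy as np

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, ...); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
SYNONYMOUS = {}  # amino acid -> indices of its codons in CODONS
for k, aa in enumerate(AMINO_ACIDS):
    SYNONYMOUS.setdefault(aa, []).append(k)


def rscu(counts):
    """RSCU_ij = X_ij / mean count of the synonymous family; stop codons form one family here."""
    c = np.asarray(counts, dtype=float)
    out = np.zeros_like(c)
    for idx in SYNONYMOUS.values():
        mean = c[idx].sum() / len(idx)
        if mean > 0:  # amino acids that never occur keep RSCU 0
            out[idx] = c[idx] / mean
    return out
```

An RSCU of 1 means no bias; values above 1 mark preferred codons, and within each family the values sum to n_i.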