Codon Harmony
Amino acid reverse translation and DNA optimization tool based on species-specific codon-use distributions. Species-specific data can be found on the Codon Usage Database using the NCBI Taxonomy database id (e.g. 413997) or the organism’s Latin name (e.g. Escherichia coli B). Mapping species names to Taxonomy IDs can be done here.
Documentation: https://codon-harmony.readthedocs.io
Features
1. Reverse translates the input amino acid sequence to DNA.
2. Calculates the host’s per-AA codon usage profile; codons used less than a specified threshold (defaults to 10%) are dropped.
3. Compares the reverse-translated DNA sequence to the host profile and determines which codons are overused or underused.
4. Stochastically mutates codons according to the host profile.
5. Ranks sequences by codon adaptation index relative to the host.
6. Processes DNA to remove unwanted features:
   - high GC content within a sliding window and across the entire sequence
   - unwanted restriction sites
   - alternate start positions (GA-rich regions 18 bp upstream of ATG/GTG/TTG)
   - 3 consecutive identical codons and 9-mer repeat chunks
   - areas with more than 4 (variable) consecutive identical bps (“local homopolymers”)
   - RNA hairpins, detected by looking for 10-mers with reverse complements (including wobble bases) in the sequence
   - RNA splice sites, detected by similarity to consensus donor and acceptor site sequences
7. The process is repeated from step 3 for a specified number of cycles (defaults to 1000) OR until the per-AA codon profile of the current DNA and the host profile match (within tolerance).
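To make a few of these steps concrete, the following is a minimal sketch, not the package's actual implementation. It illustrates the thresholded host profile, the sliding-window GC check, and the local homopolymer check. All function names, the toy codon counts, and the GC window size and cutoff are assumptions chosen for illustration. Only the 10% usage threshold and the "more than 4 identical bases" rule come from the description above.

```python
import re

# Toy codon counts for two amino acids. These numbers are invented for the
# example; real values would come from a species-specific codon usage table.
TOY_COUNTS = {
    "L": {"CTG": 50, "TTA": 14, "TTG": 13, "CTT": 11, "CTC": 8, "CTA": 4},
    "K": {"AAA": 75, "AAG": 25},
}


def host_profile(counts, threshold=0.10):
    """Per-amino-acid codon frequencies with rare codons dropped.

    Codons whose share for their amino acid is below `threshold` are removed,
    and the remaining frequencies are renormalized to sum to 1. Renormalizing
    is an assumption of this sketch.
    """
    profile = {}
    for aa, codon_counts in counts.items():
        total = sum(codon_counts.values())
        kept = {c: n / total for c, n in codon_counts.items() if n / total >= threshold}
        norm = sum(kept.values())
        profile[aa] = {c: f / norm for c, f in kept.items()}
    return profile


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in seq) / len(seq)


def high_gc_windows(seq, window=20, max_gc=0.6):
    """Start indices of windows whose GC fraction exceeds `max_gc`.

    The window size and cutoff are hypothetical defaults.
    """
    return [
        i
        for i in range(len(seq) - window + 1)
        if gc_fraction(seq[i : i + window]) > max_gc
    ]


def local_homopolymers(seq, max_run=4):
    """(start, run) pairs for runs of more than `max_run` identical bases."""
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]
```

With the toy table, `host_profile(TOY_COUNTS)["L"]` keeps only CTG, TTA, TTG and CTT, because CTC and CTA fall below the 10% threshold. The two sequence checks return positions rather than booleans, so a caller could target codon substitutions at the offending regions.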
Future work
More advanced RNA-structure removal
CONTRAfold – overkill for now
nupack – overkill for now
History
0.9.2 (2019-02-06)
First release on PyPI.
0.9.4 (2019-02-20)
Full suite of tests added, bugs uncovered and fixed
Adjustments to the packaging setup – actually installable now
0.9.5 (2019-02-25)
Adding support for RNA splice site detection and removal
0.9.6 (2019-02-28)
Updating the way optimization failures are reported and displayed
Parallelizing via a process pool
1.0.0 (2019-03-06)
Added ability to use offline tables in addition to fetching from the internet
Full suite of tests and documentation
Tested on real-world sequences to