

Codon Harmony

Released under the MIT License.

Amino acid reverse translation and DNA optimization tool based on species-specific codon-use distributions. Species-specific data can be found in the Codon Usage Database using either the NCBI Taxonomy database ID (e.g. 413997) or the organism’s Latin name (e.g. Escherichia coli B). Species names can be mapped to Taxonomy IDs through the NCBI Taxonomy database.
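As a rough illustration of what a "species-specific codon-use distribution" looks like once downloaded, the sketch below turns raw per-codon counts into per-amino-acid frequencies, drops codons under a threshold (mirroring step 2 of the Features list), and computes a codon adaptation index in the usual geometric-mean form (as used for ranking in step 5). The counts in `HOST_COUNTS` are made up, and `codon_profile` and `cai` are hypothetical helpers written for this example, not codon_harmony's own API.

```python
import math

# Made-up codon counts for a hypothetical host (NOT real Codon Usage Database data).
# Keys are one-letter amino acids; values map codon -> observed count.
HOST_COUNTS = {
    "M": {"ATG": 100},
    "K": {"AAA": 74, "AAG": 26},
    "E": {"GAA": 69, "GAG": 31},
    "R": {"CGT": 36, "CGC": 40, "CGA": 6, "CGG": 12, "AGA": 4, "AGG": 2},
}


def codon_profile(counts, threshold=0.10):
    """Per-amino-acid codon frequencies with rare codons dropped.

    Codons below `threshold` within their amino acid are removed and the
    remaining frequencies are renormalized to sum to 1.
    """
    profile = {}
    for aa, codons in counts.items():
        total = sum(codons.values())
        freqs = {c: n / total for c, n in codons.items()}
        kept = {c: f for c, f in freqs.items() if f >= threshold}
        if not kept:  # threshold too strict: keep only the most common codon
            best = max(freqs, key=freqs.get)
            kept = {best: freqs[best]}
        kept_total = sum(kept.values())
        profile[aa] = {c: f / kept_total for c, f in kept.items()}
    return profile


def cai(dna, counts):
    """Codon adaptation index: geometric mean of relative adaptiveness w.

    w(codon) = count / count of the most-used synonymous codon. Codons of
    single-codon amino acids (e.g. ATG for Met) are skipped. Assumes every
    listed codon has a nonzero count (real CAI tools often add a pseudocount).
    """
    w, single = {}, set()
    for codons in counts.values():
        top = max(codons.values())
        for c, n in codons.items():
            w[c] = n / top
        if len(codons) == 1:
            single.update(codons)
    ws = [w[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
          if dna[i:i + 3] in w and dna[i:i + 3] not in single]
    return math.exp(sum(math.log(x) for x in ws) / len(ws)) if ws else 0.0


profile = codon_profile(HOST_COUNTS)
```

With the default 10% threshold, arginine keeps only CGC, CGT and CGG in this made-up table, and a sequence built solely from each amino acid's most-used codon scores a CAI of 1.0.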

Features

  1. Reverse-translates the input amino acid sequence to DNA.

  2. Calculates the host’s per-AA codon usage profile; codons used less than a specified threshold (default 10%) are dropped.

  3. Compares the reverse-translated DNA sequence to the host profile and determines which codons are overused or underused.

  4. Stochastically mutates codons according to the host profile.

  5. Ranks sequences by codon adaptation index (CAI) relative to the host.

  6. Processes DNA to remove unwanted features:

    • high GC content within a sliding window and across the entire sequence

    • unwanted restriction sites

    • alternate start positions (GA-rich regions 18 bp upstream of ATG/GTG/TTG)

    • three consecutive identical codons and repeated 9-mer chunks

    • runs of more than 4 (configurable) consecutive identical bases (“local homopolymers”)

    • RNA hairpins, detected by looking for 10-mers whose reverse complements (including wobble base pairs) also occur in the sequence

    • RNA splice sites, detected by similarity to consensus donor and acceptor site sequences
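A few of the step-6 checks can be sketched in plain Python to show what each rule tests for. This is a minimal, hypothetical sketch and not the package's implementation: the function names, the 50 bp window, and the 30 to 70% GC bounds are assumptions, and the hairpin check considers only Watson-Crick pairs (no wobble) with no minimum loop length. The enzyme sites used below (EcoRI `GAATTC`, BsaI `GGTCTC`) are standard recognition sequences chosen for illustration.

```python
import re

# Complement table for DNA (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_windows_ok(seq, window=50, low=0.30, high=0.70):
    """True if every sliding window's GC fraction lies within [low, high].

    Passing window=len(seq) checks the whole sequence at once.
    """
    seq = seq.upper()
    if not seq:
        return True
    window = min(window, len(seq))
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        if not low <= gc <= high:
            return False
    return True


def homopolymers(seq, max_run=4):
    """(start, run) for every run of one base longer than max_run."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % max_run)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def restriction_hits(seq, sites):
    """Names of recognition sites found on either strand."""
    fwd = seq.upper()
    rev = reverse_complement(fwd)
    return [name for name, site in sites.items() if site in fwd or site in rev]


def hairpin_kmers(seq, k=10):
    """k-mers whose exact reverse complement also occurs in seq.

    Simplification: Watson-Crick pairs only (no G-U wobble) and no
    minimum loop length between the two arms.
    """
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sorted(km for km in kmers if reverse_complement(km) in kmers)
```

With `max_run=4`, the regex requires one base plus at least four repeats of it, so it flags runs of five or more, matching the "more than 4" rule. Checking the reverse strand matters for non-palindromic sites: `AAGAGACCAA` contains no `GGTCTC` on the forward strand, but its reverse complement does.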

The process is repeated from step 3 for a specified number of cycles (default 1000) or until the per-AA codon profile of the current DNA matches the host profile within tolerance.
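The stopping rule above (a fixed cycle budget, or an early exit once every codon frequency is within tolerance of the host) can be sketched as a small loop. This is a hypothetical sketch, not codon_harmony's algorithm: the three-amino-acid `HOST` profile is made up, the starting sequence uses a naive first-codon reverse translation, and the mutation step is a simple swapped-in scheme that replaces one occurrence of the most over-used codon with an under-used synonym chosen at random, weighted by its shortfall. The sequence filters from step 6 are omitted.

```python
import random
from collections import Counter

# Hypothetical host profile: per-amino-acid codon frequencies (illustrative only).
HOST = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "E": {"GAA": 0.70, "GAG": 0.30},
}
CODON_TO_AA = {c: aa for aa, codons in HOST.items() for c in codons}


def usage(dna):
    """Per-amino-acid codon frequencies observed in an in-frame DNA string."""
    counts = Counter(dna[i:i + 3] for i in range(0, len(dna), 3))
    out = {}
    for aa, codons in HOST.items():
        total = sum(counts[c] for c in codons)
        if total:  # skip amino acids absent from the sequence
            out[aa] = {c: counts[c] / total for c in codons}
    return out


def optimize(protein, cycles=1000, tolerance=0.05, seed=0):
    rng = random.Random(seed)
    # Naive reverse translation: the first listed codon for every residue.
    codons = [next(iter(HOST[aa])) for aa in protein]
    for _ in range(cycles):
        current = usage("".join(codons))
        # Signed deviation (observed - host) for each codon of each used amino acid.
        diffs = {c: current[aa][c] - HOST[aa][c] for aa in current for c in HOST[aa]}
        if all(abs(d) <= tolerance for d in diffs.values()):
            break  # profile matches the host within tolerance
        over = max(diffs, key=diffs.get)  # most over-used codon
        aa = CODON_TO_AA[over]
        # Under-used synonyms, weighted by how far below target they are.
        under = {c: -diffs[c] for c in HOST[aa] if diffs[c] < 0}
        if not under:
            break
        new = rng.choices(list(under), weights=list(under.values()))[0]
        # Swap one randomly chosen occurrence of the over-used codon.
        spots = [i for i, c in enumerate(codons) if c == over]
        codons[rng.choice(spots)] = new
    return "".join(codons)


dna = optimize("M" + "K" * 20 + "E" * 10)
```

Because every swap is between synonymous codons, the encoded protein never changes. With short sequences the achievable frequencies are coarse (a 3-residue lysine run can only reach 0, 1/3, 2/3 or 1), so the loop may exhaust its cycle budget instead of converging, which is one reason for the cycle limit.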

Future work

  • More advanced RNA-structure removal

History

0.9.2 (2019-02-06)

  • First release on PyPI.

0.9.4 (2019-02-20)

  • Full suite of tests added, bugs uncovered and fixed

  • Adjustments to the packaging setup; actually installable now

0.9.5 (2019-02-25)

  • Adding support for RNA splice site detection and removal

0.9.6 (2019-02-28)

  • Updating the way optimization failures are reported and displayed

  • Parallelizing via a process pool

1.0.0 (2019-03-06)

  • Added ability to use offline tables in addition to fetching from the internet

  • Full suite of tests and documentation

  • Tested on real-world sequences to


Source Distribution

codon_harmony-1.0.0.tar.gz (31.1 kB)

File details

Details for the file codon_harmony-1.0.0.tar.gz.

File metadata

  • Download URL: codon_harmony-1.0.0.tar.gz
  • Size: 31.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.3

File hashes

Hashes for codon_harmony-1.0.0.tar.gz
  • SHA256: b75d4f67e08868de02a6b2da2884fddd91fc27d8ef9311100bfdcf41133bcb0b
  • MD5: fb4e6a931a5cdce092e664682154cbf1
  • BLAKE2b-256: 1ccbea0fe9b327f78b4f419bd8549284ac30da6f2eb4a758f25547ba8f85ccf0

