Routines for the extraction of degenerate sites and estimation of numbers of neutral substitutions from sequences and alignments.

codon-degeneracy

This python package provides routines for the extraction of degenerate sites from sequences and alignments. The latter is particularly useful for estimations of rates of neutral evolution.
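To make the term concrete: a site is four fold degenerate when any of the four bases at that position encodes the same amino acid. The following is a minimal sketch of how such sites could be identified from a codon table, not the package's own implementation. It assumes the standard genetic code (NCBI translation table 1) written out by hand, and the names `CODON_TABLE` and `fourfold_prefixes` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), hard-coded for illustration.
# Codons are ordered TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_prefixes(table):
    """Return the two-base prefixes whose third position is four fold degenerate."""
    prefixes = set()
    for first, second in product(BASES, repeat=2):
        # Collect the amino acids encoded by all four third-position variants.
        encoded = {table[first + second + third] for third in BASES}
        if len(encoded) == 1 and "*" not in encoded:
            prefixes.add(first + second)
    return prefixes


print(sorted(fourfold_prefixes(CODON_TABLE)))
# ['AC', 'CC', 'CG', 'CT', 'GC', 'GG', 'GT', 'TC']
```

Other codon tables can yield a different set of prefixes, so the table should match the organism and genome the sequence comes from.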

Dependencies

This code uses biopython and scikit-bio internally. In order to install via pip, numpy has to be installed already.

Installing

Simply clone this repo:

git clone https://github.com/nickmachnik/codon-degeneracy.git [TARGET DIR]

and then install via pip

pip install [TARGET DIR]

or install directly from PyPI (this won't include unreleased changes as specified in the changelog):

pip install codon-degeneracy

Testing

Test the cloned package:

cd [TARGET DIR]
python -m unittest

Usage

There are more useful and well-documented functions under the hood than shown here, which I encourage you to explore by browsing the code.

Counting substitutions per four fold degenerate site

One of the main features of the package is the counting of neutral substitutions at four fold degenerate sites. This is best done with known orthologue pairs between species. substitution_rate_at_ffds provides that functionality and is easy to use like so:

from codon_degeneracy import substitution_rate_at_ffds as nsr
seq_a = (
    "ATACCCATGGCCAACCTCCTACTCCTCATTGTACCCATTC"
    "TAATCGCAATGGCATTCCTAATGCTTACCGAACGA")
seq_b = (
    "ATGACCACAGTAAATCTCCTACTTATAATCATACCCACAT"
    "TAGCCGCCATAGCATTTCTCACACTCGTTGAACGA")
(number_of_substitutions, number_of_sites), (orf_a, orf_b) = nsr(
    # the input sequences
    seq_a,
    seq_b,
    # NCBI codon table names as used in Bio.Data.CodonTable
    "Vertebrate Mitochondrial",
    "Vertebrate Mitochondrial")

The ORFs returned are there for sanity checks. The default behaviour is to select the first ATG codon as start.
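As a rough illustration of what selecting the first ATG as start means, the sketch below extracts codons from the first ATG onwards. It is not the package's code: the function name `orf_from_first_atg` is hypothetical, stopping at the first stop codon is an assumption, and the default stop codons are those of the standard code (the vertebrate mitochondrial code uses a different set).

```python
def orf_from_first_atg(seq, stop_codons=("TAA", "TAG", "TGA")):
    """Return the codons from the first ATG up to, but excluding, the first stop codon."""
    start = seq.find("ATG")
    if start == -1:
        # No start codon found, so there is no ORF to report.
        return []
    codons = []
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in stop_codons:
            break
        codons.append(codon)
    return codons


print(orf_from_first_atg("CCATGAAATAGGG"))
# ['ATG', 'AAA']
```

Comparing such a manually extracted ORF with the one returned by the package is one way to carry out the sanity check mentioned above.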

NOTE: The numbers of neutral substitutions per site reported by this function are merely a lower bound, as they do not include the possibility of multiple substitutions per site.
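The counting itself can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: both inputs are taken to be in-frame, gap-free and already aligned codon by codon, a site is only compared when both codons share the same four fold degenerate prefix, and the names `FOURFOLD_PREFIXES` and `count_ffds_substitutions` are hypothetical.

```python
# Two-base prefixes with a four fold degenerate third position.
# These eight hold in both the standard and the vertebrate mitochondrial code.
FOURFOLD_PREFIXES = {"TC", "CT", "CC", "CG", "AC", "GT", "GC", "GG"}


def count_ffds_substitutions(orf_a, orf_b):
    """Count differing third positions at shared four fold degenerate sites.

    Returns (number_of_substitutions, number_of_sites).
    """
    substitutions = 0
    sites = 0
    for i in range(0, min(len(orf_a), len(orf_b)) - 2, 3):
        codon_a, codon_b = orf_a[i:i + 3], orf_b[i:i + 3]
        # Only compare sites where both codons share a four fold prefix.
        if codon_a[:2] == codon_b[:2] and codon_a[:2] in FOURFOLD_PREFIXES:
            sites += 1
            if codon_a[2] != codon_b[2]:
                substitutions += 1
    return substitutions, sites


print(count_ffds_substitutions("ATGGCTCTAAAA", "ATGGCCCTAAAG"))
# (1, 2)
```

Because each site is counted as either changed or unchanged, a site that mutated more than once still contributes at most one substitution, which is why the result is a lower bound.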

Substitutions at four fold degenerate sites separated by CpG context

In certain situations, it may be useful to differentiate between four fold degenerate sites that could potentially exist in a CpG context, and could therefore exhibit an elevated mutation rate, and those that could not. substitutions_per_ffds_by_cpg_context provides that functionality. It differentiates between four CpG contexts. Sites that are:

- not preceded by C and not followed by G (nonCpG)
- preceded by C but not followed by G (postC)
- followed by G but not preceded by C (preG)
- preceded by C and followed by G (postCpreG)
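The four contexts can be expressed as a small classification over the bases flanking a site. This is a minimal sketch of that logic only. The function name `cpg_context` and its signature are hypothetical and not part of the package.

```python
def cpg_context(preceding_base, following_base):
    """Classify a site by the bases directly before and after it."""
    post_c = preceding_base.upper() == "C"
    pre_g = following_base.upper() == "G"
    if post_c and pre_g:
        return "postCpreG"
    if post_c:
        return "postC"
    if pre_g:
        return "preG"
    return "nonCpG"


# In the codons GCT followed by GAA, the four fold degenerate site T
# is preceded by C and followed by G.
print(cpg_context("C", "G"))
# postCpreG
```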

Note: the number of sites considered here may be substantially lower than in substitutions_per_ffds, as this function requires the sites preceding and following a four fold degenerate site to be identical in the two aligned sequences.

The function can be used in exactly the same way as shown for substitutions_per_ffds above.

License

MIT license (LICENSE or https://opensource.org/licenses/MIT)
