comp_dnds
Efficiently computing dN/dS according to the Nei-Gojobori (1986)[^1] method.
from comp_dnds import dnds
d = dnds()
ref_seq = "ATG AAA CCC GGG TTT TAA".replace(" ", "")
obs_seq = "ATG AAA CGC GGC TAC TAA".replace(" ", "")
dn, ds, z, p = d.compute(ref_seq, obs_seq)
ω = dn/ds
print(round(ω, 2))
# 0.12
For longer, more realistic sequences, it is also possible to assess the significance of ω using bootstrap resampling. The Z-score and the p-value are computed using the method described by Nei & Kumar (2000)[^2].
d = dnds()
ref_seq = "GCC GGG GGA AGG ACA TAT CTC GCT CCA CCT AAT GGA ATC ATC GGT".replace(" ", "")
obs_seq = "GAC GGC CGA GGG GCA AAT CGC ACT ACA ACT ACT GAA GTC ATC AGT".replace(" ", "")
dn, ds, z, p = d.compute(ref_seq, obs_seq, bootstrap=1000)
print(f"ω= {round(dn/ds,2)} - z-score= {round(z,2)} - p-val= {round(p, 5)}")
# ω= 6.91 - z-score= 3.72 - p-val= 0.0002
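The general idea behind such a bootstrap test can be sketched as follows. This is a minimal illustration, not the code used by comp_dnds: the helper names (`split_codons`, `two_sided_p`, `bootstrap_z`) and the `stat` callback are hypothetical. It assumes that codons are resampled with replacement and that the p-value comes from a two-sided normal test, which is consistent with the example output above (z = 3.72 gives p ≈ 0.0002).

```python
import math
import random
import statistics


def split_codons(seq):
    """Split a nucleotide string into complete codons (trailing bases are dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def two_sided_p(z):
    """Two-sided p-value of a Z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def bootstrap_z(ref_seq, obs_seq, stat, n_boot=1000, seed=0):
    """Estimate a Z-score for stat(ref, obs) by resampling codon positions.

    `stat` is any callable taking (ref_seq, obs_seq) and returning a float,
    for example dN - dS. It is a placeholder, not part of comp_dnds.
    """
    rng = random.Random(seed)
    ref_codons = split_codons(ref_seq)
    obs_codons = split_codons(obs_seq)
    n = len(ref_codons)

    observed = stat(ref_seq, obs_seq)
    replicates = []
    for _ in range(n_boot):
        # Draw codon positions with replacement, keeping ref/obs pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_ref = "".join(ref_codons[i] for i in idx)
        boot_obs = "".join(obs_codons[i] for i in idx)
        replicates.append(stat(boot_ref, boot_obs))

    # The bootstrap standard deviation serves as the standard error.
    se = statistics.stdev(replicates)
    if se == 0:
        raise ValueError("bootstrap replicates show no variation")
    z = observed / se
    return z, two_sided_p(z)
```

Resampling whole codons rather than single nucleotides keeps the reading frame intact in every replicate.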
Installation
pip install comp_dnds
Background
To compute the dN/dS ratio using the Nei-Gojobori (1986)[^1] method, one needs an observed sequence and a reference (or ancestral) sequence. The observed sequence is typically the one from your sample or experiment, while the reference or ancestral sequence can be inferred by ancestral state reconstruction, for example with IQ-TREE.
The dN/dS ratio, often represented as ω (omega), is a metric used in molecular biology and evolutionary biology to measure the selective pressure acting on a protein-coding gene. It compares the rate of non-synonymous substitutions (dN) to the rate of synonymous substitutions (dS) in coding sequences.
Non-synonymous substitutions (dN): These are mutations that change the amino acid in a protein. They can potentially affect the protein's structure and function, and therefore they can be subject to natural selection.
Synonymous substitutions (dS): These are mutations that do not change the amino acid in a protein. Because they don't change the protein's structure or function, they are often considered to be selectively neutral, or at least under less selective pressure than non-synonymous changes.
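To make the distinction concrete, here is a minimal sketch that labels each differing codon as synonymous or non-synonymous using the standard genetic code. The function name `classify_codon_changes` is hypothetical and not part of comp_dnds. The sketch only compares the amino acids encoded by the two codons; the Nei-Gojobori method additionally averages over mutational pathways when a codon differs at more than one position, which is not shown here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(ref_seq, obs_seq):
    """Return (ref_codon, obs_codon, label) for every codon that differs."""
    if len(ref_seq) != len(obs_seq) or len(ref_seq) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    changes = []
    for i in range(0, len(ref_seq), 3):
        ref_codon, obs_codon = ref_seq[i:i + 3], obs_seq[i:i + 3]
        if ref_codon == obs_codon:
            continue
        same_aa = CODON_TABLE[ref_codon] == CODON_TABLE[obs_codon]
        label = "synonymous" if same_aa else "non-synonymous"
        changes.append((ref_codon, obs_codon, label))
    return changes


ref_seq = "ATG AAA CCC GGG TTT TAA".replace(" ", "")
obs_seq = "ATG AAA CGC GGC TAC TAA".replace(" ", "")
changes = classify_codon_changes(ref_seq, obs_seq)
for change in changes:
    print(change)
# ('CCC', 'CGC', 'non-synonymous')   Pro -> Arg
# ('GGG', 'GGC', 'synonymous')       Gly -> Gly
# ('TTT', 'TAC', 'non-synonymous')   Phe -> Tyr
```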
The dN/dS ratio provides insight into the evolutionary forces acting on a gene:
- dN/dS = 1: The rate of non-synonymous substitutions is about the same as the rate of synonymous substitutions. This suggests that the gene is evolving neutrally.
- dN/dS < 1: The rate of non-synonymous changes is lower than the rate of synonymous changes. This suggests that the gene is under purifying (negative) selection, which acts to remove deleterious mutations.
- dN/dS > 1: The rate of non-synonymous changes is higher than the rate of synonymous changes. This suggests that the gene is under positive (adaptive) selection, meaning that non-synonymous changes provide some advantage and are being selected for.
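In the Nei-Gojobori method, the last step turns the proportions of differing sites into distances with the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), before taking the ratio. The sketch below assumes the numbers of non-synonymous and synonymous sites and differences have already been counted. The function names and the example counts are hypothetical, and this is not the internal code of comp_dnds.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds_from_counts(n_sites, s_sites, n_diffs, s_diffs):
    """Return (dN, dS) from site counts and observed difference counts."""
    p_n = n_diffs / n_sites  # proportion of non-synonymous differences
    p_s = s_diffs / s_sites  # proportion of synonymous differences
    return jukes_cantor(p_n), jukes_cantor(p_s)


# Hypothetical counts for a 333-codon gene (700 + 299 = 999 sites).
dn, ds = dn_ds_from_counts(n_sites=700.0, s_sites=299.0, n_diffs=7.0, s_diffs=15.0)
omega = dn / ds
print(omega < 1)  # True: fewer non-synonymous changes per site, i.e. purifying selection
```

Note that ω is undefined when dS is 0, so sequences without any synonymous difference need to be handled separately.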
Benchmark
Figure 1: Average time taken by biopython cal_dn_ds and comp_dnds to compute the dN and dS values for a sequence of 999 nucleotides. Lower is better.
comp_dnds produces identical results to the cal_dn_ds implementation of biopython, while being on average 32X faster. A detailed benchmark is available in notebooks/benchmark.ipynb.
[^1]: Nei, M., & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution, 3(5), 418-426.
[^2]: Nei, M., & Kumar, S. (2000). Molecular evolution and phylogenetics. Oxford University Press, USA. Page 55