Modern Python implementation of the McDonald-Kreitman test toolkit

MKado 御門

A modern Python implementation of the McDonald-Kreitman test toolkit for detecting selection in molecular evolution.

Features

  • Standard MK test: Classic 2x2 contingency table with Fisher's exact test
  • Polarized MK test: Uses a third outgroup to assign mutations to lineages
  • Asymptotic MK test: Frequency-bin α estimates with exponential extrapolation (Messer & Petrov 2013)
  • Batch processing: Process multiple genes with parallel execution
  • Multiple output formats: Pretty-print, TSV, and JSON
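
The asymptotic approach listed above can be sketched in a few lines of standard-library Python. This is an illustration of the idea in Messer & Petrov (2013), not mkado's implementation: α is computed per derived-allele frequency bin as α(x) = 1 - (Ds/Dn)(Pn(x)/Ps(x)), a curve α(x) = a + b·exp(-c·x) is fitted, and the fit is evaluated at x = 1. The function names, the grid search over c, and the synthetic data are all assumptions made for this sketch.

```python
import math


def alpha_by_bin(dn, ds, pn_bins, ps_bins):
    """alpha(x) = 1 - (Ds/Dn) * (Pn(x)/Ps(x)) for each frequency bin.

    Bins with Ps(x) == 0 are undefined and must be dropped or merged first.
    """
    return [1.0 - (ds / dn) * (pn / ps) for pn, ps in zip(pn_bins, ps_bins)]


def fit_exponential(xs, alphas, c_grid):
    """Least-squares fit of alpha(x) = a + b * exp(-c * x).

    For a fixed c the model is linear in a and b, so each candidate c is
    solved in closed form and the c with the smallest squared error wins.
    """
    n = len(xs)
    y_mean = sum(alphas) / n
    best = None
    for c in c_grid:
        zs = [math.exp(-c * x) for x in xs]
        z_mean = sum(zs) / n
        szz = sum((z - z_mean) ** 2 for z in zs)
        if szz == 0:
            continue  # degenerate predictor, skip this c
        b = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, alphas)) / szz
        a = y_mean - b * z_mean
        sse = sum((a + b * z - y) ** 2 for z, y in zip(zs, alphas))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def asymptotic_alpha(a, b, c):
    """Extrapolate the fitted curve to derived-allele frequency x = 1."""
    return a + b * math.exp(-c)


# Synthetic demo: 10 bin midpoints and alphas drawn from a known curve.
xs = [(i + 0.5) / 10 for i in range(10)]
alphas = [0.5 - 0.4 * math.exp(-5.0 * x) for x in xs]
c_grid = [0.5 * k for k in range(1, 41)]  # candidate c values 0.5 .. 20.0

a, b, c = fit_exponential(xs, alphas, c_grid)
print(f"a={a:.4f} b={b:.4f} c={c:.1f} alpha(1)={asymptotic_alpha(a, b, c):.4f}")
```

A grid search keeps the sketch dependency-free; a real analysis would more likely use a nonlinear optimizer and report a confidence interval for the extrapolated value.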

Installation

# Clone the repository
git clone https://github.com/andrewkern/mkado.git
cd mkado

# Install with uv
uv sync

# Or install with pip
pip install .

Quick Start

# Standard MK test (combined alignment file)
mkado test alignment.fa -i "dmel" -o "dsim"

# Asymptotic MK test
mkado test alignment.fa -i "dmel" -o "dsim" -a

# Polarized MK test
mkado test alignment.fa -i "dmel" -o "dsim" --polarize-match "dyak"

# Batch process a directory
mkado batch alignments/ -i "dmel" -o "dsim"

# Batch with asymptotic test and 8 parallel workers
mkado batch alignments/ -i "dmel" -o "dsim" -a -w 8

# Get file info
mkado info sequences.fa

Usage Modes

mkado supports two modes for specifying ingroup/outgroup sequences:

Combined File Mode (Recommended)

Use -i and -o to filter sequences by name pattern from a single alignment file:

mkado test alignment.fa -i "speciesA" -o "speciesB"
mkado batch alignments/ -i "speciesA" -o "speciesB"
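
To make the combined file idea concrete, here is a minimal sketch of splitting one FASTA alignment into ingroup and outgroup sets by sequence name. It assumes a plain substring match against the header line; the matching rule mkado actually applies is not stated here, and `read_fasta` and `filter_by_pattern` are hypothetical helpers, not part of the mkado API.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_by_pattern(records, pattern):
    """Keep records whose header contains the pattern (assumed substring match)."""
    return {n: s for n, s in records.items() if pattern in n}


fasta_text = """\
>dmel_1
ATGGCCAAA
>dmel_2
ATGGCTAAA
>dsim_1
ATGGCCAAG
"""

records = read_fasta(fasta_text.splitlines())
ingroup = filter_by_pattern(records, "dmel")
outgroup = filter_by_pattern(records, "dsim")
print(sorted(ingroup), sorted(outgroup))
```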

Separate Files Mode

Provide separate FASTA files for ingroup and outgroup:

mkado test ingroup.fa outgroup.fa
mkado batch genes/ --ingroup-pattern "*_in.fa" --outgroup-pattern "*_out.fa"

Commands

mkado test

Run MK test on a single alignment.

mkado test FASTA [OUTGROUP_FILE] [OPTIONS]

Key Options:

Option            Short  Description
--ingroup-match   -i     Ingroup sequence name pattern (combined mode)
--outgroup-match  -o     Outgroup sequence name pattern (combined mode)
--asymptotic      -a     Use asymptotic MK test
--polarize        -p     Second outgroup file (separate files mode)
--polarize-match         Second outgroup pattern (combined mode)
--bins            -b     Frequency bins for asymptotic test (default: 10)
--format          -f     Output format: pretty, tsv, json
--reading-frame   -r     Reading frame 1-3 (default: 1)

Examples:

# Combined file mode
mkado test alignment.fa -i "dmel" -o "dsim"
mkado test alignment.fa -i "dmel" -o "dsim" -a -b 20
mkado test alignment.fa -i "dmel" -o "dsim" --polarize-match "dyak"

# Separate files mode
mkado test ingroup.fa outgroup.fa
mkado test ingroup.fa outgroup.fa -a
mkado test ingroup.fa outgroup.fa -p outgroup2.fa
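
The polarized test relies on a second outgroup to decide on which lineage a fixed difference arose. The parsimony rule behind that can be sketched as follows. This is an illustration only: the function name is hypothetical, and how mkado treats ambiguous sites is not covered here.

```python
def polarize_fixed_difference(ingroup_allele, outgroup_allele, outgroup2_allele):
    """Assign a fixed ingroup/outgroup difference to a lineage by parsimony.

    The second outgroup is taken as a proxy for the ancestral state:
    whichever species disagrees with it is assumed to carry the change.
    """
    if ingroup_allele == outgroup_allele:
        raise ValueError("not a fixed difference: ingroup and outgroup agree")
    if outgroup2_allele == outgroup_allele:
        return "ingroup"    # outgroup kept the ancestral state
    if outgroup2_allele == ingroup_allele:
        return "outgroup"   # ingroup kept the ancestral state
    return "ambiguous"      # three different states, cannot polarize


print(polarize_fixed_difference("A", "G", "G"))
print(polarize_fixed_difference("A", "G", "A"))
print(polarize_fixed_difference("A", "G", "T"))
```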

mkado batch

Run MK test on multiple alignment files.

mkado batch DIRECTORY [OPTIONS]

Key Options:

Option                  Short  Description
--ingroup-match         -i     Ingroup pattern (enables combined file mode)
--outgroup-match        -o     Outgroup pattern (required with -i)
--asymptotic            -a     Use asymptotic MK test
--aggregate/--per-gene         Aggregate results or per-gene (asymptotic)
--pattern                      File glob pattern (default: auto-detect *.fa, *.fasta, *.fna)
--workers               -w     Parallel workers (0=auto, 1=sequential)
--bins                  -b     Frequency bins for asymptotic test
--format                -f     Output format: pretty, tsv, json

Examples:

# Combined file mode (recommended)
mkado batch alignments/ -i "dmel" -o "dsim"
mkado batch alignments/ -i "dmel" -o "dsim" -a
mkado batch alignments/ -i "dmel" -o "dsim" -a --per-gene
mkado batch alignments/ -i "dmel" -o "dsim" -w 8

# Separate files mode
mkado batch genes/ --ingroup-pattern "*_in.fa" --outgroup-pattern "*_out.fa"
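
The `--workers` convention (0 = auto, 1 = sequential) can be pictured with a short sketch. This is not how mkado schedules its work internally; `resolve_workers` and `run_batch` are hypothetical names, and threads are used only to keep the example self-contained. A CPU-bound analysis would more likely use a process pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(requested):
    """Map the documented convention to a worker count: 0 means one per CPU."""
    if requested == 0:
        return os.cpu_count() or 1
    return max(1, requested)


def run_batch(paths, analyze, workers=0):
    """Apply `analyze` to every path, sequentially or with a thread pool.

    Results come back in the same order as `paths` in both cases.
    """
    n = resolve_workers(workers)
    if n == 1:
        return [analyze(p) for p in paths]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(analyze, paths))


# Stand-in analysis function; a real run would call an MK test per file.
results = run_batch(["a.fa", "b.fa", "c.fa"], str.upper, workers=2)
print(results)
```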

mkado info

Display information about a FASTA file.

mkado info FASTA [-r READING_FRAME]

Example Output

$ mkado test alignment.fa -i "kreitman" -o "mauritiana"

Found 11 ingroup, 1 outgroup sequences
MK Test Results:
  Divergence:    Dn=6, Ds=8
  Polymorphism:  Pn=1, Ps=8
  Fisher's exact p-value: 0.176
  Neutrality Index (NI):  0.1667
  Alpha (α):              0.8333
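
The p-value in this output can be checked by hand. Below is a minimal standard-library sketch of a two-sided Fisher's exact test on the 2x2 table [[Dn, Ds], [Pn, Ps]]; it illustrates the statistic and is not mkado's own code. It assumes the common two-sided convention of summing all tables that are no more probable than the observed one, and it needs Python 3.8+ for `math.comb`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Counts from the example output: Dn=6, Ds=8, Pn=1, Ps=8
p_value = fisher_exact_two_sided(6, 8, 1, 8)
print(f"Fisher's exact p-value: {p_value:.3f}")
```

With the counts above this gives 0.176, matching the reported value.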

Python API

from mkado import mk_test, asymptotic_mk_test, SequenceSet

# Run MK test
result = mk_test("ingroup.fa", "outgroup.fa")
print(f"Alpha: {result.alpha}")
print(f"P-value: {result.p_value}")

# Run asymptotic MK test
result = asymptotic_mk_test("ingroup.fa", "outgroup.fa")
print(f"Asymptotic Alpha: {result.alpha_asymptotic}")
print(f"95% CI: {result.ci_low} - {result.ci_high}")

# Combined file mode - filter by sequence name
all_seqs = SequenceSet.from_fasta("combined.fa")
ingroup = all_seqs.filter_by_name("dmel")
outgroup = all_seqs.filter_by_name("dsim")
result = mk_test(ingroup, outgroup)

Interpretation

Neutrality Index (NI)

  • NI = 1: Consistent with neutral evolution
  • NI > 1: Excess nonsynonymous polymorphism (negative/purifying selection)
  • NI < 1: Excess nonsynonymous divergence (positive selection)

Alpha (α)

  • α = 0: No evidence of adaptive substitutions
  • α > 0: Estimated proportion of nonsynonymous substitutions driven by positive selection
  • α < 0: Excess nonsynonymous polymorphism relative to divergence
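
Both statistics follow directly from the four counts in the MK table. A minimal sketch, using the standard definitions NI = (Pn/Ps) / (Dn/Ds) and α = 1 - NI; the function names are illustrative and not part of the mkado API.

```python
def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds); undefined when Ps, Dn, or Ds is zero."""
    return (pn / ps) / (dn / ds)


def alpha(dn, ds, pn, ps):
    """alpha = 1 - NI."""
    return 1.0 - neutrality_index(dn, ds, pn, ps)


# Counts from the example output: Dn=6, Ds=8, Pn=1, Ps=8
ni = neutrality_index(6, 8, 1, 8)
a = alpha(6, 8, 1, 8)
print(f"NI = {ni:.4f}, alpha = {a:.4f}")
```

For those counts NI is 0.1667 and α is 0.8333, the values shown in the example output. NI < 1 and α > 0 always go together because α is defined as 1 - NI.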

Development

# Install dev dependencies
uv sync

# Run tests
uv run pytest

# Run linter
uv run ruff check src/

# Run formatter
uv run ruff format src/

References

  • McDonald, J. H., & Kreitman, M. (1991). Adaptive protein evolution at the Adh locus in Drosophila. Nature, 351(6328), 652-654.
  • Messer, P. W., & Petrov, D. A. (2013). Frequent adaptation and the McDonald–Kreitman test. PNAS, 110(21), 8615-8620.

License

MIT License
