Modern Python implementation of the McDonald-Kreitman test toolkit

MKado 御門

A modern Python implementation of the McDonald-Kreitman test toolkit for detecting selection in molecular evolution.

Features

  • Standard MK test: Classic 2x2 contingency table with Fisher's exact test
  • Polarized MK test: Uses a second outgroup to assign mutations to lineages
  • Asymptotic MK test: Frequency-bin α estimates with exponential extrapolation (Messer & Petrov 2013)
  • Tarone-Greenland α_TG: Weighted multi-gene estimator that corrects for sample size heterogeneity (Stoletzki & Eyre-Walker 2011)
  • Alternate genetic codes: 24 NCBI genetic code tables (mitochondrial, plastid, etc.) selectable by name
  • VCF input: Go directly from VCF + reference + GFF3 annotation to MK test results (no FASTA alignment needed)
  • Batch processing: Process multiple genes with parallel execution and Benjamini-Hochberg correction for multiple testing
  • Volcano plots: Visualize batch results with publication-ready volcano plots
  • Multiple output formats: Pretty-print, TSV, and JSON
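
The Benjamini-Hochberg correction mentioned for batch mode can be sketched in a few lines of standard-library Python. This is an illustration of the general procedure, not MKado's own code; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene Fisher's exact p-values (not real results).
p_values = [0.001, 0.02, 0.03, 0.04, 0.5]
q_values = benjamini_hochberg(p_values)
print([round(q, 3) for q in q_values])
```

Genes whose adjusted value falls below the chosen false discovery rate would be reported as significant after correction.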

Installation

# Install with uv (recommended)
uv pip install mkado

# Or install with pip
pip install mkado

Development Installation

# Clone the repository
git clone https://github.com/andrewkern/mkado.git
cd mkado

# Install with uv
uv sync

Quick Start

# Standard MK test (combined alignment file)
mkado test alignment.fa -i "dmel" -o "dsim"

# Asymptotic MK test
mkado test alignment.fa -i "dmel" -o "dsim" -a

# Polarized MK test
mkado test alignment.fa -i "dmel" -o "dsim" --polarize-match "dyak"

# Batch process a directory
mkado batch alignments/ -i "dmel" -o "dsim"

# Batch with asymptotic test and 8 parallel workers
mkado batch alignments/ -i "dmel" -o "dsim" -a -w 8

# Run MK test directly from VCF files
mkado vcf --vcf pop.vcf.gz --outgroup-vcf out.vcf.gz --ref genome.fa --gff genes.gff3

# Get file info
mkado info sequences.fa

Usage Modes

mkado supports two modes for specifying ingroup/outgroup sequences:

Combined File Mode (Recommended)

Use -i and -o to filter sequences by name pattern from a single alignment file:

mkado test alignment.fa -i "speciesA" -o "speciesB"
mkado batch alignments/ -i "speciesA" -o "speciesB"

Separate Files Mode

Provide separate FASTA files for ingroup and outgroup:

mkado test ingroup.fa outgroup.fa
mkado batch genes/ --ingroup-pattern "*_in.fa" --outgroup-pattern "*_out.fa"

Commands

mkado test

Run MK test on a single alignment.

mkado test FASTA [OUTGROUP_FILE] [OPTIONS]

Key Options:

Option              Short  Description
--ingroup-match     -i     Ingroup sequence name pattern (combined mode)
--outgroup-match    -o     Outgroup sequence name pattern (combined mode)
--asymptotic        -a     Use asymptotic MK test
--polarize          -p     Second outgroup file (separate files mode)
--polarize-match           Second outgroup pattern (combined mode)
--bins              -b     Frequency bins for asymptotic test (default: 10)
--plot-asymptotic          Generate alpha(x) plot for asymptotic test (PNG, PDF, or SVG)
--code-table               Genetic code (e.g. vertebrate-mito, or NCBI ID)
--format            -f     Output format: pretty, tsv, json
--reading-frame     -r     Reading frame 1-3 (default: 1)

Examples:

# Combined file mode
mkado test alignment.fa -i "dmel" -o "dsim"
mkado test alignment.fa -i "dmel" -o "dsim" -a -b 20
mkado test alignment.fa -i "dmel" -o "dsim" -a --plot-asymptotic alpha_fit.png
mkado test alignment.fa -i "dmel" -o "dsim" --polarize-match "dyak"

# Separate files mode
mkado test ingroup.fa outgroup.fa
mkado test ingroup.fa outgroup.fa -a
mkado test ingroup.fa outgroup.fa -p outgroup2.fa
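
The idea behind the asymptotic test (Messer & Petrov 2013) can be sketched as follows: α is computed within each derived-allele-frequency bin as α(x) = 1 - (Ds/Dn) * (Pn(x)/Ps(x)), an exponential curve α(x) = a + b * exp(-c * x) is fitted to those values, and the curve is evaluated at x = 1. The sketch below assumes hypothetical counts and hypothetical fitted parameters, omits the curve fit itself, and is not taken from MKado's internals; the function names are illustrative.

```python
import math


def alpha_per_bin(dn, ds, pn_bins, ps_bins):
    """alpha(x) = 1 - (Ds/Dn) * (Pn(x)/Ps(x)) for each frequency bin."""
    alphas = []
    for pn, ps in zip(pn_bins, ps_bins):
        if dn == 0 or ps == 0:
            alphas.append(None)  # alpha(x) is undefined for this bin
        else:
            alphas.append(1.0 - (ds / dn) * (pn / ps))
    return alphas


def exponential_alpha(x, a, b, c):
    """Exponential model alpha(x) = a + b * exp(-c * x)."""
    return a + b * math.exp(-c * x)


# Hypothetical divergence counts and per-bin polymorphism counts,
# ordered from low to high derived allele frequency.
dn, ds = 60, 80
pn_bins = [40, 12, 6, 3, 2]
ps_bins = [50, 20, 12, 8, 6]
alphas = alpha_per_bin(dn, ds, pn_bins, ps_bins)
print([round(a, 3) for a in alphas])

# Hypothetical fitted parameters; a real analysis would estimate them
# from the per-bin values with a nonlinear least-squares fit.
a, b, c = 0.6, -0.7, 3.0
alpha_asymptotic = exponential_alpha(1.0, a, b, c)
print(round(alpha_asymptotic, 3))
```

In this made-up data set α(x) rises with frequency, which is the pattern expected when low-frequency bins are enriched for slightly deleterious nonsynonymous variants.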

mkado batch

Run MK test on multiple alignment files.

mkado batch DIRECTORY [OPTIONS]

Key Options:

Option                  Short  Description
--ingroup-match         -i     Ingroup pattern (enables combined file mode)
--outgroup-match        -o     Outgroup pattern (required with -i)
--asymptotic            -a     Use asymptotic MK test
--alpha-tg                     Compute weighted α_TG (Stoletzki & Eyre-Walker 2011)
--aggregate/--per-gene         Aggregate across genes or report per-gene results (asymptotic test)
--pattern                      File glob pattern (default: auto-detect *.fa, *.fasta, *.fna)
--workers               -w     Parallel workers (0=auto, 1=sequential)
--bins                  -b     Frequency bins for asymptotic test
--code-table                   Genetic code (e.g. vertebrate-mito, or NCBI ID)
--format                -f     Output format: pretty, tsv, json
--volcano                      Generate volcano plot (PNG, PDF, or SVG)
--plot-asymptotic              Generate alpha(x) plot for aggregated asymptotic test

Examples:

# Combined file mode (recommended)
mkado batch alignments/ -i "dmel" -o "dsim"
mkado batch alignments/ -i "dmel" -o "dsim" -a
mkado batch alignments/ -i "dmel" -o "dsim" -a --per-gene
mkado batch alignments/ -i "dmel" -o "dsim" --alpha-tg
mkado batch alignments/ -i "dmel" -o "dsim" -w 8

# Generate a volcano plot
mkado batch alignments/ -i "dmel" -o "dsim" --volcano results.png

# Generate asymptotic alpha(x) plot
mkado batch alignments/ -i "dmel" -o "dsim" -a --plot-asymptotic alpha_fit.png

# Separate files mode
mkado batch genes/ --ingroup-pattern "*_in.fa" --outgroup-pattern "*_out.fa"
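
For orientation, the weighted estimator behind --alpha-tg can be sketched from the Tarone-Greenland neutrality index described by Stoletzki & Eyre-Walker (2011): NI_TG = Σ[Ds·Pn/(Ps+Ds)] / Σ[Ps·Dn/(Ps+Ds)], with α_TG = 1 - NI_TG. The function name `alpha_tg`, the handling of empty genes, and the second gene's counts are assumptions made for illustration; MKado's implementation may differ in detail.

```python
def alpha_tg(genes):
    """Weighted alpha from per-gene (Dn, Ds, Pn, Ps) count tuples."""
    numerator = 0.0
    denominator = 0.0
    for dn, ds, pn, ps in genes:
        weight = ps + ds
        if weight == 0:
            continue  # assumption: skip genes with no synonymous sites observed
        numerator += ds * pn / weight
        denominator += ps * dn / weight
    if denominator == 0:
        return None  # NI_TG is undefined
    return 1.0 - numerator / denominator


# First gene: the counts from the example output in this README.
# Second gene: hypothetical counts with Dn = 0, where a per-gene NI is undefined.
genes = [(6, 8, 1, 8), (0, 5, 2, 10)]
print(round(alpha_tg(genes[:1]), 4))
print(round(alpha_tg(genes), 4))
```

With a single gene the estimator reduces to 1 - NI for that gene; with several genes, a gene whose own NI is undefined can still contribute to the pooled estimate.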

mkado vcf

Run MK test from VCF + reference genome + GFF3 annotation. No pre-aligned FASTA files needed.

mkado vcf --vcf VCF --outgroup-vcf VCF --ref FASTA --gff GFF3 [OPTIONS]

Key Options:

Option                  Short  Description
--vcf                          Ingroup VCF (multi-sample, bgzipped+tabix recommended)
--outgroup-vcf                 Single-sample outgroup VCF
--ref                          Reference FASTA (plain or bgzipped, faidx-indexed)
--gff                          GFF3 annotation (plain or gzipped)
--gene                         Analyze a single gene by ID
--gene-list                    File with gene IDs to analyze
--asymptotic            -a     Use asymptotic MK test
--alpha-tg                     Compute weighted α_TG
--imputed                      Use imputed MK test
--aggregate/--per-gene         Aggregate or per-gene results
--volcano                      Generate volcano plot (PNG, PDF, or SVG)
--plot-asymptotic              Generate alpha(x) plot for aggregated asymptotic test
--workers               -w     Parallel workers (0=auto)
--format                -f     Output format: pretty, tsv, json
--verbose                      Show htslib/VCF parsing warnings

Examples:

# Standard MK test across all genes
mkado vcf --vcf pop.vcf.gz --outgroup-vcf out.vcf.gz --ref genome.fa --gff genes.gff3

# Asymptotic MK test
mkado vcf --vcf pop.vcf.gz --outgroup-vcf out.vcf.gz --ref genome.fa --gff genes.gff3 -a

# Single gene analysis
mkado vcf --vcf pop.vcf.gz --outgroup-vcf out.vcf.gz --ref genome.fa --gff genes.gff3 --gene BRCA1

# With parallel workers and TSV output
mkado vcf --vcf pop.vcf.gz --outgroup-vcf out.vcf.gz --ref genome.fa --gff genes.gff3 -w 8 -f tsv

mkado codes

List available genetic code tables.

mkado codes
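
Why the genetic code matters for an MK test: the same codon change can be synonymous under one table and nonsynonymous under another. The sketch below builds NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial) from their published amino-acid strings and classifies two codon changes. The helper `classify` and the table variable names are illustrative and not part of MKado.

```python
from itertools import product

# Codons in NCBI order: bases T, C, A, G with the first position varying slowest.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

# Amino-acid strings of NCBI translation tables 1 and 2 ("*" marks a stop codon).
STANDARD = dict(zip(
    CODONS,
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
VERTEBRATE_MITO = dict(zip(
    CODONS,
    "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
))


def classify(codon_a, codon_b, table):
    """Label a codon change by whether the encoded amino acid changes."""
    return "synonymous" if table[codon_a] == table[codon_b] else "nonsynonymous"


for a, b in [("ATA", "ATG"), ("TGA", "TGG")]:
    print(a, b, classify(a, b, STANDARD), classify(a, b, VERTEBRATE_MITO))
```

Both changes count as nonsynonymous under the standard code but synonymous under the vertebrate mitochondrial code, so choosing the wrong table would shift counts between the nonsynonymous and synonymous cells.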

mkado info

Display information about a FASTA file.

mkado info FASTA [-r READING_FRAME]

Example Output

$ mkado test alignment.fa -i "kreitman" -o "mauritiana"

Found 11 ingroup, 1 outgroup sequences
MK Test Results:
  Divergence:    Dn=6, Ds=8
  Polymorphism:  Pn=1, Ps=8
  Fisher's exact p-value: 0.176
  Neutrality Index (NI):  0.1667
  Alpha (α):              0.8333
  DoS:                    0.3175
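
The statistics in this output can be recomputed by hand from the four counts. The sketch below uses only the standard library: a two-sided Fisher's exact test on the table [[Dn, Ds], [Pn, Ps]], NI = (Pn/Ps)/(Dn/Ds), α = 1 - NI, and DoS as defined under Interpretation. It is an independent check, not MKado's code, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Hypergeometric probability of each possible top-left cell value.
    probs = {
        k: comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
        for k in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Sum the tables that are no more likely than the observed one.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


dn, ds, pn, ps = 6, 8, 1, 8  # counts from the example output above

p_value = fisher_exact_two_sided(dn, ds, pn, ps)
ni = (pn / ps) / (dn / ds)
alpha = 1 - ni
dos = dn / (dn + ds) - pn / (pn + ps)

print(f"p={p_value:.3f} NI={ni:.4f} alpha={alpha:.4f} DoS={dos:.4f}")
# prints: p=0.176 NI=0.1667 alpha=0.8333 DoS=0.3175
```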

Python API

from mkado import mk_test, asymptotic_mk_test, SequenceSet

# Run MK test
result = mk_test("ingroup.fa", "outgroup.fa")
print(f"Alpha: {result.alpha}")
print(f"P-value: {result.p_value}")

# Run asymptotic MK test
result = asymptotic_mk_test("ingroup.fa", "outgroup.fa")
print(f"Asymptotic Alpha: {result.alpha_asymptotic}")
print(f"95% CI: {result.ci_low} - {result.ci_high}")

# Combined file mode - filter by sequence name
all_seqs = SequenceSet.from_fasta("combined.fa")
ingroup = all_seqs.filter_by_name("dmel")
outgroup = all_seqs.filter_by_name("dsim")
result = mk_test(ingroup, outgroup)

Interpretation

Neutrality Index (NI)

  • NI = 1: Neutral evolution
  • NI > 1: Excess nonsynonymous polymorphism (consistent with segregating weakly deleterious variants)
  • NI < 1: Excess nonsynonymous divergence (consistent with positive selection)

Alpha (α)

  • α = 0: No evidence of adaptive substitutions
  • α > 0: Estimated proportion of nonsynonymous substitutions driven by positive selection
  • α < 0: Excess nonsynonymous polymorphism relative to divergence

Direction of Selection (DoS)

From Stoletzki & Eyre-Walker (2011), DoS = Dn/(Dn+Ds) - Pn/(Pn+Ps):

  • DoS = 0: Neutral evolution
  • DoS > 0: Excess nonsynonymous divergence (consistent with positive selection)
  • DoS < 0: Excess nonsynonymous polymorphism (consistent with segregating slightly deleterious variants)

DoS is bounded [-1, +1] and symmetric around 0, making it easier to interpret than NI.
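
A small sketch of the practical difference: NI is a ratio, so it is undefined when Dn or Ps is zero and it is asymmetric around 1, whereas DoS only needs Dn+Ds > 0 and Pn+Ps > 0 and simply flips sign when the divergence and polymorphism rows are swapped. The function names and the zero-count gene are hypothetical.

```python
def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds), written as (Pn*Ds) / (Ps*Dn)."""
    if ps * dn == 0:
        return None  # ratio is undefined
    return (pn * ds) / (ps * dn)


def direction_of_selection(dn, ds, pn, ps):
    """DoS = Dn/(Dn+Ds) - Pn/(Pn+Ps)."""
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)


# Hypothetical gene with no nonsynonymous divergence.
print(neutrality_index(0, 5, 2, 10), direction_of_selection(0, 5, 2, 10))

# Swapping the divergence and polymorphism rows of the example output counts.
print(neutrality_index(6, 8, 1, 8), neutrality_index(1, 8, 6, 8))
print(direction_of_selection(6, 8, 1, 8), direction_of_selection(1, 8, 6, 8))
```

For the swapped tables NI moves from about 0.17 to 6, while DoS moves from about 0.32 to about -0.32.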

Development

# Install dev dependencies
uv sync

# Run tests
uv run pytest

# Run linter
uv run ruff check src/

# Run formatter
uv run ruff format src/

Examples

Example data and tutorials are available in the examples/ directory:

# Run batch MK test on example data
mkado batch examples/anopheles_batch/ -i gamb -o afun

# Run asymptotic MK test
mkado batch examples/anopheles_batch/ -i gamb -o afun -a

See the documentation for detailed tutorials and API reference.

Citation

If you use MKado in your research, please cite:

Rivera-Colón, A. G., Rehmann, C. T., & Kern, A. D. (2026). MKado: a toolkit for McDonald-Kreitman tests of natural selection. bioRxiv. https://doi.org/10.64898/2026.03.02.709122

License

MIT License
