Python implementation of GAPIT: Genome Association and Prediction Integrated Tool (GLM, MLM, CMLM, MLMM, FarmCPU, BLINK, gBLUP, cBLUP, sBLUP)

pyGAPIT — Genome Association and Prediction Integrated Tool (Python)

A complete Python reimplementation of the R GAPIT package by Jiabo Wang & Zhiwu Zhang.

Supports all GWAS models (GLM, MLM, CMLM, MLMM, FarmCPU, BLINK) and genomic selection methods (gBLUP, cBLUP, sBLUP) with the same interface as R GAPIT.


Installation

pip install pygapit        # from PyPI
pip install -e .           # from source (run inside a clone of this repo)

Dependencies are installed automatically: numpy, scipy, pandas, matplotlib, seaborn, plotly, scikit-learn, joblib, biopython, jinja2.


Quick start

import pandas as pd
from pygapit import GAPIT

# Load data (same format as R GAPIT)
Y  = pd.read_csv("mdp_traits.txt",         sep="\t")  # phenotype
GD = pd.read_csv("mdp_numeric.txt",         sep="\t")  # numeric genotype
GM = pd.read_csv("mdp_SNP_information.txt", sep="\t")  # SNP map

# Run GWAS (BLINK = default, highest power)
result = GAPIT(Y=Y, GD=GD, GM=GM, model="BLINK", PCA_total=3)

print(result.GWAS.head())          # full GWAS results table
print(f"h²    = {result.h2:.3f}")  # heritability
print(f"λ     = {result.lambda_gc:.3f}")  # genomic inflation factor
print(f"QTNs  = {len(result.QTNs)}")     # multi-locus hits

Equivalent R code:

myGAPIT <- GAPIT(Y=myY, GD=myGD, GM=myGM, model="Blink", PCA.total=3)

Input data formats

pyGAPIT accepts the same file formats as R GAPIT:

Phenotype file (Y)

Tab-delimited. First column = Taxa names, remaining columns = trait values.

Taxa    EarHT   dpoll
33-16   64.75   64.5
38-11   69.12   61.0
4226    65.5    59.5

Numeric genotype (GD) + map (GM)

GD: First column = taxa names, remaining = SNP dosages (0/1/2).

taxa        PZB00859.1  PZA01271.1  ...
33-16       2           0           ...
38-11       2           2           ...

GM: Three columns: SNP name, Chromosome, Position (bp).

SNP         Chromosome  Position
PZB00859.1  1           157104
PZA01271.1  1           1947984
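
Because GD and GM describe the same markers, it can help to check that they agree before running a model. The sketch below is not part of pyGAPIT: the helper names `read_tsv` and `check_gd_gm` are hypothetical, and it assumes (as R GAPIT does) that GD's SNP columns should appear in the same order as the rows of GM. It uses only the standard library and treats any value other than 0/1/2, including missing-value codes, as a problem to report.

```python
import csv
import io

# Toy inputs in the layout described above (tab-delimited).
GD_TEXT = "taxa\tPZB00859.1\tPZA01271.1\n33-16\t2\t0\n38-11\t2\t2\n"
GM_TEXT = "SNP\tChromosome\tPosition\nPZB00859.1\t1\t157104\nPZA01271.1\t1\t1947984\n"


def read_tsv(text):
    """Parse tab-delimited text into (header, rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def check_gd_gm(gd_text, gm_text):
    """Return a list of problems found; an empty list means GD and GM agree."""
    gd_header, gd_rows = read_tsv(gd_text)
    _, gm_rows = read_tsv(gm_text)
    problems = []
    gd_snps = gd_header[1:]               # first GD column holds taxa names
    gm_snps = [row[0] for row in gm_rows]
    if gd_snps != gm_snps:
        problems.append("SNP names/order differ between GD and GM")
    for row in gd_rows:
        bad = [v for v in row[1:] if v not in {"0", "1", "2"}]
        if bad:
            problems.append(f"{row[0]}: non-0/1/2 dosage values {bad}")
    return problems


problems = check_gd_gm(GD_TEXT, GM_TEXT)
print(problems or "GD and GM are consistent")
```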

HapMap genotype (G)

Standard HapMap format with IUPAC allele codes.

result = GAPIT(Y=Y, G=hapmap_df, model="BLINK")
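
To make the relationship between HapMap calls and the 0/1/2 dosages of GD concrete, here is a minimal sketch of the conversion. It is an illustration only: the function name `iupac_to_dosage` is hypothetical, and which allele pyGAPIT counts (reference, alternate, or minor) is not stated here, so the counted allele is passed in explicitly. Codes outside the ten listed (such as N) are treated as missing.

```python
# Standard IUPAC nucleotide codes used for single-letter HapMap genotype calls.
IUPAC = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def iupac_to_dosage(calls, counted_allele):
    """Count copies of `counted_allele` (0/1/2) in each call; None = missing."""
    dosages = []
    for call in calls:
        pair = IUPAC.get(call.upper())
        dosages.append(None if pair is None else pair.count(counted_allele))
    return dosages


print(iupac_to_dosage(["A", "R", "G", "N"], counted_allele="G"))
# → [0, 1, 2, None]
```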

GWAS models

Model    Method type   Uses kinship  Multi-QTN  Power    Speed
GLM      Single-locus  No (PCs)      No         Low      Fastest
MLM      Single-locus  Yes (global)  No         Medium   Fast
CMLM     Single-locus  Compressed    No         Medium+  Fast
MLMM     Multi-locus   Yes (global)  Yes        High     Moderate
FarmCPU  Multi-locus   Pseudo-QTN    Yes        High     Moderate
BLINK    Multi-locus   No            Yes        Highest  Fast

# Run multiple models simultaneously
result = GAPIT(Y=Y, GD=GD, GM=GM,
               model=["GLM", "MLM", "FarmCPU", "BLINK"])
# Returns a dict keyed by "EarHT_GLM", "EarHT_MLM", etc.

Genomic selection

# gBLUP — best for polygenic traits
result = GAPIT(Y=Y, GD=GD, GM=GM, model="gBLUP")

# sBLUP — best for oligogenic traits (uses GWAS-identified QTNs)
result = GAPIT(Y=Y, GD=GD, GM=GM, model="BLINK", buspred=True)

# Access prediction results
print(result.Pred)
#      Taxa    BLUE    BLUP     PEV   gBreedingValue  Prediction
# 0  33-16   67.4   -2.65   89.3      -2.65          64.75

Output files

When file_output=True (default), pyGAPIT writes to output_dir:

File                                 Content
GAPIT.BLINK.EarHT.GWAS.Results.csv   Full GWAS table: SNP, Chr, Pos, P.value, maf, effect, FDR
GAPIT.BLINK.EarHT.Prediction.csv     BLUE, BLUP, PEV, GEBV per individual
GAPIT.Kinship.csv                    VanRaden kinship matrix
GAPIT.PCA.csv                        PC scores per individual
GAPIT.BLINK.EarHT.Manhattan.pdf      Manhattan plot
GAPIT.BLINK.EarHT.QQ.pdf             QQ plot with λ annotation
GAPIT.Kinship.pdf                    Kinship heatmap
GAPIT.PCA.pdf                        2D PCA scatter

Parameter reference

All R GAPIT parameters are supported with underscores replacing dots:

R parameter    Python parameter  Default     Description
model          model             "BLINK"     Model(s) to run
PCA.total      PCA_total         3           Number of PCs as covariates
maf.threshold  maf_threshold     0.05        Minimum MAF filter
SNP.impute     SNP_impute        "middle"    Missing genotype imputation
file.output    file_output       True        Write result files
cutOff         cutOff            Bonferroni  Significance threshold
LD             LD                0.7         LD threshold for BLINK pruning
group.from     group_from        1           Min groups for CMLM
group.to       group_to          n           Max groups for CMLM
bin.size       bin_size          5000000     Bin size (bp) for FarmCPU
h2             h2                None        Heritability for simulation
NQTN           NQTN              None        QTNs for simulation
buspred        buspred           False       Run GS after GWAS
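
When porting an existing R script, the dot-to-underscore rule can be applied mechanically. The helper below is a hypothetical convenience, not a pyGAPIT function; it only renames keys, so R values such as TRUE/FALSE still need converting to Python's True/False by hand.

```python
def r_to_python_kwargs(r_kwargs):
    """Rename R-style GAPIT arguments (dots) to Python-style (underscores)."""
    return {name.replace(".", "_"): value for name, value in r_kwargs.items()}


r_call = {"model": "BLINK", "PCA.total": 3, "maf.threshold": 0.05, "cutOff": 0.01}
print(r_to_python_kwargs(r_call))
# → {'model': 'BLINK', 'PCA_total': 3, 'maf_threshold': 0.05, 'cutOff': 0.01}
```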

Command-line interface

# Basic GWAS
pygapit --Y traits.txt --GD geno.txt --GM map.txt --model BLINK

# Multiple models, custom output directory
pygapit --Y traits.txt --GD geno.txt --GM map.txt \
        --model GLM MLM BLINK FarmCPU \
        --PCA_total 5 --output_dir results/

# Genomic prediction
pygapit --Y traits.txt --GD geno.txt --GM map.txt --model gBLUP

# Phenotype simulation
pygapit --Y traits.txt --GD geno.txt --GM map.txt \
        --model BLINK --h2 0.7 --NQTN 20
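
For readers unfamiliar with what h2 and NQTN control, the following sketch shows one common way to simulate a phenotype with a target heritability from a genotype matrix. It is an assumption about the general technique, not a description of pyGAPIT's simulator: the function name `simulate_phenotype`, the normally distributed QTN effects, and the rescaling of residuals are all illustrative choices.

```python
import numpy as np


def simulate_phenotype(gd, h2, n_qtn, seed=0):
    """Simulate y = g + e with var(g) / (var(g) + var(e)) equal to h2."""
    rng = np.random.default_rng(seed)
    n, m = gd.shape
    qtn_idx = rng.choice(m, size=n_qtn, replace=False)   # causal markers
    effects = rng.normal(size=n_qtn)                     # additive QTN effects
    g = gd[:, qtn_idx] @ effects                         # genetic values
    e = rng.normal(size=n)
    # Rescale residuals so the sample variances give exactly the target h2.
    target_var_e = g.var() * (1.0 - h2) / h2
    e = (e - e.mean()) * np.sqrt(target_var_e / e.var())
    return g + e, g, qtn_idx


rng = np.random.default_rng(1)
gd = rng.integers(0, 3, size=(200, 500)).astype(float)   # toy 0/1/2 dosages
y, g, qtn_idx = simulate_phenotype(gd, h2=0.7, n_qtn=20)
e = y - g
print(f"realized h2 = {g.var() / (g.var() + e.var()):.3f}")
```

Running a GWAS on a phenotype simulated this way gives a known set of true QTNs (`qtn_idx`) against which detected hits can be compared.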

Using individual functions

from pygapit import (
    vanraden_kinship, compute_pca, build_covariate_matrix,
    emma_remle, bonferroni_threshold, genomic_inflation_factor,
    glm_gwas, mlm_gwas, blink_gwas, farmcpu_gwas,
    gblup, manhattan_plot, qq_plot,
)
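import numpy as np

# Inputs as NumPy arrays, taken here from the Quick start DataFrames
# (assumes Y and GD rows share the same taxa order and contain no missing values)
y        = Y["EarHT"].to_numpy(dtype=float)
GD_array = GD.iloc[:, 1:].to_numpy(dtype=float)   # drop the taxa column
snp_names, chromosomes, positions = GM["SNP"], GM["Chromosome"], GM["Position"]
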
# Compute kinship
K = vanraden_kinship(GD_array)   # (n, n) VanRaden matrix

# PCA for structure control
pca = compute_pca(GD_array, n_components=3)
X0  = build_covariate_matrix(pca, n_pcs=3)

# REML variance components
remle = emma_remle(y, X0, K)
print(f"h² = {remle.h2:.3f}")

# Run BLINK GWAS
result = blink_gwas(y, X0, GD_array, max_iterations=10, ld_threshold=0.7)
lam = genomic_inflation_factor(result.p_values)
thresh = bonferroni_threshold(len(result.p_values))
sig = (result.p_values <= thresh).sum()
print(f"λ = {lam:.3f},  {sig} significant SNPs")

# Genomic prediction (in-sample fit; use cross-validation to estimate accuracy)
gs = gblup(y, X0, K)
print(f"In-sample correlation (r): {np.corrcoef(y, gs.prediction)[0,1]:.3f}")

# Plots
manhattan_plot(snp_names, chromosomes, positions, result.p_values,
               save_path="manhattan.pdf")
qq_plot(result.p_values, save_path="qq.pdf")
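
The two summary statistics used above have simple textbook definitions, sketched here with the standard library only. This is an illustration of those definitions, not pyGAPIT's code; the names `p_to_chi2`, `inflation_factor`, and `bonferroni` are hypothetical, and p-values are assumed to be strictly between 0 and 1.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def p_to_chi2(p):
    """Convert a two-sided p-value to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)      # lower-tail quantile; sign drops out
    return z * z


def inflation_factor(p_values):
    """lambda = median observed chi-square / median of the 1-df null."""
    expected_median = p_to_chi2(0.5)      # about 0.455
    return median(p_to_chi2(p) for p in p_values) / expected_median


def bonferroni(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Evenly spread p-values mimic a well-calibrated null, so lambda is close to 1.
m = 999
null_p = [(i + 1) / (m + 1) for i in range(m)]
print(f"lambda = {inflation_factor(null_p):.3f}, threshold = {bonferroni(m):.2e}")
```

A λ well above 1 usually points to unaccounted population structure or relatedness, which is what the PCs and kinship matrix are meant to absorb.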

Mathematical models

Mixed Linear Model (MLM)

y = X·β + u + e
u ~ N(0, K·σ²g),   e ~ N(0, I·σ²e)

Variance components are estimated by REML using EMMA (Kang et al. 2008): K is spectrally decomposed, then a grid search followed by Brent's method finds the optimal δ = σ²e/σ²g. With the P3D approximation, δ is estimated once from the null model and held fixed for all m SNP tests.
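
The estimation step can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not pyGAPIT's `emma_remle`: the function name `reml_delta` is hypothetical, only the grid search is shown (the Brent refinement is omitted), and h² is reported as 1/(1+δ), which presumes K is scaled so that σ²g is directly comparable to σ²e.

```python
import numpy as np


def reml_delta(y, X, K, log_delta_grid=None):
    """Grid-search delta = sigma2_e / sigma2_g maximizing the EMMA REML likelihood."""
    if log_delta_grid is None:
        log_delta_grid = np.linspace(-10.0, 10.0, 401)
    n, q = X.shape
    # S projects onto the space orthogonal to the fixed effects X.
    S = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    # Eigen-decompose S(K + I)S; its n - q informative eigenvalues are lambda + 1.
    evals, evecs = np.linalg.eigh(S @ (K + np.eye(n)) @ S)
    keep = np.argsort(evals)[q:]               # drop the q (near-)zero eigenvalues
    lam = evals[keep] - 1.0
    eta2 = (evecs[:, keep].T @ y) ** 2         # squared rotated phenotypes
    df = n - q

    def reml_loglik(delta):
        scaled = lam + delta
        return 0.5 * (df * (np.log(df / (2.0 * np.pi)) - 1.0
                            - np.log(np.sum(eta2 / scaled)))
                      - np.sum(np.log(scaled)))

    deltas = np.exp(log_delta_grid)
    logliks = np.array([reml_loglik(d) for d in deltas])
    best = deltas[np.argmax(logliks)]
    return best, 1.0 / (1.0 + best)            # (delta, h2 = s2g / (s2g + s2e))


# Toy data simulated with sigma2_g = sigma2_e = 1, i.e. a true delta of 1.
rng = np.random.default_rng(0)
n, m = 150, 300
G = rng.integers(0, 3, size=(n, m)).astype(float)
Z = G - G.mean(axis=0)
K = Z @ Z.T / m                                # toy kinship for this example only
X = np.ones((n, 1))                            # intercept-only fixed effects
y = Z @ rng.normal(size=m) / np.sqrt(m) + rng.normal(size=n)
delta_hat, h2_hat = reml_delta(y, X, K)
print(f"delta = {delta_hat:.3f}, h2 = {h2_hat:.3f}")
```

Because the eigen-decomposition is done once, each likelihood evaluation costs only O(n), which is what makes a dense grid (and P3D reuse of δ across SNPs) cheap.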

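VanRaden Kinship (2008)

K = ZZ' / [2 · Σⱼ pⱼ(1-pⱼ)]
Z = GD - 1 - P       (centered 0/1/2 coding; equivalently Z = GD - 2p)
P = 2(p - 0.5)       (column j holds 2(pⱼ - 0.5) for every individual)
p = allele frequencies of the counted allele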
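
As a worked instance of the formula, the NumPy sketch below computes K for a three-individual toy matrix. It is a minimal illustration, not pyGAPIT's `vanraden_kinship`; the function name `vanraden` is hypothetical, allele frequencies are estimated from the data itself, and missing genotypes are assumed to have been imputed already.

```python
import numpy as np


def vanraden(gd):
    """Kinship from an (n, m) matrix of 0/1/2 dosages, following the formula above."""
    p = gd.mean(axis=0) / 2.0            # frequency of the counted allele per SNP
    Z = gd - 2.0 * p                     # same as (gd - 1) - 2(p - 0.5)
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


gd = np.array([[0, 2, 1],
               [2, 0, 1],
               [1, 1, 2]], dtype=float)
K = vanraden(gd)
print(np.round(K, 3))
```

Since Z is centered column by column, every row of K sums to zero; the diagonal entries estimate 1 + F (inbreeding) for each individual.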

BLINK iteration

Loop until convergence:
  1. GLM-1: sort markers by p-value
             LD-prune candidates (r² > threshold)
             select cofactors by BIC minimization
  2. GLM-2: test all m markers with cofactor set as fixed effects
             → updated p-values

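BIC = -2·logL + k·log(n), where k is the number of fitted parameters and n the number of individuals. BIC-based cofactor selection replaces the more expensive REML step used in FarmCPU.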
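
The cofactor-selection part of the loop (LD pruning followed by BIC minimization) can be sketched as below. This is an approximation for illustration, not pyGAPIT's `blink_gwas`: the helper names are hypothetical, the marker ordering is supplied by hand rather than from GLM p-values, BIC is evaluated on nested prefixes of the pruned list (an assumption about how candidate sets are formed), and the Gaussian BIC drops additive constants.

```python
import numpy as np


def ld_prune(gd, order, r2_threshold=0.7):
    """Greedily keep markers (in `order`) whose r^2 with every kept marker <= threshold."""
    kept = []
    for j in order:
        r2 = [np.corrcoef(gd[:, j], gd[:, k])[0, 1] ** 2 for k in kept]
        if all(v <= r2_threshold for v in r2):
            kept.append(int(j))
    return kept


def bic(y, X):
    """Gaussian linear-model BIC: n*log(RSS/n) + k*log(n), dropping constants."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def select_cofactors(y, X0, gd, candidates):
    """Pick the prefix of `candidates` (as extra fixed effects) with the lowest BIC."""
    best_set, best_bic = [], bic(y, X0)
    for size in range(1, len(candidates) + 1):
        subset = candidates[:size]
        score = bic(y, np.column_stack([X0, gd[:, subset]]))
        if score < best_bic:
            best_set, best_bic = subset, score
    return best_set


rng = np.random.default_rng(0)
n = 200
gd = rng.integers(0, 3, size=(n, 6)).astype(float)
gd[:, 1] = gd[:, 0]                        # marker 1 is in perfect LD with marker 0
y = 2.0 * gd[:, 0] + rng.normal(size=n)    # marker 0 is the only causal marker
X0 = np.ones((n, 1))                       # intercept; PCs would be added here
order = [0, 1, 2, 3, 4, 5]                 # stands in for markers sorted by p-value
candidates = ld_prune(gd, order)
cofactors = select_cofactors(y, X0, gd, candidates)
print("after pruning:", candidates, "cofactors:", cofactors)
```

The log(n) penalty is what stops the cofactor set from growing: each added marker must reduce the residual sum of squares enough to pay for itself.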

Henderson's MME (gBLUP)

[X'X        X'Z        ] [β]   [X'y]
[Z'X   Z'Z + δ·K⁻¹     ] [u] = [Z'y]

BLUP = û,   BLUE = X·β̂
PEV  = diag(C⁻¹)ᵤᵤ · σ²e     (C = coefficient matrix above, δ = σ²e/σ²g)
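
To show that the block system above gives the usual BLUE and BLUP, the sketch below solves it directly and compares the result with the closed-form GLS solution. It is a small illustration rather than pyGAPIT's `gblup`: the function name `solve_mme` is hypothetical, Z is taken to be the identity (one record per individual), K is assumed invertible, and the returned diagonal of the random-effect block of C⁻¹ is left unscaled.

```python
import numpy as np


def solve_mme(y, X, K, delta):
    """Solve Henderson's mixed model equations with Z = I."""
    n, q = X.shape
    Z = np.eye(n)
    K_inv = np.linalg.inv(K)
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + delta * K_inv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(C, rhs)
    beta_hat, u_hat = sol[:q], sol[q:]
    c_uu = np.diag(np.linalg.inv(C)[q:, q:])   # unscaled diagonal used for PEV
    return beta_hat, u_hat, c_uu


rng = np.random.default_rng(0)
n = 8
A = rng.normal(size=(n, 20))
K = A @ A.T / 20 + 0.01 * np.eye(n)            # positive definite, so K^-1 exists
X = np.ones((n, 1))
y = rng.normal(size=n)
delta = 1.5
beta_hat, u_hat, c_uu = solve_mme(y, X, K, delta)

# Closed-form check: GLS for beta, then u = K (K + delta*I)^-1 (y - X beta).
V_inv = np.linalg.inv(K + delta * np.eye(n))
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
u_direct = K @ V_inv @ (y - X @ beta_gls)
print(np.allclose(beta_hat, beta_gls), np.allclose(u_hat, u_direct))
```

The MME route avoids inverting the n × n phenotypic covariance, at the cost of needing K⁻¹; which is cheaper depends on the structure of K.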

Citation

If you use pyGAPIT, please also cite the original GAPIT papers:


License

GPL-3.0 — consistent with original R GAPIT license.
