Subgenome-aware scalable LMM + DL prior conditional lift for allopolyploid GWAS

HomoeoGWAS

Subgenome-aware mixed-model GWAS for allopolyploid crops, with an optional zero-shot deep-learning prior.

HomoeoGWAS runs GWAS on allopolyploid crops (wheat, cotton, rapeseed, oat, strawberry, …) by modelling each subgenome explicitly. A new species is added through a single YAML config, with no changes to framework code. The only requirement is that the subgenomes are distinguishable, either through homoeologous chromosome naming or through an explicit chrom_map; the optional deep-learning prior additionally needs a reference FASTA.

It combines:

Subgenome-partitioned linear mixed model — y = Xβ + u_A + u_B [+ u_D …] + ε, with a per-subgenome GRM fit by REML and an optional leave-one-chromosome-out (LOCO) correction.
Optional homoeolog interaction kernel K_hom = K_A ⊙ K_B [⊙ K_D] for cross-subgenome epistasis.
Optional zero-shot deep-learning prior — PlantCaduceus + AgroNT log-likelihoods fused with the GWAS p-value to re-rank candidate loci.
CPU and dual-GPU backends for the per-SNP scan, scaling to tens of millions of markers.
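
To make the kernel terms above concrete, here is a minimal NumPy sketch of a per-subgenome GRM, a leave-one-chromosome-out GRM, and a Hadamard homoeolog kernel. It is an illustration under stated assumptions, not the code in grm.py or kernel.py: the function names are hypothetical, the GRM uses the common VanRaden scaling on 0/1/2 dosages, and "pairwise-mean" is read here as the average of all pairwise Hadamard products.

```python
import numpy as np


def vanraden_grm(dosages: np.ndarray) -> np.ndarray:
    """VanRaden-style GRM from an (individuals x markers) matrix of 0/1/2 dosages."""
    freqs = dosages.mean(axis=0) / 2.0            # per-marker allele frequency
    centred = dosages - 2.0 * freqs               # centre every marker column
    scale = 2.0 * np.sum(freqs * (1.0 - freqs))   # summed expected marker variance
    return centred @ centred.T / scale


def loco_grm(dosages: np.ndarray, chroms, held_out: str) -> np.ndarray:
    """GRM built from every marker except those on the held-out chromosome."""
    keep = np.asarray(chroms) != held_out
    return vanraden_grm(dosages[:, keep])


def homoeolog_kernel(kernels) -> np.ndarray:
    """Element-wise (Hadamard) product for 2-3 kernels.

    For 4+ kernels this returns the mean of all pairwise products, which is
    an assumed reading of "pairwise-mean", not a confirmed definition.
    """
    n = len(kernels)
    if n < 2:
        raise ValueError("need at least two subgenome kernels")
    if n <= 3:
        out = kernels[0].copy()
        for k in kernels[1:]:
            out = out * k
        return out
    pairs = [kernels[i] * kernels[j] for i in range(n) for j in range(i + 1, n)]
    return np.mean(pairs, axis=0)


# Toy data: 6 individuals, 40 markers on subgenome A and 50 on subgenome B.
rng = np.random.default_rng(0)
geno_a = rng.integers(0, 3, size=(6, 40)).astype(float)
geno_b = rng.integers(0, 3, size=(6, 50)).astype(float)
chroms_a = ["1A"] * 20 + ["2A"] * 20              # chromosome label per A marker

k_a = vanraden_grm(geno_a)
k_b = vanraden_grm(geno_b)
k_a_loco = loco_grm(geno_a, chroms_a, held_out="1A")
k_hom = homoeolog_kernel([k_a, k_b])
```

Because each input GRM is positive semi-definite, their Hadamard product is too (Schur product theorem), so K_hom can enter the mixed model as one more variance component alongside K_A and K_B.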

Quick start

# 1. Install (CPU)
pip install homoeogwas            # or: pip install -e ".[dev]" from a checkout

# 2. Verify the install end-to-end (~2 s): synthesise a tiny dataset + run a fit
homoeogwas demo --keep            # prints acceptance checks + lists the outputs

# 3. Run on your own data
homoeogwas validate -c my_run.yaml    # check config + input paths first
homoeogwas fit -c my_run.yaml -o results/my_run

# (Optional) GPU extras for the per-SNP scan + DL prior
pip install "homoeogwas[gpu]"

See examples/minimal/ for the demo dataset and an annotated config, and the I/O contract for input and output formats. The CLI subcommands are fit, validate, demo, split, and interact.

Containers

# Docker — CPU (bundles plink2 + bcftools, so split/VCF -> fit all work)
docker build -t homoeogwas:cpu .
docker run --rm homoeogwas:cpu demo
docker run --rm -v "$PWD":/work -w /work homoeogwas:cpu fit -c run.yaml

# Docker — GPU (per-SNP scan + DL prior; CUDA 12.1)
docker build -f Dockerfile.gpu -t homoeogwas:gpu .
docker run --rm --gpus all -v "$PWD":/work -w /work homoeogwas:gpu fit -c run.yaml --backend gpu

# Apptainer / Singularity (HPC, no root) — convert the Docker image
apptainer build homoeogwas.sif docker-daemon://homoeogwas:cpu
apptainer run homoeogwas.sif demo

Pass --build-arg PIP_INDEX_URL=<mirror> to build through a faster pip mirror.

How it works

flowchart LR
    G["VCF / PLINK genotypes"] -->|split| S["per-subgenome<br/>genotype sets"]
    S --> K["per-subgenome GRM<br/>(+ optional K_hom)"]
    K --> R["multi-kernel REML<br/>mixed model"]
    R --> SC["per-SNP scan<br/>(CPU / GPU, LOCO)"]
    SC --> O["sumstats + QQ +<br/>Manhattan + &lambda;_GC"]
    SC -. optional .-> D["DL-prior<br/>re-ranking"]
    D --> RR["re-ranked candidates"]
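
The λ_GC value in the last stage of the flowchart is conventionally the genomic inflation factor: the median observed 1-df chi-square statistic divided by its expected median under the null. The sketch below computes it from p-values using only the standard library. It assumes that conventional definition; diagnostics.py may differ in detail, and the function name is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def lambda_gc(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1]."""
    # A 1-df chi-square statistic is a squared standard normal, so the
    # statistic with upper-tail probability p is inv_cdf(p / 2) squared.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated null scan.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring them shifts mass toward small p-values, mimicking inflation.
inflated_p = [p * p for p in null_p]

lam_null = lambda_gc(null_p)
lam_inflated = lambda_gc(inflated_p)
```

A value near 1 indicates a calibrated scan; values well above 1 suggest residual population structure or relatedness that the mixed model has not absorbed.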

Adding a new species

Any allopolyploid is supported through configuration alone:

Copy an existing configs/species/*.yaml and edit subgenomes, the chromosome naming / chrom_map, the reference assembly path, and ploidy. The schema in src/homoeogwas/species_config.py validates it.
homoeogwas split --species <yaml> --vcf <in.vcf.gz> --out-dir ... splits the markers into per-subgenome genotype sets. K_hom auto-selects its form for the subgenome count (full Hadamard for 2–3; pairwise-mean for 4+ to stay full-rank).
homoeogwas fit --config <run.yaml> runs the mixed-model scan; the optional DL-prior step additionally needs the species reference FASTA.

No Python is edited at any step. Diploids can run the mixed model, but the homoeolog kernel K_hom is not meaningful for them.
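
The requirement that subgenomes be distinguishable can be pictured as a lookup from chromosome name to subgenome label. The sketch below is illustrative only: the function names, the naming pattern, and the shape of chrom_map are assumptions made for this example, not the schema validated by species_config.py.

```python
import re
from collections import defaultdict


def assign_subgenome(chrom, subgenomes, chrom_map=None):
    """Return the subgenome label for a chromosome name, or None if unknown."""
    # An explicit map (hypothetical shape: chromosome name -> label) wins.
    if chrom_map and chrom in chrom_map:
        return chrom_map[chrom]
    # Otherwise fall back to homoeologous naming such as "chr1A" or "7D".
    match = re.fullmatch(r"(?:chr)?(\d+)([A-Za-z])", chrom)
    if match and match.group(2).upper() in subgenomes:
        return match.group(2).upper()
    return None


def split_markers(marker_chroms, subgenomes, chrom_map=None):
    """Group marker indices by subgenome; fail loudly on unassigned markers."""
    groups = defaultdict(list)
    for idx, chrom in enumerate(marker_chroms):
        sub = assign_subgenome(chrom, subgenomes, chrom_map)
        if sub is None:
            raise ValueError(f"cannot assign {chrom!r} to a subgenome")
        groups[sub].append(idx)
    return dict(groups)


# Wheat-style names carry the subgenome letter directly.
wheat = split_markers(["chr1A", "chr1B", "chr1D", "chr2A"], {"A", "B", "D"})

# Names that do not match the pattern need an explicit map (hypothetical here).
cotton_map = {"A01": "A", "A02": "A", "D01": "D"}
cotton = split_markers(["A01", "D01", "A02"], {"A", "D"}, chrom_map=cotton_map)
```

Raising on an unassigned chromosome, rather than silently dropping its markers, keeps a misnamed scaffold from quietly shrinking one subgenome's marker set.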

Tested species

The framework has been run end-to-end on five crops spanning ploidy 4n–8n through the same code path; this list is illustrative, not a limit on supported species.

Species	Subgenomes	Reference assembly
Wheat (Triticum aestivum)	AABBDD (6n)	IWGSC RefSeq v1.0
Cotton (Gossypium hirsutum)	AADD (4n)	HBAU NDM8
Rapeseed (Brassica napus)	AACC (4n)	Darmor v4.1
Oat (Avena sativa)	AACCDD (6n)	OT3098 v2
Strawberry (Fragaria × ananassa)	octoploid (4 subgenomes)	NIHHS Seolhyang

Package layout

src/homoeogwas/
├── species_config.py   # config schema (pydantic)
├── species_split.py    # VCF -> per-subgenome genotype splitter
├── grm.py              # per-subgenome and LOCO GRMs
├── kernel.py           # K_pool (additive) and K_hom (homoeolog) kernels
├── lmm.py              # multi-kernel REML mixed model
├── gp.py               # GBLUP prediction + cross-validation
├── scan.py             # per-SNP scan (CPU + dual-GPU, LOCO)
├── diagnostics.py      # lambda_GC, QQ, retained-fraction checks
├── calibration.py      # null-simulation type-I error
├── sim.py              # power-vs-FDR simulation
├── interact.py         # homoeolog-pair interaction scan
├── cli.py              # command-line interface
└── io.py               # genotype I/O

Testing

pytest -m "not gpu and not slow"   # CPU suite (~3-5 min): 287 passed + 1 skipped
pytest -m "not slow"               # + GPU tests (needs torch)
pytest                             # full suite incl. simulation benchmarks

CI runs ruff + the CPU test suite on Python 3.10 / 3.11 / 3.12.

Reproducing the paper

The analysis code, configs, and figure pipeline for the manuscript live under reproducibility/. Large inputs (data/) and intermediate outputs (results/) are not tracked; see reproducibility/paper/ for how to fetch the public datasets and regenerate the figures, and reproducibility/paper/scripts/reproduce_baselines.sh to clone the external benchmark tools.

Status

This is research software released alongside a manuscript in preparation (target Nature Communications). The package and its tests are stable; the biological associations in the paper are the subject of that manuscript and should be cited from it once published.

Citation

@unpublished{homoeogwas2026,
  title  = {HomoeoGWAS: subgenome-aware mixed-model GWAS for allopolyploid crops},
  author = {Yang, Shipeng},
  year   = {2026},
  note   = {Manuscript in preparation},
  url    = {https://github.com/Shipeng-Yang/HomoeoGWAS},
}

See CITATION.cff for machine-readable metadata.

License

MIT — see LICENSE.

This version

1.0.1

Jun 12, 2026

