Genomic Selection Model Benchmarking CLI for Plant Breeding
gsbench
gsbench cross-validates genomic selection models on your genotype/phenotype data and produces a comparison report with prediction accuracy, bias diagnostics, and plots — from a single command.
Installation
pip install gsbench
With gradient-boosting models (XGBoost, LightGBM):
pip install gsbench[full]
From source, for development:
git clone https://github.com/josh45-source/gsbench.git
cd gsbench
pip install -e ".[dev]"
Quick Start
gsbench ships with a small simulated example dataset (100 samples x 500 markers, two traits) so you can try it immediately:
# Copy the example genotype/phenotype files to the current directory
gsbench example
# Benchmark models on the example data
gsbench run example_geno.csv example_pheno.csv --trait yield --models GBLUP,BRR,RF --folds 5
This writes gsbench_output/report.html with the full comparison report,
gsbench_output/summary.csv, and diagnostic plots under
gsbench_output/plots/.
CLI Reference
gsbench run
gsbench run GENO PHENO --trait TRAIT [OPTIONS]
| Argument / Option | Default | Description |
|---|---|---|
| GENO | - | Path to the genotype file (CSV/TSV, HapMap, or numeric matrix; format auto-detected) |
| PHENO | - | Path to the phenotype file (CSV/TSV, first column = sample IDs) |
| --trait | - | Phenotype column to benchmark against (required) |
| --models | all | all, or a comma-separated list of abbreviations, e.g. GBLUP,BRR,RF |
| --folds | 5 | Number of cross-validation folds |
| --repeats | 1 | Number of times to repeat k-fold CV (uses RepeatedKFold when > 1) |
| --maf | 0.05 | Minimum minor allele frequency; markers below this are dropped |
| --max-missing | 0.2 | Maximum per-marker missingness fraction before a marker is dropped |
| --impute | mean | Missing-genotype imputation: mean or median |
| --scale | center | Genotype scaling: center, standardize, or none |
| --output | gsbench_output | Output directory for the report, summary CSV, and plots |
| --seed | 42 | Random seed for cross-validation splits |
| --format | auto | Genotype format override: auto, csv, tsv, hapmap, numeric |
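The marker filters and transforms behind --maf, --max-missing, --impute, and --scale can be pictured with a short NumPy sketch. It is an illustration only, not gsbench's code: the function name preprocess_markers, the 0/1/2 dosage coding with NaN for missing calls, and the order of the steps are all assumptions.

```python
import numpy as np


def preprocess_markers(X, maf=0.05, max_missing=0.2, impute="mean", scale="center"):
    """Filter, impute, and scale a samples x markers dosage matrix.

    Hypothetical helper: assumes diploid 0/1/2 dosages with NaN for missing calls.
    """
    X = np.asarray(X, dtype=float)

    # Per-marker missingness: fraction of samples without a call.
    missing = np.isnan(X).mean(axis=0)
    # Drop markers above the threshold (and fully missing ones, which have no frequency).
    X = X[:, (missing <= max_missing) & (missing < 1.0)]

    # Allele frequency from observed dosages, folded to the minor allele.
    freq = np.nanmean(X, axis=0) / 2.0
    minor = np.minimum(freq, 1.0 - freq)
    X = X[:, minor >= maf]

    # Fill remaining missing calls with the per-marker mean or median.
    fill = np.nanmean(X, axis=0) if impute == "mean" else np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), fill, X)

    # Center each marker; optionally also divide by its standard deviation.
    if scale in ("center", "standardize"):
        X = X - X.mean(axis=0)
    if scale == "standardize":
        sd = X.std(axis=0)
        X = X / np.where(sd > 0, sd, 1.0)
    return X
```

With the defaults, a marker that is monomorphic in the sample has a minor allele frequency of 0 and is dropped by the MAF filter.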
gsbench list-models
Prints a table of all registered models and whether their dependencies are installed.
gsbench example
gsbench example [--output DIR]
Copies the bundled example genotype/phenotype CSVs into DIR (defaults to
the current directory) and prints the gsbench run command to benchmark
them.
Models
| Abbreviation | Model | Notes |
|---|---|---|
| GBLUP | Genomic BLUP | Kernel ridge regression on the genomic relationship matrix G = ZZ'/p |
| BRR | Bayesian Ridge Regression | sklearn.linear_model.BayesianRidge on marker dosages |
| BL | Bayesian LASSO | sklearn.linear_model.ARDRegression, a sparse approximation of BayesB/BayesC |
| RKHS | RKHS (Gaussian Kernel) | Kernel ridge regression with an RBF kernel; bandwidth chosen by internal CV |
| RF | Random Forest | sklearn.ensemble.RandomForestRegressor (500 trees) |
| XGB | XGBoost | Requires pip install gsbench[full] |
| LGBM | LightGBM | Requires pip install gsbench[full] |
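As a rough illustration of the GBLUP row, kernel ridge regression on G = ZZ'/p can be written in a few lines of NumPy. This is a sketch under stated assumptions, not gsbench's implementation: the function name gblup_predict is hypothetical, Z is assumed to be a centered samples x markers matrix, and the penalty lam is a fixed placeholder for whatever variance-ratio estimate or tuning the package actually uses.

```python
import numpy as np


def gblup_predict(Z_train, y_train, Z_test, lam=1.0):
    """Kernel ridge regression with the linear genomic kernel G = ZZ'/p.

    Hypothetical sketch: lam is a fixed ridge penalty chosen by the caller.
    """
    p = Z_train.shape[1]
    G_train = Z_train @ Z_train.T / p   # relationships among training samples
    G_cross = Z_test @ Z_train.T / p    # relationships between test and training samples
    mu = y_train.mean()
    # Solve (G + lam * I) alpha = y - mu for the dual coefficients.
    alpha = np.linalg.solve(G_train + lam * np.eye(len(y_train)), y_train - mu)
    return mu + G_cross @ alpha
```

Because the kernel is linear, these predictions equal those of ridge regression on the markers themselves with penalty lam * p, which is why GBLUP and ridge-type marker models often perform similarly.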
Every model implements the same two-method interface (fit / predict), so
adding a new one is a matter of subclassing gsbench.models.base.GSModel.
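To make the fit / predict contract concrete, here is a standalone sketch of a model with those two methods and a cross-validation loop that consumes it. Everything named here is hypothetical: RidgeModel and cross_validate are not part of gsbench, and the class deliberately does not subclass GSModel because that base class's constructor and registration requirements are not described above. The splitter choice mirrors the --folds, --repeats, and --seed options.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, RepeatedKFold


class RidgeModel:
    """Hypothetical model exposing fit / predict; not part of gsbench."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self._model = Ridge(alpha=self.alpha).fit(X, y)
        return self

    def predict(self, X):
        return self._model.predict(X)


def cross_validate(model, X, y, folds=5, repeats=1, seed=42):
    """Return one Pearson r per held-out fold."""
    if repeats > 1:
        splitter = RepeatedKFold(n_splits=folds, n_repeats=repeats, random_state=seed)
    else:
        splitter = KFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = []
    for train, test in splitter.split(X):
        # Refit on the training fold, then score on the held-out fold.
        pred = model.fit(X[train], y[train]).predict(X[test])
        scores.append(np.corrcoef(y[test], pred)[0, 1])
    return np.array(scores)
```

Calling cross_validate(RidgeModel(), X, y, folds=5, repeats=2) yields ten accuracy values, one per fold per repeat.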
Metrics
Each fold reports r (Pearson correlation), r2, rmse, mae, bias,
slope (regression of observed on predicted — should be ~1), spearman
(rank correlation), and nrmse. Breeders care most about r (prediction
accuracy) and spearman (does the model rank genotypes correctly for
selection?).
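The per-fold metrics can be sketched in NumPy as follows. The exact definitions gsbench uses are not spelled out above, so several choices here are assumptions: r2 is taken as the squared Pearson correlation, bias as the mean of predicted minus observed, and nrmse as RMSE divided by the standard deviation of the observed values. The function name fold_metrics is hypothetical, and the rank helper does not average ties.

```python
import numpy as np


def _ranks(a):
    """Ordinal ranks 0..n-1; ties are not averaged in this sketch."""
    return np.argsort(np.argsort(a)).astype(float)


def fold_metrics(obs, pred):
    """Compute accuracy and bias diagnostics for one cross-validation fold."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    r = np.corrcoef(obs, pred)[0, 1]
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "r": r,
        "r2": r ** 2,                          # assumption: squared Pearson r
        "rmse": rmse,
        "mae": np.mean(np.abs(err)),
        "bias": np.mean(err),                  # assumption: mean(predicted - observed)
        "slope": np.polyfit(pred, obs, 1)[0],  # observed regressed on predicted
        "spearman": np.corrcoef(_ranks(obs), _ranks(pred))[0, 1],
        "nrmse": rmse / np.std(obs),           # assumption: normalized by SD of observed
    }
```

A slope below 1 indicates predictions that are more spread out than the observed values, and a slope above 1 indicates predictions that are shrunk too far toward the mean.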
Example Output
Model comparison (prediction accuracy per model, with fold-to-fold error bars):
Predicted vs. observed phenotypes per model:
The full HTML report also includes a boxplot of per-fold accuracy, a bias diagnostic, a runtime comparison, and per-model detail tables.
Companion Tools
gsbench is part of a small plant-breeding data pipeline:
- brapiR2 — pull data from BrAPI servers
- phenoQC — QC for phenotypic trial data
- vcf2dosage — VCF to dosage matrix conversion
- gsbench — benchmark genomic selection models
Pipeline: retrieve → clean → prepare genotypes → benchmark models
License
MIT