Genomic Selection Model Benchmarking CLI for Plant Breeding

gsbench

Requires Python 3.9+. License: MIT.

With a single command, gsbench cross-validates genomic selection models on your genotype/phenotype data and produces a comparison report with prediction accuracy, bias diagnostics, and plots.

Installation

pip install gsbench

With gradient-boosting models (XGBoost, LightGBM):

pip install gsbench[full]

From source, for development:

git clone https://github.com/josh45-source/gsbench.git
cd gsbench
pip install -e ".[dev]"

Quick Start

gsbench ships with a small simulated example dataset (100 samples x 500 markers, two traits) so you can try it immediately:

# Copy the example genotype/phenotype files to the current directory
gsbench example

# Benchmark models on the example data
gsbench run example_geno.csv example_pheno.csv --trait yield --models GBLUP,BRR,RF --folds 5

This writes gsbench_output/report.html with the full comparison report, gsbench_output/summary.csv, and diagnostic plots under gsbench_output/plots/.
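If the --folds option in the command above is unfamiliar, the following minimal sketch shows what 5-fold cross-validation does to the 100 example samples. It is an illustration only: the helper name kfold_indices is hypothetical, and gsbench's own splitting code may shuffle or partition differently.

```python
import random

def kfold_indices(n_samples, n_folds=5, seed=42):
    """Yield (train, test) index lists for one shuffled k-fold pass.

    Hypothetical helper for illustration; not part of gsbench.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [order[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test

# 100 samples, matching the size of the bundled example dataset.
splits = list(kfold_indices(100, n_folds=5, seed=42))
for k, (train, test) in enumerate(splits):
    print(f"fold {k}: {len(train)} train, {len(test)} test")
```

Every sample lands in exactly one test fold, so each phenotype is predicted once by a model that never saw it during training.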

CLI Reference

gsbench run

gsbench run GENO PHENO --trait TRAIT [OPTIONS]
Argument / Option  Default         Description
GENO               -               Path to the genotype file (CSV/TSV, HapMap, or numeric matrix; format auto-detected)
PHENO              -               Path to the phenotype file (CSV/TSV, first column = sample IDs)
--trait            -               Phenotype column to benchmark against (required)
--models           all             all, or a comma-separated list of abbreviations, e.g. GBLUP,BRR,RF
--folds            5               Number of cross-validation folds
--repeats          1               Number of times to repeat k-fold CV (uses RepeatedKFold when > 1)
--maf              0.05            Minimum minor allele frequency; markers below this are dropped
--max-missing      0.2             Maximum per-marker missingness fraction before a marker is dropped
--impute           mean            Missing-genotype imputation: mean or median
--scale            center          Genotype scaling: center, standardize, or none
--output           gsbench_output  Output directory for the report, summary CSV, and plots
--seed             42              Random seed for cross-validation splits
--format           auto            Genotype format override: auto, csv, tsv, hapmap, numeric
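To make the --max-missing, --maf, --impute, and --scale options concrete, here is a rough NumPy sketch of that kind of marker preprocessing at the default settings. It assumes diploid 0/1/2 dosage coding with NaN for missing calls; the function name filter_and_prepare and the order of the steps are assumptions for illustration, not gsbench's actual implementation.

```python
import numpy as np

def filter_and_prepare(geno, maf=0.05, max_missing=0.2):
    """Filter, mean-impute, and center a samples x markers dosage matrix.

    Hypothetical helper; assumes 0/1/2 dosages with np.nan for missing calls.
    Returns the prepared matrix and the original indices of the kept markers.
    """
    geno = np.asarray(geno, dtype=float)
    kept = np.arange(geno.shape[1])

    # 1. Drop markers whose missing fraction exceeds max_missing.
    ok = np.isnan(geno).mean(axis=0) <= max_missing
    geno, kept = geno[:, ok], kept[ok]

    # 2. Drop markers below the MAF threshold (allele frequency = mean dosage / 2).
    freq = np.nanmean(geno, axis=0) / 2.0
    ok = np.minimum(freq, 1.0 - freq) >= maf
    geno, kept = geno[:, ok], kept[ok]

    # 3. Mean imputation: replace each missing call with its marker mean.
    col_mean = np.nanmean(geno, axis=0)
    geno = np.where(np.isnan(geno), col_mean, geno)

    # 4. Centering: subtract the marker mean (the "center" scaling mode).
    return geno - geno.mean(axis=0), kept

nan = np.nan
geno = np.array([
    [0, 2, 0, nan, 1],
    [1, 2, 0, nan, 1],
    [2, 2, 0, 1,   nan],
    [1, 2, 1, nan, 0],
    [0, 2, 0, 2,   2],
])
X, kept = filter_and_prepare(geno)
print("kept marker columns:", kept.tolist())
print("prepared matrix shape:", X.shape)
```

In this toy matrix, column 1 is monomorphic (MAF of 0) and column 3 is 60% missing, so both fall outside the default thresholds.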

gsbench list-models

Prints a table of all registered models and whether their dependencies are installed.

gsbench example

gsbench example [--output DIR]

Copies the bundled example genotype/phenotype CSVs into DIR (defaults to the current directory) and prints the gsbench run command to benchmark them.

Models

Abbreviation  Model                      Notes
GBLUP         Genomic BLUP               Kernel ridge regression on the genomic relationship matrix G = ZZ'/p
BRR           Bayesian Ridge Regression  sklearn.linear_model.BayesianRidge on marker dosages
BL            Bayesian LASSO             sklearn.linear_model.ARDRegression, a sparse approximation of BayesB/BayesC
RKHS          RKHS (Gaussian Kernel)     Kernel ridge regression with an RBF kernel; bandwidth chosen by internal CV
RF            Random Forest              sklearn.ensemble.RandomForestRegressor (500 trees)
XGB           XGBoost                    Requires pip install gsbench[full]
LGBM          LightGBM                   Requires pip install gsbench[full]
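The GBLUP entry describes kernel ridge regression on G = ZZ'/p. A minimal NumPy sketch of that idea follows; it is not gsbench's code. The function name gblup_predict, the fixed penalty lam, and the simulated data are all assumptions for illustration (in practice the penalty would be estimated or tuned).

```python
import numpy as np

def gblup_predict(Z_train, y_train, Z_test, lam=1.0):
    """Kernel ridge prediction with the linear kernel G = ZZ'/p.

    Hypothetical sketch; lam is a fixed ridge penalty chosen for illustration.
    """
    Z_train = np.asarray(Z_train, dtype=float)
    Z_test = np.asarray(Z_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    p = Z_train.shape[1]

    # Center markers with training-set means so no test information leaks in.
    col_mean = Z_train.mean(axis=0)
    Zc_train = Z_train - col_mean
    Zc_test = Z_test - col_mean

    # Genomic relationship matrix among training samples, and test-vs-train block.
    G = Zc_train @ Zc_train.T / p
    G_test = Zc_test @ Zc_train.T / p

    # Solve (G + lam * I) alpha = y - mean(y), then predict for the test samples.
    mu = y_train.mean()
    alpha = np.linalg.solve(G + lam * np.eye(len(y_train)), y_train - mu)
    return mu + G_test @ alpha

# Simulated data: 60 samples x 200 markers with small additive effects.
rng = np.random.default_rng(0)
Z = rng.integers(0, 3, size=(60, 200)).astype(float)
effects = rng.normal(scale=0.1, size=200)
y = Z @ effects + rng.normal(scale=0.5, size=60)

pred = gblup_predict(Z[:50], y[:50], Z[50:], lam=1.0)
print("correlation on held-out samples:", np.corrcoef(pred, y[50:])[0, 1])
```

With this linear kernel the predictions are identical to ridge regression on the centered markers with penalty p * lam, which is why GBLUP and ridge-type marker models often perform similarly.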

Every model implements the same two-method interface (fit / predict), so adding a new one is a matter of subclassing gsbench.models.base.GSModel.
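As a rough illustration of the fit / predict pattern, the sketch below subclasses a local stand-in for the base class so that it runs without gsbench installed. The method signatures fit(X, y) and predict(X) are assumptions, as is the MeanBaseline model itself; check gsbench.models.base.GSModel for the real interface and for how models are registered.

```python
from abc import ABC, abstractmethod

class GSModel(ABC):
    """Local stand-in for gsbench.models.base.GSModel (the real class may differ)."""

    @abstractmethod
    def fit(self, X, y):
        """Train on a samples x markers matrix X and phenotype values y."""

    @abstractmethod
    def predict(self, X):
        """Return one predicted phenotype per row of X."""

class MeanBaseline(GSModel):
    """Hypothetical model: predicts the training mean for every sample.

    Useful as a floor that any real genomic selection model should beat.
    """

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)

# Three training samples with two markers each, then two new samples.
model = MeanBaseline().fit([[0, 1], [2, 1], [1, 1]], [1.0, 3.0, 2.0])
preds = model.predict([[1, 0], [0, 2]])
print(preds)
```

With the real package installed, the stand-in class would be replaced by an import of GSModel from gsbench.models.base.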

Metrics

Each fold reports r (Pearson correlation), r2, rmse, mae, bias, slope (regression of observed on predicted; should be ~1), spearman (rank correlation), and nrmse. Breeders care most about r (prediction accuracy) and spearman (whether the model ranks genotypes correctly for selection).
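For readers who want to see how several of these per-fold metrics relate, here is a standard-library sketch. The function names are hypothetical and the sign convention for bias (mean of predicted minus observed) is an assumption; r2 and nrmse are left out because their definitions vary between tools and the text above does not specify which ones gsbench uses.

```python
import math

def _mean(v):
    return sum(v) / len(v)

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def fold_metrics(obs, pred):
    """Hypothetical helper: a subset of the metrics listed above for one fold."""
    err = [p - o for o, p in zip(obs, pred)]
    mo, mp = _mean(obs), _mean(pred)
    # Slope of the regression of observed on predicted: cov(obs, pred) / var(pred).
    slope = (sum((p - mp) * (o - mo) for o, p in zip(obs, pred))
             / sum((p - mp) ** 2 for p in pred))
    return {
        "r": pearson(obs, pred),
        "spearman": pearson(ranks(obs), ranks(pred)),
        "rmse": math.sqrt(_mean([e * e for e in err])),
        "mae": _mean([abs(e) for e in err]),
        "bias": _mean(err),  # assumed convention: predicted minus observed
        "slope": slope,
    }

obs = [4.1, 5.0, 6.2, 5.5, 7.0]
pred = [4.5, 4.8, 5.9, 5.6, 6.4]
metrics = fold_metrics(obs, pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A slope above 1 suggests the predictions are too compressed relative to the observed values, and a slope below 1 suggests they are too spread out.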

Example Output

Model comparison (prediction accuracy per model, with fold-to-fold error bars):

[Figure: model comparison barplot]

Predicted vs. observed phenotypes per model:

[Figure: predicted vs. observed]

The full HTML report also includes a boxplot of per-fold accuracy, a bias diagnostic, a runtime comparison, and per-model detail tables.

Companion Tools

gsbench is part of a small plant-breeding data pipeline:

  • brapiR2 — pull data from BrAPI servers
  • phenoQC — QC for phenotypic trial data
  • vcf2dosage — VCF to dosage matrix conversion
  • gsbench — benchmark genomic selection models

Pipeline: retrieve → clean → prepare genotypes → benchmark models

License

MIT
