GENBoostGPU
Genomic Elastic Net Boosting on GPU (GENBoostGPU)
GENBoostGPU provides a scalable framework for running elastic net regression with
boosting across thousands of CpG sites, leveraging GPU acceleration with RAPIDS cuML,
CuPy, and cuDF.
It supports SNP preprocessing, cis-window filtering, LD clumping, missing data
imputation, and phenotype integration, all optimized for large-scale epigenomics.
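To make the preprocessing steps concrete, here is a minimal CPU-only sketch of mean imputation, zero-variance filtering, and cis-window filtering. It uses only the Python standard library so it runs without a GPU. The function names, the dict-based genotype layout, and the choice of mean imputation are assumptions made for illustration and are not GENBoostGPU's API or necessarily its method.

```python
from statistics import fmean, pvariance


def impute_and_filter(genotypes):
    """Mean-impute missing calls (None) per SNP, then drop zero-variance SNPs.

    genotypes: dict mapping SNP id -> list of dosages (0/1/2 or None).
    This layout is hypothetical; the package works on GPU arrays.
    """
    cleaned = {}
    for snp_id, calls in genotypes.items():
        observed = [c for c in calls if c is not None]
        if not observed:
            continue  # no observed calls, nothing to impute from
        mean = fmean(observed)
        filled = [mean if c is None else float(c) for c in calls]
        if pvariance(filled) > 0.0:  # keep only SNPs that vary
            cleaned[snp_id] = filled
    return cleaned


def cis_window_snps(positions, start, end, window_size=0):
    """Return SNP ids located in [start - window_size, end + window_size]."""
    lo = max(0, start - window_size)
    hi = end + window_size
    return [snp for snp, pos in positions.items() if lo <= pos <= hi]


genotypes = {
    "rs1": [0, 1, 2, None],
    "rs2": [1, 1, 1, 1],       # monomorphic, will be dropped
    "rs3": [2, None, 0, 0],
}
positions = {"rs1": 120_000, "rs2": 300_000, "rs3": 900_000}

kept = impute_and_filter(genotypes)
in_window = [s for s in cis_window_snps(positions, 10_000, 510_000) if s in kept]
print(sorted(kept), in_window)
```

On the GPU the same logic would be expressed as array operations over a samples-by-SNPs matrix rather than Python loops.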
Features
- Window-based orchestration:
  - run_windows_with_dask coordinates execution across one or more GPUs using Dask.
  - Handles batch scheduling of thousands of genomic windows.
- Single-window analysis:
  - run_single_window executes boosting elastic net on one genomic region.
  - Accepts pre-loaded arrays (CuPy) or file paths (PLINK, phenotype tables).
- GPU-accelerated boosting elastic net:
  - Iterative boosting with cuML ElasticNet and final Ridge refit.
  - Early stopping based on stability of variance explained.
- Automated SNP preprocessing:
  - Zero-variance SNP filtering
  - Missing genotype imputation
  - LD clumping (PLINK-like) with CuPy
  - Cis-window SNP filtering
- Hyperparameter optimization:
  - Optuna-based tuning of ElasticNet (alpha, l1_ratio)
  - Ridge regression tuning with delayed evaluation
  - Optional manual cross-validation for custom grids
- Scalability:
  - Dask orchestration for multiple GPUs (LocalCUDACluster)
  - Single-GPU fallback for smaller jobs
- Flexible outputs:
  - SNP betas, heritability estimates, variance explained
  - Window-level summary tables (.parquet)
  - Intermediate ridge/elastic net models for reproducibility
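The feature list mentions early stopping based on stability of variance explained. The sketch below shows one way such a rule could work: stop once the last few R² values differ by less than a tolerance. The patience and tol parameters, the function names, and the toy R² curve are all hypothetical; the exact criterion GENBoostGPU uses is not described here.

```python
def plateaued(r2_history, patience=5, tol=1e-3):
    """True once the last `patience` R^2 values span less than `tol`."""
    if len(r2_history) < patience:
        return False
    recent = r2_history[-patience:]
    return max(recent) - min(recent) < tol


def boost_until_stable(r2_stream, n_iter=50, patience=5, tol=1e-3):
    """Consume per-iteration R^2 values until they plateau or n_iter is reached."""
    history = []
    for r2 in r2_stream:
        history.append(r2)
        if len(history) >= n_iter or plateaued(history, patience, tol):
            break
    return history


def toy_r2_curve(ceiling=0.34):
    """Stand-in for a boosting loop: R^2 closes half its gap to `ceiling` each step."""
    gap = ceiling
    while True:
        gap *= 0.5
        yield ceiling - gap


history = boost_until_stable(toy_r2_curve(), n_iter=50)
print(len(history), round(history[-1], 4))
```

In a real run each value in the stream would come from refitting the elastic net on the current residuals, so stopping early saves the remaining fits for that window.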
Installation
GENBoostGPU is available on PyPI.
It requires Python ≥3.10 and an NVIDIA GPU with CUDA 12.x.

    pip install genboostgpu

For development (from source):

    git clone https://github.com/heart-gen/GENBoostGPU.git
    cd GENBoostGPU
    poetry install

Usage
GENBoostGPU can be used either for large-scale orchestration (many genomic windows across one or more GPUs) or for single-window testing/debugging.
Example 1: Run a Single Window
The simplest entry point is run_single_window, which takes either:
- File paths (PLINK genotypes + phenotype file + phenotype ID), or
- Pre-loaded CuPy arrays for genotypes and phenotypes.

    from genboostgpu.vmr_runner import run_single_window

    # Fit one 500 kb window on chromosome 21 from PLINK and phenotype files.
    result = run_single_window(
        chrom=21,
        start=10_000,
        end=510_000,
        geno_path="data/chr21_subset.bed",
        pheno_path="data/phenotypes.tsv",
        pheno_id="pheno_379",
        outdir="results",
        n_iter=50,
        n_trials=10,
    )
    print(result)

Output is a Python dictionary, e.g.:

    {
        "chrom": 21,
        "start": 10000,
        "end": 510000,
        "num_snps": 742,
        "final_r2": 0.34,
        "h2_unscaled": 0.29,
        "n_iter": 37
    }

This produces:
- Window-level summary (Python dict)
- Saved results (.parquet, betas, heritability estimates) in results/
Example 2: Running on VMR Data

    REGION=caudate python examples/vmr_test_caudate.py

Script outline (examples/vmr_test_caudate.py):

    from genboostgpu.orchestration import run_windows_with_dask

    # windows and error_regions must be defined before this call;
    # their construction is not shown in this outline.
    df = run_windows_with_dask(
        windows, error_regions=error_regions,
        outdir="results", window_size=500_000,
        n_iter=100, n_trials=20, use_window=True,
        save=True, prefix="vmr"
    )

This runs boosting elastic net across all VMR-defined windows for the chosen region.
Example 3: Running on Simulated Data

    NUM_SAMPLES=100 python examples/simu_test_100n.py

Script outline (examples/simu_test_100n.py):

    from genboostgpu.orchestration import run_windows_with_dask

    # windows must be defined before this call;
    # its construction is not shown in this outline.
    df = run_windows_with_dask(
        windows, outdir="results", window_size=500_000,
        n_iter=100, n_trials=10, use_window=False,
        save=True, prefix="simu_100"
    )

This runs boosting elastic net across synthetic SNP–phenotype pairs for benchmarking.
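For readers who want to see what a synthetic SNP–phenotype pair can look like, the following standard-library sketch draws genotypes from per-SNP allele frequencies and builds a phenotype with a target heritability. It is not taken from examples/simu_test_100n.py; the simulate function, its parameters, and the generative model are assumptions chosen for illustration.

```python
import random
from statistics import pvariance


def simulate(n_samples=100, n_snps=20, n_causal=3, h2=0.3, seed=0):
    """Simulate genotypes (0/1/2) and a phenotype with roughly `h2` heritability."""
    rng = random.Random(seed)
    # Minor allele frequency per SNP.
    mafs = [rng.uniform(0.05, 0.5) for _ in range(n_snps)]
    # Each genotype is the sum of two Bernoulli(maf) allele draws.
    geno = [
        [sum(rng.random() < maf for _ in range(2)) for maf in mafs]
        for _ in range(n_samples)
    ]
    # A few causal SNPs with normally distributed effects.
    causal = rng.sample(range(n_snps), n_causal)
    betas = {j: rng.gauss(0, 1) for j in causal}
    genetic = [sum(betas[j] * row[j] for j in causal) for row in geno]
    # Scale the noise so that var(genetic) / var(phenotype) is about h2.
    noise_sd = (pvariance(genetic) * (1 - h2) / h2) ** 0.5
    pheno = [g + rng.gauss(0, noise_sd) for g in genetic]
    return geno, pheno, betas


geno, pheno, betas = simulate()
print(len(geno), len(geno[0]), len(pheno), sorted(betas))
```

Because the causal SNPs and their effects are known, a benchmark can compare estimated betas and variance explained against the simulated truth.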
GPU Scaling
- On a single GPU: runs without a Dask cluster.
- On multiple GPUs: run_windows_with_dask automatically launches a LocalCUDACluster and distributes windows across devices.
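The single-GPU versus multi-GPU behavior follows a common pattern: run windows serially when there is one worker, otherwise map them over a pool. The sketch below illustrates that pattern with the standard library's ThreadPoolExecutor as a stand-in for Dask so that it runs anywhere. The fit_window and run_windows functions are hypothetical placeholders, not the package's internals, and Dask schedules tasks dynamically rather than exactly as a thread pool does.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_window(window):
    """Placeholder for per-window model fitting; returns a small summary dict."""
    chrom, start, end = window
    return {"chrom": chrom, "start": start, "end": end, "length": end - start}


def run_windows(windows, n_workers=1):
    """Run windows serially with one worker, otherwise map them over a pool."""
    if n_workers <= 1:
        # Single-worker fallback: no pool is created.
        return [fit_window(w) for w in windows]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so results line up with windows.
        return list(pool.map(fit_window, windows))


windows = [(21, 10_000, 510_000), (21, 510_000, 1_010_000), (22, 0, 500_000)]
serial = run_windows(windows, n_workers=1)
parallel = run_windows(windows, n_workers=2)
print(serial == parallel)
```

Since each window is independent, the per-window results can be concatenated into one summary table regardless of which worker produced them.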
Citation
If you use GENBoostGPU in your research, please cite:
Alexis Bennett and Kynon J.M. Benjamin. GENBoostGPU: GPU-accelerated elastic net boosting for large-scale epigenomics. DOI: 10.5281/zenodo.17238798
License
GENBoostGPU is licensed under the GPL-3.0 license. See the LICENSE file for details.