pg_gpu
GPU-accelerated population genetics statistics using CuPy.
Installation
pg_gpu requires a Linux x86_64 machine with an NVIDIA GPU and a CUDA 12 driver.
Nothing else is needed: the full GPU runtime, including the CUDA toolkit
headers CuPy uses to JIT-compile its kernels, is pulled from PyPI via the
cupy-cuda12x[ctk] dependency.
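To confirm a machine meets these requirements, a short pre-flight check along the following lines may help. It is only a sketch: the helper name `check_environment` is hypothetical and not part of pg_gpu, and it relies solely on the standard library plus CuPy's `cupy.cuda.runtime` calls. It reports what it finds rather than raising, so it also runs on machines without a GPU.

```python
import importlib.util
import platform


def check_environment():
    """Report whether this machine looks usable for a CUDA 12 CuPy stack.

    Hypothetical helper, not part of pg_gpu.
    """
    report = {
        "linux_x86_64": platform.system() == "Linux"
        and platform.machine() == "x86_64",
        "cupy_importable": False,
        "gpu_count": 0,
        "driver_version": None,
        "error": None,
    }
    # Skip the GPU queries entirely when CuPy is not installed.
    if importlib.util.find_spec("cupy") is None:
        report["error"] = "cupy is not installed"
        return report
    try:
        import cupy

        report["cupy_importable"] = True
        # Number of CUDA devices visible to this process.
        report["gpu_count"] = cupy.cuda.runtime.getDeviceCount()
        # Encoded as 1000 * major + 10 * minor, so 12040 means CUDA 12.4.
        report["driver_version"] = cupy.cuda.runtime.driverGetVersion()
    except Exception as exc:  # missing driver, no device, broken install
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `driver_version` of 12000 or higher together with a non-zero `gpu_count` suggests the CUDA 12 driver requirement is met.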
With pixi (recommended)
The pinned, reproducible environment is managed with pixi and is the recommended way to install pg_gpu:
pixi install
pixi shell
Into an existing conda / venv environment
To use pg_gpu from your own workflow (Snakemake, Jupyter, an existing conda env), install it with pip:
pip install "git+https://github.com/kr-colab/pg_gpu"
This pulls the full runtime stack (cupy-cuda12x with toolkit headers, bio2zarr,
kvikio, nvcomp) as declared in pyproject.toml. For development against a local
checkout, use an editable install:
pip install -e ".[dev]"
Quick Start
from pg_gpu import HaplotypeMatrix, diversity, divergence, selection, sfs
# Load from VCF
hm = HaplotypeMatrix.from_vcf("data.vcf.gz", region="chr1:1-1000000")
hm.load_pop_file("populations.txt")
# Diversity
diversity.pi(hm, population="pop1")
diversity.tajimas_d(hm, population="pop1")
# Divergence
divergence.fst_hudson(hm, "pop1", "pop2")
divergence.dxy(hm, "pop1", "pop2")
# Selection scans
selection.ihs(hm)
selection.nsl(hm)
# Windowed statistics (fused CUDA kernels)
from pg_gpu import windowed_analysis
results = windowed_analysis(hm, statistics=["pi", "theta_w", "tajimas_d"],
                            window_size=50000)
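For readers who want to sanity-check results on a toy dataset, the textbook definitions behind `pi`, `theta_w`, and windowed `pi` can be sketched in plain NumPy. This is a CPU reference under stated assumptions, not pg_gpu's implementation: the function names are hypothetical, the input is assumed to be a biallelic 0/1 matrix of shape (n_haplotypes, n_sites) with no missing data, positions are assumed to be 1-based integers, and values are unnormalized totals (pg_gpu's own normalization, for example per base pair, may differ).

```python
import numpy as np


def pi_reference(haplotypes):
    """Mean number of pairwise differences, summed over sites."""
    n = haplotypes.shape[0]
    k = haplotypes.sum(axis=0)          # derived-allele count per site
    n_pairs = n * (n - 1) / 2
    return float(np.sum(k * (n - k)) / n_pairs)


def theta_w_reference(haplotypes):
    """Watterson's estimator: segregating sites divided by a_1."""
    n = haplotypes.shape[0]
    k = haplotypes.sum(axis=0)
    n_segregating = np.count_nonzero((k > 0) & (k < n))
    a1 = np.sum(1.0 / np.arange(1, n))  # 1 + 1/2 + ... + 1/(n-1)
    return float(n_segregating / a1)


def windowed_pi_reference(haplotypes, positions, window_size):
    """Per-window pi totals for non-overlapping windows starting at position 1."""
    n = haplotypes.shape[0]
    k = haplotypes.sum(axis=0)
    per_site = k * (n - k) / (n * (n - 1) / 2)
    window_index = (positions - 1) // window_size
    return np.bincount(window_index, weights=per_site)


# Four haplotypes, three sites with derived-allele counts 1, 2, 2.
toy = np.array([[0, 0, 1],
                [0, 1, 1],
                [1, 1, 0],
                [0, 0, 0]])
toy_positions = np.array([10, 60, 120])

print(pi_reference(toy))                             # 11/6, about 1.8333
print(theta_w_reference(toy))                        # 18/11, about 1.6364
print(windowed_pi_reference(toy, toy_positions, 50)) # about [0.5, 0.6667, 0.6667]
```

Each site contributes k(n - k) differing pairs out of n(n - 1)/2, so the windowed version is just the same per-site values grouped by window index.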
Documentation
Full documentation at https://pg-gpu.readthedocs.io/.
Interactive walkthrough: examples/pg_gpu_tour.ipynb.
Statistics
| Category | Functions |
|---|---|
| Diversity | pi, theta_w, theta_h, theta_l, tajimas_d, fay_wus_h, normalized_fay_wus_h, zeng_e, zeng_dh, segregating_sites, singleton_count, haplotype_diversity, haplotype_count, heterozygosity_expected, heterozygosity_observed, inbreeding_coefficient, allele_frequency_spectrum, max_daf, daf_histogram, diplotype_frequency_spectrum, diversity_stats |
| Divergence | fst_hudson, fst_weir_cockerham, fst_nei, dxy, da, pbs, pairwise_fst |
| Distance-based two-pop | snn, dxy_min, gmin, dd, dd_rank, zx |
| Distance moments | pairwise_diffs, dist_var, dist_skew, dist_kurt, dist_moments |
| Selection scans | ihs, nsl, xpehh, xpnsl, garud_h, moving_garud_h, ehh_decay |
| LD | r, r_squared, dd (LD), dz, pi2, zns, omega, mu_ld |
| SFS | sfs, sfs_folded, sfs_scaled, sfs_folded_scaled, joint_sfs, joint_sfs_folded, joint_sfs_scaled, joint_sfs_folded_scaled, project_joint_sfs, fold_sfs, fold_joint_sfs |
| Admixture / F-stats | patterson_f2, patterson_f3, patterson_d, moving_patterson_f3, moving_patterson_d, average_patterson_f3, average_patterson_d |
| Resampling | block_jackknife, block_bootstrap |
| Decomposition | pca, randomized_pca, pairwise_distance, pcoa, local_pca, local_pca_jackknife, pc_dist, corners |
| Relatedness | grm, ibs |
| Windowed pipeline | windowed_analysis: fused GPU windowing for any of the above |
| Biobank-scale streaming | HaplotypeMatrix.from_zarr(streaming='always') walks VCZ stores chunk by chunk; every per-window / SFS / LD / pairwise relatedness statistic dispatches transparently. See tutorials/biobank_streaming. |
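As an illustration of what the SFS row of the table computes, here is a minimal NumPy sketch of an unfolded site frequency spectrum and its folded form. It is not pg_gpu's code: the `_reference` function names are hypothetical, and the input is assumed to be a biallelic 0/1 integer matrix of shape (n_haplotypes, n_sites) with no missing data.

```python
import numpy as np


def sfs_reference(haplotypes):
    """Unfolded SFS: entry k counts sites with k derived alleles (k = 0..n)."""
    n = haplotypes.shape[0]
    k = haplotypes.sum(axis=0)
    return np.bincount(k, minlength=n + 1)


def fold_sfs_reference(spectrum):
    """Folded SFS: entry j counts sites whose minor-allele count is j."""
    n = len(spectrum) - 1
    folded = np.zeros(n // 2 + 1, dtype=spectrum.dtype)
    for k, count in enumerate(spectrum):
        folded[min(k, n - k)] += count
    return folded


# Four haplotypes, four sites with derived-allele counts 1, 2, 2, 3.
toy = np.array([[0, 0, 1, 1],
                [0, 1, 1, 1],
                [1, 1, 0, 1],
                [0, 0, 0, 0]])

unfolded = sfs_reference(toy)
print(unfolded)                      # [0 1 2 1 0]
print(fold_sfs_reference(unfolded))  # [0 2 2]
```

Folding merges counts k and n - k, which is the usual choice when the ancestral allele is unknown.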
Development
pixi run pytest tests/
pixi run -e lint ruff check pg_gpu/