GPU-accelerated population genetics statistics

pg_gpu

GPU-accelerated population genetics statistics using CuPy.

Installation

pg_gpu requires a Linux x86_64 machine with an NVIDIA GPU and a CUDA 12 driver. Nothing else is needed: the full GPU runtime, including the CUDA toolkit headers that CuPy uses to JIT-compile its kernels, is pulled from PyPI via the cupy-cuda12x[ctk] dependency.
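
Before installing, it can help to confirm that the machine matches these requirements. The following is a minimal sketch using only the Python standard library; `check_pg_gpu_prerequisites` is a hypothetical helper, not part of pg_gpu, and finding `nvidia-smi` on the PATH is only a hint that an NVIDIA driver is installed. It does not verify that the driver supports CUDA 12.

```python
import platform
import shutil


def check_pg_gpu_prerequisites():
    """Coarse platform checks (hypothetical helper, not part of pg_gpu)."""
    return {
        # pg_gpu is documented as Linux-only.
        "linux": platform.system() == "Linux",
        # ...and x86_64-only.
        "x86_64": platform.machine() == "x86_64",
        # nvidia-smi ships with the NVIDIA driver; its presence is a hint,
        # not proof, that a usable driver is installed.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_pg_gpu_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If `nvidia-smi` is found, running it by hand shows the highest CUDA version the installed driver supports, which should be 12.x or later.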

With pixi (recommended)

The pinned, reproducible environment is managed with pixi and is the recommended way to install pg_gpu:

pixi install
pixi shell

Into an existing conda / venv environment

To use pg_gpu from your own workflow (Snakemake, Jupyter, an existing conda env), install it with pip:

pip install "git+https://github.com/kr-colab/pg_gpu"

This pulls the full runtime stack (cupy-cuda12x with toolkit headers, bio2zarr, kvikio, nvcomp) as declared in pyproject.toml. For development against a local checkout, use an editable install:

pip install -e ".[dev]"

Quick Start

from pg_gpu import HaplotypeMatrix, diversity, divergence, selection, sfs

# Load from VCF
hm = HaplotypeMatrix.from_vcf("data.vcf.gz", region="chr1:1-1000000")
hm.load_pop_file("populations.txt")

# Diversity
diversity.pi(hm, population="pop1")
diversity.tajimas_d(hm, population="pop1")

# Divergence
divergence.fst_hudson(hm, "pop1", "pop2")
divergence.dxy(hm, "pop1", "pop2")

# Selection scans
selection.ihs(hm)
selection.nsl(hm)

# Windowed statistics (fused CUDA kernels)
from pg_gpu import windowed_analysis
results = windowed_analysis(hm, statistics=["pi", "theta_w", "tajimas_d"],
                            window_size=50000)
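
To make the windowed statistics concrete, here is a CPU-only NumPy sketch of per-window nucleotide diversity and Watterson's theta. It is a reference illustration of the textbook definitions, not pg_gpu's implementation: the function name `windowed_pi_theta_w`, the (haplotypes x variants) 0/1 array layout, and the choice to return unnormalized per-window sums (rather than per-base values) are all assumptions made for this example.

```python
import numpy as np


def windowed_pi_theta_w(haplotypes, positions, window_size):
    """Per-window pi and Watterson's theta as unnormalized sums over sites.

    haplotypes: (n_haplotypes, n_variants) array of 0/1 alleles (assumed layout).
    positions:  (n_variants,) 1-based variant coordinates.
    """
    haplotypes = np.asarray(haplotypes)
    positions = np.asarray(positions, dtype=np.int64)
    n = haplotypes.shape[0]

    # Count of the "1" allele at each variant.
    k = haplotypes.sum(axis=0)
    # Per-site pi: probability that two distinct haplotypes differ at the site.
    site_pi = 2.0 * k * (n - k) / (n * (n - 1))
    # A site is segregating if both alleles are present in the sample.
    segregating = ((k > 0) & (k < n)).astype(float)

    # Fixed-width windows: [1, w], [w + 1, 2w], ...
    window_index = (positions - 1) // window_size
    n_windows = int(window_index.max()) + 1

    pi = np.bincount(window_index, weights=site_pi, minlength=n_windows)
    s = np.bincount(window_index, weights=segregating, minlength=n_windows)

    # Watterson's estimator divides the segregating-site count by a harmonic number.
    a1 = np.sum(1.0 / np.arange(1, n))
    return pi, s / a1


# Four haplotypes, three variants; the first two variants fall in window 0.
haps = np.array([[0, 0, 1],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 0, 0]])
positions = np.array([10, 40, 120])
pi, theta_w = windowed_pi_theta_w(haps, positions, window_size=100)
print(pi, theta_w)
```

Both statistics are sums of independent per-site terms, which is why they can be binned by window in a single pass.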

Documentation

Full documentation is available at https://pg-gpu.readthedocs.io/.

Interactive walkthrough: examples/pg_gpu_tour.ipynb.

Statistics

Category: Functions
Diversity: pi, theta_w, theta_h, theta_l, tajimas_d, fay_wus_h, normalized_fay_wus_h, zeng_e, zeng_dh, segregating_sites, singleton_count, haplotype_diversity, haplotype_count, heterozygosity_expected, heterozygosity_observed, inbreeding_coefficient, allele_frequency_spectrum, max_daf, daf_histogram, diplotype_frequency_spectrum, diversity_stats
Divergence: fst_hudson, fst_weir_cockerham, fst_nei, dxy, da, pbs, pairwise_fst
Distance-based two-pop: snn, dxy_min, gmin, dd, dd_rank, zx
Distance moments: pairwise_diffs, dist_var, dist_skew, dist_kurt, dist_moments
Selection scans: ihs, nsl, xpehh, xpnsl, garud_h, moving_garud_h, ehh_decay
LD: r, r_squared, dd (LD), dz, pi2, zns, omega, mu_ld
SFS: sfs, sfs_folded, sfs_scaled, sfs_folded_scaled, joint_sfs, joint_sfs_folded, joint_sfs_scaled, joint_sfs_folded_scaled, project_joint_sfs, fold_sfs, fold_joint_sfs
Admixture / F-stats: patterson_f2, patterson_f3, patterson_d, moving_patterson_f3, moving_patterson_d, average_patterson_f3, average_patterson_d
Resampling: block_jackknife, block_bootstrap
Decomposition: pca, randomized_pca, pairwise_distance, pcoa, local_pca, local_pca_jackknife, pc_dist, corners
Relatedness: grm, ibs
Windowed pipeline: windowed_analysis (fused GPU windowing for any of the above)
Biobank-scale streaming: HaplotypeMatrix.from_zarr(streaming='always') walks VCZ stores chunk by chunk; every per-window / SFS / LD / pairwise relatedness statistic dispatches transparently. See tutorials/biobank_streaming.
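
Chunk-by-chunk streaming works naturally for statistics that are additive over variants. The sketch below illustrates the idea for an unfolded SFS using plain NumPy: the spectrum accumulated over variant chunks equals the spectrum computed on the whole matrix. This is an illustration only and says nothing about how pg_gpu implements streaming internally; `sfs_from_chunks`, the chunk size of 128, and the (haplotypes x variants) 0/1 layout are assumptions for this example.

```python
import numpy as np


def sfs_from_chunks(chunks, n_haplotypes):
    """Accumulate an unfolded SFS over variant chunks.

    chunks: iterable of (n_haplotypes, n_variants_in_chunk) 0/1 arrays.
    Returns sfs where sfs[k] is the number of variants carrying k derived alleles.
    """
    sfs = np.zeros(n_haplotypes + 1, dtype=np.int64)
    for chunk in chunks:
        # Derived-allele count per variant within this chunk.
        derived_counts = np.asarray(chunk).sum(axis=0)
        # Histogram the counts and add them to the running total.
        sfs += np.bincount(derived_counts, minlength=n_haplotypes + 1)
    return sfs


# Simulated data: 6 haplotypes, 1000 variants of random 0/1 alleles.
rng = np.random.default_rng(0)
haps = rng.integers(0, 2, size=(6, 1000))

# Process 128 variants at a time, then compare with a single pass.
chunked = sfs_from_chunks((haps[:, i:i + 128] for i in range(0, 1000, 128)), 6)
whole = sfs_from_chunks([haps], 6)
print(np.array_equal(chunked, whole))
```

Only one chunk needs to be held in memory at a time, so the peak footprint depends on the chunk size rather than on the total number of variants.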

Development

pixi run pytest tests/
pixi run -e lint ruff check pg_gpu/

Download files

Source Distribution

pg_gpu-0.1.0.tar.gz (410.0 kB)

Uploaded Source

Built Distribution

pg_gpu-0.1.0-py3-none-any.whl (212.7 kB)

Uploaded Python 3

File details

Details for the file pg_gpu-0.1.0.tar.gz.

File metadata

  • Download URL: pg_gpu-0.1.0.tar.gz
  • Upload date:
  • Size: 410.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pg_gpu-0.1.0.tar.gz
Algorithm: Hash digest
SHA256: a1cfb2b3aa84b8a2cbb655410d631757d4a1edf91cc610e9f2c82829945baa9f
MD5: d0caaff152ff892fd466e5f08b1d36f5
BLAKE2b-256: f722c04ba9862cf3b6051aa50fe8409834d3b736ca9eeafae963c71232f627f3

Provenance

The following attestation bundles were made for pg_gpu-0.1.0.tar.gz:

Publisher: publish.yml on kr-colab/pg_gpu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pg_gpu-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pg_gpu-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 212.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pg_gpu-0.1.0-py3-none-any.whl
Algorithm: Hash digest
SHA256: 0ea724445b9b300dad06d71374cc428033c76206be94ac4db5527ff6ad07ab29
MD5: e3b00e0edcc71fb057b5e99ae6e2a7aa
BLAKE2b-256: f457b648452074e73cb0fe8c0cbaef32633bbe95481cf1a47bf9c29d971d10aa

Provenance

The following attestation bundles were made for pg_gpu-0.1.0-py3-none-any.whl:

Publisher: publish.yml on kr-colab/pg_gpu