
Fast Python/Cython implementation of the PCAone Halko algorithm


Cython/Python implementation of the Halko algorithm

This is a fast Python/Cython implementation of the Halko algorithm (randomized SVD) for genotype data. It takes genotypes in binary PLINK format (*.bed, *.bim, *.fam) as input. For simplicity, missing genotypes are mean-imputed.
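To make the preprocessing concrete, here is a minimal NumPy sketch of mean imputation followed by per-SNP centering. The README only states that missing data are mean-imputed; the in-memory layout (individuals x SNPs, np.nan for missing calls), the hypothetical helper name impute_and_standardize, and the extra scaling by sqrt(2f(1-f)) are assumptions borrowed from common genotype-PCA practice, not confirmed details of halkoSVD.

```python
import numpy as np

def impute_and_standardize(G):
    """Mean-impute missing genotypes, then center and scale each SNP.

    G: (n_individuals, n_snps) allele counts in {0, 1, 2}, with np.nan for
    missing calls (a hypothetical in-memory layout, not halkoSVD's own).
    Returns the standardized matrix and the per-SNP allele frequencies.
    """
    G = np.array(G, dtype=np.float64)  # work on a float copy
    # Allele frequency per SNP from the non-missing calls only
    # (an all-missing SNP would yield nan here and should be filtered first)
    f = np.nanmean(G, axis=0) / 2.0
    # Mean imputation: each missing call takes its SNP's mean genotype 2f
    miss = np.isnan(G)
    G[miss] = (2.0 * f)[np.nonzero(miss)[1]]
    # ASSUMPTION: scale by the binomial SD sqrt(2f(1-f)), as in many PCA tools
    sd = np.sqrt(2.0 * f * (1.0 - f))
    sd[sd == 0.0] = 1.0  # avoid dividing by zero for monomorphic SNPs
    return (G - 2.0 * f) / sd, f
```

Because an imputed entry equals its SNP's mean, it becomes exactly 0 after centering and contributes nothing to the matrix products used by the SVD.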

It is inspired by the lovely PCAone software!

Installation

# Option 1: Build and install via PyPI
pip install halkoSVD

# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/halkoSVD.git
cd halkoSVD
pip install .

# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/halkoSVD.git
conda env create -f halkoSVD/environment.yml
conda activate halkoSVD

You can now run the program with the halkoSVD command.

Quick usage

Provide halkoSVD with the shared file prefix of the PLINK files (e.g. --bfile input reads input.bed, input.bim and input.fam).

# Check help message of the program
halkoSVD -h

# Extract the top 10 PCs
halkoSVD --bfile input --threads 32 --pca 10 --out halko
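The --bfile prefix names a standard binary PLINK fileset. As a sketch of what has to be read from it, the stdlib-only code below decodes a SNP-major .bed file using the documented PLINK 1 two-bit encoding, taking the sample count from the .fam file and the SNP count from the .bim file. decode_bed_snp, parse_bed and read_bed are hypothetical helpers for illustration, not part of halkoSVD's API.

```python
def decode_bed_snp(block, n_individuals):
    """Decode one SNP's packed bytes into per-individual A1 allele counts.

    PLINK 1 .bed stores 4 genotypes per byte, lowest-order bit pair first:
    00 = homozygous A1, 01 = missing, 10 = heterozygous, 11 = homozygous A2.
    """
    code_to_count = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}
    counts = []
    for i in range(n_individuals):
        code = (block[i // 4] >> (2 * (i % 4))) & 0b11
        counts.append(code_to_count[code])
    return counts


def parse_bed(data, n_individuals, n_snps):
    """Split the raw bytes of a SNP-major .bed file into per-SNP genotype lists."""
    if data[:3] != b"\x6c\x1b\x01":  # magic number plus SNP-major mode byte
        raise ValueError("not a SNP-major PLINK .bed file")
    per_snp = (n_individuals + 3) // 4  # each SNP is padded to whole bytes
    if len(data) != 3 + per_snp * n_snps:
        raise ValueError("unexpected .bed size for the given .fam/.bim counts")
    return [
        decode_bed_snp(data[3 + j * per_snp : 3 + (j + 1) * per_snp], n_individuals)
        for j in range(n_snps)
    ]


def read_bed(prefix):
    """Read <prefix>.bed, taking the counts from <prefix>.fam and <prefix>.bim."""
    with open(prefix + ".fam") as fh:
        n_individuals = sum(1 for _ in fh)  # one individual per line
    with open(prefix + ".bim") as fh:
        n_snps = sum(1 for _ in fh)  # one SNP per line
    with open(prefix + ".bed", "rb") as fh:
        return parse_bed(fh.read(), n_individuals, n_snps)
```

For example, read_bed("input") would decode the fileset passed as --bfile input above. This sketch reads the whole file into memory and uses None for missing calls; a tool built for large cohorts would instead decode SNPs in batches into a numeric array.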

Options

  • --pcaone, perform fast PCAone block iterations
  • --seed, set the random seed for reproducibility (default: 42)
  • --power, specify the number of power iterations (default: 11)
  • --batch, specify the batch size, in SNPs, used for processing (default: 8192)
  • --loadings, save the SNP loadings
  • --raw, only output eigenvectors without the FID/IID columns
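The --seed, --power and --batch options correspond to the ingredients of the randomized SVD of Halko, Martinsson and Tropp: a random test matrix, a number of power iterations, and products with the genotype matrix that can be accumulated over chunks of SNPs. The NumPy sketch below illustrates that technique under stated assumptions: halko_svd is a hypothetical name, its defaults merely mirror the option defaults listed above, and the oversampling of 10 extra columns and QR re-orthonormalization after every half-step are our choices, not confirmed from halkoSVD's source. It does not cover the --pcaone block-iteration variant.

```python
import numpy as np

def halko_svd(X, k, power=11, batch=8192, oversample=10, seed=42):
    """Randomized truncated SVD of X (n individuals x m SNPs), Halko-style.

    Every product with X is accumulated over column batches of `batch` SNPs,
    imitating how a tool can stream genotypes; here X is an in-memory array.
    Returns U (n x k), singular values s (k,), and V (m x k).
    """
    n, m = X.shape
    l = min(k + oversample, n, m)  # sketch width (oversampling is an assumption)
    rng = np.random.default_rng(seed)
    batches = [slice(start, min(start + batch, m)) for start in range(0, m, batch)]

    def X_dot(W):  # X @ W for W of shape (m, l), summed over SNP batches
        out = np.zeros((n, W.shape[1]))
        for b in batches:
            out += X[:, b] @ W[b]
        return out

    def Xt_dot(Q):  # X.T @ Q for Q of shape (n, l), filled one SNP batch at a time
        out = np.empty((m, Q.shape[1]))
        for b in batches:
            out[b] = X[:, b].T @ Q
        return out

    # Range finder: multiply by a Gaussian test matrix and orthonormalize
    Q, _ = np.linalg.qr(X_dot(rng.standard_normal((m, l))))
    # Power iterations; re-orthonormalizing each half-step keeps them stable
    for _ in range(power):
        W, _ = np.linalg.qr(Xt_dot(Q))
        Q, _ = np.linalg.qr(X_dot(W))
    # Small l x m projection B = Q.T @ X, then an exact SVD of B
    B = Xt_dot(Q).T
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k].T
```

X should already be centered (and, if desired, scaled) with missing values imputed; halko_svd(X, k=10) then plays the role of --pca 10, with the columns of U as the top PCs and V as per-SNP loadings. Power iterations trade extra passes over the data for accuracy when the singular values decay slowly, which is typical for genotype matrices.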



Download files


Source Distribution

halkosvd-0.4.0.tar.gz (166.3 kB)

File details

Details for the file halkosvd-0.4.0.tar.gz.

File metadata

  • Download URL: halkosvd-0.4.0.tar.gz
  • Size: 166.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.1

File hashes

Hashes for halkosvd-0.4.0.tar.gz

  • SHA256: dad902fdb2b16fadff89e94d43de9da102e4e898579ae2c5c50e2eca1cb888eb
  • MD5: d679cfd803a3b940342483c0cdd89c46
  • BLAKE2b-256: 30c23d2a9fa5b85a01c51a764493e287db2e146a5df7b5ab45673b9ff88e88ee
