Fast Python/Cython implementation of the PCAone Halko algorithm

Cython/Python implementation of the Halko algorithm (PCAone)

This is a fast Python/Cython implementation of the PCAone Halko algorithm for genotype data. It takes files in binary PLINK format (*.bed, *.bim, *.fam) as input. For simplicity, missing genotypes are mean-imputed.
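
For orientation, the general idea behind a Halko-style randomized SVD with power iterations can be sketched in NumPy as follows. This is a minimal dense-matrix illustration, not halkoSVD's Cython code and not the PCAone block variant. The function name `halko_svd` and the `oversample` parameter are assumptions made for this example; the `power` and `seed` values simply mirror the numbers shown in the Options list.

```python
import numpy as np

def halko_svd(X, k, power=11, oversample=10, seed=42):
    """Approximate the top-k SVD of X with a randomized range finder.

    Hypothetical helper for illustration only; X is any dense 2-D array
    (for PCA it would be the centered, mean-imputed genotype matrix).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    width = min(k + oversample, m, n)  # sketch width, capped by the matrix size

    # Random projection followed by orthonormalization of the sampled range.
    Q, _ = np.linalg.qr(X @ rng.standard_normal((n, width)))

    # Power iterations sharpen the captured subspace; the QR steps keep
    # the basis numerically stable between multiplications.
    for _ in range(power):
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)

    # Exact SVD of the small projected matrix, then lift back to X's row space.
    B = Q.T @ X
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k], Vt[:k]
```

If X is laid out as SNPs by samples, the rows of `Vt` would correspond to the sample-level eigenvectors and the columns of the first return value to the SNP loadings; that layout is an assumption of this sketch, not a statement about the package internals.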

It is inspired by the lovely PCAone software, which is well worth a look.

Installation

# Option 1: Build and install via PyPI
pip install halkoSVD

# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/halkoSVD.git
cd halkoSVD
pip install .

# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/halkoSVD.git
conda env create -f halkoSVD/environment.yml
conda activate halkoSVD

You can now run the program with the halkoSVD command.

Quick usage

Provide halkoSVD with the file prefix of the PLINK files.

# Check help message of the program
halkoSVD -h

# Extract the top 10 PCs
halkoSVD --bfile input --threads 32 --pca 10 --out halko
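
The prefix passed to --bfile refers to the three PLINK files (input.bed, input.bim, input.fam). To make the input format and the mean imputation step concrete, here is a small sketch that decodes the bytes of a SNP-major .bed file and fills in missing genotypes with the per-SNP mean. It follows the PLINK 1 binary layout (three magic bytes, then two bits per genotype), but the function names `decode_bed` and `mean_impute` are hypothetical, counting A1 alleles is a choice made for this example, and none of this is taken from halkoSVD's own reader.

```python
import numpy as np

# 2-bit PLINK codes mapped to A1 allele counts (a convention chosen here):
# 0b00 = homozygous A1, 0b01 = missing, 0b10 = heterozygous, 0b11 = homozygous A2.
_CODE_TO_DOSAGE = np.array([2.0, np.nan, 1.0, 0.0])

def decode_bed(raw, n_samples, n_snps):
    """Decode SNP-major .bed bytes into an (n_snps, n_samples) float array."""
    buf = np.frombuffer(raw, dtype=np.uint8)
    if buf[:3].tolist() != [0x6C, 0x1B, 0x01]:
        raise ValueError("not a SNP-major PLINK .bed file")
    bytes_per_snp = (n_samples + 3) // 4  # each byte packs four genotypes
    packed = buf[3:3 + n_snps * bytes_per_snp].reshape(n_snps, bytes_per_snp)
    # Within a byte, the lowest-order bit pair belongs to the first sample.
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, :, None] >> shifts) & 3
    codes = codes.reshape(n_snps, -1)[:, :n_samples]  # drop padding genotypes
    return _CODE_TO_DOSAGE[codes]

def mean_impute(G):
    """Replace missing genotypes (NaN) with the mean of the observed ones per SNP."""
    means = np.nanmean(G, axis=1, keepdims=True)
    return np.where(np.isnan(G), means, G)
```

In this sketch, `n_samples` and `n_snps` would come from the number of lines in the .fam and .bim files, and `raw` from reading the .bed file in binary mode.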

Options

  • --pcaone, perform fast PCAone block iterations
  • --seed, set random seed for reproducibility (default: 42)
  • --power, specify the number of power iterations (default: 11)
  • --batch, specify the batch size to process SNPs (default: 8192)
  • --loadings, save the SNP loadings
  • --raw, only output eigenvectors without FID/IID
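
To give a feel for what a SNP batch size controls, the sketch below accumulates a matrix product over blocks of SNPs instead of touching the whole genotype matrix at once. It is only an illustration under assumptions: the function name `batched_gram_product` is hypothetical, the matrix is taken to be SNPs by samples, and how halkoSVD actually uses --batch internally is not described in this README.

```python
import numpy as np

def batched_gram_product(G, Omega, batch=8192):
    """Compute G.T @ (G @ Omega) while visiting `batch` SNP rows at a time.

    G is (n_snps, n_samples) and Omega is (n_samples, width). Only one
    block of SNPs needs to be held in working memory per step.
    """
    n_snps, n_samples = G.shape
    out = np.zeros((n_samples, Omega.shape[1]))
    for start in range(0, n_snps, batch):
        block = G[start:start + batch]       # at most `batch` SNPs
        out += block.T @ (block @ Omega)     # this block's contribution
    return out
```

A larger batch means fewer, bigger matrix multiplications at the cost of more memory per step; a smaller batch does the reverse. In a real pipeline each block could be decoded from the .bed file on the fly rather than sliced from an in-memory array.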
