EM-PCA for inferring population structure in the presence of missingness
EMU (v1.2.1)
EMU is software for performing principal component analysis (PCA) on genetic datasets with missing data. It handles both random and non-random missingness by modelling it directly through a truncated SVD approach. EMU takes binary PLINK files as input.
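To give a feel for the idea of modelling missingness through a low-rank SVD, here is a minimal NumPy sketch of a generic, textbook-style EM-PCA loop. It is not EMU's implementation: the function name `em_pca_sketch`, the NaN encoding of missing genotypes, the mean-based initialisation, and the use of a full SVD that is then cut to `k` components are all assumptions made for illustration.

```python
import numpy as np


def em_pca_sketch(G, k=2, n_iter=100, tol=1e-6):
    """Illustrative EM-PCA on a (samples x sites) matrix with NaN for missing.

    Hypothetical helper, not part of EMU. Assumes every site has at least
    one observed value.
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)

    # Initialise missing entries with the per-site mean of the observed values.
    X = np.where(missing, np.nanmean(G, axis=0), G)

    for _ in range(n_iter):
        # Fit a rank-k model to the current completed matrix.
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k] + mu

        # Re-impute only the missing entries; observed entries stay fixed.
        X_new = np.where(missing, low_rank, G)
        change = np.sqrt(np.mean((X_new - X) ** 2))
        X = X_new
        if change < tol:
            break

    # Final decomposition of the completed matrix.
    mu = X.mean(axis=0)
    U, s, _ = np.linalg.svd(X - mu, full_matrices=False)
    return U[:, :k], s[:k], X
```

The returned `U[:, :k]` plays the role of the top `k` eigenvectors (one row per sample). A dense SVD like this does not scale to real genotype data; it is only meant to show the alternation between fitting a low-rank model and re-imputing the missing entries.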
Citation
Please cite our paper in Bioinformatics
Installation
# Option 1: Build and install via PyPI
pip install emu-popgen
# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/emu.git
cd emu
pip install .
# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/emu.git
conda env create -f emu/environment.yml
conda activate emu
You can now run the program with the emu command.
Quick usage
Running EMU
Provide emu with the file prefix of the binary PLINK files (.bed, .bim and .fam) through the --bfile argument.
# Check help message of the program
emu -h
# Model and extract 2 eigenvectors using the EM-PCA algorithm
emu --bfile test --eig 2 --threads 64 --out test.emu
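Since emu is given a prefix rather than a single file, it can help to confirm that the whole binary PLINK fileset is in place before starting a long run. The sketch below assumes the standard PLINK extensions (.bed, .bim, .fam); the helper name `missing_plink_files` is hypothetical and not part of EMU.

```python
from pathlib import Path

# Standard extensions of a binary PLINK fileset.
PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def missing_plink_files(prefix):
    """Return the expected fileset paths that do not exist for this prefix."""
    candidates = [f"{prefix}{ext}" for ext in PLINK_EXTENSIONS]
    return [path for path in candidates if not Path(path).is_file()]
```

For the example above, `missing_plink_files("test")` returns an empty list when test.bed, test.bim and test.fam are all present.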
Memory-efficient variant
For large-scale datasets, emu provides a much more memory-efficient variant, enabled with the --mem argument.
# Example run using '--mem' argument
emu --mem --bfile test --eig 2 --threads 64 --out test.emu.mem
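When both variants are launched from a script, the command line can be assembled in one place. The following sketch uses only the arguments shown in the examples above (--bfile, --eig, --threads, --out, --mem); the function `emu_command` and its default of using all available cores are assumptions for illustration.

```python
import os
import shlex


def emu_command(bfile, n_eig, out, threads=None, mem=False):
    """Build the emu argument list (hypothetical helper, not part of EMU)."""
    if threads is None:
        # Assumption: default to every core the machine reports.
        threads = os.cpu_count() or 1
    cmd = ["emu"]
    if mem:
        cmd.append("--mem")
    cmd += [
        "--bfile", str(bfile),
        "--eig", str(n_eig),
        "--threads", str(threads),
        "--out", str(out),
    ]
    return cmd


# Show the two commands as they would be typed in a shell.
print(shlex.join(emu_command("test", 2, "test.emu", threads=64)))
print(shlex.join(emu_command("test", 2, "test.emu.mem", threads=64, mem=True)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` to execute the run.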
Download files
Source Distribution
emu_popgen-1.2.1.tar.gz (328.0 kB)
File details
Details for the file emu_popgen-1.2.1.tar.gz.
File metadata
- Download URL: emu_popgen-1.2.1.tar.gz
- Upload date:
- Size: 328.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e2d3a670b489dbae2b2fc512ce00646d011f0b44b5cd291bfe933152b3d55f1 |
| MD5 | e458ce9150d05d8e93261931ef9e37dd |
| BLAKE2b-256 | b80c7a2786178c051f52c39e376ca57a2d34b8160042aa897df7f952941be119 |
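A downloaded source distribution can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify_download` are hypothetical, and the local file path is whatever the archive was saved as.

```python
import hashlib

# SHA256 digest listed for emu_popgen-1.2.1.tar.gz.
EXPECTED_SHA256 = "2e2d3a670b489dbae2b2fc512ce00646d011f0b44b5cd291bfe933152b3d55f1"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

Calling `verify_download("emu_popgen-1.2.1.tar.gz")` in the download directory returns `True` only if the archive matches the published digest.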