EM-PCA for inferring population structure in the presence of missingness
EMU (v1.2.0)
EMU is software for performing principal component analysis (PCA) on genetic datasets with missing data. It handles both random and non-random missingness by modelling it directly through a truncated SVD approach. EMU takes binary PLINK files (.bed, .bim and .fam) as input.
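To make the idea concrete, here is a minimal NumPy sketch of a generic EM-style PCA that fills in missing genotypes from a truncated SVD. It illustrates the general technique under stated assumptions and is not EMU's actual implementation: the function name `em_pca_sketch`, its parameters, the use of `np.nan` to mark missing entries, and the per-site mean initialisation are all hypothetical choices made for this example.

```python
import numpy as np


def em_pca_sketch(genotypes, n_components=2, n_iter=100, tol=1e-6):
    """Generic EM-style PCA with missing data (illustrative, not EMU's code).

    genotypes: (n_samples, n_sites) array with np.nan marking missing entries.
    Assumes every site has at least one observed genotype.
    Returns (eigenvectors, singular_values, filled_matrix).
    """
    filled = np.array(genotypes, dtype=float)  # work on a copy
    missing = np.isnan(filled)

    # Start by filling each missing entry with the observed mean of its site.
    site_means = np.nanmean(filled, axis=0)
    filled[missing] = np.broadcast_to(site_means, filled.shape)[missing]

    for _ in range(n_iter):
        # Truncated SVD of the centred, currently filled matrix.
        center = filled.mean(axis=0)
        u, s, vt = np.linalg.svd(filled - center, full_matrices=False)
        low_rank = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
        low_rank += center

        # Update only the missing entries with their low-rank estimates.
        change = np.abs(low_rank[missing] - filled[missing]).max(initial=0.0)
        filled[missing] = low_rank[missing]
        if change < tol:
            break

    # Final decomposition of the completed matrix.
    center = filled.mean(axis=0)
    u, s, _ = np.linalg.svd(filled - center, full_matrices=False)
    return u[:, :n_components], s[:n_components], filled
```

Observed genotypes are never modified. Only the missing entries are re-estimated on each pass, which is what lets the low-rank structure, rather than a naive mean, drive the imputation.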
Citation
Please cite our paper in Bioinformatics
Installation
# Option 1: Build and install via PyPI
pip install emu-popgen
# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/emu.git
cd emu
pip install .
# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/emu.git
conda env create -f emu/environment.yml
conda activate emu
You can now run the program with the emu command.
Quick usage
Running EMU
Provide emu with the file prefix of the PLINK files.
# Check help message of the program
emu -h
# Model and extract 2 eigenvectors using the EM-PCA algorithm
emu --bfile test --eig 2 --threads 64 --out test.emu
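Because emu takes a prefix rather than a single file, a quick pre-flight check can catch a missing file before a long run. The sketch below is a hypothetical helper, not part of emu: the names `missing_plink_files` and `plink_dimensions` are made up for this example. It relies only on the standard layout of a binary PLINK fileset, where `.fam` holds one line per sample and `.bim` one line per variant.

```python
from pathlib import Path

# The three files that make up a binary PLINK fileset.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_files(prefix):
    """Return the paths of PLINK files that are absent for this prefix."""
    return [
        f"{prefix}{suffix}"
        for suffix in PLINK_SUFFIXES
        if not Path(f"{prefix}{suffix}").is_file()
    ]


def plink_dimensions(prefix):
    """Count samples (.fam lines) and variants (.bim lines) for this prefix."""

    def count_lines(path):
        # Skip blank lines so a trailing newline does not inflate the count.
        with open(path) as handle:
            return sum(1 for line in handle if line.strip())

    return count_lines(f"{prefix}.fam"), count_lines(f"{prefix}.bim")
```

For the example above, `missing_plink_files("test")` should return an empty list before `emu --bfile test` is run.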
Memory-efficient variant
For large-scale datasets, emu also provides a very memory-efficient variant, enabled with the --mem argument.
# Example run using '--mem' argument
emu --mem --bfile test --eig 2 --threads 64 --out test.emu.mem
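How much memory either mode needs is not stated here, and the sketch below does not measure emu. As a rough guide to dataset scale only, it compares the raw size of a genotype matrix under two storage layouts: the 2-bit packing used by the PLINK .bed format (four genotypes per byte, each variant padded to a whole byte) and a plain one-byte-per-genotype array. The function name `genotype_storage_bytes` and the example dimensions are hypothetical.

```python
import math


def genotype_storage_bytes(n_samples, n_sites):
    """Return (packed_2bit_bytes, one_byte_per_genotype_bytes)."""
    # PLINK .bed packs 4 genotypes per byte, padding each variant to a whole byte.
    # The 3-byte file header is ignored here.
    packed = n_sites * math.ceil(n_samples / 4)
    # A plain 8-bit array stores one genotype per byte.
    unpacked = n_sites * n_samples
    return packed, unpacked


packed, unpacked = genotype_storage_bytes(n_samples=100_000, n_sites=1_000_000)
print(f"2-bit packed: {packed / 1e9:.1f} GB, 8-bit: {unpacked / 1e9:.1f} GB")
# prints "2-bit packed: 25.0 GB, 8-bit: 100.0 GB"
```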
Download files
Source Distribution
emu_popgen-1.2.0.tar.gz
File details
Details for the file emu_popgen-1.2.0.tar.gz.
File metadata
- Download URL: emu_popgen-1.2.0.tar.gz
- Upload date:
- Size: 326.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd51001be069a301dd9e97eaf4196e8c0e97503b8a32b43bcf9e1bbd809b5d1d |
| MD5 | 09cef6d5ae839458f33d784f8615261f |
| BLAKE2b-256 | ab7d1b217040212e570d806c7f0b7428d1f5dc1b58ff223f97022ddf5233f7ba |