EM-PCA for inferring population structure in the presence of missingness

EMU (v1.6.0)

EMU is software for performing principal component analysis (PCA) on genetic datasets in the presence of missingness. EMU can handle both random and non-random missingness by modelling it directly through a truncated SVD approach. EMU uses binary PLINK files as input.

Citation

Please cite our paper in Bioinformatics

Installation

# Option 1: Build and install via PyPI
pip install emu-popgen

# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/emu.git
cd emu
pip install .

# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/emu.git
conda env create -f emu/environment.yml
conda activate emu

You can now run the program with the emu command.

If you run into issues with your installation on an HPC system, the cause may be a mismatch of CPU architectures between login and compute nodes (reported as an "illegal instruction" error). You can try removing every instance of the march=native compiler flag in the setup.py file, which optimizes emu for the specific hardware it is built on. Alternatively, you can use the uv package manager to run emu in a temporary, isolated environment by adding uvx in front of the emu command.

# uv tool run example
uvx emu --bfile test --eig 2 --threads 64 --out test.emu
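
Before editing setup.py to remove the march=native flag mentioned above, it can help to see where the flag occurs. The following is a minimal sketch and not part of EMU: the function name find_march_native is hypothetical, and it only lists matching lines, leaving the edit itself to you.

```shell
#!/bin/sh
# Hypothetical helper (not part of EMU): print every line of a file that
# mentions march=native, prefixed with its line number.
# Returns non-zero when the file contains no such line.
find_march_native() {
    grep -n -- 'march=native' "$1"
}

# Usage, from the root of the cloned repository:
# find_march_native setup.py
# After removing the flag by hand, reinstall with: pip install .
```

Listing the matches first, rather than rewriting the file automatically, avoids leaving an empty or malformed entry in the list of compiler arguments.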

Quick usage

Running EMU

Provide emu with the file prefix of the binary PLINK files (for example, test for test.bed, test.bim and test.fam).

# Check help message of the program
emu -h

# Model and extract 2 eigenvectors using the EM-PCA algorithm
emu --bfile test --eig 2 --threads 64 --out test.emu

# Use 2 eigenvectors for modelling but extract 10 eigenvectors
emu --bfile test --eig 2 --eig-out 10 --threads 64 --out test.emu
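
Because emu takes a file prefix rather than a file name, a missing or misnamed file is easy to overlook. The following is a minimal sketch, not part of EMU, that checks for the three files of a binary PLINK fileset (.bed, .bim, .fam) before a run is started. The function name check_plink_prefix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of EMU): verify that a binary PLINK
# fileset (.bed, .bim, .fam) exists for a given prefix.
# Returns 0 when all three files are present, 1 otherwise.
check_plink_prefix() {
    prefix=$1
    missing=0
    for ext in bed bim fam; do
        if [ ! -f "$prefix.$ext" ]; then
            # Report each missing file on stderr.
            echo "missing: $prefix.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: only start emu when all three files are present.
# check_plink_prefix test && emu --bfile test --eig 2 --threads 64 --out test.emu
```

The check only confirms that the files exist; it does not validate their contents.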

Memory-efficient variant

A very memory-efficient variant of emu for large-scale datasets, enabled with the --mem argument.

# Example run using '--mem' argument
emu --mem --bfile test --eig 2 --threads 64 --out test.emu.mem
