
EM-PCA for inferring population structure in the presence of missingness


EMU

EMU is software for performing principal component analysis (PCA) on genetic datasets with missing data. It handles both random and non-random missingness by modelling it directly through a truncated SVD approach. EMU takes binary PLINK files as input.

Citation

Please cite our paper in Bioinformatics: https://doi.org/10.1093/bioinformatics/btab027

Installation

# Build and install via PyPI
pip install emu-popgen

# Download source and install via pip
git clone https://github.com/Rosemeis/emu.git
cd emu
pip install .

# Download source and install in new Conda environment
git clone https://github.com/Rosemeis/emu.git
cd emu
conda env create -f environment.yml
conda activate emu

# You can now run the program with the `emu` command

Quick usage

Running EMU

Provide `emu` with the file prefix of the binary PLINK files (.bed, .bim, .fam).
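Since the program takes a prefix rather than a file name, it can help to confirm that the whole fileset is present before starting a long run. The following is a minimal sketch, not part of EMU; the helper names `plink_files` and `missing_plink_files` are hypothetical. The only assumption is that a binary PLINK fileset consists of `.bed`, `.bim`, and `.fam` files sharing one prefix.

```python
from pathlib import Path

# Extensions that make up a binary PLINK fileset.
PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def plink_files(prefix):
    """Return the three paths expected for a given PLINK prefix.

    The extension is appended as plain text because a prefix may itself
    contain dots (for example "test.emu").
    """
    return [Path(f"{prefix}{ext}") for ext in PLINK_EXTENSIONS]


def missing_plink_files(prefix):
    """Return the expected files that do not exist on disk."""
    return [path for path in plink_files(prefix) if not path.is_file()]


if __name__ == "__main__":
    absent = missing_plink_files("test")
    if absent:
        print("Missing PLINK files:", ", ".join(str(p) for p in absent))
    else:
        print("All PLINK files found for prefix 'test'")
```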

# Check help message of the program
emu -h

# Model and extract 2 eigenvectors using the EM-PCA algorithm
emu --bfile test --eig 2 --threads 64 --out test.emu
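For intuition about what an EM-style PCA with missing data does, here is a minimal NumPy sketch. It is a generic illustration and not EMU's implementation: it assumes a dense matrix of genotypes coded 0/1/2 with `np.nan` for missing entries, every site having at least one observed genotype, and a full SVD in place of a truncated one. The function name `em_pca_sketch` and its parameters are hypothetical.

```python
import numpy as np


def em_pca_sketch(G, n_eig=2, n_iter=100, tol=1e-5):
    """Iteratively impute missing genotypes from a rank-n_eig SVD model.

    G : (n_samples, n_sites) array, genotypes 0/1/2, np.nan where missing.
    Returns (eigenvectors, singular values, imputed matrix).
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)

    # Per-site mean genotype from the observed entries only.
    site_mean = np.nanmean(G, axis=0)

    # Start by filling missing entries with the site mean, so they are
    # zero after centering and do not influence the first SVD.
    E = np.where(missing, site_mean, G)

    for _ in range(n_iter):
        # Low-rank model of the centered, currently imputed matrix.
        U, s, Vt = np.linalg.svd(E - site_mean, full_matrices=False)
        recon = (U[:, :n_eig] * s[:n_eig]) @ Vt[:n_eig] + site_mean
        recon = np.clip(recon, 0.0, 2.0)  # keep values in genotype range

        # Update only the missing entries; observed genotypes stay fixed.
        E_new = np.where(missing, recon, G)
        diff = E_new[missing] - E[missing]
        E = E_new
        if diff.size == 0 or np.sqrt(np.mean(diff ** 2)) < tol:
            break

    # Final decomposition on the converged matrix.
    U, s, Vt = np.linalg.svd(E - site_mean, full_matrices=False)
    return U[:, :n_eig], s[:n_eig], E


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.integers(0, 3, size=(20, 50)).astype(float)
    G[rng.random(G.shape) < 0.1] = np.nan  # about 10% missing at random
    eigvecs, svals, imputed = em_pca_sketch(G, n_eig=2)
    print(eigvecs.shape, svals)
```

The key design point is that missing entries are predicted from the low-rank structure rather than left at the site mean, which would otherwise pull samples with many missing genotypes toward the origin of the PCA plot.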

Memory-efficient implementation

A more memory-efficient implementation is also available. It is based on the randomized SVD algorithm and uses custom matrix multiplications that can handle decomposed matrices. Only the factor matrices and the 2-bit genotype matrix are kept in memory.

# Example run using '--mem' argument
emu --mem --bfile test --eig 2 --threads 64 --out test.emu.mem
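The randomized SVD idea behind the memory-efficient mode can be sketched in a few lines of NumPy. This is a generic version of the algorithm with power iterations, not EMU's code: it operates on an explicit dense matrix `A`, whereas EMU is described as using custom multiplications on decomposed matrices so that the full matrix is never formed. The function name `randomized_svd_sketch` and its parameters are hypothetical.

```python
import numpy as np


def randomized_svd_sketch(A, k, n_oversamples=10, n_power_iter=4, seed=0):
    """Approximate the top-k SVD of A using random projections.

    A is only accessed through products A @ X and A.T @ X, which is why
    the approach suits matrices that are stored in factored form.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    n_cols = min(k + n_oversamples, m, n)

    # Project A onto a random subspace and orthonormalize the result.
    omega = rng.standard_normal((n, n_cols))
    Q, _ = np.linalg.qr(A @ omega)

    # Power iterations sharpen the subspace toward the top singular vectors.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Solve a small SVD in the reduced space and map back.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 60))
    U, s, Vt = randomized_svd_sketch(A, k=3)
    print(np.allclose((U * s) @ Vt, A))
```

Only the thin matrices `Q`, `Z`, and `B` are held in addition to `A`, so memory use scales with the number of requested components rather than with a second full-size matrix.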


