Fast Ancestry Estimation

fastmixture (v0.93.1)

fastmixture is a software tool for estimating ancestry proportions in unrelated individuals. It is significantly faster than previous model-based software while providing accurate and robust ancestry estimates.

Installation

# Build and install via PyPI
pip install fastmixture

# or download source and install via pip
git clone https://github.com/Rosemeis/fastmixture.git
cd fastmixture
pip install .

# or download source and install in a new Conda environment
git clone https://github.com/Rosemeis/fastmixture.git
cd fastmixture
conda env create -f environment.yml
conda activate fastmixture


# You can now run the program with the `fastmixture` command

Citation

Please cite our preprint on bioRxiv.

Usage

fastmixture requires input data in binary PLINK format.

  • Choose the value of K that best fits your data. We recommend performing principal component analysis (PCA) first as an exploratory analysis before running fastmixture.
  • Run the analysis with multiple seeds (e.g. ≥ 5) to confirm that the results are robust and reliable.

# Using binary PLINK files for K=3
fastmixture --bfile data --K 3 --threads 32 --seed 1 --out test

# Outputs Q and P files (test.K3.s1.Q and test.K3.s1.P)
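
The multi-seed recommendation can be scripted as a small loop. The following is a minimal sketch for a POSIX shell, not part of fastmixture itself: the helper name `run_seeds` and its argument order are hypothetical, and it uses only the flags shown in the command above. It assumes each run follows the output naming shown above (`<out>.K<K>.s<seed>.Q`), so runs with different seeds can share one `--out` prefix.

```shell
# Hypothetical helper (not part of fastmixture): one run per seed for a fixed K.
# Usage: run_seeds BFILE OUT K SEED [SEED ...]
run_seeds() {
    bfile="$1"; out="$2"; k="$3"
    shift 3
    for seed in "$@"; do
        # Stop at the first failing run so a broken setup is noticed early.
        fastmixture --bfile "$bfile" --K "$k" --threads 32 --seed "$seed" --out "$out" || return 1
    done
}
```

For example, `run_seeds data test 3 1 2 3 4 5` would run K=3 with seeds 1 through 5 on the PLINK files prefixed `data`.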

Supervised

A supervised mode is available in fastmixture using --supervised. Provide a single-column file of integer population assignments for the individuals. Unknown or admixed individuals must be given a value of '0'.

# Using binary PLINK files for K=3 in supervised mode
fastmixture --bfile data --K 3 --threads 32 --seed 1 --out super.test --supervised data.labels

# Outputs Q and P files (super.test.K3.s1.Q and super.test.K3.s1.P)
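
A label file like `data.labels` could be generated from a list of population names. The sketch below is illustrative only and rests on several assumptions not stated above: the input is a hypothetical file with one population name per line, the lines are in the same order as the individuals in the PLINK files, the known populations are numbered 1, 2, ... in order of first appearance, and the placeholder `NA` marks unknown or admixed individuals.

```shell
# Hypothetical helper (not part of fastmixture): names -> integer labels.
# Usage: make_labels NAMES_FILE > data.labels
make_labels() {
    awk '
        $1 == "NA"  { print 0; next }     # unknown or admixed individual
        !($1 in id) { id[$1] = ++n }      # first time this population is seen
                    { print id[$1] }      # integer label for a known population
    ' "$1"
}
```

With a hypothetical input `pops.txt`, running `make_labels pops.txt > data.labels` would produce a file that can be passed to `--supervised`.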

Extra options

  • --iter, maximum number of iterations for the EM algorithm (default: 1000)
  • --tole, tolerance for convergence of the EM algorithm (default: 0.5)
  • --batches, number of initial mini-batches (default: 32)
  • --check, number of iterations performed between convergence checks (default: 5)
  • --power, number of power iterations in the randomized SVD (default: 11)
  • --chunk, number of SNPs to process at a time in the randomized SVD (default: 8192)
  • --als-iter, maximum number of iterations in the ALS procedure (default: 1000)
  • --als-tole, tolerance for convergence of the ALS procedure (default: 1e-4)
  • --no-freqs, do not save ancestral allele frequencies (P-matrix)
  • --random-init, use random parameter initialization instead of SVD
  • --safety, only perform safety updates

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Authors and Acknowledgements

  • Jonas Meisner, Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen
  • Cindy Santander, Computational and RNA Biology, University of Copenhagen
  • Alba Refoyo Martinez, Center for Health Data Science, University of Copenhagen
