
ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Ancestry Inference

Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering


ADAMIXTURE is an unsupervised global ancestry inference method that scales the ADMIXTURE model to biobank-sized datasets. It combines the Expectation–Maximization (EM) framework with the Adam first-order optimizer, enabling parameter updates after a single EM step. This approach accelerates convergence while maintaining comparable or improved accuracy, substantially reducing runtime on large genotype datasets. For more information, we recommend reading our preprint.
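
As a rough illustration of the optimizer side of this idea, the sketch below applies a textbook Adam update (bias-corrected first and second moment estimates) to a parameter vector, with defaults copied from the --lr, --beta1, --beta2 and --reg_adam values listed under Other options. How ADAMIXTURE actually couples an EM step to Adam is described in the preprint; here we only assume, hypothetically, that the difference between the current parameters and their one-step EM update acts as a pseudo-gradient, and we clip the result so allele frequencies stay inside (0, 1). The names Adam and adam_em_step are illustrative and are not part of the package.

```python
import numpy as np


class Adam:
    """Textbook Adam optimizer with bias-corrected moment estimates."""

    def __init__(self, shape, lr=0.005, beta1=0.80, beta2=0.88, eps=1e-8):
        # Defaults mirror the documented --lr, --beta1, --beta2 and --reg_adam values.
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)  # running mean of the gradient
        self.v = np.zeros(shape)  # running mean of the squared gradient
        self.t = 0                # number of steps taken so far

    def step(self, params, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # undo the zero-initialization bias
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def adam_em_step(p, em_update, opt):
    """HYPOTHETICAL Adam-EM step: (p - EM(p)) is used as a pseudo-gradient."""
    pseudo_grad = p - em_update(p)
    p_new = opt.step(p, pseudo_grad)
    return np.clip(p_new, 1e-5, 1 - 1e-5)  # keep frequencies strictly inside (0, 1)
```

For instance, with a toy map em = lambda p: 0.5 * (p + 0.3), whose fixed point is 0.3, and a start of p = np.array([0.9]), the first call moves p to about 0.895: bias correction makes the very first Adam step roughly --lr in size regardless of the gradient's scale.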

The software is invoked from the command line and has an interface similar to ADMIXTURE's; in particular, its output files are fully interchangeable with ADMIXTURE's.

System requirements

Hardware requirements

Using this package requires a computer with enough RAM to hold the large genotype datasets it is designed for. For this reason, we recommend using compute clusters whenever available to avoid running out of memory.
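
To get a feel for the numbers, the sketch below computes the size of a PLINK 1 .bed file (3 magic bytes followed by 2 bits per genotype, with each SNP padded to a whole byte) and a hypothetical in-memory footprint. The in-memory figure rests on our own assumptions, not on ADAMIXTURE's actual data layout: one byte per unpacked genotype, 8-byte floats for P and Q, and two Adam moment buffers per parameter. Both function names are illustrative.

```python
def bed_file_bytes(n_samples: int, n_snps: int) -> int:
    """Size of a PLINK 1 .bed file in SNP-major mode.

    3 magic bytes, then 2 bits per genotype; each SNP's block is padded
    to a whole number of bytes, i.e. ceil(n_samples / 4) bytes per SNP.
    """
    return 3 + n_snps * ((n_samples + 3) // 4)


def in_memory_bytes(n_samples: int, n_snps: int, k: int,
                    genotype_bytes: int = 1, float_bytes: int = 8,
                    adam_buffers: int = 2) -> int:
    """HYPOTHETICAL RAM estimate; not ADAMIXTURE's actual data layout.

    Assumes an unpacked genotype matrix (one byte per genotype), float
    matrices P (SNPs x K) and Q (samples x K), and two Adam moment
    buffers per parameter. Temporary work arrays are ignored.
    """
    genotypes = n_samples * n_snps * genotype_bytes
    params = (n_snps * k + n_samples * k) * float_bytes
    return genotypes + params * (1 + adam_buffers)
```

For 10,000 samples, 100,000 SNPs and K = 8, bed_file_bytes gives 250,000,003 bytes (about 250 MB) while in_memory_bytes gives 1,021,120,000 bytes (about 1 GB), so under these assumptions the unpacked genotype matrix dominates.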

Software requirements

We recommend creating a fresh Python 3.10+ virtual environment. For faster installation, use uv (or pixi); virtualenv and conda also work.

[!IMPORTANT]
If you plan to use GPU acceleration, ensure that the CUDA toolkit is correctly loaded (e.g., module load cuda) before starting the installation. This ensures that the dependencies and internal components are correctly configured for your hardware.

As an example, using uv (recommended):

$ uv venv --python 3.10
$ source .venv/bin/activate
$ uv pip install adamixture

Or using virtualenv:

$ virtualenv --python=python3.10 ~/venv/nadmenv
$ source ~/venv/nadmenv/bin/activate
(nadmenv) $ pip install adamixture

[!IMPORTANT] macOS users: ADAMIXTURE requires OpenMP for parallel processing. You must install libomp (e.g., via Homebrew) before installing the package; otherwise, compilation will fail:

$ brew install libomp

Installation Guide

The package installs with pip in a few minutes (add the --upgrade flag when updating to a newer version):

(nadmenv) $ pip install adamixture

Running ADAMIXTURE

To train a model, run the following commands from the root directory of the project. For a description of all arguments, run adamixture --help. Input genotype data can be in BED, VCF or PGEN format.

[!TIP] GPU Acceleration: Using GPUs greatly speeds up processing and is highly recommended for large datasets. You can specify the hardware to use with the --device parameter:

  • For NVIDIA GPUs, use --device gpu (requires CUDA).
  • For macOS users with Apple Silicon (M1/M2/M3/M4/M5), use --device mps to enable Metal Performance Shaders (MPS) acceleration.
  • Biobank-scale datasets are best handled on dedicated CUDA-capable GPUs because of their high memory requirements.

As an example, the following ADMIXTURE call

$ ./admixture snps_data.bed 8 -s 42

is equivalent to the following ADAMIXTURE command:

$ adamixture -k 8 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_data -s 42

Two files are written to the SAVE_PATH directory, with the --name value used to build the filenames:

  • A .P file with the estimated allele frequencies, in the same format as ADMIXTURE.
  • A .Q file with the estimated admixture proportions, in the same format as ADMIXTURE.
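
Because the .Q file follows ADMIXTURE's format (plain whitespace-separated text, one row per sample and one column per ancestral population), it can be loaded directly with NumPy. The helper names below are hypothetical and the row-sum check is simply a sanity test we add; the exact output filename depends on --name, so pass whichever .Q path you find in SAVE_PATH.

```python
import numpy as np


def load_q(source, atol=1e-3):
    """Read an ADMIXTURE-format .Q file (one row per sample, one column per cluster).

    `source` can be a file path or a list of text lines; np.loadtxt accepts both.
    """
    q = np.loadtxt(source, ndmin=2)  # ndmin=2 keeps a single-sample file 2-D
    if not np.allclose(q.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of a .Q file should sum to 1")
    return q


def main_cluster(q):
    """Index of the cluster with the largest ancestry proportion for each sample."""
    return q.argmax(axis=1)
```

The same np.loadtxt call reads a .P file, whose rows correspond to SNPs instead of samples (P rows do not sum to 1, so skip the check there).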

Logs are printed to standard output by default. To also save them to a file, pipe the output through tee:

$ adamixture -k 8 ... | tee run.log

Running with multi-threading

To run ADAMIXTURE using multiple CPU threads, use the -t flag:

$ adamixture -k 8 --data_path data.bed --save_dir out/ --name test -t 8

Running with GPU acceleration

To leverage GPU acceleration (highly recommended for large datasets), use the --device flag:

  • NVIDIA GPU (CUDA):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device gpu
    
  • macOS Apple Silicon (MPS):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device mps
    


[!TIP] Biobank-Scale Execution & High K Values: For large-scale datasets (e.g., UK Biobank, All of Us) with high K values, we recommend the following parameter settings for optimal convergence and performance:

$ adamixture -k 8 --data_path data.bed --save_dir out/ --name test \
    --patience_adam 5 \
    --lr_decay 0.85 \
    --lr 0.0075
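
This README does not spell out how --lr, --lr_decay, --patience_adam and --min_lr interact. The sketch below shows one common rule, reduce-on-plateau, under stated assumptions: the log-likelihood is checked periodically (as --check suggests), a check counts as a stall when it fails to beat the best value so far by more than a hypothetical min_delta, and after patience consecutive stalls the learning rate is multiplied by lr_decay and floored at min_lr. It is not ADAMIXTURE's code, and the stopping rule governed by --tol_adam is left out.

```python
def plateau_schedule(loglikes, lr=0.005, lr_decay=0.5, patience=2,
                     min_lr=1e-6, min_delta=0.1):
    """HYPOTHETICAL reduce-on-plateau rule, not ADAMIXTURE's implementation.

    loglikes: log-likelihood values seen at successive checks (higher is better).
    min_delta: hypothetical improvement threshold, not a documented flag.
    Returns the learning rate in effect after each check.
    """
    best = float("-inf")
    stalls = 0
    history = []
    for ll in loglikes:
        if ll > best + min_delta:    # clear improvement: remember it, reset the counter
            best = ll
            stalls = 0
        else:
            stalls += 1
            if stalls >= patience:   # plateau: shrink the step, but never below min_lr
                lr = max(lr * lr_decay, min_lr)
                stalls = 0
        history.append(lr)
    return history
```

Under this rule, the recommended biobank settings make each reduction gentler and less frequent: after three reductions the learning rate is 0.0075 × 0.85³ ≈ 0.0046, whereas the defaults give 0.005 × 0.5³ = 0.000625.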

Multi-K Sweep

Instead of running ADAMIXTURE for a single K, you can automatically sweep over a range of K values using --min_k and --max_k. The data is loaded once, and each K is trained sequentially:

$ adamixture --min_k 2 --max_k 10 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_sweep
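
If you drive ADAMIXTURE from a Python pipeline, one option is to assemble the command line yourself and hand it to subprocess. The helper below is hypothetical and uses only the flags documented in this README; it enforces the documented rule that -k is required unless both --min_k and --max_k are given, and restricts --device to the documented choices.

```python
def adamixture_command(data_path, save_dir, name, k=None, min_k=None, max_k=None,
                       device="cpu", threads=1, seed=42):
    """HYPOTHETICAL helper: build an adamixture argument list from documented flags."""
    sweep = min_k is not None or max_k is not None
    if sweep and (min_k is None or max_k is None):
        raise ValueError("--min_k and --max_k must be used together")
    if sweep == (k is not None):
        raise ValueError("give exactly one of -k or a --min_k/--max_k range")
    if device not in {"cpu", "gpu", "mps"}:
        raise ValueError("--device must be cpu, gpu or mps")
    cmd = ["adamixture"]
    if sweep:
        cmd += ["--min_k", str(min_k), "--max_k", str(max_k)]
    else:
        cmd += ["-k", str(k)]
    cmd += ["--data_path", str(data_path), "--save_dir", str(save_dir),
            "--name", name, "--device", device,
            "-t", str(threads), "-s", str(seed)]
    return cmd
```

Pass the resulting list to subprocess.run(cmd, check=True) to launch the run. For example, adamixture_command("snps_data.bed", "SAVE_PATH", "snps_sweep", min_k=2, max_k=10) reproduces the sweep above with the default device, thread count and seed spelled out explicitly.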

Other options

  • --lr (float, default: 0.005):
    Learning rate used by the Adam optimizer in the EM updates.

  • --min_lr (float, default: 1e-6):
    Minimum learning rate used by the Adam optimizer in the EM updates.

  • --lr_decay (float, default: 0.5):
    Learning rate decay factor.

  • --beta1 (float, default: 0.80):
    Exponential decay rate for the first moment estimates in Adam.

  • --beta2 (float, default: 0.88):
    Exponential decay rate for the second moment estimates in Adam.

  • --reg_adam (float, default: 1e-8):
    Numerical stability constant (epsilon) for the Adam optimizer.

  • --patience_adam (int, default: 2):
    Patience for reducing the learning rate in Adam-EM.

  • --tol_adam (float, default: 0.1):
    Tolerance for stopping the Adam-EM algorithm.

  • --data_path (str, required):
    Path to the genotype data (BED, VCF or PGEN).

  • --save_dir (str, required):
    Directory where the output files will be saved.

  • --name (str, required):
    Experiment/model name used as prefix for output files.

  • --device (str, default: cpu):
    Target hardware for computation. Choices: cpu, gpu (NVIDIA/CUDA), or mps (Apple Metal).

  • -s (int, default: 42):
    Random number generator seed for reproducibility.

  • -k (int):
    Number of ancestral populations (clusters) to infer. Required if --min_k/--max_k are not specified.

  • --min_k (int):
    Minimum K for a multi-K sweep (inclusive). Must be used together with --max_k.

  • --max_k (int):
    Maximum K for a multi-K sweep (inclusive). Must be used together with --min_k.

  • --no_freqs (flag):
    If set, the P (allele frequencies) matrix is not saved to disk. Only the Q (admixture proportions) file will be written.

  • --max_iter (int, default: 1500):
    Maximum number of Adam-EM iterations.

  • --check (int, default: 5):
    Frequency (in iterations) at which the log-likelihood is evaluated.

  • --max_als (int, default: 1000):
    Maximum number of iterations for the ALS solver.

  • --tol_als (float, default: 1e-4):
    Convergence tolerance for the ALS optimization.

  • --power (int, default: 5):
    Number of power iterations used in randomized SVD.

  • --tol_svd (float, default: 1e-1):
    Convergence tolerance for the SVD approximation.

  • --chunk_size (int, default: 4096):
    Number of SNPs processed per chunk in the SVD computations.

  • -t (int, default: 1):
    Number of CPU threads used during execution.
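
The --power, --tol_svd and --chunk_size options refer to a randomized SVD. As a generic illustration of that technique (the Halko, Martinsson and Tropp range finder with power iterations), the sketch below factorizes a samples-by-SNPs matrix while forming products with it one block of SNP columns at a time, which is our guess at what --chunk_size controls. It omits the --tol_svd convergence check, and the idea that ADAMIXTURE uses the result to initialize P and Q before the ALS solver (--max_als, --tol_als) is an assumption on our part.

```python
import numpy as np


def randomized_svd(G, rank, power=5, oversample=10, chunk_size=4096, seed=42):
    """Generic randomized SVD with power iterations (Halko, Martinsson & Tropp).

    G is a (samples x SNPs) matrix. Products with G are formed one block of
    SNP columns at a time; treating that block size as --chunk_size is an
    assumption. The --tol_svd convergence check is not modelled.
    """
    rng = np.random.default_rng(seed)
    n, m = G.shape
    omega = rng.standard_normal((m, rank + oversample))  # random test matrix

    def g_times(X):   # G @ X, accumulated over SNP chunks
        out = np.zeros((n, X.shape[1]))
        for s in range(0, m, chunk_size):
            out += G[:, s:s + chunk_size] @ X[s:s + chunk_size]
        return out

    def gt_times(Y):  # G.T @ Y, filled in one SNP chunk at a time
        out = np.empty((m, Y.shape[1]))
        for s in range(0, m, chunk_size):
            out[s:s + chunk_size] = G[:, s:s + chunk_size].T @ Y
        return out

    basis, _ = np.linalg.qr(g_times(omega))      # orthonormal basis for range(G)
    for _ in range(power):                       # power iterations sharpen the spectrum
        z, _ = np.linalg.qr(gt_times(basis))
        basis, _ = np.linalg.qr(g_times(z))
    small = gt_times(basis).T                    # basis.T @ G, shape (rank + oversample, SNPs)
    u_small, s, vt = np.linalg.svd(small, full_matrices=False)
    return (basis @ u_small)[:, :rank], s[:rank], vt[:rank]
```

Oversampling a few extra columns beyond the target rank and running a handful of power iterations is what makes the leading singular vectors accurate even when the spectrum decays slowly, at the cost of extra passes over the data.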

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Troubleshooting

CUDA issues

If you get an error similar to the following when using the GPU:

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Simply installing nvcc using conda or mamba should fix it:

$ conda install -c nvidia nvcc

macOS compilation issues

If you get errors related to OpenMP (OMP) during installation on macOS, ensure you have libomp installed via Homebrew:

$ brew install libomp

Cite

When using this software, please cite the following preprint:

@article{saurina2026adamixture,
  title={ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering},
  author={Saurina-i-Ricos, Joan and Mas Monserrat, Daniel and Ioannidis, Alexander G.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.02.13.700171},
  url={https://doi.org/10.64898/2026.02.13.700171}
}



Download files

Download the file for your platform.

Source Distribution

  • adamixture-1.5.4.tar.gz (1.5 MB): Source

Built Distributions

  • adamixture-1.5.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (709.4 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.4-cp312-cp312-macosx_14_0_x86_64.whl (615.9 kB): CPython 3.12, macOS 14.0+, x86-64
  • adamixture-1.5.4-cp312-cp312-macosx_14_0_arm64.whl (845.3 kB): CPython 3.12, macOS 14.0+, ARM64
  • adamixture-1.5.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (711.0 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.4-cp311-cp311-macosx_14_0_x86_64.whl (610.7 kB): CPython 3.11, macOS 14.0+, x86-64
  • adamixture-1.5.4-cp311-cp311-macosx_14_0_arm64.whl (839.7 kB): CPython 3.11, macOS 14.0+, ARM64
  • adamixture-1.5.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (721.7 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.4-cp310-cp310-macosx_14_0_x86_64.whl (613.8 kB): CPython 3.10, macOS 14.0+, x86-64
  • adamixture-1.5.4-cp310-cp310-macosx_14_0_arm64.whl (842.7 kB): CPython 3.10, macOS 14.0+, ARM64

File details

Details for the file adamixture-1.5.4.tar.gz.

File metadata

  • Download URL: adamixture-1.5.4.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

  • SHA256: 19307389bddd950c8ab106bc40a5eb3d39c47d79110cf4b10da1d3fbea3998f7
  • MD5: 472b3163948a9d5c9981af22acd539ff
  • BLAKE2b-256: 41f739bb7bb148262d196fd87e64b2090904b04ef8d6e0408f0c3fb61f5ad4c2

File details

Details for the file adamixture-1.5.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

  • SHA256: 36f1b79ccc56a65a94362a544674b38c12edf785e59131fd35079716b92dfedb
  • MD5: eac81bb3832affde107b7e41084f67b2
  • BLAKE2b-256: 7b4b607c5b8b8e4ec8b6fa47ea11a1c1f5e03b8fed9631500ad37e3e1dd3a2dd

File details

Details for the file adamixture-1.5.4-cp312-cp312-macosx_14_0_x86_64.whl.

File hashes

  • SHA256: a1efe817f0090f4662fd709944f49da1c448569a7d639abf8e37523d83667504
  • MD5: 96b7313b225b936d208e787f4bb98630
  • BLAKE2b-256: 83f904a33c8f32d96825832056708bb122f1bcf693d1fb68e86fdd77939a9671

File details

Details for the file adamixture-1.5.4-cp312-cp312-macosx_14_0_arm64.whl.

File hashes

  • SHA256: 367536504c77887125b721deccb5574aee8bb3609963f7f762d2982927fd039a
  • MD5: 1aeb1f678f2dd960c10022b71b20cf52
  • BLAKE2b-256: ae2eeaf9647fe5db9a90c09f6ba313f00bba0b4fbdaf951f9125e99ebee131a5

File details

Details for the file adamixture-1.5.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

  • SHA256: 7326ae06d43f375f7f2a1602120e1ff3092351e6e8554becad794252b99eb3d0
  • MD5: 5b0cc51fec6a3b132bae289616d18cbf
  • BLAKE2b-256: 249e2ea8ba8066554977d317a6cef51cf718f4f1f4d798323341eb82f9888bad

File details

Details for the file adamixture-1.5.4-cp311-cp311-macosx_14_0_x86_64.whl.

File hashes

  • SHA256: 5b27521280f03d7db5429084468a2c0e33145b315527a993b595ca96e1d7db80
  • MD5: e69837271ee3c05352fb6e3e75a7b1e0
  • BLAKE2b-256: 2506cab3106e922cbe4c75ae321c2f4483f8a8dfd43efd67efe8ec96dbe95936

File details

Details for the file adamixture-1.5.4-cp311-cp311-macosx_14_0_arm64.whl.

File hashes

  • SHA256: 5bd4619dc9a8eaeb3244b74ff4306d49abaac70edbeaacf978a8e90e7789a246
  • MD5: f2171431aa874fe80ba720851891b2ef
  • BLAKE2b-256: d241519b29f135f6e321c99169a5ee3e2cc3e1207dc1c4495f5e2df0831835c8

File details

Details for the file adamixture-1.5.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

  • SHA256: 05598bc3b73062dd470640723956dead89caeeef6ccd8b3a77b9e1abb55c5438
  • MD5: ba21976d520d0557de677e601ad8b5fd
  • BLAKE2b-256: d28a670c6d8a2efe45cec8d37aae058e55379c197cdd9f8448c22e69531ffd9b

File details

Details for the file adamixture-1.5.4-cp310-cp310-macosx_14_0_x86_64.whl.

File hashes

  • SHA256: f4e3feb1b9f97191583a0cef8954bc365d8d634051e80c37c25346ee0f362f9c
  • MD5: 294d87a4911bdfd6b34713821afbc167
  • BLAKE2b-256: 461f5ecd6363aa1746663e5aa1b2984acb09eab6a2b778462e42c031ded10845

File details

Details for the file adamixture-1.5.4-cp310-cp310-macosx_14_0_arm64.whl.

File hashes

  • SHA256: 4129e844e36cf3efd5e950b38f647a666ff73ed082755b60acde92c9cd76210c
  • MD5: ef809e004689cd0b70bc772de68b34f5
  • BLAKE2b-256: baf6da5153de3bb05859c1b79c7016b62321a9879576695173c7a1d58709c066
