ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering

ADAMIXTURE is an unsupervised global ancestry inference method that scales the ADMIXTURE model to biobank-sized datasets. It combines the Expectation–Maximization (EM) framework with the ADAM first-order optimizer, enabling parameter updates after a single EM step. This approach accelerates convergence while maintaining comparable or improved accuracy, substantially reducing runtime on large genotype datasets. For more information, we recommend reading our pre-print.
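The core mechanic, letting Adam consume a gradient immediately after each EM step rather than iterating EM to convergence, can be sketched with a toy example. This is only an illustration of an Adam update using the documented default hyperparameters (--beta1 0.80, --beta2 0.88, --reg_adam 1e-8), not the package's actual implementation; a simple quadratic stands in for the negative log-likelihood:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.005, beta1=0.80, beta2=0.88, eps=1e-8):
    """One Adam update with bias-corrected moment estimates.

    Defaults mirror ADAMIXTURE's documented CLI defaults
    (--lr, --beta1, --beta2, --reg_adam)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Each loop iteration plays the role of a single EM step: compute one
# gradient, then let Adam update the parameters right away.
theta = np.array([2.0, -3.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2.0 * theta                          # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)

print(theta)  # close to the optimum at the origin
```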

The software is invoked via a CLI with an interface similar to ADMIXTURE's (e.g. the output format is fully interchangeable).


System requirements

Hardware requirements

Successful use of this package requires a machine with enough RAM to handle the large genotype datasets the method is designed for. We therefore recommend using a compute cluster whenever available to avoid memory issues.

Software requirements

We recommend creating a fresh Python 3.10 virtual environment using virtualenv (or conda) and then installing the adamixture package there. For example, with virtualenv, launch the following commands:

$ virtualenv --python=python3.10 ~/venv/nadmenv
$ source ~/venv/nadmenv/bin/activate
(nadmenv) $ pip install adamixture

[!IMPORTANT] macOS Users: ADAMIXTURE requires OpenMP for parallel processing. You must install libomp (e.g., via Homebrew) before installing the package; otherwise, compilation will fail:

$ brew install libomp

Installation Guide

The package can be installed in a few minutes using pip (make sure to add the --upgrade flag if updating to a newer version):

(nadmenv) $ pip install adamixture

Running ADAMIXTURE

To train a model, simply invoke the following commands from the root directory of the project. For more information about all the arguments, run adamixture --help. Note that BED, VCF, and PGEN input formats are supported:

[!TIP] GPU Acceleration: Using GPUs greatly speeds up processing and is highly recommended for large datasets. You can specify the hardware to use with the --device parameter:

  • For NVIDIA GPUs, use --device gpu (requires CUDA).
  • For macOS users with Apple Silicon (M1/M2/M3/M4/M5), use --device mps to enable Metal Performance Shaders (MPS) acceleration.
  • Note that biobank-scale datasets are best handled on dedicated CUDA-capable GPUs due to high RAM requirements.

As an example, the following ADMIXTURE call

$ ./admixture snps_data.bed 8 -s 42

would be mimicked in ADAMIXTURE by running

$ adamixture --k 8 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_data --seed 42

Two files will be output to the SAVE_PATH directory (the name parameter is used as the prefix for both filenames):

  • A .P file (allele frequencies), in the same format as ADMIXTURE's.
  • A .Q file (admixture proportions), in the same format as ADMIXTURE's.
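Both files follow ADMIXTURE's plain-text layout: whitespace-separated matrices, with the .Q file holding one row per sample and one column per ancestral population, each row summing to 1. A .Q file can be sanity-checked with a few lines of NumPy (the load_q helper below is hypothetical, not part of the package):

```python
import io
import numpy as np

def load_q(path_or_buf, k):
    """Load an ADMIXTURE-style .Q matrix and sanity-check its contents."""
    q = np.atleast_2d(np.loadtxt(path_or_buf))
    assert q.shape[1] == k, "expected one column per ancestral population"
    assert np.all((q >= 0) & (q <= 1)), "entries must be proportions"
    assert np.allclose(q.sum(axis=1), 1.0), "each sample's row must sum to 1"
    return q

# Tiny inline stand-in for a real output file in SAVE_PATH:
example = io.StringIO("0.10 0.20 0.70\n0.95 0.05 0.00\n")
q = load_q(example, k=3)
print(q.shape)  # (2, 3)
```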

Logs are printed to stdout by default. If you also want to save them to a file, pipe the output through tee:

$ adamixture --k 8 ... | tee run.log

Running with GPU acceleration

To leverage GPU acceleration (highly recommended for large datasets), use the --device flag:

  • NVIDIA GPU (CUDA):
    $ adamixture --k 8 --data_path data.bed --save_dir out/ --name test --device gpu
    
  • macOS Apple Silicon (MPS):
    $ adamixture --k 8 --data_path data.bed --save_dir out/ --name test --device mps
    

[!NOTE]
Biobank-scale datasets are best handled on dedicated CUDA-capable GPUs.

Multi-K Sweep

Instead of running ADAMIXTURE for a single K, you can automatically sweep over a range of K values using --min_k and --max_k. The data is loaded once, and each K is trained sequentially:

$ adamixture --min_k 2 --max_k 10 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_sweep
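Assuming the sweep writes one .P/.Q pair per K using ADMIXTURE-style names such as <name>.<K>.Q (a hypothetical naming scheme; check your save_dir for the exact filenames), the call above would produce outputs along these lines:

```python
# Hypothetical helper: enumerate the files a K-sweep would produce under
# ADMIXTURE-style naming <name>.<K>.P / <name>.<K>.Q. The exact names
# ADAMIXTURE writes may differ.
def sweep_outputs(name, min_k, max_k):
    return [f"{name}.{k}.{ext}"
            for k in range(min_k, max_k + 1)
            for ext in ("P", "Q")]

files = sweep_outputs("snps_sweep", 2, 10)
print(len(files))  # 18: one .P and one .Q for each K from 2 to 10
```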

Other options

  • --lr (float, default: 0.005):
    Learning rate used by the Adam optimizer in the EM updates.

  • --min_lr (float, default: 1e-6):
    Minimum learning rate used by the Adam optimizer in the EM updates.

  • --lr_decay (float, default: 0.5):
    Learning rate decay factor.

  • --beta1 (float, default: 0.80):
    Exponential decay rate for the first moment estimates in Adam.

  • --beta2 (float, default: 0.88):
    Exponential decay rate for the second moment estimates in Adam.

  • --reg_adam (float, default: 1e-8):
    Numerical stability constant (epsilon) for the Adam optimizer.

  • --patience_adam (int, default: 2):
    Patience for reducing the learning rate in Adam-EM.

  • --tol_adam (float, default: 0.1):
    Tolerance for stopping the Adam-EM algorithm.

  • --data_path (str, required):
    Path to the genotype data (BED, VCF or PGEN).

  • --save_dir (str, required):
    Directory where the output files will be saved.

  • --name (str, required):
    Experiment/model name used as prefix for output files.

  • --device (str, default: cpu):
    Target hardware for computation. Choices: cpu, gpu (NVIDIA/CUDA), or mps (Apple Metal).

  • --seed (int, default: 42):
    Random number generator seed for reproducibility.

  • --k (int):
    Number of ancestral populations (clusters) to infer. Required if --min_k/--max_k are not specified.

  • --min_k (int):
    Minimum K for a multi-K sweep (inclusive). Must be used together with --max_k.

  • --max_k (int):
    Maximum K for a multi-K sweep (inclusive). Must be used together with --min_k.

  • --no_freqs (flag):
    If set, the P (allele frequencies) matrix is not saved to disk. Only the Q (admixture proportions) file will be written.

  • --max_iter (int, default: 1500):
    Maximum number of Adam-EM iterations.

  • --check (int, default: 5):
    Frequency (in iterations) at which the log-likelihood is evaluated.

  • --max_als (int, default: 1000):
    Maximum number of iterations for the ALS solver.

  • --tol_als (float, default: 1e-4):
    Convergence tolerance for the ALS optimization.

  • --power (int, default: 5):
    Number of power iterations used in randomized SVD.

  • --tol_svd (float, default: 1e-1):
    Convergence tolerance for the SVD approximation.

  • --chunk_size (int, default: 4096):
    Number of SNPs in chunk operations for SVD.

  • --threads (int, default: 1):
    Number of CPU threads used during execution.
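Several of these options interact: --lr_decay, --patience_adam, and --min_lr suggest a plateau-based schedule in which the learning rate is multiplied by the decay factor whenever the monitored log-likelihood stops improving for patience consecutive checks. The sketch below is a plausible reading of how these flags combine, not the package's actual scheduling code:

```python
def decayed_lr(loglik_history, lr=0.005, decay=0.5, patience=2, min_lr=1e-6):
    """Plateau-style schedule: multiply lr by `decay` each time the
    log-likelihood fails to improve for `patience` consecutive checks,
    flooring at `min_lr`. Defaults mirror the documented CLI defaults
    (--lr, --lr_decay, --patience_adam, --min_lr)."""
    best, stale = float("-inf"), 0
    for ll in loglik_history:
        if ll > best:
            best, stale = ll, 0      # improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:    # plateau detected: decay the lr
                lr = max(lr * decay, min_lr)
                stale = 0
    return lr

print(decayed_lr([-100.0, -90.0, -90.0, -90.0]))  # one decay: 0.0025
```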

License

NOTICE: This software is available for use free of charge for academic research use only. Academic users may fork this repository and modify and improve to suit their research needs, but also inherit these terms and must include a licensing notice to that effect. Commercial users, for profit companies or consultants, and non-profit institutions not qualifying as "academic research" should contact the authors for a separate license. This applies to this repository directly and any other repository that includes source, executables, or git commands that pull/clone this repository as part of its function. Such repositories, whether ours or others, must include this notice.

Troubleshooting

CUDA issues

If you get an error similar to the following when using the GPU:

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Simply installing nvcc using conda or mamba should fix it:

$ conda install -c nvidia nvcc

macOS compilation issues

If you get errors related to OpenMP (OMP) during installation on macOS, ensure you have libomp installed via Homebrew:

$ brew install libomp

Cite

When using this software, please cite the following pre-print:

@article{saurina2026adamixture,
  title={ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering},
  author={Saurina-i-Ricos, Joan and Mas Monserrat, Daniel and Ioannidis, Alexander G.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.02.13.700171},
  url={https://doi.org/10.64898/2026.02.13.700171}
}

Download files

Download the file for your platform.

Source Distribution

  • adamixture-1.5.1.tar.gz (3.1 MB)

Built Distributions

  • adamixture-1.5.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (658.9 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.1-cp312-cp312-macosx_14_0_x86_64.whl (566.3 kB): CPython 3.12, macOS 14.0+, x86-64
  • adamixture-1.5.1-cp312-cp312-macosx_14_0_arm64.whl (795.4 kB): CPython 3.12, macOS 14.0+, ARM64
  • adamixture-1.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (659.8 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.1-cp311-cp311-macosx_14_0_x86_64.whl (561.1 kB): CPython 3.11, macOS 14.0+, x86-64
  • adamixture-1.5.1-cp311-cp311-macosx_14_0_arm64.whl (791.1 kB): CPython 3.11, macOS 14.0+, ARM64
  • adamixture-1.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (667.4 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.1-cp310-cp310-macosx_14_0_x86_64.whl (563.4 kB): CPython 3.10, macOS 14.0+, x86-64
  • adamixture-1.5.1-cp310-cp310-macosx_14_0_arm64.whl (794.2 kB): CPython 3.10, macOS 14.0+, ARM64

File details

Details for the file adamixture-1.5.1.tar.gz.

File metadata

  • Download URL: adamixture-1.5.1.tar.gz
  • Size: 3.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Algorithm   Hash digest
SHA256      d22f516ddbc5d17e36a66a74ab13aef443c13e8ee66559c883092cdca9ad0b38
MD5         ca318eb60167a8947366977ce79d7771
BLAKE2b-256 27c5ceb54b4bd452be9709e36913323c82aaa0536c334c503d63937b073479e2

