ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Ancestry Inference

Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering

ADAMIXTURE is an unsupervised global ancestry inference method that scales the ADMIXTURE model to biobank-sized datasets. It combines the Expectation–Maximization (EM) framework with the Adam first-order optimizer, enabling parameter updates after a single EM step. This approach accelerates convergence while maintaining comparable or improved accuracy, substantially reducing runtime on large genotype datasets. For more information, we recommend reading our preprint.

The software can be invoked via CLI and has a similar interface to ADMIXTURE (e.g. the output format is completely interchangeable).

System requirements

Hardware requirements

This package requires a machine with enough RAM to hold the large datasets the method is designed for. For this reason, we recommend running it on a compute cluster whenever one is available, to avoid memory issues.

Software requirements

We recommend creating a fresh Python 3.10+ virtual environment. For a faster installation experience, we highly recommend using uv (or pixi). Alternatively, you can use virtualenv or conda.

[!IMPORTANT]
If you plan to use GPU acceleration, ensure that the CUDA toolkit is correctly loaded (e.g., module load cuda) before starting the installation. This ensures that the dependencies and internal components are correctly configured for your hardware.

As an example, using uv (recommended):

$ uv venv --python 3.10
$ source .venv/bin/activate
$ uv pip install adamixture

Or using virtualenv:

$ virtualenv --python=python3.10 ~/venv/nadmenv
$ source ~/venv/nadmenv/bin/activate
(nadmenv) $ pip install adamixture

[!IMPORTANT] macOS Users: ADAMIXTURE requires OpenMP for parallel processing. You must install libomp (e.g., via Homebrew) before installing the package, otherwise the compilation will fail:

$ brew install libomp

Installation Guide

The package can be installed with pip, usually within a few minutes (add the --upgrade flag when updating an existing installation):

(nadmenv) $ pip install adamixture

Running ADAMIXTURE

To train a model, invoke the commands shown below from the root directory of the project. For details on all arguments, run adamixture --help. BED, VCF and PGEN input files are supported.

[!TIP] GPU Acceleration: Using GPUs greatly speeds up processing and is highly recommended for large datasets. You can specify the hardware to use with the --device parameter:

  • For NVIDIA GPUs, use --device gpu (requires CUDA).
  • For macOS users with Apple Silicon (M1/M2/M3/M4/M5), use --device mps to enable Metal Performance Shaders (MPS) acceleration.
  • Note that biobank-scale datasets are best handled on dedicated CUDA-capable GPUs due to high RAM requirements.

As an example, the following ADMIXTURE call

$ ./admixture snps_data.bed 8 -s 42

is equivalent to the following ADAMIXTURE call:

$ adamixture -k 8 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_data -s 42

Two files will be output to the SAVE_PATH directory (the name parameter will be used to create the full filenames):

  • A .P file with the allele frequencies, in the same format as ADMIXTURE.
  • A .Q file with the admixture proportions, in the same format as ADMIXTURE.
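
Because the output is described as interchangeable with ADMIXTURE's, a .Q file can be read as whitespace-delimited text with one row per sample and K columns. The sketch below assumes that layout and uses only the Python standard library. The helper names (load_q, dominant_cluster) and the demo filename are hypothetical, since this README does not spell out the exact output filename pattern.

```python
import tempfile
from pathlib import Path


def load_q(path):
    """Read an ADMIXTURE-style .Q file: one row per sample, K proportions per row."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append([float(x) for x in line.split()])
    if not rows:
        raise ValueError(f"{path} contains no data")
    k = len(rows[0])
    if any(len(r) != k for r in rows):
        raise ValueError("rows have inconsistent numbers of columns")
    return rows


def dominant_cluster(q_rows):
    """0-based index of the largest ancestry proportion for each sample."""
    return [max(range(len(r)), key=r.__getitem__) for r in q_rows]


# Demo on a tiny hand-written file (two samples, K = 3).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.Q"  # hypothetical filename
    demo.write_text("0.70 0.20 0.10\n0.05 0.15 0.80\n")
    q = load_q(demo)
    # Each row of proportions should sum to 1 up to rounding.
    assert all(abs(sum(r) - 1.0) < 1e-6 for r in q)
    print(dominant_cluster(q))  # [0, 2]
```

Replace the demo path with the .Q file written to SAVE_PATH to inspect real results.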

Logs are printed to the stdout channel by default. If you want to save them to a file, you can use the command tee along with a pipe:

$ adamixture -k 8 ... | tee run.log

Running with multi-threading

To run ADAMIXTURE using multiple CPU threads, use the -t flag:

$ adamixture -k 8 --data_path data.bed --save_dir out/ --name test -t 8

Running with GPU acceleration

To leverage GPU acceleration (highly recommended for large datasets), use the --device flag:

  • NVIDIA GPU (CUDA):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device gpu
    
  • macOS Apple Silicon (MPS):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device mps
    

[!NOTE]
Biobank-scale datasets are best handled on dedicated CUDA-capable GPUs.
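
In scripts that run on mixed hardware, the value passed to --device can be chosen automatically. The following is a heuristic sketch, not part of ADAMIXTURE: suggest_device is a hypothetical helper, and treating an nvidia-smi binary on the PATH as evidence of a usable NVIDIA GPU is an assumption.

```python
import platform
import shutil


def suggest_device(system=None, machine=None, has_nvidia=None):
    """Suggest a value for --device: 'gpu', 'mps' or 'cpu' (heuristic sketch)."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_nvidia is None:
        # Assumption: nvidia-smi on the PATH signals a working NVIDIA driver.
        has_nvidia = shutil.which("nvidia-smi") is not None
    if has_nvidia:
        return "gpu"
    if system == "Darwin" and machine == "arm64":
        return "mps"  # Apple Silicon
    return "cpu"


print(suggest_device())
```

The arguments exist only so the decision can be checked without the matching hardware. Called with no arguments, the function inspects the current machine.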

[!TIP] Biobank-Scale Execution & High K Values: For large-scale datasets (e.g., UK Biobank, All of Us) with high K values, we recommend the following parameter settings for optimal convergence and performance:

--patience_adam 5 \
--lr_decay 0.85 \
--lr 0.0075
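
To see how these values differ from the defaults (--patience_adam 2, --lr_decay 0.5, --lr 0.005), the sketch below simulates a reduce-on-plateau rule. It assumes the learning rate is multiplied by the decay factor after `patience` consecutive log-likelihood checks without improvement, and that it never falls below --min_lr. That rule is an assumption made for illustration, because this README does not describe the exact schedule. simulate_schedule is a hypothetical helper.

```python
def simulate_schedule(log_likelihoods, lr, lr_decay, patience, min_lr=1e-6):
    """Toy reduce-on-plateau schedule (assumed rule, not taken from the package).

    Returns the learning rate in effect after each log-likelihood check.
    """
    best = float("-inf")
    stalled = 0
    history = []
    for ll in log_likelihoods:
        if ll > best:
            best, stalled = ll, 0
        else:
            stalled += 1
            if stalled >= patience:
                lr = max(lr * lr_decay, min_lr)  # decay, floored at min_lr
                stalled = 0
        history.append(lr)
    return history


# One improving check followed by a plateau of five checks.
plateau = [-100.0] * 6
print(simulate_schedule(plateau, lr=0.005, lr_decay=0.5, patience=2))    # defaults
print(simulate_schedule(plateau, lr=0.0075, lr_decay=0.85, patience=5))  # settings above
```

Under this assumed rule, the recommended settings start from a higher learning rate and reduce it later and more gently than the defaults.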

Multi-K Sweep

Instead of running ADAMIXTURE for a single K, you can automatically sweep over a range of K values using --min_k and --max_k. The data is loaded once, and each K is trained sequentially:

$ adamixture --min_k 2 --max_k 10 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_sweep
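
The rules for choosing K (either -k, or --min_k together with --max_k, with both bounds inclusive) can be sketched as a small validation helper. resolve_k_values is a hypothetical name used for illustration, not a function from the package. Rejecting -k combined with a sweep range is an assumption of this sketch, as the README does not say how that combination is handled.

```python
def resolve_k_values(k=None, min_k=None, max_k=None):
    """Return the list of K values a run would cover (hypothetical helper)."""
    if (min_k is None) != (max_k is None):
        raise ValueError("--min_k and --max_k must be used together")
    if min_k is not None:
        if k is not None:
            # Assumption of this sketch: a single K and a sweep are not combined.
            raise ValueError("use either -k or --min_k/--max_k, not both")
        if min_k > max_k:
            raise ValueError("--min_k must not exceed --max_k")
        return list(range(min_k, max_k + 1))  # both bounds are inclusive
    if k is None:
        raise ValueError("-k is required when no sweep range is given")
    return [k]


print(resolve_k_values(min_k=2, max_k=10))  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(resolve_k_values(k=8))                # [8]
```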

Other options

  • --lr (float, default: 0.005):
    Learning rate used by the Adam optimizer in the EM updates.

  • --min_lr (float, default: 1e-6):
    Minimum learning rate used by the Adam optimizer in the EM updates.

  • --lr_decay (float, default: 0.5):
    Learning rate decay factor.

  • --beta1 (float, default: 0.80):
    Exponential decay rate for the first moment estimates in Adam.

  • --beta2 (float, default: 0.88):
    Exponential decay rate for the second moment estimates in Adam.

  • --reg_adam (float, default: 1e-8):
    Numerical stability constant (epsilon) for the Adam optimizer.

  • --patience_adam (int, default: 2):
    Patience for reducing the learning rate in Adam-EM.

  • --tol_adam (float, default: 0.1):
    Tolerance for stopping the Adam-EM algorithm.

  • --data_path (str, required):
    Path to the genotype data (BED, VCF or PGEN).

  • --save_dir (str, required):
    Directory where the output files will be saved.

  • --name (str, required):
    Experiment/model name used as prefix for output files.

  • --device (str, default: cpu):
    Target hardware for computation. Choices: cpu, gpu (NVIDIA/CUDA), or mps (Apple Metal).

  • -s (int, default: 42):
    Random number generator seed for reproducibility.

  • -k (int):
    Number of ancestral populations (clusters) to infer. Required if --min_k/--max_k are not specified.

  • --min_k (int):
    Minimum K for a multi-K sweep (inclusive). Must be used together with --max_k.

  • --max_k (int):
    Maximum K for a multi-K sweep (inclusive). Must be used together with --min_k.

  • --no_freqs (flag):
    If set, the P (allele frequencies) matrix is not saved to disk. Only the Q (admixture proportions) file will be written.

  • --max_iter (int, default: 1500):
    Maximum number of Adam-EM iterations.

  • --check (int, default: 5):
    Frequency (in iterations) at which the log-likelihood is evaluated.

  • --max_als (int, default: 1000):
    Maximum number of iterations for the ALS solver.

  • --tol_als (float, default: 1e-4):
    Convergence tolerance for the ALS optimization.

  • --power (int, default: 5):
    Number of power iterations used in randomized SVD.

  • --tol_svd (float, default: 1e-1):
    Convergence tolerance for the SVD approximation.

  • --chunk_size (int, default: 4096):
    Number of SNPs in chunk operations for SVD.

  • -t (int, default: 1):
    Number of CPU threads used during execution.
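
For orientation, --lr, --beta1, --beta2 and --reg_adam correspond to the usual Adam hyperparameters. The sketch below is the textbook Adam update for a single scalar parameter, using the defaults listed above. It is not the ADAMIXTURE implementation. How the method derives its update direction from the EM step is covered in the preprint and is not reproduced here. adam_step is a hypothetical helper.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.005, beta1=0.80, beta2=0.88, eps=1e-8):
    """One textbook Adam update (descent form) for a scalar parameter.

    t is the 1-based iteration counter used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias-corrected moments
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy problem: minimise f(x) = (x - 1)^2 starting from x = 0.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2.0 * (x - 1.0), m, v, t)
print(x)  # ends close to the minimiser x = 1
```

On the first iteration the bias correction cancels the scaling of both moments, so the step size is almost exactly lr whatever the magnitude of the gradient.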

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Troubleshooting

CUDA issues

If you get an error similar to the following when using the GPU:

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Simply installing nvcc using conda or mamba should fix it:

$ conda install -c nvidia nvcc

macOS compilation issues

If you get errors related to OpenMP (OMP) during installation on macOS, ensure you have libomp installed via Homebrew:

$ brew install libomp

Cite

When using this software, please cite the following preprint:

@article{saurina2026adamixture,
  title={ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering},
  author={Saurina-i-Ricos, Joan and Mas Monserrat, Daniel and Ioannidis, Alexander G.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.02.13.700171},
  url={https://doi.org/10.64898/2026.02.13.700171}
}
