ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Ancestry Inference

Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering


ADAMIXTURE is an unsupervised global ancestry inference method that scales the ADMIXTURE model to biobank-sized datasets. It combines the Expectation–Maximization (EM) framework with the Adam first-order optimizer, enabling parameter updates after a single EM step. This approach accelerates convergence while maintaining comparable or improved accuracy, substantially reducing runtime on large genotype datasets. For more information, we recommend reading our preprint.
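For reference, the ADMIXTURE model that ADAMIXTURE scales up treats each genotype g_ij in {0, 1, 2} (allele count for individual i at SNP j) as Binomial(2, f_ij) with f_ij = sum_k q_ik p_kj, where Q holds the admixture proportions and P the ancestral allele frequencies. The minimal sketch below evaluates that log-likelihood (dropping the constant binomial coefficient) in plain Python. It only illustrates the objective being optimized; it is not ADAMIXTURE's implementation, which works on large matrices and may treat missing genotypes and numerical edge cases differently. The function name is hypothetical.

```python
import math


def admixture_loglik(G, Q, P):
    """Binomial log-likelihood of the ADMIXTURE model, without the constant
    log C(2, g) term.

    G: N x M genotypes (0, 1, 2, or None for a missing call)
    Q: N x K admixture proportions (each row sums to 1)
    P: K x M ancestral allele frequencies, strictly inside (0, 1)
    """
    total = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            if g is None:  # missing genotypes do not contribute
                continue
            # Expected allele frequency for individual i at SNP j
            f = sum(Q[i][k] * P[k][j] for k in range(len(Q[i])))
            total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


if __name__ == "__main__":
    G = [[0, 1, 2], [2, 1, None]]
    Q = [[0.7, 0.3], [0.2, 0.8]]
    P = [[0.1, 0.5, 0.9], [0.6, 0.4, 0.3]]
    print(admixture_loglik(G, Q, P))
```

Higher (less negative) values mean Q and P explain the observed genotypes better; an optimizer such as EM, with or without Adam, searches for the Q and P that maximize this quantity.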

The software is invoked from the command line and its interface mirrors that of ADMIXTURE; in particular, the output files use the same format and are fully interchangeable.

System requirements

Hardware requirements

Running this package requires a machine with enough RAM to hold the large genotype datasets it is designed for. For this reason, we recommend using compute clusters whenever available to avoid running out of memory.
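As a rough guide to what "enough RAM" means: a SNP-major PLINK .bed file stores a 3-byte header followed by ceil(N/4) bytes per SNP (2 bits per genotype), so its size follows directly from the number of samples N and SNPs M. The in-memory estimate below rests on hypothetical assumptions that are not documented for ADAMIXTURE: genotypes unpacked to one byte each, and Q (N x K) and P (M x K) stored as 4-byte floats. Temporary buffers are ignored, so treat the result as an order-of-magnitude estimate only.

```python
def bed_size_bytes(n_samples: int, n_snps: int) -> int:
    """Size of a SNP-major PLINK .bed file: 3 header bytes + ceil(N/4) bytes per SNP."""
    return 3 + n_snps * ((n_samples + 3) // 4)


def estimate_ram_bytes(n_samples: int, n_snps: int, k: int,
                       bytes_per_genotype: int = 1, bytes_per_float: int = 4) -> int:
    """Hypothetical estimate: unpacked genotype matrix + Q (N x K) + P (M x K).

    Ignores intermediate buffers, so real peak usage will be higher.
    """
    genotypes = n_samples * n_snps * bytes_per_genotype
    q = n_samples * k * bytes_per_float
    p = n_snps * k * bytes_per_float
    return genotypes + q + p


if __name__ == "__main__":
    n, m, k = 100_000, 500_000, 8  # illustrative sizes, not a real dataset
    print(f".bed on disk: {bed_size_bytes(n, m) / 1e9:.1f} GB")
    print(f"estimated RAM: {estimate_ram_bytes(n, m, k) / 1e9:.1f} GB")
```

For these illustrative sizes the script prints roughly 12.5 GB on disk and 50.0 GB of RAM, which shows why the genotype matrix, not Q or P, dominates memory use.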

Software requirements

We recommend creating a fresh Python 3.10+ virtual environment. For a faster installation experience, we highly recommend using uv (or pixi). Alternatively, you can use virtualenv or conda.

[!IMPORTANT]
If you plan to use GPU acceleration, ensure that the CUDA toolkit is correctly loaded (e.g., module load cuda) before starting the installation. This ensures that the dependencies and internal components are correctly configured for your hardware.

As an example, using uv (recommended):

$ uv venv --python 3.10
$ source .venv/bin/activate
$ uv pip install adamixture

Or using virtualenv:

$ virtualenv --python=python3.10 ~/venv/nadmenv
$ source ~/venv/nadmenv/bin/activate
(nadmenv) $ pip install adamixture

[!IMPORTANT] macOS Users: ADAMIXTURE requires OpenMP for parallel processing. You must install libomp (e.g., via Homebrew) before installing the package, otherwise the compilation will fail:

$ brew install libomp

Installation Guide

The package can be installed with pip in a few minutes (add the --upgrade flag when updating to a newer version):

(nadmenv) $ pip install adamixture

Running ADAMIXTURE

To train a model, run the adamixture command as shown below. For details on every argument, run adamixture --help. BED, VCF and PGEN input files are supported.

[!TIP] GPU Acceleration: Using GPUs greatly speeds up processing and is highly recommended for large datasets. You can specify the hardware to use with the --device parameter:

  • For NVIDIA GPUs, use --device gpu (requires CUDA).
  • For macOS users with Apple Silicon (M1/M2/M3/M4/M5), use --device mps to enable Metal Performance Shaders (MPS) acceleration.
  • Note that biobank-scale datasets are best handled on dedicated CUDA-capable GPUs because of their high memory requirements.
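The choice among the three --device values can be scripted. The helper below is a hypothetical convenience, not part of ADAMIXTURE: it returns "mps" on Apple Silicon macOS, "gpu" when the NVIDIA driver tool nvidia-smi is found on PATH, and "cpu" otherwise. Finding nvidia-smi only suggests that an NVIDIA GPU is present; it does not guarantee that the CUDA toolkit is loaded.

```python
import platform
import shutil
from typing import Optional


def pick_device(system: Optional[str] = None, machine: Optional[str] = None,
                has_nvidia: Optional[bool] = None) -> str:
    """Heuristically choose a value for adamixture's --device flag.

    Arguments default to the current machine; pass them explicitly to test.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    if has_nvidia is None:
        has_nvidia = shutil.which("nvidia-smi") is not None
    if system == "Darwin" and machine == "arm64":
        return "mps"  # Apple Silicon: Metal Performance Shaders
    if has_nvidia:
        return "gpu"  # NVIDIA GPU via CUDA
    return "cpu"


if __name__ == "__main__":
    print(f"--device {pick_device()}")
```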

As an example, the following ADMIXTURE call

$ ./admixture snps_data.bed 8 -s 42

would be equivalent in ADAMIXTURE by running

$ adamixture -k 8 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_data -s 42
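The translation in this example is mechanical: the input file becomes --data_path, K becomes -k, the seed passes through as -s, and the file stem is reused as --name. The sketch below builds the same ADAMIXTURE argument list from the ADMIXTURE inputs. The function name and the stem-as-name convention are illustrative choices, not a tool shipped with the package.

```python
import shlex
from pathlib import Path


def admixture_to_adamixture(bed_path: str, k: int, save_dir: str, seed: int = 42) -> list[str]:
    """Build an adamixture argv equivalent to `admixture <bed_path> <k> -s <seed>`."""
    name = Path(bed_path).stem  # "snps_data" from "snps_data.bed"
    return [
        "adamixture",
        "-k", str(k),
        "--data_path", bed_path,
        "--save_dir", save_dir,
        "--name", name,
        "-s", str(seed),
    ]


if __name__ == "__main__":
    argv = admixture_to_adamixture("snps_data.bed", 8, "SAVE_PATH", seed=42)
    print(shlex.join(argv))
    # To actually launch it: subprocess.run(argv, check=True)
```

Run as a script, it prints exactly the adamixture command shown above.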

Two files will be written to the SAVE_PATH directory (the --name value is used as the prefix of the output filenames):

  • A .P file with the ancestral allele frequencies, in the same format as ADMIXTURE.
  • A .Q file with the admixture proportions, in the same format as ADMIXTURE.
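Because the output follows the ADMIXTURE format, a .Q file is plain text with one row per individual and K whitespace-separated admixture proportions per row (each row sums to 1), and a .P file likewise has one row per SNP with K allele frequencies. The sketch below reads such a file with the standard library and reports each individual's largest ancestry component. The function names are hypothetical, and since the exact output filename depends on --name (and possibly K), pass whatever path your run produced.

```python
from pathlib import Path


def read_matrix(path):
    """Parse a whitespace-separated ADMIXTURE-style .Q or .P file into rows of floats."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():  # skip blank lines
            rows.append([float(x) for x in line.split()])
    return rows


def major_component(q_rows):
    """Return (cluster index, proportion) of the largest component for each individual."""
    result = []
    for row in q_rows:
        k = max(range(len(row)), key=row.__getitem__)
        result.append((k, row[k]))
    return result
```

For example, major_component(read_matrix(path_to_q_file)) yields one (cluster, proportion) pair per individual, in the same order as the samples in the input data. Checking that each row of Q sums to approximately 1 is a quick sanity test that the file was parsed correctly.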

Logs are printed to standard output by default. To also save them to a file, pipe the output through tee:

$ adamixture -k 8 ... | tee run.log

Running with multi-threading

To run ADAMIXTURE using multiple CPU threads, use the -t flag:

$ adamixture -k 8 --data_path data.bed --save_dir out/ --name test -t 8

Running with GPU acceleration

To leverage GPU acceleration (highly recommended for large datasets), use the --device flag:

  • NVIDIA GPU (CUDA):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device gpu
    
  • macOS Apple Silicon (MPS):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device mps
    

[!NOTE]
Biobank-scale datasets are best handled on dedicated CUDA-capable GPUs.

[!TIP] Biobank-Scale Execution & High K Values: For large-scale datasets (e.g., UK Biobank, All of Us) with high K values, we recommend the following parameter settings for optimal convergence and performance:

--patience_adam 5 \
--lr_decay 0.85 \
--lr 0.0075
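These three flags interact as a reduce-on-plateau schedule. The sketch below assumes, as the option descriptions suggest, that the learning rate is multiplied by --lr_decay whenever the log-likelihood fails to improve for --patience_adam consecutive checks, and that it never drops below --min_lr. Counting patience in log-likelihood checks is itself an assumption, and the class and function names are hypothetical; this is not ADAMIXTURE's code.

```python
def lr_schedule(lr: float, lr_decay: float, min_lr: float, n_decays: int) -> list[float]:
    """Learning rates after 0..n_decays plateau-triggered decays, floored at min_lr."""
    rates = [lr]
    for _ in range(n_decays):
        lr = max(lr * lr_decay, min_lr)
        rates.append(lr)
    return rates


class PlateauScheduler:
    """Decay lr after `patience` consecutive checks without log-likelihood improvement."""

    def __init__(self, lr: float, lr_decay: float, patience: int, min_lr: float = 1e-6):
        self.lr, self.lr_decay = lr, lr_decay
        self.patience, self.min_lr = patience, min_lr
        self.best = float("-inf")
        self.bad_checks = 0

    def step(self, loglik: float) -> float:
        """Record one log-likelihood check and return the learning rate to use next."""
        if loglik > self.best:
            self.best, self.bad_checks = loglik, 0
        else:
            self.bad_checks += 1
            if self.bad_checks >= self.patience:
                self.lr = max(self.lr * self.lr_decay, self.min_lr)
                self.bad_checks = 0
        return self.lr


if __name__ == "__main__":
    # Default settings (lr 0.005, decay 0.5) vs. the biobank-scale recommendation
    print("default:", lr_schedule(0.005, 0.5, 1e-6, 5))
    print("biobank:", lr_schedule(0.0075, 0.85, 1e-6, 5))
```

Under this reading, the recommended settings start from a slightly larger step and shrink it far more gently (by 15% per decay instead of 50%), while the higher patience waits longer before each reduction.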

Multi-K Sweep

Instead of running ADAMIXTURE for a single K, you can automatically sweep over a range of K values using --min_k and --max_k. The data is loaded once, and each K is trained sequentially:

$ adamixture --min_k 2 --max_k 10 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_sweep

Other options

  • --lr (float, default: 0.005):
    Learning rate used by the Adam optimizer in the EM updates.

  • --min_lr (float, default: 1e-6):
    Minimum learning rate used by the Adam optimizer in the EM updates.

  • --lr_decay (float, default: 0.5):
    Learning rate decay factor.

  • --beta1 (float, default: 0.80):
    Exponential decay rate for the first moment estimates in Adam.

  • --beta2 (float, default: 0.88):
    Exponential decay rate for the second moment estimates in Adam.

  • --reg_adam (float, default: 1e-8):
    Numerical stability constant (epsilon) for the Adam optimizer.

  • --patience_adam (int, default: 2):
    Patience for reducing the learning rate in Adam-EM.

  • --tol_adam (float, default: 0.1):
    Tolerance for stopping the Adam-EM algorithm.

  • --data_path (str, required):
    Path to the genotype data (BED, VCF or PGEN).

  • --save_dir (str, required):
    Directory where the output files will be saved.

  • --name (str, required):
    Experiment/model name used as prefix for output files.

  • --device (str, default: cpu):
    Target hardware for computation. Choices: cpu, gpu (NVIDIA/CUDA), or mps (Apple Metal).

  • -s (int, default: 42):
    Random number generator seed for reproducibility.

  • -k (int):
    Number of ancestral populations (clusters) to infer. Required if --min_k/--max_k are not specified.

  • --min_k (int):
    Minimum K for a multi-K sweep (inclusive). Must be used together with --max_k.

  • --max_k (int):
    Maximum K for a multi-K sweep (inclusive). Must be used together with --min_k.

  • --no_freqs (flag):
    If set, the P (allele frequencies) matrix is not saved to disk. Only the Q (admixture proportions) file will be written.

  • --max_iter (int, default: 1500):
    Maximum number of Adam-EM iterations.

  • --check (int, default: 5):
    Frequency (in iterations) at which the log-likelihood is evaluated.

  • --max_als (int, default: 1000):
    Maximum number of iterations for the ALS solver.

  • --tol_als (float, default: 1e-4):
    Convergence tolerance for the ALS optimization.

  • --power (int, default: 5):
    Number of power iterations used in randomized SVD.

  • --tol_svd (float, default: 1e-1):
    Convergence tolerance for the SVD approximation.

  • --chunk_size (int, default: 4096):
    Number of SNPs processed per chunk during the SVD computation.

  • -t (int, default: 1):
    Number of CPU threads used during execution.
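For orientation, the Adam options above correspond to the hyperparameters of the standard Adam update: --lr is the step size, --beta1 and --beta2 are the decay rates of the first and second moment estimates, and --reg_adam is the epsilon added to the denominator. The sketch below implements that textbook update in plain Python with ADAMIXTURE's default values. How ADAMIXTURE derives its update direction from the EM step, and whether it applies bias correction or projects Q and P back onto their constraints, is not described here, so this illustrates the optimizer only.

```python
import math


class Adam:
    """Standard Adam optimizer for a list of scalar parameters (minimization)."""

    def __init__(self, lr=0.005, beta1=0.80, beta2=0.88, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimates
        self.v = None  # second-moment (uncentered variance) estimates
        self.t = 0     # step counter used for bias correction

    def step(self, params, grads):
        """Return updated parameters given the current gradients."""
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new_params = []
        for i, (x, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(x - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


if __name__ == "__main__":
    # Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3)
    opt = Adam(lr=0.05)  # larger than the default so the toy example moves quickly
    x = [0.0]
    for _ in range(500):
        x = opt.step(x, [2 * (x[0] - 3.0)])
    print(x[0])  # ends near the minimum at x = 3
```

Because the moment estimates normalize the gradient, each step has a magnitude on the order of --lr regardless of the gradient's scale, which is why the learning rate and its decay schedule matter so much for convergence.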

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Troubleshooting

CUDA issues

If you get an error similar to the following when using the GPU:

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Installing nvcc with conda or mamba should fix it:

$ conda install -c nvidia nvcc
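Before reinstalling anything, it can help to check what the error is reporting. The stdlib-only diagnostic below (a hypothetical helper, not an ADAMIXTURE command) shows whether CUDA_HOME is set and whether an nvcc compiler is on PATH. If nvcc is found but CUDA_HOME is unset, setting CUDA_HOME to the directory above nvcc's bin/ folder is a common workaround, although the correct path depends on your system.

```python
import os
import shutil
from pathlib import Path


def cuda_diagnostics(env=None, which=shutil.which):
    """Report CUDA_HOME and nvcc availability, and suggest a CUDA_HOME if derivable."""
    env = os.environ if env is None else env
    nvcc = which("nvcc")
    report = {"CUDA_HOME": env.get("CUDA_HOME"), "nvcc": nvcc, "suggested_CUDA_HOME": None}
    if report["CUDA_HOME"] is None and nvcc is not None:
        # nvcc normally lives in <cuda_root>/bin/nvcc
        report["suggested_CUDA_HOME"] = str(Path(nvcc).parent.parent)
    return report


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```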

macOS compilation issues

If you get errors related to OpenMP (OMP) during installation on macOS, ensure you have libomp installed via Homebrew:

$ brew install libomp

Cite

When using this software, please cite the following preprint:

@article{saurina2026adamixture,
  title={ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering},
  author={Saurina-i-Ricos, Joan and Mas Monserrat, Daniel and Ioannidis, Alexander G.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.02.13.700171},
  url={https://doi.org/10.64898/2026.02.13.700171}
}


Download files

Source Distribution

adamixture-1.5.5.tar.gz (1.5 MB)

Built Distributions

  • adamixture-1.5.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (711.5 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.5-cp312-cp312-macosx_14_0_x86_64.whl (618.8 kB): CPython 3.12, macOS 14.0+, x86-64
  • adamixture-1.5.5-cp312-cp312-macosx_14_0_arm64.whl (846.4 kB): CPython 3.12, macOS 14.0+, ARM64
  • adamixture-1.5.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (713.3 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.5-cp311-cp311-macosx_14_0_x86_64.whl (612.8 kB): CPython 3.11, macOS 14.0+, x86-64
  • adamixture-1.5.5-cp311-cp311-macosx_14_0_arm64.whl (841.4 kB): CPython 3.11, macOS 14.0+, ARM64
  • adamixture-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (724.2 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • adamixture-1.5.5-cp310-cp310-macosx_14_0_x86_64.whl (615.8 kB): CPython 3.10, macOS 14.0+, x86-64
  • adamixture-1.5.5-cp310-cp310-macosx_14_0_arm64.whl (845.3 kB): CPython 3.10, macOS 14.0+, ARM64

File details

Details for the file adamixture-1.5.5.tar.gz.

File metadata

  • Download URL: adamixture-1.5.5.tar.gz
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for adamixture-1.5.5.tar.gz
Algorithm Hash digest
SHA256 ec96ce9a82ca9bdd9912e4120287af229384b40c6f4b86c4601fa21834ef4c32
MD5 f2841a5368ceb13177fdf4650ae54aab
BLAKE2b-256 9dcb97e5ab68ca960850bb7a75ad52cc20ac369bba0980087bd433c988796632

File details

Details for the file adamixture-1.5.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for adamixture-1.5.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6fe7b40321d4a2ab2a09257dd1697b7bf0656be0b25a1f9fdeb701ba7b71a8a2
MD5 3890eb14ce9bb6a17c1d4d270d94cd6f
BLAKE2b-256 47add8382dc6729e78627ed9adf4284b9d1ade6ae625560b09b66a7201b854ca

File details

Details for the file adamixture-1.5.5-cp312-cp312-macosx_14_0_x86_64.whl.

File hashes

Hashes for adamixture-1.5.5-cp312-cp312-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 5f18e69dc0c682b89a8e3930a6abd08f9c883201da8760889a8d332e85cc0f9b
MD5 485578d6b9c0d8577ef40377a917e449
BLAKE2b-256 6a2bd8e328b9d1b9c5c3291298634d2438f8f68f040a6bcbd90d002694c2002c

File details

Details for the file adamixture-1.5.5-cp312-cp312-macosx_14_0_arm64.whl.

File hashes

Hashes for adamixture-1.5.5-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 8d812745ea580522c406b4bed4040c23e8df91f70a189c134e62d7b0ab24afd8
MD5 7d20c0f9641e70ac4b69ff5cbc210cbd
BLAKE2b-256 bae953707acf983b9deee9b7af35454fca969533bff558c66cc340c785e6ecd0

File details

Details for the file adamixture-1.5.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for adamixture-1.5.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 73c4f0bd26f937a01e47a52858aff87142aa6ab7aa0833fd998da5f18e80ab82
MD5 ec6dc539f60eac925c790ccb3dd6a063
BLAKE2b-256 95d3d8a37113de0a9d08bce91db5f991dd0b98b94a84cade137ea0bf9a29ee2c

File details

Details for the file adamixture-1.5.5-cp311-cp311-macosx_14_0_x86_64.whl.

File hashes

Hashes for adamixture-1.5.5-cp311-cp311-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 db982acf5b09de1f79cac51b8e691ddc903db663ebd9b651599b2e691f956447
MD5 ae4a05fcc4be7d9ee57195f8c7f91868
BLAKE2b-256 565b84b2e24a897c11a82f53601f61cb20a7af65f845ead1a3d9dd6d5f4ce8a4

File details

Details for the file adamixture-1.5.5-cp311-cp311-macosx_14_0_arm64.whl.

File hashes

Hashes for adamixture-1.5.5-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 39ee18a1b12a7ee610e5a88af9ca2c0bbd4b3220de16346717855b3c5dd16626
MD5 3436d431f087cfc87b5a1450fd172aec
BLAKE2b-256 e969f31846c6c3202300f9b5d3e219dac16084f57e2d8a654aa611703733dc94

File details

Details for the file adamixture-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for adamixture-1.5.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 694d3e74bc9320ebe1a8544a2eab858e25f68f307a3bfcebf3fd62068202e706
MD5 ac55837aa260a645037b66011917b11c
BLAKE2b-256 4962f3263130bcb604f8633f7c87911d2160362ca8ba348f262a097d05b168d2

File details

Details for the file adamixture-1.5.5-cp310-cp310-macosx_14_0_x86_64.whl.

File hashes

Hashes for adamixture-1.5.5-cp310-cp310-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 5cac80053626a5a01cf3ccc7566ddae9989020a8d1a2db0480b804597a789563
MD5 d505fa0ab0f1f12dd2ce511aaa981f7d
BLAKE2b-256 09af8309383da53b6e0711e2d232359915a376fc1a10a4a537e14c04e049ee8e

File details

Details for the file adamixture-1.5.5-cp310-cp310-macosx_14_0_arm64.whl.

File hashes

Hashes for adamixture-1.5.5-cp310-cp310-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 220cca760ac65adb7edbae0e9db361877ad187d52d868fca387414e06dface5f
MD5 9b3bd03cb61045573538f3efe72e739c
BLAKE2b-256 9fd5cd0c9d2d3ba8dfba119cfed870663c64950d4b075e1c9574fc5bc1c43d18
