ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Ancestry Inference

Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering

ADAMIXTURE is an unsupervised global ancestry inference method that scales the ADMIXTURE model to biobank-sized datasets. It combines the Expectation–Maximization (EM) framework with the Adam first-order optimizer, enabling parameter updates after a single EM step. This approach accelerates convergence while maintaining comparable or improved accuracy, substantially reducing runtime on large genotype datasets. For more information, we recommend reading our preprint.
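To make the underlying model concrete, here is a minimal sketch of the ADMIXTURE binomial log-likelihood and one classical EM update on toy data. It is not ADAMIXTURE's implementation: the function names, the K x J layout of P, and the toy values are assumptions for illustration, and the coupling with Adam is left out because this page does not spell it out.

```python
import math

def admixture_prob(q_row, P, j):
    """h_ij = sum_k q_ik * p_kj: expected frequency of the counted allele."""
    return sum(q_row[k] * P[k][j] for k in range(len(q_row)))

def log_likelihood(G, Q, P):
    """Binomial log-likelihood of the ADMIXTURE model (constant terms dropped)."""
    total = 0.0
    for i, g_row in enumerate(G):
        for j, g in enumerate(g_row):
            h = admixture_prob(Q[i], P, j)
            total += g * math.log(h) + (2 - g) * math.log(1.0 - h)
    return total

def em_step(G, Q, P):
    """One classical EM update of Q (N x K) and P (K x J); no Adam involved."""
    n, n_snps, n_pops = len(G), len(G[0]), len(P)
    q_acc = [[0.0] * n_pops for _ in range(n)]
    p_num = [[0.0] * n_snps for _ in range(n_pops)]
    p_den = [[0.0] * n_snps for _ in range(n_pops)]
    for i in range(n):
        for j in range(n_snps):
            g = G[i][j]
            h = admixture_prob(Q[i], P, j)
            for k in range(n_pops):
                # Expected copies of the counted / other allele drawn from population k.
                a = g * Q[i][k] * P[k][j] / h
                b = (2 - g) * Q[i][k] * (1.0 - P[k][j]) / (1.0 - h)
                q_acc[i][k] += a + b
                p_num[k][j] += a
                p_den[k][j] += a + b
    Q_new = [[v / (2.0 * n_snps) for v in row] for row in q_acc]
    P_new = [[p_num[k][j] / p_den[k][j] for j in range(n_snps)] for k in range(n_pops)]
    return Q_new, P_new

# Toy inputs (made up): 4 samples x 3 SNPs, K = 2.
G = [[0, 1, 2], [0, 2, 1], [2, 0, 1], [1, 0, 2]]
Q = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5], [0.8, 0.2]]
P = [[0.2, 0.6, 0.7], [0.5, 0.3, 0.4]]

ll_before = log_likelihood(G, Q, P)
Q, P = em_step(G, Q, P)
ll_after = log_likelihood(G, Q, P)
print(ll_before, ll_after)
```

A plain EM step never decreases the log-likelihood, but it can take many iterations to converge; that slow convergence is what the Adam-based updates described above are meant to address.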

The software can be invoked via CLI and has a similar interface to ADMIXTURE (e.g. the output format is completely interchangeable).

System requirements

Hardware requirements

Using this package successfully requires a computer with enough RAM to handle the large datasets the method has been designed to work with. For this reason, we recommend using compute clusters whenever available to avoid memory issues.

Software requirements

We recommend creating a fresh Python 3.10+ virtual environment. For a faster installation experience, we highly recommend using uv.

Important: If you plan to use GPU acceleration, ensure that the CUDA toolkit is correctly loaded (e.g., module load cuda) before starting the installation. This ensures that the dependencies and internal components are correctly configured for your hardware.

As an example, using uv (recommended):

$ uv venv --python 3.10
$ source .venv/bin/activate
$ uv pip install adamixture

Installation Guide

The package can be easily installed in at most a few minutes using pip (make sure to add the --upgrade flag if updating the version):

$ pip install adamixture

Running ADAMIXTURE

To train a model, run the commands below from the root directory of the project. For details on all arguments, run adamixture --help. BED, VCF and PGEN input files are supported.

As an example, the following ADMIXTURE call

$ ./admixture snps_data.bed 8 -s 42

is equivalent to the following ADAMIXTURE call:

$ adamixture -k 8 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_data -s 42

By default, the following files will be output to the SAVE_PATH directory (the name parameter will be used to create the full filenames):

  • A .P file, similar to ADMIXTURE.
  • A .Q file, similar to ADMIXTURE.
  • A .png plot file containing the visualization of the inferred ancestry proportions (Q matrix).
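As a quick sanity check on the outputs, the sketch below parses a .Q file and confirms that each row is a valid ancestry vector. It assumes the ADMIXTURE layout (whitespace-delimited text, one row per sample, K columns) and a `<name>.<K>.Q` file name; both are assumptions, and the toy file contents are made up.

```python
import os
import tempfile

def read_q(path):
    """Parse a whitespace-delimited .Q file: one row per sample, K columns."""
    with open(path) as handle:
        return [[float(x) for x in line.split()] for line in handle if line.strip()]

# Toy stand-in for SAVE_PATH/snps_data.3.Q (file name and values are assumptions).
toy_q = "0.70 0.20 0.10\n0.05 0.90 0.05\n0.33 0.33 0.34\n"
with tempfile.TemporaryDirectory() as tmp:
    q_path = os.path.join(tmp, "snps_data.3.Q")
    with open(q_path, "w") as handle:
        handle.write(toy_q)
    Q = read_q(q_path)

n_pops = len(Q[0])                                    # K is the number of columns
assert all(len(row) == n_pops for row in Q)           # rectangular matrix
assert all(abs(sum(row) - 1.0) < 1e-3 for row in Q)   # rows are proportions

# Index of the largest ancestry component per sample.
dominant = [max(range(n_pops), key=row.__getitem__) for row in Q]
print(n_pops, dominant)
```

The row-sum tolerance is loose on purpose, since values written to text are rounded.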

Logs are printed to the stdout channel by default. If you want to save them to a file, you can use the command tee along with a pipe:

$ adamixture -k 8 ... | tee run.log

Running with multi-threading

To run ADAMIXTURE using multiple CPU threads, use the -t flag:

$ adamixture -k 8 --data_path data.bed --save_dir out/ --name test -t 8

Running with GPU acceleration

To leverage GPU acceleration (highly recommended for large datasets), use the --device flag:

  • NVIDIA GPU (CUDA):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device gpu
    
  • macOS Apple Silicon (MPS):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device mps
    

Tip (GPU acceleration): Using GPUs greatly speeds up processing and is highly recommended for large datasets. You can specify the hardware to use with the --device parameter:

  • For NVIDIA GPUs, use --device gpu (requires CUDA).
  • For macOS users with Apple Silicon (M1/M2/M3/M4/M5), use --device mps to enable Metal Performance Shaders (MPS) acceleration.
  • Note that biobank-scale datasets are best handled on dedicated CUDA-capable GPUs due to high RAM requirements.

Multi-K Sweep

Instead of running ADAMIXTURE for a single K, you can automatically sweep over a range of K values using --min_k and --max_k. The data is loaded once, and each K is trained sequentially:

$ adamixture --min_k 2 --max_k 10 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_sweep

Cross-validation

Use --cv to estimate the optimal K by masking a fraction of genotype entries and measuring prediction error. → Full documentation

$ adamixture -k 8 --cv --data_path data.bed --save_dir out/ --name test
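The idea behind masking-based cross-validation can be sketched in a few lines. Everything here is illustrative: the helper names, the 25% masking fraction, and the squared-error metric are assumptions, and the Q and P values are made up rather than fitted. The procedure ADAMIXTURE actually uses is described in the full documentation.

```python
import random

def mask_entries(G, fraction, seed):
    """Pick a random subset of (sample, SNP) cells to hide from training."""
    cells = [(i, j) for i in range(len(G)) for j in range(len(G[0]))]
    rng = random.Random(seed)
    return set(rng.sample(cells, max(1, round(fraction * len(cells)))))

def heldout_error(G, Q, P, masked):
    """Mean squared error between hidden genotypes and their expectation 2 * (QP)_ij."""
    total = 0.0
    for i, j in masked:
        expected = 2.0 * sum(q * P[k][j] for k, q in enumerate(Q[i]))
        total += (G[i][j] - expected) ** 2
    return total / len(masked)

# Toy genotypes: 4 samples x 4 SNPs.
G = [[0, 1, 2, 2], [0, 2, 1, 2], [2, 0, 1, 0], [1, 0, 2, 0]]
# Stand-ins for parameters fitted with the masked cells hidden (values are made up).
Q = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
P = [[0.1, 0.7, 0.8, 0.9], [0.8, 0.1, 0.7, 0.1]]

masked = mask_entries(G, fraction=0.25, seed=42)
cv_error = heldout_error(G, Q, P, masked)
print(len(masked), cv_error)
```

Repeating this for several values of K and comparing the held-out error is the usual way to pick K: the error tends to stop improving once extra clusters only fit noise.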

Plotting

By default, ADAMIXTURE automatically generates a PNG plot at 300 DPI, with no additional flags required. → Full documentation

Plots can include hierarchical population labels if you provide the arguments (--labels, --labels2, --labels3).

If you want to customize the format and resolution (e.g., to generate a PDF), you must use the appropriate flag depending on your execution mode:

  • Single K runs (-k): Use --plot_single. Note that --plot will be ignored in single K mode.

    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --plot_single pdf 300
    
  • Multi-K sweeps (--min_k and --max_k): Use --plot to configure the combined sweep plot.

    $ adamixture --min_k 2 --max_k 10 --data_path data.bed --save_dir out/ --name test --plot pdf 300
    

Projection Mode

Estimate ancestry proportions for new samples using a pre-trained, fixed P matrix (Q-only optimisation). K is detected automatically from P. → Full documentation

$ adamixture-project \
    --data_path new_samples.bed \
    --p_path trained_model/results.8.P \
    --save_dir projection_out/ \
    --name projected
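A Q-only optimisation with P held fixed can be illustrated with the classical EM update for a single new sample. This is a sketch under assumptions, not the adamixture-project implementation: the .P layout (one row per SNP, K columns) follows ADMIXTURE, the function names are hypothetical, and plain EM stands in for whatever optimizer the tool uses.

```python
def read_p(text):
    """Parse .P content: one row per SNP, K columns of allele frequencies."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]

def project_sample(genotypes, P, n_iter=50):
    """Estimate one sample's ancestry vector q by EM while P stays fixed."""
    n_pops = len(P[0])                  # K is read off the shape of P
    q = [1.0 / n_pops] * n_pops         # uniform starting point
    for _ in range(n_iter):
        acc = [0.0] * n_pops
        for g, p_row in zip(genotypes, P):
            h = sum(qk * pk for qk, pk in zip(q, p_row))
            for k in range(n_pops):
                # Expected number of this sample's allele copies drawn from population k.
                acc[k] += (g * q[k] * p_row[k] / h
                           + (2 - g) * q[k] * (1.0 - p_row[k]) / (1.0 - h))
        q = [v / (2.0 * len(genotypes)) for v in acc]
    return q

p_text = "0.9 0.1\n0.8 0.2\n0.1 0.7\n0.2 0.9\n"   # toy P: 4 SNPs, K = 2 (made up)
P = read_p(p_text)
new_sample = [2, 2, 0, 0]                          # genotypes resembling population 1
q_hat = project_sample(new_sample, P)
print(len(P[0]), q_hat)
```

Because P never changes, each sample can be projected independently of the others, which is what makes this mode cheap compared with a full training run.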

Supervised Mode

Anchor the model with known population labels for a subset of samples while estimating Q freely for unlabeled ones. Labels use the same format as --labels (population name or -). → Full documentation

$ adamixture-supervised \
    --data_path all_samples.bed \
    --labels labels.txt \
    --save_dir supervised_out/ \
    --name supervised_run \
    -k 8
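The label format can be illustrated with a small parser. The sketch reads one label per line, treats - as unlabeled, and builds a starting Q with one-hot rows for labeled samples. Whether ADAMIXTURE pins labeled rows in exactly this way, and how K relates to the number of distinct labels, are assumptions here; the population names are made up.

```python
def parse_labels(text):
    """One label per sample line; '-' marks a sample with unknown population."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def init_q(labels, populations):
    """One-hot rows for labeled samples, uniform rows for unlabeled ones."""
    n_pops = len(populations)
    index = {name: k for k, name in enumerate(populations)}
    rows = []
    for label in labels:
        if label == "-":
            rows.append([1.0 / n_pops] * n_pops)
        else:
            rows.append([1.0 if k == index[label] else 0.0 for k in range(n_pops)])
    return rows

labels_txt = "EUR\nAFR\n-\nEAS\n-\nAFR\n"   # toy labels.txt content (names are made up)
labels = parse_labels(labels_txt)
populations = sorted({lab for lab in labels if lab != "-"})
Q0 = init_q(labels, populations)
print(populations, Q0)
```

The number of lines in the labels file has to match the number of samples in the genotype file, in the same order, for the rows of Q to line up.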

Other options

All hyperparameters and flags can be explored with:

$ adamixture --help

Key optimizer arguments:

| Argument | Default | Description |
|---|---|---|
| --lr | 0.005 | Adam learning rate |
| --beta1 | 0.80 | Adam β₁ |
| --beta2 | 0.88 | Adam β₂ |
| --reg_adam | 1e-8 | Adam ε (numerical stability) |
| --lr_decay | 0.5 | Learning rate decay factor |
| --min_lr | 1e-4 | Minimum learning rate |
| --patience_adam | 3 | Checks without improvement before decaying lr |
| --tol_adam | 0.1 | Convergence tolerance |
| --max_iter | 10000 | Maximum Adam-EM iterations |
| --check | 5 | Log-likelihood evaluation frequency |
| -t | 1 | Number of CPU threads |
| -s | 42 | Random seed |
| --chunk_size | 4096 | Number of SNPs in chunk operations |
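To show how the optimizer defaults above fit together, here is a textbook Adam update combined with a plateau-based learning rate decay. This is a generic sketch, not ADAMIXTURE's code: the class names are hypothetical, the update is written as ascent on the log-likelihood, and how the tool builds the update direction from an EM step, applies --tol_adam, or keeps Q and P within valid ranges is not covered.

```python
import math

class Adam:
    """Textbook Adam for a flat list of parameters (ascent on the objective)."""

    def __init__(self, n, lr=0.005, beta1=0.80, beta2=0.88, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = [0.0] * n, [0.0] * n, 0

    def step(self, params, direction):
        self.t += 1
        out = []
        for i, (x, d) in enumerate(zip(params, direction)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * d
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * d * d
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)   # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            out.append(x + self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out

class PlateauDecay:
    """Multiply lr by `factor` after `patience` checks without improvement."""

    def __init__(self, factor=0.5, patience=3, min_lr=1e-4):
        self.factor, self.patience, self.min_lr = factor, patience, min_lr
        self.best, self.stale = -math.inf, 0

    def update(self, lr, log_likelihood):
        if log_likelihood > self.best:
            self.best, self.stale = log_likelihood, 0
            return lr
        self.stale += 1
        if self.stale >= self.patience:
            self.stale = 0
            return max(self.min_lr, lr * self.factor)
        return lr

opt = Adam(n=2)
# On the first step, bias correction makes each entry move by roughly lr.
params = opt.step([0.5, 0.5], direction=[0.3, -0.1])

sched = PlateauDecay()
lr = opt.lr
for ll in [-100.0, -100.0, -100.0, -100.0]:   # one improvement, then three stale checks
    lr = sched.update(lr, ll)
print(params, lr)
```

With these defaults the learning rate can be halved at most a handful of times before reaching the 1e-4 floor, so the schedule mostly matters late in a run when the log-likelihood has flattened.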

Troubleshooting and Tips

Full documentation

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Cite

When using this software, please cite the following preprint:

@article{saurina2026adamixture,
  title={ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering},
  author={Saurina-i-Ricos, Joan and Mas Monserrat, Daniel and Ioannidis, Alexander G.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.02.13.700171},
  url={https://doi.org/10.64898/2026.02.13.700171}
}
