ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Ancestry Inference


Fast Biobank-Scale Population Genetics Clustering



ADAMIXTURE is a fast CPU/GPU implementation of ADMIXTURE for biobank-scale genetic clustering. Its .P and .Q output files remain compatible with ADMIXTURE.

System requirements

Hardware requirements

Using this package successfully requires a computer with enough RAM to handle the large datasets ADAMIXTURE is designed for. We therefore recommend using compute clusters whenever available to avoid memory issues.

Software requirements

We recommend creating a fresh Python 3.10+ virtual environment. For a faster installation experience, we highly recommend using uv.

[!IMPORTANT]
If you plan to use GPU acceleration, ensure that the CUDA toolkit is correctly loaded (e.g., module load cuda) before starting the installation. This ensures that the dependencies and internal components are correctly configured for your hardware.

As an example, using uv (recommended):

$ uv venv --python 3.10
$ source .venv/bin/activate
$ uv pip install adamixture

Installation Guide

The package can be installed with pip in a few minutes (add the --upgrade flag when updating to a newer version):

$ pip install adamixture

Running ADAMIXTURE

To train a model, run the following commands from the root directory of the project. For details on all arguments, run adamixture --help. BED, VCF and PGEN input files are supported.

As an example, the following ADMIXTURE call

$ ./admixture snps_data.bed 8 -s 42

is equivalent to running the following in ADAMIXTURE:

$ adamixture -k 8 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_data -s 42

By default, the following files are written to the SAVE_PATH directory (the --name value is used to build the full filenames):

  • A .P file, similar to ADMIXTURE.
  • A .Q file, similar to ADMIXTURE.
  • A .png plot file containing the visualization of the inferred ancestry proportions (Q matrix).
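Since the .Q and .P files follow ADMIXTURE's plain-text layout (one row per sample in .Q, one row per SNP in .P, and K whitespace-separated columns in both), they can be read with standard tools. The sketch below assumes NumPy is installed; load_admixture_matrix and check_q are hypothetical helper names, and the in-memory string stands in for a real output file.

```python
import io

import numpy as np


def load_admixture_matrix(source, k=None):
    """Load a whitespace-separated ADMIXTURE-style .Q or .P matrix.

    `source` may be a path or a file-like object. Returns a 2-D float array.
    """
    mat = np.loadtxt(source, ndmin=2)
    if k is not None and mat.shape[1] != k:
        raise ValueError(f"expected {k} columns, found {mat.shape[1]}")
    return mat


def check_q(q, atol=1e-3):
    """Ancestry proportions should be non-negative and each row should sum to ~1."""
    return bool(np.all(q >= 0) and np.allclose(q.sum(axis=1), 1.0, atol=atol))


# Hypothetical .Q contents for illustration (3 samples, K = 2).
example_q = io.StringIO("0.900000 0.100000\n0.250000 0.750000\n0.500000 0.500000\n")
q = load_admixture_matrix(example_q, k=2)
print(q.shape, check_q(q))  # (3, 2) True
```

For a real run, pass the path of a .Q or .P file from SAVE_PATH instead of the StringIO object.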

Logs are printed to standard output by default. To also save them to a file, pipe the output through tee:

$ adamixture -k 8 ... | tee run.log

Running with multi-threading

To run ADAMIXTURE using multiple CPU threads, use the -t flag:

$ adamixture -k 8 --data_path data.bed --save_dir out/ --name test -t 8

Running with GPU acceleration

To leverage GPU acceleration (highly recommended for large datasets), use the --device flag:

  • NVIDIA GPU (CUDA):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device gpu
    
  • macOS Apple Silicon (MPS):
    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --device mps
    

[!TIP] On Apple Silicon Macs (M1/M2/M3/M4/M5), --device mps enables Metal Performance Shaders (MPS) acceleration, while --device gpu requires an NVIDIA GPU with CUDA. Biobank-scale datasets are best handled on dedicated CUDA-capable GPUs because of their high memory requirements.

Multi-K Sweep

Instead of running ADAMIXTURE for a single K, you can automatically sweep over a range of K values using --min_k and --max_k. The data is loaded once, and each K is trained sequentially:

$ adamixture --min_k 2 --max_k 10 --data_path snps_data.bed --save_dir SAVE_PATH --name snps_sweep

Cross-validation

Use --cv to estimate the optimal K by masking a fraction of genotype entries and measuring prediction error. → Full documentation

$ adamixture -k 8 --cv --data_path data.bed --save_dir out/ --name test
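The masking idea behind --cv can be illustrated with a short NumPy sketch. It is not ADAMIXTURE's implementation: it assumes the admixture model's expected genotype 2·(Q Pᵀ), splits the observed genotype entries into v folds, and scores each held-out fold with a plain mean squared error. A full CV run would refit Q and P with each fold masked before scoring it; here a fixed synthetic fit stands in for that step, and the tool's own error metric may differ, so these numbers are not comparable to its output. The helper names make_folds and masked_error are hypothetical.

```python
import numpy as np


def make_folds(observed, v, rng):
    """Randomly split the flat indices of observed genotype entries into v folds."""
    idx = np.flatnonzero(observed)
    rng.shuffle(idx)
    return np.array_split(idx, v)


def masked_error(G, Q, P, heldout):
    """Mean squared error between G and the expected genotype 2 * Q @ P.T,
    computed only on the held-out flat indices."""
    expected = 2.0 * (Q @ P.T)
    diff = G.ravel()[heldout] - expected.ravel()[heldout]
    return float(np.mean(diff ** 2))


# Synthetic data for illustration only: N samples, M SNPs, K populations, v folds.
rng = np.random.default_rng(42)
N, M, K, v = 50, 200, 3, 5
Q = rng.dirichlet(np.ones(K), size=N)      # ancestry proportions (rows sum to 1)
P = rng.uniform(0.05, 0.95, size=(M, K))   # allele frequencies per population
G = rng.binomial(2, Q @ P.T)               # genotypes coded 0/1/2

folds = make_folds(np.ones(G.shape, dtype=bool), v, rng)
# In a real CV run, Q and P would be refit with each fold masked before scoring it.
errors = [masked_error(G, Q, P, fold) for fold in folds]
print(f"mean held-out error over {v} folds: {np.mean(errors):.4f}")
```

Repeating this for several K values and comparing the held-out errors is the usual way such a CV score is used to choose K.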

Plotting

By default, ADAMIXTURE generates a PNG plot at 300 DPI with no additional flags. → Full documentation

Plots can include hierarchical population labels when you provide the --labels, --labels2 and --labels3 arguments.

If you want to customize the format and resolution (e.g., to generate a PDF), you must use the appropriate flag depending on your execution mode:

  • Single K runs (-k): Use --plot_single. Note that --plot will be ignored in single K mode.

    $ adamixture -k 8 --data_path data.bed --save_dir out/ --name test --plot_single pdf 300
    
  • Multi-K sweeps (--min_k and --max_k): Use --plot to configure the combined sweep plot.

    $ adamixture --min_k 2 --max_k 10 --data_path data.bed --save_dir out/ --name test --plot pdf 300
    

Projection Mode

Estimate ancestry proportions for new samples using a pre-trained, fixed P matrix (Q-only optimization). K is detected automatically from P. → Full documentation

$ adamixture-project \
    --data_path new_samples.bed \
    --p_path trained_model/results.8.P \
    --save_dir projection_out/ \
    --name projected
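To illustrate what Q-only optimization with a fixed P means, the sketch below applies the classic EM update for admixture proportions with P held constant. It is a simplified stand-in, not ADAMIXTURE's projection solver, and project_q is a hypothetical helper. It assumes a dense genotype matrix G (samples × SNPs, coded 0/1/2 as counts of the allele whose frequencies P stores) with no missing data, and it checks itself on simulated genotypes.

```python
import numpy as np


def project_q(G, P, n_iter=500, tol=1e-8, seed=0):
    """Estimate Q by EM with P held fixed (hypothetical helper, not the package API).

    G: (N, M) genotypes coded 0/1/2, counting the allele whose frequencies are in P.
    P: (M, K) fixed allele frequencies. Returns Q: (N, K) with rows summing to 1.
    """
    N, M = G.shape
    K = P.shape[1]
    eps = 1e-10
    P = np.clip(P, eps, 1.0 - eps)
    Q = np.random.default_rng(seed).dirichlet(np.ones(K), size=N)  # random start on the simplex
    for _ in range(n_iter):
        pi = np.clip(Q @ P.T, eps, 1.0 - eps)  # expected allele frequency per (sample, SNP)
        # EM update: expected counts of the counted allele (G) and the other allele (2 - G)
        # attributed to each population, averaged over the 2M allele copies per sample.
        Q_new = Q * ((G / pi) @ P + ((2.0 - G) / (1.0 - pi)) @ (1.0 - P)) / (2.0 * M)
        Q_new /= Q_new.sum(axis=1, keepdims=True)  # guard against drift from clipping
        done = np.max(np.abs(Q_new - Q)) < tol
        Q = Q_new
        if done:
            break
    return Q


# Synthetic check: simulate genotypes from a known Q and P, then recover Q with P fixed.
rng = np.random.default_rng(7)
N, M, K = 20, 2000, 3
P_fixed = rng.uniform(0.05, 0.95, size=(M, K))
Q_true = rng.dirichlet(np.ones(K), size=N)
G_new = rng.binomial(2, Q_true @ P_fixed.T)
Q_hat = project_q(G_new, P_fixed)
print("mean absolute error:", float(np.mean(np.abs(Q_hat - Q_true))))
```

Because P is fixed, each sample's Q row is updated independently of the others, which is why projection is much cheaper than training.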

Supervised Mode

Anchor the model with known population labels for a subset of samples while estimating Q freely for unlabeled ones. Labels use the same format as --labels (a population name, or - for an unlabeled sample). → Full documentation

$ adamixture-supervised \
    --data_path all_samples.bed \
    --labels labels.txt \
    --save_dir supervised_out/ \
    --name supervised_run \
    -k 8
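The sketch below shows one way such a labels file could be interpreted. It assumes one label per line in the same sample order as the genotype file, alphabetical assignment of populations to components, and that the number of distinct labeled populations must equal K; these are assumptions for illustration, not documented ADAMIXTURE behavior. parse_labels is a hypothetical helper and the population codes are made-up example data.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_labels(lines: Iterable[str], k: int) -> Tuple[List[Optional[int]], Dict[str, int]]:
    """Map each sample's label to a component index; '-' (unlabeled) maps to None.

    Assumes one label per line in sample order; blank lines are skipped.
    """
    names = [line.strip() for line in lines if line.strip()]
    populations = sorted({name for name in names if name != "-"})
    if len(populations) != k:
        raise ValueError(f"found {len(populations)} labeled populations, but K = {k}")
    index = {pop: i for i, pop in enumerate(populations)}  # alphabetical order (assumption)
    return [None if name == "-" else index[name] for name in names], index


labels = ["YRI", "-", "CEU", "CHB", "-", "CEU"]
assignments, index = parse_labels(labels, k=3)
print(index)        # {'CEU': 0, 'CHB': 1, 'YRI': 2}
print(assignments)  # [2, None, 0, 1, None, 0]
```

Samples mapped to an index would have their membership anchored to that component, while samples mapped to None keep a freely estimated Q row.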

Other options

All hyperparameters and flags can be explored with:

$ adamixture --help

Key arguments:

| Argument | Default | Description |
|---|---|---|
| -k, --k | required for a single run | Number of ancestral populations |
| --min_k, --max_k | unset | Inclusive K range for a multi-K sweep |
| --algorithm | brqn | Solver to use: brqn (ADMIXTURE SQP + ZAL quasi-Newton) or adamem (Adam-EM) |
| --init | als | Initialization method: improved SVD+ALS (als) or random EM priming (em) |
| --tol | 0.1 | Convergence tolerance for log-likelihood changes |
| --max_iter | 10000 | Maximum number of optimization iterations |
| --check | 5 | Log-likelihood evaluation frequency |
| -t | 1 | Number of CPU threads |
| -s | 42 | Random seed |
| --device | cpu | Device to use: cpu, gpu, or mps |
| --chunk_size | 8192 | Number of SNPs per chunk in chunked operations |
| --chromosome-mode | autosomes | Chromosome filter: autosomes keeps autosomes 1..--autosome-count; all keeps every chromosome |
| --autosome-count | 22 | Number of autosomes kept when --chromosome-mode is autosomes |
| --cv | 0 | Enable v-fold cross-validation; --cv without a value uses 5 folds |
| --no_freqs | False | Do not save the .P allele-frequency matrix |
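The --tol, --check and --chunk_size options relate to how the objective is monitored. The sketch below assumes the standard binomial admixture log-likelihood and shows it being accumulated over SNP chunks, plus a stop rule of the kind --tol and --check describe (evaluate every check iterations, stop once the change falls below tol). The helper names and the exact stop rule are illustrative assumptions, not ADAMIXTURE internals.

```python
import numpy as np


def loglik_chunked(G, Q, P, chunk_size=8192, eps=1e-10):
    """Binomial admixture log-likelihood (up to a constant):
    sum_ij g_ij*log(pi_ij) + (2 - g_ij)*log(1 - pi_ij), with pi = Q @ P.T.
    SNPs are processed in blocks of `chunk_size`, so only an (N, chunk_size)
    slice of pi is held in memory at a time."""
    total = 0.0
    M = G.shape[1]
    for start in range(0, M, chunk_size):
        stop = min(start + chunk_size, M)
        pi = np.clip(Q @ P[start:stop].T, eps, 1.0 - eps)
        g = G[:, start:stop]
        total += float(np.sum(g * np.log(pi) + (2.0 - g) * np.log1p(-pi)))
    return total


def converged(ll_now, ll_prev, tol=0.1):
    """Illustrative stop rule, meant to be called every `check` iterations:
    stop once the log-likelihood changes by less than `tol`."""
    return ll_prev is not None and abs(ll_now - ll_prev) < tol


# Chunking changes peak memory use, not the result.
rng = np.random.default_rng(1)
Q = rng.dirichlet(np.ones(2), size=10)
P = rng.uniform(0.1, 0.9, size=(50, 2))
G = rng.binomial(2, Q @ P.T)
print(loglik_chunked(G, Q, P, chunk_size=7), loglik_chunked(G, Q, P, chunk_size=50))
```

Evaluating the log-likelihood only every few iterations saves a full pass over the genotypes on the iterations in between.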

Algorithm note

The ADAMIXTURE preprint introduced Adam-EM as an adaptive first-order optimizer for admixture inference. The package still includes this solver via --algorithm adamem.

In the current implementation, the default is --algorithm brqn. Empirical benchmarking showed that block relaxation with ZAL quasi-Newton acceleration, combined with our improved SVD+ALS initialization, reaches high-quality solutions in fewer iterations and less wall-clock time. BR-QN is therefore the default solver, while Adam-EM remains available for experimentation and reproducibility. Adam-EM tuning parameters are documented in Troubleshooting and Tips.
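ZAL acceleration refers to the quasi-Newton scheme of Zhou, Alexander and Lange for speeding up fixed-point iterations such as block relaxation. As a generic illustration, not ADAMIXTURE's BR-QN code, the sketch below implements its simplest form with one secant pair (q = 1) and applies it to a toy linear map, where a single extrapolation lands on the fixed point. A production solver would also keep the extrapolated point inside parameter bounds and fall back to the plain update if the log-likelihood drops; those safeguards are omitted here, and zal_step is a hypothetical name.

```python
import numpy as np


def zal_step(F, x):
    """One ZAL quasi-Newton step with a single secant pair (q = 1).

    u = F(x) - x, v = F(F(x)) - F(x), c = -(u.u) / (u.(v - u)),
    x_new = (1 - c) * F(x) + c * F(F(x)).
    """
    fx = F(x)
    ffx = F(fx)
    u = fx - x
    v = ffx - fx
    denom = float(u @ (v - u))
    if denom == 0.0:  # degenerate secant (e.g. already at a fixed point): plain step
        return ffx
    c = -float(u @ u) / denom
    return (1.0 - c) * fx + c * ffx


def F(x):
    """Toy contraction map 0.9 * x + b, whose fixed point is b / 0.1 = [10, 20]."""
    return 0.9 * x + np.array([1.0, 2.0])


x0 = np.zeros(2)
print("two plain steps:", F(F(x0)))         # still far from the fixed point
print("one ZAL step:   ", zal_step(F, x0))  # close to [10, 20]
```

Both lines cost two evaluations of F; the secant extrapolation uses them to jump toward the fixed point instead of taking two small steps.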

Troubleshooting and Tips

Full documentation

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Cite

When using this software, please cite the following preprint:

@article{saurina2026adamixture,
  title={ADAMIXTURE: Adaptive First-Order Optimization for Biobank-Scale Genetic Clustering},
  author={Saurina-i-Ricos, Joan and Mas Monserrat, Daniel and Ioannidis, Alexander G.},
  journal={bioRxiv},
  year={2026},
  doi={10.64898/2026.02.13.700171},
  url={https://doi.org/10.64898/2026.02.13.700171}
}



Download files


Source Distribution

  • adamixture-1.7.0.tar.gz (7.9 MB)

Built Distributions


  • adamixture-1.7.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • adamixture-1.7.0-cp312-cp312-macosx_14_0_x86_64.whl (1.6 MB): CPython 3.12, macOS 14.0+, x86-64
  • adamixture-1.7.0-cp312-cp312-macosx_14_0_arm64.whl (1.8 MB): CPython 3.12, macOS 14.0+, ARM64
  • adamixture-1.7.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • adamixture-1.7.0-cp311-cp311-macosx_14_0_x86_64.whl (1.6 MB): CPython 3.11, macOS 14.0+, x86-64
  • adamixture-1.7.0-cp311-cp311-macosx_14_0_arm64.whl (1.8 MB): CPython 3.11, macOS 14.0+, ARM64
  • adamixture-1.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • adamixture-1.7.0-cp310-cp310-macosx_14_0_x86_64.whl (1.6 MB): CPython 3.10, macOS 14.0+, x86-64
  • adamixture-1.7.0-cp310-cp310-macosx_14_0_arm64.whl (1.8 MB): CPython 3.10, macOS 14.0+, ARM64

File details

Details for the file adamixture-1.7.0.tar.gz.

File metadata

  • Download URL: adamixture-1.7.0.tar.gz
  • Size: 7.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for adamixture-1.7.0.tar.gz
Algorithm Hash digest
SHA256 f786fa3fbc07dfb6b39902e0683ec6b993a8481a4da07aa16667aefe18f7bc37
MD5 b39a2fd2c5d90a472b99ae16dbfdf863
BLAKE2b-256 18df48610bfbd1ee697f73872e087fd2e2a8fea488066e0faf1162a1b1cae3f8

File details

Details for the file adamixture-1.7.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for adamixture-1.7.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 84fa6f4e0faf7b65ab2715c11f115d1d0a2f5d2b32ac3456fb20e9a091cf53a6
MD5 14e3e66c826f2c3b1d6023d66f4df03c
BLAKE2b-256 7627a6d7869f530e6dfde61d39adeb54074380d79db72a3249cf7d6c472c1261

File details

Details for the file adamixture-1.7.0-cp312-cp312-macosx_14_0_x86_64.whl.

File hashes

Hashes for adamixture-1.7.0-cp312-cp312-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 191d6a22cc34274238614d555674d1d36a6efccf365802f30e6306fac53893b4
MD5 e3d35c9a65e141c60c3642d2ad5a968d
BLAKE2b-256 359480bebd0922a91d9f19ce2615aa458643b3e08a69edfab6ad9abd0a83f9ce

File details

Details for the file adamixture-1.7.0-cp312-cp312-macosx_14_0_arm64.whl.

File hashes

Hashes for adamixture-1.7.0-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 4e5155ea82a89738e64a3bea4781ea2eb9e450e4ca29b3bd74bb88a4204b5bf8
MD5 dfa3cbc0fdd1db7199c6b7b48fbb6fbc
BLAKE2b-256 0d779548f4425bff537f02d447ac593cc5776a9b4bb8831754351f1d45a5b532

File details

Details for the file adamixture-1.7.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for adamixture-1.7.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e214220651e83e09fcc2fdfaab8d65608ed552e5577a1c27e2e1654b4a9506fd
MD5 f5bf33fab09f516e561f742deca10acf
BLAKE2b-256 5b78157b2f3fff06c7afeea5a1e15bb498bd2e74d0690e4a6875c86663f56777

File details

Details for the file adamixture-1.7.0-cp311-cp311-macosx_14_0_x86_64.whl.

File hashes

Hashes for adamixture-1.7.0-cp311-cp311-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 d140c935601fd25ed66e3ae42abb0c9f828d7eb7bc1230110648eb40cf1b5bac
MD5 54a86daf497469ae718821e15e0c9536
BLAKE2b-256 48e2431b1be2b3ef9da206a7c5058958a78d0a8f7335827b086f0f2cd2b8d3ff

File details

Details for the file adamixture-1.7.0-cp311-cp311-macosx_14_0_arm64.whl.

File hashes

Hashes for adamixture-1.7.0-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 4f15355bdfe4228afd6430894d448e856d5b480930af4c6b0c2ea65a5c43849a
MD5 a9670091052043b6dfd33bb8a7b920f6
BLAKE2b-256 ee843f6b6432f40ae46b200b6aa369ed92da023a1fd14e761f861c003f9d6cc6

File details

Details for the file adamixture-1.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for adamixture-1.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f897ddddbb3ade222f30bdfded69ec975ac8503d5089a393ad103f3f4871ea59
MD5 c401e1eef1c576c37d34f02e28a856b6
BLAKE2b-256 4bdb941faf5bfddb16d1bf3f0d77e197938ccc42ca89b3c2b7441922da6f3082

File details

Details for the file adamixture-1.7.0-cp310-cp310-macosx_14_0_x86_64.whl.

File hashes

Hashes for adamixture-1.7.0-cp310-cp310-macosx_14_0_x86_64.whl
Algorithm Hash digest
SHA256 dc92bf000021f5271cd22fc6ed498e2872a074b3f6874f37063c0e5ff9f5bd93
MD5 a6344fad9da570a7a392c19500b13d9b
BLAKE2b-256 bb2ea5446afecbc2b039974954b8fc00663b497c579d52d6649ea86da18813b7

File details

Details for the file adamixture-1.7.0-cp310-cp310-macosx_14_0_arm64.whl.

File hashes

Hashes for adamixture-1.7.0-cp310-cp310-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 f38ddc39f68a5abb6c7f59311431605b176e72dedfb2d39a9de140669ada1963
MD5 9bdf7af4be5afa813c33e20cc15a6917
BLAKE2b-256 0c9c89760273318d25c69c8e657a6fa503dfdeafdb59644134f8ad14aa96d1e2
