An sklearn implementation of discriminant analysis of principal components (DAPC) for population genetics
DAPCy
DAPCy is a Python package that enhances the Discriminant Analysis of Principal Components (DAPC) method for population genetics (Jombart et al. 2010). Using the scikit-learn library, DAPCy efficiently handles large genomic datasets. It supports VCF and BED files, utilizes compressed sparse matrices, and employs truncated SVD for dimensionality reduction. The package also includes k-fold cross-validation for robust model evaluation and offers tools for clustering and visualizing genetic data. DAPCy is designed to be more computationally efficient and memory-friendly than the original R implementation, making it ideal for population analysis with large genomic datasets.
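To make the workflow described above concrete, here is a minimal sketch of the general DAPC idea built directly from scikit-learn components: truncated SVD on a compressed sparse genotype matrix, followed by linear discriminant analysis, evaluated with stratified k-fold cross-validation. It does not use or reproduce DAPCy's own API. The synthetic genotype data, the number of components, and all variable names are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic data (assumption): 3 populations x 50 samples, 500 biallelic variants,
# genotypes coded as alternate-allele counts 0/1/2.
n_per_pop, n_variants = 50, 500
blocks, labels = [], []
for pop in range(3):
    freqs = rng.uniform(0.02, 0.3, size=n_variants)  # per-population allele frequencies
    blocks.append(rng.binomial(2, freqs, size=(n_per_pop, n_variants)))
    labels += [pop] * n_per_pop

# Store genotypes as a compressed sparse row matrix (mostly zeros at low frequencies).
X = sparse.csr_matrix(np.vstack(blocks), dtype=np.float32)
y = np.array(labels)

# Truncated SVD accepts sparse input directly; LDA then separates the known groups.
dapc_like = make_pipeline(
    TruncatedSVD(n_components=20, random_state=0),
    LinearDiscriminantAnalysis(),
)

# Stratified 5-fold cross-validation of classification accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(dapc_like, X, y, cv=cv)
print(f"mean CV accuracy: {scores.mean():.2f}")

# Discriminant coordinates for plotting: one row per sample, (n_groups - 1) columns.
coords = dapc_like.fit(X, y).transform(X)
print(coords.shape)
```

Note that truncated SVD, unlike ordinary PCA, does not center the data, which is what allows it to operate on sparse matrices without densifying them.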
Installation
DAPCy is available via pip (on PyPI) or conda/mamba (on the bioconda channel). It should ideally be installed inside a virtual environment.
Note: DAPCy does not currently support VCF input natively on Windows because it depends on bio2zarr (which in turn depends on cyvcf2). Windows users who need to import VCF files should install the package inside a WSL environment. Zarr files can still be used as input on Windows.
Note: While Python >= 3.13 is supported, conda users need to use Python ≤ 3.12 until upstream packages are updated, to avoid pinned version conflicts in environments (cyvcf2 and/or bed-reader do not yet have conda builds for Python > 3.12).
pip:
python -m venv <my-env>
source <my-env>/bin/activate
pip install dapcy
conda/mamba:
conda create --name <my-env>
conda activate <my-env>
conda install -c bioconda dapcy
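After either installation route, one way to confirm that the package is visible in the active environment is to query the installed distribution metadata. This is a small sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution name `dapcy` is taken from the `pip install dapcy` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "dapcy"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if dapcy is installed in this environment, otherwise None.
print(installed_version())
```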
Documentation and tutorial
For more information on how to use DAPCy, please refer to the documentation: https://uhasselt-bioinfo.gitlab.io/dapcy/reference/.
Tutorial: The Plasmodium falciparum Pf7 dataset from the MalariaGEN Consortium
We have created a tutorial on how to use the package, using the Plasmodium falciparum Pf7 dataset as a case study and made it available here. You can also download the associated Jupyter notebook from this repository to play around with the code yourself. All files used in the tutorial can be found in this Zenodo archive.
We have also provided a simple example script in the git repository.
Citation
If you use DAPCy in your own work, you can cite:
Alejandro Correa Rojo, Pieter Moris, Hanne Meuwissen, Pieter Monsieurs, Dirk Valkenborg, DAPCy: a Python package for the discriminant analysis of principal components method for population genetic analyses, Bioinformatics Advances, Volume 5, Issue 1, 2025, vbaf143, https://doi.org/10.1093/bioadv/vbaf143
Release notes
See https://uhasselt-bioinfo.gitlab.io/dapcy/about/release_notes/.
Contributors
- Alejandro Correa Rojo
- Pieter Moris
- Hanne Meuwissen
- Pieter Monsieurs
- Dirk Valkenborg
Download files
File details
Details for the file dapcy-1.3.1.tar.gz.
File metadata
- Download URL: dapcy-1.3.1.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7386a0045d390f13072cdb041d591b2579415e8a24e7cd2c58387f0279b7bcdf |
| MD5 | 2e571412c6ed93c481552184fd604204 |
| BLAKE2b-256 | da49ab2ee9901d6cae440843bcb6bcb573a0ae9dd0b2a2c2e129964aa843c1a7 |
File details
Details for the file dapcy-1.3.1-py3-none-any.whl.
File metadata
- Download URL: dapcy-1.3.1-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 274b9a10801377af9dfa36ccdbfc95b589bdfc086c61724ecbb0ed4655f18ba0 |
| MD5 | 243fa6627bbd942107466255b7a425c7 |
| BLAKE2b-256 | 36871ec3ec8f8dc01e8e0424266b2a609593715547df4a44c86149fd01b584a4 |