An sklearn implementation of discriminant analysis of principal components (DAPC) for population genetics

DAPCy

DAPCy is a Python package that enhances the Discriminant Analysis of Principal Components (DAPC) method for population genetics (Jombart et al. 2010). Built on the scikit-learn library, it is designed to handle large genomic datasets efficiently: it supports VCF and BED files, uses compressed sparse matrices, and employs truncated SVD for dimensionality reduction. The package also includes k-fold cross-validation for model evaluation and offers tools for clustering and visualizing genetic data. DAPCy is designed to be more computationally efficient and memory-friendly than the original R implementation, which makes it well suited to population analyses of large genomic datasets.
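
To make the idea concrete, the sketch below chains truncated SVD and linear discriminant analysis on a compressed sparse matrix and scores the result with stratified k-fold cross-validation. It uses only scikit-learn, SciPy, and NumPy on synthetic genotypes. It is not DAPCy's own API (see the documentation for that), and the population sizes, number of components, and variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic data (assumption): 3 populations x 40 samples, 500 biallelic variants.
# Each genotype is the count of alternate alleles (0, 1 or 2).
n_per_pop, n_variants = 40, 500
allele_freqs = rng.uniform(0.05, 0.5, size=(3, n_variants))
genotypes = np.vstack(
    [rng.binomial(2, freq, size=(n_per_pop, n_variants)) for freq in allele_freqs]
)
labels = np.repeat(np.arange(3), n_per_pop)

# Store the genotype matrix in compressed sparse row format.
X = sparse.csr_matrix(genotypes.astype(np.float64))

# DAPC-style model: reduce dimensionality with truncated SVD (which accepts
# sparse input), then discriminate between populations with LDA.
dapc_sketch = make_pipeline(
    TruncatedSVD(n_components=10, random_state=0),
    LinearDiscriminantAnalysis(),
)

# Stratified 5-fold cross-validation of classification accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(dapc_sketch, X, labels, cv=cv)
print("mean CV accuracy:", scores.mean())

# Fit on all samples and project them onto the discriminant axes.
dapc_sketch.fit(X, labels)
coords = dapc_sketch.transform(X)
print("discriminant coordinates shape:", coords.shape)
```

With three populations, LDA yields at most two discriminant axes, so `coords` can be plotted directly as a scatter plot. In practice, the number of retained components would be tuned, for example by comparing cross-validated accuracy across several values.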

Installation

DAPCy is available via pip (on PyPI) or conda/mamba (on the bioconda channel). It should ideally be installed inside a virtual environment.

Note: VCF support in DAPCy is currently not available natively on Windows because it depends on bio2zarr, which in turn depends on cyvcf2. Windows users who need to import VCF files should install the package inside a WSL environment. Zarr files can still be used as input on Windows.

Note: While Python >= 3.13 is supported, conda users need to use Python ≤ 3.12 until upstream packages are updated. This avoids pinned-version conflicts in conda environments, which arise because cyvcf2 and/or bed-reader do not yet have conda builds for Python 3.13 or later.

pip:

python -m venv <my-env>
source <my-env>/bin/activate
pip install dapcy

conda/mamba:

conda create --name <my-env>
conda activate <my-env>
conda install -c bioconda dapcy
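
After either install route, one way to confirm that the package is present in the active environment is to query its installed distribution metadata. The sketch below uses only the Python standard library. The helper name `installed_version` is ours, for illustration only; `dapcy` is the distribution name used in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if DAPCy is installed in this environment, else None.
print(installed_version("dapcy"))
```

Run this with the interpreter of the environment you activated, so that it inspects the same site-packages directory that pip or conda installed into.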

Documentation and tutorial

For more information on how to use DAPCy, please refer to the documentation: https://uhasselt-bioinfo.gitlab.io/dapcy/reference/.

Tutorial: The Plasmodium falciparum Pf7 dataset from the MalariaGEN Consortium

We have created a tutorial on how to use the package, using the Plasmodium falciparum Pf7 dataset as a case study and made it available here. You can also download the associated Jupyter notebook from this repository to play around with the code yourself. All files used in the tutorial can be found in this Zenodo archive.

We have also provided a simple example script in the git repository.

Citation

If you use DAPCy in your own work, you can cite:

Alejandro Correa Rojo, Pieter Moris, Hanne Meuwissen, Pieter Monsieurs, Dirk Valkenborg, DAPCy: a Python package for the discriminant analysis of principal components method for population genetic analyses, Bioinformatics Advances, Volume 5, Issue 1, 2025, vbaf143, https://doi.org/10.1093/bioadv/vbaf143

Release notes

See https://uhasselt-bioinfo.gitlab.io/dapcy/about/release_notes/.
