An sklearn implementation of discriminant analysis of principal components (DAPC) for population genetics
DAPCy
DAPCy is a Python package that enhances the Discriminant Analysis of Principal Components (DAPC) method for population genetics (Jombart et al. 2010). Using the scikit-learn library, DAPCy efficiently handles large genomic datasets. It supports VCF and BED files, utilizes compressed sparse matrices, and employs truncated SVD for dimensionality reduction. The package also includes k-fold cross-validation for robust model evaluation and offers tools for clustering and visualizing genetic data. DAPCy is designed to be more computationally efficient and memory-friendly than the original R implementation, making it ideal for population analysis with large genomic datasets.
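The idea behind this workflow can be sketched with plain scikit-learn and SciPy. The snippet below is not DAPCy's API: it is a minimal illustration that assumes synthetic genotypes (three hypothetical populations, biallelic variants coded 0/1/2) and chains a truncated SVD with linear discriminant analysis, scored by stratified k-fold cross-validation. All variable names and parameter values are chosen for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic data (assumption): 3 populations x 40 samples, 500 variants,
# each genotype coded as the number of alternate alleles (0, 1 or 2).
n_pops, n_per_pop, n_variants = 3, 40, 500
allele_freqs = rng.uniform(0.05, 0.5, size=(n_pops, n_variants))
genotypes = np.vstack([
    rng.binomial(2, allele_freqs[p], size=(n_per_pop, n_variants))
    for p in range(n_pops)
])
labels = np.repeat(np.arange(n_pops), n_per_pop)

# Store the genotype matrix in compressed sparse row format;
# homozygous-reference calls (0) take no storage.
X = csr_matrix(genotypes.astype(np.float64))

# Truncated SVD works directly on sparse input; LDA then separates
# the populations in the reduced space.
dapc_like = make_pipeline(
    TruncatedSVD(n_components=10, random_state=0),
    LinearDiscriminantAnalysis(),
)

# Stratified 5-fold cross-validation of classification accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(dapc_like, X, labels, cv=cv)
print("mean CV accuracy:", scores.mean())

# Discriminant coordinates for plotting (n_classes - 1 = 2 axes here).
coords = dapc_like.fit(X, labels).transform(X)
print("discriminant coordinates shape:", coords.shape)
```

For real data, the genotype matrix would come from a VCF or BED file rather than a random generator; refer to the DAPCy documentation for the package's own loaders and classes.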
Installation
Note: DAPCy is currently not available on Windows platforms due to its dependency on bio2zarr and cyvcf2. We suggest that Windows users install the package inside a WSL environment.
DAPCy is available via pip (on PyPI) or conda/mamba (on the bioconda channel). It should ideally be installed inside a virtual environment.
pip:
python -m venv <my-env>
source <my-env>/bin/activate
pip install dapcy
conda/mamba:
conda create --name <my-env>
conda activate <my-env>
conda install -c bioconda dapcy
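After installing, one way to confirm that the package is visible in the active environment is to query its distribution metadata from Python. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and it looks up the distribution name `dapcy` used in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if DAPCy is installed in this environment, else None.
print(installed_version("dapcy"))
```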
Documentation and tutorial
For more information on DAPCy, please refer to the documentation: https://uhasselt-bioinfo.gitlab.io/dapcy/.
Tutorial: The Plasmodium falciparum Pf7 dataset from the MalariaGEN Consortium
We have also created a tutorial on how to use the package, using the Plasmodium falciparum Pf7 dataset as a case study and made it available here. You can also download the associated Jupyter notebook from this repository to play around with the code yourself. All files used in the tutorial can be found in this Zenodo archive.
Citation
If you use DAPCy in your own work, you can cite:
[manuscript currently in revision]
Release notes
See https://uhasselt-bioinfo.gitlab.io/dapcy/about/release_notes/.
Contributors
- Alejandro Correa Rojo
- Pieter Moris
- Hanne Meuwissen
- Pieter Monsieurs
- Dirk Valkenborg
File details
Details for the file dapcy-1.3.0.post1.tar.gz.
File metadata
- Download URL: dapcy-1.3.0.post1.tar.gz
- Upload date:
- Size: 12.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c610df90014361d475a414d69ff8afb51cb86fc3a817dceadec920fe4ba0f42 |
| MD5 | 0d6b9f8f56f24737ea5533e262f952f4 |
| BLAKE2b-256 | e5c13304d7327250eec5499aea70c8481c363d42ce72259fdb64e83fcfdb225e |
File details
Details for the file dapcy-1.3.0.post1-py3-none-any.whl.
File metadata
- Download URL: dapcy-1.3.0.post1-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83f354b915a44c5ca4e42343a6c68746a9eb633eed57820cf86490e0041972fa |
| MD5 | 99b4c88a22d73de9dab231b7e54c321a |
| BLAKE2b-256 | 332bfcfe15bb680e3c382fc3a15688874dcec147c0b21f5591e72231040c4f8d |
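A manually downloaded file can be checked against its published SHA256 digest. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the source distribution was saved to the current directory):
# matches_published_digest(
#     "dapcy-1.3.0.post1.tar.gz",
#     "1c610df90014361d475a414d69ff8afb51cb86fc3a817dceadec920fe4ba0f42",
# )
```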