An sklearn implementation of discriminant analysis of principal components (DAPC) for population genetics

DAPCy

DAPCy is a Python package that enhances the Discriminant Analysis of Principal Components (DAPC) method for population genetics (Jombart et al. 2010). Built on the scikit-learn library, DAPCy handles large genomic datasets efficiently: it reads VCF and BED files, uses compressed sparse matrices, and employs truncated SVD for dimensionality reduction. The package also includes k-fold cross-validation for model evaluation and offers tools for clustering and visualizing genetic data. DAPCy is designed to be more computationally efficient and memory-friendly than the original R implementation, which makes it well suited to population analysis of large genomic datasets.
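To make the description above concrete, here is a minimal sketch of the general DAPC idea (truncated SVD on a sparse genotype matrix, followed by linear discriminant analysis, evaluated with k-fold cross-validation) written directly against scikit-learn. It does not use or reproduce DAPCy's own API; the synthetic genotypes, the number of components, and all variable names are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic data (assumption): 3 populations x 50 samples, 500 biallelic variants,
# genotypes coded as alternate-allele counts 0/1/2.
n_per_pop, n_variants = 50, 500
freqs = rng.uniform(0.01, 0.3, size=(3, n_variants))  # per-population allele frequencies
genotypes = np.vstack(
    [rng.binomial(2, f, size=(n_per_pop, n_variants)) for f in freqs]
)
labels = np.repeat(np.arange(3), n_per_pop)

# Store the genotype matrix in compressed sparse row format.
X = sparse.csr_matrix(genotypes.astype(np.float32))

# Truncated SVD works on sparse input without densifying it;
# LDA then separates the populations in the reduced space.
dapc = make_pipeline(
    TruncatedSVD(n_components=10, random_state=0),
    LinearDiscriminantAnalysis(),
)

# Stratified 5-fold cross-validation; the default score for a classifier is accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(dapc, X, labels, cv=cv)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")

# Fit on all samples and project them onto the discriminant axes
# (at most n_classes - 1 = 2 axes for 3 populations).
coords = dapc.fit(X, labels).transform(X)
print(coords.shape)
```

The discriminant coordinates in `coords` are what one would typically plot to inspect population structure. Refer to the DAPCy documentation for the package's actual classes and functions.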

Installation

Note: DAPCy is currently not available on Windows because it depends on bio2zarr and cyvcf2. We suggest that Windows users install the package inside a WSL environment.

DAPCy is available via pip (from PyPI) or conda/mamba (from the bioconda channel). Ideally, install it inside a virtual environment.

pip:

python -m venv <my-env>
source <my-env>/bin/activate
pip install dapcy

conda/mamba:

conda create --name <my-env>
conda activate <my-env>
conda install -c bioconda dapcy
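
After either installation route, a quick way to confirm that the package landed in the active environment is to query the installed distribution metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution name `dapcy` is taken from the install commands above.

```python
from importlib import metadata


def installed_version(dist_name: str = "dapcy"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if DAPCy is installed in this environment, else None.
print(installed_version())
```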

Documentation and tutorial

For more information on DAPCy, please refer to the documentation: https://uhasselt-bioinfo.gitlab.io/dapcy/.

Tutorial: The Plasmodium falciparum Pf7 dataset from the MalariaGEN Consortium

We have also written a tutorial that uses the Plasmodium falciparum Pf7 dataset as a case study; it is available here. You can also download the associated Jupyter notebook from this repository to experiment with the code yourself. All files used in the tutorial can be found in this Zenodo archive.

Citation

If you use DAPCy in your own work, you can cite:

[manuscript currently in revision]

Release notes

See https://uhasselt-bioinfo.gitlab.io/dapcy/about/release_notes/.
