
The Density Peaks Advanced (DPA) package.

Project description

The scikit-learn compatibility test runs on GitHub Actions.

The DPA package implements the Density Peaks Advanced (DPA) clustering algorithm as introduced in the paper “Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering” by M. d’Errico, E. Facco, A. Laio and A. Rodriguez, published in Information Sciences, Volume 560, June 2021, pp. 476-492 (also available on arXiv).
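DPA refines the classic Density Peaks (DP) recipe; an implementation of DP is also shipped in the DP/ auxiliary package. As a rough orientation only, the sketch below shows that classic recipe in plain Python: a cutoff density rho for each point, the distance delta to the nearest denser point, centers chosen by the largest rho * delta, and every other point inheriting the label of its nearest denser neighbour. It is neither the DPA algorithm nor this package's API; according to the paper, DPA replaces the cutoff density with the non-parametric PAk estimator and identifies the peaks automatically. The function name density_peaks and its parameters d_c and n_centers are illustrative assumptions.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Classic Density Peaks clustering on a small list of coordinate tuples.

    Returns (labels, rho, delta). Illustrative only, not the DPA algorithm.
    """
    n = len(points)
    if n == 0:
        return [], [], []
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Cutoff density: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [None] * n  # nearest point that comes earlier (denser) in `order`
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:pos], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers: largest gamma = rho * delta; the densest point is always one.
    by_gamma = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = set(by_gamma[:n_centers]) | {order[0]}
    labels = [-1] * n
    next_label = 0
    for i in order:  # parents are always labelled before their children
        if i in centers:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

On two well-separated groups of points, a call such as density_peaks(points, d_c=0.5, n_centers=2) is expected to return one label per group. The brute-force distance matrix costs O(N^2) time and memory, so the sketch only suits small toy data.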

The package offers the following features:

Top-level directory layout

cd DPA
ls -l
.
|-- DP/                              # Auxiliary package with the DP clustering implementation.
|-- docs/                            # Documentation files.
|-- Examples/                        # Auxiliary scripts for generating the examples.
|-- DPA_analysis.ipynb               # Use-case example for DPA.
|-- DPA_comparison-all.ipynb         # Performance comparison with other clustering methods.
|-- README.rst
|-- compile.sh
|-- setup.py
|-- src/                             # Source files for DPA, PAk and twoNN algorithms.

Source files

The Python source code is stored in the src folder.

.
|-- ...
|-- src/
|   |-- Pipeline/
|       |-- __init__.py
|       |-- DPA.py           # Python module implementing the DPA
|       |                    # clustering algorithm.
|       |
|       |-- _DPA.pyx         # Cython extension of the DPA module.
|       |
|       |-- PAk.py           # Python module implementing the PAk
|       |                    # density estimator.
|       |
|       |-- _PAk.pyx         # Cython extension of the PAk module.
|       |
|       |-- twoNN.py         # Python module implementing the TWO-NN
|                            # intrinsic dimension (ID) estimator.
|
|-- ...
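twoNN.py implements the TWO-NN intrinsic dimension (ID) estimator. Its core idea is that, for each point, the ratio mu = r2/r1 between the distances to its second and first nearest neighbours follows a Pareto law whose exponent is the ID d. The sketch below uses the simple maximum-likelihood form d = N / sum(log mu_i); the original TWO-NN method instead fits a straight line to the empirical distribution of mu (after discarding the largest ratios), so twoNN.py may return somewhat different values. The function name two_nn_dimension is hypothetical.

```python
import math


def two_nn_dimension(points):
    """Maximum-likelihood TWO-NN estimate of the intrinsic dimension.

    Hypothetical helper, not the API of twoNN.py. Uses a brute-force
    O(N^2) neighbour search, so it is meant for small data sets only.
    """
    log_mu_sum = 0.0
    used = 0
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # first and second nearest-neighbour distances
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        # Skip duplicates (r1 == 0) and points without two neighbours.
        if r1 > 0 and math.isfinite(r2):
            log_mu_sum += math.log(r2 / r1)
            used += 1
    if log_mu_sum == 0:
        raise ValueError("degenerate input: cannot estimate the dimension")
    # MLE for a Pareto law with exponent d: d = N / sum(log mu_i).
    return used / log_mu_sum
```

For example, a few hundred points drawn uniformly from a plane embedded in three dimensions should give an estimate close to 2.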

Documentation files

Full documentation of the Python code, including how-to instructions, is generated with Sphinx from the docs folder. The complete DPA documentation is available on the Read the Docs website.

Jupyter notebooks

Examples of how to run the DPA, PAk and twoNN modules are provided in the Jupyter notebook DPA_analysis.ipynb. Additional use cases are available in DPA_comparison-all.ipynb, which includes a performance comparison with the following clustering methods: Bayesian Gaussian Mixture, HDBSCAN, Spectral Clustering and Density Peaks.
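The exact metrics used in the comparison notebook are not listed here. One common way to compare a clustering with reference labels is the adjusted Rand index (ARI), which is 1 for identical partitions (up to renaming of the labels) and has expected value 0 for random labelings. The self-contained sketch below computes it with the standard library; scikit-learn users would normally call sklearn.metrics.adjusted_rand_score instead. The function name adjusted_rand_index is an assumption for illustration, and we do not claim the notebook uses this metric.

```python
from collections import Counter
from math import comb


def _pairs(count):
    """Number of unordered pairs that can be formed from `count` items."""
    return comb(count, 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same points")
    n = len(labels_true)
    # Pairs of points grouped together in both labelings (contingency table).
    index = sum(_pairs(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_pred = sum(_pairs(c) for c in Counter(labels_pred).values())
    total = _pairs(n)
    expected = sum_true * sum_pred / total if total else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put everything in one cluster).
        return 1.0
    return (index - expected) / (max_index - expected)
```

For instance, adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]) returns 1.0, because the two labelings describe the same partition.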

Both Jupyter notebooks are also available as Python scripts (saved using jupytext) in the jupytext folder.

.
|-- ...
|-- DPA_analysis.ipynb               # Use-case example for DPA.
|-- DPA_comparison-all.ipynb         # Performance comparison with
|                                    # other clustering methods.
|
|-- ...
|-- jupytext/
|   |-- DPA_analysis.py              # DPA_analysis.ipynb saved as
|   |                                # Python script.
|   |-- DPA_comparison-all.py        # DPA_comparison-all.ipynb
|                                    # saved as Python script.
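jupytext converts notebooks to scripts while keeping Markdown cells as comments, and it can pair the two formats. Purely to illustrate what an .ipynb file stores, the sketch below pulls the code cells out of an nbformat 4 notebook using only the json module. It is not a replacement for jupytext (it drops Markdown and cannot round-trip), and the helper name notebook_code is hypothetical.

```python
import json


def notebook_code(path):
    """Return the source of all code cells of an nbformat 4 notebook."""
    with open(path, encoding="utf-8") as fh:
        notebook = json.load(fh)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip Markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores a cell source either as one string or as a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"
```

Run from the repository root, print(notebook_code("DPA_analysis.ipynb")) would show the code of the analysis notebook.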

Getting started

The DPA source code is hosted in the DPA repository on GitHub.

You need git to clone it. We suggest installing DPA in a Python virtual environment: it gives you a controlled environment where you can install packages as a normal user, avoiding conflicts with system files or system-wide Python libraries.

The following sections describe the steps required to install DPA on Linux (Debian/Ubuntu) or on Windows/Mac.

Debian/Ubuntu

Run the following commands to install the prerequisites and to create and activate a Python virtual environment with virtualenv:

sudo apt-get install git python3-dev virtualenv
virtualenv -p python3 venvdpa
. venvdpa/bin/activate

Windows/Mac

One possible setup uses Anaconda, which ships with preinstalled and configured packages for data analysis and is available on all major platforms. It uses conda as its package manager, in addition to the standard pip.

The git version control system can be installed separately by downloading it.

Run the following commands to create and activate a conda virtual environment:

conda create -n venvdpa python
conda activate venvdpa

To list the available environments, type conda info --envs; to deactivate the active environment, use conda deactivate.

Installation

Install required dependencies

The DPA package depends on easycython, which can be installed with conda or pip. You can check which packages are installed with the pip freeze command.
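pip freeze lists every installed distribution. To check a single dependency such as easycython from inside Python, a minimal sketch can query the installed package metadata with importlib.metadata (standard library, Python 3.8 or later). The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("easycython",):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```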

Installing released code from GitHub

Install the latest version from the GitHub repository via:

pip install git+https://github.com/mariaderrico/DPA

Installing development code from GitHub

Run the following commands to download the DPA source code:

git clone https://github.com/mariaderrico/DPA.git

Install DPA with the following commands:

cd DPA
. compile.sh
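After compile.sh (or the pip install above) completes, one way to confirm that the modules, including the compiled Cython extensions, can be found is to ask Python's import system for them. The module names below are an assumption inferred from the src/Pipeline layout (DPA.py, PAk.py, twoNN.py, _DPA.pyx, _PAk.pyx) and may not match the installed package; the helper name importable is hypothetical. Note that find_spec imports the parent package (Pipeline) in order to locate its submodules.

```python
import importlib.util


def importable(module_name):
    """True if Python's import system can locate `module_name`."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised, e.g., when the parent package of a dotted name is missing.
        return False


# Assumed module names, mirroring src/Pipeline/ (not confirmed by the docs).
EXPECTED_MODULES = (
    "Pipeline.DPA",
    "Pipeline.PAk",
    "Pipeline.twoNN",
    "Pipeline._DPA",  # compiled from _DPA.pyx
    "Pipeline._PAk",  # compiled from _PAk.pyx
)

for module in EXPECTED_MODULES:
    print(f"{module}: {'ok' if importable(module) else 'missing'}")
```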

Citing

If you have used this codebase in a scientific publication and wish to cite the algorithm, please cite our paper in Information Sciences.

M. d’Errico, E. Facco, A. Laio, A. Rodriguez, Information Sciences, Volume 560, June 2021, 476-492

@article{DERRICO2021476,
  title = {Automatic topography of high-dimensional data sets by non-parametric density peak clustering},
  journal = {Information Sciences},
  volume = {560},
  pages = {476-492},
  year = {2021},
  issn = {0020-0255},
  doi = {10.1016/j.ins.2021.01.010},
  url = {https://www.sciencedirect.com/science/article/pii/S0020025521000116},
  author = {Maria d’Errico and Elena Facco and Alessandro Laio and Alex Rodriguez},
  }



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

DPA-0.0.3.tar.gz (6.3 MB)

Uploaded Source

Built Distribution

DPA-0.0.3-cp37-cp37m-macosx_10_15_x86_64.whl (200.7 kB)

Uploaded CPython 3.7m macOS 10.15+ x86-64

File details

Details for the file DPA-0.0.3.tar.gz.

File metadata

  • Download URL: DPA-0.0.3.tar.gz
  • Upload date:
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.7.7

File hashes

Hashes for DPA-0.0.3.tar.gz
Algorithm Hash digest
SHA256 4c6c1b9e20275339d49ecc7c1be27bc6f259e60a140e4a3f4ca44ae6fced2001
MD5 4a40e05c8bb9daee34e771c56d832fa9
BLAKE2b-256 8b3a2259bc2bb994fdf9d6f36c5f990519a61ba7b73c48bae36eb7f6cef52791

See more details on using hashes here.
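The published digests let you confirm that a downloaded file is intact. A minimal sketch using Python's hashlib computes the SHA256 digest of a local file in chunks and compares it with the value listed above; the helper names sha256_of and matches_sha256 are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_sha256("DPA-0.0.3.tar.gz", "4c6c1b9e20275339d49ecc7c1be27bc6f259e60a140e4a3f4ca44ae6fced2001") should return True for an intact copy of the source distribution. pip can enforce the same check at install time in hash-checking mode (--require-hashes).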

File details

Details for the file DPA-0.0.3-cp37-cp37m-macosx_10_15_x86_64.whl.

File metadata

  • Download URL: DPA-0.0.3-cp37-cp37m-macosx_10_15_x86_64.whl
  • Upload date:
  • Size: 200.7 kB
  • Tags: CPython 3.7m, macOS 10.15+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.7.7

File hashes

Hashes for DPA-0.0.3-cp37-cp37m-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 49e50564aebb423227a1163ad40e6eda744cfa37392c809029f2bac9fe9b9a0a
MD5 a4d598e5fb0dc4da769b0b76f3d3e250
BLAKE2b-256 3cb8f3abdf4b0de3343fade48592e892d2f10a62d5009011d23bbc3178dd1a98

See more details on using hashes here.
