Relative clustering validation to select the best number of clusters
Project description
reval: stability-based relative clustering validation method to determine the best number of clusters
Determining the number of clusters that best partitions a dataset can be challenging because of 1) the lack of a priori information within an unsupervised learning framework, and 2) the absence of a single clustering validation approach for evaluating clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to determine the best clustering solution, as described in [1].
Statistical software, in both R and Python, usually computes internal validation metrics that can be used to select the number of clusters that best fits the data, whereas open-source software that readily implements relative clustering techniques is lacking. The advantage of a relative approach over internal validation methods is that internal metrics rely only on characteristics of the data itself to produce a result, whereas relative validation converts the unsupervised clustering task into a supervised classification problem, thereby enabling the generalizability and replicability of the results to be assessed.
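The stability idea from [1] can be sketched in a few lines. The block below is a minimal, hypothetical NumPy illustration, not reval's API or implementation. It assumes k-means (with k-means++ seeding) as the clustering algorithm, a 1-nearest-neighbour classifier for the supervised step, repeated random 50/50 splits, and normalization by the disagreement obtained with random labelings. For each candidate k, one half of the data is clustered, a classifier trained on those labels predicts the other half, that other half is also clustered independently, and the two labelings are compared after the best relabelling of cluster IDs. The k with the lowest normalized score is taken as the most stable. All function names here are made up for the sketch.

```python
import itertools

import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A and the rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Hypothetical helper: k-means with k-means++ seeding, keeping the lowest-SSE restart."""
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability proportional to D^2.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = sq_dists(X, np.array(centers)).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        # Lloyd iterations; an empty cluster keeps its previous center.
        for _ in range(n_iter):
            labels = sq_dists(X, centers).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = sq_dists(X, centers).argmin(axis=1)
        sse = ((X - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def predict_1nn(X_train, y_train, X_test):
    """Supervised step (assumed classifier): copy the label of the nearest training point."""
    return y_train[sq_dists(X_test, X_train).argmin(axis=1)]


def misclassification(a, b, k):
    """Fraction of disagreements between labelings a and b under the best relabelling of b."""
    return min(np.mean(np.array(perm)[b] != a) for perm in itertools.permutations(range(k)))


def normalized_stability(X, k, rng, n_splits=10, n_rand=20):
    """Average train/test disagreement for k clusters, divided by a random-labeling baseline."""
    n = len(X)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        tr, ts = idx[: n // 2], idx[n // 2:]
        y_tr = kmeans(X[tr], k, rng)             # cluster the first half
        y_pred = predict_1nn(X[tr], y_tr, X[ts])  # transfer its labels with a classifier
        y_ts = kmeans(X[ts], k, rng)             # cluster the second half independently
        scores.append(misclassification(y_ts, y_pred, k))
    m = n - n // 2
    baseline = np.mean([misclassification(rng.integers(k, size=m), rng.integers(k, size=m), k)
                        for _ in range(n_rand)])
    return np.mean(scores) / baseline


# Usage on synthetic data: three well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in true_centers])
scores = {k: normalized_stability(X, k, rng) for k in range(2, 6)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

On well-separated blobs, k = 3 gives a normalized score of zero because every split reproduces the same partition, while other values of k typically merge or split blobs inconsistently between halves. Enumerating all k! relabellings is only practical for small k; a Hungarian-algorithm matching such as `scipy.optimize.linear_sum_assignment` scales better.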
Requirements
python>=3.6
Installing
From GitHub:
git clone https://github.com/IIT-LAND/reval_clustering
cd reval_clustering
pip install -r requirements.txt
PyPI alternative:
pip install reval
Documentation
Code documentation can be found here. The documents include Python code descriptions, reval usage examples, performance on benchmark datasets, and common issues that can arise depending on the number of features and samples in a dataset.
References
[1] Lange, T., Roth, V., Braun, M. L., & Buhmann, J. M. (2004). Stability-based validation of clustering solutions. Neural Computation, 16(6), 1299-1323.
Cite as
Isotta Landi, Veronica Mandelli, & Michael Vincent Lombardo. (2020, June 29). reval: stability-based relative clustering validation method to determine the best number of clusters (Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.3922334
BibTeX alternative
@software{isotta_landi_2020_3922334,
  author    = {Isotta Landi and
               Veronica Mandelli and
               Michael Vincent Lombardo},
  title     = {{reval: stability-based relative clustering
                validation method to determine the best number of
                clusters}},
  month     = jun,
  year      = 2020,
  publisher = {Zenodo},
  version   = {v1.0.0},
  doi       = {10.5281/zenodo.3922334},
  url       = {https://doi.org/10.5281/zenodo.3922334}
}
Download files
Source Distribution
File details
Details for the file reval-0.1.0.tar.gz.
File metadata
- Download URL: reval-0.1.0.tar.gz
- Upload date:
- Size: 24.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.0.post20201006 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3625895302fab2ee008471c800c14365639f57329045e2e8399d2d1593d9b0af
MD5 | d06c26f6ebe7187689cf0cbe8eb77dfa
BLAKE2b-256 | 10e21a879511fb94285353adb00c761933b3fa5d3b44234b71946659d6f47a9b
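To check a downloaded archive against the SHA256 digest listed for reval-0.1.0.tar.gz, a standard-library sketch like the following can be used. It assumes the file was saved under that name in the current working directory; the helper name `sha256_of` is hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 16):
    """Hypothetical helper: stream a file through SHA256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "3625895302fab2ee008471c800c14365639f57329045e2e8399d2d1593d9b0af"
ARCHIVE = "reval-0.1.0.tar.gz"  # assumed download location: current working directory

if os.path.exists(ARCHIVE):
    print("match" if sha256_of(ARCHIVE) == EXPECTED_SHA256 else "MISMATCH")
```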