
Relative clustering validation to select the best number of clusters


reval: stability-based relative clustering validation method to determine the best number of clusters

Determining the number of clusters that best partitions a dataset can be challenging because of 1) the lack of a priori information within an unsupervised learning framework; and 2) the absence of a unique clustering validation approach to evaluate clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to determine the best clustering solution, as described in [1].

Statistical software in both R and Python usually computes internal validation metrics that can be leveraged to select the number of clusters that best fits the data; however, open-source software that easily implements relative clustering validation is lacking. The advantage of a relative approach over internal validation methods is that internal metrics exploit characteristics of the data itself to produce a result, whereas relative validation converts an unsupervised clustering problem into a supervised classification problem, hence enabling generalizability and replicability of the results.
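The stability idea can be illustrated with a minimal NumPy sketch of the procedure in [1]; it is not reval's own API. The data are split into two halves, each half is clustered, a classifier learned on the training labels predicts the test half, and its disagreement with the test clustering (after the best relabeling) is normalized by the disagreement expected from random labelings. Assumptions: k-means as the clustering algorithm, a nearest-centroid rule as the classifier, uniform random labelings for normalization, and the hypothetical helper names `kmeans`, `mismatch`, `stability` and `select_k`.

```python
import itertools

import numpy as np


def nearest_centroid(X, centroids):
    """Label each row of X with the index of its closest centroid (squared Euclidean)."""
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first seeding; returns (labels, centroids)."""
    seeds = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next seed: the point farthest from every seed chosen so far
        d = np.min([((X - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        seeds.append(X[d.argmax()])
    centroids = np.array(seeds, dtype=float)
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        # move each centroid to its cluster mean; keep it in place if the cluster is empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def mismatch(a, b, k):
    """Fraction of disagreeing labels under the best one-to-one relabeling (brute force, small k)."""
    conf = np.zeros((k, k), dtype=int)
    np.add.at(conf, (a, b), 1)  # conf[i, j] = number of points with a == i and b == j
    best = max(sum(conf[i, p[i]] for i in range(k))
               for p in itertools.permutations(range(k)))
    return 1.0 - best / len(a)


def stability(X, k, rng, n_splits=10, n_rand=20):
    """Normalized misclassification of transferred vs. fresh test labels (lower = more stable)."""
    half = len(X) // 2
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(X))
        X_tr, X_ts = X[idx[:half]], X[idx[half:]]
        _, cent_tr = kmeans(X_tr, k, rng)         # cluster the training half
        y_ts, _ = kmeans(X_ts, k, rng)            # cluster the test half independently
        y_pred = nearest_centroid(X_ts, cent_tr)  # classifier learned on the training half
        err = mismatch(y_pred, y_ts, k)
        # mismatch of two uniform random k-labelings, so scores are comparable across k
        n = len(X_ts)
        rand = np.mean([mismatch(rng.integers(k, size=n), rng.integers(k, size=n), k)
                        for _ in range(n_rand)])
        scores.append(err / rand)
    return float(np.mean(scores))


def select_k(X, k_range, seed=0):
    """Return (k with the lowest normalized stability score, {k: score})."""
    rng = np.random.default_rng(seed)
    scores = {k: stability(X, k, rng) for k in k_range}
    return min(scores, key=scores.get), scores


# Toy usage: three well-separated 2-D Gaussian blobs (synthetic data for illustration only)
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + 0.5 * data_rng.standard_normal((50, 2)) for c in centers])
best_k, scores = select_k(X, range(2, 7))
```

A score of 0 means the partition learned on one half is reproduced exactly on the other. The brute-force relabeling grows as k!, so for larger k it would be replaced by the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on the negated confusion matrix).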

Requirements

python>=3.6

Installing

From GitHub:

git clone https://github.com/IIT-LAND/reval_clustering
cd reval_clustering
pip install -r requirements.txt

PyPI alternative:

pip install reval

Documentation

Code documentation can be found here. It includes descriptions of the Python code, reval usage examples, performance on benchmark datasets, and common issues related to a dataset's number of features and samples.

References

[1] Lange, T., Roth, V., Braun, M. L., & Buhmann, J. M. (2004). Stability-based validation of clustering solutions. Neural Computation, 16(6), 1299-1323.

Cite as

Isotta Landi, Veronica Mandelli, & Michael Vincent Lombardo. (2020, June 29). 
reval: stability-based relative clustering validation method to determine the best number of clusters 
(Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.3922334

BibTeX alternative

@software{isotta_landi_2020_3922334,
          author       = {Isotta Landi and
                          Veronica Mandelli and
                          Michael Vincent Lombardo},
          title        = {{reval: stability-based relative clustering 
                           validation method to determine the best number of
                           clusters}},
          month        = jun,
          year         = 2020,
          publisher    = {Zenodo},
          version      = {v1.0.0},
          doi          = {10.5281/zenodo.3922334},
          url          = {https://doi.org/10.5281/zenodo.3922334}
        }



Download files


Source Distribution

reval-0.1.0.tar.gz (24.6 kB)

File details

Details for the file reval-0.1.0.tar.gz.

File metadata

  • Download URL: reval-0.1.0.tar.gz
  • Size: 24.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.0.post20201006 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for reval-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        3625895302fab2ee008471c800c14365639f57329045e2e8399d2d1593d9b0af
MD5           d06c26f6ebe7187689cf0cbe8eb77dfa
BLAKE2b-256   10e21a879511fb94285353adb00c761933b3fa5d3b44234b71946659d6f47a9b

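The SHA256 digest listed for reval-0.1.0.tar.gz can be checked locally before installing. A minimal sketch using only the standard library's `hashlib`, assuming the archive has been downloaded to the working directory; the helper names `sha256_of` and `verify` are illustrative, not part of reval.

```python
import hashlib

# SHA256 digest published for reval-0.1.0.tar.gz
EXPECTED_SHA256 = "3625895302fab2ee008471c800c14365639f57329045e2e8399d2d1593d9b0af"


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, read in fixed-size chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Example (assumes the archive sits in the working directory):
# verify("reval-0.1.0.tar.gz")
```

pip can enforce the same check through its hash-checking mode, using a requirements line of the form `reval==0.1.0 --hash=sha256:<digest>`.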
