
clusteval is a Python package that provides various methods for unsupervised cluster validation.


clusteval


  • clusteval is a Python package for unsupervised cluster evaluation. Five methods are implemented that can be used to evaluate clusterings: silhouette, dbindex, derivative, dbscan and hdbscan.


Installation

  • Install clusteval from PyPI (recommended). clusteval is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.

  • It is distributed under the MIT license.

  • A new environment can be created as follows:

conda create -n env_clusteval python=3.6
conda activate env_clusteval
pip install clusteval
  • The beta version can be installed from the GitHub source:
git clone https://github.com/erdogant/clusteval
cd clusteval
pip install -U .

Import clusteval package

from clusteval import clusteval

Create example data set

# Generate random data
from sklearn.datasets import make_blobs
X, labx_true = make_blobs(n_samples=750, centers=4, n_features=2, cluster_std=0.5)

Cluster validation using Silhouette score

# Determine the optimal number of clusters

ce = clusteval(method='silhouette')
ce.fit(X)
ce.plot()
ce.dendrogram()
ce.scatter(X)
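The silhouette approach can be sketched directly with scikit-learn. This is a minimal illustration of the idea, not clusteval's internal code: it assumes Ward agglomerative clustering, tries k = 2 to 10, and keeps the k with the highest mean silhouette score. The helper `best_k_by_silhouette` and the fixed blob centers are illustrative assumptions.

```python
# Minimal sketch (not clusteval's internal code): choose k by maximizing the
# mean silhouette score of Ward agglomerative clusterings.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumption: four fixed, well-separated centers and a fixed seed for reproducibility.
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, y_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.5, random_state=0)


def best_k_by_silhouette(X, min_k=2, max_k=10):
    """Hypothetical helper: return (best_k, scores) where scores maps k -> silhouette."""
    scores = {}
    for k in range(min_k, max_k + 1):
        labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # Higher silhouette means tighter, better-separated clusters.
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = best_k_by_silhouette(X)
print(best_k, round(scores[best_k], 3))
```

A silhouette value close to 1 means points sit much closer to their own cluster than to the nearest other cluster, which is why the maximum is taken.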

Cluster validation using the Davies-Bouldin index

# Determine the optimal number of clusters
ce = clusteval(method='dbindex')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
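For the Davies-Bouldin index the selection rule flips: lower values indicate more compact, better-separated clusters, so the minimum is taken. A minimal sketch under the same assumptions (Ward agglomerative clustering, k = 2 to 10, fixed blob centers, hypothetical helper `best_k_by_dbindex`), using scikit-learn's `davies_bouldin_score`:

```python
# Minimal sketch (not clusteval's internal code): choose k by minimizing the
# Davies-Bouldin index of Ward agglomerative clusterings.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Assumption: four fixed, well-separated centers and a fixed seed for reproducibility.
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.5, random_state=0)


def best_k_by_dbindex(X, min_k=2, max_k=10):
    """Hypothetical helper: return (best_k, scores) where scores maps k -> DB index."""
    scores = {}
    for k in range(min_k, max_k + 1):
        labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
        scores[k] = davies_bouldin_score(X, labels)
    # Lower Davies-Bouldin index means more compact, better-separated clusters.
    best_k = min(scores, key=scores.get)
    return best_k, scores


best_k, scores = best_k_by_dbindex(X)
print(best_k, round(scores[best_k], 3))
```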

Cluster validation using derivative method

# Determine the optimal number of clusters
ce = clusteval(method='derivative')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
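The derivative method looks for an elbow in the hierarchical clustering. One common form of that heuristic, sketched here as an assumption rather than clusteval's exact implementation, takes the heights of the last merges of a Ward linkage and picks the number of clusters where their second difference peaks. `best_k_by_acceleration` is a hypothetical helper; `linkage` and `fcluster` come from SciPy.

```python
# Minimal sketch of an elbow ("acceleration") heuristic on Ward merge heights.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

# Assumption: four fixed, well-separated centers and a fixed seed for reproducibility.
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.5, random_state=0)


def best_k_by_acceleration(X, max_k=10):
    """Hypothetical helper: return (best_k, Z) using second differences of merge heights."""
    Z = linkage(X, method="ward")
    # Heights of the last max_k merges, reversed so heights[i] is the cost of
    # going from i + 2 clusters down to i + 1 clusters.
    heights = Z[-max_k:, 2][::-1]
    # Second differences ("acceleration") of the merge heights.
    acceleration = np.diff(heights, 2)
    # acceleration[j] is large when merging j + 2 clusters into j + 1 is much
    # costlier than the merges just before it (at higher cluster counts),
    # so j + 2 is the suggested number of clusters.
    best_k = int(acceleration.argmax()) + 2
    return best_k, Z


best_k, Z = best_k_by_acceleration(X)
# Cut the tree into best_k flat clusters.
labels = fcluster(Z, t=best_k, criterion="maxclust")
print(best_k, len(set(labels)))
```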

Cluster validation using dbscan

# Determine the optimal number of clusters using dbscan and silhouette
ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
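DBSCAN has no number-of-clusters parameter, so the search runs over its `eps` radius instead, scoring each result with the silhouette score. The sketch below is an illustration under assumptions, not clusteval's code: the `eps` grid, `min_samples=5`, and the hypothetical `max_noise` guard (skip settings that label more than 5% of points as noise) are choices made here. Noise points (label -1) are excluded from the silhouette computation.

```python
# Minimal sketch (not clusteval's internal code): grid-search DBSCAN's eps and
# keep the setting with the best silhouette score on the clustered points.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumption: four fixed, well-separated centers and a fixed seed for reproducibility.
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=750, centers=centers, cluster_std=0.5, random_state=0)


def best_eps_by_silhouette(X, eps_grid, min_samples=5, max_noise=0.05):
    """Hypothetical helper: return (score, eps, labels) for the best eps."""
    best = None
    for eps in eps_grid:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        clustered = labels != -1  # DBSCAN marks noise points with -1
        n_clusters = len(set(labels[clustered]))
        # Assumed guard: skip settings with too few clusters or too much noise.
        if n_clusters < 2 or 1.0 - clustered.mean() > max_noise:
            continue
        score = silhouette_score(X[clustered], labels[clustered])
        if best is None or score > best[0]:
            best = (score, eps, labels)
    if best is None:
        raise ValueError("no eps value produced a valid clustering")
    return best


score, eps, labels = best_eps_by_silhouette(X, np.linspace(0.5, 2.0, 7))
n_clusters = len(set(labels[labels != -1]))
print(round(eps, 2), n_clusters, round(score, 3))
```

Without a guard like `max_noise`, scoring only the clustered points tends to favor small `eps` values that discard many points as noise.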

Cluster validation using hdbscan

hdbscan must be installed separately before it can be used. It is not included in the clusteval setup file because its installation frequently causes issues.

pip install hdbscan
# Determine the optimal number of clusters
ce = clusteval(cluster='hdbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)

Citation

Please cite clusteval in your publications if it is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019clusteval,
  title={clusteval},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/clusteval}},
}


Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • If you wish to buy me a coffee for this work, it is very much appreciated :) Star it if you like it!



Download files


Files for clusteval, version 2.0.0

  • clusteval-2.0.0-py3-none-any.whl (24.8 kB): Wheel, Python py3
  • clusteval-2.0.0.tar.gz (17.1 kB): Source distribution
