
clusteval is a Python package that provides various methods for unsupervised cluster validation.

Project description

clusteval


  • clusteval is a Python package for unsupervised cluster evaluation. Three methods are implemented to evaluate clusterings: silhouette, dbindex, and derivative. Four clustering methods can be used: agglomerative, kmeans, dbscan, and hdbscan.


Installation

  • Install clusteval from PyPI (recommended). clusteval is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.

  • It is distributed under the MIT license.

  • A new environment can be created as follows:

conda create -n env_clusteval python=3.6
conda activate env_clusteval
pip install clusteval
  • The beta version can be installed from the GitHub source:
git clone https://github.com/erdogant/clusteval
cd clusteval
pip install -U .

Import clusteval package

from clusteval import clusteval

Create example data set

# Generate random data
from sklearn.datasets import make_blobs
X, labx_true = make_blobs(n_samples=750, centers=4, n_features=2, cluster_std=0.5)

Cluster validation using Silhouette score

# Determine the optimal number of clusters

ce = clusteval(method='silhouette')
ce.fit(X)
ce.plot()
ce.dendrogram()
ce.scatter(X)
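
To illustrate the idea behind silhouette-based validation, here is a minimal sketch that uses scikit-learn directly rather than clusteval. It is not clusteval's implementation; the blob centers, the range of k, and the ward linkage are assumptions chosen for the example. Each candidate number of clusters is scored, and the one with the highest mean silhouette is kept.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated blobs (hypothetical centers) with a fixed seed
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

# Score every candidate number of clusters; higher silhouette is better
scores = {}
for k in range(2, 9):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On data this cleanly separated, the highest score should land on the number of generated blobs.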

Cluster validation using Davies-Bouldin index

# Determine the optimal number of clusters
ce = clusteval(method='dbindex')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
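
For comparison, the Davies-Bouldin index can be sketched with scikit-learn in the same way. This is an illustration under assumed settings (blob centers, range of k, ward linkage), not clusteval's own code. Unlike the silhouette score, a lower Davies-Bouldin value indicates better-separated clusters, so the minimum is selected.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Four well-separated blobs (hypothetical centers) with a fixed seed
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

# Davies-Bouldin index per candidate k; lower is better
db_scores = {}
for k in range(2, 9):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    db_scores[k] = davies_bouldin_score(X, labels)

best_k = min(db_scores, key=db_scores.get)
print(best_k, round(db_scores[best_k], 3))
```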

Cluster validation using derivative method

# Determine the optimal number of clusters
ce = clusteval(method='derivative')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
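
One common way to realise a derivative-style criterion is to look for the sharpest bend ("elbow") in the merge heights of a hierarchical clustering. The sketch below does this with SciPy; it is an assumption about the general technique, and clusteval's own derivative method may differ in detail. The blob centers, ward linkage, and the window of the last 10 merges are choices made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.datasets import make_blobs

# Four well-separated blobs (hypothetical centers) with a fixed seed
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

# Column 2 of the linkage matrix holds the merge heights
Z = linkage(X, method="ward")

# Last 10 merge heights, reversed so index 0 is the final merge (2 -> 1 clusters)
heights = Z[-10:, 2][::-1]

# Second-order difference: large values mark a sharp bend in the height curve
acceleration = np.diff(heights, 2)

# acceleration[i] is centered on the merge that leaves i + 2 clusters
best_k = int(acceleration.argmax()) + 2
print(best_k)
```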

Cluster validation using dbscan

# Determine the optimal number of clusters using dbscan and silhouette
ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
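
DBSCAN has no "number of clusters" parameter, so evaluation has to search over its density settings instead. The following is a minimal sketch of that idea with scikit-learn, assuming a small hand-picked grid of eps values and min_samples=5; it is not clusteval's internal search. Noise points (label -1) are excluded before scoring, which is a choice made for this example.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated blobs (hypothetical centers) with a fixed seed
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

best = None
for eps in [0.5, 1.0, 1.5, 2.0, 3.0]:  # hypothetical search grid
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    keep = labels != -1  # drop noise points before scoring
    n_clusters = len(set(labels[keep]))
    if n_clusters < 2:
        continue  # silhouette needs at least two clusters
    score = silhouette_score(X[keep], labels[keep])
    if best is None or score > best["score"]:
        best = {"eps": eps, "n_clusters": n_clusters, "score": score}

print(best)
```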

Cluster validation using hdbscan

Running hdbscan requires installing the hdbscan library separately. It is not included in the clusteval setup file because it frequently causes installation issues.

pip install hdbscan
# Determine the optimal number of clusters
ce = clusteval(cluster='hdbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)

Citation

Please cite clusteval in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019clusteval,
  title={clusteval},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/clusteval}},
}

Maintainer

  • Erdogan Taskesen, github: erdogant
  • Contributions are welcome.
  • If you wish to buy me a coffee for this work, it is very much appreciated :) Star the project if you like it!
