clusteval is a Python package that provides various methods for unsupervised cluster validation.
clusteval
clusteval is a Python package for unsupervised cluster evaluation. Three methods are implemented to evaluate clusterings: silhouette, dbindex, and derivative. Four clustering methods can be used: agglomerative, kmeans, dbscan and hdbscan.
Installation
- Install clusteval from PyPI (recommended). clusteval is compatible with Python 3.6+ and runs on Linux, macOS and Windows.
- It is distributed under the MIT license.
- A new environment can be created as follows:
conda create -n env_clusteval python=3.6
conda activate env_clusteval
pip install clusteval
- The beta version can be installed from the GitHub source:
git clone https://github.com/erdogant/clusteval
cd clusteval
pip install -U .
Import clusteval package
from clusteval import clusteval
Create example data set
# Generate random data
from sklearn.datasets import make_blobs
X, labx_true = make_blobs(n_samples=750, centers=4, n_features=2, cluster_std=0.5)
Cluster validation using Silhouette score
# Determine the optimal number of clusters
ce = clusteval(method='silhouette')
ce.fit(X)
ce.plot()
ce.dendrogram()
ce.scatter(X)
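For intuition about what the silhouette method scores, the following is a minimal, dependency-free sketch of the mean silhouette coefficient, assuming Euclidean distance. The names `silhouette_score_sketch` and `_dist` and the toy points are illustrative only and are not part of the clusteval API; clusteval's own implementation may differ.

```python
import math


def _dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_score_sketch(points, labels):
    """Mean silhouette coefficient over all points (hypothetical helper)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster
        # (the distance of p to itself is 0, so dividing by n - 1 is correct)
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(_dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: two compact, well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score_sketch(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score_sketch(points, [0, 1, 0, 1, 0, 1])
print("matching labels score higher:", good > bad)
```

Scores close to 1 indicate compact, well-separated clusters, so a labeling that matches the structure of the data scores higher than one that cuts across it.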
Cluster validation using Davies-Bouldin index
# Determine the optimal number of clusters
ce = clusteval(method='dbindex')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
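As a rough illustration of what the Davies-Bouldin index measures, here is a minimal, dependency-free sketch, assuming Euclidean distance and distinct cluster centroids. The name `davies_bouldin_sketch` and the toy points are hypothetical and not part of the clusteval API; this is not necessarily how clusteval computes the index internally.

```python
import math


def _dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def davies_bouldin_sketch(points, labels):
    """Davies-Bouldin index; lower values mean better separation."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean)
    centroids = {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in clusters.items()
    }
    # Scatter: mean distance of the members to their own centroid
    scatter = {
        lab: sum(_dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }
    # For each cluster take its worst (largest) similarity ratio to any
    # other cluster, then average. Assumes no two centroids coincide.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / _dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two compact, well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = davies_bouldin_sketch(points, [0, 0, 0, 1, 1, 1])
bad = davies_bouldin_sketch(points, [0, 1, 0, 1, 0, 1])
print("matching labels score lower:", good < bad)
```

Unlike the silhouette score, a lower value is better here, so the candidate number of clusters with the smallest index would be preferred.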
Cluster validation using derivative method
# Determine the optimal number of clusters
ce = clusteval(method='derivative')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
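A derivative-based selection is commonly described as an elbow heuristic on the merge heights of a hierarchical clustering: the number of clusters is chosen where the merge distance accelerates the most. The sketch below illustrates that idea with a second-order difference. That clusteval follows exactly this recipe is an assumption, and `elbow_by_second_derivative` and the example heights are hypothetical.

```python
def elbow_by_second_derivative(merge_distances):
    """Pick a number of clusters from agglomerative merge heights.

    merge_distances: heights of the merges in the order they happened,
    so the last value is the merge that joins the final two clusters.
    """
    d = list(merge_distances)
    if len(d) < 3:
        raise ValueError("need at least three merge distances")
    # Second-order difference: how sharply the merge height accelerates
    accel = [d[i + 2] - 2 * d[i + 1] + d[i] for i in range(len(d) - 2)]
    # Reverse so that position 0 refers to the final merge (2 -> 1 clusters)
    accel_rev = accel[::-1]
    # Position 0 corresponds to k=2, position 1 to k=3, and so on
    return accel_rev.index(max(accel_rev)) + 2


# Hypothetical merge heights: five cheap merges, then three expensive ones
heights = [0.1, 0.12, 0.15, 0.2, 0.22, 5.0, 5.5, 6.0]
k = elbow_by_second_derivative(heights)
print("suggested number of clusters:", k)
```

In this example the sharpest jump occurs at the third merge from the end, so undoing the last three merges leaves four clusters.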
Cluster validation using dbscan
# Determine the optimal number of clusters using dbscan and silhouette
ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
Cluster validation using hdbscan
hdbscan must be installed separately. It is not included in the clusteval
setup file because it frequently causes installation issues.
pip install hdbscan
# Determine the optimal number of clusters
ce = clusteval(cluster='hdbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
Citation
Please cite clusteval in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019clusteval,
title={clusteval},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/clusteval}},
}
TODO
- Use ARI when the ground truth clustering has large, equal-sized clusters
- Use AMI when the ground truth clustering is unbalanced and small clusters exist
- https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html
- https://scikit-learn.org/stable/auto_examples/cluster/plot_adjusted_for_chance_measures.html#sphx-glr-auto-examples-cluster-plot-adjusted-for-chance-measures-py
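The ARI mentioned in the list above compares a predicted labeling against ground truth labels while correcting for chance agreement. The following is a minimal, dependency-free sketch of the standard pair-counting formula; the name `adjusted_rand_sketch` is hypothetical, and in practice the `adjusted_rand_score` function from scikit-learn linked above would normally be used instead. AMI is not shown here.

```python
from collections import Counter


def _pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def adjusted_rand_sketch(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must have the same length")
    if n < 2:
        raise ValueError("need at least two samples")

    # Contingency counts: samples sharing a (true, predicted) label pair
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(_pairs(c) for c in contingency.values())
    sum_true = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_pred = sum(_pairs(c) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / _pairs(n)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
same_partition = adjusted_rand_sketch(truth, [1, 1, 1, 0, 0, 0])
mixed = adjusted_rand_sketch(truth, [0, 1, 0, 1, 0, 1])
print("identical partition scores higher:", same_partition > mixed)
```

The index depends only on which samples are grouped together, not on the label names, so swapping the labels 0 and 1 still gives a perfect score.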