clusteval is a Python package that provides various methods for unsupervised cluster validation.
clusteval
- clusteval is a Python package for unsupervised cluster evaluation. It implements five methods for evaluating clusterings: silhouette, dbindex, derivative, dbscan and hdbscan.
Methods
import clusteval
# X is your data: a 2-D array with one row per sample
out = clusteval.fit(X)
clusteval.plot(out, X)
Installation
- Install clusteval from PyPI (recommended). clusteval is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- It is distributed under the MIT license.
Requirements
- It is advisable to create a new environment.
conda create -n env_clusteval python=3.6
conda activate env_clusteval
pip install matplotlib numpy pandas tqdm seaborn hdbscan scikit-learn
Quick Start
pip install clusteval
- Alternatively, install clusteval from the GitHub source:
git clone https://github.com/erdogant/clusteval.git
cd clusteval
python setup.py install
Import clusteval package
import clusteval
Create example data set
# Generate some random data
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=750, centers=4, n_features=2, cluster_std=0.5)
Cluster validation using Silhouette score
# Determine the optimal number of clusters
out = clusteval.fit(X, method='silhouette')
fig = clusteval.plot(out, X)
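To make the silhouette criterion concrete, here is a minimal pure-Python sketch that scores one given labelling. It is an illustration under our own assumptions, not clusteval's implementation. For each point, a(i) is its mean Euclidean distance to the other members of its cluster, b(i) is its smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The mean of s(i) lies between -1 and 1, and higher is better. The helper name `silhouette_score_py` is hypothetical; to choose a number of clusters you would score the clusterings obtained for k = 2, 3, ... and keep the k with the highest value.

```python
import math
from collections import defaultdict


def silhouette_score_py(points, labels):
    """Mean silhouette coefficient of a labelling (hypothetical helper, Euclidean distance)."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention (as in scikit-learn): a point alone in its cluster scores 0
            scores.append(0.0)
            continue
        # a(i): mean distance to the other members of i's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

For example, with `pts = [(0, 0), (0, 1), (10, 10), (10, 11)]`, the labelling `[0, 0, 1, 1]` scores about 0.93, while the mismatched labelling `[0, 1, 0, 1]` scores below zero.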
Cluster validation using the Davies-Bouldin index
# Determine the optimal number of clusters
out = clusteval.fit(X, method='dbindex')
fig = clusteval.plot(out, X)
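The Davies-Bouldin index can likewise be sketched in plain Python, again as an illustration rather than clusteval's code. Each cluster i gets a scatter s_i, the mean distance of its points to its centroid c_i; for each pair of clusters the ratio R_ij = (s_i + s_j) / d(c_i, c_j) measures how much they overlap, and the index is the mean, over clusters, of the largest R_ij. Lower values indicate compact, well-separated clusters. This is the usual definition, the one scikit-learn's `davies_bouldin_score` computes; the name `davies_bouldin_py` is hypothetical.

```python
import math
from collections import defaultdict


def davies_bouldin_py(points, labels):
    """Davies-Bouldin index of a labelling (hypothetical helper); lower is better."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")

    centroids, scatter = [], []
    for members in clusters.values():
        dim = len(members[0])
        c = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        centroids.append(c)
        # s_i: mean distance of the cluster's points to its centroid
        scatter.append(sum(math.dist(p, c) for p in members) / len(members))

    k = len(centroids)
    total = 0.0
    for i in range(k):
        # worst case (largest) similarity ratio R_ij to any other cluster j
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

On the four points `[(0, 0), (0, 1), (10, 10), (10, 11)]`, the natural labelling `[0, 0, 1, 1]` gives roughly 0.07, while `[0, 1, 0, 1]` gives roughly 14.1.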
Cluster validation using derivative method
# Determine the optimal number of clusters
out = clusteval.fit(X, method='derivative')
fig = clusteval.plot(out)
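A derivative-based choice of the number of clusters can be sketched as follows. This is one common formulation of the idea and may differ from what clusteval does internally: take the merge heights of an agglomerative (hierarchical) clustering in ascending order, so that the last height is the merge from two clusters into one, compute their second difference (the "acceleration"), and return the number of clusters that sits just before the largest jump. The helper name `elbow_by_second_difference` and the window `max_considered=10` are assumptions made for this sketch.

```python
def elbow_by_second_difference(merge_heights, max_considered=10):
    """Pick k from ascending merge heights via the largest second difference (hypothetical helper)."""
    # heights of the final merges, still in ascending order
    last = list(merge_heights)[-max_considered:]
    if len(last) < 3:
        raise ValueError("need at least three merge heights")
    # second difference: h[i+2] - 2*h[i+1] + h[i]
    accel = [last[i + 2] - 2 * last[i + 1] + last[i] for i in range(len(last) - 2)]
    # reverse so that index 0 refers to the final merges (around the 2 -> 1 step)
    accel_rev = accel[::-1]
    best = max(range(len(accel_rev)), key=accel_rev.__getitem__)
    # index 0 corresponds to 2 clusters, index 1 to 3 clusters, and so on
    return best + 2
```

For merge heights `[1, 1, 1, 10, 11, 12]`, that is, several small merges followed by three large ones, the function returns 4: the sharpest increase occurs when four well-separated groups start being merged.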
Cluster validation using hdbscan
# Determine the optimal number of clusters
out = clusteval.fit(X, method='hdbscan')
fig = clusteval.plot(out)
Cluster validation using dbscan
# Determine the optimal number of clusters
out = clusteval.fit(X, method='dbscan')
fig = clusteval.plot(out, X)
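For density-based validation, the core of DBSCAN fits in a few dozen lines of plain Python. The sketch below is the textbook algorithm, not clusteval's wrapper, and `dbscan_py` is a hypothetical name. A point with at least `min_samples` neighbours within distance `eps` (itself included) is a core point; clusters grow outward from core points, and points that no core point can reach are labelled -1 (noise). In a validation setting one could run it for several `eps` values and score each labelling, for example with the silhouette score; that search strategy is an assumption here, not a description of clusteval.

```python
import math


def dbscan_py(points, eps, min_samples):
    """Minimal DBSCAN sketch (hypothetical helper). Returns one label per point; -1 marks noise."""
    n = len(points)
    # O(n^2) neighbourhood lists; each neighbourhood includes the point itself
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels
```

With two tight groups of three points and one far-away outlier, `dbscan_py(pts, eps=1.5, min_samples=3)` labels the groups 0 and 1 and the outlier -1.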
Citation
Please cite clusteval in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019clusteval,
title={clusteval},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/clusteval}},
}
TODO
- Use ARI when the ground-truth clustering has large, equal-sized clusters.
- Use AMI when the ground-truth clustering is unbalanced and contains small clusters.
- References: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html and https://scikit-learn.org/stable/auto_examples/cluster/plot_adjusted_for_chance_measures.html#sphx-glr-auto-examples-cluster-plot-adjusted-for-chance-measures-py
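The ARI mentioned in the TODO can be computed from the contingency table of two labellings. The following pure-Python sketch, with the hypothetical name `adjusted_rand_index`, implements the standard adjusted-for-chance Rand index; it is not part of clusteval. In practice scikit-learn already provides `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.adjusted_mutual_info_score`, and AMI is not sketched here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labellings (hypothetical helper, pure Python)."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("labellings must have the same length")
    # contingency counts n_ij, and pairs within each cell
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    # pairs within each row (true cluster) and column (predicted cluster)
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labellings use a single cluster
    return (sum_ij - expected) / (max_index - expected)
```

Identical partitions score 1.0 even when the label names differ (`[0, 0, 1, 1]` versus `[1, 1, 0, 0]`), and `[0, 0, 1, 1]` versus `[0, 1, 0, 1]` scores -0.5, i.e. worse than chance.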
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
Licence
See LICENSE for details.
Donation
- This package is created and maintained in my free time. If this package is useful, you can show your gratitude :) Thanks!
Download files
Source Distribution
- clusteval-0.1.1.tar.gz (13.9 kB)
Built Distribution
- clusteval-0.1.1-py3-none-any.whl (20.4 kB)
Hashes for clusteval-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a5b41d6d7f5fd310445d0e2e9ff8daccd99e1dfcc2d6e02c42cc4d0d5cf1910a
MD5 | 9c6f1495934edbe502e7fd9b25f7b896
BLAKE2b-256 | 5d0c5121b73c8a61e38b7cf94c43755f2325d76d1db0783dc5c09b3dd56f8171