A Python package for cluster ensembles
ClusterEnsembles
A Python package for cluster ensembles. A cluster ensemble combines base labels obtained from multiple clustering algorithms into a single consensus clustering label. The consensus label tends to give stable, high clustering performance compared with relying on any single base label.
This package was originally authored by Takehiro Sano but has since been removed from PyPI. This is a cloned version that I maintain. The original code and functionality are unchanged; the package is simply maintained, tested, and published from here.
Installation
pip install ensembleclustering
Usage
CE.cluster_ensembles is used as follows.
>>> import numpy as np
>>> import ensembleclustering as CE
>>> label1 = np.array([1, 1, 1, 2, 2, 3, 3])
>>> label2 = np.array([2, 2, 2, 3, 3, 1, 1])
>>> label3 = np.array([4, 4, 2, 2, 3, 3, 3])
>>> label4 = np.array([1, 2, np.nan, 1, 2, np.nan, np.nan]) # `np.nan`: missing value
>>> labels = np.array([label1, label2, label3, label4])
>>> label_ce = CE.cluster_ensembles(labels)
>>> print(label_ce)
[1 1 1 2 2 0 0]
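Cluster label values are arbitrary names: in the example above, label1 and label2 describe the same partition under different names, and so does the printed consensus label. The following is a minimal sketch for checking this. `canonical_partition` is a hypothetical helper written for illustration, not part of the package, and it assumes labels without missing values.

```python
def canonical_partition(labels):
    """Rename clusters 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    # setdefault assigns the next unused index the first time a label is seen
    return [mapping.setdefault(label, len(mapping)) for label in labels]


label1 = [1, 1, 1, 2, 2, 3, 3]
label2 = [2, 2, 2, 3, 3, 1, 1]
label_ce = [1, 1, 1, 2, 2, 0, 0]  # consensus label printed in the example above

print(canonical_partition(label1))    # [0, 0, 0, 1, 1, 2, 2]
print(canonical_partition(label2))    # [0, 0, 0, 1, 1, 2, 2]
print(canonical_partition(label_ce))  # [0, 0, 0, 1, 1, 2, 2]
```

Comparing canonical forms rather than raw values avoids treating two identical groupings as different just because the cluster names differ.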
Parameters
- labels : numpy.ndarray
  Labels generated by multiple clustering algorithms such as K-Means.
  Note: All labels are assumed to have the same length.
- nclass : int, default=None
  Number of classes in the consensus clustering label. If nclass=None, it is set to the maximum number of classes found in any single label, excluding missing values. In the example above, nclass=3 is set automatically.
- solver : {'cspa', 'hgpa', 'mcla', 'hbgf', 'nmf', 'all'}, default='hbgf'
  'cspa': Cluster-based Similarity Partitioning Algorithm [1].
  'hgpa': HyperGraph Partitioning Algorithm [1].
  'mcla': Meta-CLustering Algorithm [1].
  'hbgf': Hybrid Bipartite Graph Formulation [2].
  'nmf': NMF-based consensus clustering [3].
  'all': The consensus clustering label with the largest objective function value [1] is returned among the results of all solvers.
  Note: Please use 'hbgf' for large-scale labels.
- random_state : int, default=None
  Used for 'hgpa', 'mcla', and 'nmf'. Please pass an integer for reproducible results.
- verbose : bool, default=False
  Whether to be verbose.
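The default for nclass described above can be sketched in a few lines. This is only an illustration of the stated rule, not the package's own code: `default_nclass` is a hypothetical name, and plain Python lists with `float("nan")` stand in for the NumPy array with `np.nan`.

```python
import math


def default_nclass(labels):
    """Largest number of distinct non-missing classes in any single base label."""
    return max(
        len({value for value in row if not math.isnan(value)})
        for row in labels
    )


nan = float("nan")  # plays the role of np.nan (missing value)
labels = [
    [1, 1, 1, 2, 2, 3, 3],
    [2, 2, 2, 3, 3, 1, 1],
    [4, 4, 2, 2, 3, 3, 3],
    [1, 2, nan, 1, 2, nan, nan],
]

print(default_nclass(labels))  # 3
```

The first three labels each contain three distinct classes and the fourth contains two once missing values are ignored, which matches the nclass=3 stated for the usage example.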
Return
- label_ce : numpy.ndarray
  A consensus clustering label generated by cluster ensembles.
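With solver='all', the returned label is the one with the largest objective function value from [1]. In [1] that objective is the average normalized mutual information (ANMI) between a candidate consensus label and the base labels. The sketch below illustrates the idea in pure Python. The names `entropy`, `nmi`, and `anmi` are hypothetical, the pairwise skipping of missing values is an assumption, and the package's actual computation may differ.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (natural log) of a labeling, given its cluster sizes."""
    return -sum(c / n * math.log(c / n) for c in counts.values())


def nmi(a, b):
    """I(a; b) / sqrt(H(a) * H(b)), using only positions where both labels exist."""
    pairs = [(x, y) for x, y in zip(a, b) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_a = Counter(x for x, _ in pairs)
    count_b = Counter(y for _, y in pairs)
    h_a, h_b = entropy(count_a, n), entropy(count_b, n)
    if h_a == 0 or h_b == 0:
        return 0.0  # convention chosen here for single-cluster labelings
    mutual = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in Counter(pairs).items()
    )
    return mutual / math.sqrt(h_a * h_b)


def anmi(candidate, base_labels):
    """Average NMI between one candidate label and every base label."""
    return sum(nmi(candidate, base) for base in base_labels) / len(base_labels)


nan = float("nan")  # plays the role of np.nan (missing value)
base_labels = [
    [1, 1, 1, 2, 2, 3, 3],
    [2, 2, 2, 3, 3, 1, 1],
    [4, 4, 2, 2, 3, 3, 3],
    [1, 2, nan, 1, 2, nan, nan],
]
candidates = {
    "consensus": [1, 1, 1, 2, 2, 0, 0],
    "single cluster": [0] * 7,
}

# Keep the candidate that agrees most, on average, with the base labels.
best = max(candidates, key=lambda name: anmi(candidates[name], base_labels))
print(best)  # consensus
```

Because NMI depends only on how samples are grouped, not on the label names, candidates produced by different solvers can be compared directly without first aligning their cluster names.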
Example
tsano430/egnmf: https://github.com/tsano430/egnmf
Similar Package
GGiecold/Cluster_Ensembles: https://github.com/GGiecold/Cluster_Ensembles
References
[1] A. Strehl and J. Ghosh, "Cluster ensembles -- a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583-617, 2002.
[2] X. Z. Fern and C. E. Brodley, "Solving cluster ensemble problems by bipartite graph partitioning," In Proceedings of the Twenty-First International Conference on Machine Learning, p. 36, 2004.
[3] T. Li, C. Ding, and M. I. Jordan, "Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization," In Proceedings of the Seventh IEEE International Conference on Data Mining, pp. 577-582, 2007.
[4] J. Ghosh and A. Acharya, "Cluster ensembles," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 305-315, 2011.
Built Distribution
Hashes for ensembleclustering-1.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 574515c8641d002108bd86a167415bbff3ef2643064fa58e7678e434df18fbc0
MD5 | 62916101b1769a9b399c728860e5b738
BLAKE2b-256 | 5652081ab2a5b8a83534bfada1f6a70aa419fbd851c826a3ef6afef2fef20a92