A Python package for cluster ensembles
ClusterEnsembles
Cluster ensembles combine base labels obtained from multiple clustering algorithms into a single consensus clustering label. The consensus clustering label achieves stable, high clustering performance.
This package was originally authored by Takehiro Sano but has since been removed from PyPI. This is a cloned version that I maintain. All original code and functionality are unchanged; the package is simply maintained, tested, and published from here.
Installation
pip install ensembleclustering
Usage
CE.cluster_ensembles is used as follows.
>>> import numpy as np
>>> import ensembleclustering as CE
>>> label1 = np.array([1, 1, 1, 2, 2, 3, 3])
>>> label2 = np.array([2, 2, 2, 3, 3, 1, 1])
>>> label3 = np.array([4, 4, 2, 2, 3, 3, 3])
>>> label4 = np.array([1, 2, np.nan, 1, 2, np.nan, np.nan])  # `np.nan`: missing value
>>> labels = np.array([label1, label2, label3, label4])
>>> label_ce = CE.cluster_ensembles(labels)
>>> print(label_ce)
[1 1 1 2 2 0 0]
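When base clusterings cover different subsets of objects, their rows can be aligned by filling the uncovered positions with np.nan, as label4 does above. The following is a minimal sketch, assuming each partial clustering is given as a mapping from object index to label. The helper name stack_partial_labels is hypothetical and is not part of the package.

```python
import numpy as np

def stack_partial_labels(partials, n_objects):
    """Stack partial clusterings into an (n_clusterings, n_objects) float array.

    Each element of `partials` maps object index -> cluster label.
    Objects that a clustering did not label stay np.nan (missing value).
    """
    labels = np.full((len(partials), n_objects), np.nan)
    for row, mapping in enumerate(partials):
        for idx, lab in mapping.items():
            labels[row, idx] = lab
    return labels

partials = [
    {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3},  # covers every object
    {0: 1, 1: 2, 3: 1, 4: 2},                    # objects 2, 5, 6 are unlabeled
]
labels = stack_partial_labels(partials, n_objects=7)
print(labels)
```

The resulting array has the same layout as labels in the example above, so it could be passed to CE.cluster_ensembles(labels) in the same way.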
Parameters
- labels: numpy.ndarray
  Labels generated by multiple clustering algorithms such as K-Means.
  Note: Each label is assumed to have the same length.
- nclass: int, default=None
  Number of classes in a consensus clustering label. If nclass=None, it is set to the maximum number of classes found in any single label, excluding missing values. In the example above, nclass=3 is set automatically.
- solver: {'cspa', 'hgpa', 'mcla', 'hbgf', 'nmf', 'all'}, default='hbgf'
  'cspa': Cluster-based Similarity Partitioning Algorithm [1].
  'hgpa': HyperGraph Partitioning Algorithm [1].
  'mcla': Meta-CLustering Algorithm [1].
  'hbgf': Hybrid Bipartite Graph Formulation [2].
  'nmf': NMF-based consensus clustering [3].
  'all': The consensus clustering label with the largest objective function value [1] is returned among the results of all solvers.
  Note: Please use 'hbgf' for large-scale labels.
- random_state: int, default=None
  Used for 'hgpa', 'mcla', and 'nmf'. Pass an integer for reproducible results.
- verbose: bool, default=False
  Whether to be verbose.
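The default rule for nclass described above can be illustrated with a short sketch. It is a reading of the documented behavior, not the package's own code, and the function name default_nclass is hypothetical.

```python
import numpy as np

def default_nclass(labels):
    """Largest number of distinct non-missing classes found in any single base label."""
    return max(len(np.unique(row[~np.isnan(row)])) for row in labels)

labels = np.array([
    [1, 1, 1, 2, 2, 3, 3],
    [2, 2, 2, 3, 3, 1, 1],
    [4, 4, 2, 2, 3, 3, 3],
    [1, 2, np.nan, 1, 2, np.nan, np.nan],  # only two classes once NaN is dropped
])
print(default_nclass(labels))  # 3
```

The first three labels each contain three distinct classes and the fourth contains two, so the maximum is 3, which matches the value stated for the usage example.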
Return
- label_ce: numpy.ndarray
  A consensus clustering label generated by cluster ensembles.
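To judge how well a returned label_ce agrees with the base labels, one option is the average normalized mutual information (ANMI). The sketch below assumes that the objective function referenced for solver='all' is the ANMI of [1], with mutual information normalized by the geometric mean of the two entropies and with missing values skipped. The functions entropy, nmi, and anmi are hypothetical helpers written for illustration and are not part of the package.

```python
import numpy as np

def entropy(x):
    """Shannon entropy (natural log) of a 1-D label array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmi(a, b):
    """Normalized mutual information: I(a; b) / sqrt(H(a) * H(b))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = ~(np.isnan(a) | np.isnan(b))  # skip objects missing in either label
    a, b = a[keep], b[keep]
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # convention chosen here for single-cluster labels
    # Joint entropy from the counts of distinct (a, b) label pairs.
    _, joint_counts = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)
    p = joint_counts / joint_counts.sum()
    h_ab = -np.sum(p * np.log(p))
    mi = max(h_a + h_b - h_ab, 0.0)  # clip tiny negative rounding errors
    return float(mi / np.sqrt(h_a * h_b))

def anmi(consensus, labels):
    """Average NMI between a consensus label and every base label."""
    return float(np.mean([nmi(consensus, row) for row in labels]))

labels = np.array([
    [1, 1, 1, 2, 2, 3, 3],
    [2, 2, 2, 3, 3, 1, 1],
    [4, 4, 2, 2, 3, 3, 3],
    [1, 2, np.nan, 1, 2, np.nan, np.nan],
])
label_ce = np.array([1, 1, 1, 2, 2, 0, 0])  # consensus label from the usage example
score = anmi(label_ce, labels)
print(f"ANMI = {score:.3f}")
```

NMI does not depend on how clusters are numbered, so label_ce scores 1 against the first two base labels even though the label values differ.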
Example
tsano430/egnmf: https://github.com/tsano430/egnmf
Similar Package
GGiecold/Cluster_Ensembles: https://github.com/GGiecold/Cluster_Ensembles
References
[1] A. Strehl and J. Ghosh, "Cluster ensembles -- a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583-617, 2002.
[2] X. Z. Fern and C. E. Brodley, "Solving cluster ensemble problems by bipartite graph partitioning," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 36, 2004.
[3] T. Li, C. Ding, and M. I. Jordan, "Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization," in Proceedings of the Seventh IEEE International Conference on Data Mining, pp. 577-582, 2007.
[4] J. Ghosh and A. Acharya, "Cluster ensembles," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 305-315, 2011.