Skip to main content

A Python package for cluster ensembles

Project description

ClusterEnsembles

PyPI License: MIT PyPI - Python Version GitHubActions

A Python package for cluster ensembles. Cluster ensembles generate a single consensus clustering label by using base labels obtained from multiple clustering algorithms. The consensus clustering label stably achieves a high clustering performance.

This package was originally authored by Takehiro Sano but has since been removed from PyPi. This a cloned version I am maintaining. All original code and functionality is unchanged, it is just maintained, tested, and published from here.

Installation

pip install ensembleclustering

Usage

CE.cluster_ensembles is used as follows.

>> > import numpy as np

>> > import ensembleclustering as CE

>> > label1 = np.array([1, 1, 1, 2, 2, 3, 3])

>> > label2 = np.array([2, 2, 2, 3, 3, 1, 1])

>> > label3 = np.array([4, 4, 2, 2, 3, 3, 3])

>> > label4 = np.array([1, 2, np.nan, 1, 2, np.nan, np.nan])  # `np.nan`: missing value

>> > labels = np.array([label1, label2, label3, label4])

>> > label_ce = CE.cluster_ensembles(labels)

>> > print(label_ce)
[1 1 1 2 2 0 0]

Parameters

  • labels: numpy.ndarray

    Labels generated by multiple clustering algorithms such as K-Means.

    Note: Assume that the length of each label is the same.

  • nclass: int, default=None

    Number of classes in a consensus clustering label. If nclass=None, set the maximum number of classes in each label except missing values. In other words, set nclass=3 automatically in the above.

  • solver: {'cspa', 'hgpa', 'mcla', 'hbgf', 'nmf', 'all'}, default='hbgf'

    'cspa': Cluster-based Similarity Partitioning Algorithm [1].

    'hgpa': HyperGraph Partitioning Algorithm [1].

    'mcla': Meta-CLustering Algorithm [1].

    'hbgf': Hybrid Bipartite Graph Formulation [2].

    'nmf': NMF-based consensus clustering [3].

    'all': The consensus clustering label with the largest objective function value [1] is returned among the results of all solvers.

    Note: Please use 'hbgf' for large-scale labels.

  • random_state: int, default=None

    Used for 'hgpa', 'mcla', and 'nmf'. Please pass an integer for reproducible results.

  • verbose: bool, default=False

    Whether to be verbose.

Return

  • label_ce: numpy.ndarray

    A consensus clustering label generated by cluster ensembles.

Example

tsano430/egnmf: https://github.com/tsano430/egnmf

Similar Package

GGiecold/Cluster_Ensembles: https://github.com/GGiecold/Cluster_Ensembles

References

[1] A. Strehl and J. Ghosh, "Cluster ensembles -- a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583-617, 2002.

[2] X. Z. Fern and C. E. Brodley, "Solving cluster ensemble problems by bipartite graph partitioning," In Proceedings of the Twenty-First International Conference on Machine Learning, p. 36, 2004.

[3] T. Li, C. Ding, and M. I. Jordan, "Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization," In Proceedings of the Seventh IEEE International Conference on Data Mining, pp. 577-582, 2007.

[4] J. Ghosh and A. Acharya, "Cluster ensembles," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 305-315, 2011.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ensembleclustering-1.0.2.tar.gz (12.5 kB view hashes)

Uploaded Source

Built Distribution

ensembleclustering-1.0.2-py3-none-any.whl (14.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page