
A Python package for cluster ensembles

Project description

ClusterEnsembles

License: MIT

Cluster ensembles generate a single consensus clustering label from base labels obtained from multiple clustering algorithms. The consensus label tends to be more stable, and often achieves higher clustering performance, than relying on a single base clustering.

This package was originally authored by Takehiro Sano but has since been removed from PyPI. This is a cloned version that I maintain. All original code and functionality are unchanged; the package is simply maintained, tested, and published from here.

Installation

pip install ensembleclustering

Usage

CE.cluster_ensembles is used as follows.

>>> import numpy as np
>>> import ensembleclustering as CE
>>> label1 = np.array([1, 1, 1, 2, 2, 3, 3])
>>> label2 = np.array([2, 2, 2, 3, 3, 1, 1])
>>> label3 = np.array([4, 4, 2, 2, 3, 3, 3])
>>> label4 = np.array([1, 2, np.nan, 1, 2, np.nan, np.nan])  # `np.nan`: missing value
>>> labels = np.array([label1, label2, label3, label4])
>>> label_ce = CE.cluster_ensembles(labels)
>>> print(label_ce)
[1 1 1 2 2 0 0]
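Cluster IDs in a consensus label are arbitrary: [1 1 1 2 2 0 0] groups the seven samples exactly as label1 and label2 do, only under different names. To compare partitions rather than raw IDs, labels can be renumbered by order of first appearance. A minimal sketch using a hypothetical helper `canonical` (not part of ensembleclustering), assuming labels without missing values:

```python
from typing import Dict, Hashable, List, Sequence


def canonical(label: Sequence[Hashable]) -> List[int]:
    """Renumber cluster IDs as 0, 1, 2, ... in order of first appearance."""
    mapping: Dict[Hashable, int] = {}
    # setdefault assigns the next unused ID only the first time a value is seen
    return [mapping.setdefault(v, len(mapping)) for v in label]


consensus = [1, 1, 1, 2, 2, 0, 0]  # label_ce from the example above
label1 = [1, 1, 1, 2, 2, 3, 3]
label2 = [2, 2, 2, 3, 3, 1, 1]

print(canonical(consensus))                       # [0, 0, 0, 1, 1, 2, 2]
print(canonical(consensus) == canonical(label1))  # True
print(canonical(consensus) == canonical(label2))  # True
```

Two labels describe the same partition exactly when their canonical forms are equal.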

Parameters

  • labels: numpy.ndarray

    A 2-D array with one row per base label, such as the labels produced by K-Means or other clustering algorithms. Missing values are encoded as np.nan.

    Note: All base labels must have the same length (one entry per sample).

  • nclass: int, default=None

    Number of classes in the consensus clustering label. If nclass=None, it is set to the largest number of distinct classes in any single base label, ignoring missing values. In the example above, nclass=3 is therefore set automatically.

  • solver: {'cspa', 'hgpa', 'mcla', 'hbgf', 'nmf', 'all'}, default='hbgf'

    'cspa': Cluster-based Similarity Partitioning Algorithm [1].

    'hgpa': HyperGraph Partitioning Algorithm [1].

    'mcla': Meta-CLustering Algorithm [1].

    'hbgf': Hybrid Bipartite Graph Formulation [2].

    'nmf': NMF-based consensus clustering [3].

    'all': Runs every solver above and returns the consensus clustering label with the largest objective function value [1].

    Note: 'hbgf' is recommended for large-scale labels.

  • random_state: int, default=None

    Random seed used by 'hgpa', 'mcla', and 'nmf'. Pass an integer for reproducible results.

  • verbose: bool, default=False

    Whether to be verbose.
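The nclass=None rule described above can be sketched in a few lines of plain Python, assuming, as in the usage example, that missing values are NaN. `default_nclass` is a hypothetical helper that mirrors the documented behavior; it is not the package's own code.

```python
from typing import Sequence


def default_nclass(labels: Sequence[Sequence[float]]) -> int:
    """Largest number of distinct non-missing classes in any single base label."""
    # NaN is the only float not equal to itself, so `v == v` skips missing entries
    return max(len({v for v in row if v == v}) for row in labels)


nan = float("nan")
labels = [
    [1, 1, 1, 2, 2, 3, 3],
    [2, 2, 2, 3, 3, 1, 1],
    [4, 4, 2, 2, 3, 3, 3],
    [1, 2, nan, 1, 2, nan, nan],  # only two classes once NaN is ignored
]
print(default_nclass(labels))  # 3
```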

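The 'cspa' solver [1] starts from a pairwise similarity matrix built from all base labels: each base label is expanded into binary membership columns, giving a matrix H, and the similarity is S = H H^T / r for r base labels, i.e. the fraction of base labels that put two samples in the same cluster. Below is a minimal pure-Python sketch of that first step only, assuming NaN marks missing values as in the usage example. `to_indicator` and `coassociation` are hypothetical names, not ensembleclustering functions, and the graph-partitioning step that CSPA applies afterwards (METIS in [1]) is omitted.

```python
from typing import List, Sequence


def to_indicator(labels: Sequence[Sequence[float]]) -> List[List[int]]:
    """Binary membership matrix H with shape (n_samples, total clusters).

    Each base label contributes one column per distinct non-missing class;
    a missing value (NaN) yields an all-zero row in that label's block.
    """
    n_samples = len(labels[0])
    if any(len(row) != n_samples for row in labels):
        raise ValueError("all base labels must have the same length")
    h: List[List[int]] = [[] for _ in range(n_samples)]
    for row in labels:
        # `v == v` is False only for NaN, so missing values create no column
        classes = sorted({v for v in row if v == v})
        for i, v in enumerate(row):
            h[i].extend(1 if v == c else 0 for c in classes)
    return h


def coassociation(labels: Sequence[Sequence[float]]) -> List[List[float]]:
    """S = H H^T / r: fraction of base labels that co-cluster samples i and j."""
    h = to_indicator(labels)
    r = len(labels)
    return [[sum(a * b for a, b in zip(hi, hj)) / r for hj in h] for hi in h]


nan = float("nan")
labels = [
    [1, 1, 1, 2, 2, 3, 3],
    [2, 2, 2, 3, 3, 1, 1],
    [4, 4, 2, 2, 3, 3, 3],
    [1, 2, nan, 1, 2, nan, nan],
]
S = coassociation(labels)
print(S[0][1])  # 0.75: samples 0 and 1 agree in three of the four base labels
```

S has n_samples × n_samples entries, so its cost grows quadratically with the number of samples. HBGF [2] instead partitions a bipartite graph whose edges link each sample only to the clusters it belongs to, which plausibly explains why 'hbgf' is recommended for large-scale labels.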
Return

  • label_ce: numpy.ndarray

    The consensus clustering label, with one entry per sample.
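Strehl and Ghosh [1] frame the consensus problem as maximizing the average normalized mutual information (ANMI) between the consensus label and the base labels, with NMI(X, Y) = I(X; Y) / sqrt(H(X) H(Y)). The sketch below computes that measure so a returned label_ce can be scored against its inputs. `nmi` and `anmi` are hypothetical helpers, not ensembleclustering functions; skipping samples that are missing in either label and returning 0 when a labeling has a single cluster are assumptions, and the objective the package uses internally for solver='all' may be computed differently.

```python
import math
from collections import Counter
from typing import Sequence


def nmi(x: Sequence[float], y: Sequence[float]) -> float:
    """NMI(X, Y) = I(X; Y) / sqrt(H(X) * H(Y)), using natural logarithms."""
    # Keep only samples labeled in both partitions (NaN != NaN marks a missing value)
    pairs = [(a, b) for a, b in zip(x, y) if a == a and b == b]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    # Mutual information: sum of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    mi = sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    if hx == 0.0 or hy == 0.0:
        return 0.0  # assumption: a single-cluster labeling carries no information
    return mi / math.sqrt(hx * hy)


def anmi(consensus: Sequence[float], labels: Sequence[Sequence[float]]) -> float:
    """Average NMI between a consensus label and each base label."""
    return sum(nmi(consensus, row) for row in labels) / len(labels)


nan = float("nan")
labels = [
    [1, 1, 1, 2, 2, 3, 3],
    [2, 2, 2, 3, 3, 1, 1],
    [4, 4, 2, 2, 3, 3, 3],
    [1, 2, nan, 1, 2, nan, nan],
]
label_ce = [1, 1, 1, 2, 2, 0, 0]  # consensus label from the usage example
print(round(anmi(label_ce, labels), 3))
```

If scikit-learn is available, sklearn.metrics.normalized_mutual_info_score with average_method='geometric' uses the same sqrt(H(X) H(Y)) normalization for a single pair of labels.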

Example

tsano430/egnmf: https://github.com/tsano430/egnmf

Similar Package

GGiecold/Cluster_Ensembles: https://github.com/GGiecold/Cluster_Ensembles

References

[1] A. Strehl and J. Ghosh, "Cluster ensembles -- a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583-617, 2002.

[2] X. Z. Fern and C. E. Brodley, "Solving cluster ensemble problems by bipartite graph partitioning," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 36, 2004.

[3] T. Li, C. Ding, and M. I. Jordan, "Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization," in Proceedings of the Seventh IEEE International Conference on Data Mining, pp. 577-582, 2007.

[4] J. Ghosh and A. Acharya, "Cluster ensembles," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 305-315, 2011.

Download files

Source Distribution

ensembleclustering-1.0.2.tar.gz (12.5 kB)

Uploaded Source

Built Distribution

ensembleclustering-1.0.2-py3-none-any.whl (14.5 kB)

Uploaded Python 3

File details

Details for the file ensembleclustering-1.0.2.tar.gz.

File metadata

  • Download URL: ensembleclustering-1.0.2.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.10.6 Linux/5.15.0-60-generic

File hashes

Hashes for ensembleclustering-1.0.2.tar.gz
Algorithm Hash digest
SHA256 7019b7636abd84e8d352b47838d055dbb2e330199fd2709465774e69635593c3
MD5 6e0bb317ca4b3b06856821127ef4781c
BLAKE2b-256 24d5fd5f8bebfd270ae01ffc75db0de690020ef753aa30ef85a925cb9913a972

File details

Details for the file ensembleclustering-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: ensembleclustering-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 14.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.10.6 Linux/5.15.0-60-generic

File hashes

Hashes for ensembleclustering-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 574515c8641d002108bd86a167415bbff3ef2643064fa58e7678e434df18fbc0
MD5 62916101b1769a9b399c728860e5b738
BLAKE2b-256 5652081ab2a5b8a83534bfada1f6a70aa419fbd851c826a3ef6afef2fef20a92
