MCMSTClustering: Minimum Spanning Tree and Micro-Cluster based Clustering Algorithm

PyPI version License: MIT Python 3.7+

Motivation

MCMSTClustering is an MST-based clustering algorithm designed to handle high-dimensional, imbalanced, varying-density, and arbitrary-shaped datasets. It first forms micro-clusters using KD-Tree range search, then builds a Minimum Spanning Tree over these micro-clusters to detect non-spherical macro-clusters. A final cluster-regulation step refines boundaries and improves clustering quality. In the experiments reported in the paper cited below, MCMSTClustering outperforms several state-of-the-art methods in accuracy while keeping runtime efficient.

Installation

pip install mcmst-clust

Usage

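from mcmst_clust import MCMSTClustering
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score

# Load the Wine dataset: feature matrix X and ground-truth class labels y
data = load_wine()
X = data.data
y = data.target

# Fit the model and obtain one cluster label per sample
model = MCMSTClustering(N=19, r=0.49, n_micro=3, random_state=42)
labels = model.fit_predict(X)

# Report how many micro- and macro-clusters were found
print("n_micro:", model.n_micro_clusters_, "n_macro:", model.n_macro_clusters_)
# Compare the predicted labels with the ground truth
print("ARI:", adjusted_rand_score(y, labels))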
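
The ARI printed above compares the predicted labels with the ground-truth classes. As a rough illustration of what that score measures, here is a minimal pure-Python sketch of the standard pair-counting formula. The function name adjusted_rand_index is introduced here for illustration only; it is not part of mcmst_clust or scikit-learn, and in practice sklearn.metrics.adjusted_rand_score is the call to use.

```python
from collections import Counter


def _pairs(count):
    """Number of unordered pairs that can be drawn from `count` items."""
    return count * (count - 1) // 2


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the label contingency table.

    Illustrative helper (hypothetical name), not a library function.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs to compare; treat the partitions as identical.
        return 1.0

    # Contingency cells and the two marginals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_cells = sum(_pairs(c) for c in cell_counts.values())
    sum_true = sum(_pairs(c) for c in true_counts.values())
    sum_pred = sum(_pairs(c) for c in pred_counts.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_true * sum_pred / _pairs(n)
    maximum = 0.5 * (sum_true + sum_pred)
    if maximum == expected:
        # Degenerate case, e.g. both partitions are a single cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# The score ignores label names: swapping 0 and 1 still gives a perfect match.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

A value of 1.0 means the two partitions agree on every pair of samples, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.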

Overview

MCMSTClustering (Defining Non-Spherical Clusters by using Minimum Spanning Tree over KD-Tree-based Micro-Clusters) is designed to overcome limitations of conventional clustering algorithms when handling:

- High-dimensional data

- Imbalanced datasets

- Clusters with varying densities

- Noisy data/outliers

- Arbitrary-shaped clusters

The algorithm consists of three main steps:

1. Micro-cluster Formation: Defines micro-clusters using a KD-Tree data structure with range search.

2. Macro-cluster Construction: Builds a minimum spanning tree (MST) over the micro-clusters to form macro-clusters.

3. Cluster Regulation: Refines the clusters to improve accuracy and overall clustering quality.
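
To make the first two steps concrete, here is a minimal pure-Python sketch of the general idea. It is not the package's implementation: a brute-force radius search stands in for the KD-Tree, the rule for cutting MST edges (a hypothetical max_edge threshold) is an assumption made for illustration, and the cluster-regulation step is omitted. The function names form_micro_clusters, minimum_spanning_tree and macro_labels are likewise invented here.

```python
from math import sqrt


def euclidean(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def form_micro_clusters(points, r, min_points):
    """Step 1 (sketch): each unassigned point seeds a micro-cluster from the
    unassigned points within radius r. Brute force replaces the KD-Tree."""
    unassigned = set(range(len(points)))
    micro = []
    for i in range(len(points)):
        if i not in unassigned:
            continue
        members = [j for j in range(len(points))
                   if j in unassigned and euclidean(points[i], points[j]) <= r]
        if len(members) >= min_points:
            micro.append(members)
            unassigned.difference_update(members)
    return micro  # points never placed in a micro-cluster are left out


def minimum_spanning_tree(centers):
    """Step 2a (sketch): Kruskal's algorithm over the complete graph of centers."""
    edges = sorted((euclidean(centers[a], centers[b]), a, b)
                   for a in range(len(centers))
                   for b in range(a + 1, len(centers)))
    parent = list(range(len(centers)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for length, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((length, a, b))
    return tree


def macro_labels(centers, max_edge):
    """Step 2b (sketch): drop MST edges longer than max_edge (assumed rule);
    each remaining connected component is one macro-cluster."""
    neighbours = {i: [] for i in range(len(centers))}
    for length, a, b in minimum_spanning_tree(centers):
        if length <= max_edge:
            neighbours[a].append(b)
            neighbours[b].append(a)
    labels = [-1] * len(centers)
    current = 0
    for start in range(len(centers)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if labels[nxt] == -1:
                    labels[nxt] = current
                    stack.append(nxt)
        current += 1
    return labels


# Toy data: two nearby groups and one distant group.
points = [(0, 0), (0.1, 0), (0, 0.1),
          (1, 0), (1.1, 0), (1, 0.1),
          (10, 10), (10.1, 10), (10, 10.1)]
micro = form_micro_clusters(points, r=0.3, min_points=2)
centers = [tuple(sum(points[j][d] for j in m) / len(m) for d in range(2))
           for m in micro]
labels = macro_labels(centers, max_edge=2.0)
print(micro)
print(labels)
```

On this toy data the two nearby micro-clusters end up in the same macro-cluster because the MST edge between their centers is short, while the distant group stays separate. Chaining micro-clusters along short MST edges is what lets elongated, non-spherical shapes be recovered as a single cluster.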

Extensive experiments against state-of-the-art algorithms show that MCMSTClustering achieves high-quality clustering results with acceptable runtime.

Key Features

- Produces high-quality clusterings

- Detects arbitrary-shaped clusters

- Robust against outliers/noisy data

- Handles clusters with varying densities

- Efficient on imbalanced datasets

Cite

If you use this code in your work, please cite the following paper:

Şenol, A. MCMSTClustering: defining non-spherical clusters by using minimum 
spanning tree over KD-tree-based micro-clusters. Neural Comput & Applic 35, 
13239–13259 (2023). https://doi.org/10.1007/s00521-023-08386-3

BibTeX

@article{csenol2023mcmstclustering,
  title={MCMSTClustering: defining non-spherical clusters by using minimum spanning tree over KD-tree-based micro-clusters},
  author={{\c{S}}enol, Ali},
  journal={Neural Computing and Applications},
  volume={35},
  number={18},
  pages={13239--13259},
  year={2023},
  publisher={Springer}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.

