MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data
MCMSTStream
MCMSTStream is a streaming clustering algorithm based on Minimum Spanning Trees (MST) and KD-Tree–based micro-clusters.
Features
✔ Online streaming clustering
✔ Sliding window model
✔ KD-Tree accelerated micro-cluster formation
✔ Macro-cluster discovery via Minimum Spanning Tree
✔ Noise & outlier handling
✔ Visualization utilities
✔ Scikit-learn–compatible API (fit, partial_fit, predict, get_params, set_params)
✔ Supports incremental, real-time data processing
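To make the MST-based macro-cluster step more concrete, here is a minimal, standard-library sketch of the general idea: build a minimum spanning tree over micro-cluster centers, cut the long edges, and treat each remaining connected group as one macro-cluster. It is an illustration under stated assumptions, not the package's implementation. The names `mst_edges`, `macro_clusters`, and `max_link` are hypothetical, and the brute-force distance loop stands in for the KD-tree lookups the algorithm uses.

```python
import math
from collections import defaultdict


def mst_edges(centers):
    """Prim's algorithm on the complete Euclidean graph over `centers`.

    Returns a list of (i, j, distance) edges forming a minimum spanning tree.
    """
    n = len(centers)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from the tree to each node
    best_from = [-1] * n        # tree node that provides that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax the links from the newly added node to the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centers[u], centers[v])
                if d < best_dist[v]:
                    best_dist[v], best_from[v] = d, u
    return edges


def macro_clusters(centers, max_link):
    """Group centers by keeping only MST edges no longer than `max_link`.

    `max_link` is a hypothetical cut-off chosen for illustration.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, d in mst_edges(centers):
        if d <= max_link:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(centers)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Five hypothetical micro-cluster centers forming two well-separated groups.
centers = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.05), (1.0, 1.0), (1.1, 1.0)]
print(macro_clusters(centers, max_link=0.28))
```

Because chains of nearby micro-clusters stay linked through the tree, groups built this way can follow elongated or curved shapes rather than being restricted to spherical ones.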
Installation
pip install mcmststream
Parameters
Use WindowType.AMOUNT_BASED for an amount-based sliding window, or WindowType.TIME_BASED for a time-based one.
- N: int -> Minimum number of points to form a cluster
- r: float -> Initial cluster radius
- r_threshold: float -> Radius increase/decrease threshold
- r_max: float -> Maximum cluster radius
- window_type: WindowType -> {WindowType.AMOUNT_BASED, WindowType.TIME_BASED}
- window_size: int -> For amount-based windows: number of points in the window
- verbose: bool -> {True, False}
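The difference between the two window types can be sketched with plain Python deques. This is only an illustration of the sliding-window concept, not the package's code: the classes `AmountWindow` and `TimeWindow` are hypothetical, and the `span` argument of the time-based variant is an assumption, since the parameter list above only describes `window_size` for the amount-based case.

```python
from collections import deque


class AmountWindow:
    """Keeps only the most recent `window_size` points (hypothetical helper)."""

    def __init__(self, window_size):
        self.points = deque(maxlen=window_size)

    def add(self, point):
        # When the deque is full, appending silently drops the oldest point.
        self.points.append(point)


class TimeWindow:
    """Keeps points no older than `span` relative to the newest timestamp
    (hypothetical helper; `span` is an assumed parameter)."""

    def __init__(self, span):
        self.span = span
        self.points = deque()  # (timestamp, point) pairs in arrival order

    def add(self, timestamp, point):
        self.points.append((timestamp, point))
        # Evict points that are older than the allowed time span.
        while self.points and timestamp - self.points[0][0] > self.span:
            self.points.popleft()


amount = AmountWindow(window_size=3)
for p in [1, 2, 3, 4, 5]:
    amount.add(p)
print(list(amount.points))  # [3, 4, 5]

timed = TimeWindow(span=10.0)
for t, p in [(0.0, "a"), (4.0, "b"), (12.0, "c"), (15.0, "d")]:
    timed.add(t, p)
print([p for _, p in timed.points])  # ['c', 'd']
```

An amount-based window bounds memory by point count regardless of arrival rate, while a time-based window bounds the age of the data, so its size varies with how fast points arrive.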
Usage
import numpy as np
from sklearn.metrics.cluster import adjusted_rand_score
from sklearn.preprocessing import MinMaxScaler
from mcmststream import MCMSTStream, load_exclastar
# Load data
X, y_true = load_exclastar()
# Normalize
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
np.random.seed(42)
# Initialize with history keeping enabled
clusterer = MCMSTStream(
    W=270,
    n_micro=2,
    N=2,
    r=0.14,
    random_state=42,
    keep_history=True,  # Enable history tracking
)
for i, point in enumerate(X_scaled):
    label = clusterer.partial_fit(point)
    # Visualize periodically
    if i % 20 == 0 and i > 0:
        print(f"\nStep {i}:")
        print(f" Current label for this point: {label}")
        print(f" Micro-clusters: {len(clusterer.micro_clusters)}")
        print(f" Macro-clusters: {len([m for m in clusterer.macro_clusters if m['active']])}")
        if clusterer.keep_history:
            hist_labels = np.array(clusterer.history_labels_)
            print(f" History labels (unique): {np.unique(hist_labels)}")
        clusterer.visualize(title=f"Step {i}")
# Compare the recorded labels with the ground truth once the stream is exhausted
ARI = adjusted_rand_score(y_true, clusterer.history_labels_)
print("ARI=%0.4f" % ARI)
Visualization
The package includes a built-in visualization function:
clusterer.visualize(title="MCMSTStream Clustering Result")
Evaluation
The evaluate method calculates:
Silhouette Score
Calinski-Harabasz Index
Davies-Bouldin Index
ARI, NMI, V-Measure (if true labels provided)
metrics = clusterer.evaluate(true_labels=y_true)
print(metrics)
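For readers who want to see what these scores measure independently of the package, the same metrics can be computed directly with scikit-learn. The sketch below uses synthetic two-blob data and a thresholded stand-in for predicted labels, both of which are assumptions for illustration. It does not show how `evaluate` is implemented or what keys its result contains.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(42)
# Two well-separated synthetic blobs stand in for clustered stream data.
X = np.vstack([
    rng.normal(0.0, 0.05, size=(50, 2)),
    rng.normal(1.0, 0.05, size=(50, 2)),
])
y_true = np.repeat([0, 1], 50)
# Stand-in for predicted cluster labels: split on the first coordinate.
y_pred = (X[:, 0] > 0.5).astype(int)

# Internal metrics need only the data and the predicted labels.
internal = {
    "silhouette": metrics.silhouette_score(X, y_pred),
    "calinski_harabasz": metrics.calinski_harabasz_score(X, y_pred),
    "davies_bouldin": metrics.davies_bouldin_score(X, y_pred),
}

# External metrics additionally require ground-truth labels.
external = {
    "ari": metrics.adjusted_rand_score(y_true, y_pred),
    "nmi": metrics.normalized_mutual_info_score(y_true, y_pred),
    "v_measure": metrics.v_measure_score(y_true, y_pred),
}

print(internal)
print(external)
```

Higher is better for the Silhouette Score, Calinski-Harabasz Index, ARI, NMI, and V-Measure, while lower is better for the Davies-Bouldin Index.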
Citation
If you use this algorithm in research, please cite the corresponding paper.
Erdinç, B., Kaya, M., & Şenol, A. (2024). MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data. Neural Computing and Applications, 36(13), 7025-7042.
BibTeX
@article{erdincc2024mcmststream,
title={MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data},
author={Erdin{\c{c}}, Berfin and Kaya, Mahmut and {\c{S}}enol, Ali},
journal={Neural Computing and Applications},
volume={36},
number={13},
pages={7025--7042},
year={2024},
publisher={Springer}
}
File details
Details for the file mcmststream-1.0.1.tar.gz.
File metadata
- Download URL: mcmststream-1.0.1.tar.gz
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 626e1c70b9b88468cdba29938cf29f8f0deafaea181a3914b70ea27b39dfbf11 |
| MD5 | 2f85e0dd9fe414a0461d2da5aeb95eeb |
| BLAKE2b-256 | 337d571f495352bd7f7f4397fd136a0691ba6e3fbb1a3fe1f7302dcb3e29a014 |
File details
Details for the file mcmststream-1.0.1-py3-none-any.whl.
File metadata
- Download URL: mcmststream-1.0.1-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06badcc7eb3f63efe43a3393bc878bb3b184e5fdc8a6af52455be9a460fa601f |
| MD5 | e674f292164a4ee020936bd010e8e8a8 |
| BLAKE2b-256 | 0bf3de0cdc7685b7b4f85efae33183b7ec58c3e98a85a39c4c55da46a58679a7 |