
VIASCKDE: internal cluster validity index (KDE-weighted compactness & separation)


VIASCKDE Index

VIASCKDE is a novel internal cluster validity index for arbitrary-shaped clusters based on Kernel Density Estimation (KDE).


Motivation

The VIASCKDE index was developed to assess clustering quality for non-spherical, arbitrarily shaped clusters, where traditional validity measures that assume spherical structure fall short. It computes compactness and separation at the point level instead of relying on cluster centroids, and uses kernel density estimation to emphasize dense regions, so the evaluation does not depend on cluster shape.


Installation

pip install viasckde

Usage

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import adjusted_rand_score
from viasckde import viasckde_score
import matplotlib.pyplot as plt

# 1. Arbitrary-shaped dataset (moons)
X, y_true = make_moons(n_samples=10000, noise=0.07, random_state=42)
X = StandardScaler().fit_transform(X)

# 2. Clustering with DBSCAN
db = DBSCAN(eps=0.1, min_samples=5)
labels = db.fit_predict(X)

# VIASCKDE Score
viasckde = viasckde_score(X, labels)

# Adjusted Rand Index, to check the VIASCKDE score against the ground truth
ari = adjusted_rand_score(y_true, labels)

# print results
print("VIASCKDE Score:", viasckde)
print("ARI Score:", ari)

# to visualize results
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=12)
plt.title(f"Best DBSCAN Clusters (eps=0.1, min_samples=5)\n"
          f"VIASCKDE={viasckde:.4f}, ARI={ari:.4f}")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()
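Because the index needs no ground-truth labels, it can also be used to compare several DBSCAN settings. The following is a minimal sketch of such a sweep, not part of the package: `select_best_eps` is a hypothetical helper, the scorer is passed in as `score_fn`, and dropping DBSCAN noise points (label -1) before scoring is an assumption of this sketch. The demo uses scikit-learn's `silhouette_score` as a stand-in scorer so that it runs without `viasckde` installed.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def select_best_eps(X, eps_values, score_fn, min_samples=5):
    """Return (best_eps, best_score) over eps_values, or (None, None).

    Hypothetical helper: score_fn is any callable taking (X, labels)
    where a higher value means a better clustering.
    """
    best_eps, best_score = None, None
    for eps in eps_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        keep = labels != -1  # DBSCAN marks noise points with -1
        # An internal index needs at least two clusters to compare.
        if np.unique(labels[keep]).size < 2:
            continue
        score = score_fn(X[keep], labels[keep])
        if best_score is None or score > best_score:
            best_eps, best_score = eps, score
    return best_eps, best_score


# Demo on two well-separated blobs, with silhouette_score as the stand-in.
X_demo, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [10, 10]], cluster_std=0.5, random_state=0
)
best_eps, best_score = select_best_eps(X_demo, [1.0, 50.0], silhouette_score)
print("best eps:", best_eps, "score:", best_score)
```

To sweep with VIASCKDE instead, pass `score_fn=viasckde_score`, assuming it accepts `(X, labels)` as in the usage example above.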

Concept

In non-spherical clusters, the distance from a point to the nearest neighbor in the same cluster is often more meaningful than the distance to the cluster centroid. VIASCKDE computes:

Compactness: distance to the closest point in the same cluster

Separation: distance to the closest point in a different cluster

This point-level computation ensures realistic evaluation of clusters regardless of their shape.
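The two point-level quantities can be sketched in a few lines of NumPy. This is an illustration only, not the package's implementation: `point_level_scores` is a hypothetical name, and combining the two distances as (b - a) / max(a, b) is an assumption borrowed from silhouette-style normalization, chosen here because it keeps each point's value in [-1, +1]. The brute-force distance matrix is only suitable for small inputs.

```python
import numpy as np


def point_level_scores(X, labels):
    """Return per-point compactness a, separation b, and (b - a) / max(a, b).

    Hypothetical sketch; assumes at least two clusters, each with at
    least two points.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances (O(n^2) memory, small demos only).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # a point is not its own neighbor
    other = labels[:, None] != labels[None, :]
    # Compactness: distance to the closest point in the same cluster.
    a = np.where(same, dist, np.inf).min(axis=1)
    # Separation: distance to the closest point in a different cluster.
    b = np.where(other, dist, np.inf).min(axis=1)
    if not (np.isfinite(a).all() and np.isfinite(b).all()):
        raise ValueError("need >= 2 clusters, each with >= 2 points")
    denom = np.maximum(a, b)
    # Assumed combination; defined as 0 when both distances are 0.
    ratio = np.divide(b - a, denom, out=np.zeros_like(denom), where=denom > 0)
    return a, b, ratio


# Two tight pairs far apart: a good and a bad labeling of the same points.
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
_, _, good = point_level_scores(pts, [0, 0, 1, 1])
_, _, bad = point_level_scores(pts, [0, 1, 0, 1])
print(good)  # every point: (10 - 1) / 10 = 0.9
print(bad)   # every point: (1 - 10) / 10 = -0.9
```

No centroid appears anywhere in the computation, which is why elongated or curved clusters are not penalized for having points far from their center.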

Parameters of VIASCKDE Index

The VIASCKDE index takes four parameters, two of which are optional:

- X: your data array (NumPy-like)
- labels: predicted cluster labels
- kernel (optional): the kernel used for density estimation. The default is krnl='gaussian'; the alternatives are 'tophat', 'epanechnikov', 'exponential', 'linear', and 'cosine'.
- bandwidth (optional): the bandwidth of the kernel density estimate. The default is b_width=0.05.
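To get a feel for what the kernel and bandwidth choices do, the sketch below turns kernel density estimates into per-point weights in [0, 1]. It uses scikit-learn's `KernelDensity`, which accepts the same six kernel names; that the package relies on this class, and the min-max scaling of the densities, are assumptions of this sketch. `density_weights` is a hypothetical helper, not a function of `viasckde`.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def density_weights(points, kernel="gaussian", bandwidth=0.05):
    """Min-max scaled KDE densities: 1 for the densest point, 0 for the sparsest.

    Hypothetical helper for illustration only.
    """
    points = np.asarray(points, dtype=float)
    kde = KernelDensity(kernel=kernel, bandwidth=bandwidth).fit(points)
    density = np.exp(kde.score_samples(points))  # score_samples returns log-density
    span = density.max() - density.min()
    if span == 0:
        # All points equally dense: give every point full weight.
        return np.ones_like(density)
    return (density - density.min()) / span


# Four points packed together plus one isolated point.
pts_kde = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [0.01, 0.01], [5.0, 5.0]])
for name in ["gaussian", "tophat", "epanechnikov", "exponential", "linear", "cosine"]:
    print(name, np.round(density_weights(pts_kde, kernel=name), 3))
```

A smaller bandwidth makes the estimate more local, so only very close neighbors raise a point's density; a larger one smooths over wider regions. With weights like these, points in sparse areas contribute less than points in dense cores.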

Output Range

VIASCKDE returns a score in [-1, +1]:

+1: best clustering

-1: worst clustering

Citation

Ali Şenol, "VIASCKDE Index: A Novel Internal Cluster Validity Index for Arbitrary-Shaped Clusters Based on the Kernel Density Estimation", Computational Intelligence and Neuroscience, vol. 2022, Article ID 4059302, 20 pages, 2022. https://doi.org/10.1155/2022/4059302

BibTeX

@article{csenol2022viasckde,
  title={VIASCKDE Index: A Novel Internal Cluster Validity Index for Arbitrary-Shaped Clusters Based on the Kernel Density Estimation},
  author={{\c{S}}enol, Ali},
  journal={Computational Intelligence and Neuroscience},
  volume={2022},
  number={1},
  pages={4059302},
  year={2022},
  publisher={Wiley Online Library},
  doi={10.1155/2022/4059302}
}

License & Author

Author: Assoc. Prof. Dr. Ali Şenol, Computer Engineering Department, Tarsus University

License: MIT
