Python library for clustering quality index metrics with original and adjusted forms

PyIVM - Python Library for Clustering Quality Metrics

A comprehensive Python library for computing clustering quality metrics with both original and adjusted forms, making clustering evaluation more reliable and meaningful.

Why Adjusted Metrics?

Traditional clustering metrics have a fundamental problem: their scores are not directly comparable across clustering solutions. For example, some metrics tend to favor clusterings with more clusters, while others behave inconsistently across datasets. This makes it hard to judge fairly which clustering is actually better.

PyIVM solves this with adjusted metrics (TPAMI 2025). The adjusted versions remove these biases and provide consistent, reliable scores that enable fair comparison across different numbers of clusters and datasets. Moreover, all adjusted metrics use a simple "higher = better" interpretation, making clustering evaluation straightforward.

Features

Six essential clustering metrics: Calinski-Harabasz, Davies-Bouldin, Dunn, I-Index, Silhouette, and Xie-Beni
Original + Adjusted forms: Traditional metrics plus improved adjusted variants
Easy to use: Simple API that works with scikit-learn and numpy arrays

Installation

pip install pyivm

Quick Start

import numpy as np
import pyivm

# Your clustering data
X = np.random.rand(100, 2)  # Features
labels = np.random.randint(0, 3, 100)  # Cluster assignments

# Compute metrics (higher = better clustering)
calinski_score = pyivm.calinski_harabasz(X, labels)
silhouette_score = pyivm.silhouette(X, labels)

# For better cluster comparison, use adjusted forms
calinski_adj = pyivm.calinski_harabasz(X, labels, adjusted=True)
silhouette_adj = pyivm.silhouette(X, labels, adjusted=True)

print(f"Calinski-Harabasz: {calinski_score:.3f} (adjusted: {calinski_adj:.3f})")
print(f"Silhouette Score: {silhouette_score:.3f} (adjusted: {silhouette_adj:.3f})")

When to Use Adjusted Metrics

Use adjusted=True when:

Comparing clusterings with different numbers of clusters
Selecting the optimal number of clusters
Evaluating clustering algorithms fairly
Publishing research results

Use original forms when:

You have a fixed number of clusters
Comparing with existing literature
Following specific benchmarking protocols
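
As an illustration of the "selecting the optimal number of clusters" case, here is a minimal sketch that scores several candidate labelings and keeps the one with the highest score. The helper name pick_best_clustering and the candidates dictionary are hypothetical and not part of pyivm; the metric is passed in as a callable so that any "higher = better" score can be plugged in.

```python
def pick_best_clustering(X, candidates, metric):
    """Score each candidate labeling and return (best_key, all_scores).

    candidates: dict mapping a key (e.g. the number of clusters) to labels.
    metric: callable (X, labels) -> float where higher means better.
    Hypothetical helper for illustration; not part of pyivm.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Evaluate every candidate with the same metric so scores are comparable.
    scores = {key: float(metric(X, labels)) for key, labels in candidates.items()}
    # The highest score wins, which matches the adjusted "higher = better" rule.
    best_key = max(scores, key=scores.get)
    return best_key, scores
```

With pyivm installed, metric could be lambda X, labels: pyivm.silhouette(X, labels, adjusted=True), and candidates could hold the labels produced by a clustering algorithm for each number of clusters tried. Using an original-form metric that is "lower = better" here would pick the worst candidate, so the sketch assumes an adjusted metric.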

📋 API Reference

All metrics follow the same simple pattern:

score = pyivm.metric_name(X, labels, adjusted=False, **kwargs)

Parameters

X (array-like, shape (n_samples, n_features)): Data points
labels (array-like, shape (n_samples,)): Cluster labels for each data point
adjusted (bool, default=False): Whether to use adjusted form for fair comparison
**kwargs: Additional metric-specific parameters
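
The shapes listed above can be checked before calling a metric. The following is a sketch only: check_inputs is a hypothetical helper, and the README does not say how pyivm itself validates its arguments. The requirement of at least two clusters is an assumption, based on the fact that these indices compare clusters against each other.

```python
import numpy as np


def check_inputs(X, labels):
    """Validate X (n_samples, n_features) and labels (n_samples,).

    Hypothetical pre-check; pyivm's own validation may differ.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_samples, n_features), got shape {X.shape}")
    if labels.ndim != 1:
        raise ValueError(f"labels must be 1-D (n_samples,), got shape {labels.shape}")
    if X.shape[0] != labels.shape[0]:
        raise ValueError(
            f"X has {X.shape[0]} samples but labels has {labels.shape[0]} entries"
        )
    # Assumption: a single cluster cannot be scored meaningfully.
    if np.unique(labels).size < 2:
        raise ValueError("labels must contain at least 2 distinct clusters")
    return X, labels
```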

Supported Metrics

Function	Description	Original Form	Adjusted Form
`pyivm.calinski_harabasz(X, labels, adjusted=False)`	Calinski-Harabasz Index	✅ Higher better	✅ Higher better
`pyivm.davies_bouldin(X, labels, adjusted=False)`	Davies-Bouldin Index	❌ Lower better	✅ Higher better
`pyivm.dunn(X, labels, adjusted=False)`	Dunn Index	✅ Higher better	✅ Higher better
`pyivm.i_index(X, labels, adjusted=False)`	I-Index	✅ Higher better	✅ Higher better
`pyivm.silhouette(X, labels, adjusted=False)`	Silhouette Coefficient	✅ Higher better	✅ Higher better
`pyivm.xie_beni(X, labels, adjusted=False)`	Xie-Beni Index	❌ Lower better	✅ Higher better

✅ Adjusted Benefits: All adjusted metrics are consistently "higher = better", making them easy to interpret and compare across different metrics and datasets.
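
The direction of each score in the table can be encoded in a few lines, which helps when original and adjusted scores are mixed in one ranking. This is a minimal sketch: the names LOWER_IS_BETTER_ORIGINAL, higher_is_better, and orient are hypothetical and not part of pyivm; only the metric names and directions are taken from the table.

```python
ALL_METRICS = (
    "calinski_harabasz", "davies_bouldin", "dunn",
    "i_index", "silhouette", "xie_beni",
)
# Per the table: only these two are "lower = better" in their original form.
LOWER_IS_BETTER_ORIGINAL = frozenset({"davies_bouldin", "xie_beni"})


def higher_is_better(metric_name, adjusted):
    """Return True if a larger score means a better clustering."""
    if metric_name not in ALL_METRICS:
        raise ValueError(f"unknown metric: {metric_name!r}")
    # Every adjusted form is "higher = better".
    return adjusted or metric_name not in LOWER_IS_BETTER_ORIGINAL


def orient(metric_name, score, adjusted):
    """Flip the sign of lower-is-better scores so larger always ranks first."""
    return score if higher_is_better(metric_name, adjusted) else -score
```

Note that orient only makes the ordering uniform within one metric; it does not make original-form scores comparable across metrics or datasets.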

Example Usage

import numpy as np
import pyivm

# Your clustering data
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Compute all metrics
results = {
    'calinski_harabasz': pyivm.calinski_harabasz(X, labels),
    'calinski_harabasz_adj': pyivm.calinski_harabasz(X, labels, adjusted=True),
    'davies_bouldin': pyivm.davies_bouldin(X, labels),
    'davies_bouldin_adj': pyivm.davies_bouldin(X, labels, adjusted=True),
    'dunn': pyivm.dunn(X, labels),
    'dunn_adj': pyivm.dunn(X, labels, adjusted=True),
    'i_index': pyivm.i_index(X, labels),
    'i_index_adj': pyivm.i_index(X, labels, adjusted=True),
    'silhouette': pyivm.silhouette(X, labels),
    'silhouette_adj': pyivm.silhouette(X, labels, adjusted=True),
    'xie_beni': pyivm.xie_beni(X, labels),
    'xie_beni_adj': pyivm.xie_beni(X, labels, adjusted=True),
}

for metric, score in results.items():
    print(f"{metric}: {score:.4f}")

Quick Comparison Helper

import pyivm

def evaluate_clustering(X, labels, adjusted=True):
    """Evaluate a clustering with all six metrics.

    With adjusted=True every score is "higher = better". With adjusted=False,
    davies_bouldin and xie_beni are "lower = better".
    """
    metric_names = [
        'calinski_harabasz', 'dunn', 'i_index', 'silhouette',
        'davies_bouldin', 'xie_beni',
    ]
    metrics = {}
    for metric_name in metric_names:
        # Look up each metric function on the pyivm module by name
        metric_func = getattr(pyivm, metric_name)
        metrics[metric_name] = metric_func(X, labels, adjusted=adjusted)
    return metrics

# Usage
scores = evaluate_clustering(X, labels, adjusted=True)
print(scores)
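
Because all adjusted scores point in the same direction, two score dictionaries such as those returned by the helper can be compared metric by metric. The sketch below assumes both dictionaries were computed with adjusted=True; compare_adjusted_scores is a hypothetical name, not a pyivm function.

```python
def compare_adjusted_scores(scores_a, scores_b):
    """Return the per-metric winner ("a", "b", or "tie") for shared metrics.

    Assumes every score is an adjusted, "higher = better" value.
    """
    winners = {}
    # Only compare metrics present in both dictionaries.
    for name in sorted(set(scores_a) & set(scores_b)):
        a, b = scores_a[name], scores_b[name]
        if a > b:
            winners[name] = "a"
        elif b > a:
            winners[name] = "b"
        else:
            winners[name] = "tie"
    return winners
```

A simple majority over the returned values gives a rough overall verdict, although the metrics measure different properties and need not agree.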

Requirements

Python ^3.9
NumPy ^1.20.0
SciPy ^1.7.0
scikit-learn ^1.0.0
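
The ^ prefix is read here as Poetry's caret constraint, an assumption based on the package having been uploaded with Poetry. Under that reading, a caret allows updates that keep the leftmost non-zero version component unchanged. The sketch below expands a caret constraint into an explicit range; caret_range is a hypothetical helper written for this README, not a Poetry or pyivm API.

```python
def caret_range(spec):
    """Expand a caret constraint such as "^1.20.0" into ">=lower,<upper"."""
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    # Bump the first non-zero component; if all are zero, bump the last one given.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:bump] + [parts[bump] + 1]
    # Pad both bounds to major.minor.patch.
    lower = parts + [0] * (3 - len(parts))
    upper = upper + [0] * (3 - len(upper))
    return ">=" + ".".join(map(str, lower)) + ",<" + ".".join(map(str, upper))


for name, spec in [("Python", "^3.9"), ("NumPy", "^1.20.0"),
                   ("SciPy", "^1.7.0"), ("scikit-learn", "^1.0.0")]:
    print(f"{name}: {caret_range(spec)}")
```

Read this way, NumPy ^1.20.0 covers the 1.x series from 1.20.0 onward and excludes 2.0.0 and later.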

Citation

If you use this library in your research, please cite:

@ARTICLE{10909451,
  author={Jeon, Hyeon and Aupetit, Michaël and Shin, DongHwa and Cho, Aeri and Park, Seokhyeon and Seo, Jinwook},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={Measuring the Validity of Clustering Validation Datasets}, 
  year={2025},
  volume={47},
  number={6},
  pages={5045-5058},
  keywords={Reliability;Benchmark testing;Protocols;Training;Standards;Size measurement;Mutual information;Indexes;Electronic mail;Data mining;Clustering;clustering validation;internal clustering validation;external clustering validation;clustering benchmark},
  doi={10.1109/TPAMI.2025.3548011}
}

Release history

0.1.0 (Jun 14, 2025)

File details

Details for the file pyivm-0.1.0.tar.gz.

File metadata

Download URL: pyivm-0.1.0.tar.gz
Upload date: Jun 14, 2025
Size: 11.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.12.7 Darwin/24.5.0

File hashes

Hashes for pyivm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`affbeb8dafd6888f61b6f47bb400ad56ecb78c9222085a954afdb73ed6c39b5a`
MD5	`0c8bfda2b5eb8ef190bdbd6c1e74cdc0`
BLAKE2b-256	`0683fc5d1cd93dd7bf1876ae62d1709c79caaab3e1f9a38e0cd4f9c7c13ecfde`

File details

Details for the file pyivm-0.1.0-py3-none-any.whl.

File metadata

Download URL: pyivm-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2025
Size: 12.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.12.7 Darwin/24.5.0

File hashes

Hashes for pyivm-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2b9b210631185cd153f53df69ab68e09eecf4902d671f2ada6c4ffdc2faac881`
MD5	`9a53a770a2878ef9d5b45321624a995f`
BLAKE2b-256	`736778b5466b5146b43d6808e8d482564e8a434b2a366ce1abef37e11c194889`
