A lightweight Python package for internal clustering validation metrics.
intclustval
intclustval provides a simple InternalClusterScore class for evaluating clustering quality using internal validation metrics.
Internal clustering validation metrics use only the input data and predicted cluster labels. They do not require ground-truth labels.
Related packages
This package is part of a small clustering-validation ecosystem:
| Package | Purpose |
|---|---|
| intclustval | Internal clustering validation metrics |
| extclustval | External clustering validation metrics using ground-truth labels |
| sil-score | Exact and approximate silhouette scoring |
Silhouette scores are intentionally not included in intclustval, because they are provided by the separate sil-score package.
This keeps intclustval focused on other internal validation metrics such as Calinski-Harabasz, Davies-Bouldin, inertia, Dunn Index, and Xie-Beni.
Metrics included
Internal clustering validation metrics
| Attribute | Metric | Better direction |
|---|---|---|
| calinski_harabasz | Calinski-Harabasz score | Higher is better |
| davies_bouldin | Davies-Bouldin score | Lower is better |
| inertia | Within-cluster sum of squared distances | Lower is better for a fixed number of clusters |
| dunn_index | Dunn Index | Higher is better |
| xie_beni | Xie-Beni index | Lower is better |
Aliases
| Attribute | Alias for |
|---|---|
| ch | calinski_harabasz |
| db | davies_bouldin |
| within_cluster_dispersion | inertia |
Metadata
| Attribute | Description |
|---|---|
| n_samples | Number of samples |
| n_features | Number of features |
| n_clusters | Number of clusters |
| labels_unique | Unique cluster labels |
| cluster_sizes | Number of samples in each cluster |
| centroids | Cluster centroids |
Installation
You can install intclustval from PyPI:
pip install intclustval
Quick start
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from intclustval import InternalClusterScore
X, _ = make_blobs(
    n_samples=300,
    centers=3,
    cluster_std=1.0,
    random_state=42,
)
labels = KMeans(
    n_clusters=3,
    random_state=42,
    n_init=10,
).fit_predict(X)
score = InternalClusterScore(X, labels)
print(score.calinski_harabasz)
print(score.davies_bouldin)
print(score.inertia)
print(score.dunn_index)
print(score.xie_beni)
Example output:
5196.295097418395
0.21231599538998425
566.8595511244131
0.9484430301054112
0.018180444255623783
You can also access all aggregate scores as a dictionary:
scores = score.to_dict()
print(scores)
Example:
{
    "calinski_harabasz": 5196.295097418395,
    "ch": 5196.295097418395,
    "davies_bouldin": 0.21231599538998425,
    "db": 0.21231599538998425,
    "inertia": 566.8595511244131,
    "within_cluster_dispersion": 566.8595511244131,
    "dunn_index": 0.9484430301054112,
    "xie_beni": 0.018180444255623783,
}
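Because the metrics point in different directions, comparing several candidate labelings is easier with the direction made explicit. The sketch below assumes one dictionary per candidate, shaped like the output above. The helper name `best_candidate`, the candidate keys, and all numbers are hypothetical placeholders, not results from the package. Inertia is left out on purpose, since it is only comparable at a fixed number of clusters.

```python
# Direction of improvement per metric, taken from the "Better direction" table.
HIGHER_IS_BETTER = {
    "calinski_harabasz": True,
    "davies_bouldin": False,
    "dunn_index": True,
    "xie_beni": False,
}


def best_candidate(candidates, metric):
    """Return the name of the candidate with the best value for `metric`."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(candidates, key=lambda name: candidates[name][metric])


# Placeholder numbers for illustration only (not real results).
candidates = {
    "k=2": {"calinski_harabasz": 900.0, "davies_bouldin": 0.60,
            "dunn_index": 0.30, "xie_beni": 0.09},
    "k=3": {"calinski_harabasz": 1500.0, "davies_bouldin": 0.25,
            "dunn_index": 0.90, "xie_beni": 0.02},
    "k=4": {"calinski_harabasz": 1200.0, "davies_bouldin": 0.70,
            "dunn_index": 0.10, "xie_beni": 0.15},
}

for metric in HIGHER_IS_BETTER:
    print(metric, best_candidate(candidates, metric))
```

In practice the metrics will not always agree, so it is usually worth looking at all of them rather than relying on a single one.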
Using silhouette scores
Silhouette scores are available in the separate sil-score package.
Install it with:
pip install sil-score
Then use:
from sil_score import (
    sil_samples,
    micro_sil_score,
    macro_sil_score,
)
sample_scores = sil_samples(X, labels)
micro_score = micro_sil_score(X, labels)
macro_score = macro_sil_score(X, labels)
print(micro_score)
print(macro_score)
The sil-score package also supports approximate silhouette scoring through its approximation argument.
Metric definitions
Calinski-Harabasz score
The Calinski-Harabasz score measures the ratio of between-cluster dispersion to within-cluster dispersion.
A higher value usually indicates better-defined clusters. It is useful for comparing different clustering solutions on the same dataset.
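As a rough illustration of that ratio, here is a minimal NumPy sketch of the standard Calinski-Harabasz formula, in which each dispersion is divided by its degrees of freedom (k - 1 and n - k). The function name and the toy data are made up for this example; this is not the package's own code.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_samples = X.shape[0]
    unique = np.unique(labels)
    k = unique.size
    overall_mean = X.mean(axis=0)

    between = 0.0  # sum over clusters of n_c * ||centroid_c - overall_mean||^2
    within = 0.0   # sum over samples of ||x - centroid of its cluster||^2
    for c in unique:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)

    return float((between / (k - 1)) / (within / (n_samples - k)))


# Toy data: two tight pairs of points, 10 units apart.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
print(calinski_harabasz(X, labels))  # 50.0
```

Here the between-cluster dispersion is 100 and the within-cluster dispersion is 4, so the score is (100 / 1) / (4 / 2) = 50.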
Davies-Bouldin score
The Davies-Bouldin score measures average similarity between each cluster and its most similar other cluster.
A lower value indicates better clustering, because it means clusters are more compact and more separated from each other.
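The following is a minimal NumPy sketch of the common centroid-based formulation: each cluster's scatter is the mean distance of its members to the centroid, and similarity between two clusters is the sum of their scatters divided by the distance between their centroids. The function name and toy data are illustrative, and whether intclustval computes it in exactly this way is an assumption.

```python
import numpy as np


def davies_bouldin(X, labels):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    k = unique.size
    centroids = np.array([X[labels == c].mean(axis=0) for c in unique])
    # Scatter: mean Euclidean distance from each member to its cluster centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(unique)
    ])

    worst = np.zeros(k)
    for i in range(k):
        for j in range(k):
            if i != j:
                separation = np.linalg.norm(centroids[i] - centroids[j])
                worst[i] = max(worst[i], (scatter[i] + scatter[j]) / separation)
    return float(worst.mean())


# Toy data: two tight pairs of points, 10 units apart.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
print(davies_bouldin(X, labels))  # 0.2
```

Each cluster has scatter 1 and the centroids are 10 apart, giving (1 + 1) / 10 = 0.2 for both clusters.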
Inertia
Inertia is the within-cluster sum of squared distances from each sample to its assigned cluster centroid.
Lower inertia means samples are closer to their cluster centers. However, inertia generally decreases as the number of clusters increases, so it should mainly be used to compare solutions with the same number of clusters on the same dataset.
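A minimal NumPy sketch of this definition is shown below, with a made-up function name and toy data. It also shows that splitting a cluster lowers inertia even though the data are unchanged, which is why a lower value across different numbers of clusters says little on its own.

```python
import numpy as np


def inertia(X, labels):
    """Sum of squared distances from each sample to the centroid of its cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return float(total)


# Toy data: two tight pairs of points, 10 units apart.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

print(inertia(X, [0, 0, 1, 1]))  # 4.0 with two clusters
print(inertia(X, [0, 1, 2, 2]))  # 2.0 after splitting the first pair into singletons
```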
Dunn Index
The Dunn Index compares the minimum distance between different clusters to the maximum diameter within any cluster.
A higher Dunn Index indicates better clustering, with clusters that are compact and well separated.
This implementation uses pairwise distances, so it may be slower for large datasets.
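To make the cost concrete, here is a minimal NumPy sketch of one common variant of the Dunn Index: the smallest distance between two points in different clusters, divided by the largest distance between two points in the same cluster. Several variants exist, so which one intclustval uses is an assumption here, and the function name and toy data are made up.

```python
import numpy as np


def dunn_index(X, labels):
    """Minimum between-cluster point distance over maximum within-cluster diameter."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix: O(n^2) time and memory.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diffs ** 2, axis=-1))
    same_cluster = labels[:, None] == labels[None, :]
    max_diameter = dist[same_cluster].max()
    min_separation = dist[~same_cluster].min()
    # Note: if every cluster is a single point, max_diameter is 0 and the ratio is undefined.
    return float(min_separation / max_diameter)


# Toy data: two tight pairs of points, 10 units apart.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))  # 5.0
```

The n-by-n distance matrix is what makes this approach expensive: 100,000 samples would need about 80 GB for a float64 matrix.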
Xie-Beni index
The Xie-Beni index compares total within-cluster compactness to the minimum squared distance between cluster centroids.
A lower value indicates better clustering, because it means compact clusters with well-separated centers.
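A minimal NumPy sketch for hard (crisp) labels follows. It assumes the usual normalization, where the within-cluster sum of squares is divided by the number of samples times the minimum squared centroid distance. That normalization, the function name, and the toy data are assumptions for illustration rather than the package's confirmed implementation.

```python
import numpy as np


def xie_beni(X, labels):
    """Within-cluster sum of squares / (n_samples * min squared centroid distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in unique])

    within = 0.0
    for i, c in enumerate(unique):
        within += np.sum((X[labels == c] - centroids[i]) ** 2)

    # Squared distances between all centroid pairs; ignore each centroid's distance to itself.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)

    return float(within / (X.shape[0] * sq_dist.min()))


# Toy data: two tight pairs of points, 10 units apart.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
print(xie_beni(X, labels))  # 0.01
```

With a within-cluster sum of squares of 4, four samples, and a squared centroid distance of 100, the value is 4 / (4 * 100) = 0.01.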
Notes
Internal clustering validation metrics do not use ground-truth labels. They evaluate clustering structure using only:
- X (the input data)
- labels (the predicted cluster labels)
For external clustering validation with ground-truth labels, use extclustval.
For silhouette-specific scoring, use sil-score.
Cached properties
InternalClusterScore uses cached properties.
This means each metric is computed once and then stored.
score = InternalClusterScore(X, labels)
score.inertia # computed once
score.inertia # reused from cache
If you want to evaluate different labels, create a new InternalClusterScore object:
score = InternalClusterScore(X, labels)
new_score = InternalClusterScore(X, new_labels)
Do not modify score.X or score.labels after creating the object.
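The reason for this warning can be sketched with Python's `functools.cached_property`. The class below is a hypothetical stand-in, not the package's implementation, and whether InternalClusterScore uses this exact decorator is an assumption. It shows that a cached value is not recomputed when the underlying data change.

```python
from functools import cached_property


class CachedScore:
    """Hypothetical stand-in that caches a computed value on first access."""

    def __init__(self, values):
        self.values = list(values)
        self.compute_calls = 0  # counts how often the computation actually runs

    @cached_property
    def total(self):
        self.compute_calls += 1
        return sum(self.values)


score = CachedScore([1, 2, 3])
print(score.total)          # 6, computed on first access
score.values.append(100)    # mutate the inputs after the value was cached
print(score.total)          # still 6, the stale cached value is reused
print(score.compute_calls)  # 1
```

Creating a new object, as recommended above, avoids stale results of this kind.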
Requirements
- numpy
- scipy
- scikit-learn
License
This project is licensed under the MIT License.