intclustval

A lightweight Python package for internal clustering validation metrics.

intclustval provides a simple InternalClusterScore class for evaluating clustering quality using internal validation metrics.

Internal clustering validation metrics use only the input data and predicted cluster labels. They do not require ground-truth labels.

Related packages

This package is part of a small clustering-validation ecosystem:

| Package | Purpose |
| --- | --- |
| `intclustval` | Internal clustering validation metrics |
| `extclustval` | External clustering validation metrics using ground-truth labels |
| `sil-score` | Exact and approximate silhouette scoring |

Silhouette scores are intentionally not included in intclustval, because they are provided by the separate sil-score package.

This keeps intclustval focused on other internal validation metrics such as Calinski-Harabasz, Davies-Bouldin, inertia, Dunn Index, and Xie-Beni.

Metrics included

Internal clustering validation metrics

| Attribute | Metric | Better direction |
| --- | --- | --- |
| `calinski_harabasz` | Calinski-Harabasz score | Higher is better |
| `davies_bouldin` | Davies-Bouldin score | Lower is better |
| `inertia` | Within-cluster sum of squared distances | Lower is better for a fixed number of clusters |
| `dunn_index` | Dunn Index | Higher is better |
| `xie_beni` | Xie-Beni index | Lower is better |

Aliases

| Attribute | Alias for |
| --- | --- |
| `ch` | `calinski_harabasz` |
| `db` | `davies_bouldin` |
| `within_cluster_dispersion` | `inertia` |

Metadata

| Attribute | Description |
| --- | --- |
| `n_samples` | Number of samples |
| `n_features` | Number of features |
| `n_clusters` | Number of clusters |
| `labels_unique` | Unique cluster labels |
| `cluster_sizes` | Number of samples in each cluster |
| `centroids` | Cluster centroids |

Installation

You can install intclustval from PyPI:

pip install intclustval

Quick start

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

from intclustval import InternalClusterScore

X, _ = make_blobs(
    n_samples=300,
    centers=3,
    cluster_std=1.0,
    random_state=42,
)

labels = KMeans(
    n_clusters=3,
    random_state=42,
    n_init=10,
).fit_predict(X)

score = InternalClusterScore(X, labels)

print(score.calinski_harabasz)
print(score.davies_bouldin)
print(score.inertia)
print(score.dunn_index)
print(score.xie_beni)

Example output:

5196.295097418395
0.21231599538998425
566.8595511244131
0.9484430301054112
0.018180444255623783

You can also access all aggregate scores as a dictionary:

scores = score.to_dict()
print(scores)

Example:

{
    "calinski_harabasz": 5196.295097418395,
    "ch": 5196.295097418395,
    "davies_bouldin": 0.21231599538998425,
    "db": 0.21231599538998425,
    "inertia": 566.8595511244131,
    "within_cluster_dispersion": 566.8595511244131,
    "dunn_index": 0.9484430301054112,
    "xie_beni": 0.018180444255623783,
}

Using silhouette scores

Silhouette scores are available in the separate sil-score package.

Install it with:

pip install sil-score

Then use:

from sil_score import (
    sil_samples,
    micro_sil_score,
    macro_sil_score,
)

sample_scores = sil_samples(X, labels)
micro_score = micro_sil_score(X, labels)
macro_score = macro_sil_score(X, labels)

print(micro_score)
print(macro_score)

The sil-score package also supports approximate silhouette scoring through its approximation argument.

Metric definitions

Calinski-Harabasz score

The Calinski-Harabasz score measures the ratio of between-cluster dispersion to within-cluster dispersion.

A higher value usually indicates better-defined clusters. It is useful for comparing different clustering solutions on the same dataset.
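
To make the ratio concrete, here is a minimal dependency-free sketch of the standard Calinski-Harabasz formula, (between-cluster dispersion / (k - 1)) divided by (within-cluster dispersion / (n - k)). The function names and the toy data are illustrative assumptions; this is not the package's own implementation.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(X, labels):
    """Illustrative helper (hypothetical name), not intclustval's code."""
    n = len(X)
    groups = defaultdict(list)
    for point, label in zip(X, labels):
        groups[label].append(point)
    k = len(groups)

    overall = tuple(sum(col) / n for col in zip(*X))
    between = 0.0
    within = 0.0
    for pts in groups.values():
        centroid = tuple(sum(col) / len(pts) for col in zip(*pts))
        # Between-cluster dispersion: size-weighted centroid spread.
        between += len(pts) * sq_dist(centroid, overall)
        # Within-cluster dispersion: spread of points around their centroid.
        within += sum(sq_dist(p, centroid) for p in pts)
    return (between / (k - 1)) / (within / (n - k))


# Two tight, well-separated pairs of points.
X = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
ch_value = calinski_harabasz(X, labels)
print(ch_value)  # 8.0
```

Here the between-cluster dispersion is 16 and the within-cluster dispersion is 4, so the score is (16 / 1) / (4 / 2) = 8.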

Davies-Bouldin score

The Davies-Bouldin score measures average similarity between each cluster and its most similar other cluster.

A lower value indicates better clustering, because it means clusters are more compact and more separated from each other.
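
The following sketch shows one common way to compute the Davies-Bouldin score: for each cluster, take the worst-case ratio of summed scatter to centroid distance, then average over clusters. The function name and toy data are assumptions made for illustration, not the package's internals.

```python
import math
from collections import defaultdict


def davies_bouldin(X, labels):
    """Illustrative helper (hypothetical name), not intclustval's code."""
    groups = defaultdict(list)
    for point, label in zip(X, labels):
        groups[label].append(point)

    centroids, scatter = [], []
    for pts in groups.values():
        c = tuple(sum(col) / len(pts) for col in zip(*pts))
        centroids.append(c)
        # Scatter: mean distance from the cluster's points to its centroid.
        scatter.append(sum(math.dist(p, c) for p in pts) / len(pts))

    k = len(centroids)
    worst = []
    for i in range(k):
        # Similarity to the most similar other cluster.
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        ))
    return sum(worst) / k


X = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
db_value = davies_bouldin(X, labels)
print(db_value)  # 0.5
```

Each cluster has scatter 1 and the centroids are 4 apart, so every ratio is (1 + 1) / 4 = 0.5.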

Inertia

Inertia is the within-cluster sum of squared distances from each sample to its assigned cluster centroid.

Lower inertia means samples are closer to their cluster centroids. However, inertia generally decreases as the number of clusters increases (it reaches zero when every sample is its own cluster), so it should mainly be used to compare solutions with the same number of clusters on the same dataset.
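
A small dependency-free sketch of the definition, using a hypothetical helper and toy data, also shows how inertia shrinks as clusters are added to the same four points:

```python
from collections import defaultdict


def inertia(X, labels):
    """Illustrative helper (hypothetical name), not intclustval's code."""
    groups = defaultdict(list)
    for point, label in zip(X, labels):
        groups[label].append(point)

    total = 0.0
    for pts in groups.values():
        centroid = tuple(sum(col) / len(pts) for col in zip(*pts))
        # Add the squared distance of every point to its own centroid.
        total += sum(
            sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in pts
        )
    return total


X = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]

one_cluster = inertia(X, [0, 0, 0, 0])
two_clusters = inertia(X, [0, 0, 1, 1])
four_clusters = inertia(X, [0, 1, 2, 3])

print(one_cluster, two_clusters, four_clusters)  # 20.0 4.0 0.0
```

The drop from 20 to 4 to 0 reflects the extra clusters as much as the quality of the partition, which is why raw inertia values are hard to compare across different cluster counts.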

Dunn Index

The Dunn Index compares the minimum distance between different clusters to the maximum diameter within any cluster.

A higher Dunn Index indicates better clustering, with clusters that are compact and well separated.

This implementation uses pairwise distances, so it may be slower for large datasets.
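
The sketch below illustrates one common pairwise formulation of the Dunn Index: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. Several Dunn variants exist, and which one the package uses is an assumption here; the helper name and data are illustrative only.

```python
import math
from itertools import combinations


def dunn_index(X, labels):
    """Illustrative helper (hypothetical name), not intclustval's code."""
    min_between = math.inf
    max_within = 0.0
    # Visit every unordered pair of samples once: O(n^2) distances,
    # which is why pairwise approaches get slow on large datasets.
    for (p, lp), (q, lq) in combinations(zip(X, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_within = max(max_within, d)    # candidate cluster diameter
        else:
            min_between = min(min_between, d)  # candidate cluster separation
    # Note: raises ZeroDivisionError if every cluster has zero diameter.
    return min_between / max_within


X = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
dunn_value = dunn_index(X, labels)
print(dunn_value)  # 2.0
```

The closest points from different clusters are 4 apart and the largest cluster diameter is 2, giving 4 / 2 = 2.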

Xie-Beni index

The Xie-Beni index compares total within-cluster compactness to the minimum squared distance between cluster centroids.

A lower value indicates better clustering, because it means compact clusters with well-separated centers.
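
As a rough illustration, the usual hard-clustering form of the Xie-Beni index divides the within-cluster sum of squared distances by n times the minimum squared distance between centroids. The normalization by n and the helper name below are assumptions; the package's exact formula is not spelled out here.

```python
from collections import defaultdict
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(X, labels):
    """Illustrative helper (hypothetical name), not intclustval's code."""
    groups = defaultdict(list)
    for point, label in zip(X, labels):
        groups[label].append(point)
    centroids = {
        label: tuple(sum(col) / len(pts) for col in zip(*pts))
        for label, pts in groups.items()
    }
    # Compactness: squared distance of each sample to its own centroid.
    compactness = sum(sq_dist(p, centroids[l]) for p, l in zip(X, labels))
    # Separation: the closest pair of centroids, in squared distance.
    separation = min(
        sq_dist(a, b) for a, b in combinations(centroids.values(), 2)
    )
    return compactness / (len(X) * separation)


X = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
xb_value = xie_beni(X, labels)
print(xb_value)  # 0.0625
```

With compactness 4, four samples, and a squared centroid distance of 16, the result is 4 / (4 * 16) = 0.0625.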

Notes

Internal clustering validation metrics do not use ground-truth labels. They evaluate clustering structure using only:

X
labels

For external clustering validation with ground-truth labels, use extclustval.

For silhouette-specific scoring, use sil-score.

Cached properties

InternalClusterScore uses cached properties.

This means each metric is computed once and then stored.

score = InternalClusterScore(X, labels)

score.inertia  # computed once
score.inertia  # reused from cache

If you want to evaluate different labels, create a new InternalClusterScore object:

score = InternalClusterScore(X, labels)

new_score = InternalClusterScore(X, new_labels)

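Do not modify score.X or score.labels after creating the object. Metrics that are already cached are not recomputed, so they would no longer match the modified data.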
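
The reason for this advice can be sketched with Python's standard `functools.cached_property`. Whether InternalClusterScore uses this exact decorator is an assumption; `CachedScore` below is a hypothetical stand-in, not the package's class.

```python
from functools import cached_property


class CachedScore:
    """Hypothetical stand-in used only to illustrate caching behaviour."""

    def __init__(self, values):
        self.values = values
        self.calls = 0  # counts how often the metric is actually computed

    @cached_property
    def total(self):
        self.calls += 1
        return sum(self.values)


score = CachedScore([1, 2, 3])
first = score.total       # computed on first access
second = score.total      # served from the cache, no recomputation

score.values.append(10)   # mutating the input after creation...
stale = score.total       # ...still returns the cached value

fresh = CachedScore(score.values).total  # a new object sees the new data

print(first, second, stale, fresh, score.calls)  # 6 6 6 16 1
```

The stale value of 6 after the mutation is the failure mode to avoid; creating a new object is the simple way to get a result that matches the current data.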

Requirements

numpy
scipy
scikit-learn

License

This project is licensed under the MIT License.

Project details

Maintainers

a.semoglou

Release history

- 0.1.2 (May 23, 2026)
- 0.1.0 (May 23, 2026), this version

Download files

Source Distribution

intclustval-0.1.0.tar.gz (6.6 kB), uploaded May 23, 2026, Source

Built Distribution

intclustval-0.1.0-py3-none-any.whl (6.8 kB), uploaded May 23, 2026, Python 3

File details

Details for the file intclustval-0.1.0.tar.gz.

File metadata

Download URL: intclustval-0.1.0.tar.gz
Upload date: May 23, 2026
Size: 6.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for intclustval-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `3516aa7e56c70b0458f9fbd792e22feebdedfb103562bdd8ee40e704344cdbf9` |
| MD5 | `580cee1413f2b2ec41b7963e8603acf0` |
| BLAKE2b-256 | `e99958bce6a4829f3e33a0602516f9898ddaff9107f74dbe4398aff48537fc33` |

Provenance

The following attestation bundles were made for intclustval-0.1.0.tar.gz:

Publisher: python-publish.yml on semoglou/intclustval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: intclustval-0.1.0.tar.gz
- Subject digest: 3516aa7e56c70b0458f9fbd792e22feebdedfb103562bdd8ee40e704344cdbf9
- Sigstore transparency entry: 1615000574
- Sigstore integration time: May 23, 2026
Source repository:
- Permalink: semoglou/intclustval@0c84d918909e14eb1c74d19fff05ea856669afda
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/semoglou
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@0c84d918909e14eb1c74d19fff05ea856669afda
- Trigger Event: release

File details

Details for the file intclustval-0.1.0-py3-none-any.whl.

File metadata

Download URL: intclustval-0.1.0-py3-none-any.whl
Upload date: May 23, 2026
Size: 6.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for intclustval-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `b67e9e201cb22208e32cec1f5d444f1621426d7dd4a7b982a2dc883198d37f30` |
| MD5 | `c7a07bd0bf7a0800d6839b9cdd817bb4` |
| BLAKE2b-256 | `278cda5a0034b8b57118692ed17a7d7bb1df0c4fd60bef6ff744628dca1ea636` |

Provenance

The following attestation bundles were made for intclustval-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on semoglou/intclustval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: intclustval-0.1.0-py3-none-any.whl
- Subject digest: b67e9e201cb22208e32cec1f5d444f1621426d7dd4a7b982a2dc883198d37f30
- Sigstore transparency entry: 1615000579
- Sigstore integration time: May 23, 2026
Source repository:
- Permalink: semoglou/intclustval@0c84d918909e14eb1c74d19fff05ea856669afda
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/semoglou
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@0c84d918909e14eb1c74d19fff05ea856669afda
- Trigger Event: release
