A lightweight Python package for external clustering validation metrics.
extclustval
extclustval provides a simple ClusterScore class for evaluating clustering results against ground-truth labels.
Metrics included
Standard external clustering metrics
| Attribute | Metric |
|---|---|
| ri | Rand Index |
| ari | Adjusted Rand Index |
| nmi | Normalized Mutual Information |
| ami | Adjusted Mutual Information |
| homogeneity | Homogeneity score |
| completeness | Completeness score |
| v_measure | V-measure |
| fmi | Fowlkes-Mallows Index |
Additional clustering validation metrics
| Attribute | Metric |
|---|---|
| purity | Purity score |
| inverse_purity | Inverse purity score |
| clustering_accuracy | Hungarian-matched clustering accuracy |
Pairwise metrics
| Attribute | Metric |
|---|---|
| pairwise_precision | Pairwise precision |
| pairwise_recall | Pairwise recall |
| pairwise_f1 | Pairwise F1 score |
BCubed metrics
| Attribute | Metric |
|---|---|
| bcubed_precision | BCubed precision |
| bcubed_recall | BCubed recall |
| bcubed_f1 | BCubed F1 score |
Installation
You can install extclustval from PyPI:
pip install extclustval
Quick start
from extclustval import ClusterScore
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
score = ClusterScore(y_true, y_pred)
print(score.ari)
print(score.nmi)
print(score.clustering_accuracy)
Output:
1.0
1.0
1.0
Every metric is available as an attribute:
score.ari
score.ri
score.nmi
score.ami
score.homogeneity
score.completeness
score.v_measure
score.fmi
score.purity
score.inverse_purity
score.clustering_accuracy
score.acc
score.pairwise_precision
score.pairwise_recall
score.pairwise_f1
score.bcubed_precision
score.bcubed_recall
score.bcubed_f1
Dictionary output
You can return all scores as a dictionary:
scores = score.to_dict()
print(scores)
Example:
{
"ari": 1.0,
"ri": 1.0,
"nmi": 1.0,
"ami": 1.0,
"homogeneity": 1.0,
"completeness": 1.0,
"v_measure": 1.0,
"fmi": 1.0,
"purity": 1.0,
"inverse_purity": 1.0,
"clustering_accuracy": 1.0,
"acc": 1.0,
"pairwise_precision": 1.0,
"pairwise_recall": 1.0,
"pairwise_f1": 1.0,
"bcubed_precision": 1.0,
"bcubed_recall": 1.0,
"bcubed_f1": 1.0,
}
Notes about clustering accuracy
Clustering labels are arbitrary. For example, these two clusterings are equivalent:
[0, 0, 1, 1]
[5, 5, 9, 9]
Because of this, extclustval computes clustering accuracy using optimal Hungarian matching between predicted clusters and ground-truth classes.
score.clustering_accuracy
The short alias score.acc is also available and returns the same value.
This metric is most appropriate when the number of predicted clusters roughly matches the number of ground-truth classes.
For general clustering evaluation, permutation-invariant metrics that do not rely on a one-to-one matching, such as ARI, AMI, NMI, pairwise F1, and BCubed F1, are often safer to report.
Notes about purity
Purity is easy to understand, but it is biased toward solutions with many clusters. If each sample is placed in its own cluster, purity reaches 1.0 regardless of the true labels.
Use purity together with other metrics such as ARI, AMI, pairwise F1, or BCubed F1.
Cached properties
ClusterScore exposes its metrics as cached properties: each score is computed on first access and then stored.
score = ClusterScore(y_true, y_pred)
score.ari # computed once
score.ari # reused from cache
If you want to evaluate different labels, create a new ClusterScore object:
score = ClusterScore(y_true, y_pred)
new_score = ClusterScore(y_true, new_y_pred)
Do not modify score.y_true or score.y_pred after creating the object: scores that are already cached are not recomputed.
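The caching behaviour described above can be illustrated with the standard library's functools.cached_property. This is only a sketch: CachedScore and raw_agreement are hypothetical names invented for the example, and nothing here is taken from the extclustval source.

```python
from functools import cached_property


class CachedScore:
    """Hypothetical stand-in for ClusterScore, used only to illustrate caching."""

    def __init__(self, y_true, y_pred):
        self.y_true = list(y_true)
        self.y_pred = list(y_pred)
        self.calls = 0  # counts how often the metric is actually computed

    @cached_property
    def raw_agreement(self):
        # Toy metric: fraction of positions where the two label lists are equal.
        self.calls += 1
        matches = sum(t == p for t, p in zip(self.y_true, self.y_pred))
        return matches / len(self.y_true)


score = CachedScore([0, 0, 1, 1], [0, 0, 1, 1])
first = score.raw_agreement   # computed and stored
second = score.raw_agreement  # read from the cache, no recomputation

score.y_pred[0] = 1           # mutating the labels does not clear the cache
stale = score.raw_agreement   # still the value computed before the mutation
```

After the mutation, stale still holds the old value and score.calls is 1, which is why a new object is the safer way to evaluate different labels.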
Metric definitions
Rand Index (RI)
Rand Index measures how similar two partitions are by looking at all possible pairs of samples.
For every pair of samples, RI checks whether the true labels and predicted clusters agree:
- the pair is in the same true class and also in the same predicted cluster, or
- the pair is in different true classes and also in different predicted clusters.
The score is the fraction of pairs where this agreement happens.
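A perfect clustering scores 1.0. However, RI is not adjusted for chance, so it can look high even when the clustering is not meaningful. This happens especially when there are many classes: most pairs then lie in different classes and are also placed in different clusters, and each such pair counts as an agreement.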
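The pair-counting definition above can be sketched in a few lines of plain Python. The function name rand_index is chosen for this example and is not part of the extclustval API; the sketch assumes at least two samples and loops over all pairs, so it is only meant for small inputs.

```python
from itertools import combinations


def rand_index(y_true, y_pred):
    """Fraction of sample pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(y_true)), 2))
    agreements = 0
    for i, j in pairs:
        same_class = y_true[i] == y_true[j]
        same_cluster = y_pred[i] == y_pred[j]
        # Agreement: together in both partitions, or apart in both.
        if same_class == same_cluster:
            agreements += 1
    return agreements / len(pairs)


perfect = rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])  # 1.0

# Five classes of two samples each, scored against ten singleton clusters.
many_classes = rand_index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], list(range(10)))
```

In the second case 40 of the 45 pairs count as agreements, so RI is about 0.89 even though the predicted clustering carries no grouping information.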
Adjusted Rand Index (ARI)
Adjusted Rand Index is a chance-adjusted version of the Rand Index.
Like RI, ARI compares pairs of samples and checks whether the true labels and predicted clusters agree. The difference is that ARI corrects for the agreement that would be expected just by random chance.
A perfect clustering scores 1.0. Random clusterings tend to score near 0.0. Clusterings that agree with the true labels less than expected by chance score below 0.0.
ARI is one of the most commonly used external clustering validation metrics because it is permutation-invariant and adjusted for chance.
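As an illustration of the chance correction, here is a minimal sketch of the usual contingency-table formula for ARI. The function name is an assumption of this example, the input is assumed to have at least two samples, and the degenerate case is handled by returning 1.0, which is a choice made here rather than documented behaviour of extclustval.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(y_true, y_pred):
    """ARI from pair counts in the class-by-cluster contingency table."""
    n = len(y_true)
    cells = Counter(zip(y_true, y_pred))  # joint (class, cluster) counts
    rows = Counter(y_true)                # class sizes
    cols = Counter(y_pred)                # cluster sizes

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


perfect = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])  # 1.0
below_chance = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])         # -0.5
```

The second call shows a clustering that separates every same-class pair, which is worse than random assignment and therefore scores below zero.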
Normalized Mutual Information (NMI)
Normalized Mutual Information measures how much information the predicted clusters contain about the true labels.
If knowing a sample’s predicted cluster tells you a lot about its true class, NMI is high. If the predicted clusters and true labels are mostly unrelated, NMI is low.
NMI is normalized so that a perfect match scores 1.0. It is permutation-invariant, meaning it does not matter which numeric IDs are used for the clusters. However, NMI is not adjusted for chance.
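A rough sketch of the computation follows. It assumes natural logarithms and normalization by the arithmetic mean of the two entropies; other normalizers exist, and the text above does not say which one extclustval uses. All function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(y_true, y_pred):
    """Mutual information between classes and clusters, in nats."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    classes = Counter(y_true)
    clusters = Counter(y_pred)
    mi = 0.0
    for (t, p), n_tp in joint.items():
        mi += (n_tp / n) * log(n * n_tp / (classes[t] * clusters[p]))
    return mi


def normalized_mutual_information(y_true, y_pred):
    h_true, h_pred = entropy(y_true), entropy(y_pred)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both labelings are a single group (choice of this sketch)
    # Assumed normalizer: arithmetic mean of the two entropies.
    return mutual_information(y_true, y_pred) / ((h_true + h_pred) / 2)


relabeled = normalized_mutual_information([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Here relabeled is 1.0 up to floating-point rounding because only the cluster IDs differ, while unrelated is 0.0 because the predicted cluster says nothing about the true class.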
Adjusted Mutual Information (AMI)
Adjusted Mutual Information is a chance-adjusted version of mutual information.
Like NMI, AMI measures how much information is shared between the predicted clusters and the true labels. Unlike NMI, AMI corrects for the amount of information that would be expected by random cluster assignments.
A perfect clustering scores 1.0. Random clusterings tend to score near 0.0.
AMI is useful when comparing clustering results with different numbers of clusters, because it is less likely than NMI to reward structure that appears only by chance.
Homogeneity
Homogeneity measures whether each predicted cluster contains samples from only one ground-truth class.
A clustering has high homogeneity when its clusters are pure. For example, if one predicted cluster contains only samples from class A, that cluster is homogeneous.
Homogeneity penalizes clusters that mix multiple true classes together. However, homogeneity alone does not penalize splitting one true class into many small clusters.
Completeness
Completeness measures whether all samples from the same ground-truth class are assigned to the same predicted cluster.
A clustering has high completeness when each true class is mostly captured by one cluster. For example, if all samples from class A are placed in the same predicted cluster, completeness is high for that class.
Completeness penalizes splitting a true class across multiple clusters. However, completeness alone does not penalize merging different true classes into the same cluster: placing every sample in a single cluster yields a completeness of 1.0.
V-measure
V-measure combines homogeneity and completeness into one score.
It is the harmonic mean of homogeneity and completeness, so it rewards clusterings that are both class-pure and class-complete.
A perfect clustering scores 1.0. V-measure is useful when you want a single score that balances over-splitting and over-merging.
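The three scores can be sketched with conditional entropies, following the standard definitions: homogeneity is 1 - H(class | cluster) / H(class), and completeness is the same expression with the roles swapped. The function names and the handling of zero-entropy inputs are assumptions of this example.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, given):
    """H(labels | given), in nats."""
    n = len(labels)
    joint = Counter(zip(labels, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (_, g), c in joint.items())


def homogeneity(y_true, y_pred):
    h_class = entropy(y_true)
    if h_class == 0.0:
        return 1.0  # only one class, so every cluster is trivially pure
    return 1.0 - conditional_entropy(y_true, y_pred) / h_class


def completeness(y_true, y_pred):
    # Same formula with classes and clusters swapped.
    return homogeneity(y_pred, y_true)


def v_measure(y_true, y_pred):
    h, c = homogeneity(y_true, y_pred), completeness(y_true, y_pred)
    return 0.0 if h + c == 0.0 else 2 * h * c / (h + c)


y_true = [0, 0, 1, 1]
over_split = [0, 1, 2, 3]   # every sample in its own cluster
over_merged = [0, 0, 0, 0]  # everything in one cluster
```

With these inputs, over_split has homogeneity 1.0 but completeness 0.5, and over_merged has completeness 1.0 but homogeneity 0.0, which is why V-measure combines the two.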
Fowlkes-Mallows Index (FMI)
Fowlkes-Mallows Index is a pair-based clustering metric.
It looks at pairs of samples and compares which pairs are placed together in the predicted clustering and which pairs truly belong together according to the ground-truth labels.
FMI is the geometric mean of pairwise precision and pairwise recall. A perfect clustering scores 1.0.
It is useful when you want a score based on pairwise grouping behavior.
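A small sketch of the geometric-mean definition, counting pairs directly. The function name is illustrative, and returning 0.0 when no pair is both same-class and same-cluster is a convention picked for this example.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(y_true, y_pred):
    """Geometric mean of pairwise precision and pairwise recall."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_class = y_true[i] == y_true[j]
        same_cluster = y_pred[i] == y_pred[j]
        if same_cluster and same_class:
            tp += 1  # correctly grouped pair
        elif same_cluster:
            fp += 1  # grouped together but from different classes
        elif same_class:
            fn += 1  # same class but split across clusters
    if tp == 0:
        return 0.0  # convention of this sketch when no pair is recovered
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return sqrt(precision * recall)


fmi = fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

In this example there are 4 correctly grouped pairs, 3 incorrect merges, and 2 incorrect splits, so the score is sqrt(4/7 * 4/6), roughly 0.62.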
Purity
Purity measures how much each predicted cluster is dominated by a single ground-truth class.
For each predicted cluster, purity finds the most common true class inside that cluster. It then sums these dominant-class counts across all clusters and divides by the total number of samples.
Purity is easy to understand: high purity means clusters mostly contain samples from one class.
However, purity is biased toward many clusters. If every sample is placed in its own cluster, purity becomes perfect, even though the clustering may not be useful.
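The counting procedure described above, and the bias toward many clusters, can be reproduced with a short sketch. The function name purity is used here for illustration and is not necessarily how extclustval implements it.

```python
from collections import Counter, defaultdict


def purity(y_true, y_pred):
    """Sum of dominant-class counts per cluster, divided by the sample count."""
    members = defaultdict(Counter)  # cluster -> counts of true classes inside it
    for t, p in zip(y_true, y_pred):
        members[p][t] += 1
    dominant = sum(max(counts.values()) for counts in members.values())
    return dominant / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
mixed = purity(y_true, [0, 0, 1, 1, 1, 1])       # 5/6: cluster 1 mixes both classes
singletons = purity(y_true, [0, 1, 2, 3, 4, 5])  # 1.0: one cluster per sample
```

The singleton clustering scores higher than the mixed one even though it groups nothing.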
Inverse purity
Inverse purity is the class-oriented counterpart of purity.
Instead of asking whether each predicted cluster is dominated by one true class, inverse purity asks whether each true class is well captured by one predicted cluster.
For each ground-truth class, it finds the predicted cluster that contains the largest number of samples from that class. These counts are summed and divided by the total number of samples.
Inverse purity rewards clusterings that avoid splitting true classes across many clusters.
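A matching sketch for inverse purity groups samples by true class instead of by cluster. The function name is illustrative. The example also shows the mirror image of purity's bias: a single large cluster gets a perfect inverse purity.

```python
from collections import Counter, defaultdict


def inverse_purity(y_true, y_pred):
    """Sum of best-cluster counts per class, divided by the sample count."""
    by_class = defaultdict(Counter)  # class -> counts of clusters its samples fall in
    for t, p in zip(y_true, y_pred):
        by_class[t][p] += 1
    best = sum(max(counts.values()) for counts in by_class.values())
    return best / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
singletons = inverse_purity(y_true, [0, 1, 2, 3, 4, 5])  # 2/6: classes fully split
one_cluster = inverse_purity(y_true, [0, 0, 0, 0, 0, 0])  # 1.0: nothing is split
```

Reporting purity and inverse purity side by side makes both failure modes visible.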
Clustering accuracy
Clustering accuracy compares predicted cluster labels with ground-truth labels after aligning cluster IDs to class IDs.
Raw classification accuracy is not appropriate for clustering because cluster labels are arbitrary. For example, cluster 0 and cluster 5 could represent the same group.
To handle this, extclustval uses optimal Hungarian matching to find the best one-to-one mapping between predicted clusters and true classes. It then computes the fraction of samples that are correctly matched under that mapping.
This metric is most appropriate when the predicted clusters roughly correspond one-to-one with the ground-truth classes. It can be misleading when the numbers of clusters and classes differ substantially, or when the clustering intentionally splits or merges classes.
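To make the matching step concrete, the sketch below searches all one-to-one cluster-to-class mappings exhaustively. This brute-force search is a stand-in for the Hungarian algorithm: it reaches the same optimum but is only practical for a handful of clusters. Leaving surplus clusters unmatched is an assumption of this example, not a documented detail of extclustval, and the function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy_bruteforce(y_true, y_pred):
    """Best accuracy over all one-to-one cluster-to-class mappings."""
    classes = list(dict.fromkeys(y_true))   # unique labels, first-seen order
    clusters = list(dict.fromkeys(y_pred))
    overlap = Counter(zip(y_pred, y_true))  # (cluster, class) -> sample count
    # Pad with None so surplus clusters can stay unmatched (assumes that
    # None is not itself used as a class label).
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assigned in permutations(targets, len(clusters)):
        matched = sum(overlap[(k, c)] for k, c in zip(clusters, assigned))
        best = max(best, matched)
    return best / len(y_true)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
raw = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 2/6
matched = clustering_accuracy_bruteforce(y_true, y_pred)         # 1.0

# One class split into two clean clusters: one of them must stay unmatched.
split = clustering_accuracy_bruteforce([0, 0, 0, 0, 1, 1], [0, 0, 1, 1, 2, 2])
```

The last call returns 4/6 even though every cluster is pure, which illustrates how the metric can mislead when the clustering intentionally splits a class.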
Pairwise precision
Pairwise precision measures how reliable predicted same-cluster decisions are.
It looks at all sample pairs that were placed in the same predicted cluster. Among those pairs, it measures how many truly belong to the same ground-truth class.
High pairwise precision means the clustering makes few incorrect merges.
Pairwise recall
Pairwise recall measures how well true same-class pairs are recovered.
It looks at all sample pairs that belong to the same ground-truth class. Among those pairs, it measures how many were placed in the same predicted cluster.
High pairwise recall means the clustering makes few incorrect splits.
Pairwise F1
Pairwise F1 is the harmonic mean of pairwise precision and pairwise recall.
It balances two types of clustering errors:
- merging samples that should be separate, and
- splitting samples that should be together.
Pairwise F1 is often a more natural choice for clustering than classification F1, because it depends only on which samples are grouped together and not on the cluster IDs.
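The three pairwise scores can be computed together from the same pair counts. In this sketch the function name is illustrative, and returning 0.0 for an undefined precision or recall (no same-cluster or no same-class pairs) is a convention chosen here; other conventions exist.

```python
from itertools import combinations


def pairwise_scores(y_true, y_pred):
    """Return (precision, recall, f1) over all sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_class = y_true[i] == y_true[j]
        same_cluster = y_pred[i] == y_pred[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1  # incorrect merge
        elif same_class:
            fn += 1  # incorrect split
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [0, 0, 0, 1, 1, 1]
merged = pairwise_scores(y_true, [0, 0, 0, 0, 0, 0])  # low precision, recall 1.0
split = pairwise_scores(y_true, [0, 0, 1, 2, 2, 3])   # precision 1.0, low recall
```

Over-merging gives precision 0.4 with recall 1.0, while over-splitting gives precision 1.0 with recall 1/3, so each error type shows up in a different score.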
BCubed precision
BCubed precision measures cluster purity from each sample’s point of view.
For each sample, it looks at the predicted cluster containing that sample and checks what fraction of that cluster has the same true label as the sample. These per-sample precision values are averaged across all samples.
High BCubed precision means samples tend to be placed in clusters containing mostly members of their own true class.
BCubed recall
BCubed recall measures class recovery from each sample’s point of view.
For each sample, it looks at the sample’s true class and checks what fraction of that class appears in the same predicted cluster as the sample. These per-sample recall values are averaged across all samples.
High BCubed recall means samples from the same true class tend to stay together.
BCubed F1
BCubed F1 is the harmonic mean of BCubed precision and BCubed recall.
It balances sample-level cluster purity and sample-level class recovery.
BCubed F1 is useful when clusters have different sizes or when you want a metric that evaluates clustering quality from the perspective of individual samples.
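A minimal sketch of the per-sample averaging behind the BCubed scores, using the standard formulation in which each sample counts itself. The function name is illustrative and the input is assumed to be non-empty.

```python
from collections import Counter


def bcubed_scores(y_true, y_pred):
    """Return (precision, recall, f1) averaged over individual samples."""
    n = len(y_true)
    cluster_size = Counter(y_pred)
    class_size = Counter(y_true)
    overlap = Counter(zip(y_true, y_pred))  # samples sharing class and cluster

    # Per-sample precision: share of the sample's cluster with its true label.
    precision = sum(overlap[(t, p)] / cluster_size[p]
                    for t, p in zip(y_true, y_pred)) / n
    # Per-sample recall: share of the sample's class found in its cluster.
    recall = sum(overlap[(t, p)] / class_size[t]
                 for t, p in zip(y_true, y_pred)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = bcubed_scores([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

For this input the sketch gives a precision of 0.75, a recall of 7/9, and an F1 of 42/55 (about 0.76).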
Requirements
- numpy
- scipy
- scikit-learn
License
This project is licensed under the MIT License.