Maintainers

a.semoglou

extclustval

A lightweight Python package for external clustering validation metrics.

extclustval provides a simple ClusterScore class for evaluating clustering results against ground-truth labels.

Related packages

This package is part of a small clustering-validation ecosystem:

Package	Purpose
`intclustval`	Internal clustering validation metrics
`extclustval`	External clustering validation metrics using ground-truth labels
`sil-score`	Exact and approximate silhouette scoring

Metrics included

Standard external clustering metrics

Attribute	Metric
`ri`	Rand Index
`ari`	Adjusted Rand Index
`nmi`	Normalized Mutual Information
`ami`	Adjusted Mutual Information
`homogeneity`	Homogeneity score
`completeness`	Completeness score
`v_measure`	V-measure
`fmi`	Fowlkes-Mallows Index

Additional clustering validation metrics

Attribute	Metric
`purity`	Purity score
`inverse_purity`	Inverse purity score
`clustering_accuracy`	Hungarian-matched clustering accuracy (alias: `acc`)

Pairwise metrics

Attribute	Metric
`pairwise_precision`	Pairwise precision
`pairwise_recall`	Pairwise recall
`pairwise_f1`	Pairwise F1 score

BCubed metrics

Attribute	Metric
`bcubed_precision`	BCubed precision
`bcubed_recall`	BCubed recall
`bcubed_f1`	BCubed F1 score

Installation

You can install extclustval from PyPI:

pip install extclustval

Quick start

from extclustval import ClusterScore

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]

score = ClusterScore(y_true, y_pred)

print(score.ari)
print(score.nmi)
print(score.clustering_accuracy)

Output:

1.0
1.0
1.0

Every metric is available as an attribute of the ClusterScore object:

score.ari
score.ri
score.nmi
score.ami
score.homogeneity
score.completeness
score.v_measure
score.fmi
score.purity
score.inverse_purity
score.clustering_accuracy
score.acc
score.pairwise_precision
score.pairwise_recall
score.pairwise_f1
score.bcubed_precision
score.bcubed_recall
score.bcubed_f1

Dictionary output

You can return all scores as a dictionary:

scores = score.to_dict()
print(scores)

Example output for a perfect clustering, such as the quick-start labels:

{
    "ari": 1.0,
    "ri": 1.0,
    "nmi": 1.0,
    "ami": 1.0,
    "homogeneity": 1.0,
    "completeness": 1.0,
    "v_measure": 1.0,
    "fmi": 1.0,
    "purity": 1.0,
    "inverse_purity": 1.0,
    "clustering_accuracy": 1.0,
    "acc": 1.0,
    "pairwise_precision": 1.0,
    "pairwise_recall": 1.0,
    "pairwise_f1": 1.0,
    "bcubed_precision": 1.0,
    "bcubed_recall": 1.0,
    "bcubed_f1": 1.0,
}

Notes about clustering accuracy

Clustering labels are arbitrary. For example, these two clusterings are equivalent:

[0, 0, 1, 1]
[5, 5, 9, 9]

Because of this, extclustval computes clustering accuracy using optimal Hungarian matching between predicted clusters and ground-truth classes.

score.clustering_accuracy

The short alias score.acc is also available and returns the same value.

This metric is most appropriate when the number of predicted clusters roughly matches the number of ground-truth classes.

For general clustering evaluation, chance-adjusted metrics such as ARI and AMI, or metrics that need no one-to-one mapping such as NMI, pairwise F1, and BCubed F1, are often safer to report.

Notes about purity

Purity is easy to understand, but it is biased toward solutions with many clusters. If each sample is placed in its own cluster, purity reaches 1.0 even though the clustering carries no useful structure.

Use purity together with other metrics such as ARI, AMI, pairwise F1, or BCubed F1.

Cached properties

ClusterScore uses cached properties, so each score is computed on first access and then stored on the object.

score = ClusterScore(y_true, y_pred)

score.ari  # computed once
score.ari  # reused from cache

If you want to evaluate different labels, create a new ClusterScore object:

score = ClusterScore(y_true, y_pred)

# A different clustering of the same samples gets its own object.
new_y_pred = [0, 0, 1, 1, 1, 2]
new_score = ClusterScore(y_true, new_y_pred)

Do not modify score.y_true or score.y_pred after creating the object: scores that were already accessed stay cached and would no longer match the modified labels.
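The caching behavior, and the reason the inputs should not be mutated, can be illustrated with the standard library's functools.cached_property. This is only a sketch: `TinyScore` and its toy `agreement` metric are hypothetical names invented for illustration and are not part of extclustval, whose internals may differ.

```python
from functools import cached_property


class TinyScore:
    """Hypothetical stand-in for a score object with one cached metric."""

    def __init__(self, y_true, y_pred):
        self.y_true = list(y_true)
        self.y_pred = list(y_pred)
        self.calls = 0  # counts how often the metric body actually runs

    @cached_property
    def agreement(self):
        # Toy metric: fraction of positions where the raw labels coincide.
        self.calls += 1
        matches = sum(t == p for t, p in zip(self.y_true, self.y_pred))
        return matches / len(self.y_true)


score = TinyScore([0, 0, 1, 1], [0, 0, 1, 0])
first = score.agreement   # computed on first access
second = score.agreement  # served from the per-instance cache
score.y_pred[3] = 1       # mutating the inputs does not refresh the cache
stale = score.agreement   # still the value computed from the old labels
```

After the mutation the labels match exactly, yet `stale` still holds the earlier value, which is why a fresh object is the safer pattern.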

Metric definitions

Rand Index (RI)

Rand Index measures how similar two partitions are by looking at all possible pairs of samples.

For every pair of samples, RI checks whether the true labels and predicted clusters agree:

the pair is in the same true class and also in the same predicted cluster, or
the pair is in different true classes and also in different predicted clusters.

The score is the fraction of pairs where this agreement happens.

A perfect clustering scores 1.0. However, RI is not adjusted for chance, so it can look high even when the clustering is not meaningful. This happens especially with many clusters: most pairs then fall in different classes and different clusters, and every such pair counts as an agreement.
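The pair-counting definition can be written out directly. The following is a minimal pure-Python sketch, not the package's implementation; the function name `rand_index` is an assumption, and the inputs are assumed to be equal-length sequences with at least two samples.

```python
from itertools import combinations


def rand_index(y_true, y_pred):
    """Fraction of sample pairs on which the two partitions agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        # Agreement: together in both partitions, or apart in both.
        agree += same_true == same_pred
        total += 1
    return agree / total


y_true = [0, 0, 1, 1, 2, 2]
relabeled = rand_index(y_true, [1, 1, 0, 0, 2, 2])   # perfect match, different IDs
singletons = rand_index(y_true, [0, 1, 2, 3, 4, 5])  # every sample alone
```

Here `relabeled` is 1.0, while `singletons` is still 12/15 = 0.8: only the three same-class pairs disagree, which shows how an uninformative clustering can receive a high RI.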

Adjusted Rand Index (ARI)

Adjusted Rand Index is a chance-adjusted version of the Rand Index.

Like RI, ARI compares pairs of samples and checks whether the true labels and predicted clusters agree. The difference is that ARI corrects for the agreement that would be expected just by random chance.

A perfect clustering scores 1.0. Random clusterings tend to score near 0.0. Clusterings that agree with the ground truth less than chance would predict score below 0.0.

ARI is one of the most commonly used external clustering validation metrics because it is permutation-invariant and adjusted for chance.
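The chance correction can be sketched from the contingency counts using the standard pair-counting formula. This is an illustrative pure-Python version, not the package's code; the name `adjusted_rand_index` and the handling of the degenerate case are assumptions, and at least two samples are assumed.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(y_true, y_pred):
    """ARI = (index - expected index) / (max index - expected index)."""
    n = len(y_true)
    # Pairs that are together in both partitions, in the truth, in the prediction.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


perfect = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The second call pairs each sample with a member of the other class, so it agrees less than chance would and yields -0.5.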

Normalized Mutual Information (NMI)

Normalized Mutual Information measures how much information the predicted clusters contain about the true labels.

If knowing a sample’s predicted cluster tells you a lot about its true class, NMI is high. If the predicted clusters and true labels are mostly unrelated, NMI is low.

NMI is normalized so that a perfect match scores 1.0. It is permutation-invariant, meaning it does not matter which numeric IDs are used for the clusters. However, NMI is not adjusted for chance.
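As a rough illustration, NMI can be computed from label counts alone. The sketch below assumes normalization by the arithmetic mean of the two entropies (scikit-learn's default); the package may normalize differently, and the function names are invented for this example.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(y_true, y_pred):
    """Mutual information between the two label sequences."""
    n = len(y_true)
    rows, cols = Counter(y_true), Counter(y_pred)
    mi = 0.0
    for (t, p), c in Counter(zip(y_true, y_pred)).items():
        mi += (c / n) * log(c * n / (rows[t] * cols[p]))
    return mi


def nmi(y_true, y_pred):
    """MI divided by the arithmetic mean of the two entropies (assumed)."""
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    if denom == 0:
        return 1.0  # both partitions consist of a single cluster
    return mutual_information(y_true, y_pred) / denom


y_true = [0, 0, 1, 1, 2, 2]
perfect = nmi(y_true, [1, 1, 0, 0, 2, 2])
singletons = nmi(y_true, [0, 1, 2, 3, 4, 5])
```

With every sample in its own cluster, `singletons` is still about 0.76, which illustrates the lack of chance adjustment.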

Adjusted Mutual Information (AMI)

Adjusted Mutual Information is a chance-adjusted version of mutual information.

Like NMI, AMI measures how much information is shared between the predicted clusters and the true labels. Unlike NMI, AMI corrects for the amount of information that would be expected by random cluster assignments.

A perfect clustering scores 1.0. Random clusterings tend to score near 0.0.

AMI is useful when comparing clustering results with different numbers of clusters, because it is less likely than NMI to reward structure that appears only by chance.
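The chance correction in AMI subtracts the mutual information expected under random assignments with the same cluster sizes. The sketch below assumes the usual hypergeometric model for that expectation and arithmetic-mean normalization; it is an illustration with invented function names, not the package's implementation.

```python
from collections import Counter
from math import comb, log


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(y_true, y_pred):
    n = len(y_true)
    rows, cols = Counter(y_true), Counter(y_pred)
    return sum(
        (c / n) * log(c * n / (rows[t] * cols[p]))
        for (t, p), c in Counter(zip(y_true, y_pred)).items()
    )


def expected_mutual_information(y_true, y_pred):
    """Expected MI when class and cluster sizes are fixed but members are random."""
    n = len(y_true)
    emi = 0.0
    for a in Counter(y_true).values():
        for b in Counter(y_pred).values():
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                # Hypergeometric probability that the overlap equals nij.
                prob = comb(b, nij) * comb(n - b, a - nij) / comb(n, a)
                emi += prob * (nij / n) * log(n * nij / (a * b))
    return emi


def ami(y_true, y_pred):
    mean_entropy = (entropy(y_true) + entropy(y_pred)) / 2
    emi = expected_mutual_information(y_true, y_pred)
    denom = mean_entropy - emi
    if denom == 0:
        return 1.0  # degenerate case (assumed convention)
    return (mutual_information(y_true, y_pred) - emi) / denom


perfect = ami([0, 0, 1, 1], [1, 1, 0, 0])
crossed = ami([0, 0, 1, 1], [0, 1, 0, 1])
```

For the crossed example the observed mutual information is 0 while the expected value is positive, so the score is negative (-0.5).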

Homogeneity

Homogeneity measures whether each predicted cluster contains samples from only one ground-truth class.

A clustering has high homogeneity when its clusters are pure. For example, if one predicted cluster contains only samples from class A, that cluster is homogeneous.

Homogeneity penalizes clusters that mix multiple true classes together. However, homogeneity alone does not penalize splitting one true class into many small clusters.

Completeness

Completeness measures whether all samples from the same ground-truth class are assigned to the same predicted cluster.

A clustering has high completeness when each true class is mostly captured by one cluster. For example, if all samples from class A are placed in the same predicted cluster, completeness is high for that class.

Completeness penalizes splitting a true class across multiple clusters. However, completeness alone does not strongly penalize merging different true classes into the same cluster.

V-measure

V-measure combines homogeneity and completeness into one score.

It is the harmonic mean of homogeneity and completeness, so it rewards clusterings that are both class-pure and class-complete.

A perfect clustering scores 1.0. V-measure is useful when you want a single score that balances over-splitting and over-merging.
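Homogeneity, completeness, and V-measure can all be derived from the mutual information and the two entropies. The following sketch uses the entropy-based definitions, with a score of 1.0 when the relevant entropy is zero; that convention and the function names are assumptions rather than the package's confirmed behavior.

```python
from collections import Counter
from math import log


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(y_true, y_pred):
    n = len(y_true)
    rows, cols = Counter(y_true), Counter(y_pred)
    return sum(
        (c / n) * log(c * n / (rows[t] * cols[p]))
        for (t, p), c in Counter(zip(y_true, y_pred)).items()
    )


def homogeneity_completeness_v(y_true, y_pred):
    mi = mutual_information(y_true, y_pred)
    h_true, h_pred = entropy(y_true), entropy(y_pred)
    # Homogeneity = 1 - H(class | cluster) / H(class) = MI / H(class).
    homogeneity = mi / h_true if h_true > 0 else 1.0
    # Completeness = 1 - H(cluster | class) / H(cluster) = MI / H(cluster).
    completeness = mi / h_pred if h_pred > 0 else 1.0
    total = homogeneity + completeness
    v_measure = 2 * homogeneity * completeness / total if total > 0 else 0.0
    return homogeneity, completeness, v_measure


over_split = homogeneity_completeness_v([0, 0, 1, 1], [0, 1, 2, 3])
over_merged = homogeneity_completeness_v([0, 0, 1, 1], [0, 0, 0, 0])
```

Over-splitting keeps homogeneity at 1.0 but halves completeness here; merging everything does the opposite, and V-measure drops in both cases.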

Fowlkes-Mallows Index (FMI)

Fowlkes-Mallows Index is a pair-based clustering metric.

It looks at pairs of samples and compares which pairs are placed together in the predicted clustering and which pairs truly belong together according to the ground-truth labels.

FMI is the geometric mean of pairwise precision and pairwise recall. A perfect clustering scores 1.0.

It is useful when you want a score based on pairwise grouping behavior.
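A small sketch of the geometric-mean formula, counting pairs from the contingency table rather than enumerating them. The function name is hypothetical, and returning 0.0 when either partition has no same-group pairs is an assumed convention.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(y_true, y_pred):
    """Geometric mean of pairwise precision and pairwise recall."""
    # Pairs grouped together in both the truth and the prediction.
    tp = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    same_pred = sum(comb(c, 2) for c in Counter(y_pred).values())
    same_true = sum(comb(c, 2) for c in Counter(y_true).values())
    if same_pred == 0 or same_true == 0:
        return 0.0  # no same-group pairs on one side (assumed convention)
    precision = tp / same_pred
    recall = tp / same_true
    return sqrt(precision * recall)


fmi = fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

In this example 4 pairs are together in both partitions, 7 pairs are together in the prediction, and 6 in the truth, so the result is sqrt((4/7) * (4/6)), roughly 0.617.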

Purity

Purity measures how much each predicted cluster is dominated by a single ground-truth class.

For each predicted cluster, purity finds the most common true class inside that cluster. It then sums these dominant-class counts across all clusters and divides by the total number of samples.

Purity is easy to understand: high purity means clusters mostly contain samples from one class.

However, purity is biased toward many clusters. If every sample is placed in its own cluster, purity becomes perfect, even though the clustering may not be useful.

Inverse purity

Inverse purity is the class-oriented counterpart of purity.

Instead of asking whether each predicted cluster is dominated by one true class, inverse purity asks whether each true class is well captured by one predicted cluster.

For each ground-truth class, it finds the predicted cluster that contains the largest number of samples from that class. These counts are summed and divided by the total number of samples.

Inverse purity rewards clusterings that avoid splitting true classes across many clusters.
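Purity and inverse purity follow the same counting recipe with the roles of classes and clusters swapped. A minimal sketch, with function names chosen for illustration:

```python
from collections import Counter, defaultdict


def purity(y_true, y_pred):
    """Sum of dominant-class counts per predicted cluster, divided by n."""
    by_cluster = defaultdict(Counter)
    for t, p in zip(y_true, y_pred):
        by_cluster[p][t] += 1
    dominant = sum(max(counts.values()) for counts in by_cluster.values())
    return dominant / len(y_true)


def inverse_purity(y_true, y_pred):
    """Purity with the roles of classes and clusters exchanged."""
    return purity(y_pred, y_true)


y_true = [0, 0, 0, 1, 1, 1]
mixed = [0, 0, 1, 1, 1, 1]       # one class-0 sample lands in cluster 1
singletons = [0, 1, 2, 3, 4, 5]  # every sample in its own cluster

p_mixed, ip_mixed = purity(y_true, mixed), inverse_purity(y_true, mixed)
p_single, ip_single = purity(y_true, singletons), inverse_purity(y_true, singletons)
```

The singleton clustering reaches a purity of 1.0 but an inverse purity of only 2/6, which is why the two are best read together.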

Clustering accuracy

Clustering accuracy compares predicted cluster labels with ground-truth labels after aligning cluster IDs to class IDs.

Raw classification accuracy is not appropriate for clustering because cluster labels are arbitrary. For example, cluster 0 and cluster 5 could represent the same group.

To handle this, extclustval uses optimal Hungarian matching to find the best one-to-one mapping between predicted clusters and true classes. It then computes the fraction of samples that are correctly matched under that mapping.

This metric is most appropriate when the predicted clusters roughly correspond one-to-one with the ground-truth classes. It can be misleading when the number of clusters and classes differs a lot, or when the clustering intentionally splits or merges classes.
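To make the matching step concrete, the sketch below finds the best one-to-one mapping by brute force over permutations. This is a stand-in for Hungarian matching, not the package's implementation: it gives the same optimum but takes factorial time, so it is only suitable for a handful of labels. Efficient solvers such as scipy.optimize.linear_sum_assignment are the usual choice in practice. The function name and padding scheme are assumptions.

```python
from collections import Counter
from itertools import permutations

_PAD = object()  # placeholder for "matched to nothing"


def clustering_accuracy(y_true, y_pred):
    """Best achievable accuracy over one-to-one cluster-to-class mappings."""
    counts = Counter(zip(y_true, y_pred))
    classes = list(dict.fromkeys(y_true))
    clusters = list(dict.fromkeys(y_pred))
    # Pad the shorter side so unmatched clusters or classes are allowed.
    size = max(len(classes), len(clusters))
    classes += [_PAD] * (size - len(classes))
    clusters += [_PAD] * (size - len(clusters))
    best = 0
    for perm in permutations(classes):
        # Counter returns 0 for (class, cluster) pairs that never co-occur.
        matched = sum(counts[(cls, clu)] for cls, clu in zip(perm, clusters))
        best = max(best, matched)
    return best / len(y_true)


relabeled = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
one_error = clustering_accuracy([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
over_split = clustering_accuracy([0, 0, 1, 1], [0, 1, 2, 3])
```

The over-split case scores only 0.5 because each class can be matched to just one of its two clusters, which shows why the metric can mislead when cluster and class counts differ.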

Pairwise precision

Pairwise precision measures how reliable predicted same-cluster decisions are.

It looks at all sample pairs that were placed in the same predicted cluster. Among those pairs, it measures how many truly belong to the same ground-truth class.

High pairwise precision means the clustering makes few incorrect merges.

Pairwise recall

Pairwise recall measures how well true same-class pairs are recovered.

It looks at all sample pairs that belong to the same ground-truth class. Among those pairs, it measures how many were placed in the same predicted cluster.

High pairwise recall means the clustering makes few incorrect splits.

Pairwise F1

Pairwise F1 is the harmonic mean of pairwise precision and pairwise recall.

It balances two types of clustering errors:

merging samples that should be separate, and
splitting samples that should be together.

Because it is computed over sample pairs, pairwise F1 does not depend on how clusters are labeled, which often makes it a better clustering-native alternative to classification F1.
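The three pairwise scores can be sketched by enumerating sample pairs and counting correct groupings, incorrect merges, and incorrect splits. The function name and the 0.0 fallback for empty denominators are assumptions made for this example.

```python
from itertools import combinations


def pairwise_scores(y_true, y_pred):
    """Return (precision, recall, f1) over all sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        if same_pred and same_true:
            tp += 1  # correctly grouped together
        elif same_pred:
            fp += 1  # incorrect merge
        elif same_true:
            fn += 1  # incorrect split
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = pairwise_scores([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

For these labels there are 4 correct pairs, 3 incorrect merges, and 2 incorrect splits, giving precision 4/7, recall 4/6, and F1 8/13.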

BCubed precision

BCubed precision measures cluster purity from each sample’s point of view.

For each sample, it looks at the predicted cluster containing that sample and checks what fraction of that cluster has the same true label as the sample. These per-sample precision values are averaged across all samples.

High BCubed precision means samples tend to be placed in clusters containing mostly members of their own true class.

BCubed recall

BCubed recall measures class recovery from each sample’s point of view.

For each sample, it looks at the sample’s true class and checks what fraction of that class appears in the same predicted cluster as the sample. These per-sample recall values are averaged across all samples.

High BCubed recall means samples from the same true class tend to stay together.

BCubed F1

BCubed F1 is the harmonic mean of BCubed precision and BCubed recall.

It balances sample-level cluster purity and sample-level class recovery.

BCubed F1 is useful when clusters have different sizes or when you want a metric that evaluates clustering quality from the perspective of individual samples.
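The per-sample averaging behind the BCubed scores can be sketched from label counts. In this illustration each sample counts itself as a member of its own cluster and class, which is the standard BCubed convention; the function name is hypothetical and non-empty inputs are assumed.

```python
from collections import Counter


def bcubed_scores(y_true, y_pred):
    """Return (precision, recall, f1) averaged over individual samples."""
    n = len(y_true)
    cell = Counter(zip(y_true, y_pred))  # samples sharing class AND cluster
    class_size = Counter(y_true)
    cluster_size = Counter(y_pred)
    pairs = list(zip(y_true, y_pred))
    # Per sample: share of its cluster that has the same true label.
    precision = sum(cell[(t, p)] / cluster_size[p] for t, p in pairs) / n
    # Per sample: share of its class that sits in the same cluster.
    recall = sum(cell[(t, p)] / class_size[t] for t, p in pairs) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = bcubed_scores([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

For these labels the averages work out to precision 0.75, recall 7/9, and F1 42/55.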

Requirements

numpy
scipy
scikit-learn

License

This project is licensed under the MIT License.

Release history

0.1.2 (this version): May 23, 2026
0.1.1: May 23, 2026
0.1.0: May 22, 2026

