OverlapIndex (OI), an incremental cluster validity index (iCVI) for identifying the degree of overlap of data classes.
OverlapIndex (OI)
This package provides an implementation of the Overlap Index (OI), a cluster-validity measure designed to quantify the degree of overlap between data classes or clusters. The OI can be updated online with ARTMAP-based backends, or computed in batch with offline clustering backends, making it useful for streaming, continual learning, large-scale representation analysis, and embedding-space diagnostics.
The implementation supports multiple swappable clustering backends:
- Fuzzy ARTMAP and Hypersphere ARTMAP for incremental / online updates.
- KMeans and MiniBatchKMeans for offline centroid-based analysis.
- BallCover for offline greedy landmark-ball covers, useful when the goal is to preserve class-support geometry for downstream shape or topology analysis.
Installation
To install OverlapIndex, simply use pip:
pip install overlapindex
That installs the default batch-oriented dependencies. To enable the incremental ART backends as well, install the optional ART extra:
pip install "overlapindex[art]"
The core package and optional art extra support Python 3.9 through 3.14.
Or to install directly from the most recent source:
pip install git+https://github.com/NiklasMelton/OverlapIndex.git@develop
Overview
The Overlap Index is bounded in the interval [0, 1] and has the following interpretation:
- OI = 1.0
  Indicates perfect class separation (no overlap).
- OI = 0.5
  Indicates complete overlap between classes.
- OI < 0.5
  Indicates a degenerate or pathological case in the data distribution.
The index is computed incrementally by tracking shared cluster activations between pairs of classes and aggregating class-wise overlap into a global measure.
Key Properties
- Incremental and Offline Modes
  ARTMAP backends support streaming updates via add_sample and mini-batch updates via add_batch. Offline backends such as KMeans, MiniBatchKMeans, and BallCover support batch computation through add_batch.
- Label-Aware
  Can be applied both to labeled raw data and to intermediate representations (e.g., neural network activations).
- Geometry-Agnostic
  Works well on arbitrary geometric structures of data. No geometric constraints are assumed.
Typical Use Cases
The Overlap Index can be used in several settings:
- Unsupervised clustering evaluation
  As an iCVI, OI provides insight into the quality of a clustering partition as it evolves over time.
- Class separability analysis
  Measures the degree of overlap in labeled datasets without requiring a classifier.
- Representation monitoring in deep learning
  Tracks how class separation changes across layers or training epochs.
- Backbone evaluation for transfer learning
  Compares feature extractors, where higher OI values indicate better class separation in the backbone embeddings.
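The backbone-evaluation use case can be sketched as below. This is a minimal sketch, assuming the embeddings have already been extracted for the same labeled samples. `oi_score` and `rank_backbones` are hypothetical helper names, not part of the package; only `OverlapIndex`, `fit`, and `index` are taken from the usage shown in this README.

```python
def oi_score(X, y):
    """Score one embedding matrix with the default OverlapIndex backend."""
    # Imported lazily so the ranking helper below has no hard dependencies.
    from sklearn.preprocessing import MinMaxScaler
    from overlapindex import OverlapIndex

    X = MinMaxScaler().fit_transform(X)  # normalize features before fitting
    return OverlapIndex().fit(X, y).index  # fit returns self


def rank_backbones(embeddings, y, score_fn=oi_score):
    """Return (name, score) pairs sorted from best to worst separation.

    `embeddings` maps a backbone name to its feature matrix for the same
    labeled samples `y`. Higher OI means better class separation.
    """
    scores = {name: score_fn(X, y) for name, X in embeddings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The scoring function is passed in as an argument so the same ranking logic can be reused with a different backend configuration.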
Implementation Notes
- ART-based clustering is performed using artlib's FuzzyARTMAP or HypersphereARTMAP.
- artlib is an optional dependency and is only required when using the "Fuzzy" or "Hypersphere" backends.
- Offline centroid backends fit one clustering model per class and concatenate the resulting class-owned prototypes into global cluster ids.
- The BallCover backend fits one greedy ball cover per class and treats ball centers as class-owned prototypes.
- Normalize input features before fitting. Examples in this repository use MinMaxScaler for convenience.
- ART backends complement-code inputs internally and therefore require features in the [0, 1] interval.
- Offline backends (KMeans, MiniBatchKMeans, and BallCover) consume normalized features directly and do not apply complement coding.
- Overlap is estimated by monitoring shared best-matching units (BMUs) or top prototype activations between class pairs.
- The global OI is computed as the mean of per-class minimum pairwise overlap scores.
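The aggregation in the last note can be illustrated with plain dictionaries. This is a sketch of the reduction only, not the package's internal code: `aggregate_overlap` is a hypothetical helper, the pairwise values are made up, and the dictionary is assumed to hold a score for every ordered class pair.

```python
from statistics import mean


def aggregate_overlap(pairwise):
    """Reduce pairwise scores to per-class minima and their mean.

    `pairwise` maps an ordered class pair (y, b) to a score; this sketch
    assumes every class appears as the first element of at least one pair.
    """
    classes = sorted({y for y, _ in pairwise})
    singleton = {
        y: min(score for (a, _), score in pairwise.items() if a == y)
        for y in classes
    }
    return singleton, mean(singleton.values())


# Hypothetical pairwise scores for three classes (illustrative values only).
pairwise = {
    (0, 1): 1.0, (0, 2): 0.9,
    (1, 0): 1.0, (1, 2): 0.7,
    (2, 0): 0.9, (2, 1): 0.7,
}
singleton, global_oi = aggregate_overlap(pairwise)
print(singleton)
print(round(global_oi, 4))
```

Because each class contributes its worst (minimum) pairwise score, a single strongly overlapping pair lowers the score of both classes involved.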
Basic Usage
from sklearn.preprocessing import MinMaxScaler
from overlapindex import OverlapIndex
# Normalize features before fitting.
X = MinMaxScaler().fit_transform(X)
# MiniBatchKMeans is the default backend and is recommended for most offline use cases.
oi = OverlapIndex(
    kmeans_k=10,
    kmeans_kwargs={"random_state": 0},
)
# sklearn-style API
oi.fit(X, y)
score = oi.index
The fitted value is available through oi.index. For users who prefer update methods that return the current score directly, add_batch(X, y) is also supported.
Online ARTMAP Usage
from overlapindex import OverlapIndex
# For ARTMAP backends, batches should already be scaled into [0, 1].
oi = OverlapIndex(
    model_type="Hypersphere",
    rho=0.9,
    match_tracking="MT+",
)
# `stream` is any iterable of (X_batch, y_batch) pairs.
for X_batch, y_batch in stream:
    oi.partial_fit(X_batch, y_batch)
score = oi.index
For single-sample streams, ARTMAP backends also support add_sample(x, y), which updates the model and returns the current score directly. Labeled mini-batches can also be passed to add_batch(X, y).
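The [0, 1] requirement for ARTMAP inputs follows from complement coding. Below is a minimal sketch of the standard Fuzzy ART complement-coding step, assuming the backends' internal preprocessing resembles it (this README does not show that code). `complement_code` is a hypothetical helper, not part of the package.

```python
def complement_code(x):
    """Return [x, 1 - x] for a feature vector with entries in [0, 1]."""
    if any(v < 0.0 or v > 1.0 for v in x):
        raise ValueError("complement coding expects features in [0, 1]")
    # Append the complement of every feature to the original vector.
    return list(x) + [1.0 - v for v in x]


coded = complement_code([0.25, 1.0])
print(coded)
```

The L1 norm of each coded vector equals the number of original features, which holds only when every entry lies in [0, 1]. That is why out-of-range features should be rescaled before they reach an ARTMAP backend.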
API Styles
OverlapIndex supports both sklearn-style methods and direct score-returning update methods:
| Method | Returns | Typical use |
|---|---|---|
| fit(X, y) | self | Full offline fitting on a labeled dataset. |
| partial_fit(X, y) | self | Incremental batch updates for ARTMAP backends; offline backends refit on the provided batch. |
| score() / score(X, y) | float | Read the current index, or refit on labeled data and return the new score. |
| predict(X) | np.ndarray | Return the highest-scoring global prototype id for each sample. |
| fit_predict(X, y) | np.ndarray | Fit and return per-sample prototype ids. |
| add_batch(X, y) | float | Batch update when the current OI score is needed immediately. |
| add_sample(x, y) | float | Single-sample online update for ARTMAP backends. |
After fit or partial_fit, read the current score from oi.index or call score().
For model_type="KMeans", model_type="MiniBatchKMeans", and
model_type="BallCover", partial_fit(X, y) is a convenience wrapper around
recomputing the index on the provided labeled batch. Only the ARTMAP backends
perform true incremental updates across calls.
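Since offline backends recompute on whatever batch they receive, a score over all data seen so far requires the caller to keep that data. One possible approach is sketched below; `BatchBuffer` is a hypothetical helper, not part of the package, and memory use grows with the stream.

```python
class BatchBuffer:
    """Accumulate labeled batches so an offline backend can refit on all of them."""

    def __init__(self):
        self.X = []  # feature rows seen so far
        self.y = []  # matching class labels

    def extend(self, X_batch, y_batch):
        """Append one labeled batch and return everything accumulated."""
        X_batch, y_batch = list(X_batch), list(y_batch)
        if len(X_batch) != len(y_batch):
            raise ValueError("X_batch and y_batch must have the same length")
        self.X.extend(X_batch)
        self.y.extend(y_batch)
        return self.X, self.y


buffer = BatchBuffer()
buffer.extend([[0.1, 0.2], [0.9, 0.8]], [0, 1])
X_all, y_all = buffer.extend([[0.2, 0.1]], [0])
# With the package installed, refit on everything seen so far, e.g.:
# oi.fit(X_all, y_all); score = oi.index
```

For long streams, the ARTMAP backends avoid this buffering because they update incrementally.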
If a batch is empty or contains only one unique class, OverlapIndex emits a
RuntimeWarning and leaves the score at its default value of 1.0.
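A default score of 1.0 is easy to mistake for perfect separation, so it can help to check batches up front or to turn the warning into an error. The sketch below uses only the standard library; `has_multiple_classes` and `strict_call` are hypothetical helpers, not part of the package.

```python
import warnings


def has_multiple_classes(y):
    """Return True when the labels contain at least two distinct classes."""
    return len(set(y)) >= 2


def strict_call(update, *args):
    """Call `update(*args)` and raise if it emits a RuntimeWarning."""
    with warnings.catch_warnings():
        # Promote RuntimeWarning to an exception inside this block only.
        warnings.simplefilter("error", RuntimeWarning)
        return update(*args)
```

For example, `strict_call(oi.fit, X, y)` would raise instead of silently leaving the score at 1.0, assuming the warning is emitted as described above.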
Clustering Backends
OverlapIndex uses model_type="MiniBatchKMeans" by default and supports several backend families through the model_type parameter:
| model_type | Update mode | Description |
|---|---|---|
| "Fuzzy" | Online / batch | Incremental Fuzzy ARTMAP backend. Requires the optional art extra. |
| "Hypersphere" | Online / batch | Incremental Hypersphere ARTMAP backend. Requires the optional art extra. |
| "KMeans" | Offline batch only | Fits one scikit-learn KMeans model per class. |
| "MiniBatchKMeans" | Offline batch only | Default backend. Fits one scikit-learn MiniBatchKMeans model per class; recommended for larger datasets. |
| "BallCover" | Offline batch only | Fits one greedy landmark-ball cover per class. Useful when preserving class-support geometry is important. |
Offline backends should be used with fit or add_batch. They do not support add_sample because their prototypes are fit from a complete labeled batch.
KMeans backend
from overlapindex import OverlapIndex
OI = OverlapIndex(
    model_type="KMeans",
    kmeans_k=10,
    kmeans_kwargs={"random_state": 0},
)
OI.fit(X, y)
score = OI.index
MiniBatchKMeans backend
from overlapindex import OverlapIndex
OI = OverlapIndex(
    model_type="MiniBatchKMeans",
    kmeans_k=10,
    kmeans_kwargs={
        "random_state": 0,
        "batch_size": 8192,
        "n_init": 1,
    },
)
OI.fit(X, y)
score = OI.index
BallCover backend
from overlapindex import OverlapIndex
OI = OverlapIndex(
    model_type="BallCover",
    ballcover_k="auto",
    ballcover_radius=0.25,
    ballcover_kwargs={
        "metric": "auto",
        "cover_fraction": 1.0,
    },
)
OI.fit(X, y)
score = OI.index
The BallCover backend supports one automatic cover parameter at a time:
- ballcover_k="auto" with a fixed ballcover_radius greedily adds balls until the requested cover fraction is reached.
- ballcover_k=<int> with ballcover_radius="auto" selects a fixed number of landmarks and infers the radius needed to cover the requested fraction of samples.
metric="auto" uses Euclidean distance in lower-dimensional spaces and cosine geometry for high-dimensional inputs such as embedding vectors. Users can override this with metric="euclidean" or metric="cosine".
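To make the fixed-radius mode concrete, here is a small greedy cover in pure Python. It is an illustrative sketch, not the package's BallCover implementation: the selection rule (pick the point covering the most uncovered points), the tie-breaking, and the Euclidean metric are assumptions, and `greedy_ball_cover` is a hypothetical name.

```python
import math


def greedy_ball_cover(points, radius, cover_fraction=1.0):
    """Pick ball centers from `points` until enough points are covered.

    Each step chooses the point whose radius-ball contains the most
    still-uncovered points (a standard greedy set-cover heuristic).
    Returns the indices of the chosen centers.
    """
    n = len(points)
    target = min(n, math.ceil(cover_fraction * n))
    # neighbors[i] = indices of points within `radius` of point i
    neighbors = [
        {j for j in range(n) if math.dist(points[i], points[j]) <= radius}
        for i in range(n)
    ]
    uncovered = set(range(n))
    centers = []
    while n - len(uncovered) < target:
        best = max(range(n), key=lambda i: len(neighbors[i] & uncovered))
        centers.append(best)
        uncovered -= neighbors[best]
    return centers


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
full_cover = greedy_ball_cover(points, radius=0.25)
partial_cover = greedy_ball_cover(points, radius=0.25, cover_fraction=0.8)
```

Lowering the cover fraction lets the cover skip isolated points, so fewer balls are needed. The neighbor sets make this sketch quadratic in the number of points, so it is only suitable for small inputs.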
Iris Dataset Example
from sklearn.datasets import load_iris
import numpy as np
from overlapindex import OverlapIndex
# Load dataset
iris = load_iris()
# Feature matrix (shape: [150, 4])
X = iris.data.astype(np.float64)
# Target vector (shape: [150,])
y = iris.target.astype(np.int64)
# Normalize the data (required)
x_max = X.max(axis=0)
x_min = X.min(axis=0)
X = (X - x_min) / (x_max - x_min)
# Instantiate the OI object
OI = OverlapIndex()
# Calculate the Overlap Index
OI.fit(X, y)
print(OI.index)
# Output:
# 0.9266666666666666
Additional runnable examples are available in the examples/ directory.
Release Verification
For release testing, start from a fresh Poetry environment so the package under
test matches pyproject.toml and poetry.lock:
poetry env remove --all
poetry sync --with dev
poetry run python -c "from overlapindex import OverlapIndex; OverlapIndex(model_type='MiniBatchKMeans')"
poetry run python -m pytest -q tests/test_overlap_index_regression.py
poetry sync --with dev --extras art
poetry run python -c "from overlapindex import OverlapIndex; OverlapIndex(model_type='Hypersphere')"
poetry run python -m pytest -q tests/test_overlap_index_regression.py
poetry check
python -m build
twine check dist/*
The first install verifies that offline backends work without the optional
artlib dependency. The second install verifies the art extra and ARTMAP
backends.
Parameters
- rho (float)
  Vigilance parameter controlling cluster granularity for ARTMAP backends.
- r_hat (float, Hypersphere ARTMAP only)
  Maximum cluster radius for the Hypersphere backend.
- model_type ("Fuzzy" | "Hypersphere" | "KMeans" | "MiniBatchKMeans" | "BallCover")
  Clustering backend used to create class-owned prototypes. Defaults to "MiniBatchKMeans".
- match_tracking (str)
  Match-tracking strategy used during ARTMAP learning.
- kmeans_k (int or dict)
  Number of clusters per class for KMeans and MiniBatchKMeans backends.
- kmeans_kwargs (dict, optional)
  Keyword arguments forwarded to the selected scikit-learn KMeans backend.
- ballcover_k (int, dict, or "auto")
  Number of balls per class, class-specific ball counts, or "auto" for greedy fixed-radius covering.
- ballcover_radius (float, dict, or "auto")
  Ball radius, class-specific radii, or "auto" when using a fixed number of balls.
- ballcover_kwargs (dict, optional)
  Additional BallCover options such as metric, cover_fraction, chunk_size, max_balls, and random_state.
The default parameters are intended for offline batch use with MiniBatchKMeans. For online or continual-learning workflows, explicitly choose model_type="Fuzzy" or model_type="Hypersphere". For very large ART-based runs, smaller rho values (0.5-0.7) may improve run-time performance.
Output
- index
  Global Overlap Index across all observed classes.
- singleton_index[y]
  Minimum pairwise overlap score for class y.
- pairwise_index[(y, b)]
  Pairwise overlap score between classes y and b.
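These outputs can be used to find where overlap is concentrated. The sketch below works on plain dictionaries with made-up values. It assumes that `pairwise_index` and `singleton_index` behave like dicts and that lower scores mean more overlap, as with the global index. Both helper names are hypothetical.

```python
def most_overlapping_pair(pairwise_index):
    """Return the (pair, score) entry with the lowest pairwise score."""
    return min(pairwise_index.items(), key=lambda item: item[1])


def classes_below(singleton_index, threshold):
    """List classes whose minimum pairwise score falls below `threshold`."""
    return sorted(y for y, score in singleton_index.items() if score < threshold)


# Hypothetical scores for three classes (illustrative values only).
pairwise_index = {(0, 1): 0.95, (0, 2): 0.6, (1, 2): 0.8}
singleton_index = {0: 0.6, 1: 0.8, 2: 0.6}

pair, score = most_overlapping_pair(pairwise_index)
flagged = classes_below(singleton_index, threshold=0.7)
```

With a fitted object, the same helpers could be called on `oi.pairwise_index` and `oi.singleton_index`, provided those attributes expose `.items()`.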
Intended Audience
This package is intended for researchers and practitioners working on:
- incremental and continual learning,
- clustering validation,
- representation learning,
- transfer learning.