OverlapIndex (OI): an incremental cluster validity index for identifying the degree of overlap between data classes.
OverlapIndex (OI)
This package provides an implementation of the Overlap Index (OI), an incremental cluster validity index (iCVI) designed to quantify the degree of overlap between data classes or clusters. The OI is updated online, sample by sample or in batches, and is particularly suited for streaming, continual learning, and representation analysis.
The implementation is built on ARTMAP-based clustering (Fuzzy ART or Hypersphere ART), leveraging the dynamic clustering properties of Adaptive Resonance Theory to track class overlap as new data (and classes) arrive.
Installation
To install OverlapIndex, simply use pip:
pip install overlapindex
Or to install directly from the most recent source:
pip install git+https://github.com/NiklasMelton/OverlapIndex.git@develop
Overview
The Overlap Index is bounded in the interval [0, 1] and has the following interpretation:
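- OI = 1.0: Indicates perfect class separation (no overlap).
- OI = 0.5: Indicates complete overlap between classes.
- OI < 0.5: Indicates a degenerate or pathological case in the data distribution.
Values between 0.5 and 1.0 indicate partial overlap, with higher values meaning better separation.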
The index is computed incrementally by tracking shared cluster activations between pairs of classes and aggregating class-wise overlap into a global measure.
Key Properties
- Incremental / Online: Supports streaming updates via add_sample and mini-batch updates via add_batch. New classes can be introduced at any time, enabling analysis of incremental learning scenarios.
- Label-Aware: Can be applied both to labeled raw data and to intermediate representations (e.g., neural network activations).
- Geometry-Agnostic: Makes no assumptions about the geometric structure of the data, so classes of arbitrary shape can be analyzed.
Typical Use Cases
The Overlap Index can be used in several settings:
- Unsupervised clustering evaluation: As an iCVI, OI provides insight into the quality of a clustering partition as it evolves over time.
- Class separability analysis: Measures the degree of overlap in labeled datasets without requiring a classifier.
- Representation monitoring in deep learning: Tracks how class separation changes across layers or training epochs.
- Backbone evaluation for transfer learning: Compares feature extractors, where higher OI values indicate better class separation in the backbone embeddings.
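For the representation-monitoring and backbone-comparison use cases, each representation presumably needs its own OverlapIndex instance, since the index accumulates state across updates and mixing two representations in one instance would blend their scores. The helper below is a hypothetical sketch: rank_representations and score_fn are illustrative names, not part of the package, and score_fn is expected to wrap something like a fresh OverlapIndex().add_batch(X, y) call on features already normalized to [0, 1]. It flattens per-sample activations of any shape into vectors and sorts representations from most to least separated.

```python
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np


def rank_representations(
    features: Dict[str, np.ndarray],
    labels: Sequence[int],
    score_fn: Callable[[np.ndarray, Sequence[int]], float],
) -> List[Tuple[str, float]]:
    """Score each representation separately and sort by separation, highest first.

    Hypothetical helper: score_fn is assumed to build a fresh scorer per call.
    """
    scores = {}
    for name, acts in features.items():
        # Flatten (n_samples, ...) activations into (n_samples, n_features)
        X = np.asarray(acts, dtype=np.float64).reshape(len(acts), -1)
        scores[name] = score_fn(X, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical wiring with the package (features assumed normalized to [0, 1]):
# from overlapindex import OverlapIndex
# ranking = rank_representations(
#     layer_features, y, lambda X, y: OverlapIndex().add_batch(X, y)
# )
```

Passing the scorer in as a function keeps the ranking logic independent of the ART settings, so the same helper can compare layers, training epochs, or backbones.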
Implementation Notes
- ART-based clustering is performed using artlib's FuzzyARTMAP or HypersphereARTMAP.
- Inputs are complement coded, following standard ART practice.
- Overlap is estimated by monitoring shared best-matching units (BMUs) between class pairs.
- The global OI is computed as the mean of per-class minimum pairwise overlap scores.
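The exact pairwise overlap formula is not spelled out in this description. As a hypothetical illustration only (not the overlapindex implementation), the sketch below counts, for each cluster, how many samples of each class selected it as their BMU, and scores a class pair by the purity of those clusters restricted to the two classes. This stand-in gives 1.0 when the two classes never share a cluster and 0.5 when every shared cluster is split evenly, matching the endpoints of the OI scale, but unlike the real index it can never drop below 0.5. The class SharedClusterCounter and its methods are invented for this example.

```python
from collections import defaultdict


class SharedClusterCounter:
    """Hypothetical sketch, NOT the overlapindex internals.

    Tracks, per cluster (BMU), how many samples of each class it won, and
    turns shared BMUs into a pairwise separation score.
    """

    def __init__(self):
        # counts[cluster_id][label] -> samples of `label` whose BMU was `cluster_id`
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, cluster_id, label):
        # Labels are created on first sight, so new classes can arrive at any time
        self.counts[cluster_id][label] += 1

    def pairwise_score(self, a, b):
        # Fraction of a/b samples that belong to the majority class of their cluster:
        # 1.0 if a and b never share a cluster, 0.5 if every cluster is split evenly
        majority = total = 0
        for per_label in self.counts.values():
            na, nb = per_label.get(a, 0), per_label.get(b, 0)
            majority += max(na, nb)
            total += na + nb
        # Assumption: a pair with no observations yet is treated as fully separated
        return majority / total if total else 1.0


# Usage: class "c" appears mid-stream and shares cluster 0 with class "a"
counter = SharedClusterCounter()
for cluster_id, label in [(0, "a"), (0, "a"), (1, "b"), (2, "c"), (0, "c")]:
    counter.update(cluster_id, label)
print(counter.pairwise_score("a", "b"))  # 1.0
print(counter.pairwise_score("a", "c"))  # 0.75
```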
Basic Usage
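```python
import numpy as np
from overlapindex import OverlapIndex

# Placeholder data for illustration: features scaled to [0, 1], integer class labels
rng = np.random.default_rng(0)
X = rng.random((100, 4))
Y = rng.integers(0, 3, size=100)

oi = OverlapIndex(rho=0.9, ART="Hypersphere", match_tracking="MT+")

# Incremental update: one (sample, label) pair at a time
for x, y in zip(X, Y):
    score = oi.add_sample(x, y)

# Or a single batch update (on a fresh instance so the data is not counted twice)
oi_batch = OverlapIndex(rho=0.9, ART="Hypersphere", match_tracking="MT+")
score = oi_batch.add_batch(X, Y)
```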
The returned value is the current Overlap Index after the update.
Iris Dataset Example
```python
from sklearn.datasets import load_iris
import numpy as np
from overlapindex import OverlapIndex

# Load dataset
iris = load_iris()
# Feature matrix (shape: [150, 4])
X = iris.data.astype(np.float64)
# Target vector (shape: [150,])
y = iris.target.astype(np.int64)

# Normalize the data (required)
x_max = X.max(axis=0)
x_min = X.min(axis=0)
X = (X - x_min) / (x_max - x_min)

# Instantiate the OI object
OI = OverlapIndex()
# Calculate the Overlap Index
oi = OI.add_batch(X, y)
print(oi)
# Output:
# 0.9266666666666666
```
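Normalization matters because ART complement coding (listed under Implementation Notes) assumes every feature lies in [0, 1]. The sketch below is a hedged illustration, not the package's own preprocessing: it repeats the Iris min-max scaling with a guard for constant features (where x_max - x_min is zero and the plain division would produce NaN), and shows standard complement coding, which the package presumably applies internally, so users should not need to call it themselves. The function names are hypothetical.

```python
import numpy as np


def minmax_normalize(X):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=np.float64)
    x_min = X.min(axis=0)
    span = X.max(axis=0) - x_min
    # Guard: a feature that never varies would otherwise divide by zero
    span[span == 0] = 1.0
    return (X - x_min) / span


def complement_code(X):
    """Standard ART complement coding: each row x becomes [x, 1 - x]."""
    return np.hstack([X, 1.0 - X])


X = np.array([[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]])  # second feature is constant
Xn = minmax_normalize(X)
Xc = complement_code(Xn)
# Every complement-coded row sums to the original dimension (here 2)
print(Xc.sum(axis=1))  # [2. 2. 2.]
```

The constant row sum is why complement coding is used in ART: every input has the same L1 norm, which keeps category matching from favoring inputs with small feature values.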
Parameters
- rho (float): Vigilance parameter controlling cluster granularity; higher values yield more, finer-grained clusters.
- r_hat (float, Hypersphere ART only): Maximum cluster radius.
- ART ("Fuzzy" | "Hypersphere"): Choice of ART module.
- match_tracking (str): Match-tracking strategy used during ARTMAP learning (e.g., "MT+").
The default parameters are likely to satisfy most use cases. For very large datasets,
it may be necessary to use smaller rho values (0.5-0.7) to improve run-time
performance.
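To see why rho controls granularity and run time, the sketch below implements the standard Fuzzy ART match (vigilance) test on complement-coded inputs: a category with weight w may learn input I only if |min(I, w)|_1 / |I|_1 >= rho, where min is taken element-wise. A higher rho rejects more candidate categories, so more, smaller clusters are created, and the more categories exist, the more work each new sample requires. This is the textbook criterion, not code taken from artlib; Hypersphere ART uses a radius-based criterion tied to r_hat instead. The function names are hypothetical.

```python
import numpy as np


def fuzzy_art_match(I, w):
    """Fuzzy ART match value |min(I, w)|_1 / |I|_1 for a complement-coded input I."""
    I = np.asarray(I, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    return np.minimum(I, w).sum() / I.sum()


def passes_vigilance(I, w, rho):
    """A category may learn the input only if its match value reaches rho."""
    return fuzzy_art_match(I, w) >= rho


x = np.array([0.2, 0.8])
I = np.concatenate([x, 1.0 - x])  # complement-coded input
# Category covering the box [0, 0.5] x [0.5, 1], stored as [lower corner, 1 - upper corner]
w = np.array([0.0, 0.5, 0.5, 0.0])
print(fuzzy_art_match(I, w))
print(passes_vigilance(I, w, rho=0.4), passes_vigilance(I, w, rho=0.9))  # True False
```

The same input inside the same box is accepted at a low vigilance and rejected at a high one; at rho=0.9 ART would instead search further or create a new, smaller category.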
Output
- index: Global Overlap Index across all observed classes.
- singleton_index[y]: Minimum pairwise overlap score for class y.
- pairwise_index[(y, b)]: Pairwise overlap score between classes y and b.
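Following the Implementation Notes (the global OI is the mean of per-class minimum pairwise overlap scores), the three outputs fit together as sketched below in plain Python. The function aggregate_overlap and the toy scores are hypothetical, and the dictionary layout is an assumption based on the subscript notation above; the package may store these values differently.

```python
def aggregate_overlap(pairwise_index):
    """Derive per-class and global scores from pairwise scores (hypothetical sketch).

    pairwise_index: dict mapping (class_a, class_b) -> pairwise score.
    Pairs may be stored in one or both orders; the result is the same.
    """
    if not pairwise_index:
        raise ValueError("at least one class pair is required")
    classes = {c for pair in pairwise_index for c in pair}
    # singleton_index[c]: the worst (minimum) pairwise score involving class c
    singleton_index = {
        c: min(s for (a, b), s in pairwise_index.items() if c in (a, b))
        for c in classes
    }
    # index: mean of the per-class minima
    index = sum(singleton_index.values()) / len(singleton_index)
    return index, singleton_index


# Toy pairwise scores for three classes (illustrative values, not real results)
pairwise_index = {(0, 1): 1.0, (0, 2): 0.9, (1, 2): 0.7}
index, singleton_index = aggregate_overlap(pairwise_index)
print(singleton_index)
print(index)
```

Taking the minimum per class means a single badly overlapping neighbor is enough to pull a class's score down, even if it is well separated from every other class.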
Intended Audience
This package is intended for researchers and practitioners working on:
- incremental and continual learning,
- clustering validation,
- representation learning, and
- transfer learning.
Download files
File details
Details for the file overlapindex-0.1.1.tar.gz.
File metadata
- Download URL: overlapindex-0.1.1.tar.gz
- Upload date:
- Size: 13.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de587bcb0db2da455c9b5e4d08cc927338f75b7363442a854edd6f6984c895a4 |
| MD5 | e189f4d65b2e2720208c146eaf61dced |
| BLAKE2b-256 | 4d81dc9c0539d9975bde4f07d214e2098095b5dbbfd4ad33707269216d735c7e |
Provenance
The following attestation bundles were made for overlapindex-0.1.1.tar.gz:
Publisher: pypi_publish.yml on NiklasMelton/OverlapIndex
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: overlapindex-0.1.1.tar.gz
- Subject digest: de587bcb0db2da455c9b5e4d08cc927338f75b7363442a854edd6f6984c895a4
- Sigstore transparency entry: 1704343389
- Sigstore integration time:
- Permalink: NiklasMelton/OverlapIndex@44cb70d4f4b9bec66fad97e51f41f8a1c4d039af
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/NiklasMelton
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@44cb70d4f4b9bec66fad97e51f41f8a1c4d039af
- Trigger Event: release
File details
Details for the file overlapindex-0.1.1-py3-none-any.whl.
File metadata
- Download URL: overlapindex-0.1.1-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41d274ab244107ce1796da973884540d9b9b6405538b9f50376486d11f145c9c |
| MD5 | ade33ec94dd44acc1144b17d70295d09 |
| BLAKE2b-256 | e224129bca7faf4f8d58d70896c061fb951d9eea11222740d0a01b490773636a |
Provenance
The following attestation bundles were made for overlapindex-0.1.1-py3-none-any.whl:
Publisher: pypi_publish.yml on NiklasMelton/OverlapIndex
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: overlapindex-0.1.1-py3-none-any.whl
- Subject digest: 41d274ab244107ce1796da973884540d9b9b6405538b9f50376486d11f145c9c
- Sigstore transparency entry: 1704343439
- Sigstore integration time:
- Permalink: NiklasMelton/OverlapIndex@44cb70d4f4b9bec66fad97e51f41f8a1c4d039af
- Branch / Tag: refs/tags/0.1.1
- Owner: https://github.com/NiklasMelton
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_publish.yml@44cb70d4f4b9bec66fad97e51f41f8a1c4d039af
- Trigger Event: release