Generic Anti-Clustering Algorithm for Creating Maximally Diverse Clusters

Anti-clustering

A generic Python library for solving the anti-clustering problem. Whereas clustering algorithms aim for high similarity within a cluster and low similarity between clusters, anti-clustering algorithms aim for the opposite: minimising similarity within a cluster and maximising similarity between clusters. The following algorithms are currently implemented in this library:

An exact approach using a BIP formulation.
An enumerated exchange heuristic.
A simulated annealing heuristic.

Keep in mind that anti-clustering is a computationally difficult problem and may run slowly even for small instances. The current exact (ILP) approach does not finish in a reasonable time when anti-clustering the Iris dataset (150 data points).
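When the exact approach is too slow, an exchange heuristic is a common fallback. The sketch below illustrates the general idea under stated assumptions and is not this library's implementation: it starts from a random balanced assignment and, for each point in turn, performs the swap with a point from another group that most increases the summed within-group Euclidean distance. The names `objective` and `exchange_heuristic` are hypothetical, and only numerical features are handled.

```python
import math
import random
from itertools import combinations


def objective(points, labels):
    """Sum of Euclidean distances between all pairs of points sharing a label."""
    return sum(
        math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] == labels[j]
    )


def exchange_heuristic(points, num_groups, seed=0):
    """Single-pass exchange heuristic (illustrative only).

    Group sizes stay fixed because points are only ever swapped
    between groups, never moved.
    """
    rng = random.Random(seed)
    # Balanced starting assignment, shuffled to randomise it.
    labels = [i % num_groups for i in range(len(points))]
    rng.shuffle(labels)
    best = objective(points, labels)

    for i in range(len(points)):
        best_j = None
        for j in range(len(points)):
            if labels[i] == labels[j]:
                continue  # swapping within a group changes nothing
            # Try the swap, score it, then undo it.
            labels[i], labels[j] = labels[j], labels[i]
            score = objective(points, labels)
            labels[i], labels[j] = labels[j], labels[i]
            if score > best:
                best, best_j = score, j
        # Keep only the best improving swap for point i, if any.
        if best_j is not None:
            labels[i], labels[best_j] = labels[best_j], labels[i]

    return labels, best


# Two tight pairs of points; a good anti-clustering splits each pair.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, score = exchange_heuristic(points, num_groups=2)
print(labels, round(score, 2))
```

Recomputing the full objective for every candidate swap keeps the sketch short but is wasteful; a practical implementation would update only the distances affected by the swap.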

The first two approaches are implemented as described in the following paper:
Papenberg, M., & Klau, G. W. (2021). Using anticlustering to partition data sets into equivalent parts. Psychological Methods, 26(2), 161–174. DOI. Preprint
The paper is accompanied by a library for the R programming language: anticlust.

Unlike the anticlust R package, this library currently has only one objective function. The objective maximises intra-cluster distance, using Euclidean distance for numerical columns and Hamming distance for categorical columns.
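As a rough illustration of this objective, the sketch below computes the summed pairwise distance within each cluster for a small hand-made data set. It uses plain Python dictionaries rather than the library's pandas interface, and the function names `mixed_distance` and `intra_cluster_distance` are hypothetical. How the library scales or combines the two distance components is not stated here; the sketch simply adds them, which is an assumption.

```python
import math
from itertools import combinations


def mixed_distance(a, b, numerical, categorical):
    """Euclidean distance over numerical keys plus Hamming distance over
    categorical keys. Adding the two unweighted is an assumption."""
    euclidean = math.sqrt(sum((a[k] - b[k]) ** 2 for k in numerical))
    hamming = sum(1 for k in categorical if a[k] != b[k])
    return euclidean + hamming


def intra_cluster_distance(rows, labels, numerical, categorical):
    """Sum of pairwise distances between rows that share a cluster label."""
    total = 0.0
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if labels[i] == labels[j]:
            total += mixed_distance(a, b, numerical, categorical)
    return total


rows = [
    {"x": 0.0, "y": 0.0, "colour": "red"},
    {"x": 3.0, "y": 4.0, "colour": "blue"},
    {"x": 0.0, "y": 1.0, "colour": "red"},
    {"x": 3.0, "y": 3.0, "colour": "blue"},
]
numerical, categorical = ["x", "y"], ["colour"]

# Grouping similar rows together (what clustering would do) ...
clustered = intra_cluster_distance(rows, [0, 1, 0, 1], numerical, categorical)
# ... versus mixing dissimilar rows in each group (anti-clustering).
anti_clustered = intra_cluster_distance(rows, [0, 0, 1, 1], numerical, categorical)

print(clustered, anti_clustered)
```

The anti-clustered assignment yields the larger value, which is what the objective rewards.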

Use cases

Within software testing, anti-clustering can be used to generate test and control groups for A/B testing. Example: you have a webshop with a number of users. The webshop is under active development and a new feature is coming up. This feature should be tested against as many different kinds of users as possible without testing against the entire user base. To do that, you can create a maximally diverse subset of the user base to test against (the A group). The remaining users (the B group) will not test the feature. The anti-clustering algorithms can be used to divide the user base in this way. Groups A and B should be as similar to each other as possible to give a reliable basis for comparison, but within group A (and within group B) the elements should be as dissimilar as possible.

This is just one use case; many more probably exist.

Installation

The anti-clustering package is available on PyPI. To install it, run the following command:

pip install anti-clustering

The package currently supports Python 3.8 and above.

Usage

The input to the algorithm is a pandas DataFrame in which each row represents a data point. The output is the same DataFrame with an extra column containing integer-encoded cluster labels. Below is an example based on the Iris dataset:

from anti_clustering import ExactClusterEditingAntiClustering
from sklearn import datasets
import pandas as pd

iris_data = datasets.load_iris(as_frame=True)
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)

algorithm = ExactClusterEditingAntiClustering()

df = algorithm.run(
    df=iris_df,
    numerical_columns=list(iris_df.columns),
    categorical_columns=None,
    num_groups=2,
    destination_column='Cluster'
)

Contributions

If you have any suggestions or have found a bug, feel free to open an issue. If you have implemented a new algorithm or know how to tweak the existing ones, PRs are very much appreciated.

License

This library is licensed under the Apache 2.0 license.

Release history

0.4.1 (Sep 4, 2025)
0.4.0 (Jun 13, 2024)
0.3.0 (Jul 18, 2023)
0.2.1 (Sep 9, 2022)
0.2.0 (Jul 7, 2022)
0.1.0 (Jun 27, 2022)
