Project description

ClusterExplorer

This repository contains the code for ClusterExplorer, a novel explainability tool for black-box clustering pipelines.

Explaining the results of clustering pipelines

Our approach formulates the explanation of a cluster as the identification of concise conjunctions of predicates that maximize coverage of the cluster's data points while minimizing separation error with respect to other clusters. We achieve this by reducing the problem to generalized frequent-itemset mining (gFIM), where items correspond to explanation predicates and an itemset's frequency indicates its coverage. To improve efficiency, we exploit inherent properties of the problem and apply attribute selection to further reduce computational cost.
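To make the objective concrete, here is a minimal, stdlib-only sketch of how a conjunction of predicates can be scored and how candidate conjunctions can be enumerated level by level, Apriori-style. It is not ClusterExplorer's implementation: it uses ordinary (not generalized) frequent-itemset mining, and the names `holds`, `coverage`, `separation_error`, `mine_rules`, the `(attribute, low, high)` predicate encoding and the thresholds are all hypothetical. Coverage is assumed to be the fraction of the cluster's points that satisfy every predicate, and separation error the fraction of matching points that belong to other clusters.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]
# Hypothetical predicate encoding: (attribute, low, high) means low <= row[attribute] < high.
Predicate = Tuple[str, float, float]
Rule = Tuple[Predicate, ...]


def holds(row: Row, rule: Sequence[Predicate]) -> bool:
    """True if the row satisfies every predicate of the conjunction."""
    return all(low <= row[attr] < high for attr, low, high in rule)


def coverage(rows: List[Row], labels: List[int], cluster: int, rule: Sequence[Predicate]) -> float:
    """Fraction of the cluster's points that satisfy the rule."""
    members = [r for r, l in zip(rows, labels) if l == cluster]
    return sum(holds(r, rule) for r in members) / len(members)


def separation_error(rows: List[Row], labels: List[int], cluster: int, rule: Sequence[Predicate]) -> float:
    """Fraction of the points satisfying the rule that lie in other clusters (0.0 if none match)."""
    matched = [l for r, l in zip(rows, labels) if holds(r, rule)]
    return sum(l != cluster for l in matched) / len(matched) if matched else 0.0


def mine_rules(rows, labels, cluster, predicates, min_coverage=0.6, max_len=3):
    """Level-wise (Apriori-style) search over conjunctions of predicates.

    Coverage never increases when a predicate is added, so only conjunctions that
    already reach min_coverage are extended. Returns (rule, coverage, separation_error)
    tuples sorted by separation error, then descending coverage, then rule length.
    """
    def cov(rule):
        return coverage(rows, labels, cluster, rule)

    results = []
    level: List[Rule] = [(p,) for p in predicates if cov((p,)) >= min_coverage]
    while level:
        for rule in level:
            results.append((rule, cov(rule), separation_error(rows, labels, cluster, rule)))
        if len(level[0]) >= max_len:
            break
        candidates = set()
        for rule in level:
            used_attrs = {attr for attr, _, _ in rule}
            for p in predicates:
                # For simplicity, allow at most one predicate per attribute.
                if p[0] not in used_attrs:
                    candidates.add(tuple(sorted(rule + (p,))))
        level = sorted(c for c in candidates if cov(c) >= min_coverage)
    results.sort(key=lambda t: (t[2], -t[1], len(t[0])))
    return results


# Toy usage: two clusters described by two attributes.
rows = [
    {"alcohol": 13.5, "hue": 1.0},
    {"alcohol": 13.8, "hue": 1.1},
    {"alcohol": 12.2, "hue": 0.7},
    {"alcohol": 12.0, "hue": 0.6},
]
labels = [0, 0, 1, 1]
predicates = [("alcohol", 13.0, 15.0), ("hue", 0.9, 1.5), ("hue", 0.5, 0.9)]
best = mine_rules(rows, labels, 0, predicates)
for rule, c, e in best:
    print(rule, c, e)
```

Because coverage can only shrink as predicates are added to a conjunction, discarding low-coverage candidates early never loses a valid rule; this anti-monotonicity is what makes frequent-itemset mining a natural fit for the problem.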

Source Code

The source code is located in the cluster-explorer/src directory. This directory contains the following key components:

Explainer (explainer.py): Generates rule-based explanations for each cluster using frequent-itemset mining.
Frequent Itemset Mining (gFIM.py): Contains methods for frequent-itemset mining.
Clustering Rule Evaluation (ScoreMetrics.py, AnalyzeItemsets.py): Provides methods to evaluate and summarize the quality of explanation rules using metrics such as separation error, coverage, and conciseness.
Binning Methods (binning_methods.py): Contains methods for binning numeric attributes, including equal-width, equal-frequency, decision-tree-based, and multiclass optimal binning.
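For intuition about the two simplest binning strategies listed above, here is a stdlib-only sketch of equal-width and equal-frequency binning that turns a numeric attribute into range predicates. The function names and the `(attribute, low, high)` predicate encoding are hypothetical; binning_methods.py may compute edges differently, and the decision-tree-based and multiclass optimal techniques are not shown.

```python
import math
from typing import List, Tuple

# Hypothetical encoding: (attribute, low, high) means low <= x < high.
Predicate = Tuple[str, float, float]


def equal_width_edges(values: List[float], n_bins: int) -> List[float]:
    """Bin edges evenly spaced between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    # set() removes duplicate edges when all values are equal (step == 0).
    return sorted(set([lo + i * step for i in range(n_bins)] + [hi]))


def equal_frequency_edges(values: List[float], n_bins: int) -> List[float]:
    """Bin edges at empirical quantiles, so bins hold roughly equal numbers of points."""
    s = sorted(values)
    n = len(s)
    inner = [s[(i * n) // n_bins] for i in range(1, n_bins)]
    # Repeated values can produce identical edges; keep each edge once.
    return sorted(set([s[0]] + inner + [s[-1]]))


def to_predicates(attr: str, edges: List[float]) -> List[Predicate]:
    """One half-open range predicate per bin; the last bin is open-ended so the maximum is included."""
    if len(edges) < 2:
        return [(attr, edges[0], math.inf)]
    highs = edges[1:-1] + [math.inf]
    return [(attr, low, high) for low, high in zip(edges[:-1], highs)]


def bin_counts(values: List[float], preds: List[Predicate]) -> List[int]:
    """Number of values falling into each predicate's range."""
    return [sum(low <= v < high for v in values) for _, low, high in preds]


skewed = [1, 1, 1, 2, 2, 3, 10, 50, 100, 1000]
width = equal_width_edges(skewed, 2)    # [1.0, 500.5, 1000]
freq = equal_frequency_edges(skewed, 2)  # [1, 3, 1000]
print(bin_counts(skewed, to_predicates("x", width)))  # [9, 1]
print(bin_counts(skewed, to_predicates("x", freq)))   # [5, 5]
```

On skewed data, equal-width bins can leave almost every point in one bin, whereas equal-frequency bins yield predicates with more balanced coverage.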

Experiment Datasets

Cluster-Explorer was evaluated using a diverse set of 98 clustering results obtained from various clustering pipelines and algorithms. The datasets used in these experiments were sourced from the UCI Machine Learning Repository and cover a wide range of data shapes and sizes.

Datasets Overview

The datasets used in the experiments include:

Dataset	Rows	Attributes	Link
Urban Land Cover	168	148	Link
DARWIN	174	451	Link
Wine	178	13	Link
Flags	194	30	Link
Parkinson Speech	1,040	26	Link
Communities and Crime	1,994	128	Link
Turkiye Student Evaluation	5,820	33	Link
In-vehicle Coupon Recommendation	12,684	23	Link
Human Activity Recognition	10,299	561	Link
Quality Assessment of Digital Colposcopies	30,000	23	Link
RT-IoT2022	123,117	85	Link
Gender by Name	147,270	4	Link
Multivariate Gait Data	181,800	7	Link
Wave Energy Converters	288,000	49	Link
3D Road Network	434,874	4	Link
Year Prediction MSD	515,345	90	Link
Online Retail	1,067,371	8	Link
MetroPT-3 Dataset	1,516,948	15	Link
Taxi Trajectory	1,710,670	9	Link

Clustering Pipelines

The clustering results were generated using 16 different clustering pipelines, defined in clustering_pipelines.py, each combining preprocessing steps with a clustering algorithm. The preprocessing steps included standard scaling for numeric columns, one-hot encoding for categorical columns, and dimensionality reduction using PCA. The clustering algorithms used were K-Means, DBSCAN, Birch, Spectral Clustering, and Affinity Propagation.
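As an illustration only, one such pipeline might look like this in scikit-learn: standard scaling and one-hot encoding via a ColumnTransformer, then PCA, then K-Means. This is an assumption about the general shape, not a copy of clustering_pipelines.py; the toy columns, the number of PCA components and the number of clusters are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: two numeric columns and one categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=60),
    "x2": rng.normal(size=60),
    "color": rng.choice(["red", "green", "blue"], size=60),
})

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["x1", "x2"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ],
    sparse_threshold=0.0,  # always return a dense array so PCA accepts it
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=2, random_state=0)),
    ("cluster", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

labels = pipeline.fit_predict(df)  # one cluster label per row
print(np.bincount(labels))
```

Replacing the last step with DBSCAN, Birch, SpectralClustering or AffinityPropagation from sklearn.cluster (all of which provide fit_predict) gives other preprocessing/algorithm combinations of the kind described above.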

To run the pipelines, first save the datasets in a local folder, then provide the path to that folder and the path to a folder where the pipeline results will be saved.
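A minimal sketch of that workflow, assuming each dataset is stored as a CSV file in the datasets folder and each pipeline is a function that maps rows to cluster labels. The function name `run_all_pipelines`, the pipeline signature, and the output file naming are hypothetical and may differ from clustering_pipelines.py.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical pipeline signature: rows (as read by csv.DictReader) -> one cluster label per row.
PipelineFn = Callable[[List[Dict[str, str]]], List[int]]


def run_all_pipelines(datasets_dir: str, results_dir: str,
                      pipelines: Dict[str, PipelineFn]) -> List[Path]:
    """Apply every pipeline to every CSV file in datasets_dir; write one labeled CSV per pair."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for csv_path in sorted(Path(datasets_dir).glob("*.csv")):
        with csv_path.open(newline="") as f:
            rows = list(csv.DictReader(f))
        if not rows:
            continue  # skip empty datasets
        fieldnames = list(rows[0].keys()) + ["cluster"]
        for name, pipeline in pipelines.items():
            labels = pipeline(rows)
            out_path = out_dir / f"{csv_path.stem}__{name}.csv"
            with out_path.open("w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames)
                writer.writeheader()
                for row, label in zip(rows, labels):
                    writer.writerow({**row, "cluster": label})
            written.append(out_path)
    return written


# Example: one toy dataset and a trivial "pipeline" that assigns every row to cluster 0.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp, "datasets")
    data_dir.mkdir()
    (data_dir / "toy.csv").write_text("a,b\n1,2\n3,4\n")
    paths = run_all_pipelines(str(data_dir), str(Path(tmp, "results")),
                              {"all_zero": lambda rows: [0] * len(rows)})
    output_text = paths[0].read_text()
    print(output_text)
```

Keeping one output file per (dataset, pipeline) pair makes it straightforward for a later experiment script to iterate over the results folder.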

Running the Experiments

The experiments are located in cluster-explorer/experiments. To run them, provide BaselinesExperiment.py with the path to the folder containing the pipeline results. The results will be saved in cluster-explorer/experiments.

Additional Experiments

The cluster-explorer/additional_experiments folder contains experiments on the effect of our attribute-selection optimization on both explanation quality and running time. To run them, provide P_ValueExperiment.py with the path to the folder containing the pipeline results. The results will be saved in cluster-explorer/additional_experiments.

Use Cases and Examples

A simple use case is provided in example_notebook.ipynb. The notebook generates a set of explanation rules for the 'Wine' dataset: for each cluster, ClusterExplorer produces rules that describe the properties shared by the wine samples in that cluster. These rules help explain why certain samples are grouped together and what distinguishes one cluster from another.
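The notebook uses ClusterExplorer's own API, which is not reproduced here. As a rough, hypothetical stand-in that only conveys the kind of output to expect, the sketch below clusters scikit-learn's bundled copy of the Wine data with K-Means and, for each cluster, picks the single predicate `feature >= t` or `feature < t` (with `t` at a quartile) that has the lowest separation error among those covering at least half of the cluster. The helper `best_single_predicate` is illustrative; real ClusterExplorer rules are conjunctions found by gFIM.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine(as_frame=True)
features = wine.frame[wine.feature_names]
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(features)
)


def best_single_predicate(features, labels, cluster, min_coverage=0.5):
    """Toy search over one-predicate rules 'col >= t' / 'col < t' with t at the quartiles.

    Returns (description, coverage, separation_error) for the rule with the lowest
    separation error (ties broken by higher coverage) among rules covering at least
    min_coverage of the cluster.
    """
    in_cluster = np.asarray(labels) == cluster
    best = None
    for col in features.columns:
        values = features[col].to_numpy()
        for t in np.quantile(values, [0.25, 0.5, 0.75]):
            for op, mask in ((">=", values >= t), ("<", values < t)):
                cov = mask[in_cluster].mean()
                if cov < min_coverage:
                    continue
                sep = (~in_cluster[mask]).mean()
                if best is None or (sep, -cov) < (best[2], -best[1]):
                    best = (f"{col} {op} {t:.2f}", float(cov), float(sep))
    return best


rules = {c: best_single_predicate(features, labels, c) for c in range(3)}
for c, rule in rules.items():
    print(f"cluster {c}: {rule}")
```

Because one side of the median split always covers at least half of any cluster, the toy search is guaranteed to return a rule for every cluster when min_coverage is 0.5.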

Project details

These details have been verified by PyPI.

Maintainers: somecha, YuvalUner

Release history

1.0.2 (this version): Mar 17, 2025
1.0.1: Feb 7, 2025
1.0.0: Feb 7, 2025

Download files


Source Distribution

cluster_explorer-1.0.2.tar.gz (32.5 kB)

Uploaded Mar 17, 2025 Source

Built Distribution


cluster_explorer-1.0.2-py3-none-any.whl (32.6 kB)

Uploaded Mar 17, 2025 Python 3

File details

Details for the file cluster_explorer-1.0.2.tar.gz.

File metadata

Download URL: cluster_explorer-1.0.2.tar.gz
Upload date: Mar 17, 2025
Size: 32.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cluster_explorer-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`6f2a0fe14835017dc4faf4d01c4954bc7a558605baba167b5c567dc28766ad4c`
MD5	`c0e3ab1f875f6278b290e3f5f8d63c95`
BLAKE2b-256	`f920ba1df7cb675097505e8aff181a89a5fe3ab523204d485fffd19589849e56`


Provenance

The following attestation bundles were made for cluster_explorer-1.0.2.tar.gz:

Publisher: python-publish.yml on analysis-bots/cluster-explorer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cluster_explorer-1.0.2.tar.gz
- Subject digest: 6f2a0fe14835017dc4faf4d01c4954bc7a558605baba167b5c567dc28766ad4c
- Sigstore transparency entry: 183313285
- Sigstore integration time: Mar 17, 2025
Source repository:
- Permalink: analysis-bots/cluster-explorer@4c08b2a8a0fec281e1ab98fe9612b760e450873d
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/analysis-bots
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4c08b2a8a0fec281e1ab98fe9612b760e450873d
- Trigger Event: release

File details

Details for the file cluster_explorer-1.0.2-py3-none-any.whl.

File metadata

Download URL: cluster_explorer-1.0.2-py3-none-any.whl
Upload date: Mar 17, 2025
Size: 32.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cluster_explorer-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ae031fb328c424bb5cfd557646d5423f0a4823d896e9d487d1ca967d95405c4`
MD5	`393a062072a4e65e3bd9d8c67122552a`
BLAKE2b-256	`741d3b7507b5d8fed0fe14aa04388b73017e379292815c1d5c27b083b128b280`


Provenance

The following attestation bundles were made for cluster_explorer-1.0.2-py3-none-any.whl:

Publisher: python-publish.yml on analysis-bots/cluster-explorer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cluster_explorer-1.0.2-py3-none-any.whl
- Subject digest: 7ae031fb328c424bb5cfd557646d5423f0a4823d896e9d487d1ca967d95405c4
- Sigstore transparency entry: 183313292
- Sigstore integration time: Mar 17, 2025
Source repository:
- Permalink: analysis-bots/cluster-explorer@4c08b2a8a0fec281e1ab98fe9612b760e450873d
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/analysis-bots
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@4c08b2a8a0fec281e1ab98fe9612b760e450873d
- Trigger Event: release
