Cluster LOCO feature importance methods for clustering interpretability

Project description

ClusterLOCO

Cluster LOCO: Feature Importance for Interpreting Clusters

clusterloco is a Python package that implements Cluster LOCO, a model-agnostic framework for quantifying feature importance in clustering. Its methods measure how much removing a feature affects the generalizability and stability of a clustering solution, enabling feature-level interpretation in unsupervised learning workflows.

Installation

Package

Our package can be installed directly from PyPI:

pip install clusterloco

or, to include the additional experiment dependencies:

pip install "clusterloco[experiments]"

Development installation

Clone the repository and install the package in editable mode:

git clone https://github.com/DataSlingers/ClusterLOCO.git
cd ClusterLOCO
pip install -e .

To check that the package is correctly installed (note that the import name is clim, not clusterloco):

python -c "import clim; print(clim.__file__)"
python -c "from clim import ClusterLOCOMP; print('import ok')"

To build the package:

python -m pip install build
python -m build

This should create a source distribution and a wheel in the dist/ directory.

For experiment dependencies, such as anndata and scanpy, install:

pip install -e ".[experiments]"

Requirements

The core package requires Python 3.10 or higher. Core dependencies include:

numpy
scipy
pandas
scikit-learn
joblib
tqdm
matplotlib
seaborn
leidenalg
igraph
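
As a quick sanity check of an environment, the sketch below reports which of the core dependencies listed above cannot be found. It uses only the standard library. The helper name find_missing is illustrative and not part of the package; note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names (not PyPI names) of the core dependencies listed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
CORE_MODULES = ["numpy", "scipy", "pandas", "sklearn", "joblib", "tqdm",
                "matplotlib", "seaborn", "leidenalg", "igraph"]

def find_missing(modules):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(CORE_MODULES)
print("missing core dependencies:", missing or "none")
```

An empty result only shows that the modules can be located, not that their versions are compatible with the package.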

Optional experiment dependencies include:

anndata
scanpy

Get started

The package offers three variants of Cluster LOCO: data splitting (Cluster LOCO Split), minipatch ensembles (Cluster LOCO-MP), and adaptive recursive trimming (Cluster LOCO-RAMPART). Two example notebooks are available in the example folder: one on simulated data and one applying the methods to the PBMC 68k data. The PBMC notebook additionally requires anndata and scanpy.

Cluster LOCO Split

Cluster LOCO Split is recommended for data with few features (fewer than 10).

from clim.data_splitting import Cluster_LOCO_Split
from clim.utils import hinge_error

Basic usage: works with any sklearn clustering algorithm. The default transfer classifier is RandomForestClassifier.

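from sklearn.cluster import SpectralClustering

K = 3  # number of clusters (illustrative value; set this for your data)
model = SpectralClustering(n_clusters=K)
# X_train and X_test are your own train/test feature matrices
feature_importance, feature_importance_se = Cluster_LOCO_Split(X_train, X_test, model=model, error_metric=hinge_error, use_proba=True, seed=42)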
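
To make the leave-one-covariate-out idea concrete, here is a minimal conceptual sketch built only on scikit-learn. It is not the implementation in clim: the function loco_split_sketch is hypothetical, KMeans stands in for the clusterer, and 1 minus the adjusted Rand index stands in for the error metric, since the definition of hinge_error is not given here. The sketch clusters the training half, transfers the labels to the test half with a random forest, and measures how much the transferred partition changes when one feature is dropped.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import adjusted_rand_score

def loco_split_sketch(X_train, X_test, n_clusters, seed=0):
    """Illustrative leave-one-covariate-out loop (an assumption, not the clim code)."""
    def cluster_and_transfer(cols):
        # Cluster the training half on the chosen columns, then transfer the
        # cluster labels to the test half with a classifier on the same columns.
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        labels = km.fit_predict(X_train[:, cols])
        clf = RandomForestClassifier(n_estimators=50, random_state=seed)
        clf.fit(X_train[:, cols], labels)
        return clf.predict(X_test[:, cols])

    n_features = X_train.shape[1]
    all_cols = list(range(n_features))
    reference = cluster_and_transfer(all_cols)
    importance = np.empty(n_features)
    for j in range(n_features):
        reduced = [c for c in all_cols if c != j]
        # 1 - ARI ignores label permutations; 0 means the partition is unchanged.
        importance[j] = 1.0 - adjusted_rand_score(reference, cluster_and_transfer(reduced))
    return importance

# Toy data: feature 0 carries three well-separated groups, features 1 and 2 are noise.
rng = np.random.default_rng(0)
centers = rng.choice([-8.0, 0.0, 8.0], size=300)
X = np.column_stack([centers + rng.normal(size=300),
                     rng.normal(size=300),
                     rng.normal(size=300)])
importance = loco_split_sketch(X[:200], X[200:], n_clusters=3)
print(importance.round(2))
```

On this toy data, dropping feature 0 destroys the partition while dropping a noise feature leaves it essentially intact, so feature 0 receives by far the largest score.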

Cluster LOCO-MP

Cluster LOCO-MP implements a minipatch ensemble version of Cluster LOCO. This approach is suited to large datasets.

from clim import ClusterLOCOMP

Basic usage: for any sklearn clustering algorithm, first call fit() to fit the minipatch model, then call score() to compute the feature importance. We recommend parallelizing model fitting but not score computation, where the parallelization overhead can be substantial.

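from sklearn.ensemble import RandomForestClassifier

# model is the sklearn clusterer defined above; par is not defined in this snippet, set it to your parallelization option
g = ClusterLOCOMP(base_clusterer=model, base_classifier=RandomForestClassifier(), K=3, B=500)
g.fit(X, standardize=False, alpha_N=0.2, alpha_M=0.2, parallel=par)
out = g.score(error_metric=hinge_error, agg='mean', proba_error=True, parallel_features=False)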
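
The parameters alpha_N, alpha_M and B are not documented in this section. Assuming they denote the fraction of observations per minipatch, the fraction of features per minipatch, and the number of minipatches, the sampling step of a minipatch ensemble could be sketched as follows. The function sample_minipatches is hypothetical and uses only NumPy; it is not the package's sampler.

```python
import numpy as np

def sample_minipatches(n_obs, n_features, alpha_N, alpha_M, B, seed=0):
    """Draw B minipatches: each is a (row indices, column indices) pair,
    sampled without replacement within the patch (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n_rows = max(1, int(round(alpha_N * n_obs)))
    n_cols = max(1, int(round(alpha_M * n_features)))
    return [(np.sort(rng.choice(n_obs, size=n_rows, replace=False)),
             np.sort(rng.choice(n_features, size=n_cols, replace=False)))
            for _ in range(B)]

patches = sample_minipatches(n_obs=100, n_features=10, alpha_N=0.2, alpha_M=0.2, B=500)

# For a leave-one-covariate-out comparison, split the patches by whether
# they contain feature j; no refitting is needed to "remove" the feature.
j = 0
with_j = [p for p in patches if j in p[1]]
without_j = [p for p in patches if j not in p[1]]
print(len(with_j), len(without_j))
```

Under this reading, each feature is already absent from most minipatches, which is what makes a leave-one-out comparison cheap on large data.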

Cluster LOCO-RAMPART

Cluster LOCO-RAMPART is a sped-up version of Cluster LOCO-MP based on adaptive recursive trimming of the active feature set. We recommend using it with high-dimensional data.

Basic usage:

from clim import ClusterLOCO_RAMPART, RAMPART
from clim.utils import transform_scores_to_ranking

RAMPART directly fits the model and computes the scores.

gen_fn = ClusterLOCO_RAMPART(
    base_clusterer=model, K=3, error_metric=hinge_error, parallel_MP=True,
    parallel={"n_jobs_features": 3, "backend": "loky", "prefer": "processes", "verbose": 0},
    standardize=False, alpha_N=0.2, alpha_M=0.2,
)
out = RAMPART(X, generalizability_fn=gen_fn, B=1000, ranking_fn=transform_scores_to_ranking, top_k=50)
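
The trimming rule used by RAMPART is not described in this section. As a rough illustration of what recursive trimming of an active feature set can look like, the sketch below scores the active features, keeps the better half, and repeats until top_k features remain. Everything here is an assumption for illustration: recursive_trim, keep_frac and noisy_scores are hypothetical names, and the halving schedule is a generic choice, not the package's.

```python
import numpy as np

def recursive_trim(score_fn, n_features, top_k, keep_frac=0.5):
    """Repeatedly score the active features and keep the best fraction
    until only top_k remain (generic scheme, not RAMPART itself)."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    active = np.arange(n_features)
    while len(active) > top_k:
        scores = score_fn(active)                    # one score per active feature
        n_keep = max(top_k, int(np.ceil(keep_frac * len(active))))
        order = np.argsort(scores)[::-1]             # highest score first
        active = np.sort(active[order[:n_keep]])
    return active

rng = np.random.default_rng(0)
true_importance = np.zeros(40)
true_importance[:5] = 1.0                            # features 0..4 matter

def noisy_scores(active):
    # Hypothetical stand-in for an expensive importance estimate on the active set.
    return true_importance[active] + rng.normal(scale=0.1, size=len(active))

selected = recursive_trim(noisy_scores, n_features=40, top_k=5)
print(selected)
```

Because low-scoring features leave the active set early, later rounds spend their budget only on the remaining candidates, which is the source of the speed-up in high dimensions.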

Package structure

ClusterLOCO/
├── pyproject.toml
├── README.md
├── clim/
│   ├── __init__.py
│   ├── minipatches/
│   ├── data_splitting/
│   ├── models/
│   └── utils/
├── benchmarking/
├── simulations/
├── example/
└── paper_figures/

The repository also contains code to reproduce the results from our paper; the paper_figures folder contains the notebook that generates the main figures.

Citation

If you use this package, please cite the corresponding Cluster LOCO paper.

@preprint{he2026clusterloco,
  title={Cluster LOCO: Feature Importance for Interpreting Clusters},
  author={He, Claire and Allen, Genevera},
  url={https://arxiv.org/pdf/2606.14592},
  year={2026}
}

Project details

Release history

This version

0.1.0

Jun 15, 2026

Download files

Source Distribution

clusterloco-0.1.0.tar.gz (39.3 kB)

Uploaded Jun 15, 2026 Source

Built Distribution

clusterloco-0.1.0-py3-none-any.whl (42.3 kB)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file clusterloco-0.1.0.tar.gz.

File metadata

Download URL: clusterloco-0.1.0.tar.gz
Upload date: Jun 15, 2026
Size: 39.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clusterloco-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0230e1a400eaee8e783c69d79a45caf52ce13ccde445d7eae5a2d30c02212912`
MD5	`1075bc5a14554421dd25b06104ba82bb`
BLAKE2b-256	`585cd373dd6a939cb6279176aa0f84f5ce4ab7fad9180432d5d51eb37e5105c0`
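
A downloaded file can be checked against the SHA256 digest listed above with the standard library. The sketch below is one way to do it; the helper names and the local file path in the usage comment are hypothetical.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, expected_hex):
    """Compare a local file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Usage (the local path is hypothetical; the digest is the one listed above):
# matches_published("clusterloco-0.1.0.tar.gz",
#                   "0230e1a400eaee8e783c69d79a45caf52ce13ccde445d7eae5a2d30c02212912")
```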

Provenance

The following attestation bundles were made for clusterloco-0.1.0.tar.gz:

Publisher: release.yml on DataSlingers/ClusterLOCO

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clusterloco-0.1.0.tar.gz
- Subject digest: 0230e1a400eaee8e783c69d79a45caf52ce13ccde445d7eae5a2d30c02212912
- Sigstore transparency entry: 1827706366
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: DataSlingers/ClusterLOCO@8c4c81aa9625fa603713fe0711766327ebd4b17f
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/DataSlingers
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c4c81aa9625fa603713fe0711766327ebd4b17f
- Trigger Event: release

File details

Details for the file clusterloco-0.1.0-py3-none-any.whl.

File metadata

Download URL: clusterloco-0.1.0-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 42.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clusterloco-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a2f86bc0c39f704a8659e5cb0e39ad3b545f8e315521438ea1fb4c98122739ab`
MD5	`a44623ba9cf4e5577adeb7a563867f84`
BLAKE2b-256	`9547a328af3cd66d9d40035dcddb02a6657f32dcc20b4b25e4bc7400951e6f36`

Provenance

The following attestation bundles were made for clusterloco-0.1.0-py3-none-any.whl:

Publisher: release.yml on DataSlingers/ClusterLOCO

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: clusterloco-0.1.0-py3-none-any.whl
- Subject digest: a2f86bc0c39f704a8659e5cb0e39ad3b545f8e315521438ea1fb4c98122739ab
- Sigstore transparency entry: 1827706434
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: DataSlingers/ClusterLOCO@8c4c81aa9625fa603713fe0711766327ebd4b17f
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/DataSlingers
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8c4c81aa9625fa603713fe0711766327ebd4b17f
- Trigger Event: release
