An interpretability engine for clustered / segmented data.
Please note that this open-source library is currently in beta. If you encounter any issues or have suggestions, we encourage you to share them, and we are committed to addressing feedback promptly. Your contributions and ideas are greatly appreciated!
ClusterLens
ClusterLens is an interpretability engine for clustered / segmented data.
You already have clusters - customer segments, user personas, product tiers, risk bands.
ClusterLens answers the harder questions:
- What actually drives each cluster?
- How is Cluster 1 different from Cluster 3 in a statistically meaningful way?
- Which features make Cluster A "high value" or "high risk" compared to others?
- How can I turn a big table into cluster narratives that non-ML stakeholders can read?
ClusterLens sits on top of any clustering method (k-means, GMM, HDBSCAN, rule-based labels, etc.).
All it requires is a DataFrame with a column that holds the cluster labels.
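As a minimal sketch of that input shape, the snippet below builds a small table and attaches rule-based labels with pandas. The column names (`annual_spend`, `visits_per_month`, `cluster`) and the tiering rule are illustrative assumptions, not anything ClusterLens prescribes.

```python
import numpy as np
import pandas as pd

# Toy customer table; the feature names are made up for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "annual_spend": rng.gamma(shape=2.0, scale=500.0, size=200),
    "visits_per_month": rng.poisson(lam=4, size=200),
})

# Rule-based segmentation: three spend tiers become the cluster labels.
df["cluster"] = pd.qcut(
    df["annual_spend"], q=3, labels=["low", "mid", "high"]
).astype(str)

print(df["cluster"].value_counts())
```

Labels produced by k-means, GMM, or HDBSCAN would sit in the same kind of column; only the way the labels are generated changes.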
Key ideas
ClusterLens wraps a train-once, reuse-everywhere pipeline:
- One shared train/test split, stratified by cluster.
- A one-vs-rest classifier per cluster (RandomForest by default, optional LightGBM / XGBoost).
- SHAP values computed on a held-out evaluation set for each cluster.
- A set of utilities that reuse this shared state to give you interpretations and exports:
  - Global & per-cluster classification metrics: get_cluster_classification_stats()
  - Per-cluster feature rankings: get_top_shap_features(...), plot_cluster_shap(...)
  - Contrastive importance between two clusters: contrastive_importance(...)
  - Distribution plots across clusters: compare_feature_across_clusters(...)
  - Markdown-ready cluster narratives: generate_cluster_narratives(...)
  - A cluster summary table and export helpers
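The first two ideas can be sketched with scikit-learn alone. This is not ClusterLens's internal code: the synthetic data, hyperparameters, and variable names below are assumptions chosen for illustration, and the SHAP step is left out.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a table that already carries cluster labels.
X_arr, labels = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(4)])
y = pd.Series(labels, name="cluster")

# One shared split, stratified by cluster, reused for every model below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One binary "this cluster vs. the rest" classifier per cluster.
models, auc = {}, {}
for c in sorted(y.unique()):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, (y_train == c).astype(int))
    models[c] = clf
    # Score on the held-out rows only, so the metric is not inflated.
    proba = clf.predict_proba(X_test)[:, 1]
    auc[c] = roc_auc_score((y_test == c).astype(int), proba)

print(auc)
```

Because every classifier sees the same train and test rows, per-cluster metrics and explanations computed later remain comparable across clusters.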
It is built to be:
- Model-agnostic on the clustering side: ClusterLens never clusters; it interprets the labels you already have.
- Numerically honest: Combines SHAP with effect sizes (Cohen's d, standardized median gaps, Cramér's V, lifts).
- Report-friendly: Outputs narratives and tables you can drop directly into notebooks, dashboards, or slide decks.
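For readers unfamiliar with these effect sizes, here is a sketch of three of them using their textbook definitions. The helper names and the toy table are hypothetical, and ClusterLens's own formulas may differ in detail (for example in variance pooling or bias correction).

```python
import numpy as np
import pandas as pd

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def cramers_v(x, y):
    """Association between two categorical variables, from the chi-square statistic."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (n * (min(observed.shape) - 1)))

def lift(in_cluster, has_value):
    """P(value | cluster) divided by P(value) over the whole table."""
    in_cluster = np.asarray(in_cluster, dtype=bool)
    has_value = np.asarray(has_value, dtype=bool)
    return has_value[in_cluster].mean() / has_value.mean()

# Tiny illustrative table: cluster A spends more and skews toward the "pro" plan.
df = pd.DataFrame({
    "cluster": ["A"] * 4 + ["B"] * 4,
    "spend": [10.0, 12.0, 14.0, 16.0, 2.0, 4.0, 6.0, 8.0],
    "plan": ["pro", "pro", "pro", "free", "free", "free", "free", "pro"],
})
is_a = df["cluster"] == "A"

d_spend = cohens_d(df.loc[is_a, "spend"], df.loc[~is_a, "spend"])
v_plan = cramers_v(df["cluster"], df["plan"])
lift_pro = lift(is_a, df["plan"] == "pro")
print(d_spend, v_plan, lift_pro)
```

Cohen's d suits numeric features, Cramér's V suits categorical ones, and lift reads naturally in a narrative ("pro plans are 1.5x more common in cluster A than overall").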
Installation
- From PyPI (recommended):
# Fresh install:
pip install clusterlens
# Upgrade to the latest version:
pip install -U clusterlens
# With optional extras (LightGBM, XGBoost):
pip install -U "clusterlens[lightgbm,xgboost]"
# To pin a specific version:
pip install "clusterlens==0.1.0"
- From GitHub (latest main):
# Install directly from the GitHub repo:
pip install "git+https://github.com/akthammomani/ClusterLens.git"
# With extras:
pip install "clusterlens[lightgbm,xgboost] @ git+https://github.com/akthammomani/ClusterLens.git"
- From a local clone:
git clone https://github.com/akthammomani/ClusterLens.git
cd ClusterLens
# standard install:
pip install .
# or editable (developer) install:
pip install -e .
- Inside a conda or virtual environment (recommended practice):
# Create and activate an environment, then install via pip:
conda create -n clusterlens-env python=3.10
conda activate clusterlens-env
pip install -U clusterlens # or use any of the commands above
After installation you should be able to do:
from clusterlens import ClusterAnalyzer
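To confirm which version ended up in the active environment, one option is the standard-library `importlib.metadata` module. The helper name `installed_version` is a hypothetical convenience, not part of ClusterLens.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Prints None when the package is not installed in this environment.
print("clusterlens:", installed_version("clusterlens"))
```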
Download files
File details
Details for the file clusterlens-0.1.0.tar.gz.
File metadata
- Download URL: clusterlens-0.1.0.tar.gz
- Upload date:
- Size: 18.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8c59875c8541409d9fdb939facd7259fe020f7cdb6c2d246af3449792763306 |
| MD5 | 2c84e82e12739f637fbc9f4c51df1220 |
| BLAKE2b-256 | 1a9f68c062d688f54267186e2b20bf0155c931ff151d2ff51d092ccad3ef8c84 |
File details
Details for the file clusterlens-0.1.0-py3-none-any.whl.
File metadata
- Download URL: clusterlens-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd82022594f0644c71c60c6aeaac9c44d39b6121dd542524058e57eeaa36a5df |
| MD5 | 195345db19deaf330bc2978aa044533f |
| BLAKE2b-256 | 396d3d8953d9bb86a16a41719343fd43fea982928bef664b19cdfb8e5e2e379a |
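If you download a file manually, its SHA256 digest can be checked against the values listed above with Python's standard library. This is a minimal sketch: the function names and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for clusterlens-0.1.0.tar.gz.
SDIST_SHA256 = "c8c59875c8541409d9fdb939facd7259fe020f7cdb6c2d246af3449792763306"

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Assumed download location; nothing is printed if the file is not present.
archive = Path("clusterlens-0.1.0.tar.gz")
if archive.exists():
    print("hash OK" if verify(archive, SDIST_SHA256) else "hash MISMATCH")
```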