An interpretability engine for clustered / segmented data.

Please note that this open-source library is currently in beta. If you encounter any issues or have suggestions, please share them; we are committed to addressing feedback promptly. Your contributions and ideas are greatly appreciated!

ClusterLens

ClusterLens is an interpretability engine for clustered / segmented data.

You already have clusters - customer segments, user personas, product tiers, risk bands.
ClusterLens answers the harder questions:

  • What actually drives each cluster?
  • How is Cluster 1 different from Cluster 3 in a statistically meaningful way?
  • Which features make Cluster A "high value" or "high risk" compared to others?
  • How can I turn a big table into cluster narratives that non-ML stakeholders can read?

ClusterLens sits on top of any clustering method (k-means, GMM, HDBSCAN, rule-based labels, etc.).
All it requires is a DataFrame with a column that holds the cluster labels.
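
As a minimal sketch of that input, the snippet below builds a small pandas DataFrame and attaches rule-based labels with `pd.cut`. The column names (`spend`, `visits`, `cluster`) and the band thresholds are illustrative assumptions, not anything ClusterLens prescribes; labels from k-means, GMM, or HDBSCAN would go in the same column.

```python
import pandas as pd

# Toy customer table; the columns and values are made up for illustration.
df = pd.DataFrame(
    {
        "spend": [20, 150, 800, 90, 450, 1200],
        "visits": [1, 4, 12, 2, 7, 20],
    }
)

# Rule-based segmentation: bucket customers into spend bands.
# Any labelling method works, as long as the result lands in one column.
df["cluster"] = pd.cut(
    df["spend"],
    bins=[0, 100, 500, float("inf")],
    labels=["low", "mid", "high"],
).astype(str)

print(df)
```

The only structural requirement shown here is one row per observation, the feature columns, and a single label column.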

Key ideas

ClusterLens wraps a train-once, reuse-everywhere pipeline:

  1. One shared train/test split, stratified by cluster.

  2. A one-vs-rest classifier per cluster (RandomForest by default, optional LightGBM / XGBoost).

  3. SHAP values computed on a held-out evaluation set for each cluster.

  4. A set of utilities that reuse this shared state to give you interpretations and exports:

    • Global & per-cluster classification metrics - get_cluster_classification_stats()
    • Per-cluster feature rankings - get_top_shap_features(...), plot_cluster_shap(...)
    • Contrastive importance between two clusters - contrastive_importance(...)
    • Distribution plots across clusters - compare_feature_across_clusters(...)
    • Markdown-ready cluster narratives - generate_cluster_narratives(...)
    • A cluster summary table and export helpers.
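
Steps 1 and 2 can be approximated with plain scikit-learn. The sketch below is not ClusterLens's internal code: the synthetic data from `make_blobs`, the feature names, and the hyperparameters are all assumptions chosen for illustration. The SHAP step (3) is left out, and an F1 score on the held-out set stands in as a simple sanity check of each one-vs-rest model.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an already-clustered table (assumption).
X, labels = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(4)])
df["cluster"] = labels

# Step 1: one shared train/test split, stratified by cluster label.
features = df.drop(columns="cluster")
X_train, X_test, y_train, y_test = train_test_split(
    features,
    df["cluster"],
    test_size=0.25,
    stratify=df["cluster"],
    random_state=0,
)

# Step 2: one binary "this cluster vs. the rest" classifier per cluster.
models, scores = {}, {}
for c in sorted(df["cluster"].unique()):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, (y_train == c).astype(int))
    models[c] = clf
    # Held-out F1 for the positive class (membership in cluster c).
    scores[c] = f1_score((y_test == c).astype(int), clf.predict(X_test))

print(scores)
```

Because every classifier is trained and evaluated on the same split, downstream metrics and explanations stay comparable across clusters, which is the point of the train-once, reuse-everywhere design described above.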

It is built to be:

  • Model-agnostic on the clustering side: ClusterLens never clusters; it interprets the labels you already have.
  • Numerically honest: Combines SHAP with effect sizes (Cohen's d, standardized median gaps, Cramér’s V, lifts).
  • Report-friendly: Outputs narratives and tables you can drop directly into notebooks, dashboards, or slide decks.
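
To make the effect sizes named above concrete, here is a small standard-library sketch of three of them using their textbook definitions. The function names and toy inputs are hypothetical, and the exact formulas ClusterLens applies internally may differ (for example in how variances are pooled).

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled)


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return math.sqrt(chi2 / (n * k))


def lift(values, labels, category, cluster):
    """Share of `category` inside `cluster` divided by its overall share."""
    in_cluster = [v for v, lab in zip(values, labels) if lab == cluster]
    share_cluster = in_cluster.count(category) / len(in_cluster)
    share_all = values.count(category) / len(values)
    return share_cluster / share_all


# Numeric feature: cluster A vs. cluster B.
print(round(cohens_d([2, 4, 6, 8], [1, 2, 3, 4]), 4))  # 1.2247

# Categorical feature perfectly aligned with two clusters.
print(cramers_v([[10, 0], [0, 10]]))  # 1.0

# Category "a" is over-represented in cluster 0.
plans = ["a", "a", "b", "a", "b", "b"]
clusters = [0, 0, 0, 1, 1, 1]
print(round(lift(plans, clusters, "a", 0), 3))  # 1.333
```

Reporting such effect sizes next to SHAP rankings helps separate features a model merely finds useful from features whose distributions differ materially between clusters.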

Installation

  • From PyPI (recommended):

```bash
# Fresh install:
pip install clusterlens

# Upgrade to the latest version:
pip install -U clusterlens

# With optional extras (LightGBM, XGBoost):
pip install -U "clusterlens[lightgbm,xgboost]"

# To pin a specific version:
pip install "clusterlens==0.1.0"
```

  • From GitHub (latest main):

```bash
# Install directly from the GitHub repo:
pip install "git+https://github.com/akthammomani/ClusterLens.git"

# With extras:
pip install "clusterlens[lightgbm,xgboost] @ git+https://github.com/akthammomani/ClusterLens.git"
```

  • From a local clone:

```bash
git clone https://github.com/akthammomani/ClusterLens.git
cd ClusterLens

# standard install:
pip install .

# or editable (developer) install:
pip install -e .
```

  • Inside a conda or virtual environment (recommended practice):

```bash
# Create and activate an environment, then install via pip:
conda create -n clusterlens-env python=3.10
conda activate clusterlens-env
pip install -U clusterlens       # or use any of the commands above
```

After installation you should be able to do:

```python
from clusterlens import ClusterAnalyzer
```
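
If the import fails, a quick check of what is actually installed in the active environment can help. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical, and the module names `lightgbm` and `xgboost` are assumed to be the import names behind the optional extras.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("clusterlens:", installed_version("clusterlens"))

# The optional backends only matter if you plan to use them.
for module in ("lightgbm", "xgboost"):
    status = "available" if find_spec(module) is not None else "not installed"
    print(f"{module}: {status}")
```

A `None` for `clusterlens` usually means pip installed into a different environment than the one running the interpreter.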
