
Expanding Explainable K-Means Clustering


ExKMC

This repository is the official implementation of ExKMC: Expanding Explainable k-Means Clustering.

We study algorithms for k-means clustering, focusing on a trade-off between explainability and accuracy. We partition a dataset into k clusters via a small decision tree. This lets us explain each cluster assignment by a short sequence of single-feature thresholds. Larger trees produce more accurate clusterings, but they also require more complex explanations. To allow flexibility, we develop a new explainable k-means clustering algorithm, ExKMC, that takes an additional parameter k' ≥ k and outputs a decision tree with k' leaves. We use a new surrogate cost to efficiently expand the tree and to label each leaf with one of the k clusters. We prove that the surrogate cost is non-increasing as k' increases, so k' controls the trade-off between explainability and accuracy.
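The surrogate cost is not defined on this page. As we understand it from the ExKMC paper, each leaf is labeled with the one reference center (for example, from a k-means run) that minimizes the total squared distance of the leaf's points, and the surrogate cost sums these per-leaf minima. The sketch below assumes that definition. The helper `surrogate_cost`, the toy data, and the hand-built single-threshold split are illustrative and not part of the ExKMC package. It also shows why splitting a leaf cannot raise the cost: each child may keep its parent's center, so its own minimum can only be equal or lower.

```python
import numpy as np


def surrogate_cost(X, leaf_ids, centers):
    """Label each leaf with its best reference center and sum the leaf costs.

    Assumption (hypothetical helper): the surrogate cost of a tree is the sum,
    over leaves, of the smallest total squared distance from the leaf's points
    to any single reference center.
    """
    X = np.asarray(X, dtype=float)
    leaf_ids = np.asarray(leaf_ids)
    centers = np.asarray(centers, dtype=float)
    # Squared distance from every point to every reference center, shape (n, k).
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    leaf_labels = {}
    total = 0.0
    for leaf in np.unique(leaf_ids):
        # Cost of assigning all points of this leaf to each candidate center.
        per_center = sq_dists[leaf_ids == leaf].sum(axis=0)
        best = int(np.argmin(per_center))
        leaf_labels[int(leaf)] = best
        total += float(per_center[best])
    return leaf_labels, total


# Toy data: two well-separated groups along feature 0, with k = 2 reference centers.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
centers = np.array([[0.5, 0.0], [10.5, 0.0]])

# A tree with a single leaf: every point lands in leaf 0.
_, cost_one_leaf = surrogate_cost(X, np.zeros(len(X), dtype=int), centers)

# Split that leaf with the single-feature threshold "x[0] <= 5".
split_leaves = (X[:, 0] > 5).astype(int)
labels_split, cost_split = surrogate_cost(X, split_leaves, centers)

print(cost_one_leaf, cost_split, labels_split)
```

Here the single leaf pays a large cost whichever center it picks, while the split sends each group to its own center. Because several leaves may share the same label, a tree with k' > k leaves still predicts only k clusters.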

Installation

The package is on PyPI. Simply run:

pip install ExKMC

Usage

from ExKMC.Tree import Tree
from sklearn.datasets import make_blobs

# Create dataset
n = 100
d = 10
k = 3
X, _ = make_blobs(n_samples=n, n_features=d, centers=k, cluster_std=3.0)

# Initialize tree with up to 6 leaves, predicting 3 clusters
tree = Tree(k=k, max_leaves=2*k) 

# Construct the tree, and return cluster labels
prediction = tree.fit_predict(X)

# Tree plot saved to filename
tree.plot('filename')
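To put a number on the accuracy side of the trade-off, one option is to compute the k-means cost of the labels the tree returns. The helper below, `kmeans_cost`, is a hypothetical sketch and not an ExKMC function: it sums the squared distances from each point to the mean of its assigned cluster.

```python
import numpy as np


def kmeans_cost(X, labels):
    """k-means objective of a labeling (hypothetical helper): the sum of squared
    distances from each point to the mean of the cluster it is assigned to."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cost = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        cost += float(((members - members.mean(axis=0)) ** 2).sum())
    return cost


# Toy check: two tight pairs of points along feature 0.
X_toy = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
print(kmeans_cost(X_toy, [0, 0, 1, 1]))  # one cluster per pair
print(kmeans_cost(X_toy, [0, 0, 0, 0]))  # everything in a single cluster
```

With the variables from the usage example, `kmeans_cost(X, prediction)` gives the cost of the tree's clustering. Computing the same quantity for labels from an unconstrained run, such as `sklearn.cluster.KMeans(n_clusters=k, n_init=10).fit_predict(X)`, gives a baseline; the gap between the two is one way to see how much accuracy a given `max_leaves` gives up for explainability.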

Notebooks

Usage examples:

Citation

If you use ExKMC in your research, we would appreciate a citation to the appropriate paper(s):

  • For the IMM base tree, see our ICML 2020 paper.
    @article{dasgupta2020explainable,
      title={Explainable $k$-Means and $k$-Medians Clustering},
      author={Dasgupta, Sanjoy and Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
      journal={arXiv preprint arXiv:2002.12538},
      year={2020}
    }
    
  • For the ExKMC expansion, see our paper.
    @article{frost2020exkmc,
      title={ExKMC: Expanding Explainable $k$-Means Clustering},
      author={Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
      journal={arXiv preprint arXiv:2006.02399},
      year={2020}
    }
    

Contact



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ExKMC-0.0.3.tar.gz (139.6 kB)

Uploaded source

Built Distribution

ExKMC-0.0.3-cp37-cp37m-win_amd64.whl (81.5 kB)

Uploaded cp37
