Expanding Explainable K-Means Clustering

ExKMC

This repository is the official implementation of ExKMC: Expanding Explainable k-Means Clustering.

We study algorithms for k-means clustering, focusing on a trade-off between explainability and accuracy. We partition a dataset into k clusters via a small decision tree. This enables us to explain each cluster assignment by a short sequence of single-feature thresholds. While larger trees produce more accurate clusterings, they also require more complex explanations. To allow flexibility, we develop a new explainable k-means clustering algorithm, ExKMC, that takes an additional parameter k' ≥ k and outputs a decision tree with k' leaves. We use a new surrogate cost to efficiently expand the tree and to label the leaves with one of k clusters. We prove that as k' increases, the surrogate cost is non-increasing, and hence, we trade explainability for accuracy.
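To make "a short sequence of single-feature thresholds" concrete, here is a minimal sketch of a threshold tree with k' = 4 leaves labelled with k = 3 clusters. It is not the ExKMC implementation: the `Node` class, the `explain` helper, and the thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node: internal nodes hold a feature/threshold, leaves hold a cluster."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # followed when x[feature] <= threshold
    right: Optional["Node"] = None   # followed when x[feature] > threshold
    cluster: Optional[int] = None    # set only on leaves


def explain(node: Node, x: List[float]) -> Tuple[int, List[str]]:
    """Return the cluster assigned to x and the threshold tests along its path."""
    path: List[str] = []
    while node.cluster is None:
        if x[node.feature] <= node.threshold:
            path.append(f"x[{node.feature}] <= {node.threshold}")
            node = node.left
        else:
            path.append(f"x[{node.feature}] > {node.threshold}")
            node = node.right
    return node.cluster, path


# Hand-built tree: 4 leaves, 3 distinct cluster labels (cluster 1 owns two leaves).
tree = Node(
    feature=0, threshold=2.0,
    left=Node(cluster=0),
    right=Node(
        feature=1, threshold=5.0,
        left=Node(feature=0, threshold=6.0,
                  left=Node(cluster=1), right=Node(cluster=2)),
        right=Node(cluster=1),
    ),
)

cluster, path = explain(tree, [4.0, 7.5])
print(cluster, path)
```

Each explanation is as long as the depth of the leaf a point reaches, which is why allowing more leaves can lengthen explanations while giving the tree more room to fit the data.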

Installation

The package is on PyPI. Simply run:

pip install ExKMC

Usage

from ExKMC.Tree import Tree
from sklearn.datasets import make_blobs

# Create dataset
n = 100
d = 10
k = 3
X, _ = make_blobs(n_samples=n, n_features=d, centers=k, cluster_std=3.0)

# Initialize tree with up to 6 leaves, predicting 3 clusters
tree = Tree(k=k, max_leaves=2*k) 

# Construct the tree, and return cluster labels
prediction = tree.fit_predict(X)

# Tree plot saved to filename
tree.plot('filename')
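To compare clusterings produced with different numbers of leaves, one option is to compute the standard k-means cost of a labelling: the sum of squared distances from each point to the mean of its assigned cluster. The sketch below is a plain-Python illustration and not part of the ExKMC API; `kmeans_cost` is a hypothetical helper and the four points are toy data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def kmeans_cost(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Sum of squared distances from each point to the mean of its cluster."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    cost = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Cluster mean, computed coordinate by coordinate.
        mean = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        cost += sum((p[j] - mean[j]) ** 2 for p in members for j in range(dim))
    return cost


# Toy data: two tight pairs of points that are far apart along the first feature.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = kmeans_cost(points, [0, 0, 1, 1])  # groups the nearby points together
bad = kmeans_cost(points, [0, 1, 0, 1])   # pairs up the distant points
print(good, bad)
```

Assuming `fit_predict` returns one label per row of `X`, as the usage example suggests, the same helper could be called as `kmeans_cost(X.tolist(), list(prediction))` to score trees built with different `max_leaves` values.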

Notebooks

Usage examples:

Citation

If you use ExKMC in your research, we would appreciate a citation to the appropriate paper(s):

  • For the IMM base tree, see our ICML 2020 paper.
    @article{dasgupta2020explainable,
      title={Explainable $k$-Means and $k$-Medians Clustering},
      author={Dasgupta, Sanjoy and Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
      journal={arXiv preprint arXiv:2002.12538},
      year={2020}
    }
    
  • For the ExKMC expansion, see our paper.
    @article{frost2020exkmc,
      title={ExKMC: Expanding Explainable $k$-Means Clustering},
      author={Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
      journal={arXiv preprint arXiv:2006.02399},
      year={2020}
    }
    

Contact
