Expanding Explainable K-Means Clustering
ExKMC
This repository is the official implementation of ExKMC: Expanding Explainable k-Means Clustering.
We study algorithms for k-means clustering, focusing on a trade-off between explainability and accuracy. We partition a dataset into k clusters via a small decision tree. This enables us to explain each cluster assignment by a short sequence of single-feature thresholds. While larger trees produce more accurate clusterings, they also require more complex explanations. To allow flexibility, we develop a new explainable k-means clustering algorithm, ExKMC, that takes an additional parameter k' ≥ k and outputs a decision tree with k' leaves. We use a new surrogate cost to efficiently expand the tree and to label the leaves with one of k clusters. We prove that as k' increases, the surrogate cost is non-increasing, and hence, we trade explainability for accuracy.
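The monotonicity claim can be illustrated with a small, self-contained sketch. It is not the package's implementation: the helper names (`leaf_cost`, `surrogate_cost`, `split`), the toy points, and the reference centers are all hypothetical, and the surrogate cost is assumed here to be the sum, over leaves, of the squared distances from the leaf's points to the best single reference center. Under that assumption, splitting a leaf can never raise the cost, because both children may keep the parent's label.

```python
# Illustrative sketch only: the names and the cost definition below are
# assumptions for exposition, not the ExKMC package's API.

def sq_dist(x, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def leaf_cost(points, centers):
    """Label a leaf with its best reference center; return (cost, label)."""
    costs = [sum(sq_dist(x, c) for x in points) for c in centers]
    label = min(range(len(centers)), key=costs.__getitem__)
    return costs[label], label

def surrogate_cost(leaves, centers):
    """Sum of per-leaf costs, ignoring empty leaves."""
    return sum(leaf_cost(pts, centers)[0] for pts in leaves if pts)

def split(points, feature, threshold):
    """Single-feature threshold cut: x[feature] <= threshold goes left."""
    left = [x for x in points if x[feature] <= threshold]
    right = [x for x in points if x[feature] > threshold]
    return left, right

centers = [(0.0, 0.0), (4.0, 0.0)]  # k = 2 hypothetical reference centers
points = [(0.0, 1.0), (1.0, 0.0), (3.0, 0.0), (5.0, 1.0)]

# One leaf: every point is forced onto the same center.
cost_1 = surrogate_cost([points], centers)                  # 29.0

# Two leaves (k' = k = 2): split on feature 0 at threshold 2.0.
left, right = split(points, feature=0, threshold=2.0)
cost_2 = surrogate_cost([left, right], centers)             # 5.0

# Three leaves (k' = 3 > k): both new leaves keep the same label,
# so the cost stays equal rather than dropping.
left_a, left_b = split(left, feature=1, threshold=0.5)
cost_3 = surrogate_cost([left_a, left_b, right], centers)   # 5.0

print(cost_1, cost_2, cost_3)
```

The last split shows why the guarantee is "non-increasing" rather than "decreasing": an extra leaf adds explanation complexity but only helps when its points are better served by a different center.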
Installation
The package is on PyPI. Simply run:
pip install ExKMC
Usage
from ExKMC.Tree import Tree
from sklearn.datasets import make_blobs
# Create dataset
n = 100
d = 10
k = 3
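# Pass arguments by keyword: recent scikit-learn versions accept `centers` only as a keyword
X, _ = make_blobs(n_samples=n, n_features=d, centers=k, cluster_std=3.0)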
# Initialize tree with up to 6 leaves, predicting 3 clusters
tree = Tree(k=k, max_leaves=2*k)
# Construct the tree, and return cluster labels
prediction = tree.fit_predict(X)
# Tree plot saved to filename
tree.plot('filename')
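To judge the accuracy side of the trade-off, one option is to compute the k-means cost of a labeling directly. The helper below is a minimal sketch and not part of ExKMC: `kmeans_cost` is a hypothetical name, and it uses only the standard definition of the k-means objective (the sum of squared distances from each point to the mean of its assigned cluster).

```python
# Hypothetical helper for illustration; not part of the ExKMC package.

def kmeans_cost(X, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    # Group points by their assigned label.
    clusters = {}
    for x, label in zip(X, labels):
        clusters.setdefault(label, []).append(x)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Coordinate-wise mean of the cluster.
        mean = [sum(x[j] for x in members) / len(members) for j in range(dim)]
        total += sum((x[j] - mean[j]) ** 2 for x in members for j in range(dim))
    return total

# Toy data: two well-separated pairs of points on a line.
X_toy = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

good = kmeans_cost(X_toy, [0, 0, 1, 1])  # 4.0
bad = kmeans_cost(X_toy, [0, 1, 0, 1])   # 100.0
print(good, bad)
```

Because the function only iterates over rows and indexes coordinates, it should also accept the `X` and `prediction` arrays from the snippet above, which makes it possible to compare trees built with different `max_leaves` values.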
Notebooks
Usage examples:
Citation
If you use ExKMC in your research, we would appreciate a citation to the appropriate paper(s):
- For the IMM base tree, cite our ICML 2020 paper:
@article{dasgupta2020explainable,
  title={Explainable $k$-Means and $k$-Medians Clustering},
  author={Dasgupta, Sanjoy and Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
  journal={arXiv preprint arXiv:2002.12538},
  year={2020}
}
- For the ExKMC expansion, cite our paper:
@article{frost2020exkmc,
  title={ExKMC: Expanding Explainable $k$-Means Clustering},
  author={Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
  journal={arXiv preprint arXiv:2006.02399},
  year={2020}
}
Contact