Expanding Explainable K-Means Clustering
ExKMC
This repository is the official implementation of ExKMC: Expanding Explainable k-Means Clustering.
We study algorithms for k-means clustering, focusing on a trade-off between explainability and accuracy. We partition a dataset into k clusters via a small decision tree. This enables us to explain each cluster assignment by a short sequence of single-feature thresholds. While larger trees produce more accurate clusterings, they also require more complex explanations. To allow flexibility, we develop a new explainable k-means clustering algorithm, ExKMC, that takes an additional parameter k' ≥ k and outputs a decision tree with k' leaves. We use a new surrogate cost to efficiently expand the tree and to label the leaves with one of k clusters. We prove that as k' increases, the surrogate cost is non-increasing, and hence, we trade explainability for accuracy.
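To make the idea of "a short sequence of single-feature thresholds" concrete, here is a minimal sketch of a toy threshold tree with k' = 4 leaves labelled by k = 3 clusters. The `Node` class, the `explain` helper, and the convention that points with `x[feature] <= threshold` go left are illustrative assumptions, not ExKMC's internal representation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """A node of a toy threshold tree (hypothetical structure, not ExKMC's)."""
    feature: Optional[int] = None      # feature index tested at an inner node
    threshold: Optional[float] = None  # assumed rule: go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    cluster: Optional[int] = None      # set only at leaves


def explain(node: Node, x: Sequence[float]) -> Tuple[int, List[str]]:
    """Return the cluster of x and the threshold tests met along its path."""
    path = []
    while node.cluster is None:
        if x[node.feature] <= node.threshold:
            path.append(f"x[{node.feature}] <= {node.threshold}")
            node = node.left
        else:
            path.append(f"x[{node.feature}] > {node.threshold}")
            node = node.right
    return node.cluster, path


# k' = 4 leaves, k = 3 clusters: two different leaves are both labelled cluster 2.
toy_tree = Node(
    feature=0, threshold=1.5,
    left=Node(feature=1, threshold=0.0, left=Node(cluster=0), right=Node(cluster=2)),
    right=Node(feature=1, threshold=4.0, left=Node(cluster=1), right=Node(cluster=2)),
)

cluster, reasons = explain(toy_tree, [2.0, 3.0])
print(cluster, reasons)
```

Each explanation is as long as the depth of the leaf that the point reaches, which is why adding leaves can lengthen explanations while refining the clustering.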
Installation
The package is on PyPI. Simply run:
pip install ExKMC
Usage
from ExKMC.Tree import Tree
from sklearn.datasets import make_blobs
# Create dataset
n = 100
d = 10
k = 3
X, _ = make_blobs(n_samples=n, n_features=d, centers=k, cluster_std=3.0)
# Initialize tree with up to 6 leaves, predicting 3 clusters
tree = Tree(k=k, max_leaves=2*k)
# Construct the tree, and return cluster labels
prediction = tree.fit_predict(X)
# Tree plot saved to filename
tree.plot('filename')
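To compare trees of different sizes, one option is to compute the standard k-means cost of the returned labels. The sketch below is a plain-Python helper (the name `kmeans_cost` is hypothetical and not part of ExKMC). It measures the cost with respect to each cluster's own mean, which is not the same as the surrogate cost described above.

```python
from collections import defaultdict


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append([float(v) for v in point])

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Mean of the cluster, computed coordinate by coordinate.
        center = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - center[j]) ** 2 for p in members for j in range(dim))
    return total


# Toy data: two nearby points in cluster 0, one far point in cluster 1.
toy_points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
toy_labels = [0, 0, 1]
print(kmeans_cost(toy_points, toy_labels))
```

With the names from the example above, a call such as `kmeans_cost(X, prediction)` should work for any array-like `X` whose rows can be iterated. Repeating it for several values of `max_leaves` gives a rough view of the explainability versus accuracy trade-off.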
Notebooks
Usage examples:
Citation
If you use ExKMC in your research we would appreciate a citation to the appropriate paper(s):
- For the IMM base tree, see our ICML 2020 paper.
@article{dasgupta2020explainable,
  title={Explainable $k$-Means and $k$-Medians Clustering},
  author={Dasgupta, Sanjoy and Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
  journal={arXiv preprint arXiv:2002.12538},
  year={2020}
}
- For the ExKMC expansion, see our paper.
@article{frost2020exkmc,
  title={ExKMC: Expanding Explainable $k$-Means Clustering},
  author={Frost, Nave and Moshkovitz, Michal and Rashtchian, Cyrus},
  journal={arXiv preprint arXiv:2006.02399},
  year={2020}
}
Contact
File details
Details for the file ExKMC-0.0.3.tar.gz.
File metadata
- Download URL: ExKMC-0.0.3.tar.gz
- Upload date:
- Size: 139.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.6.1 requests/2.23.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97b1cd7bad2dff36b855b21943526592b2ff5f8ffd181d9f4da3f3bac3295c10 |
| MD5 | 98a87a0867275f059902caa3d774f9ce |
| BLAKE2b-256 | 47a9e3870f54a6b7f44744e77c7fef50daae174292c8c477b3d186e0579b11cd |
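A downloaded file can be checked against the SHA256 digest listed for it. The following is a minimal sketch using only Python's standard `hashlib`. The helper names `sha256_of` and `matches_digest` are hypothetical, and the commented call assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (assumes the file exists locally; not executed here):
# matches_digest(
#     "ExKMC-0.0.3.tar.gz",
#     "97b1cd7bad2dff36b855b21943526592b2ff5f8ffd181d9f4da3f3bac3295c10",
# )
```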
File details
Details for the file ExKMC-0.0.3-cp37-cp37m-win_amd64.whl.
File metadata
- Download URL: ExKMC-0.0.3-cp37-cp37m-win_amd64.whl
- Upload date:
- Size: 81.5 kB
- Tags: CPython 3.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.6.1 requests/2.23.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb7b1b03a1dafcfa87e922fef41e594cf48413aac1b712c1e42265653f8bea30 |
| MD5 | 17d3e38b7f0b32b9f4e1294fb7b31504 |
| BLAKE2b-256 | e608478d03e4e77eeec206f0da7ae58e696ccb1e9ff8bdbcec48df1d7d73e5e4 |