
Auto-select optimal K-means clusters with advanced scoring

Project description

KScorer: Auto-select optimal K-means clusters with advanced scoring

Basic Usage

A live demo notebook is available here.

Load Modules

In [1]: import numpy as np
   ...: import pandas as pd
   ...: from sklearn import datasets
   ...: from sklearn.metrics import balanced_accuracy_score
   ...: from sklearn.model_selection import train_test_split
   ...: from kscorer.kscorer import KScorer

Init KScorer

In [2]: ks = KScorer()

Get Data

In [3]: X, y = datasets.load_digits(return_X_y=True)
   ...: X.shape
Out[3]: (1797, 64)

Train/Test Split

In [4]: X_train, X_test, y_train, y_test = train_test_split(
   ...:     X, y, test_size=0.2, random_state=1234)

Fit KScorer (i.e., Perform Unsupervised Clustering)

In [5]: labels, centroids, _ = ks.fit_predict(X_train, retall=True)
100%|██████████| 13/13 [00:09<00:00,  1.39it/s]
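KScorer's own scoring procedure is not documented on this page. As a rough illustration of the general idea behind automatic K selection, the sketch below fits scikit-learn's `KMeans` for a range of candidate K values and keeps the K with the highest mean silhouette score. The helper name `select_k`, the candidate range, and the use of the silhouette score as the only criterion are assumptions for illustration, not KScorer's method.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def select_k(X, k_range=range(2, 15), random_state=0):
    """Fit K-means for every candidate K and keep the one with the highest
    mean silhouette coefficient (a simple stand-in, not KScorer's scoring)."""
    scores = {}
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Four tight, well-separated blobs, so the "right" answer is known to be 4.
X_demo, _ = make_blobs(n_samples=400,
                       centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
                       cluster_std=0.5, random_state=0)
best_k, scores = select_k(X_demo)
print(best_k)  # 4
```

A single index like the silhouette score can favor too few clusters on real data such as digits, which is one reason a tool may combine several criteria; how KScorer weighs them is not described here.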

Optimal Clusters

In [6]: ks.show()

[Figure: output of ks.show()]

In [7]: ks.optimal_
Out[7]: 10

Confusion Matrix

In [8]: labels_mtx = (pd.Series(y_train)
   ...:               .groupby([labels, y_train])
   ...:               .count()
   ...:               .unstack()
   ...:               .fillna(0))
   ...: # greedily match each (arbitrary) cluster id to a ground-truth digit:
   ...: # every cluster takes its most frequent digit not yet assigned
   ...: order = []
   ...: 
   ...: for _, r in labels_mtx.iterrows():
   ...:     left = [x for x in np.unique(y_train) if x not in order]
   ...:     order.append(r.loc[left].idxmax())
   ...: 
   ...: confusion_mtx = labels_mtx[order]
   ...: confusion_mtx
Out[8]:
        5      9      4      2      0      6      1      7      8      3
0   124.0    5.0    1.0    0.0    0.0    0.0    2.0    7.0    4.0    2.0
1    12.0   95.0    0.0    0.0    0.0    0.0    0.0    0.0    9.0   90.0
2     2.0    0.0  122.0    0.0    1.0    2.0    0.0    1.0    0.0    0.0
3     0.0    0.0    0.0  108.0    0.0    0.0   22.0    0.0    1.0   20.0
4     1.0    2.0    1.0    0.0  147.0    0.0    0.0    0.0    0.0    0.0
5     2.0    1.0    0.0    0.0    2.0  145.0    3.0    0.0    4.0    0.0
6     0.0    1.0    2.0   22.0    0.0    0.0   67.0    7.0   57.0    6.0
7     0.0    5.0    8.0    0.0    0.0    0.0    0.0  130.0    4.0    6.0
8     0.0   15.0    0.0    9.0    0.0    0.0    0.0    0.0   57.0   21.0
9     0.0   22.0    3.0    0.0    0.0    1.0   48.0    0.0    6.0    2.0
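The greedy loop depends on row order and does not guarantee the best overall matching; in the output, for example, cluster 9 is left with digit 3, which accounts for only 2 of its 82 training samples. A minimal alternative sketch, assuming SciPy is available, treats the matching as a linear assignment problem solved with `scipy.optimize.linear_sum_assignment`, maximizing the total number of samples whose cluster maps to their true digit. `match_clusters` is a hypothetical helper, not part of KScorer.

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment


def match_clusters(labels, y_true):
    """Map each cluster id to a class label so that the total number of
    samples whose cluster maps to their true class is as large as possible."""
    # Contingency table: rows = cluster ids, columns = true labels.
    counts = pd.crosstab(pd.Series(labels, name="cluster"),
                         pd.Series(y_true, name="label"))
    # linear_sum_assignment minimizes total cost, so negate counts to maximize matches.
    rows, cols = linear_sum_assignment(-counts.to_numpy())
    return {counts.index[r]: counts.columns[c] for r, c in zip(rows, cols)}


# Toy example: cluster ids 0/1/2 correspond to digits 3/5/7.
labels_demo = np.array([2, 2, 0, 0, 1, 1, 1])
y_demo = np.array([7, 7, 3, 3, 5, 5, 3])
mapping = match_clusters(labels_demo, y_demo)
y_pred = pd.Series(labels_demo).map(mapping)
print(y_pred.tolist())  # [7, 7, 3, 3, 5, 5, 5]
```

If there are more clusters than classes, the surplus clusters receive no entry in the mapping and `Series.map` turns their members into `NaN`, so they would need separate handling before scoring.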

Cluster Unseen Data (in practice, you would prefer to build a classifier instead)

In [9]: labels_unseen = ks.predict(X_test, init=centroids)
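Assigning unseen points to existing K-means clusters amounts to the assignment step of the algorithm: each point gets the label of its nearest centroid. Whether `ks.predict(..., init=centroids)` performs only this step or also refines the centroids is not stated here, so the NumPy sketch below (hypothetical helper `assign_to_centroids`) shows just the plain nearest-centroid rule under squared Euclidean distance, which is what scikit-learn's `KMeans.predict` does for an already fitted model.

```python
import numpy as np


def assign_to_centroids(X, centroids):
    """Label each row of X with the index of its nearest centroid
    (squared Euclidean distance), i.e. the K-means assignment step."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Broadcast to an (n_samples, n_clusters) matrix of squared distances.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


centroids_demo = np.array([[0.0, 0.0], [10.0, 10.0]])
X_new = np.array([[1.0, -1.0], [9.0, 11.0], [6.0, 6.0]])
print(assign_to_centroids(X_new, centroids_demo))  # [0 1 1]
```

The broadcasted distance matrix needs memory proportional to samples × clusters × features, which is small for a 360 × 10 × 64 test set but worth chunking for much larger inputs.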

Evaluate Accuracy

In [10]: # map each cluster id to its matched digit
    ...: y_clustd = pd.Series(labels).map(dict(enumerate(order)))
    ...: y_unseen = pd.Series(labels_unseen).map(dict(enumerate(order)))
In [11]: balanced_accuracy_score(y_train, y_clustd)  # train data
Out[11]: 0.6940733254455871
In [12]: balanced_accuracy_score(y_test, y_unseen)  # unseen data
Out[12]: 0.646615365026082
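`balanced_accuracy_score` averages the recall of each true class, so every digit counts equally regardless of how many samples it has. The short sketch below spells out that definition in NumPy (hypothetical helper `balanced_accuracy`); it is meant to match scikit-learn's default behavior (`adjusted=False`) over the classes present in `y_true`.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: for each true class, the fraction of its
    samples predicted correctly, averaged with equal weight per class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


y_true_demo = [0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 0]
print(balanced_accuracy(y_true_demo, y_pred_demo))  # 0.625
```

Here class 0 has recall 3/4 and class 1 has recall 1/2, giving 0.625, whereas plain accuracy would be 4/6 because the larger class dominates.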

ToDo:



Download files


Source Distribution

kscorer-2.2.0.tar.gz (46.1 kB)

Uploaded Source

Built Distribution

kscorer-2.2.0-py3-none-any.whl (32.7 kB)

Uploaded Python 3

File details

Details for the file kscorer-2.2.0.tar.gz.

File metadata

  • Download URL: kscorer-2.2.0.tar.gz
  • Upload date:
  • Size: 46.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.9

File hashes

Hashes for kscorer-2.2.0.tar.gz
Algorithm    Hash digest
SHA256       f831fd3502b02e48f73b79be6e62f7bcb44482e57ea32d69bb0a2175ead08be7
MD5          9e7900bfed2360f8bff199aedac6c0c4
BLAKE2b-256  ef1ef5090f265c209121b2b4df8551efa218668b74bdf9a8d150250bfd207bbf
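To check a downloaded file against the listed digests, a minimal sketch with Python's standard `hashlib` streams the file in chunks and compares its SHA-256 hex digest with the published value. The helper name `sha256_of` and the assumption that the sdist was saved to the current working directory are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


EXPECTED = "f831fd3502b02e48f73b79be6e62f7bcb44482e57ea32d69bb0a2175ead08be7"
sdist = Path("kscorer-2.2.0.tar.gz")  # assumed download location
if sdist.exists():
    print("OK" if sha256_of(sdist) == EXPECTED else "MISMATCH")
```

Streaming keeps memory use constant regardless of file size; the same helper works for the wheel with its own SHA-256 value.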


File details

Details for the file kscorer-2.2.0-py3-none-any.whl.

File metadata

  • Download URL: kscorer-2.2.0-py3-none-any.whl
  • Upload date:
  • Size: 32.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.9

File hashes

Hashes for kscorer-2.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       fbcb55bb887e4261e908c696d3e5a3a1d04c39ce437b7c79f071fc9e63b25a11
MD5          563508c23d3da28d945f224b03fbaf0c
BLAKE2b-256  2f645338696d1b42ed9a88abde63c5d2293f636b963bf9c9e2ff6daf33a26d7f

