Skip to main content

The breathing k-means algorithm

Project description

The Breathing K-Means Algorithm

An approximation algorithm for the k-means problem that (on average) is better (higher solution quality) and faster (lower CPU time usage) than k-means++.

Techreport: https://arxiv.org/abs/2006.15666 (submitted for publication)

Repo (with examples): https://github.com/gittar/breathing-k-means

API

The included class BKMeans is subclassed from scikit-learn's KMeans class and has, therefore, the same API. It can be used as a plug-in replacement for scikit-learn's KMeans.

There is one new parameters which can be ignored (left at default) for normal usage:

  • m (breathing depth), default: 5

The parameter m can also be used, however, to generate faster ( 1 < m < 5) or better (m>5) solutions. For details see the above techreport.

Installation

pip install bkmeans

Example 1: running on simple random data set

Code:

import numpy as np
from bkmeans import BKMeans

# generate random data set
X=np.random.rand(1000,2)

# create BKMeans instance
bkm = BKMeans(n_clusters=100)

# run the algorithm
bkm.fit(X)

# print SSE (inertia in scikit-learn terms)
print(bkm.inertia_)

Output:

1.1775040547902602

Example 2: comparison with k-means++ (multiple runs)

Code:

import numpy as np
from sklearn.cluster import KMeans
from bkmeans import BKMeans

# random 2D data set
X=np.random.rand(1000,2)

# number of centroids
k=100

for i in range(5):
    # kmeans++
    kmp = KMeans(n_clusters=k)
    kmp.fit(X)

    # breathing k-means
    bkm = BKMeans(n_clusters=k)
    bkm.fit(X)

    # relative SSE improvement of bkm over km++
    imp = 1 - bkm.inertia_/kmp.inertia_
    print(f"SSE improvement over k-means++: {imp:.2%}")

Output:

SSE improvement over k-means++: 3.38%
SSE improvement over k-means++: 4.16%
SSE improvement over k-means++: 6.14%
SSE improvement over k-means++: 6.79%
SSE improvement over k-means++: 4.76%

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bkmeans-1.1.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bkmeans-1.1-py3-none-any.whl (6.7 kB view details)

Uploaded Python 3

File details

Details for the file bkmeans-1.1.tar.gz.

File metadata

  • Download URL: bkmeans-1.1.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.10

File hashes

Hashes for bkmeans-1.1.tar.gz
Algorithm Hash digest
SHA256 afa094841428fdf58254fc7b442747d96a811ad40a7b30155a90bd4ba69342b2
MD5 80494a64338b200e7f7c305defa459b2
BLAKE2b-256 928a8c18822ee27ed2010cd095d57b98c8f2527d8005c5a0fc9f4b536120b12b

See more details on using hashes here.

File details

Details for the file bkmeans-1.1-py3-none-any.whl.

File metadata

  • Download URL: bkmeans-1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.10

File hashes

Hashes for bkmeans-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 963864bb552b79305cadce767df5de55cf674b6fc7c109be66a2db1b019fa4c2
MD5 ff0e99e61c37e92e494632f54c33e3e2
BLAKE2b-256 09ac0a7c3f2c964cf5eb53c222422046fe14d4c51c40f80aaad28fba0f7677a8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page