Skip to main content

Implementation of the FLS++ algorithm for K-Means clustering.

Project description

Build Status License: MIT Supported Python version Stable Version

FLS++

An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].

You can try FLS++ out on our Clustering Toolkit!

One can think of the algorithm as working in 3 phases:

  1. Initializing centers with k-means++ (as presented in [3]).
  2. Improving the center set by performing local search swaps (for details, see [1]).
  3. Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).

In [2] it is shown that O(k log log k) local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.

The interface is built in the same way as scikit-learn's KMeans for better compatibility.

In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the KDD Phy Test and the Tower datasets and averaged over 50 runs.

Boxplot Comparison for FLS++

References

[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444 of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445 PMLR, 09–15 Jun 2019

[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409 Proceedings of the 18th SODA, page 1027–1035, USA, 2007

Installation

pip install flspp

Example

from flspp import FLSpp

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]

flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_

print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]

Development

Install poetry

curl -sSL https://install.python-poetry.org | python3 -

Install clang

sudo apt-get install clang

Set clang variables

export CXX=/usr/bin/clang++
export CC=/usr/bin/clang

Install the package

poetry install

If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace

poetry build

Run the tests

poetry run python -m unittest discover tests -v

Citation

If you use this code, please cite the following paper:

T. Conrads, L. Drexler, J. Könen, D. R. Schmidt, and M. Schmidt, "Local Search k-means++ with Foresight," in 22nd International Symposium on Experimental Algorithms (SEA 2024), 2024, pp. 7:1–7:20. doi: 10.4230/LIPIcs.SEA.2024.7.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flspp-0.1.7.tar.gz (100.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flspp-0.1.7-cp310-cp310-manylinux_2_39_x86_64.whl (510.9 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.39+ x86-64

File details

Details for the file flspp-0.1.7.tar.gz.

File metadata

  • Download URL: flspp-0.1.7.tar.gz
  • Upload date:
  • Size: 100.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for flspp-0.1.7.tar.gz
Algorithm Hash digest
SHA256 071a4aa62252a1907ebdda42402acfa75976e7df9ddc7955c120638a770fc77e
MD5 69438b957eb58b2cec7a19f1510947a2
BLAKE2b-256 45c8519f099d607d804602c6b1eb6f803cc734f45acdc7551d5527899f9bbc74

See more details on using hashes here.

File details

Details for the file flspp-0.1.7-cp310-cp310-manylinux_2_39_x86_64.whl.

File metadata

  • Download URL: flspp-0.1.7-cp310-cp310-manylinux_2_39_x86_64.whl
  • Upload date:
  • Size: 510.9 kB
  • Tags: CPython 3.10, manylinux: glibc 2.39+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for flspp-0.1.7-cp310-cp310-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 a18e6e0e74b2c6080fb00b936780c1917206a68107964a5fa5be666346234395
MD5 11d2a0efc872df86a1a0e240c6bb8a8a
BLAKE2b-256 de2567338fc47702dcafe70621c994b2f337f3b7682712041c9704de08dd388c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page