Skip to main content

No project description provided

Project description

Build Status License: MIT Supported Python version Stable Version

FLS++

An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].

One can think of the algorithm as working in 3 phases:

  1. Initializing centers with k-means++ (as presented in [3]).
  2. Improving the center set by performing local search swaps (for details, see [1]).
  3. Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).

In [2] it is shown that O(k log log k) local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.

The interface is built in the same way as scikit-learn's KMeans for better compatibility.

In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the KDD Phy Test and the Tower datasets and averaged over 50 runs.

Boxplot Comparison for FLS++

References

[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt and Melanie Schmidt. "Local Search k-means++ with Foresight" (2024)

[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444 of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445 PMLR, 09–15 Jun 2019

[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409 Proceedings of the 18th SODA, page 1027–1035, USA, 2007

Installation

pip install flspp

Example

from flspp import FLSpp

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]

flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_

print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]

Development

Install poetry

curl -sSL https://install.python-poetry.org | python3 -

Install clang

sudo apt-get install clang

Set clang variables

export CXX=/usr/bin/clang++
export CC=/usr/bin/clang

Install the package

poetry install

If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace

poetry build

Run the tests

poetry run python -m unittest discover tests -v

Citation

If you use this code, please cite the following paper:

Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt and Melanie Schmidt. "Local Search k-means++ with Foresight" (2024). Accepted at SEA 2024.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flspp-0.1.3.tar.gz (24.2 kB view hashes)

Uploaded Source

Built Distribution

flspp-0.1.3-cp310-cp310-manylinux_2_35_x86_64.whl (537.1 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.35+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page