Skip to main content

Implementation of the FLS++ algorithm for K-Means clustering.

Project description

Build Status License: MIT Supported Python version Stable Version

FLS++

An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].

You can try FLS++ out on our Clustering Toolkit!

One can think of the algorithm as working in 3 phases:

  1. Initializing centers with k-means++ (as presented in [3]).
  2. Improving the center set by performing local search swaps (for details, see [1]).
  3. Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).

In [2] it is shown that O(k log log k) local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.

The interface is built in the same way as scikit-learn's KMeans for better compatibility.

In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the KDD Phy Test and the Tower datasets and averaged over 50 runs.

Boxplot Comparison for FLS++

References

[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444 of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445 PMLR, 09–15 Jun 2019

[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409 Proceedings of the 18th SODA, page 1027–1035, USA, 2007

Installation

pip install flspp

Example

from flspp import FLSpp

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]

flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_

print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]

Development

Install poetry

curl -sSL https://install.python-poetry.org | python3 -

Install clang

sudo apt-get install clang

Set clang variables

export CXX=/usr/bin/clang++
export CC=/usr/bin/clang

Install the package

poetry install

If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace

poetry build

Run the tests

poetry run python -m unittest discover tests -v

Citation

If you use this code, please cite the following paper:

[![Build Status](https://github.com/algo-hhu/FLSpp/actions/workflows/mypy-flake-test.yml/badge.svg)](https://github.com/algo-hhu/FLSpp/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Supported Python version](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
[![Stable Version](https://img.shields.io/pypi/v/flspp?label=stable)](https://pypi.org/project/flspp/)

# FLS++

An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].

One can think of the algorithm as working in 3 phases:

1. Initializing centers with k-means++ (as presented in [3]).
2. Improving the center set by performing local search swaps (for details, see [1]).
3. Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).

In [2] it is shown that `O(k log log k)` local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.

The interface is built in the same way as scikit-learn's KMeans for better compatibility.

In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the [KDD Phy Test](https://www.kdd.org/kdd-cup/view/kdd-cup-2004/data) and the [Tower](https://www.worldscientific.com/doi/abs/10.1142/S0218195908002787) datasets and averaged over 50 runs.


<p align="center">
  <img src="https://raw.githubusercontent.com/algo-hhu/FLSpp/main/images/boxplots.png" alt="Boxplot Comparison for FLS++"/>
</p>

## References

[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444
of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445
PMLR, 09–15 Jun 2019

[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409
Proceedings of the 18th SODA, page 1027–1035, USA, 2007

## Installation

```bash
pip install flspp

Example

from flspp import FLSpp

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]

flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_

print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]

Development

Install poetry

curl -sSL https://install.python-poetry.org | python3 -

Install clang

sudo apt-get install clang

Set clang variables

export CXX=/usr/bin/clang++
export CC=/usr/bin/clang

Install the package

poetry install

If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace

poetry build

Run the tests

poetry run python -m unittest discover tests -v

Citation

If you use this code, please cite the following paper:

T. Conrads, L. Drexler, J. Könen, D. R. Schmidt, and M. Schmidt, "Local Search k-means++ with Foresight," in 22nd International Symposium on Experimental Algorithms (SEA 2024), 2024, pp. 7:1–7:20. doi: 10.4230/LIPIcs.SEA.2024.7.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flspp-0.1.6.tar.gz (25.2 kB view details)

Uploaded Source

Built Distribution

flspp-0.1.6-cp310-cp310-manylinux_2_35_x86_64.whl (539.7 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.35+ x86-64

File details

Details for the file flspp-0.1.6.tar.gz.

File metadata

  • Download URL: flspp-0.1.6.tar.gz
  • Upload date:
  • Size: 25.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1023-azure

File hashes

Hashes for flspp-0.1.6.tar.gz
Algorithm Hash digest
SHA256 dea575dae1f45b8ed7f9bc08ff854a540e0f1fed18890fcf9c673fd4cb13cd7b
MD5 fcedda0675b6f60c76e5d64c95e76b81
BLAKE2b-256 2c456a5c0c9b18cb5308ed8f96d78d5f386689762e8852b8c10ad17262e31fdd

See more details on using hashes here.

File details

Details for the file flspp-0.1.6-cp310-cp310-manylinux_2_35_x86_64.whl.

File metadata

  • Download URL: flspp-0.1.6-cp310-cp310-manylinux_2_35_x86_64.whl
  • Upload date:
  • Size: 539.7 kB
  • Tags: CPython 3.10, manylinux: glibc 2.35+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1023-azure

File hashes

Hashes for flspp-0.1.6-cp310-cp310-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 51788697c1487fbab1ffe690fd51c828425a1db5ef8caae13beff097f49da9ad
MD5 baf3f6df95d5224988e9241c98bd331f
BLAKE2b-256 cb162905b43230e6d9ec3d96bad03dac5dc8ffcecd4024402d268eb27e929299

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page