Implementation of the FLS++ algorithm for K-Means clustering.
Project description
FLS++
An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].
You can try FLS++ out on our Clustering Toolkit!
One can think of the algorithm as working in 3 phases:
- Initializing centers with k-means++ (as presented in [3]).
- Improving the center set by performing local search swaps (for details, see [1]).
- Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).
In [2] it is shown that O(k log log k)
local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.
The interface is built in the same way as scikit-learn's KMeans for better compatibility.
In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the KDD Phy Test and the Tower datasets and averaged over 50 runs.
References
[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444 of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445 PMLR, 09–15 Jun 2019
[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409 Proceedings of the 18th SODA, page 1027–1035, USA, 2007
Installation
pip install flspp
Example
from flspp import FLSpp
example_data = [
[1.0, 1.0, 1.0],
[1.1, 1.1, 1.1],
[1.2, 1.2, 1.2],
[2.0, 2.0, 2.0],
[2.1, 2.1, 2.1],
[2.2, 2.2, 2.2],
]
flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_
print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]
Development
Install poetry
curl -sSL https://install.python-poetry.org | python3 -
Install clang
sudo apt-get install clang
Set clang variables
export CXX=/usr/bin/clang++
export CC=/usr/bin/clang
Install the package
poetry install
If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace
poetry build
Run the tests
poetry run python -m unittest discover tests -v
Citation
If you use this code, please cite the following paper:
[![Build Status](https://github.com/algo-hhu/FLSpp/actions/workflows/mypy-flake-test.yml/badge.svg)](https://github.com/algo-hhu/FLSpp/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Supported Python version](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
[![Stable Version](https://img.shields.io/pypi/v/flspp?label=stable)](https://pypi.org/project/flspp/)
# FLS++
An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].
One can think of the algorithm as working in 3 phases:
1. Initializing centers with k-means++ (as presented in [3]).
2. Improving the center set by performing local search swaps (for details, see [1]).
3. Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).
In [2] it is shown that `O(k log log k)` local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.
The interface is built in the same way as scikit-learn's KMeans for better compatibility.
In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the [KDD Phy Test](https://www.kdd.org/kdd-cup/view/kdd-cup-2004/data) and the [Tower](https://www.worldscientific.com/doi/abs/10.1142/S0218195908002787) datasets and averaged over 50 runs.
<p align="center">
<img src="https://raw.githubusercontent.com/algo-hhu/FLSpp/main/images/boxplots.png" alt="Boxplot Comparison for FLS++"/>
</p>
## References
[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444
of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445
PMLR, 09–15 Jun 2019
[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409
Proceedings of the 18th SODA, page 1027–1035, USA, 2007
## Installation
```bash
pip install flspp
Example
from flspp import FLSpp
example_data = [
[1.0, 1.0, 1.0],
[1.1, 1.1, 1.1],
[1.2, 1.2, 1.2],
[2.0, 2.0, 2.0],
[2.1, 2.1, 2.1],
[2.2, 2.2, 2.2],
]
flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_
print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]
Development
Install poetry
curl -sSL https://install.python-poetry.org | python3 -
Install clang
sudo apt-get install clang
Set clang variables
export CXX=/usr/bin/clang++
export CC=/usr/bin/clang
Install the package
poetry install
If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace
poetry build
Run the tests
poetry run python -m unittest discover tests -v
Citation
If you use this code, please cite the following paper:
T. Conrads, L. Drexler, J. Könen, D. R. Schmidt, and M. Schmidt, "Local Search k-means++ with Foresight," in 22nd International Symposium on Experimental Algorithms (SEA 2024), 2024, pp. 7:1–7:20. doi: 10.4230/LIPIcs.SEA.2024.7.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file flspp-0.1.6.tar.gz
.
File metadata
- Download URL: flspp-0.1.6.tar.gz
- Upload date:
- Size: 25.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1023-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | dea575dae1f45b8ed7f9bc08ff854a540e0f1fed18890fcf9c673fd4cb13cd7b |
|
MD5 | fcedda0675b6f60c76e5d64c95e76b81 |
|
BLAKE2b-256 | 2c456a5c0c9b18cb5308ed8f96d78d5f386689762e8852b8c10ad17262e31fdd |
File details
Details for the file flspp-0.1.6-cp310-cp310-manylinux_2_35_x86_64.whl
.
File metadata
- Download URL: flspp-0.1.6-cp310-cp310-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 539.7 kB
- Tags: CPython 3.10, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1023-azure
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 51788697c1487fbab1ffe690fd51c828425a1db5ef8caae13beff097f49da9ad |
|
MD5 | baf3f6df95d5224988e9241c98bd331f |
|
BLAKE2b-256 | cb162905b43230e6d9ec3d96bad03dac5dc8ffcecd4024402d268eb27e929299 |