No project description provided
Project description
FLS++
An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].
One can think of the algorithm as working in 3 phases:
- Initializing centers with k-means++ (as presented in [3]).
- Improving the center set by performing local search swaps (for details, see [1]).
- Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).
In [2] it is shown that O(k log log k)
local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.
The interface is built in the same way as scikit-learn's KMeans for better compatibility.
In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the KDD Phy Test and the Tower datasets and averaged over 50 runs.
References
[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt and Melanie Schmidt. "Local Search k-means++ with Foresight" (2024)
[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444 of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445 PMLR, 09–15 Jun 2019
[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409 Proceedings of the 18th SODA, page 1027–1035, USA, 2007
Installation
pip install flspp
Example
from flspp import FLSpp
example_data = [
[1.0, 1.0, 1.0],
[1.1, 1.1, 1.1],
[1.2, 1.2, 1.2],
[2.0, 2.0, 2.0],
[2.1, 2.1, 2.1],
[2.2, 2.2, 2.2],
]
flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_
print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]
Development
Install poetry
curl -sSL https://install.python-poetry.org | python3 -
Install clang
sudo apt-get install clang
Set clang variables
export CXX=/usr/bin/clang++
export CC=/usr/bin/clang
Install the package
poetry install
If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace
poetry build
Run the tests
poetry run python -m unittest discover tests -v
Citation
If you use this code, please cite the following paper:
Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt and Melanie Schmidt. "Local Search k-means++ with Foresight" (2024). Accepted at SEA 2024.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for flspp-0.1.3-cp310-cp310-manylinux_2_35_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 74ece1c7e39db268d2bbaeb73dfd6a9caed6b25eed8e65aae01368a5829fe583 |
|
MD5 | 44ee35990764182b9dbf2933da79bf2c |
|
BLAKE2b-256 | 5d95410db5affddbbeddcc0f254c6365744d59bf09a3d753754a7fad0a12620f |