Skip to main content

kMSR provides a selection of algorithms to solve the k-Min-Sum-Radii problem.

Project description

Build Status License: GPL v3 Supported Python version Stable Version

K-Min-Sum-Radii

kMSR provides various implementations to solve the k-Min-Sum-Radii problem. The k-Min-Sum-Radii problem is a clustering problem that aims to minimize the sum of the radii of the clusters. Given a set of points, the aim is to find $k$ balls such that the the sum of the radii of the balls is minimized. The package provides the following algorithms:

  • schmidt: The algorithm is described in this paper. In practice, this works well for clusters that are not too separated. The algorithm uses the parameters epsilon, n_u, and num_test_radii to control the trade-off between the quality of the solution and the runtime. Increase n_u for a more accurate solution.
  • heuristic: The algorithm is a simple heuristic that explores all possible combinations for the first cluster, and then selects the remaining centers as the points farthest from the radius of the first cluster. This algorithm works well in practice but it not practical for large datasets.
  • gonzales: This is the standard Gonzales algorithm for $k$-center.
  • kmeans: This is the k-means++ algorithm for $k$-means.

Although the last two algorithms are not specifically designed for the k-Min-Sum-Radii problem, they serve as useful baselines for comparing the performance of other algorithms. Additionally, an optimization unique to this problem has been integrated into all algorithms: intersecting balls are merged at the end, resulting in a more cost-effective solution.

You can try kMSR out on our Clustering Toolkit!

Installation

We highly recommend to install OpenMP. Parts of the code are parallelized and will be much faster. However, on Windows and MacOS the code also works without OpenMP. Nonetheless, the code was written for Linux and will achieve the best performance there.

On Linux, you can use the following command:

# Alpine
sudo apk add openmp-dev libgomp
# Ubuntu
sudo apt-get install libomp-dev libgomp1
# Debian
sudo apt-get install gcc libomp-dev libomp5 libgomp1
# ArchLinux
sudo pacnam -S openmp
ln -s libomp.so libomp.so.5

We have tested this on docker, so it might be a little different on your system.

On MacOS, you can use the following command:

 brew install llvm libomp

However, it might be that MacOS does not find the installed library. In build_extension.py, the paths are set manually. If it does not work for you, please clone the repository and run poetry build. You you see a message in red if your OpenMP is not found.

Then, you can install the package via pip:

pip install kmsr

Example

from kmsr import KMSR
from kmsr.plot import plot_multiple_results
from sklearn.datasets import make_blobs
from time import time

points, ground_truth = make_blobs(
    n_samples=100,
    n_features=2,
    centers=2,
    cluster_std=0.6,
    shuffle=True,
    random_state=42,
)

labels = []
centers = []
radii = []
titles = []
for algo in ["Schmidt", "Heuristic", "Gonzales", "KMeans"]:
    kmsr = KMSR(
        n_clusters=5,
        algorithm=algo,
        epsilon=0.5,
        n_u=10000,
        n_test_radii=10,
        random_state=42,
    )
    start = time()
    kmsr.fit(points)
    end = time() - start
    labels.append(kmsr.labels_)
    centers.append(kmsr.cluster_centers_)
    radii.append(kmsr.cluster_radii_)
    titles.append(f"{algo}: {sum(kmsr.cluster_radii_):.3f}, Time: {end:.3f}s")

plot_multiple_results(
    points,
    clusters=labels,
    centers=centers,
    radii=radii,
    title=titles,
)

Comparison of the Different Methods for kMSR

Development

Install poetry

curl -sSL https://install.python-poetry.org | python3 -

Install clang

sudo apt-get install clang

Set clang variables

export CXX=/usr/bin/clang++
export CC=/usr/bin/clang

Install the package

poetry install

If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace

poetry build

Run the tests

poetry run python -m unittest discover tests -v

Citation

If you use this code, please cite the following bachelor thesis:

N. Lenßen, "Experimentelle Analyse von Min-Sum-Radii Approximationsalgorithmen". Bachelorarbeit, Heinrich-Heine-Universität Düsseldorf, 2024.

Moreover, depending on the selection of the algorithm parameter, you should also cite the following paper for algorithm='schmidt':

L. Drexler, A. Hennes, A. Lahiri, M. Schmidt, and J. Wargalla, "Approximating Fair K-Min-Sum-Radii in Euclidean Space," in Lecture notes in computer science, 2023, pp. 119–133. doi: 10.1007/978-3-031-49815-2_9.

the following paper for algorithm='gonzales':

T. F. Gonzalez, "Clustering to minimize the maximum intercluster distance," Theoretical Computer Science, vol. 38, pp. 293–306, Jan. 1985, doi: 10.1016/0304-3975(85)90224-5.

and the following paper for algorithm='kmeans':

D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," Symposium on Discrete Algorithms, pp. 1027–1035, Jan. 2007, doi: 10.5555/1283383.1283494.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kmsr-0.1.0.tar.gz (37.5 kB view details)

Uploaded Source

Built Distribution

kmsr-0.1.0-cp310-cp310-manylinux_2_35_x86_64.whl (775.8 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.35+ x86-64

File details

Details for the file kmsr-0.1.0.tar.gz.

File metadata

  • Download URL: kmsr-0.1.0.tar.gz
  • Upload date:
  • Size: 37.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1023-azure

File hashes

Hashes for kmsr-0.1.0.tar.gz
Algorithm Hash digest
SHA256 91a1c004b9b5a7443c7d3d6fa9df180b0d0bc911283c6c7dfd761764d80e55cc
MD5 08e2cf159dd616fe9ea28e1ecaaedc2b
BLAKE2b-256 731d22ba1f492bcf6b0cccedeee825e681057f89bcd59fe005f5b0df4f7fcb1b

See more details on using hashes here.

File details

Details for the file kmsr-0.1.0-cp310-cp310-manylinux_2_35_x86_64.whl.

File metadata

  • Download URL: kmsr-0.1.0-cp310-cp310-manylinux_2_35_x86_64.whl
  • Upload date:
  • Size: 775.8 kB
  • Tags: CPython 3.10, manylinux: glibc 2.35+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1023-azure

File hashes

Hashes for kmsr-0.1.0-cp310-cp310-manylinux_2_35_x86_64.whl
Algorithm Hash digest
SHA256 e2d6ea84ca7a6da2709bee1522ab89235d39ae0a29b1e59fb25f5ff9308f2e6d
MD5 cf2645a140518d89bbb32800c448212f
BLAKE2b-256 8690cc1cfe4db07d16b96bd8a2e6eb6f652b88278f97d070078d09dfa1cb9618

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page