Skip to main content

A python package for optimal 1d k-means clustering.

Project description

build

fast1dkmeans

A Python library which implements several variations of optimal k-means clustering on 1D data, based on the algorithms presented by Gronlund et al. (2017). This package is inspired by the kmeans1d package but extends it by implementing additional algorithms, in particular those with reduced memory requirements O(n) instead of O(kn).

There are several different ways to compute the optimal k-means clustering in 1d. Currently the package implements the following methods:

  • "binary-search-interpolation" default [O(n lg(U) ), O(n) space, "wilber-interpolation"]
  • "dynamic-programming-kn" [O(kn), O(kn) space]
  • "dynamic-programming-space" [O(kn), O(n) space, "dp-linear"]
  • "binary-search-normal" [O(n lg(U) ), O(n) space, section 2.4, "wilber-binary"]

The code is written in Python and relies on the numba compiler for speed.

Requirements

fast1dkmeans relies on numpy and numba which currently support python 3.8-3.10.

Installation

fast1dkmeans is available on PyPI, the Python Package Index.

$ pip3 install fast1dkmeans

Example Usage

import fast1dkmeans

x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
k = 4

clusters = fast1dkmeans.cluster(x, k)

print(clusters)   # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]

Important notice: On first usage the the code is compiled once which may take about 30s. On subsequent usages this is no longer necessary and execution is much faster.

Tests

Tests are in tests/.

# Run tests
$ python3 -m pytest .

License

The code in this repository has an BSD 2-Clause "Simplified" License.

See LICENSE.

References

[1] Gronlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. "Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D." ArXiv:1701.07204 [Cs], January 25, 2017. http://arxiv.org/abs/1701.07204.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fast1dkmeans-0.1.2.tar.gz (14.2 kB view hashes)

Uploaded Source

Built Distribution

fast1dkmeans-0.1.2-py3-none-any.whl (20.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page