A Python package for optimal 1D k-means clustering
Project description
kmeans1d
A Python library with an implementation of k-means clustering on 1D data, based on the algorithm from Xiaolin (1991), as presented by Gronlund et al. (2017, Section 2.2).
Globally optimal k-means clustering is NP-hard for multi-dimensional data. Lloyd's algorithm is a popular approach for finding a locally optimal solution. For 1-dimensional data, there are polynomial time algorithms. The algorithm implemented here is an O(kn + n log n) dynamic programming algorithm for finding the globally optimal k clusters for n 1D data points.
The code is written in C++, and wrapped with Python.
Requirements
kmeans1d supports Python 3.x.
Installation
kmeans1d is available on PyPI, the Python Package Index.
$ pip3 install kmeans1d
Example Usage
import kmeans1d
x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
k = 4
clusters, centroids = kmeans1d.cluster(x, k)
print(clusters) # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]
print(centroids) # [-50.0, 4.1, 94.0, 200.5]
Tests
Tests are in tests/.
# Run tests
$ python3 -m unittest discover tests -v
Development
The underlying C++ code can be built in-place, outside the context of pip
. This requires Python
development tools for building Python modules (e.g., the python3-dev
package on Ubuntu). gcc
,
clang
, and MSVC
have been tested.
$ python3 setup.py build_ext --inplace
The packages
GitHub action can be manually triggered (Actions
> packages
> Run workflow
) to build wheels
and a source distribution.
License
The code in this repository has an MIT License.
See LICENSE.
References
[1] Wu, Xiaolin. "Optimal Quantization by Matrix Searching." Journal of Algorithms 12, no. 4 (December 1, 1991): 663
[2] Gronlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. "Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D." ArXiv:1701.07204 [Cs], January 25, 2017. http://arxiv.org/abs/1701.07204.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for kmeans1d-0.4.0-cp32-abi3-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5943d68f6514309023ace23d8ad3f43703871e87a021e8024cdf4db79ed2f2e1 |
|
MD5 | a327d8ece92b60bb366631897a8346d4 |
|
BLAKE2b-256 | d55adb6e1f27d31db470e53a6b70009216891050a7396a70584c9600a5b98bd4 |
Hashes for kmeans1d-0.4.0-cp32-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ad8017fdb82746c718215695b4290e9bd6ded0305740e2fe2b1ede697a14e947 |
|
MD5 | 2ad01bfbbc32032aebc10b6c03f7501d |
|
BLAKE2b-256 | ba0c63c31443d391f5452aa21e7d5e10106de06cd6ea42bf0a0689b01900139f |
Hashes for kmeans1d-0.4.0-cp32-abi3-macosx_11_0_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9a052bca5bbacbd70a7f4846896439ebed29e96a2b04115c1fec480555cad70d |
|
MD5 | 27cc446ada593465911564900b00f7cf |
|
BLAKE2b-256 | 191957056e4566611f5da2166c52a9f88185c177a224d0ecab1dd632cffa1c88 |