Skip to main content

A Python package for Optimal 1D Clustering

Project description

Build Status

optimal1dclustering

Add k-medians and add a minimum cluster size parameter to https://github.com/dstein64/kmeans1d.

A Python library with an implementation of k-means and k-medians clustering on 1D data, based on the algorithm from Xiaolin (1991), as presented by Gronlund et al. (2017, Section 2.2).

The algorithm implemented in this library can also find the optimal clustering when clusters are required to have a minimum cluster size. It still finds the optimal clustering as the cost function is still monge concave.

Globally optimal clustering is NP-hard for multi-dimensional data. Lloyd's algorithm is a popular approach for finding a locally optimal solution. For 1-dimensional data, there are polynomial time algorithms. The algorithm implemented here is an O(kn + n log n) dynamic programming algorithm for finding the globally optimal k clusters for n 1D data points.

The code is written in C++, and wrapped with Python.

Requirements

optimal1dclustering supports Python 3.x.

Installation

optimal1dclustering is available on PyPI, the Python Package Index.

$ pip3 install optimal1dclustering

Example Usage

import optimal1dclustering

x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
k = 4 # Number of clusters
min_cluster_size = 2 # The minimum number of elements in each cluster
mode = 2 # 2 for k-means; 1 for k-medians

clusters, centroids = optimal1dclustering.cluster(x, k, min_cluster_size = min_cluster_size, mode = mode)

print(clusters)   # [0, 1, 1, 0, 3, 3, 3, 2, 2, 2]
print(centroids)  # [-23.0, 4.15, 94.0, 200.5]

Tests

Tests are in tests/.

# Run tests
$ python3 -m unittest discover tests -v

Development

The underlying C++ code can be built in-place, outside the context of pip. This requires Python development tools for building Python modules (e.g., the python3-dev package on Ubuntu). gcc, clang, and MSVC have been tested.

$ python3 setup.py build_ext --inplace

The packages GitHub action can be manually triggered (Actions > packages > Run workflow) to build wheels and a source distribution.

License

The code in this repository has an MIT License.

See LICENSE.

References

[1] Wu, Xiaolin. "Optimal Quantization by Matrix Searching." Journal of Algorithms 12, no. 4 (December 1, 1991): 663

[2] Gronlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. "Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D." ArXiv:1701.07204 [Cs], January 25, 2017. http://arxiv.org/abs/1701.07204.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

optimal1dclustering-1.0.2.tar.gz (10.4 kB view details)

Uploaded Source

Built Distributions

optimal1dclustering-1.0.2-cp312-cp312-win_amd64.whl (20.0 kB view details)

Uploaded CPython 3.12 Windows x86-64

optimal1dclustering-1.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136.5 kB view details)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

optimal1dclustering-1.0.2-cp312-cp312-macosx_10_9_universal2.whl (30.9 kB view details)

Uploaded CPython 3.12 macOS 10.9+ universal2 (ARM64, x86-64)

optimal1dclustering-1.0.2-cp311-cp311-win_amd64.whl (20.0 kB view details)

Uploaded CPython 3.11 Windows x86-64

optimal1dclustering-1.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136.8 kB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

optimal1dclustering-1.0.2-cp311-cp311-macosx_10_9_universal2.whl (30.9 kB view details)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

optimal1dclustering-1.0.2-cp310-cp310-win_amd64.whl (20.0 kB view details)

Uploaded CPython 3.10 Windows x86-64

optimal1dclustering-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136.1 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

optimal1dclustering-1.0.2-cp310-cp310-macosx_10_9_universal2.whl (30.9 kB view details)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

optimal1dclustering-1.0.2-cp39-cp39-win_amd64.whl (20.0 kB view details)

Uploaded CPython 3.9 Windows x86-64

optimal1dclustering-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (135.9 kB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

optimal1dclustering-1.0.2-cp39-cp39-macosx_10_9_universal2.whl (30.9 kB view details)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

optimal1dclustering-1.0.2-cp38-cp38-win_amd64.whl (20.0 kB view details)

Uploaded CPython 3.8 Windows x86-64

optimal1dclustering-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136.1 kB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

optimal1dclustering-1.0.2-cp38-cp38-macosx_11_0_universal2.whl (30.9 kB view details)

Uploaded CPython 3.8 macOS 11.0+ universal2 (ARM64, x86-64)

optimal1dclustering-1.0.2-cp37-cp37m-win_amd64.whl (20.0 kB view details)

Uploaded CPython 3.7m Windows x86-64

optimal1dclustering-1.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (137.1 kB view details)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

optimal1dclustering-1.0.2-cp36-cp36m-win_amd64.whl (20.1 kB view details)

Uploaded CPython 3.6m Windows x86-64

optimal1dclustering-1.0.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136.2 kB view details)

Uploaded CPython 3.6m manylinux: glibc 2.17+ x86-64

File details

Details for the file optimal1dclustering-1.0.2.tar.gz.

File metadata

  • Download URL: optimal1dclustering-1.0.2.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.4

File hashes

Hashes for optimal1dclustering-1.0.2.tar.gz
Algorithm Hash digest
SHA256 bf0f8e57d33d183a6fe1f7ed44a9c1022c9ee598b958d24144ed66f363d58dab
MD5 118f1f6796ce66c894cddc7564919b5b
BLAKE2b-256 1370f6948bcb96d1085bf305e4107faabbeb1758e400a7f7b5398fa19233d7d6

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 5aece70e9277dc3091af8037798ea9e709cf3695cf24ed31ae2534fd39220ff7
MD5 6bff0e12ef6ec9a877d32929be338297
BLAKE2b-256 d8a897264883ae51ce0492833c24e167110a1726f92eafa7f72663bef3dc63a2

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7bcefe38eaafdd174564594e2f40225708e7821509d6e63396f7c06ea561e116
MD5 89be713b3117003f9a50b7a0aea4ba06
BLAKE2b-256 9391d189bde19e970e89eb77f6d3e9392c21c91704ac91707eeaa36c4b1ce2e0

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp312-cp312-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp312-cp312-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 63ce94f120a9f9523b96635c86c8c489e89362ccf3c0fc02452c896776a91da0
MD5 3073d5deb0cbcddec91191c0b56d031e
BLAKE2b-256 e5e33856dcc603fc08770c36c2314c071fe4f5a928869916469d0fcba2671dfb

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 ae581796ff7543a4166da294c955a013069a204f3ba288bfe93fbd6b63af7dcf
MD5 a5b5c548a8c9bf45ca173b7e88c054b7
BLAKE2b-256 26c8e842d76cc168d8fb3d8461382368526dacfb7a6b6d017801ab6da6c16404

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bd35d43d645b3146323821935a6108ecd10f9fd31882e1c2c8f1309a22368e23
MD5 f9ac220ce104ef5e3066315fa4cbb0fd
BLAKE2b-256 e85ff0c4b30a2f9fce94e2e1c6572ad6bedad52ee2464891e56a1f6d9f646047

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 14a4e7d676bb9febb3d5002241f3c1a6c3d04e0910a1772cc0b53614451329e2
MD5 921c6642cca4ae86f6ec6f2ae907ee3c
BLAKE2b-256 59945970b4b2f7b4b0d6fa1b42080a2f3cad34ae3d9447019741f0cfcd0a2397

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 d3274841cb1c76091c1ca1b2d417317032de541fc24c26ca3ce0496c9785f2c3
MD5 d5789e5b1f5b99d872c75b9fe16606b0
BLAKE2b-256 87cae11d9d3bbae32ccde0903aa0980651256a9563e3e182a9a64b6556bcf280

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3df7fc60a2fdd37d4326360008ccbd90b515d8543f054df558f04425007d3d2c
MD5 f717864e3ed493b2caa1fda6a339911e
BLAKE2b-256 95e9e36d436ef367e1b08f2858fe7fc7832ff0d9a5a8b6c163d445fc95ad6ae4

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 732e6a3e623e433aab5e4e431327c4303fbf5d29c84512f9d219bf47481d4ced
MD5 ef892974e3e4e4acdc18fc5c5379dfcc
BLAKE2b-256 f18a86a09f402c8a45b4667ed8d48d0dd5008c7e5c621e1801e498a27324bf70

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 867ca7481d74ac3bc7ef84acb9ba73fc54624e8ba23d257a3d8b8a86398db823
MD5 fe0035f23bd91e1cc4567e40b218dc24
BLAKE2b-256 eff132db9d536a1f734d411d1cd7af8e952dfcb42beb9086b830a6c2ed75db24

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 de475b17a76f5a9155b5da7bb04a92da07f68e9ccf10008fe797ea6b74cb092e
MD5 1f0fbe75820ea0c9b82ef98510e085d1
BLAKE2b-256 f0ee02175fc043cc037754d86b7d2bbfaff5e06fe6bc191b685d6895f22f2e7a

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 b8c7a7ad9df52f4d540b9fb2627a9b380c03aab534b04f59664d46cfc4b328b3
MD5 db5d4c4a5106c5118c5379c02293ab2f
BLAKE2b-256 ae992fcad1d3d25b897067dc3173c7f8af1e772db7ec6b998cdc7c362a4375d3

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 327aca87b2d12c5dd40906dbf225c7735cd16e45f5bc85fd12cf154eb47a2796
MD5 b21f4f2a55e40ccfad695113bb7c63dd
BLAKE2b-256 e1220e3cfacd13b6b260bfe0db05719b394b55f297c074c9d83cac7c6586581c

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5495e83a71a3abf68c202afcdbd9c6ac8d55ca9a9e42d9f9baa4bc47bdc3c7f3
MD5 aba02eda1fb30bd6dac99259dd416d4f
BLAKE2b-256 5b6a0e9c7acf308834a85921565bbde39abca2f8ab6e72225c926fe30525f320

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp38-cp38-macosx_11_0_universal2.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp38-cp38-macosx_11_0_universal2.whl
Algorithm Hash digest
SHA256 cd9024d0f390a0f4a94ea24c1c3ad7ce0d8e2ad339a4e8dc675804c8350765e5
MD5 8e1e47421aff841310a63f9afbdb67ed
BLAKE2b-256 f3e695acb3a68f3531aea8f511d9be039d62f81ebc281e539400b9e87252df62

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 26455a743380d180439b0f215fcc2127f988cb57215212d5150b096351461eab
MD5 139af352c376e1f5a780342983705d49
BLAKE2b-256 7bfe992d01428f2779997343be68c5b057c918b056185d5df4063948b29ef6e2

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 63b5a04c330ed18a70d9fbc8cab3333f985d5faf6e25e028db331e83a6bf60c0
MD5 5b84a7d30e17684e6a34ba36a1d5e54e
BLAKE2b-256 8d6ee6475a524649cbdee4c09f82f966c62ed766fc83391b3cc4a0efee27b693

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 26aaaaed03f403323c09631348df500bebb68f7ee19d293dbbbcbf8b68abfe7c
MD5 f3acf854d35336d0a4e812131bcf80db
BLAKE2b-256 6912de862d1587b07f8853688913eda629053fa8e0f23d6d7a20d065dd18596e

See more details on using hashes here.

File details

Details for the file optimal1dclustering-1.0.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for optimal1dclustering-1.0.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3bba541a504a235f6559e602831b3126f81997f3679e186c4353a787e5ac29b5
MD5 f3ab93d330fe64976a8200bd9b4a5416
BLAKE2b-256 c9fea0189218e36c3118d5e066afe4eb93bbfe3e242b982fd30acd55a8475bde

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page