Optimal univariate (1D) clustering based on Ckmeans.1d.dp

CKmeans: Optimal Univariate Clustering

Ckmeans clustering is an improvement on 1-dimensional (univariate) heuristic-based clustering approaches such as Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song (2011) as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.

Minimizing the difference within groups (what Wang & Song refer to as withinss, or within sum-of-squares) means that each group is as homogeneous as possible and the data is split into representative groups. This is very useful for visualization, where one may wish to represent a continuous variable in discrete colour or style groups. This function can provide groups that emphasize differences between data.
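To make the objective concrete, the following sketch computes the within-group sum of squared deviations for two candidate groupings of the same values. The helper name withinss is borrowed from the text purely for illustration; it is not part of this package's API.

```python
def withinss(groups):
    """Sum, over all groups, of squared deviations from each group's mean."""
    total = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        total += sum((x - mean) ** 2 for x in group)
    return total


# Two ways of splitting the same sorted values into two groups
good = [[1.0, 2.0, 3.0, 4.0], [100.0, 101.0, 102.0, 103.0]]
bad = [[1.0, 2.0], [3.0, 4.0, 100.0, 101.0, 102.0, 103.0]]

good_score = withinss(good)  # each group contributes 5.0
bad_score = withinss(bad)    # far larger: one group mixes low and high values
```

The grouping with the smaller score is the more homogeneous one, and it is this quantity that the algorithm minimizes over all possible groupings into k groups.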

Being a dynamic programming approach, this algorithm is based on two matrices that store incrementally computed values: one for squared deviations and one for backtracking indexes.
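The role of the two matrices can be illustrated with a deliberately naive O(k·n²) dynamic programme in pure Python. This is only a sketch of the general idea, not the optimized implementation in the Rust crate; the names ckmeans_sketch, cost, back and ssq are made up for this example.

```python
def ckmeans_sketch(values, k):
    """Naive O(k * n^2) dynamic programme; returns k clusters of sorted values."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice xs[j..i] in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssq(j, i):
        """Squared deviation of xs[j..i], clamped at zero against rounding error."""
        count = i - j + 1
        total = s1[i + 1] - s1[j]
        return max(s2[i + 1] - s2[j] - total * total / count, 0.0)

    inf = float("inf")
    # cost[q][i]: best withinss for xs[0..i] split into q + 1 clusters
    # back[q][i]: index where the last of those clusters starts
    cost = [[inf] * n for _ in range(k)]
    back = [[0] * n for _ in range(k)]
    for i in range(n):
        cost[0][i] = ssq(0, i)
    for q in range(1, k):
        for i in range(q, n):
            for j in range(q, i + 1):
                candidate = cost[q - 1][j - 1] + ssq(j, i)
                if candidate < cost[q][i]:
                    cost[q][i] = candidate
                    back[q][i] = j

    # Walk the backtracking matrix from the last cluster to the first
    clusters = []
    end = n - 1
    for q in range(k - 1, -1, -1):
        start = back[q][end]
        clusters.append(xs[start:end + 1])
        end = start - 1
    return clusters[::-1]
```

Each entry of cost is built from the previous row, which is what makes the result optimal rather than heuristic; back records the choice made at each step so the clusters can be recovered afterwards.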

Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this number must be provided explicitly. It does, however, provide the breaks method to aid labelling.

Implementation

This library uses the ckmeans Rust crate, by the same author, implementing the ckmeans and breaks methods.

ckmeans(data, k)

Cluster data into k bins

Minimizing the difference within groups (what Wang & Song refer to as withinss, or within sum-of-squares) means that each group is as homogeneous as possible and the data are split into representative groups. This is very useful for visualization, where one may wish to represent a continuous variable in discrete colour or style groups. This function can provide groups, or “classes”, that emphasize differences between data.

breaks(data, k)

Calculate k - 1 breaks in the data, distinguishing classes for labelling or visualisation

The boundaries of the classes returned by ckmeans are “ugly” in the sense that the values returned are the lower bound of each cluster, which aren't always practical for labelling, since they may have many decimal places. To create a legend, the values should be rounded. However, the rounding might be either too loose (and would thus result in spurious decimal places), or too strict, resulting in classes ranging “from x to x”. A better approach is to choose the roundest number that separates the lowest point of a class from the highest point of the preceding class, thus giving just enough precision to distinguish the classes. This function is closer to what Jenks returns: k - 1 “breaks” in the data, useful for labelling.
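One plausible way to pick such a round separating value is to round the midpoint of the gap to the order of magnitude of the gap itself. The sketch below follows that assumption; round_break is a hypothetical helper, and the rule it applies is not necessarily the exact one used by this library or by visionscarto.

```python
import math


def round_break(low, high):
    """Pick a round value b with low < b <= high (hypothetical helper)."""
    if not low < high:
        raise ValueError("low must be strictly less than high")
    # Start at the order of magnitude of the gap, then refine if needed
    digits = -math.floor(math.log10(high - low))
    midpoint = (low + high) / 2
    for extra in range(4):
        candidate = round(midpoint, digits + extra)
        if low < candidate <= high:
            return candidate
    return high  # fall back to the exact lower bound of the upper class


# Derive k - 1 breaks from k sorted clusters
clusters = [[1.0, 2.0, 3.0, 4.0], [100.0, 101.0, 102.0, 103.0]]
found = [round_break(prev[-1], nxt[0]) for prev, nxt in zip(clusters, clusters[1:])]
```

For the two clusters shown, the gap runs from 4.0 to 100.0, so the midpoint 52.0 is rounded to the nearest ten.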

This method is a port of the visionscarto method of the same name.

Benchmarks

Install optional dependencies, then run benchmark.py.

ckmeans-1d-dp is about 20 % faster than this library, but note that it only returns, for each input value, the index of the cluster to which it belongs. If you actually want the clustered data, you need to group it yourself, which I strongly suspect might make it slower overall. On the other hand, if all you want is indices, it may be a better choice.
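The extra grouping step mentioned above might look like the following sketch. It assumes zero-based integer labels, one per input value; the labels list is written by hand for illustration and is not actual output from ckmeans-1d-dp.

```python
def group_by_label(values, labels, k):
    """Collect values into k lists according to their zero-based cluster label."""
    clusters = [[] for _ in range(k)]
    for value, label in zip(values, labels):
        clusters[label].append(value)
    return clusters


values = [1.0, 100.0, 2.0, 101.0]
labels = [0, 1, 0, 1]  # hypothetical per-value cluster indices
grouped = group_by_label(values, labels, 2)
```

This pass is linear in the size of the input, but it runs in Python rather than in compiled code, which is the overhead the comparison above alludes to.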

Examples

from ckmeans import ckmeans
import numpy as np


data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
clusters = 2
result = ckmeans(data, clusters)
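expected = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([100.0, 101.0, 102.0, 103.0]),
]
# Compare cluster by cluster: == on lists of arrays has an ambiguous truth value
assert len(result) == len(expected)
assert all(np.array_equal(r, e) for r, e in zip(result, expected))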

# Second example: compute the k - 1 breaks between the classes
from ckmeans import breaks
import numpy as np


data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
clusters = 2
result = breaks(data, clusters)
assert result == [50.0,]
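
Once the breaks are known, assigning a value to its class for a legend or colour scale is a binary search. The sketch below uses only the standard library and hard-codes the break value from the example above; the palette and the colour_for helper are hypothetical.

```python
from bisect import bisect_right

breaks_found = [50.0]  # the value asserted in the example above
palette = ["lightblue", "darkblue"]  # hypothetical colours, one per class


def colour_for(value):
    """Return the colour of the class that value falls into."""
    # A value equal to a break is placed in the upper class
    return palette[bisect_right(breaks_found, value)]
```

With k - 1 breaks there are k possible positions, so the palette needs exactly k entries.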

License

Blue Oak Model License 1.0.0

Download files

Download the file for your platform.

Source Distribution

ckmeans-0.2.7.tar.gz (10.0 kB): Source

Built Distributions

ckmeans-0.2.7-cp310-abi3-win_amd64.whl (146.0 kB): CPython 3.10+, Windows x86-64
ckmeans-0.2.7-cp310-abi3-win32.whl (134.4 kB): CPython 3.10+, Windows x86
ckmeans-0.2.7-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (253.7 kB): CPython 3.10+, manylinux: glibc 2.17+ x86-64
ckmeans-0.2.7-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (260.2 kB): CPython 3.10+, manylinux: glibc 2.17+ ARM64
ckmeans-0.2.7-cp310-abi3-manylinux_2_5_i686.manylinux1_i686.whl (258.0 kB): CPython 3.10+, manylinux: glibc 2.5+ i686
ckmeans-0.2.7-cp310-abi3-macosx_11_0_arm64.whl (223.9 kB): CPython 3.10+, macOS 11.0+ ARM64
ckmeans-0.2.7-cp310-abi3-macosx_10_12_x86_64.whl (228.8 kB): CPython 3.10+, macOS 10.12+ x86-64

