Optimal univariate (1D) clustering based on Ckmeans.1d.dp

CKmeans: Optimal Univariate Clustering

Ckmeans clustering is an improvement on 1-dimensional (univariate) heuristic-based clustering approaches such as Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song (2011) as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.

Minimizing the difference within groups (what Wang & Song refer to as withinss, or within sum-of-squares) means that each group is as homogeneous as possible and the data are split into representative groups. This is very useful for visualization, where one may wish to represent a continuous variable in discrete colour or style groups. This function can provide groups that emphasize differences between data.
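
To make withinss concrete, here is a minimal sketch in plain Python. The helper name `withinss` and the two hand-made partitions are chosen for illustration only; they are not part of this package's API.

```python
def withinss(groups):
    """Total within-group sum of squared deviations from each group's mean."""
    total = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        total += sum((x - mean) ** 2 for x in group)
    return total


# Two ways of splitting the same eight values into two groups.
good = [[1.0, 2.0, 3.0, 4.0], [100.0, 101.0, 102.0, 103.0]]
bad = [[1.0, 2.0], [3.0, 4.0, 100.0, 101.0, 102.0, 103.0]]

print(withinss(good))  # 10.0
print(withinss(good) < withinss(bad))  # True
```

The partition with the lower withinss keeps similar values together, which is the quantity the clustering minimizes.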

Because it is a dynamic programming approach, the algorithm relies on two matrices that store incrementally computed values: the squared deviations and the backtracking indexes.
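
The two-matrix idea can be sketched as follows. This is a simple quadratic-time illustration in plain Python, written under the assumption that a textbook formulation is enough to convey the idea; it is not the code of the Rust crate, and the names `ckmeans_dp`, `cost`, and `back` are hypothetical.

```python
def ckmeans_dp(values, k):
    """Optimal 1D clustering into k groups by dynamic programming (O(k * n^2) sketch)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give any segment's squared deviation in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssq(j, i):
        """Sum of squared deviations of xs[j..i] (inclusive) from their mean."""
        total = s1[i + 1] - s1[j]
        return max(0.0, (s2[i + 1] - s2[j]) - total * total / (i - j + 1))

    inf = float("inf")
    # cost[q][i]: minimal withinss for xs[0..i] split into q + 1 clusters.
    # back[q][i]: index at which the last of those clusters starts.
    cost = [[inf] * n for _ in range(k)]
    back = [[0] * n for _ in range(k)]
    for i in range(n):
        cost[0][i] = ssq(0, i)
    for q in range(1, k):
        for i in range(q, n):
            for j in range(q, i + 1):
                candidate = cost[q - 1][j - 1] + ssq(j, i)
                if candidate < cost[q][i]:
                    cost[q][i] = candidate
                    back[q][i] = j

    # Walk the backtracking matrix from the last point to recover the clusters.
    clusters = []
    end = n - 1
    for q in range(k - 1, -1, -1):
        start = back[q][end]
        clusters.append(xs[start:end + 1])
        end = start - 1
    clusters.reverse()
    return clusters


print(ckmeans_dp([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0], 2))
```

The first matrix is filled one cluster count at a time, and the second records where the last cluster begins so the optimal split can be read back at the end.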

Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this information needs to be explicitly provided. It does provide the roundbreaks method to aid labelling, however.

Implementation

This library uses the ckmeans Rust crate, by the same author, implementing the ckmeans and breaks methods.

ckmeans(data, k)

Cluster data into k bins

Minimizing the difference within groups (what Wang & Song refer to as withinss, or within sum-of-squares) means that each group is as homogeneous as possible and the data are split into representative groups. This is very useful for visualization, where one may wish to represent a continuous variable in discrete colour or style groups. This function can provide groups (or “classes”) that emphasize differences between data.

breaks(data, k)

Calculate k - 1 breaks in the data, distinguishing classes for labelling or visualisation

The boundaries of the classes returned by ckmeans are “ugly” in the sense that the values returned are the lower bound of each cluster, which isn't always practical for labelling, since these values may have many decimal places. To create a legend, the values should be rounded. However, the rounding might be either too loose (resulting in spurious decimal places) or too strict (resulting in classes ranging “from x to x”). A better approach is to choose the roundest number that separates the lowest point of a class from the highest point of the preceding class, thus giving just enough precision to distinguish the classes. This function is closer to what Jenks returns: k - 1 “breaks” in the data, useful for labelling.

This method is a port of the visionscarto method of the same name.
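
As a rough illustration of "just enough precision", the sketch below rounds the midpoint between two neighbouring classes at progressively finer precision and stops at the first value that still lies strictly between them. This is an assumption about one way to pick a round separator, not necessarily the rule the library or the visionscarto method uses; `roundest_between` and `max_digits` are hypothetical names.

```python
def roundest_between(lo, hi, max_digits=15):
    """Return a round number strictly between lo and hi.

    lo is the highest point of the preceding class and hi is the lowest
    point of the next class. Coarse precisions are tried first.
    """
    if not lo < hi:
        raise ValueError("lo must be strictly less than hi")
    mid = (lo + hi) / 2
    # Negative ndigits round to tens, hundreds, ...; positive to decimals.
    for ndigits in range(-max_digits, max_digits + 1):
        candidate = round(mid, ndigits)
        if lo < candidate < hi:
            return candidate
    return mid  # fall back to the unrounded midpoint


print(roundest_between(4.0, 100.0))  # 50.0
print(roundest_between(1.234, 1.267))  # 1.25
```

Two clusters that are far apart get a coarse separator, while clusters that are close together force more decimal places.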

Benchmarks

Install optional dependencies, then run benchmark.py.

ckmeans-1d-dp is about 10% slower than this package. In addition, it returns only the index of the cluster to which each input value belongs; to obtain the clustered values themselves, you need to group them yourself.
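
Grouping values from per-value cluster indices is a small step, sketched here in plain Python. The `labels` list is a hand-written stand-in for index output of that kind; no call to ckmeans-1d-dp is made, and `group_by_label` is a hypothetical helper name.

```python
def group_by_label(values, labels):
    """Collect values into clusters, given one integer cluster label per value."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    # Return the clusters ordered by label.
    return [clusters[label] for label in sorted(clusters)]


values = [1.0, 2.0, 100.0, 3.0, 101.0]
labels = [0, 0, 1, 0, 1]  # assumed index output, one entry per value
print(group_by_label(values, labels))  # [[1.0, 2.0, 3.0], [100.0, 101.0]]
```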

Examples

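# Cluster the data into two groups
from ckmeans import ckmeans
import numpy as np


data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
clusters = 2
result = ckmeans(data, clusters)
expected = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([100.0, 101.0, 102.0, 103.0])
]
# Compare cluster by cluster: `==` on lists of NumPy arrays raises ValueError
assert len(result) == len(expected)
assert all(np.array_equal(r, e) for r, e in zip(result, expected))

# Calculate the single break separating the two groups
from ckmeans import breaks
import numpy as np


data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
clusters = 2
result = breaks(data, clusters)
assert result == [50.0]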
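
Once breaks are known, they can be used to assign each value to a class for a legend or colour scale. The sketch below uses only the standard library and hard-codes the break value 50.0 from the example above rather than calling this package; `classify` is a hypothetical helper, and treating a value equal to a break as belonging to the upper class is an assumption.

```python
from bisect import bisect_right


def classify(values, class_breaks):
    """Return a class index per value; class_breaks must be sorted ascending."""
    # bisect_right places a value equal to a break in the upper class.
    return [bisect_right(class_breaks, value) for value in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0]
class_breaks = [50.0]  # value taken from the breaks example above
class_indices = classify(data, class_breaks)
print(class_indices)  # [0, 0, 0, 0, 1, 1, 1, 1]
```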

License

Blue Oak Model License 1.0.0
