Optimal univariate (1D) clustering based on Ckmeans.1d.dp
CKmeans: Optimal Univariate Clustering
Ckmeans clustering is an improvement on 1-dimensional (univariate) heuristic-based clustering approaches such as Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song (2011) as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.
Minimizing the difference within groups – what Wang & Song refer to as withinss, or within sum-of-squares – means that groups are optimally homogeneous and the data is split into representative groups. This is very useful for visualization, where one may wish to represent a continuous variable in discrete colour or style groups. This function can provide groups that emphasize differences between data.
Being a dynamic programming approach, this algorithm is based on two matrices that store incrementally computed values for squared deviations and backtracking indexes.
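To make the two-matrix idea concrete, here is a minimal sketch of a straightforward O(k·n²) dynamic programme in Python. It is an illustration only, not the code used by this package or the Rust crate, which may use a faster formulation; the name `ckmeans_sketch` and the matrix names `cost` and `back` are assumptions made for this example.

```python
import numpy as np


def ckmeans_sketch(values, k):
    """Optimal 1D clustering by dynamic programming (illustrative only)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviations of x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def ssq(i, j):
        # Within-group sum of squares of x[i:j] (j exclusive, j > i).
        total = s1[j] - s1[i]
        return max(s2[j] - s2[i] - total * total / (j - i), 0.0)

    # cost[m, j]: minimal withinss when x[:j] is split into m clusters.
    # back[m, j]: start index of the last cluster in that optimal split.
    cost = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1, i] + ssq(i, j)
                if c < cost[m, j]:
                    cost[m, j] = c
                    back[m, j] = i

    # Walk the backtracking matrix from the end to recover the clusters.
    clusters = []
    j = n
    for m in range(k, 0, -1):
        i = back[m, j]
        clusters.append(x[i:j])
        j = i
    return clusters[::-1]
```

Calling `ckmeans_sketch([1, 2, 3, 4, 100, 101, 102, 103], 2)` splits the sorted values into `[1, 2, 3, 4]` and `[100, 101, 102, 103]`.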
Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this information needs to be explicitly provided. It does, however, provide the breaks method to aid labelling.
Implementation
This library uses the ckmeans Rust crate, by the same author, implementing the ckmeans and breaks methods.
ckmeans(data, k)
Cluster data into k bins
Minimizing the difference within groups (what Wang & Song refer to as withinss,
or within sum-of-squares) means that groups are optimally homogeneous and the data are
split into representative groups. This is very useful for visualization, where one may wish to
represent a continuous variable in discrete colour or style groups. This function can provide
groups (or “classes”) that emphasize differences between data.
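The quantity being minimized can be computed directly for any candidate grouping. The sketch below uses only NumPy; the helper name `withinss` is borrowed from the term above and is not a function exported by this package.

```python
import numpy as np


def withinss(clusters):
    """Sum, over all clusters, of squared deviations from the cluster mean."""
    total = 0.0
    for cluster in clusters:
        values = np.asarray(cluster, dtype=float)
        total += float(np.sum((values - values.mean()) ** 2))
    return total


# Two ways of splitting the same eight values into two groups.
good = [[1, 2, 3, 4], [100, 101, 102, 103]]
bad = [[1, 2, 3, 4, 100], [101, 102, 103]]

print(withinss(good))  # 10.0
print(withinss(bad))   # 7612.0
```

The optimal clustering is the grouping with the smallest such total, which is why the first split is preferred.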
breaks(data, k)
Calculate k - 1 breaks in the data, distinguishing classes for labelling or visualisation
The boundaries of the classes returned by ckmeans are “ugly” in the sense that the values
returned are the lower bound of each cluster, which aren't always practical for labelling, since they
may have many decimal places. To create a legend, the values should be rounded. However, the
rounding might be either too loose (resulting in spurious decimal places) or too
strict (resulting in classes ranging “from x to x”). A better approach is to choose the roundest
number that separates the lowest point of a class from the highest point of the preceding
class, thus giving just enough precision to distinguish the classes.
This function is closer to what Jenks returns: k - 1 “breaks” in the data, useful for labelling.
This method is a port of the visionscarto method of the same name.
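One way to picture the "roundest number" idea is to round the midpoint between two adjacent classes to the order of magnitude of the gap between them. The sketch below is an assumption about how such a rule could work, not a transcription of this package's code or of the visionscarto method; `round_break` is a hypothetical helper.

```python
import math


def round_break(lo, hi):
    """Return a round number between lo and hi (requires lo < hi).

    lo is the highest point of one class, hi the lowest point of the next.
    """
    if not lo < hi:
        raise ValueError("lo must be strictly less than hi")
    midpoint = (lo + hi) / 2
    # Largest power of ten that fits in the gap, e.g. 1 for a gap of 96.
    exponent = math.floor(math.log10(hi - lo))
    # Round the midpoint to that power of ten (negative ndigits rounds
    # to tens, hundreds, and so on).
    return round(midpoint, -exponent)


print(round_break(4.0, 100.0))  # 50.0
```

Because the rounding step never exceeds the gap, the result stays within the interval from `lo` to `hi`, while using no more digits than are needed to tell the two classes apart.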
Benchmarks
Install optional dependencies, then run benchmark.py.
ckmeans-1d-dp is about 10 % slower than this package. Note also that it only returns indices identifying the cluster to which each input value belongs; if you want the clustered values themselves, you need to group them yourself.
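For illustration, grouping values by a per-value label array takes a few lines of NumPy. The `labels` array below is written by hand and only stands in for that kind of output; it was not produced by calling ckmeans-1d-dp, and `group_by_label` is a hypothetical helper.

```python
import numpy as np


def group_by_label(data, labels):
    """Split data into one array per distinct label, in label order."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    return [data[labels == label] for label in np.unique(labels)]


data = np.array([1.0, 100.0, 2.0, 101.0, 3.0])
labels = np.array([0, 1, 0, 1, 0])  # hand-written stand-in for index output

groups = group_by_label(data, labels)
# groups[0] holds 1.0, 2.0, 3.0 and groups[1] holds 100.0, 101.0
```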
Examples
from ckmeans import ckmeans
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
clusters = 2
result = ckmeans(data, clusters)
expected = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([100.0, 101.0, 102.0, 103.0]),
]
# Compare cluster by cluster: == on lists of NumPy arrays is ambiguous
assert len(result) == len(expected)
assert all(np.array_equal(r, e) for r, e in zip(result, expected))

from ckmeans import breaks
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
clusters = 2
result = breaks(data, clusters)
assert result == [50.0]
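Once the breaks are known, they can be used to assign each value to a class and to build legend labels. The sketch below relies only on NumPy and hard-codes the break from the example above rather than calling this package; the label format is an arbitrary choice for illustration.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
cuts = [50.0]  # the break from the example above, hard-coded here

# np.digitize returns, for each value, the index of the class it falls in:
# 0 for values below the first break, 1 for values from it upwards, etc.
classes = np.digitize(data, cuts)

# Build one "from ... to ..." label per class from consecutive edges.
edges = [float(data.min()), *cuts, float(data.max())]
labels = [f"{lo:g} to {hi:g}" for lo, hi in zip(edges[:-1], edges[1:])]

print(classes.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1]
print(labels)            # ['1 to 50', '50 to 103']
```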
License