Estimate the true number of clusters for k-means clustering using the Cluster Consistency Criterion (CCC).
Project description
This algorithm follows the rationale that true cluster centres should be similar in random split-halves of the data. If too many clusters are specified, the cluster centres will become driven by random sampling error.
The CCC implements this as follows. For each number of clusters, the data are split into random halves for a given number of splits (e.g., 20). For each split, a k-means cluster analysis is run on each half separately. The distances between most-similar cluster centres are summed. The similarity score is e^(-distance_sum). The mean similarity score over random splits is the score for the given number of clusters.
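The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the package's own code: the function names (kmeans, split_half_similarity, ccc_score) are hypothetical, a simple Lloyd's k-means stands in for whatever k-means implementation the package uses, and "most-similar" is assumed to mean the nearest centre (Euclidean distance) in the other half.

```python
import math
import random


def kmeans(points, k, n_iter=50, rng=None):
    """Plain Lloyd's k-means on a list of tuples; returns a list of k centres."""
    if rng is None:
        rng = random.Random()
    centres = rng.sample(points, k)
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        new_centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[j]
            for j, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def split_half_similarity(points, k, rng):
    """Similarity e^(-distance_sum) for one random split of the data into halves."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    centres_a = kmeans(shuffled[:half], k, rng=rng)
    centres_b = kmeans(shuffled[half:], k, rng=rng)
    # Assumption: each centre in half A is matched to its nearest centre in half B.
    distance_sum = sum(min(math.dist(a, b) for b in centres_b) for a in centres_a)
    return math.exp(-distance_sum)


def ccc_score(points, k, n_splits=20, seed=0):
    """Mean similarity over n_splits random split-halves for a given k."""
    rng = random.Random(seed)
    total = sum(split_half_similarity(points, k, rng) for _ in range(n_splits))
    return total / n_splits
```

Because the score is e^(-distance_sum), it always lies between 0 and 1, and it approaches 1 when the two halves produce nearly identical centres.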
The best estimate of the true number of clusters is determined by where the improvement in score drops off, which occurs when the number of clusters becomes higher than the true number of clusters.
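The description does not spell out how the drop-off is located, so the following is only one possible reading, labelled as an assumption: choose the number of clusters just before the largest fall in score. The function name pick_k_before_largest_drop and the example scores are hypothetical and are not taken from the package.

```python
def pick_k_before_largest_drop(n_vec, scores_per_n):
    """Return the number of clusters just before the largest fall in score.

    Assumption: the "drop-off" is the largest decrease between consecutive
    entries of scores_per_n; the package may use a different rule.
    """
    if len(n_vec) != len(scores_per_n) or len(n_vec) < 2:
        raise ValueError("need at least two (n, score) pairs of equal length")
    # drops[i] is how much the score falls when going from n_vec[i] to n_vec[i + 1].
    drops = [scores_per_n[i] - scores_per_n[i + 1] for i in range(len(n_vec) - 1)]
    i_best = max(range(len(drops)), key=drops.__getitem__)
    return n_vec[i_best]


# Illustrative (made-up) scores: stable up to 3 clusters, then a sharp fall.
example_n = [1, 2, 3, 4, 5]
example_scores = [0.9, 0.85, 0.8, 0.2, 0.1]
best = pick_k_before_largest_drop(example_n, example_scores)
```

With these made-up scores the largest fall is between 3 and 4 clusters, so the sketch selects 3.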
The file test.py gives an example and simulation script. Usage is:
O = teg_CCC.get_best_k_CCC(X)
where X is a 2D array of shape N_Observations x N_Variables. The optional argument max_n_clusters is set to 10 by default. The output is a dictionary with the estimated true number of clusters (best_n), the similarity score per number of clusters (scores_per_n), and the associated numbers of clusters (n_vec).
File details
Details for the file teg_CCC-0.0.3.tar.gz.
File metadata
- Download URL: teg_CCC-0.0.3.tar.gz
- Upload date:
- Size: 3.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | cdf745cdf4b4326ab9e61dd4cc90bd89522f73b6f9f44843c1cce904042f5e2b |
MD5 | f6c0f40020ffeacb427d04dd5fcbf44a |
BLAKE2b-256 | b56331da6fc8b67d3613cbab995a952979d0ddd39559f20b90f20702deaf7c85 |
File details
Details for the file teg_CCC-0.0.3-py3-none-any.whl.
File metadata
- Download URL: teg_CCC-0.0.3-py3-none-any.whl
- Upload date:
- Size: 3.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 23d84b54aa26bb078d4d5719ef92e3b7937b5bdc06c7ea5c043dcc972ae635e5 |
MD5 | 3f65b1b82bdd34c242f827f347bdf047 |
BLAKE2b-256 | 6b1ddfa0378d7c647f7bb6f47341da4b0e1ab5812fdab196b71466326f5fefd0 |