Find the best K value for k-means based on the gap statistic

Project description

This script uses the gap statistic to find the best K value for a dataset, running the k-means algorithm many times in the process.
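For reference, the gap statistic as it is usually defined (Tibshirani, Walther and Hastie) is given below. Whether this package computes every term exactly this way is an assumption; the symbols W_k, W*_kb and sd_k do not appear in the package description. W_k is the within-cluster dispersion of the data for k clusters, W*_kb is the same quantity for the b-th of B reference datasets, and sd_k is the standard deviation of log W*_kb over the B reference datasets.

```latex
\mathrm{Gap}(k) = \frac{1}{B}\sum_{b=1}^{B} \log W_{kb}^{*} - \log W_k,
\qquad
s_k = \mathrm{sd}_k \sqrt{1 + \frac{1}{B}}
```

A large Gap(k) means the data cluster much more tightly at that k than structureless reference data does.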

Because k-means depends heavily on its initial points, the results can differ from one set of initial points to another. The script therefore uses sklearn to run k-means many times with different initial points; the number of starting points is one of the parameters of the gap statistic function.
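The effect of the starting points can be seen with a minimal sketch. It uses a small pure-Python k-means as a stand-in for sklearn, so `lloyd`, `best_of` and `sq_dist` are hypothetical helpers for illustration and not part of gapkmean or sklearn. Each restart may converge to a different within-cluster sum of squares, and keeping the smallest one is what running from several starting points achieves.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, k, rng, n_iter=50):
    """One k-means run from a random start; returns the within-cluster sum of squares."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def best_of(points, k, n_init, seed=0):
    """Run k-means n_init times and keep the lowest sum of squares."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs), runs

# Three well-separated 2-D blobs of 30 points each (synthetic data).
data_rng = random.Random(42)
points = [
    (data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 0), (0, 5)]
    for _ in range(30)
]
best, runs = best_of(points, k=3, n_init=10)
print(f"best of 10 runs: {best:.2f}, worst of 10 runs: {max(runs):.2f}")
```

More restarts make the kept result more stable, at the cost of proportionally more computation.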

This module is meant to be imported into other Python scripts and used together with sklearn to find the best K value.

parameters:

refs: np.array or None, the reference (replicated) data to compare against, if you have any; if there is no suitable reference data, pass None and the function generates it automatically;

B: int, the number of replicated samples used to compute the gap statistic; 10 is recommended, and it should not be decreased below that;

K: list, the K values to test, e.g. range(1, 11);

N_init: int, the number of initial starting points for each k-means run under sklearn, used to get a stable clustering result each time; you may not need that many starting points, so it can be reduced to speed up the computation;

n_jobs: int, controls parallel computing, which can speed up the computation; it can only be changed inside the script, not passed as an argument to the function.
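As an illustration of what automatically generated reference data could look like, here is a minimal sketch that draws B reference datasets uniformly from the bounding box of the data, one common choice for the gap statistic. This is an assumption: the description does not say how gapkmean builds its references or what shape `refs` must have, and `uniform_reference` is a hypothetical helper written in plain Python rather than numpy.

```python
import random

def uniform_reference(points, rng):
    """Draw len(points) samples uniformly from the bounding box of `points`."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        for _ in points
    ]

rng = random.Random(0)
data = [(0.0, 10.0), (2.0, 14.0), (1.0, 11.0)]  # toy data: 3 points, 2 features
B = 10  # number of reference datasets, matching the recommended value
refs = [uniform_reference(data, rng) for _ in range(B)]
print(len(refs), len(refs[0]))
```

Each reference dataset has the same number of points and the same feature ranges as the data but no cluster structure, which is what makes it a useful baseline.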

# to install

pip install gapkmean

# to use as a module in python

from gapkmean import gap

# to find the best K value of K-mean algorithm

# note: data should be a numpy.array
gaps, s_k, K = gap.gap_statistic(data, refs=None, B=10, K=range(1,11), N_init = 10)
bestKValue = gap.find_optimal_k(gaps, s_k, K)
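For readers who want to see how a best K can be derived from gap values and their standard errors, here is a sketch of the usual selection rule: pick the smallest K whose gap is at least the next gap minus the next standard error. It is not confirmed that `gap.find_optimal_k` implements exactly this rule; `pick_k` is a hypothetical stand-in, and the numbers are made up for illustration.

```python
def pick_k(gaps, s_k, ks):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); else the largest k tested."""
    ks = list(ks)
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - s_k[i + 1]:
            return ks[i]
    return ks[-1]

# Made-up gap values for K = 1..5: the gap stops improving after K = 3.
example_gaps = [0.10, 0.35, 0.80, 0.78, 0.79]
example_s_k = [0.05, 0.05, 0.05, 0.05, 0.05]
print(pick_k(example_gaps, example_s_k, range(1, 6)))  # prints 3
```

If the gap keeps rising across the whole tested range, the sketch returns the largest K, which usually signals that the range should be widened.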

Project details


Release history

This version

1.0

Download files

Source Distribution

gapkmean-1.0.tar.gz (3.4 kB view details)

Uploaded Source

File details

Details for the file gapkmean-1.0.tar.gz.

File metadata

  • Download URL: gapkmean-1.0.tar.gz
  • Upload date:
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gapkmean-1.0.tar.gz
Algorithm Hash digest
SHA256 59b8ec805cff86857b57628211f5ff45485f52cdd66716e219ea60692145dc66
MD5 aab97ed10fdc789ac85148f452d10efe
BLAKE2b-256 2cab2a89ddef18df4ec8b5d352eaa34c78e7957443206e2932cfe91bdc5718fe
