A simple library for drawing elbow plots for K-means clustering.

ElbowPlot

ElbowPlot is a Python library for visualizing the elbow method, a heuristic for choosing the number of clusters in K-means clustering. The method is useful in unsupervised learning: it looks for the point at which the decrease in the within-cluster sum of squares (WCSS) begins to level off, forming an "elbow" in the plot.

What is the Elbow Method?

The elbow method plots the WCSS against the number of clusters. WCSS is the sum of squared distances between each point and the centroid of the cluster it is assigned to, added up over all clusters. As the number of clusters increases, WCSS generally keeps decreasing, because points end up closer to their assigned centroids. The goal is to identify the number of clusters at which this decrease begins to level off (forming an elbow). Adding clusters beyond the elbow yields only small reductions in WCSS and may lead to overfitting.
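
To make the definition concrete, here is a minimal sketch that computes WCSS by hand with NumPy for a tiny dataset. The `wcss` helper and the example points are illustrative assumptions, not part of ElbowPlot.

```python
import numpy as np

def wcss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = points - centroids[labels]  # one row per point
    return float(np.sum(diffs ** 2))

# Four points forming two tight pairs that lie far apart (illustrative data).
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# k = 1: every point belongs to one cluster centered on the overall mean.
labels_k1 = np.zeros(len(points), dtype=int)
centroids_k1 = points.mean(axis=0, keepdims=True)
wcss_k1 = wcss(points, labels_k1, centroids_k1)  # 104.0

# k = 2: each pair gets its own centroid.
labels_k2 = np.array([0, 0, 1, 1])
centroids_k2 = np.array([points[labels_k2 == c].mean(axis=0) for c in (0, 1)])
wcss_k2 = wcss(points, labels_k2, centroids_k2)  # 4.0
```

Going from one cluster to two drops WCSS from 104 to 4 here. Splitting further can only remove the remaining 4, which is the kind of leveling off the elbow method looks for.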

Installation

You can install ElbowPlot directly from PyPI:

pip install elbowplot

Dependencies

ElbowPlot requires the following Python libraries:

  • NumPy
  • Matplotlib
  • scikit-learn

These dependencies will be automatically installed when you install ElbowPlot.

Usage

Basic Example

Here's a simple example demonstrating how to use ElbowPlot with a synthetic dataset:

import numpy as np
from elbowplot.core import elbow_plot

# Generate some random data
np.random.seed(0)
data = np.random.rand(150, 2)  # 150 points in 2 dimensions

# Determine the optimal number of clusters by visualizing the elbow plot
elbow_plot(data, 10)  # Test from 1 to 9 clusters

Output

When you run the above code, you will see a plot with the number of clusters on the X-axis and the inertia (WCSS) on the Y-axis. The plot will have points marked for each number of clusters tested, and a line connecting these points. Look for the point where the inertia begins to decrease at a slower rate, which typically resembles an "elbow".

Understanding the Output

When you run the elbow_plot function, it generates a line plot that visualizes the relationship between the number of clusters and the within-cluster sum of squares (WCSS), also known as inertia. Here's what you should look for in the plot:

  • X-axis: Represents the number of clusters tested. In the example provided, this ranges from 1 to 9 clusters.
  • Y-axis: Represents the inertia for each cluster count. Inertia is calculated as the sum of the squared distances between each point and its nearest cluster center.

Key Features of the Plot

  • Data Points: Each point on the plot corresponds to the inertia calculated with a specific number of clusters.
  • Line Connecting Points: A line connects these points, making it easier to see the rate at which inertia decreases as the number of clusters increases.
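
For readers who want to see how such a plot can be produced, the following sketch builds a comparable figure directly with scikit-learn and Matplotlib. It is not ElbowPlot's actual implementation. The `inertia_curve` helper is hypothetical, and the range of 1 to `max_clusters - 1` is an assumption chosen to mirror the "1 to 9 clusters" behavior described in the basic example.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

def inertia_curve(data, max_clusters):
    """Fit K-means for k = 1 .. max_clusters - 1 and collect each fit's inertia."""
    ks = list(range(1, max_clusters))
    inertias = []
    for k in ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
        inertias.append(model.inertia_)  # WCSS of the fitted model
    return ks, inertias

# Same kind of synthetic data as in the basic example.
rng = np.random.default_rng(0)
data = rng.random((150, 2))
ks, inertias = inertia_curve(data, 10)

# One marker per cluster count, joined by a line.
fig, ax = plt.subplots()
ax.plot(ks, inertias, marker="o")
ax.set_xlabel("Number of clusters")
ax.set_ylabel("Inertia (WCSS)")
ax.set_title("Elbow plot")
```

Call `plt.show()` afterwards to display the figure. Fixing `random_state` keeps the curve reproducible between runs.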

Identifying the Elbow

The "elbow" point on the plot is the key feature to look for. It represents the number of clusters at which the decrease in inertia shifts from being rapid to more gradual. Here’s how you can identify it:

  1. Rapid Decline: Initially, as you increase the number of clusters from 1 onwards, the inertia decreases sharply.
  2. Leveling Off: After a certain number of clusters, the decrease slows markedly, indicating that adding more clusters no longer improves the clustering much. The point where the curve bends most sharply is known as the "elbow".

Example Interpretation

If the elbow occurs at 4 clusters, this suggests that increasing the number of clusters beyond 4 will bring diminishing returns in terms of lowering inertia. Thus, 4 is a reasonable choice for the number of clusters for the given data.
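
Reading the elbow by eye can be backed up with a simple numerical check. The sketch below uses one common heuristic, picking the cluster count where the discrete second difference of the inertia curve is largest, that is, where the curve bends most sharply. ElbowPlot does not provide this; the `find_elbow` helper and the inertia values are made up for illustration.

```python
import numpy as np

def find_elbow(ks, inertias):
    """Return the k at which the inertia curve bends most sharply.

    Uses the discrete second difference
    inertia[i-1] - 2 * inertia[i] + inertia[i+1],
    which is largest where a steep decline turns into a shallow one.
    """
    second_diff = np.diff(np.asarray(inertias, dtype=float), n=2)
    # second_diff[j] is centered on index j + 1 of the original curve.
    return ks[int(np.argmax(second_diff)) + 1]

# Hypothetical inertia values for k = 1 .. 9 with a visible bend at k = 4.
ks = list(range(1, 10))
inertias = [100, 60, 30, 12, 10, 9, 8.5, 8, 7.8]

elbow_k = find_elbow(ks, inertias)  # 4
```

This heuristic is sensitive to noise in the curve, so treat its answer as a starting point and confirm it against the plot.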

Visual Example

Below is a theoretical representation of an elbow plot:

[Figure: elbow-method]

Contributing

Contributions to ElbowPlot are welcome! Please fork the project, make your changes, and submit a pull request on GitHub.

License

ElbowPlot is open-source software licensed under the MIT license. See the LICENSE file for more details.
