A function-based implementation of k-means clustering that maintains data association.

K-Means Clustering

This repository documents an implementation of k-means clustering in Python. Usage examples can be found in the tests directory.

What sets this k-means module apart from others is that the user can specify how many leading dimensions of each data point are used for the clustering operation.

For example, given some data where each element is of the form

# Each element would actually be a NumPy array, but the following uses lists for readability.
[
  [1, 2, 3, 4, 5],
  [4, 6, 7, 8, 2],
  ...
]

specifying ndim=3 means that only the first three elements of each data point are used in the clustering computations; the remaining elements are carried along with the point unchanged.
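As a rough illustration of the idea (not the package's own code), the sketch below measures distances on the first `ndim` columns only and then groups the whole rows. The `centroids` values and the variable names are assumptions chosen for the example.

```python
import numpy as np

data = np.array([
    [1, 2, 3, 4, 5],
    [4, 6, 7, 8, 2],
], dtype=float)
ndim = 3

# Hypothetical centroids living in the ndim-dimensional clustering space.
centroids = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 6.0, 7.0],
])

# Distances use only the first `ndim` columns of each row.
features = data[:, :ndim]
distances = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)

# Whole rows are grouped, so columns 3 and 4 stay attached to their points.
groups = {i: data[labels == i] for i in range(len(centroids))}
```

Each value in `groups` still has five columns even though only three took part in the distance computation.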

This is useful for keeping associated data attached to each point where it would otherwise be shuffled. One example is my implementation of image segmentation (segmentation.py) in this same project. Another possible use is maintaining data association for object detection output. Given detections of the form

[xmin, ymin, xmax, ymax, conf, label]  # [bounding box, conf, label]

we may want to cluster the data solely on bounding box information while retaining the confidence score of each detection for further processing.
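To make the "further processing" step concrete, here is a small sketch that picks the highest-confidence detection in each cluster. The detection values are made up, and `clusters` is a hand-built stand-in for a result clustered on the four box columns; neither comes from the package.

```python
import numpy as np

# Hypothetical detections: [xmin, ymin, xmax, ymax, conf, label]
detections = np.array([
    [10, 10, 50, 50, 0.90, 1],
    [12, 11, 52, 49, 0.75, 1],
    [200, 200, 260, 260, 0.60, 2],
    [198, 205, 258, 262, 0.85, 2],
], dtype=float)

# Stand-in for a clustering result computed on the four box columns only:
# a mapping from cluster index to the full detection rows in that cluster.
clusters = {0: detections[:2], 1: detections[2:]}

# Because each row is intact, the confidence column is available per cluster.
best = {key: rows[rows[:, 4].argmax()] for key, rows in clusters.items()}
```

Had the confidence and label columns been stripped before clustering, a separate index would be needed to map each clustered box back to its score.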


Installation

$ python -m pip install kmeans-tjdwill

How it Works

The cluster function returns the clusters as a dict[int, NDArray], where each NDArray contains the elements within one cluster, together with an array of centroids. The keys of the dict range from 0 to k-1, so each key can also be used to index the corresponding cluster centroid in the centroid array.
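For readers who want to see how such a return structure can arise, below is a minimal k-means sketch in plain NumPy. It is not the package's implementation: the name `kmeans_sketch`, the random initialization, the `max_iter` and `seed` parameters, and the reading of `tolerance` as the largest centroid shift between iterations are all assumptions.

```python
import numpy as np

def kmeans_sketch(data, k, ndim, tolerance=1e-3, max_iter=100, seed=0):
    """Toy k-means on the first `ndim` columns; returns (clusters, centroids)."""
    rng = np.random.default_rng(seed)
    features = data[:, :ndim]
    # Start from k distinct data points chosen at random.
    centroids = features[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = np.array([
            features[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        shift = np.linalg.norm(updated - centroids, axis=1).max()
        centroids = updated
        if shift <= tolerance:
            break
    # Group whole rows, so the columns beyond `ndim` stay with their points.
    clusters = {i: data[labels == i] for i in range(k)}
    return clusters, centroids

rng = np.random.default_rng(27)
data = rng.random((15, 5)).round(3)
clusters, centroids = kmeans_sketch(data, k=3, ndim=2)
for key, members in clusters.items():
    # The dict key doubles as the row index into the centroid array.
    print(key, members.shape, centroids[key])
```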

Here is an example using the cluster function:

>>> from kmeans import cluster
>>> import numpy as np
>>> np.random.seed(27)   # For reproducible results
>>> data = np.random.random((15, 5)).round(3)
>>> data[0]
array([0.426, 0.815, 0.735, 0.868, 0.383])
>>> # Cluster using only first two dimensions
>>> clusters, centroids = cluster(data, k=3, ndim=2, tolerance=0.001)
>>> centroids
array([[0.9004  , 0.79    ],
       [0.361375, 0.580125],
       [0.801   , 0.143   ]])
>>> clusters  # visually compare centroids with first two elements of each data entry.
{0: array([[0.979, 0.893, 0.21 , 0.742, 0.663],
       [0.887, 0.858, 0.749, 0.87 , 0.187],
       [0.966, 0.583, 0.092, 0.014, 0.837],
       [0.915, 0.705, 0.387, 0.706, 0.923],
       [0.755, 0.911, 0.242, 0.976, 0.304]]),
 1: array([[0.426, 0.815, 0.735, 0.868, 0.383],
       [0.326, 0.373, 0.794, 0.151, 0.17 ],
       [0.081, 0.305, 0.783, 0.163, 0.071],
       [0.221, 0.726, 0.849, 0.929, 0.736],
       [0.477, 0.493, 0.595, 0.076, 0.117],
       [0.288, 0.684, 0.52 , 0.877, 0.924],
       [0.489, 0.596, 0.264, 0.992, 0.21 ],
       [0.583, 0.649, 0.911, 0.122, 0.676]]),
 2: array([[0.701, 0.181, 0.599, 0.415, 0.514],
       [0.901, 0.105, 0.673, 0.87 , 0.561]])}
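The visual comparison can also be done numerically. The sketch below copies cluster 2 and its centroid from the output above and checks that the centroid is the mean of the first two columns; the variable names are chosen for illustration only.

```python
import numpy as np

# Cluster 2 and its centroid, copied from the output above.
cluster_2 = np.array([
    [0.701, 0.181, 0.599, 0.415, 0.514],
    [0.901, 0.105, 0.673, 0.870, 0.561],
])
centroid_2 = np.array([0.801, 0.143])

ndim = 2
# The centroid should be the mean of the first `ndim` columns of the members.
recomputed = cluster_2[:, :ndim].mean(axis=0)
print(np.allclose(recomputed, centroid_2))  # True

# The remaining columns were not clustered on but are still attached.
extras = cluster_2[:, ndim:]
```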

Features

  • k-means clustering (no side-effects)
  • k-means clustering w/ animation
    • (2-D & 3-D)
  • image segmentation via kmeans.segmentation.segment_img function

k-means Animation

Using the view_clustering function

2-D Case (Smallest Tolerance Possible)

[Video: kmeans2D_animate.webm]

3-D Case (Tolerance = 0.001)

[Video: kmeans3D_animate.webm]

Image Segmentation

Segment an image into a user-specified number of color groups (k).

Two coloring options are available:

Averaged Colors

k=4

[Image: seg_groups04]

k=10

[Image: seg_groups10]

Random Colors

k=4

[Image: seg_rand_groups04_cpy]
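The two coloring options can be sketched on a tiny image. This is only a guess at the general approach, not the behavior of segment_img: the [R, G, B, row, col] pixel layout, the hand-built `clusters` dict, and all variable names are assumptions.

```python
import numpy as np

# Tiny hypothetical 2x2 RGB image: two reddish and two bluish pixels.
image = np.array([
    [[250, 10, 10], [240, 20, 20]],
    [[10, 10, 250], [20, 20, 240]],
], dtype=float)
height, width, _ = image.shape

# One row per pixel: [R, G, B, row, col]. Only the color is clustered on.
rows, cols = np.indices((height, width))
pixels = np.column_stack([image.reshape(-1, 3), rows.ravel(), cols.ravel()])

# Stand-in for a clustering result on the first three columns (ndim=3).
clusters = {0: pixels[:2], 1: pixels[2:]}

rng = np.random.default_rng(0)
averaged = np.zeros_like(image)
randomized = np.zeros_like(image)
for members in clusters.values():
    # The retained row/col columns say where each pixel belongs in the image.
    r = members[:, 3].astype(int)
    c = members[:, 4].astype(int)
    averaged[r, c] = members[:, :3].mean(axis=0)      # averaged-color option
    randomized[r, c] = rng.integers(0, 256, size=3)   # random-color option
```

The pixel coordinates ride along as extra columns, which is what allows the image to be rebuilt after the rows have been regrouped by color.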


Developed With

  • Python (3.12.1)
  • NumPy (1.26.2)
  • Matplotlib (3.8.4)

However, no features specific to Python 3.12 were used.
