Cluster rgb vectors with divisive kmedians

Project description

Cluster Colors (or other vectors)

Processing and clustering colors from images presents some challenges:

  • Even a small (800x600) image will have up to 480,000 colors.
  • Solutions like PIL's Image.quantize, on the other hand, make the color sample too coarse.
  • Even after reducing color variety, you're still dealing with 480,000 color instances.
  • Solutions like scikit-learn's KMeans handle some of these challenges, but are non-deterministic and not flexible in the ways that I'd like.

I provide three steps here:

Pool colors

Average similar colors. Specifically, this maps an 8-bit-per-channel color space to an n-bit-per-channel color space, then averages the colors in each bin. The nbits argument specifies the number of bits to use for each color channel. The default is 6, which reduces about 16.8 million (2^24) possible colors to about 262 thousand (2^18). The downside is that the boundaries between n-bit bins are arbitrary: a heavy concentration of near-identical colors will be split if a boundary passes through it.

Pooling colors from an image path will write a cache to your temp directory.
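
The binning described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the function name pool_colors and its return shape are hypothetical, and only the nbits parameter (default 6) comes from the description.

```python
from collections import defaultdict


def pool_colors(pixels, nbits=6):
    """Average 8-bit RGB colors that fall into the same n-bit bin.

    Hypothetical helper. Returns a list of (r, g, b, count) tuples, one per
    occupied bin, where (r, g, b) is the mean color of the bin.
    """
    if not 1 <= nbits <= 8:
        raise ValueError("nbits must be between 1 and 8")
    shift = 8 - nbits  # number of low bits dropped from each channel
    sums = defaultdict(lambda: [0, 0, 0, 0])
    for r, g, b in pixels:
        acc = sums[(r >> shift, g >> shift, b >> shift)]
        acc[0] += r
        acc[1] += g
        acc[2] += b
        acc[3] += 1
    # sort by bin so the output order does not depend on pixel order
    return [
        (rs / n, gs / n, bs / n, n) for _, (rs, gs, bs, n) in sorted(sums.items())
    ]


# Channel values 3 and 4 are neighbors, but with nbits=6 they land in
# different bins (3 >> 2 == 0, 4 >> 2 == 1), which is the boundary
# problem mentioned above.
pixels = [(0, 0, 0), (1, 1, 1), (3, 3, 3), (4, 4, 4), (255, 255, 255)]
pooled = pool_colors(pixels, nbits=6)
```

Keeping the count alongside each averaged color is what lets later steps treat a pooled color as many pixels rather than one.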

Cut colors

Reduce the number of colors by recursively splitting the color space along its longest axis. This is a median cut algorithm, but it's not constrained to the x, y, or z axes. The longest axis is determined by the standard deviation of the colors in the cluster. I've made the cut just a little bit smarter than standard median cut, but this is essentially k-medoids without the redistribution step, so it's more efficient but not the best we can do. The num argument specifies the number of colors to reduce to. 512 is a good number, but if you're still missing some nuance, you can increase it.
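
A rough sketch of a median cut along an unconstrained axis follows. It is not the library's code: the function names are hypothetical, the axis is estimated by power iteration on the weighted covariance matrix (one possible reading of "longest axis by standard deviation"), and the "little bit smarter" refinements mentioned above are not reproduced.

```python
import math


def principal_axis(points, weights, iterations=50):
    """Return (axis, std) for the direction of greatest weighted variance.

    Power iteration from a fixed start vector keeps the result
    deterministic. In degenerate cases it can settle on a direction
    that is not the true principal axis.
    """
    total = sum(weights)
    dim = len(points[0])
    mean = [sum(w * p[i] for p, w in zip(points, weights)) / total for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in points]
    cov = [
        [sum(w * c[i] * c[j] for c, w in zip(centered, weights)) / total
         for j in range(dim)]
        for i in range(dim)
    ]
    axis = [1.0 / (i + 1) for i in range(dim)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            return axis, 0.0  # no measurable spread
        axis = [x / norm for x in product]
    variance = sum(axis[i] * cov[i][j] * axis[j] for i in range(dim) for j in range(dim))
    return axis, math.sqrt(max(variance, 0.0))


def split_at_median(points, weights, axis):
    """Split point indices in two at the weighted median projection."""
    order = sorted(
        range(len(points)),
        key=lambda k: (sum(a * x for a, x in zip(axis, points[k])), k),
    )
    half = sum(weights) / 2
    running = 0.0
    for position, k in enumerate(order[:-1], start=1):
        running += weights[k]
        if running >= half:
            return order[:position], order[position:]
    return order[:-1], order[-1:]


def cut_colors(colors, weights, num):
    """Split the widest cluster repeatedly until there are num clusters."""
    clusters = [list(range(len(colors)))]
    while len(clusters) < num:
        best = None
        for idx, members in enumerate(clusters):
            if len(members) < 2:
                continue
            pts = [colors[k] for k in members]
            wts = [weights[k] for k in members]
            axis, std = principal_axis(pts, wts)
            if std > 0 and (best is None or std > best[0]):
                best = (std, idx, axis)
        if best is None:
            break  # nothing left that can be split
        _, idx, axis = best
        members = clusters.pop(idx)
        pts = [colors[k] for k in members]
        wts = [weights[k] for k in members]
        low, high = split_at_median(pts, wts, axis)
        clusters.append([members[k] for k in low])
        clusters.append([members[k] for k in high])
    return clusters


colors = [(0, 0, 0), (10, 10, 10), (200, 200, 200), (210, 210, 210)]
weights = [1, 1, 1, 1]
two_clusters = cut_colors(colors, weights, 2)
```

Each cluster is a list of indices into colors. Because there is no reassignment pass, a color stays on whichever side of the cut it first landed.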

Divisive and Agglomerative clustering

  • Both are deterministic.
  • Both handle frequency, weight, and transparency.
  • Both allow a user-defined proximity matrix, so you can use whatever delta function you like as long as delta(a, a) is 0 and delta(a, b) is never 0 for distinct a and b. Common choices are Euclidean, squared Euclidean, and delta-e.
  • Divisive uses a variation of median cut followed by a k-medoids-like reassignment step that repeats until convergence.
  • Agglomerative uses complete linkage.
  • Divisive is more robust to outliers and will give more even-sized clusters.
  • Divisive child clusters will not necessarily contain (or only contain) the members of the parent.
  • Agglomerative is more likely to separate outliers.
  • Agglomerative is hierarchical.
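
To make the proximity-matrix requirement concrete, here is a small sketch that builds a matrix from a delta function and checks the two conditions listed above. The helper build_proximity_matrix is hypothetical and is not the library's interface. It assumes the vectors are distinct and that delta is symmetric.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.dist(a, b)


def squared_euclidean(a, b):
    """Squared distance: same ordering as Euclidean, no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_proximity_matrix(vectors, delta):
    """Return an n x n list of lists holding delta for every pair.

    Hypothetical helper. Raises ValueError if delta(a, a) is not 0 or if
    delta(a, b) is not positive for two different vectors.
    """
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if delta(vectors[i], vectors[i]) != 0:
            raise ValueError("delta(a, a) must be 0")
        for j in range(i + 1, n):
            d = delta(vectors[i], vectors[j])
            if d <= 0:
                raise ValueError("delta(a, b) must be positive for distinct vectors")
            matrix[i][j] = matrix[j][i] = d
    return matrix


vectors = [(0, 0, 0), (3, 4, 0), (255, 255, 255)]
by_distance = build_proximity_matrix(vectors, euclidean)
by_squared = build_proximity_matrix(vectors, squared_euclidean)
```

Any other delta, such as a delta-e implementation, could be passed the same way as long as it meets the two conditions.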

Divisive clustering is typically better for questions like "What are the five dominant colors in this image?"

Agglomerative clustering is typically better for questions like "How many colors do I need to represent this image with no more than delta == 3 between any two cluster members?"
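
The second question maps directly onto complete linkage, where the distance between two clusters is the largest delta between any pair of their members. The sketch below is a generic illustration of that idea, not the library's implementation. The function complete_linkage and its max_delta parameter are hypothetical names.

```python
def complete_linkage(proximity, max_delta):
    """Merge clusters until the next merge would exceed max_delta.

    proximity is a square matrix of pairwise deltas. Returns clusters as
    sorted lists of indices. Ties are broken by cluster position, so the
    result is deterministic.
    """
    clusters = [[i] for i in range(len(proximity))]

    def linkage(a, b):
        # complete linkage: the farthest pair decides the cluster distance
        return max(proximity[i][j] for i in a for j in b)

    while len(clusters) > 1:
        span, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if span > max_delta:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# one-dimensional "colors" keep the arithmetic easy to follow
values = [0, 1, 2, 10, 11]
proximity = [[abs(a - b) for b in values] for a in values]
groups = complete_linkage(proximity, max_delta=3)
colors_needed = len(groups)
```

Because a merge is only accepted when the farthest pair stays within max_delta, every pair of members in a resulting cluster is within that bound, and the number of clusters left answers the "how many colors" question.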

Installation

pip install cluster_colors

Basic usage

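from cluster_colors import get_image_clusters, show_clusters

# placeholder: replace with the path to your own image
image_filename = "path/to/image.png"

# find the five most dominant colors in an image
clusters = get_image_clusters(image_filename)
clusters.split_to_n(5)
exemplars = clusters.get_as_vectors()

# to save the cluster exemplars as an image file
# (assumes show_clusters is importable from the package top level)
show_clusters(clusters, "open_file_to_see_clusters")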

Download files

Source Distribution

cluster_colors-0.16.0.tar.gz (264.8 kB)

Uploaded Source

Built Distribution

cluster_colors-0.16.0-py3-none-any.whl (25.4 kB)

Uploaded Python 3

File details

Details for the file cluster_colors-0.16.0.tar.gz.

File metadata

  • Download URL: cluster_colors-0.16.0.tar.gz
  • Size: 264.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for cluster_colors-0.16.0.tar.gz
Algorithm Hash digest
SHA256 995cf5b3722eef24c131c23ec4a2b6c093a927cc1b049f5d1a0ae725d7b718a6
MD5 fa9eaac205471f4e444dfe33f81a7378
BLAKE2b-256 cd060121208da8a0ce9d0122a46cfade43d6ceb13fb2d5b724e5dc5f9ac98438

File details

Details for the file cluster_colors-0.16.0-py3-none-any.whl.

File hashes

Hashes for cluster_colors-0.16.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3267cc6f75e4c49353ef75357adc3b17071363703421cd76b0547f85407e57bf
MD5 48c382db2f3ebad8bd4f36ccbee4c7e8
BLAKE2b-256 a219c2852aaa028253432958c7a9240ce97af984cfaaf2d2ad3276d9d5dd330d
