Skip to main content

K-Means clustering constrained with minimum and maximum cluster size

Project description

PyPI Python Build Documentation

k-means-constrained

K-means clustering implementation whereby a minimum and/or maximum size for each cluster can be specified.

This K-means implementation modifies the cluster assignment step (E in EM) by formulating it as a Minimum Cost Flow (MCF) linear network optimisation problem. This is then solved using a cost-scaling push-relabel algorithm and uses Google's Operations Research tools's SimpleMinCostFlow which is a fast C++ implementation.

This package is inspired by Bradley et al.. The original Minimum Cost Flow (MCF) network proposed by Bradley et al. has been modified so maximum cluster sizes can also be specified along with minimum cluster size.

The code is based on scikit-lean's KMeans and implements the same API with modifications.

Ref:

  1. Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering." Microsoft Research, Redmond (2000): 1-8.
  2. Google's SimpleMinCostFlow C++ implementation

Installation

You can install the k-means-constrained from PyPI:

pip install k-means-constrained

It is supported on Python 3.10, 3.11, 3.12 and 3.13. Previous versions of k-means-constrained support older versions of Python and Numpy.

Example

More details can be found in the API documentation.

>>> from k_means_constrained import KMeansConstrained
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...                [4, 2], [4, 4], [4, 0]])
>>> clf = KMeansConstrained(
...     n_clusters=2,
...     size_min=2,
...     size_max=5,
...     random_state=0
... )
>>> clf.fit_predict(X)
array([0, 0, 0, 1, 1, 1], dtype=int32)
>>> clf.cluster_centers_
array([[ 1.,  2.],
       [ 4.,  2.]])
>>> clf.labels_
array([0, 0, 0, 1, 1, 1], dtype=int32)
Code only
from k_means_constrained import KMeansConstrained
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
                [4, 2], [4, 4], [4, 0]])
clf = KMeansConstrained(
     n_clusters=2,
     size_min=2,
     size_max=5,
     random_state=0
 )
clf.fit_predict(X)
clf.cluster_centers_
clf.labels_

Time complexity and runtime

k-means-constrained is a more complex algorithm than vanilla k-means and therefore will take longer to execute and has worse scaling characteristics.

Given a number of data points $n$ and clusters $c$, the time complexity of:

  • k-means: $\mathcal{O}(nc)$
  • k-means-constrained1: $\mathcal{O}((n^3c+n^2c^2+nc^3)\log(n+c)))$

This assumes a constant number of algorithm iterations and data-point features/dimensions.

If you consider the case where $n$ is the same order as $c$ ($n \backsim c$) then:

  • k-means: $\mathcal{O}(n^2)$
  • k-means-constrained1: $\mathcal{O}(n^4\log(n)))$

Below is a runtime comparison between k-means and k-means-constrained whereby the number of iterations, initializations, multi-process pool size and dimension size are fixed. The number of clusters is also always one-tenth the number of data points $n=10c$. It is shown above that the runtime is independent of the minimum or maximum cluster size, and so none is included below.

Data-points vs execution time for k-means vs k-means-constrained. Data-points=10*clusters. No min/max constraints

System details
  • OS: Linux-5.15.0-75-generic-x86_64-with-glibc2.35
  • CPU: AMD EPYC 7763 64-Core Processor
  • CPU cores: 120
  • k-means-constrained version: 0.7.3
  • numpy version: 1.24.2
  • scipy version: 1.11.1
  • ortools version: 9.6.2534
  • joblib version: 1.3.1
  • sklearn version: 1.3.0
---

1: Ortools states the time complexity of their cost-scaling push-relabel algorithm for the min-cost flow problem as $\mathcal{O}(n^2m\log(nC))$ where $n$ is the number of nodes, $m$ is the number of edges and $C$ is the maximum absolute edge cost.

Change log

  • v0.7.6 (2025-06-30) Add Python v3.13 and Linux ARM support.
  • v0.7.5 fix comment in README on Python version that is supported
  • v0.7.4 compatible with Numpy +v2.1.1. Added Python 3.12 support and dropped Python 3.8 and 3.9 support (due to Numpy). Linux ARM support has been dropped as we use GitHub runners to build the package and ARM machines was being emulated using QEMU. This however was producing numerical errors. GitHub should natively support Ubuntu ARM images soon and then we can start to re-build them.
  • v0.7.3 compatible with Numpy v1.23.0 to 1.26.4

Citations

If you use this software in your research, please use the following citation:

@software{Levy-Kramer_k-means-constrained_2018,
  author = {Levy-Kramer, Josh},
  month = apr,
  title = {{k-means-constrained}},
  url = {https://github.com/joshlk/k-means-constrained},
  year = {2018}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

k_means_constrained-0.7.6-cp313-cp313-win_amd64.whl (351.0 kB view details)

Uploaded CPython 3.13Windows x86-64

k_means_constrained-0.7.6-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.6 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

k_means_constrained-0.7.6-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ ARM64manylinux: glibc 2.28+ ARM64

k_means_constrained-0.7.6-cp313-cp313-macosx_11_0_arm64.whl (365.9 kB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

k_means_constrained-0.7.6-cp313-cp313-macosx_10_13_x86_64.whl (403.0 kB view details)

Uploaded CPython 3.13macOS 10.13+ x86-64

k_means_constrained-0.7.6-cp312-cp312-win_amd64.whl (351.2 kB view details)

Uploaded CPython 3.12Windows x86-64

k_means_constrained-0.7.6-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.6 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

k_means_constrained-0.7.6-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ ARM64manylinux: glibc 2.28+ ARM64

k_means_constrained-0.7.6-cp312-cp312-macosx_11_0_arm64.whl (369.4 kB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

k_means_constrained-0.7.6-cp312-cp312-macosx_10_13_x86_64.whl (406.0 kB view details)

Uploaded CPython 3.12macOS 10.13+ x86-64

k_means_constrained-0.7.6-cp311-cp311-win_amd64.whl (348.7 kB view details)

Uploaded CPython 3.11Windows x86-64

k_means_constrained-0.7.6-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.6 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

k_means_constrained-0.7.6-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.6 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ ARM64manylinux: glibc 2.28+ ARM64

k_means_constrained-0.7.6-cp311-cp311-macosx_11_0_arm64.whl (368.0 kB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

k_means_constrained-0.7.6-cp311-cp311-macosx_10_9_x86_64.whl (406.1 kB view details)

Uploaded CPython 3.11macOS 10.9+ x86-64

k_means_constrained-0.7.6-cp310-cp310-win_amd64.whl (348.3 kB view details)

Uploaded CPython 3.10Windows x86-64

k_means_constrained-0.7.6-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (2.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64manylinux: glibc 2.28+ x86-64

k_means_constrained-0.7.6-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (2.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ ARM64manylinux: glibc 2.28+ ARM64

k_means_constrained-0.7.6-cp310-cp310-macosx_11_0_arm64.whl (368.2 kB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

k_means_constrained-0.7.6-cp310-cp310-macosx_10_9_x86_64.whl (405.9 kB view details)

Uploaded CPython 3.10macOS 10.9+ x86-64

File details

Details for the file k_means_constrained-0.7.6-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 a828bbc81dde339e7ae97934bf9dcb43c5b49884d1d9c8fb0ffa117f5c998233
MD5 bcec394c28d659ae3e376b2860a01097
BLAKE2b-256 11222173f6e47fcfaa726ee5879bb65779408ac8fbb7914fc53d71eed223aaa1

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 876391bb2f1a79ece65d1dba8a76545da09b9ee4cb287f467e861c1043caaf4f
MD5 9520ab6e8541cab359fde6e217194d8e
BLAKE2b-256 2248f5a3ade755968e70c1e87f31042148d9c916c73c1c47333bcdc60e133268

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2e2d9842493bf934eafadb68b689b715e1b2880ba58af05263ee2e11ccb3c6d7
MD5 fa6c998e4b48ba6886d1a502dc31e236
BLAKE2b-256 41dc794e2964e6bdf1d02c1f2418a611c2b58f5ab3f7234e39f64ec76b39d7b9

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 24ae15921827e655cfc13e6c9138e7f6b9be3bc488b2030eb19461de66aff42e
MD5 81530720734dfcd18c31423e43d8744f
BLAKE2b-256 6137ecad03a3a58c7dc3c6afbcd57bdeeaccd03a09e64fc65fdca866aa4a3d12

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp313-cp313-macosx_10_13_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp313-cp313-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 4cdf3de34fb7f24a78af05e3eaec063747258b8f71513faf85b72e9f601bc351
MD5 83482c1bc6edb73b9e12482d51fba837
BLAKE2b-256 53e96863a4ef2377f78c374bec886861c3d14bccafaf6a992a5bc7ed77dfca81

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 599eb9e22e2c99b50dd7d9bf434789de7c0831972c508789be40a25b1796469b
MD5 7fa2d1c7f30f7c552cb5fa4a21628531
BLAKE2b-256 76ee8ddeacfde46e7853336fd47e1b12fddb2be6eff018d19c66107b93adcb8b

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 736d97ff642dfaaf0df4b63632b827ba4b1466794e111970fbfc99e2c52286a2
MD5 d70031517736d6aec8cd00f353d4f622
BLAKE2b-256 fd9c47b45cd0e23603bc47a51c9acdb49746cf5b90748a5b3c3a83808fb16736

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 9eb967fbf28b6bd91fdfed3bc543fae4973d8a474a3aa6dabd141236ebb36682
MD5 4042298f9176f7e9bb2c59a40e9c7d71
BLAKE2b-256 dc70c01caaf06e75756cf7e65f000fd4c64c5630621a783881cd9c3856316c9c

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 34e08d5b707905df635ba1145e639269745c56c347157717265c3e48c2a04ba6
MD5 25349345faed5c0896625e35e7fe73f1
BLAKE2b-256 e8986ad4db7b84caafb8481d057493e2a853b72cdcc87bb36860e2898791e5c4

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp312-cp312-macosx_10_13_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 05734675db6d5d43888394690640d283a2c1d5529435ecba87e172dffc806189
MD5 f628fd8e5a139bda91fe0fc43c5e6df3
BLAKE2b-256 87f21b29a2e6d72d5a71edc1e8d0f3ce4cb085c1aedfc16176070911febf48c1

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a5783224f20ef4703dba4b21a2d0dc22e3b314a6ed5c064f6767021bd932a93b
MD5 d0e77491bb70303354a01d08b2b5c4b2
BLAKE2b-256 d9bf41b2347e0e270cddc70855cd46a1e9d2c28aa42e66fe7f6792d2e5dcffe4

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 28425e8f70d6035d482ff00a661c69c7a3e400403583916d596306daf3e79199
MD5 7802082ae229255e2005c99b0053504f
BLAKE2b-256 4fb5ca4fb952a62f8ba51c508bca6777df5a59844ffef4042805f3452dc6251d

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 eaaa77929eb6728beb16a2bd755104eee8068d807a1cbfce9a8b206f0cb3b2e2
MD5 da637757213094dae18071686e10d3e8
BLAKE2b-256 e68731e4a135d01f3ed5fa80a20bc0189bad43fa23036756b533a647971cbeb1

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ef667d9191321cef7ef4ef2132c82c017b44e8fb334a91e975167024685b89be
MD5 d53ea22e18d0863381f53fc6320217fb
BLAKE2b-256 b5f5986367e8885f731c4aa17905b6d2d0d73a91c1211adc9d8127e3cc1b8471

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp311-cp311-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 c841ea6629786697bb486ceeac0bec59d016794b468110ca6d521446ba12c995
MD5 f48428c76d4e7685b0e452ac3414201f
BLAKE2b-256 23a43c0475c95aedf573883babdf91a4017541ef2d7998be4d9061b12b272419

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 ad191818fe2c2a216ac1bc8a69790d3654cffaa88797e2f48c67f3f5c5315b39
MD5 799ebef9cd3eb1692275de945d54dfdf
BLAKE2b-256 3d9ff21ffae3d5e221d3944a50047919a00e2639bccaededab0051bb0a6f6c30

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 5fb57b1e6c4c2bdc2f80b9c993f97345bfa8987269da4ca50e860acd93f18126
MD5 68fd25a6544f3621e13102f0f34b5288
BLAKE2b-256 97bd94e80a384deb8d91a99cffee8b9f93ca7411b72ba288ec7a00903ac2fa69

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 3629dfe2fa0629cc1ab8864f84e1b19651ab272e4016af0b95791215d0573692
MD5 86a357ef29ce2eb646f0a073fc5020b8
BLAKE2b-256 ee680e64cb119952ad98c27edf1e392127e00dfb69ce7185794230d66be0a0a6

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6fe7b59e45b8ef21720e16a4b75c69e242c5b447bc4ce230c0c33b9caf91f87b
MD5 6c57f8a2cd55f8ad47b2b24717ef1b60
BLAKE2b-256 93932daec766eedb537b564c11208377459bfa589a229c0a1dd83a4199b1ca31

See more details on using hashes here.

File details

Details for the file k_means_constrained-0.7.6-cp310-cp310-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for k_means_constrained-0.7.6-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 fe6dc2acfeafda2e7bdb0c3d290dbb0b4b57078ad47990a1ba783a26ed374286
MD5 7a0a445a0a564241648dca755eda0dd7
BLAKE2b-256 0dfc28296143be8eb33940f649888efc6daa2a7c9d9059f89ae89b443a7d366b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page