Size Constrained Clustering Solver

An implementation of size-constrained clustering. Size-constrained clustering can be treated as an optimization problem; details can be found in the reference papers.

Installation

Requirements: Python >= 3.6, NumPy >= 1.13, Cython >= 0.29

  • install from PyPI
pip install size-constrained-clustering

Methods

  • Fuzzy C-means Algorithm: similar to KMeans, but each point gets a membership probability per cluster instead of a hard 0 or 1 assignment
  • Same Size Constrained KMeans Heuristics: uses heuristic methods to reach same-size clusters
  • Same Size Constrained KMeans Inspired by Minimum Cost Flow Problem
  • Minimum and Maximum Size Constrained KMeans Inspired by Minimum Cost Flow Problem
  • Deterministic Annealing Algorithm: takes a target cluster size distribution as input and returns the corresponding clusters
  • Shrinkage Clustering: base algorithm and minimum size constraints
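
To make the "minimum cost flow" idea concrete, here is a minimal sketch of the assignment step for same-size clustering. It is not the package's implementation: it swaps a general min-cost-flow solver for SciPy's `linear_sum_assignment`, which solves the special case where every cluster is expanded into a fixed number of "slots". The function name `equal_size_assign` and the slot layout are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def equal_size_assign(X, centers):
    """Assign points to centers so that cluster sizes differ by at most one.

    Hypothetical helper for illustration, not part of size_constrained_clustering.
    """
    n_samples, n_clusters = len(X), len(centers)
    # Capacity per cluster: the first (n_samples % n_clusters) clusters get one extra slot.
    base, extra = divmod(n_samples, n_clusters)
    capacities = [base + (1 if k < extra else 0) for k in range(n_clusters)]
    # One column per slot; slot_owner[j] is the cluster that owns slot j.
    slot_owner = np.repeat(np.arange(n_clusters), capacities)
    # Squared Euclidean distance from every point to every center.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cost = dist[:, slot_owner]  # shape (n_samples, n_samples)
    # Minimum-cost one-to-one matching of points to slots.
    rows, cols = linear_sum_assignment(cost)
    labels = np.empty(n_samples, dtype=int)
    labels[rows] = slot_owner[cols]
    return labels


rng = np.random.default_rng(0)
X = rng.random((10, 2))
centers = X[:3].copy()  # arbitrary initial centers for the demo
labels = equal_size_assign(X, centers)
sizes = np.bincount(labels, minlength=3)
```

In a full algorithm this step would alternate with recomputing the centers, as in ordinary KMeans. The dense slot matrix grows as n_samples squared, so this sketch is only practical for small inputs.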

Usage:

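# setup
# da is imported as well because the Deterministic Annealing example below uses it
from size_constrained_clustering import fcm, equal, minmax, shrinkage, da
# Euclidean distance is the default, but other distance functions can be selected
from sklearn.metrics.pairwise import haversine_distances
# make_blobs generates the sample data used in the examples below
from sklearn.datasets import make_blobs
import numpy as np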

Fuzzy C-means

n_samples = 2000
n_clusters = 4
centers = [(-5, -5), (0, 0), (5, 5), (7, 10)]
X, _ = make_blobs(n_samples=n_samples, n_features=2, cluster_std=1.0,
                    centers=centers, shuffle=False, random_state=42)
model = fcm.FCM(n_clusters)
# use other distance function: e.g. haversine distance
# model = fcm.FCM(n_clusters, distance_func=haversine_distances)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
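
For readers unfamiliar with soft assignments, the following is a minimal sketch of the textbook Fuzzy C-means membership rule in plain NumPy. It is not taken from the package source; the function name `fuzzy_memberships`, the fuzzifier `m=2.0` and the `eps` guard are assumptions for illustration.

```python
import numpy as np


def fuzzy_memberships(X, centers, m=2.0, eps=1e-12):
    """Return an (n_samples, n_clusters) matrix of soft memberships.

    Hypothetical helper for illustration, not part of size_constrained_clustering.
    """
    # Euclidean distance from every point to every center.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, eps)  # avoid division by zero when a point sits on a center
    # Textbook FCM rule: membership is proportional to distance ** (-2 / (m - 1)).
    inv = dist ** (-2.0 / (m - 1.0))
    # Normalize so that each row sums to 1.
    return inv / inv.sum(axis=1, keepdims=True)


X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
U = fuzzy_memberships(X, centers)
hard_labels = U.argmax(axis=1)  # collapse soft memberships to hard labels
```

Each row of `U` sums to 1. The middle point, which is three times closer to the first center than to the second, gets memberships of 0.9 and 0.1 rather than a hard 0 or 1.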

Equal Size Constraint

n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
# solve with the minimum cost flow framework
model = equal.SameSizeKMeansMinCostFlow(n_clusters)
# or solve with the heuristics method instead
# model = equal.SameSizeKMeansHeuristics(n_clusters)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

Resulting cluster sizes: 667, 667 and 666.
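
Cluster sizes like these can be checked directly from a label array. The sketch below uses stand-in labels so that it runs without the package installed; with a fitted model, the assumption is that `model.labels_` holds one integer cluster index per sample. The helper name `cluster_sizes` is hypothetical.

```python
import numpy as np


def cluster_sizes(labels, n_clusters):
    """Count how many samples fall in each cluster (hypothetical helper)."""
    return np.bincount(np.asarray(labels, dtype=int), minlength=n_clusters)


# Stand-in labels: 2000 points dealt round-robin into 3 clusters.
labels = np.arange(2000) % 3
sizes = cluster_sizes(labels, 3)
# "Same size" here means sizes differ by at most one when n_samples is not divisible.
is_balanced = sizes.max() - sizes.min() <= 1
```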

Minimum and Maximum Size Constraint

n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
model = minmax.MinMaxKMeansMinCostFlow(n_clusters, size_min=400, size_max=800)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

Resulting cluster sizes: 753, 645 and 602, all within the [400, 800] bounds.
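
Minimum and maximum sizes cannot be chosen freely: every sample must fit somewhere, so the bounds only admit a solution when n_clusters * size_min <= n_samples <= n_clusters * size_max. A small pre-check along these lines can catch impossible settings before fitting. The function below is a hypothetical helper; how the package itself reacts to infeasible bounds is not stated here.

```python
def size_bounds_feasible(n_samples, n_clusters, size_min, size_max):
    """Return True if some clustering can satisfy the size bounds (hypothetical helper)."""
    if size_min > size_max:
        return False
    # All clusters at minimum size must not exceed n_samples,
    # and all clusters at maximum size must be able to hold n_samples.
    return n_clusters * size_min <= n_samples <= n_clusters * size_max


ok = size_bounds_feasible(2000, 3, size_min=400, size_max=800)         # 1200 <= 2000 <= 2400
too_tight = size_bounds_feasible(2000, 3, size_min=400, size_max=600)  # 3 * 600 = 1800 < 2000
```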

Deterministic Annealing

n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
# distribution is the distribution of cluster sizes
model = da.DeterministicAnnealing(n_clusters, distribution=[0.1, 0.6, 0.3])
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

Resulting cluster sizes: 1200, 600 and 200, corresponding to the distribution fractions 0.6, 0.3 and 0.1.
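
The relation between the distribution and the cluster sizes is plain arithmetic: each fraction is multiplied by the number of samples. The sketch below shows one way to turn fractions into integer targets. The largest-remainder rounding is an assumption made for illustration; how the package rounds when the products are not integers is not documented here, and `target_sizes` is a hypothetical name.

```python
import numpy as np


def target_sizes(n_samples, distribution):
    """Convert cluster fractions into integer sizes that sum to n_samples.

    Hypothetical helper; the rounding rule is an assumption, not the package's.
    """
    dist = np.asarray(distribution, dtype=float)
    if not np.isclose(dist.sum(), 1.0):
        raise ValueError("distribution must sum to 1")
    raw = dist * n_samples
    # Small tolerance so that values like 599.9999999999 floor to 600.
    sizes = np.floor(raw + 1e-9).astype(int)
    # Largest-remainder rounding: hand leftover samples to the largest fractional parts.
    leftover = n_samples - sizes.sum()
    order = np.argsort(-(raw - sizes))
    sizes[order[:leftover]] += 1
    return sizes


sizes = target_sizes(2000, [0.1, 0.6, 0.3])
```

For 2000 samples and the distribution [0.1, 0.6, 0.3] this gives 200, 1200 and 600, the same three sizes reported above.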

Shrinkage Clustering

The result might not be available.

n_samples = 1000
n_clusters = 4
centers = [(-5, -5), (0, 0), (5, 5), (7, 10)]
X, _ = make_blobs(n_samples=n_samples, n_features=2, cluster_std=1.0, centers=centers, shuffle=False, random_state=42)

model = shrinkage.Shrinkage(n_clusters, size_min=100)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_

Copyright

Copyright (c) 2020 Jing Wang. Released under the MIT License.

Third-party copyright in this distribution is noted where applicable.

Reference
