Size Constrained Clustering solver
Project description
Size Constrained Clustering Solver
Implementation of Size Constrained Clustering. Size constrained clustering can be treated as an optimization problem. Details could be found in a set of reference paper.
Installation
Requirement Python >= 3.6, Numpy >= 1.13, Cython >= 0.29
- install from PyPI
pip install size-constrained-clustering
Methods
- Fuzzy C-means Algorithm: Similar to KMeans, but use membership probability, not 0 or 1
- Same Size Contrained KMeans Heuristics: Use Heuristics methods to reach same size clustering
- Same Size Contrained KMeans Inspired by Minimum Cost Flow Problem
- Minimum and Maximum Size Constrained KMeans Inspired by Minimum Cost Flow Problem
- Deterministic Annealling Algorithm: Input target cluster distribution, return correspondent clusters
- Shrinkage Clustering: base algorithm and minimum size constraints
Usage:
# setup
from size_constrained_clustering import fcm, equal, minmax, shrinkage
# by default it is euclidean distance, but can select others
from sklearn.metrics.pairwise import haversine_distances
import numpy as np
Fuzzy C-means
n_samples = 2000
n_clusters = 4
centers = [(-5, -5), (0, 0), (5, 5), (7, 10)]
X, _ = make_blobs(n_samples=n_samples, n_features=2, cluster_std=1.0,
centers=centers, shuffle=False, random_state=42)
model = fcm.FCM(n_clusters)
# use other distance function: e.g. haversine distance
# model = fcm.FCM(n_clusters, distance_func=haversine_distances)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
Equal Size Constraint
n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
# use minimum cost flow framework to solve
model = equal.SameSizeKMeansMinCostFlow(n_clusters)
# use heuristics method to solve
model = equal.SameSizeKMeansHeuristics(n_clusters)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
Cluster size: 667, 667 and 666 in the figure above.
Minimum and Maximum Size Constraint
n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
model = minmax.MinMaxKMeansMinCostFlow(n_clusters, size_min=400, size_max=800)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
Cluster size: 753, 645 and 602 in the figure above.
Deterministic Annealing
n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
# distribution is the distribution of cluster sizes
model = da.DeterministicAnnealing(n_clusters, distribution=[0.1, 0.6, 0.3])
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
Cluster size: 1200, 600 and 200 in the figure above, corresponding to distribution [0.6, 0.3, 0.1]
Shrinkage Clustering
The result might be not available.
n_samples = 1000
n_clusters = 4
centers = [(-5, -5), (0, 0), (5, 5), (7, 10)]
X, _ = make_blobs(n_samples=n_samples, n_features=2, cluster_std=1.0, centers=centers, shuffle=False, random_state=42)
model = shrinkage.Shrinkage(n_clusters, size_min=100)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
Copyright
Copyright (c) 2020 Jing Wang. Released under the MIT License.
Third-party copyright in this distribution is noted where applicable.
Reference
- Clustering with Capacity and Size Constraints: A Deterministic Approach
- Deterministic Annealing, Clustering and Optimization
- Deterministic Annealing, Constrained Clustering, and Opthiieation
- Shrinkage Clustering
- Clustering with size constraints
- Data Clustering with Cluster Size Constraints Using a Modified k-means Algorithm
- KMeans Constrained Clustering Inspired by Minimum Cost Flow Problem
- Same Size Kmeans Heuristics Methods
- Google's Operations Research tools's
SimpleMinCostFlow
- Cluster KMeans Constrained
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file size_constrained_clustering-0.1.2.tar.gz
.
File metadata
- Download URL: size_constrained_clustering-0.1.2.tar.gz
- Upload date:
- Size: 51.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e952df49d3c746df1fa427b3d753d8c81eb825b26fe37dd540019c12aef13d36 |
|
MD5 | d0cede0a2382a4cf3e7ec24331aa3567 |
|
BLAKE2b-256 | 922e242ce69829c4713c88f7be0562ddf2bbf0accf3340040472cea6ca6fcf68 |
File details
Details for the file size_constrained_clustering-0.1.2-cp36-cp36m-macosx_10_7_x86_64.whl
.
File metadata
- Download URL: size_constrained_clustering-0.1.2-cp36-cp36m-macosx_10_7_x86_64.whl
- Upload date:
- Size: 433.6 kB
- Tags: CPython 3.6m, macOS 10.7+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | babd1a14332464f502b0785614edc700c6d39fed7f4a27a19ea090130c95cd1a |
|
MD5 | 74448f09832fdcea13321b41084ccdbc |
|
BLAKE2b-256 | f919f4f1dab7ec5f0e882dfc22e146063a85a3981960af6fdb99a4ac52a3fc93 |