Size Constrained Clustering solver
Project description
Size Constrained Clustering Solver
Implementation of Size Constrained Clustering. Size constrained clustering can be treated as an optimization problem. Details could be found in a set of reference paper.
Installation
Requirement Python >= 3.6, Numpy >= 1.13, Cython >= 0.29
- install from PyPI
pip install size-constrained-clustering
Methods
- Fuzzy C-means Algorithm: Similar to KMeans, but use membership probability, not 0 or 1
- Same Size Contrained KMeans Heuristics: Use Heuristics methods to reach same size clustering
- Same Size Contrained KMeans Inspired by Minimum Cost Flow Problem
- Minimum and Maximum Size Constrained KMeans Inspired by Minimum Cost Flow Problem
- Deterministic Annealling Algorithm: Input target cluster distribution, return correspondent clusters
- Shrinkage Clustering: base algorithm and minimum size constraints
Usage:
# setup
from size_constrained_clustering import fcm, equal, minmax, shrinkage
# 默认都是欧式距离计算,可接受其它distance函数,比如haversine
from sklearn.metrics.pairwise import haversine_distances
Fuzzy C-means
n_samples = 2000
n_clusters = 4
centers = [(-5, -5), (0, 0), (5, 5), (7, 10)]
X, _ = make_blobs(n_samples=n_samples, n_features=2, cluster_std=1.0,
centers=centers, shuffle=False, random_state=42)
model = fcm.FCM(n_clusters)
# use other distance function: e.g. haversine distance
# model = fcm.FCM(n_clusters, distance_func=haversine_distances)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
![alt text][https://github.com/jingw2/size_constrained_clustering/tree/master/pic/fcm.png]
等大聚类
n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
# 使用minmax flow方式求解
model = equal.SameSizeKMeansMinCostFlow(n_clusters)
# 使用heuristics方法求解
model = equal.SameSizeKMeansHeuristics(n_clusters)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
![alt text][https://github.com/jingw2/size_constrained_clustering/tree/master/pic/equal.png]
图中共2000个正态分布的点,聚成3类,分别有667,667和666个点。
最小和最大规模限制
n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
model = minmax.MinMaxKMeansMinCostFlow(n_clusters, size_min=400, size_max=800)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
![alt text][https://github.com/jingw2/size_constrained_clustering/tree/master/pic/minmax.png]
获取结果聚类size分别为753, 645, 602。
Deterministic Annealing
n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)
# distribution 表明各cluster目标的比例
model = da.DeterministicAnnealing(n_clusters, distribution=[0.1, 0.6, 0.3])
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
![alt text][https://github.com/jingw2/size_constrained_clustering/tree/master/pic/da.png]
获取的结果cluster size分别为:1200,600和200。对应比例为0.6, 0.3和0.1。
Shrinkage Clustering
只能保证收敛到局部最优,且获取的结果不一定可用。
n_samples = 1000
n_clusters = 4
centers = [(-5, -5), (0, 0), (5, 5), (7, 10)]
X, _ = make_blobs(n_samples=n_samples, n_features=2, cluster_std=1.0, centers=centers, shuffle=False, random_state=42)
model = shrinkage.Shrinkage(n_clusters, size_min=100)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_
![alt text][https://github.com/jingw2/size_constrained_clustering/tree/master/pic/shrinkage.png]
Copyright
Copyright (c) 2020 Jing Wang. Released under the MIT License.
Third-party copyright in this distribution is noted where applicable.
Reference
- Clustering with Capacity and Size Constraints: A Deterministic Approach
- Deterministic Annealing, Clustering and Optimization
- Deterministic Annealing, Constrained Clustering, and Opthiieation
- Shrinkage Clustering
- Clustering with size constraints
- Data Clustering with Cluster Size Constraints Using a Modified k-means Algorithm
- KMeans Constrained Clustering Inspired by Minimum Cost Flow Problem
- Same Size Kmeans Heuristics Methods
- Google's Operations Research tools's
SimpleMinCostFlow
- Cluster KMeans Constrained
TO DO
- Size constraint API
- Examples to show
- Readme Modification, badges, travis CI
- Upload PyPI
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for size_constrained_clustering-0.1.1.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 27acf88ea48ed5b8690f1f0869ecfb28da805bdd86cb3d9b37a9eceaa9b9908a |
|
MD5 | 93f8d779bf08e4a51cf134fed17e6a48 |
|
BLAKE2b-256 | dea2eb43a58139986f3280b378125b850a33d663021426f202027720fd080e6a |
Hashes for size_constrained_clustering-0.1.1-cp36-cp36m-macosx_10_7_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 91db7b0b398b911bfc129850d8064489ea8f582707ee5b503931beb4d486fe57 |
|
MD5 | dc7c460d4635e471b6ca40ba8536b3c4 |
|
BLAKE2b-256 | 7e5ce2e167deb90332edd05c97e0c3c53b020488782aa4e2611b1bb2aa9b024f |