This is just a copy of the original repo https://github.com/Behrouz-Babaki/COP-Kmeans
Project description
COP-Kmeans
This is an implementations of the Constrained K-means algorithm, introduced by Wagstaff et al. This implementation is developed according to the description of algorithm as presented in [1].
The COP-Kmeans algorithm
This is the COP-Kmeans algorithm, as described in [1]:
Usage
usage: run_ckm.py [-h] [--ofile OFILE] [--n_rep N_REP] [--m_iter M_ITER] [--tol TOL] dfile cfile k
Run COP-Kmeans algorithm
positional arguments:
dfile data file
cfile constraint file
k number of clusters
optional arguments:
-h, --help show this help message and exit
--ofile OFILE file to store the output
--n_rep N_REP number of times to repeat the algorithm
--m_iter M_ITER maximum number of iterations of the main loop
--tol TOL tolerance for deciding on convergence
To see a run of the algorithm on example data and constraints, run the script runner.sh
in the examples
directory.
Package install
Run this command to install this package.
% python setup.py install
Here is simple example to call this module
import numpy
from copkmeans.cop_kmeans import cop_kmeans
input_matrix = numpy.random.rand(100, 500)
must_link = [(0, 10), (0, 20), (0, 30)]
cannot_link = [(1, 10), (2, 10), (3, 10)]
clusters, centers = cop_kmeans(dataset=input_matrix, k=5, ml=must_link,cl=cannot_link)
In the variable, clusters
, you could see list of integer which has cluster number according to index of given data.
Citing
If you want to cite this implementation, you can use the following bibtex entry (other formats are also available):
@misc{behrouz_babaki_2017_831850,
author = {Behrouz Babaki},
title = {COP-Kmeans version 1.5},
month = jul,
year = 2017,
doi = {10.5281/zenodo.831850},
url = {https://doi.org/10.5281/zenodo.831850}
}
There's more ...
Other implementations
- Mateusz Zawiślak has a java implementation of the COP-Kmeans algorithm.
- The R package conclust contains an implementation of COP-Kmeans, among a number of other constrained clustering algorithms.
Other types of constraints
There is another version of constrained Kmeans that handles size constraints [2]. A python implementation of the algorithm (and its extensions) is available here.
Exact algorithms for constrained clustering
In 2013-14, I was working on developing an integer linear programming formulation for an instance of the constrained clustering problem. The approach that I chose was Branch-and-Price (also referred to as column-generation). In the initialization step of my algorithm, I needed another algorithm that can produce solutions of reasonably good quality very quickly. The algorithm COP-Kmeans turned out to be exactly what I was looking for. Interested in knowing more about my own work? Go to my homepage, from where you can access my paper [3] and the corresponding code.
There is also a body of work on using constraint programming for exact constrained clustering. In particular, [4] is the state-of-the art in exact constrained clustering.
References
-
Wagstaff, K., Cardie, C., Rogers, S., & Schrödl, S. (2001, June). Constrained k-means clustering with background knowledge. In ICML (Vol. 1, pp. 577-584).
-
Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering." Microsoft Research, Redmond (2000): 1-8.
-
Babaki, B., Guns, T., & Nijssen, S. (2014). Constrained clustering using column generation. In Integration of AI and OR Techniques in Constraint Programming (pp. 438-454). Springer International Publishing.
-
Guns, Tias, Christel Vrain, and Khanh-Chuong Duong. "Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering." 22nd European Conference on Artificial Intelligence. 2016.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file htcopkmeans-1.5.tar.gz
.
File metadata
- Download URL: htcopkmeans-1.5.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9e6bee40629111686b30501debdec31ccb83b9d904c5c53b71b6472eda11088e |
|
MD5 | ad7dc4fcc80de17b071bc77ebe1e43d3 |
|
BLAKE2b-256 | 2497bf2e39847745781958a3cb14157ec09978408e3ad25e16d39f726edbf401 |
File details
Details for the file htcopkmeans-1.5-py3-none-any.whl
.
File metadata
- Download URL: htcopkmeans-1.5-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2c607a9de51336602aed00c2027df8ffcdd835ecc0d9d5b0dc69cebe2da626b5 |
|
MD5 | 3079199d8986e63c9d3b8f3f19e90647 |
|
BLAKE2b-256 | 2713d213bcc012dbfcbdfdcfa35361a9e878c3516cfe8b44c4eb09aa281a9e78 |