Skip to main content

This is just a copy of the original repo https://github.com/Behrouz-Babaki/COP-Kmeans

Project description

COP-Kmeans

DOI

This is an implementations of the Constrained K-means algorithm, introduced by Wagstaff et al. This implementation is developed according to the description of algorithm as presented in [1].

The COP-Kmeans algorithm

This is the COP-Kmeans algorithm, as described in [1]:

Usage

usage: run_ckm.py [-h] [--ofile OFILE] [--n_rep N_REP] [--m_iter M_ITER] [--tol TOL] dfile cfile k

Run COP-Kmeans algorithm

positional arguments:
  dfile            data file
  cfile            constraint file
  k                number of clusters

optional arguments:
  -h, --help       show this help message and exit
  --ofile OFILE    file to store the output
  --n_rep N_REP    number of times to repeat the algorithm
  --m_iter M_ITER  maximum number of iterations of the main loop
  --tol TOL        tolerance for deciding on convergence

To see a run of the algorithm on example data and constraints, run the script runner.sh in the examples directory.

Package install

Run this command to install this package.

% python setup.py install

Here is simple example to call this module

import numpy
from copkmeans.cop_kmeans import cop_kmeans
input_matrix = numpy.random.rand(100, 500)
must_link = [(0, 10), (0, 20), (0, 30)]
cannot_link = [(1, 10), (2, 10), (3, 10)]
clusters, centers = cop_kmeans(dataset=input_matrix, k=5, ml=must_link,cl=cannot_link)

In the variable, clusters, you could see list of integer which has cluster number according to index of given data.

Citing

If you want to cite this implementation, you can use the following bibtex entry (other formats are also available):

@misc{behrouz_babaki_2017_831850,
  author       = {Behrouz Babaki},
  title        = {COP-Kmeans version 1.5},
  month        = jul,
  year         = 2017,
  doi          = {10.5281/zenodo.831850},
  url          = {https://doi.org/10.5281/zenodo.831850}
}

There's more ...

Other implementations

Other types of constraints

There is another version of constrained Kmeans that handles size constraints [2]. A python implementation of the algorithm (and its extensions) is available here.

Exact algorithms for constrained clustering

In 2013-14, I was working on developing an integer linear programming formulation for an instance of the constrained clustering problem. The approach that I chose was Branch-and-Price (also referred to as column-generation). In the initialization step of my algorithm, I needed another algorithm that can produce solutions of reasonably good quality very quickly. The algorithm COP-Kmeans turned out to be exactly what I was looking for. Interested in knowing more about my own work? Go to my homepage, from where you can access my paper [3] and the corresponding code.

There is also a body of work on using constraint programming for exact constrained clustering. In particular, [4] is the state-of-the art in exact constrained clustering.

References

  1. Wagstaff, K., Cardie, C., Rogers, S., & Schrödl, S. (2001, June). Constrained k-means clustering with background knowledge. In ICML (Vol. 1, pp. 577-584).

  2. Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering." Microsoft Research, Redmond (2000): 1-8.

  3. Babaki, B., Guns, T., & Nijssen, S. (2014). Constrained clustering using column generation. In Integration of AI and OR Techniques in Constraint Programming (pp. 438-454). Springer International Publishing.

  4. Guns, Tias, Christel Vrain, and Khanh-Chuong Duong. "Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering." 22nd European Conference on Artificial Intelligence. 2016.

Project details


Release history Release notifications | RSS feed

This version

1.5

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

htcopkmeans-1.5.tar.gz (6.8 kB view details)

Uploaded Source

Built Distribution

htcopkmeans-1.5-py3-none-any.whl (7.4 kB view details)

Uploaded Python 3

File details

Details for the file htcopkmeans-1.5.tar.gz.

File metadata

  • Download URL: htcopkmeans-1.5.tar.gz
  • Upload date:
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for htcopkmeans-1.5.tar.gz
Algorithm Hash digest
SHA256 9e6bee40629111686b30501debdec31ccb83b9d904c5c53b71b6472eda11088e
MD5 ad7dc4fcc80de17b071bc77ebe1e43d3
BLAKE2b-256 2497bf2e39847745781958a3cb14157ec09978408e3ad25e16d39f726edbf401

See more details on using hashes here.

File details

Details for the file htcopkmeans-1.5-py3-none-any.whl.

File metadata

  • Download URL: htcopkmeans-1.5-py3-none-any.whl
  • Upload date:
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for htcopkmeans-1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 2c607a9de51336602aed00c2027df8ffcdd835ecc0d9d5b0dc69cebe2da626b5
MD5 3079199d8986e63c9d3b8f3f19e90647
BLAKE2b-256 2713d213bcc012dbfcbdfdcfa35361a9e878c3516cfe8b44c4eb09aa281a9e78

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page