A constrained KMeans algorithm.

Project description

Constrained KMeans

A modified version of the KMeans algorithm that takes partial information about the data into account.

Given a partial list of known labels init_labels, Constrained KMeans finds a cluster configuration that complies with those labels. Because init_labels must have length x.shape[0] (one entry per data point), a second array can_change marks which entries are known and which may change: entries where can_change is 0 are treated as known and are kept fixed. Formally, the output of the algorithm is an array labels such that np.all(labels[can_change == 0] == init_labels[can_change == 0]) is True.
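To make the constraint concrete, here is a minimal sketch of a Lloyd-style loop that skips reassignment for known points. It is not the package's implementation; the function name constrained_kmeans_sketch, the n_iter parameter, and the empty-cluster fallback are assumptions made for illustration only.

```python
import numpy as np


def constrained_kmeans_sketch(x, can_change, init_labels, n_clusters, n_iter=20):
    """Illustrative Lloyd-style loop that never reassigns known labels.

    Hypothetical sketch of the idea, NOT the ConstrainedKMeans package code.
    """
    x = np.asarray(x, dtype=float)
    can_change = np.asarray(can_change)
    labels = np.asarray(init_labels).copy()
    known = can_change == 0  # points whose label must be preserved

    for _ in range(n_iter):
        # Update step: each centroid is the mean of its current members.
        centroids = np.empty((n_clusters, x.shape[1]))
        for k in range(n_clusters):
            members = x[labels == k]
            # Assumption: an empty cluster falls back to the global mean.
            centroids[k] = members.mean(axis=0) if len(members) else x.mean(axis=0)

        # Assignment step: nearest centroid for every point...
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # ...then restore the known labels so they are never overwritten.
        new_labels[known] = labels[known]

        if np.array_equal(new_labels, labels):
            break  # converged: no label changed
        labels = new_labels

    return labels


rng = np.random.default_rng(0)
x = rng.random((200, 2))
init_labels = rng.integers(0, 4, 200)
can_change = rng.binomial(1, 0.8, 200)  # 0 marks a known label

labels = constrained_kmeans_sketch(x, can_change, init_labels, n_clusters=4)
# The formal guarantee stated above, checked directly.
constraint_holds = bool(np.all(labels[can_change == 0] == init_labels[can_change == 0]))
print(constraint_holds)
```

Because known labels are restored after every assignment step, the final check holds by construction, regardless of how the free points move between clusters.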

Install with pip (requires Python>=3.7):

pip install ConstrainedKMeans

Example basic usage:

import numpy as np
from matplotlib import pyplot as plt

from ConstrainedKMeans import ConstrainedKMeans as CKM

def run_test(n_points):
    ckm = CKM(n_clusters=10)

    # Generate random dataset
    # For visualization purposes, initialize 2d data
    x = np.random.random((n_points, 2))
    # Generate random labels
    init_labels = np.random.randint(0, 10, n_points)
    # Generate 0s with probability 0.2;
    # these mark the "known" labels
    can_change = np.random.binomial(1, 0.8, n_points)

    labels = ckm.fit_predict(x, can_change, init_labels)

    plt.scatter(x[:, 0], x[:, 1], c=labels)
    plt.show()

if __name__ == '__main__':
    run_test(1000)
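In practice the known labels often arrive as a short index-to-label mapping rather than as two full-length arrays. The helper below is a hypothetical convenience (build_constraints is not part of the package) that expands such a mapping into init_labels and can_change. The fill value used for unknown points is a placeholder assumption, since the source does not state how the package treats init_labels entries that are allowed to change.

```python
import numpy as np


def build_constraints(n_points, known, fill_label=0):
    """Turn an {index: label} mapping into (init_labels, can_change) arrays.

    Hypothetical helper, not part of the ConstrainedKMeans package.
    """
    # Placeholder label for points without a known label (an assumption).
    init_labels = np.full(n_points, fill_label, dtype=int)
    can_change = np.ones(n_points, dtype=int)  # 1 = free to change
    for idx, label in known.items():
        init_labels[idx] = label
        can_change[idx] = 0  # 0 = known label, must be kept
    return init_labels, can_change


init_labels, can_change = build_constraints(6, {0: 2, 4: 1})
print(init_labels.tolist())  # [2, 0, 0, 0, 1, 0]
print(can_change.tolist())   # [0, 1, 1, 1, 0, 1]
```

The resulting arrays have one entry per data point, so they can be passed in place of the randomly generated init_labels and can_change in the example above.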

Download files

Source Distribution

ConstrainedKMeans-1.2.tar.gz (225.3 kB)

Uploaded Source

Built Distribution

ConstrainedKMeans-1.2-py3-none-any.whl (6.8 kB)

Uploaded Python 3
