Skip to main content

Fair K-Means produces a fair clustering assignment according to the fairness definition of Chierichetti et al. Each point has a binary color, and the goal is to assign the points to clusters such that the number of points with different colors in each cluster is the same and the cost of the clusters is minimized.

Project description

Build Status License: MIT Supported Python version Stable Version

Fair K-Means

Fair K-Means produces a fair clustering assignment according to the fairness definition of Chierichetti et al. [1]. Each point has a binary color assigned to it. The goal is to assign the points to clusters such that the number of points with different colors in each cluster is the same. The algorithm also works with weights, so each point can participate with a different weight in the coloring.

The algorithm works as follows, assuming that the binary colors are red and blue:

  1. A matching between the red and blue points is computed such that the cost (the point distances) of the matching is minimized.
  2. The mean of each matched pair is computed.
  3. A K-Means++ clustering of all the means is computed, and the point pairs are assigned to the clusters of their means.

The matching between the red and blue points is computed using the Lemon C++ Library. The library is included in the package and does not need to be installed separately. Only the needed files were included, and a complete version of the library can be found here. A copyright notice is included here.

You can try Fair K-Means out on our Clustering Toolkit!

References

[1] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii, Fair clustering through fairlets, Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2017, pp. 5036–5044.

Installation

pip install fair-kmeans

Example

from fair_kmeans import FairKMeans

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]

example_colors = [1, 1, 1, 0, 0, 0]

km = FairKMeans(n_clusters=2, random_state=0)
km.fit(example_data, color=example_colors)
labels = km.labels_
centers = km.cluster_centers_

print(labels) # [1, 0, 0, 1, 0, 0]
print(centers) # [[1.65, 1.65, 1.65], [1.5, 1.5, 1.5]]

Example with Weights

from fair_kmeans import FairKMeans

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]

example_colors = [1, 1, 1, 0, 0, 0]
example_weights = [2, 2, 1, 1, 1, 3]

km = FairKMeans(n_clusters=2, random_state=0)
km.fit(example_data, color=example_colors, sample_weight=example_weights)
labels = km.labels_
centers = km.cluster_centers_

print(labels) # [1, 1, 0, 1, 1, 0]
print(centers) # [[0.85, 0.85, 0.85], [1.28, 1.28, 1.28]]

Development

Install poetry

curl -sSL https://install.python-poetry.org | python3 -

Install clang

sudo apt-get install clang

Set clang variables

export CXX=/usr/bin/clang++
export CC=/usr/bin/clang

Install the package

poetry install

If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace

poetry build

Run the tests

poetry run python -m unittest discover tests -v

Citation

If you use this code, please cite the following paper:

M. Schmidt, C. Schwiegelshohn, and C. Sohler, "Fair Coresets and Streaming Algorithms for Fair k-means," in Lecture notes in computer science, 2020, pp. 232–251. doi: 10.1007/978-3-030-39479-0_16.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fair_kmeans-0.1.2.tar.gz (80.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fair_kmeans-0.1.2-cp310-cp310-manylinux_2_39_x86_64.whl (605.7 kB view details)

Uploaded CPython 3.10manylinux: glibc 2.39+ x86-64

File details

Details for the file fair_kmeans-0.1.2.tar.gz.

File metadata

  • Download URL: fair_kmeans-0.1.2.tar.gz
  • Upload date:
  • Size: 80.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.16 Linux/6.8.0-1017-azure

File hashes

Hashes for fair_kmeans-0.1.2.tar.gz
Algorithm Hash digest
SHA256 4cfaecb859419eeba28408c576752ee50e6bdef6066a17f76b1b72d3f5e6d909
MD5 ee5b30874d30874ad2b3474225734f36
BLAKE2b-256 2e437824b053a8b2101d58cdfd1ea95a839d514f2b37a9fc6ccf99710a2e92cc

See more details on using hashes here.

File details

Details for the file fair_kmeans-0.1.2-cp310-cp310-manylinux_2_39_x86_64.whl.

File metadata

File hashes

Hashes for fair_kmeans-0.1.2-cp310-cp310-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 92865b2ee5994a201b18d7ddf176d779b4a55b1ff7bd6129e71e7a11f81ae1ea
MD5 00b7445279e43cbd92b44858f62fa5ab
BLAKE2b-256 53f6c094337659a2124103963ec06fabcfc947036d8f42ccec01f1b2296650d0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page