Fair K-Means
Fair K-Means produces a fair clustering assignment according to the fairness definition of Chierichetti et al. [1]. Each point is assigned one of two colors. The goal is to assign the points to clusters such that every cluster contains the same number of points of each color, while minimizing the k-means cost of the clustering. The algorithm also supports weights, so each point can contribute with a different weight to the color balance.
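To make the balance condition concrete, here is a small hypothetical helper, `is_balanced`, that is not part of the package. It assumes colors are encoded as 0 and 1, as in the examples in this description, and it interprets the weighted case as "the total weight of each color in a cluster must be equal"; that reading of the weighted condition is an assumption.

```python
from collections import defaultdict


def is_balanced(labels, colors, weights=None, tol=1e-9):
    """Hypothetical check: does every cluster hold equal (weighted) mass of both colors?

    labels: cluster index per point; colors: 0 or 1 per point;
    weights: optional per-point weights (assumed to default to 1).
    """
    if weights is None:
        weights = [1] * len(labels)
    # cluster -> [total weight of color 0, total weight of color 1]
    mass = defaultdict(lambda: [0.0, 0.0])
    for label, color, w in zip(labels, colors, weights):
        mass[label][color] += w
    return all(abs(m0 - m1) <= tol for m0, m1 in mass.values())


print(is_balanced([1, 0, 0, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # True
```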
The algorithm works as follows, assuming that the binary colors are red and blue:
- A matching between the red and blue points is computed that minimizes the total distance between matched points.
- The mean of each matched pair is computed.
- A K-Means++ clustering of all the means is computed, and each matched pair of points is assigned to the cluster of its mean.
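The three steps above can be sketched in plain Python. This is a hedged illustration under stated assumptions, not the package's implementation: the function names (`fair_kmeans_sketch`, `min_cost_matching`, `kmeans_pp`) are hypothetical, colors are assumed to be 1 (red) and 0 (blue), only the unweighted case with equally many red and blue points is handled, and the LEMON matching is replaced by a brute-force search over all matchings, which is only feasible for a handful of points.

```python
import itertools
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_cost_matching(red, blue, points):
    """Step 1: minimum-cost perfect matching between red and blue indices.

    Brute force over all permutations (a stand-in for LEMON): exponential,
    so only usable for tiny inputs.
    """
    best_cost, best_pairs = math.inf, None
    for perm in itertools.permutations(blue):
        cost = sum(dist(points[r], points[s]) for r, s in zip(red, perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(zip(red, perm))
    return best_pairs


def kmeans_pp(data, k, rng, iters=20):
    """Step 3: k-means++ seeding followed by Lloyd iterations; returns labels."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        # Next center is sampled with probability proportional to D(x)^2.
        d2 = [min(dist(x, c) ** 2 for c in centers) for x in data]
        if sum(d2) == 0:
            centers.append(list(rng.choice(data)))
        else:
            centers.append(list(rng.choices(data, weights=d2)[0]))
    labels = [0] * len(data)
    for _ in range(iters):
        # Assign each point to its nearest center, then move centers to means.
        labels = [min(range(k), key=lambda j: dist(x, centers[j])) for x in data]
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fair_kmeans_sketch(points, colors, k, seed=0):
    """Return one cluster label per point; colors are 1 (red) or 0 (blue)."""
    rng = random.Random(seed)
    red = [i for i, c in enumerate(colors) if c == 1]
    blue = [i for i, c in enumerate(colors) if c == 0]
    if len(red) != len(blue):
        raise ValueError("this sketch needs equally many red and blue points")
    pairs = min_cost_matching(red, blue, points)
    # Step 2: the mean of each matched pair.
    means = [[(u + v) / 2 for u, v in zip(points[r], points[s])] for r, s in pairs]
    pair_labels = kmeans_pp(means, k, rng)
    # Both points of a pair inherit the cluster of the pair's mean.
    labels = [None] * len(points)
    for (r, s), lab in zip(pairs, pair_labels):
        labels[r] = labels[s] = lab
    return labels


data = [[1.0] * 3, [1.1] * 3, [1.2] * 3, [2.0] * 3, [2.1] * 3, [2.2] * 3]
print(fair_kmeans_sketch(data, [1, 1, 1, 0, 0, 0], k=2))
```

Because both points of every matched pair end up in the same cluster, each cluster automatically contains as many red as blue points, whatever the k-means step produces; the matching only determines how much clustering cost this constraint adds.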
The matching between the red and blue points is computed with the LEMON C++ library. The library is bundled with the package and does not need to be installed separately. Only the required files are included, together with LEMON's copyright notice; the complete library is available from the LEMON project.
You can try Fair K-Means out on our Clustering Toolkit!
References
[1] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii, Fair clustering through fairlets, Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2017, pp. 5036–5044.
Installation
pip install fair-kmeans
Example
from fair_kmeans import FairKMeans

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]
# Binary color of each point
example_colors = [1, 1, 1, 0, 0, 0]

km = FairKMeans(n_clusters=2, random_state=0)
km.fit(example_data, color=example_colors)
labels = km.labels_
centers = km.cluster_centers_
print(labels)  # [1, 0, 0, 1, 0, 0]
print(centers)  # [[1.65, 1.65, 1.65], [1.5, 1.5, 1.5]]
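The reported centers are the plain means of the points in each cluster (equivalently, the mean of the pair means, since every pair contributes both of its points). The following plain-Python check, which is not part of the package, recomputes them from the labels printed above; `cluster_means` is a hypothetical helper.

```python
example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]
labels = [1, 0, 0, 1, 0, 0]  # labels as printed by the example above


def cluster_means(points, labels, k):
    """Unweighted mean of the points assigned to each of the k clusters."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [[s / counts[j] for s in sums[j]] for j in range(k)]


centers = cluster_means(example_data, labels, 2)
print([[round(x, 2) for x in c] for c in centers])  # [[1.65, 1.65, 1.65], [1.5, 1.5, 1.5]]
```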
Example with Weights
from fair_kmeans import FairKMeans

example_data = [
    [1.0, 1.0, 1.0],
    [1.1, 1.1, 1.1],
    [1.2, 1.2, 1.2],
    [2.0, 2.0, 2.0],
    [2.1, 2.1, 2.1],
    [2.2, 2.2, 2.2],
]
# Binary color of each point
example_colors = [1, 1, 1, 0, 0, 0]
# Weight with which each point participates in the coloring
example_weights = [2, 2, 1, 1, 1, 3]

km = FairKMeans(n_clusters=2, random_state=0)
km.fit(example_data, color=example_colors, sample_weight=example_weights)
labels = km.labels_
centers = km.cluster_centers_
print(labels)  # [1, 1, 0, 1, 1, 0]
print(centers)  # [[0.85, 0.85, 0.85], [1.28, 1.28, 1.28]]
Development
Install poetry
curl -sSL https://install.python-poetry.org | python3 -
Install clang
sudo apt-get install clang
Set clang variables
export CXX=/usr/bin/clang++
export CC=/usr/bin/clang
Install the package
poetry install
If the installation fails and no C++ compiler output is shown, build the package to see the stack trace:
poetry build
Run the tests
poetry run python -m unittest discover tests -v
Citation
If you use this code, please cite the following paper:
M. Schmidt, C. Schwiegelshohn, and C. Sohler, "Fair Coresets and Streaming Algorithms for Fair k-means," in Lecture Notes in Computer Science, 2020, pp. 232–251. doi: 10.1007/978-3-030-39479-0_16.