A library for clusterlet induction, that is, sets of fair clusters.
Clusfairing
Clusfairing is a Python library collecting algorithms for fair clustering, mainly aimed at clusterlet-based approaches. A clusterlet defines (like fair coresets and fairlets) a clustering of data which respects some notion of fairness or balance. Under some assumptions, (centroids of) clusterlets can themselves be clustered to achieve a fair clustering where each cluster approximately follows the original label distribution.
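As a small worked example of why this can hold, the sketch below uses made-up label counts (not produced by the library) and assumes that every clusterlet carries its labels in exactly the same proportion. Under that assumption, any union of clusterlets keeps the proportion. The helper names label_shares and merge are illustrative only and are not part of the package.

```python
from collections import Counter


def label_shares(counts):
    """Turn absolute label counts into relative frequencies."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def merge(group):
    """Union of clusterlets: label counts simply add up."""
    return sum(group, Counter())


# Hypothetical clusterlets, each with a 3:1 ratio between label 0 and label 1.
clusterlets = [
    Counter({0: 6, 1: 2}),
    Counter({0: 3, 1: 1}),
    Counter({0: 9, 1: 3}),
]

# Two clusters built by grouping clusterlets (the grouping itself is arbitrary here).
cluster_a = merge(clusterlets[:2])
cluster_b = merge(clusterlets[2:])

print(label_shares(cluster_a))
print(label_shares(cluster_b))
```

Both clusters keep the 3:1 ratio regardless of how the clusterlets are grouped, which is the property a later clustering step over clusterlets relies on.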
Quickstart
Installation
mkvirtualenv -p python3.12 clusfairing
pip install -r requirements.txt
Getting started
import numpy
from clusterlets.extractors import RandomExtractor
# generate some data
data = numpy.random.rand(1000, 5)
labels = numpy.random.choice([0, 1], 1000, replace=True)
# creates a clusterlet extractor
extractor = RandomExtractor(random_state=42)
# extracts clusterlets, assigning a clusterlet to each data point
extracted_clusterlets = extractor.extract(data, labels, size_per_label="auto")
# you can access the data of each clusterlet!
for clusterlet in extracted_clusterlets:
    print(data[clusterlet.index])
Clusterlets. The Clusterlet class implements a clusterlet, which is defined by:
- _id: int, an id to identify it
- label_frequecies: Optional[numpy.ndarray], the label frequencies associated with the clusterlet
- centroid: Optional[numpy.ndarray], a centroid
- index: Optional[numpy.ndarray], indicating which instances of the starting data compose this clusterlet. Used in place of the data itself for a lighter object
Since clusterlets are extracted from a dataset, data[clusterlet.index] yields the data of the clusterlet.
Clusterlets support == and hash, thus can be aggregated into set[Clusterlet], and used as dictionary keys.
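To illustrate what == and hash support allows, here is a minimal sketch with a stand-in class. ToyClusterlet is hypothetical and is not the library's Clusterlet. The sketch assumes that equality is decided by _id alone, which the description above does not confirm.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyClusterlet:
    """Hypothetical stand-in: compared and hashed by _id only (an assumption)."""
    _id: int
    index: Optional[Tuple[int, ...]] = field(default=None, compare=False)


a = ToyClusterlet(_id=0, index=(0, 3, 7))
b = ToyClusterlet(_id=0, index=(0, 3, 7))
c = ToyClusterlet(_id=1, index=(1, 2))

# Equal clusterlets collapse into a single set element.
unique = {a, b, c}

# Clusterlets as dictionary keys, here mapping each one to its size.
sizes = {a: len(a.index), c: len(c.index)}

print(len(unique))
print(sizes[b])  # b is equal to a, so it finds a's entry
```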
Extractors. Clusterlets are extracted by extractors implementing the ClusterletExtractor interface (extractors.*), which exposes an extract(data, labels) method.
- RandomExtractor selects random subsets of each label, then pairs them to satisfy dataset balance. One can also specify how many samples per label each clusterlet must have with a parameter dictionary size_per_label.
- KMeansExtractor clusters each label separately (through K-Means), creating label-specific clusterlets. Then, it matches clusterlets to achieve both clustering and balance.
The KMeansExtractor is an implementation of the ClusteringExtractor interface, which can be adapted to any
clustering algorithm by overriding the cluster(data) method.
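The random selection and pairing described for RandomExtractor above can be sketched in plain Python. This is an illustration under stated assumptions and not the library's implementation: random_balanced_groups is a hypothetical helper, size_per_label is taken to be a dictionary mapping each label to a per-clusterlet sample count, and leftover points that do not fill a complete group are simply dropped.

```python
import random
from collections import defaultdict


def random_balanced_groups(labels, size_per_label, seed=0):
    """Deal shuffled per-label indices into groups with fixed label counts."""
    rng = random.Random(seed)

    # Collect the indices of each label, then shuffle them independently.
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    for indices in by_label.values():
        rng.shuffle(indices)

    # The scarcest label (relative to its requested size) bounds the group count.
    n_groups = min(len(by_label[l]) // s for l, s in size_per_label.items())

    groups = []
    for g in range(n_groups):
        group = []
        for label, size in size_per_label.items():
            group.extend(by_label[label][g * size:(g + 1) * size])
        groups.append(group)
    return groups


labels = [0] * 6 + [1] * 3
groups = random_balanced_groups(labels, {0: 2, 1: 1}, seed=42)
for group in groups:
    print(group, [labels[i] for i in group])
```

Each group holds two points of label 0 and one of label 1, mirroring the 2:1 balance of the toy dataset.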
Matchers. Matchers (extractors.matches.*) are objects which "match" existing clusterlets, creating larger ones,
i.e., they cluster clusterlets.
A Matcher implements a match(clusterlets, **kwargs) method, which is given a list of clusterings (one per
label), and a desired label balance to achieve.
Currently, we implement:
- PinballMatcher, which provides matches by hopping hops times through two sets of clusterlets of different labels, each hop following the clusterlet of opposite label at minimum distance.
- GreedyPinballMatcher, which greedily matches clusterlets maximizing some given objective:
  - GreedyBalanceMatcher maximizes label balance
  - GreedyDPbMatcher maximizes clusterlet distance
- CentroidMatcher, which creates a set of candidate partitions of the set of clusterlets, then scores them for balance and compactness. Note: only a subsample of size sample_size is tested due to the superexponential number of possible partitions.
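The hopping behaviour described for PinballMatcher above can be sketched as follows. This is a rough illustration and not the library's code: pinball_path is a hypothetical function, clusterlets are represented only by their centroids, distance is assumed to be Euclidean, and the sketch assumes an already visited clusterlet is never revisited.

```python
import math


def pinball_path(centroids_a, centroids_b, start, hops):
    """Bounce between two label-specific centroid lists.

    Starting from centroids_a[start], each hop jumps to the nearest
    not-yet-visited centroid of the opposite label. Returns the visited
    (side, position) pairs, where side 0 is centroids_a and side 1 is
    centroids_b.
    """
    sides = (centroids_a, centroids_b)
    visited = [(0, start)]
    seen = {(0, start)}
    side, current = 0, sides[0][start]

    for _ in range(hops):
        other = 1 - side
        candidates = [j for j in range(len(sides[other])) if (other, j) not in seen]
        if not candidates:
            break  # nothing left to hop to on the opposite side
        nearest = min(candidates, key=lambda j: math.dist(current, sides[other][j]))
        visited.append((other, nearest))
        seen.add((other, nearest))
        side, current = other, sides[other][nearest]
    return visited


centroids_a = [(0.0, 0.0), (5.0, 5.0)]
centroids_b = [(1.0, 0.0), (4.0, 5.0)]
path = pinball_path(centroids_a, centroids_b, start=0, hops=3)
print(path)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The clusterlets visited along one path are the ones that would be matched together. Whether the actual matcher forbids revisits or stops early is not stated above, so treat those details as assumptions.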
File details
Details for the file clusterlets-0.0.1.tar.gz.
File metadata
- Download URL: clusterlets-0.0.1.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53308a2cd2a3053565bd7d8a0e21d92fd34faee81ddfff253379e6d94c661715 |
| MD5 | ebc4d2bdb2e092b872f01e2a19f5b0fa |
| BLAKE2b-256 | 771e4b09f928d3006df77282af85a0519575516019bda1f840420fb44223b928 |
File details
Details for the file clusterlets-0.0.1-py3-none-any.whl.
File metadata
- Download URL: clusterlets-0.0.1-py3-none-any.whl
- Upload date:
- Size: 26.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3025d2a00524e07589debfe47df2c9b0c723284a0a87768208008b3997bf9f72 |
| MD5 | f5a1495b98d0aa933d7f2b48511c76c0 |
| BLAKE2b-256 | 688a047da97b3069feba6c7ca9550f922fe634e08e434eb932ed544e754fd633 |