A toolkit for developing group-aware ML methods
Installation
This library requires at least Python 3.12. Install it from pypi:
pip install fair-forge
or from GitHub:
pip install git+https://github.com/wearepal/fair-forge.git
If you want to use the neural-network-based methods, you need to add the nn extras:
pip install 'fair-forge[nn]'
or
pip install 'fair-forge[nn] @ git+https://github.com/wearepal/fair-forge.git'
Usage
fair-forge provides two main components: metrics and methods. Besides these, there are various utility functions to help with common tasks and also a few example datasets.
The core data type used in fair-forge is the numpy array: all the methods and metrics expect numpy arrays as input. If you have data in a different form, it is usually easy to convert it to numpy arrays:
- Pandas: to_numpy()
- Polars: to_numpy()
- PyTorch: numpy()
- TensorFlow: numpy() for eager tensors, or make_ndarray() for a TensorProto
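The signatures later in this document annotate labels as int32 and features as float32. Below is a minimal sketch of coercing plain Python data into those dtypes using only numpy. The variable names are illustrative, and whether fair-forge strictly requires these exact dtypes is an assumption drawn from the type hints, not something this README states.

```python
import numpy as np

# Illustrative raw data: features as nested lists, labels and groups as lists.
raw_x = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]]
raw_y = [0, 1, 1]
raw_groups = [0, 0, 1]

# np.asarray avoids a copy when the input already is an array of that dtype.
x = np.asarray(raw_x, dtype=np.float32)
y = np.asarray(raw_y, dtype=np.int32)
groups = np.asarray(raw_groups, dtype=np.int32)

print(x.shape, x.dtype, y.dtype, groups.dtype)
```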
Metrics
There are group-aware metrics and non-group-aware metrics. The non-group-aware metrics are callables with this function signature:
import numpy as np
from numpy.typing import NDArray
type Float = float | np.float16 | np.float32 | np.float64
def tpr(y_true: NDArray[np.int32], y_pred: NDArray[np.int32]) -> Float: ...
In other words, a non-group-aware metric accepts two numpy arrays, one with the true labels and one with the predicted labels, and returns a single Float. This API was chosen so that any scikit-learn metric with this signature can be used, for example accuracy_score.
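As an illustration of this signature, here is a hand-written true positive rate in plain numpy. It is a sketch only: the function name `true_positive_rate` and the choice to return NaN when there are no positive labels are assumptions, not fair-forge's own implementation.

```python
import numpy as np
from numpy.typing import NDArray


def true_positive_rate(
    y_true: NDArray[np.int32], y_pred: NDArray[np.int32]
) -> float:
    """Fraction of actual positives that were predicted positive."""
    positives = y_true == 1
    if not positives.any():
        return float("nan")  # undefined when there are no positive labels
    return float((y_pred[positives] == 1).mean())


y_true = np.array([1, 1, 0, 1, 0], dtype=np.int32)
y_pred = np.array([1, 0, 0, 1, 1], dtype=np.int32)
# Two of the three actual positives are predicted as positive.
print(true_positive_rate(y_true, y_pred))
```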
Group-aware metrics take an additional parameter, the group labels:
def cv(
    y_true: NDArray[np.int32],
    y_pred: NDArray[np.int32],
    *,
    groups: NDArray[np.int32],
) -> Float: ...
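To make the group-aware signature concrete, the sketch below defines a hypothetical metric, `positive_rate_gap`, which reports the largest difference in positive-prediction rate between groups. The name and the definition are illustrative assumptions; this is not the library's `cv` metric.

```python
import numpy as np
from numpy.typing import NDArray


def positive_rate_gap(
    y_true: NDArray[np.int32],
    y_pred: NDArray[np.int32],
    *,
    groups: NDArray[np.int32],
) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    # y_true is unused here but kept so the signature matches the metric API.
    rates = [float((y_pred[groups == g] == 1).mean()) for g in np.unique(groups)]
    return max(rates) - min(rates)


y_true = np.array([1, 0, 1, 0], dtype=np.int32)
y_pred = np.array([1, 1, 1, 0], dtype=np.int32)
groups = np.array([0, 0, 1, 1], dtype=np.int32)
# Group 0 has a positive rate of 1.0 and group 1 has 0.5.
print(positive_rate_gap(y_true, y_pred, groups=groups))
```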
A very important function is fair_forge.as_group_metric(). It takes in a non-group-aware metric and turns it into one or more group-aware metrics. It first computes the metric value for each group separately and then aggregates these per-group values in different ways, for example by taking the minimum or the ratio of the values. Here is how one would construct a robust accuracy metric (minimum accuracy across all groups):
import fair_forge as ff
from sklearn.metrics import accuracy_score
# Construct a metric for the minimum accuracy over all groups
(robust_accuracy,) = ff.as_group_metric(
    (accuracy_score,), agg=ff.MetricAgg.MIN
)
# Use it as a group-aware metric
robust_accuracy(y_true=y_true, y_pred=y_pred, groups=groups)
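The per-group-then-aggregate idea can be sketched by hand in plain numpy. The helper `min_over_groups` below is hypothetical and only mirrors the description above (compute the metric per group, then take the minimum); it is not how fair-forge implements `as_group_metric`.

```python
from collections.abc import Callable

import numpy as np
from numpy.typing import NDArray


def accuracy(y_true: NDArray[np.int32], y_pred: NDArray[np.int32]) -> float:
    """Plain accuracy: fraction of predictions equal to the true label."""
    return float((y_true == y_pred).mean())


def min_over_groups(metric: Callable[..., float]) -> Callable[..., float]:
    """Wrap a non-group-aware metric so it returns the worst per-group value."""

    def group_metric(y_true, y_pred, *, groups) -> float:
        per_group = [
            metric(y_true[groups == g], y_pred[groups == g])
            for g in np.unique(groups)
        ]
        return min(per_group)

    return group_metric


y_true = np.array([1, 0, 1, 1, 0, 0], dtype=np.int32)
y_pred = np.array([1, 0, 0, 1, 1, 1], dtype=np.int32)
groups = np.array([0, 0, 0, 1, 1, 1], dtype=np.int32)

robust_acc = min_over_groups(accuracy)
# Group 0 gets 2 of 3 right, group 1 gets 1 of 3 right; the minimum is reported.
print(robust_acc(y_true, y_pred, groups=groups))
```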
Methods
The group-aware vs non-group-aware distinction also exists for the methods provided in this library. The non-group-aware methods simply follow the scikit-learn API for an estimator (inheriting from BaseEstimator adds some mixin methods which are needed):
from typing import Self
from sklearn.base import BaseEstimator
class Method(BaseEstimator):
    def fit(self, X: NDArray[np.float32], y: NDArray[np.int32]) -> Self:
        pass
    def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
        pass
The methods can be used like normal scikit-learn estimators.
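As a concrete instance of this interface, here is a minimal sketch of a baseline that always predicts the most frequent training label. The class `MajorityClass` is hypothetical and exists only to show the shape of `fit` and `predict`; it is not part of fair-forge.

```python
import numpy as np
from numpy.typing import NDArray
from sklearn.base import BaseEstimator


class MajorityClass(BaseEstimator):
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X: NDArray[np.float32], y: NDArray[np.int32]) -> "MajorityClass":
        labels, counts = np.unique(y, return_counts=True)
        # Trailing underscore marks a fitted attribute (scikit-learn convention).
        self.majority_ = labels[np.argmax(counts)]
        return self

    def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
        return np.full(X.shape[0], self.majority_, dtype=np.int32)


train_x = np.zeros((4, 2), dtype=np.float32)
train_y = np.array([1, 1, 0, 1], dtype=np.int32)
test_x = np.zeros((3, 2), dtype=np.float32)

preds = MajorityClass().fit(train_x, train_y).predict(test_x)
print(preds)
```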
On the other hand, we have the group-based methods, which take an additional parameter, the group labels:
from typing import Self
from sklearn.base import BaseEstimator
class GroupMethod(BaseEstimator):
    def fit(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> Self:
        pass
    def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
        pass
These methods can use the group information at training time to produce fairer models.
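One way group labels could be used at training time is sketched below: a nearest-centroid classifier whose class centroids average the per-group means, so that a small group counts as much as a large one. The class `GroupBalancedCentroid` and this balancing scheme are illustrative assumptions, not a method shipped with fair-forge. The keyword is named `groups` here to match the metrics.

```python
import numpy as np
from numpy.typing import NDArray
from sklearn.base import BaseEstimator


class GroupBalancedCentroid(BaseEstimator):
    """Hypothetical nearest-centroid classifier with group-balanced centroids."""

    def fit(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> "GroupBalancedCentroid":
        self.classes_ = np.unique(y)
        centroids = []
        for c in self.classes_:
            # Mean per (class, group) cell, then average the cells so that
            # every group contributes equally regardless of its size.
            cell_means = [
                X[(y == c) & (groups == g)].mean(axis=0)
                for g in np.unique(groups)
                if np.any((y == c) & (groups == g))
            ]
            centroids.append(np.mean(cell_means, axis=0))
        self.centroids_ = np.stack(centroids)
        return self

    def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
        # Distance from every sample to every class centroid.
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)].astype(np.int32)


train_x = np.array([[0], [0], [0], [4], [10], [10]], dtype=np.float32)
train_y = np.array([0, 0, 0, 0, 1, 1], dtype=np.int32)
train_groups = np.array([0, 0, 0, 1, 0, 1], dtype=np.int32)

model = GroupBalancedCentroid().fit(train_x, train_y, groups=train_groups)
preds = model.predict(np.array([[1], [7]], dtype=np.float32))
print(model.centroids_.ravel(), preds)
```

Without balancing, the class-0 centroid would sit at 1.0 (three samples at 0, one at 4); with balancing it sits at 2.0, halfway between the two group means.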
Besides methods which output a machine learning model, there are also methods which transform the data. These then have a transform method instead of a predict method:
from sklearn.base import BaseEstimator
class GroupBasedTransform(BaseEstimator):
    def fit(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> Self:
        pass
    def transform(self, X: NDArray[np.float32]) -> NDArray[np.float32]:
        pass
    def fit_transform(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> NDArray[np.float32]:
        pass
(Unfortunately, you have to implement fit_transform manually, because otherwise it will not have the groups parameter.)
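For illustration, the sketch below implements this interface with a hypothetical transform, `RemoveGroupDirection`, that learns the direction separating two group means during `fit` and projects it out in `transform`. It assumes exactly two groups with distinct means, and it is not one of fair-forge's own transforms. Note the hand-written `fit_transform` that forwards `groups`.

```python
import numpy as np
from numpy.typing import NDArray
from sklearn.base import BaseEstimator


class RemoveGroupDirection(BaseEstimator):
    """Hypothetical transform: project out the direction between two group means."""

    def fit(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> "RemoveGroupDirection":
        g0, g1 = np.unique(groups)  # assumes exactly two groups
        direction = X[groups == g1].mean(axis=0) - X[groups == g0].mean(axis=0)
        # Assumes the two group means differ, so the norm is non-zero.
        self.direction_ = direction / np.linalg.norm(direction)
        return self

    def transform(self, X: NDArray[np.float32]) -> NDArray[np.float32]:
        # Subtract each sample's component along the learned direction.
        projected = np.outer(X @ self.direction_, self.direction_)
        return (X - projected).astype(np.float32)

    def fit_transform(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> NDArray[np.float32]:
        # Written by hand so that `groups` is accepted and forwarded to `fit`.
        return self.fit(X, y, groups=groups).transform(X)


X = np.array([[0, 1], [0, 2], [2, 1], [2, 2]], dtype=np.float32)
y = np.array([0, 1, 0, 1], dtype=np.int32)
groups = np.array([0, 0, 1, 1], dtype=np.int32)

Z = RemoveGroupDirection().fit_transform(X, y, groups=groups)
print(Z)
```

In this toy data the groups differ only in the first feature, so the transform zeroes that feature and leaves the second one unchanged.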
Such transformation methods can then be combined with non-group-aware methods with scikit-learn’s Pipeline:
from sklearn import config_context
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
# Pipeline will only forward the `groups` argument if we
# set `enable_metadata_routing` to `True`.
with config_context(enable_metadata_routing=True):
    estimator = LinearSVC(random_state=42, max_iter=100)
    transform = GroupBasedTransform(random_state=42)
    # We need to explicitly request here that the transformation's
    # `fit` function gets the `groups` argument.
    transform.set_fit_request(groups=True)
    pipeline = Pipeline([("transform", transform), ("estimator", estimator)])
    # This will call `fit_transform` on the transformation
    pipeline.fit(train_x, train_y, groups=train_groups)
    preds = pipeline.predict(test_x)
Utilities
fair-forge provides many useful components for running experiments and collecting results:
- example datasets (like Adult)
- train-test splitting
- facilities for running multiple methods and evaluating them with multiple metrics
For more information on this, see the documentation.